
Front Range Hosting - Down

FRCorey Member
edited March 2013 in General

This has been a great community to me since I showed up around Sept 2012. I've done right by my customers, ensuring a quality service, and I've been proud of that. This is not a letter about me closing up shop, though the obstacles I'm facing are pretty steep.

On 3/7 around 2:30 PM MST, Data102 was doing some preventive maintenance on their UPS systems. An official RFO will come to me from them in the morning, but the gist of it is that an internal component of the UPS failed, and when they tried to put the load back on the UPS it shut off completely. This happened after they had put the system in bypass for preventive maintenance to replace a logic board that had nothing to do with the part that failed.

So far, the damage is that the following nodes have died:

Pike - KVM - failed RAID card
Breck - KVM - failed RAID card
Kenobi - OpenVZ - might be fixable, just not sure yet.
Webserver - the boot partition appears to be corrupted; we hope to recover the database off of it and rebuild WHMCS sometime tomorrow.

We still have 1 KVM and 2 OpenVZ nodes running, and the VPS control panel is still up so you can manage your VPSes, but right now I have no way to handle trouble tickets.

While we have a spare server handy, it's not going to replace three servers' worth of customers, and since the three nodes above and this one were all ordered at the same time, I'm loath to even put it into service right now.

I will work to make this right; just give me a few hours to collect my thoughts, talk to my insurance company about what they can help with, yell at my system builder for a bit, and get a couple hours of sleep, please. My first priority is restoring the website and the customer portal while getting the bits and pieces together to fix the other servers.

Thanks for everyone's patience with this, and I'm terribly sorry. I'll know more later today.

Corey
CEO Front Range Hosting, LLC


Comments

  • trewq Administrator, Patron Provider

    I'm not a customer but I wish you good luck.

  • Are you sure the RAID cards really died? They shouldn't be that fragile... Maybe just the configuration was lost and the arrays need to be imported?

  • Ishaq Member
    edited March 2013

    Wow..

    How did this happen all at once :/

  • @Ishaq said: Wow..

    How did this happen all at once :/

    From the power outage more than likely.

  • @rds100 No, the BIOS does not even see them; 3 nodes purchased at the same time all have this problem now.
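If a controller has truly failed (rather than just not initializing in the BIOS), it usually won't enumerate on the PCIe bus at all, which can be checked from a rescue/live Linux environment. Below is a minimal sketch of that check in Python; the vendor and class IDs are standard PCI values, but booting a rescue OS on these nodes is my assumption, not something Corey described:

```python
#!/usr/bin/env python3
"""Minimal sketch: scan sysfs for PCI devices that look like RAID controllers.
Run from a rescue/live Linux environment; this only tells you whether the card
still enumerates on the bus, not whether it actually works."""
from pathlib import Path

LSI_VENDOR = "0x1000"         # LSI Logic / Broadcom PCI vendor ID
RAID_CLASS_PREFIX = "0x0104"  # PCI class 0x0104xx = RAID bus controller

def find_raid_controllers():
    hits = []
    for dev in Path("/sys/bus/pci/devices").iterdir():
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == LSI_VENDOR or pci_class.startswith(RAID_CLASS_PREFIX):
            hits.append((dev.name, vendor, pci_class))
    return hits

if __name__ == "__main__":
    cards = find_raid_controllers()
    if not cards:
        print("No LSI/RAID-class devices on the PCIe bus -- card (or slot) may be dead.")
    for addr, vendor, pci_class in cards:
        print(f"{addr}: vendor={vendor} class={pci_class}")
```

If a card does show up here but not in the BIOS setup, the fast-boot suggestion RyanD makes further down the thread would be worth trying before writing the controller off.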

  • The real kick in the gut, guys, was our webserver going kaput, and then noticing that backups quit running 3 weeks ago but kept sending us "cron job completed" messages. That's about 60 customers' data we won't have. I'm hopeful that we can recover the webserver data partitions; just the boot partition looks corrupted.

    Jury is still out on Kenobi.
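The silent backup failure described above (cron kept mailing "completed" while no data was being written) is exactly the kind of thing an output check catches. Below is a minimal sketch of a cron backup wrapper that verifies its own result before reporting success; the rsync command, paths, and thresholds are illustrative assumptions, not Front Range Hosting's actual setup:

```python
#!/usr/bin/env python3
"""Minimal sketch of a cron backup wrapper that verifies its own output
before claiming success. The rsync command, paths, and thresholds are
illustrative assumptions, not any provider's actual configuration."""
import subprocess
import sys
import time
from pathlib import Path

BACKUP_CMD = ["rsync", "-a", "--delete", "/var/lib/vz/", "/mnt/backup/vz/"]  # hypothetical
DEST = Path("/mnt/backup/vz")   # hypothetical destination
MIN_BYTES = 1 * 1024**3         # expect at least 1 GiB of backed-up data
MAX_AGE = 2 * 3600              # newest file must be under 2 hours old

def main() -> int:
    if subprocess.run(BACKUP_CMD).returncode != 0:
        print("BACKUP FAILED: rsync exited non-zero", file=sys.stderr)
        return 1
    files = [p for p in DEST.rglob("*") if p.is_file()]
    if not files:
        print("BACKUP FAILED: destination is empty", file=sys.stderr)
        return 1
    total = sum(p.stat().st_size for p in files)
    newest = max(p.stat().st_mtime for p in files)
    # Only report success if the destination actually looks like a fresh backup;
    # a bare "cron job completed" mail proves nothing about the data.
    if total < MIN_BYTES or time.time() - newest > MAX_AGE:
        print("BACKUP SUSPECT: destination too small or stale", file=sys.stderr)
        return 1
    print(f"backup verified: {len(files)} files, {total} bytes")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point is simply that "the job ran" and "a usable backup exists" are separate facts, and only the second one is worth alerting on.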

  • Thanks for all the updates FRCorey. I'm a KVM customer, and look forward to hearing how this all plays out. I'm not using the VPS for anything business related or mission critical, so I'm not here to complain or anything. I guess you could say I'm just sitting back with the popcorn watching.

    Good luck and hopefully things work out!

  • RyanD Member

    @FRCorey

    Try disabling fast boot on the systems. It's possible the BIOS settings changed, and depending on the RAID card model, if the BIOS initializes too quickly it can bypass the secondary init of the RAID card.

  • Wow, Murphy found you and wasn't shy. Time to sacrifice a small child before he does more; good luck in the battle.

  • Sorry to hear about this. I wondered why I couldn't connect this morning.

    I see Torrey is still down. I'm thankful it wasn't a casualty of Murphy....

  • Well, if you can have the data from 3 weeks ago restored, that'll be some sort of comfort for the people affected by this.

  • jar Patron Provider, Top Host, Veteran

    Good job on keeping people up to speed with the raw details. Sorry to see that you got slammed with all that at once, but you push forward and do your best. No one can ask for more.

    Holler if you need a hand with anything.

  • Thank you, Corey, for the continual updates. This is all I ask for from a provider when something like this happens. You have done a wonderful job of keeping me updated on the situation.

    I hope Kenobi can be fixed, even if it has to be restored from an earlier backup. Power outages do funny things; I understand this, and I hope you and your team can work things out.

    I applaud you for not running and hiding from this, and for standing up to say this is what is happening and this is what we are doing to fix it.

  • I'm on Kenobi KVM, but not hosting anything important there, so you're free :)
    And also, thanks for the notification email.

  • @ErawanArifNugroho Kenobi is their OpenVZ node. Are you sure you have KVM there?

  • mikho Member, Host Rep
    edited March 2013

    @nstorm said: @ErawanArifNugroho Kenobi is their OpenVZ node. Are you sure you have KVM there?

    Maybe it's the Obi-Wan Kenobi?

    bad joke, I know ....

  • Jacob Member

    Why haven't you got spare RAID controllers? I forgot we only had one spare left, so I've just ordered a couple for now.

    Unfortunate about all the controllers dying, although I'm burning through BBUs for some reason. :-(

  • I'm on an OpenVZ plan and it's now running. I'm sorry to see this, and I hope you get everything back up soon.

    I host my personal code repo server with FRH and it was working very well.

  • vld Member
    edited March 2013

    This has affected my websites. I am not happy.

    /rage

  • @Jacob said: Why haven't you got spare RAID Controllers? I forgot we only had one spare left, so i've just ordered a couple for now.

    It's a bit more difficult when you're not using $40 controllers off eBay. I would be interested to know how many providers here keep spare current-generation controllers on hand.

  • Can we know the controllers you used? The make and model of the servers and the datacenter?

  • @serverian said: Can we know the controllers you used?

    He uses LSI 9260-8i

  • @FRCorey said: @rds100 No, the BIOS does not even see them; 3 nodes purchased at the same time all have this problem now.

    In such a situation, if nothing else helps, I would back up all the data from the drives manually, then power off the machines, remove the power cables, wait 30 seconds, reconnect the power cables, power on, and check whether the BIOS detects the RAID cards. If not, power off again, remove the power cables, wait 30 seconds, reload the default BIOS settings (hardware reset), power on, and check once more whether the BIOS detects the RAID cards.
    Also try the cards in another test machine.
    But still, this is only me talking; I'm not sure whether it would be successful in your scenario.

    However, since you are insured, the best option would be not to touch anything until the insurance pays out.

  • Jacob Member
    edited March 2013

    @Damian I don't know about you, but I don't buy critical parts from eBay. We get ours from Pinnacle Data, a good vendor with next-business-day delivery and a Saturday option.

    They stock pretty much anything and everything.

  • emg Veteran

    Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

  • @emg said: Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

    This.

  • rm_ IPv6 Advocate, Veteran

    Interestingly, mine is not on a node in the list of those down/damaged, but my VPS has still been offline for about 10 hours (booting it from Solus doesn't work). So the problem might be more widespread than that, although since my node isn't even mentioned, I can't help but think it's something entirely different in its case, like "oh yeah, we forgot to turn that one on" or "ah right, it's those new network settings that didn't apply after the reboot", etc.

  • Infinity Member, Host Rep

    @emg said: Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

    +1 also

  • The updates are nice, yes, but I can't change what they are going to do to fix it, where they order from, etc. What would be better is an ETA. If this takes any longer than 48 hours, I'm probably going to have to switch providers (sadly, of course, because it's been pretty good up till now). If it takes until Monday at a specified time, at least I can tell people when my services will be available again... That's totally different.

    Another thing I could mention is social network updates... everyone else does them. If you can't, pay someone to. I tried to check there first, before I noticed FRH's e-mails in my spam box.

    Great work though, and I totally understand someone else's problems messing up all your stuff.

    It happens =D
