
Front Range Hosting - Down

FRCorey Member
edited March 2013 in General

This has been a great community to me since I showed up around Sept 2012. I've done right by my customers, ensuring a quality service, and I've been proud of that. This is not a letter about me closing up shop, though the obstacles I'm facing are pretty steep.

On 3/7 around 2:30 PM MST, Data102 was doing some preventive maintenance on their UPS systems. An official RFO will come to me from them in the morning, but the gist of it is that an internal component of the UPS failed, and when they tried to put the load back on the UPS it shut off completely. This happened after they had put the system in bypass for preventive maintenance to replace a logic board that had nothing to do with the part that failed.

So far, the damage is that the following nodes have died:

Pike - KVM - failed RAID card
Breck - KVM - failed RAID card
Kenobi - OpenVZ - might be fixable, just not sure yet.
Webserver - the boot partition appears to be corrupted; we hope to recover the database off of it and rebuild WHMCS sometime tomorrow.

We still have 1 KVM and 2 OpenVZ nodes running, and the VPS control panel is still up so you can manage your VPSes, but right now I have no way to handle trouble tickets.

While we have a spare server handy, it's not going to replace three servers' worth of customers, and since the three nodes above and this one were all ordered at the same time, I'm loath to even put it into service right now.

I will work to make this right; just give me a few hours to collect my thoughts, talk to my insurance company about what they can help with, yell at my system builder for a bit, and get a couple hours of sleep, please. My first priority is restoring the website and the customer portal while getting the bits and pieces together to fix the other servers.

Thanks for everyone's patience with this, and I'm terribly sorry. I'll know more later today.

Corey
CEO Front Range Hosting, LLC


Comments

  • trewq Administrator, Patron Provider

    I'm not a customer but I wish you good luck.

  • Are you sure the RAID cards really died? They shouldn't be that fragile... Maybe just the configuration was lost and the arrays need to be imported?

  • Ishaq Member
    edited March 2013

    Wow..

    How did this happen all at once :/

  • @Ishaq said: Wow..

    How did this happen all at once :/

    From the power outage more than likely.

  • @rds100 No, the BIOS does not even see them; 3 nodes purchased at the same time all have this problem now.
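If a controller has truly failed (rather than just not initializing in the BIOS), it usually won't enumerate on the PCIe bus at all, which can be checked from a rescue/live Linux environment. Below is a minimal sketch of that check in Python; the vendor and class IDs are standard PCI values, but booting a rescue OS on these nodes is my assumption, not something Corey described:

```python
#!/usr/bin/env python3
"""Minimal sketch: scan sysfs for PCI devices that look like RAID controllers.
Run from a rescue/live Linux environment; this only tells you whether the card
still enumerates on the bus, not whether it actually works."""
from pathlib import Path

LSI_VENDOR = "0x1000"         # LSI Logic / Broadcom PCI vendor ID
RAID_CLASS_PREFIX = "0x0104"  # PCI class 0x0104xx = RAID bus controller

def find_raid_controllers():
    hits = []
    for dev in Path("/sys/bus/pci/devices").iterdir():
        vendor = (dev / "vendor").read_text().strip()
        pci_class = (dev / "class").read_text().strip()
        if vendor == LSI_VENDOR or pci_class.startswith(RAID_CLASS_PREFIX):
            hits.append((dev.name, vendor, pci_class))
    return hits

if __name__ == "__main__":
    cards = find_raid_controllers()
    if not cards:
        print("No LSI/RAID-class devices on the PCIe bus -- card (or slot) may be dead.")
    for addr, vendor, pci_class in cards:
        print(f"{addr}: vendor={vendor} class={pci_class}")
```

If a card does show up here but not in the BIOS setup, the fast-boot suggestion RyanD makes further down the thread would be worth trying before writing the controller off.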

  • The real kick in the gut, guys, was our webserver going kaput, and then noticing that backups quit running 3 weeks ago but kept sending us "cron job completed" messages. That's about 60 customers' data we won't have. I'm hopeful that we can recover the webserver data partitions; just the boot partition looks corrupted.

    Jury is still out on Kenobi.
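The silent backup failure described above (cron kept mailing "completed" while no data was being written) is exactly the kind of thing an output check catches. Below is a minimal sketch of a cron backup wrapper that verifies its own result before reporting success; the rsync command, paths, and thresholds are illustrative assumptions, not Front Range Hosting's actual setup:

```python
#!/usr/bin/env python3
"""Minimal sketch of a cron backup wrapper that verifies its own output
before claiming success. The rsync command, paths, and thresholds are
illustrative assumptions, not any provider's actual configuration."""
import subprocess
import sys
import time
from pathlib import Path

BACKUP_CMD = ["rsync", "-a", "--delete", "/var/lib/vz/", "/mnt/backup/vz/"]  # hypothetical
DEST = Path("/mnt/backup/vz")   # hypothetical destination
MIN_BYTES = 1 * 1024**3         # expect at least 1 GiB of backed-up data
MAX_AGE = 2 * 3600              # newest file must be under 2 hours old

def main() -> int:
    if subprocess.run(BACKUP_CMD).returncode != 0:
        print("BACKUP FAILED: rsync exited non-zero", file=sys.stderr)
        return 1
    files = [p for p in DEST.rglob("*") if p.is_file()]
    if not files:
        print("BACKUP FAILED: destination is empty", file=sys.stderr)
        return 1
    total = sum(p.stat().st_size for p in files)
    newest = max(p.stat().st_mtime for p in files)
    # Only report success if the destination actually looks like a fresh backup;
    # a bare "cron job completed" mail proves nothing about the data.
    if total < MIN_BYTES or time.time() - newest > MAX_AGE:
        print("BACKUP SUSPECT: destination too small or stale", file=sys.stderr)
        return 1
    print(f"backup verified: {len(files)} files, {total} bytes")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The point is simply that "the job ran" and "a usable backup exists" are separate facts, and only the second one is worth alerting on.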

  • Thanks for all the updates FRCorey. I'm a KVM customer, and look forward to hearing how this all plays out. I'm not using the VPS for anything business related or mission critical, so I'm not here to complain or anything. I guess you could say I'm just sitting back with the popcorn watching.

    Good luck and hopefully things work out!

  • RyanD Member

    @FRCorey

    Try disabling fast boot on the systems. It's possible the BIOS settings changed, and depending on the RAID card model, if the BIOS initializes too quickly it can bypass the secondary init of the RAID card.

  • Wow, Murphy found you and wasn't shy. Time to sacrifice a small child before he does more; good luck in the battle.

  • Sorry to hear about this. I wondered why I couldn't connect this morning.

    I see Torrey is still down. I'm thankful it wasn't a casualty of Murphy....

  • Well, if you can have the data from 3 weeks ago restored, that'll be some sort of comfort for the people affected by this.

  • jar Patron Provider, Top Host, Veteran

    Good job on keeping people up to speed with the raw details. Sorry to see that you got slammed with all that at once, but you push forward and do your best. No one can ask for more.

    Holler if you need a hand with anything.

  • Thank you, Corey, for the continual updates. This is all I ask for from a provider when something like this happens. You have done a wonderful job of keeping me updated on the situation.

    I hope Kenobi can be fixed, even if it has to be restored from an earlier backup. Power outages do funny things; I understand this, and I hope you and your team can work things out.

    I applaud you for not running and hiding from this, and for standing up to say this is what is happening and this is what we are doing to fix it.

  • I'm on Kenobi KVM, but not hosting anything important there, so you're free :)
    And also, thanks for the notification email.

  • @ErawanArifNugroho Kenobi is their OpenVZ node. Are you sure you have KVM there?

  • mikho Member, Host Rep
    edited March 2013

    @nstorm said: @ErawanArifNugroho Kenobi is their OpenVZ node. Are you sure you have KVM there?

    Maybe it's the Obi-Wan Kenobi?

    bad joke, I know ....

  • Jacob Member

    Why haven't you got spare RAID controllers? I forgot we only had one spare left, so I've just ordered a couple for now.

    Unfortunate about all the controllers dying, although I'm burning through BBUs for some reason. :-(

  • I'm on an OpenVZ plan and it's now running. I'm sorry to see this, and I hope you get everything back up soon.

    I host my personal code repo server with FRH and it was working very well.

  • vld Member
    edited March 2013

    This has affected my websites. I am not happy.

    /rage

  • @Jacob said: Why haven't you got spare RAID Controllers? I forgot we only had one spare left, so i've just ordered a couple for now.

    It's a bit more difficult when you're not using $40 controllers off eBay. I would be interested to know how many providers here keep spare current-generation controllers on hand.

  • Can we know the controllers you used? The make and model of the servers and the datacenter?

  • @serverian said: Can we know the controllers you used?

    He uses LSI 9260-8i

  • @FRCorey said: @rds100 No, the BIOS does not even see them; 3 nodes purchased at the same time all have this problem now.

    In such a situation, if nothing else helps, I would back up all the data from the drives manually, then power off the machines, remove the power cables, wait 30 seconds, reconnect the power cables, power on, and check whether the BIOS detects the RAID cards. If not, power off again, remove the power cables, wait 30 seconds, reload the default BIOS settings (hardware reset), power on, and check once more whether the BIOS detects the RAID cards.
    Also try the cards in another test machine.
    But still, this is only me talking; I'm not sure whether it would be successful in your scenario.

    However, since you are insured, the best option would be not to touch anything until the insurance pays out.

  • Jacob Member
    edited March 2013

    @Damian I don't know about you, but I don't buy critical parts from eBay. We get ours from Pinnacle Data, a good vendor with next-business-day delivery and a Saturday option.

    They stock pretty much anything and everything.

  • emg Veteran

    Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

  • @emg said: Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

    This.

  • rm_ IPv6 Advocate, Veteran

    Interestingly, mine is not on a node in the list of those down/damaged, but my VPS has still been offline for about 10 hours (booting it from Solus doesn't work). So the problem might be more widespread than that, although since my node isn't even mentioned, I can't help but think it's something entirely different in its case, like "oh yeah, we forgot to turn that one on" or "ah right, it's those new network settings that didn't apply after the reboot", etc.

  • Infinity Member, Host Rep

    @emg said: Corey deserves notice for his honesty, candor, and detailed reports. All too often, I see vendor statements like "we're working on it and everything will be back to normal soon."

    Corey's specificity is so helpful to those of us who understand the implications of the various issues, and we can all empathize with the multiple, cascading problems he faces. His reports engender customer trust and a feeling of inclusion.

    As far as I am concerned, Corey is setting an example for how to handle customer relations when facing a difficult situation. I hope it plays out quickly, and in his favor.

    +1 also

  • The updates are nice, yes, but I can't change what they are going to do to fix it, where they order from, etc. What would be better is an ETA. If this takes any longer than 48 hours, I'm probably going to have to switch providers (sadly, of course, because it's been pretty good up till now). If it takes until Monday at a specified time, at least I can tell people when my services will be available again... That's totally different.

    Another thing I could mention is social network updates... everyone else does them. If you can't, pay someone to. I tried to check there first, before I noticed FRH's e-mails in my spam box.

    Great work though, and I totally understand someone else's problems messing up all your stuff.

    It happens =D
