More ChicagoVPS Data Loss

Just received this a few minutes ago. This isn't the first time this has happened, correct?

Hello,

You are receiving this email because our records show that you have one or more VPS with us located on the la-vps12 node. We received an alert from our monitoring system that a drive in the RAID array of this system was in degrading condition. We verified this and contacted the Los Angeles datacenter remote hands technicians to go ahead and replace the failed HDD.

On 09-11-2013 11:27 PM EDT the remote hands technician proceeded with the hot-swap replacement; however, another hard drive failed during the live RAID rebuild process, which caused inconsistent data on the array. We have attempted to salvage the data on the RAID array and even hired external assistance to attempt to recover the server, but despite our attempts, it has been concluded that the RAID array on this server is corrupt and in an unrecoverable state.

Unfortunately this matter was out of our hands; in the nature of technology, hardware failures can and do happen. Like everyone else, we cannot always predict when hardware may fail, but we can certainly take steps to greatly minimize the chance of something like this happening again. We have taken the appropriate measures and have revised future node setup procedures. From now on, HDD components for all of our servers will be ordered in separate batches, so in the case of a bad shipment or HDD batch the chances of multiple drives failing are very slim. What happened with la-vps12 is very rare, but we suspect it was due to a bad hard drive shipment, as the HDDs on this server came from the same batch.

We have brought a brand new server online with drives from separate shipment batches and set it up as a replacement for la-vps12. At the moment we are working on reinstalling all of the VPS instances hosted on this hypervisor; your VPS should be coming back online within the next few hours. We appreciate your patience during this time and apologize for the inconvenience caused.

Regards,

ChicagoVPS Team


Comments

  • Saiku Member, Host Rep

    HDD failures can't be predicted. So yeah.. this is why you always make backups .-.

  • $2/mo VPS?

  • @doughmanes said:
    $2/mo VPS?

    Nope, the one I had was an "Enterprise VPS".

  • @billnye while it is very rare, drives do die, but it's even rarer for multiple drives in the same array to fail at about the same time. This sounds like it wasn't CVPS's fault. The other times they lost clients' data it was due to people hacking in and wiping servers.

  • I had a worse experience with VPS6.
    Server went offline without any notification. I opened a ticket and they said the RAID array failed and told me to wait for the array to rebuild. 5 days later, they told me it was unrecoverable (due to 1 hard drive failure in RAID 10? lolwut?). Then they asked where I'd like to be moved to.

  • Spirit Member
    edited September 2013

    Do they reinstall all of the VPS instances from backups, or recreate them fresh without data? They said something about weekly backups after one of the previous incidents.

    ChicagoVPS has two separate backup facilities, a free public facing system called Central Backup and a secondary backup, which automatically ran each week.

  • @Spirit said:
    Do they reinstall all of the VPS instances from backups, or recreate them fresh without data? They said something about weekly backups after one of the previous incidents.

    ChicagoVPS has two separate backup facilities, a free public facing system called Central Backup and a secondary backup, which automatically ran each week.

    That would be great! But I doubt that I'm that lucky.

    @PcJamesy said:
    billnye while it is very rare, drives do die, but it's even rarer for multiple drives in the same array to fail at about the same time. This sounds like it wasn't CVPS's fault. The other times they lost clients' data it was due to people hacking in and wiping servers.

    Ah, I thought the last one was hardware failure as well. Good to know.

  • Just posted an offer, needed to make some extra space on nodes to fit in VMs #301-#380

  • Maounique Host Rep, Veteran
    edited September 2013

    If you have hundreds of servers, a multiple-drive failure in an array has a higher probability of happening. It is still small, but it adds up over time.
    Everyone should have backups, because the RAID controller can also fail in ways that corrupt the data. We had such an issue when half the disks suddenly became unavailable to the controller, even though they were not defective, corrupting the data. We had to rebuild from back-ups and lost some 12 hours of data. We have more than 50 nodes and it happened once in almost 2 years, but CVPS has hundreds I think, so the chance is bigger.
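
The point above about the odds adding up across a fleet can be put in rough numbers. A minimal sketch, with a made-up per-array failure rate purely for illustration:

```python
# Toy estimate: if one array has probability p of a catastrophic
# (multi-drive) failure in a given year, then with n independent arrays
# the chance that at least one fails that year is 1 - (1 - p)**n.
def fleet_failure_prob(p_per_array: float, n_arrays: int) -> float:
    return 1 - (1 - p_per_array) ** n_arrays

# Assumed 0.5% yearly chance per array, for illustration only.
print(round(fleet_failure_prob(0.005, 50), 3))   # ~50 nodes  -> 0.222
print(round(fleet_failure_prob(0.005, 300), 3))  # hundreds   -> 0.778
```

With these toy numbers, 50 nodes already give roughly a 1-in-5 yearly chance of a failure somewhere in the fleet, and correlated failures (like a bad drive batch) make the real odds worse than this independence model suggests.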

  • @black said:
    I had a worse experience with VPS6.
    Server went offline without any notification, I opened a ticket and they said the RAID array failed, told me to wait for the array to rebuild. 5 days later, they told me it was unrecoverable (due to 1 hard drive failure in raid 10? lolwut?). Then they asked where I'd like to be moved to.

    *cough* don't piss off the remote DC staff *cough*

  • @billnye said:
    This isn't the first time this has happened, correct?

    Probably not, and this is probably not the last time - no matter the host. Failure is inevitable.

  • Nick_A Member, Top Host, Host Rep

    Sometimes it's not even the drives. I've had an angry RAID card pick drives at random to mark as "bad". I replaced half of the array thinking maybe it was a bad batch. Then the demon card picked two new ones from the same mirror to knock offline.

  • Looks like they installed a new instance, although it's only accessible from inside their console. All of my data is gone. I haven't gotten any responses to tickets asking about my data (or anything else), so I don't have high hopes.

  • Shit happens. Keep backups.

  • I had the sites I was running backed up, so I didn't lose anything that was crucial. They're already up and running. I did have a bunch of stuff that I was sort of messing around with that is gone, which is sad. But I guess if I have to learn a lesson about best backup practices, this one isn't that bad.

  • so what's the advantage of RAID 10 over RAID 0 when this thing happens so easily?

  • Wira_Soenaryo Member
    edited September 2013

    My VPS came back up just now after more than 26 hours down..

    But all the data is gone... luckily this server was not for a production site.

  • This isn't the first time this has happened, correct?

    It's the first time this month ChicagoVPS users have suffered data loss. :)

  • @DomainBop said:
    It's the first time this month ChicagoVPS users have suffered data loss. :)

    Reminds me of those "It has been X days since an accident due to lack of safety" signs outside big industrial areas or city-run utilities

  • @jcaleb said:
    so what's the advantage of RAID 10 over RAID 0 when this thing happens so easily?

    To make it less likely to happen.

  • Maounique Host Rep, Veteran
    edited September 2013

    @jcaleb said:
    so what's the advantage of RAID 10 over RAID 0 when this thing happens so easily?

    Easy to happen? Having 2 hard drives fail within a short window of time is much less likely than having one fail. Also, if the RAID controller fails, chances are you can still recover the data in RAID 10 more often than in RAID 0.

    It is much less likely to happen. The chance of one car going off the road at a sharp turn on a highway is small but not that rare, while having 2 cars go off the road at the same turn half an hour apart is much less likely, even with the distraction of police near the road and people looking (the RAID is more stressed when one drive fails).
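
The RAID 0 vs RAID 10 difference in the analogy above can be made concrete with a back-of-the-envelope model. A sketch assuming a 4-disk array and independent per-disk failures (a strong assumption; this very thread shows failures can be correlated, e.g. a bad batch):

```python
# Toy model: 4 disks, each with independent failure probability p over
# some time window. Illustrative numbers only.
def raid0_loss(p: float, disks: int = 4) -> float:
    # RAID 0: any single disk failing loses the whole array.
    return 1 - (1 - p) ** disks

def raid10_loss(p: float) -> float:
    # RAID 10 (two mirrored pairs): data is lost only if BOTH disks in
    # the same mirror fail; each pair dies with probability p**2.
    pair_ok = 1 - p ** 2
    return 1 - pair_ok ** 2

p = 0.03  # assumed 3% per-disk failure chance in the window
print(round(raid0_loss(p), 4))   # 0.1147 -> ~11.5%
print(round(raid10_loss(p), 4))  # 0.0018 -> ~0.18%
```

Same disks, same assumed per-disk risk: RAID 0 loses data if any disk dies, while RAID 10 needs both halves of one mirror to die — which is why the trade is still worth it, even though, as this thread shows, RAID is not a backup.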

  • Like in the opening scenes of Armageddon: "This has happened before. This will happen again. Only question is when." ;)

    I wouldn't put much stock in ChicagoVPS's official communiqués. While HDD and other equipment failures do happen to everybody, it is hard to trust and respect a company that consistently makes promises that are not kept. Every time this happened to me (2 or 3 such major incidents), their well-advertised "Central Backup" and the later, equally well-promoted additional offsite backup also failed miserably. Two possibilities as I see it: either the backups are not being made, or their support staff can't be bothered to recover from them. Probably too much hassle; easier to say shit happens, and what are you gonna do to them...

    Well, I for one have decided to move on. Thank you for a year and a half of pretty good service, ChicagoVPS. But not anymore. Their customer support has been sliding steadily these past few months... Chris seems to have disappeared... so the mice are playing with no cat around to keep them in line...? ;)

  • @concerto49 @Maounique I just worry because the reaction of the sysadmins in this thread is that it's not uncommon.

  • Maounique Host Rep, Veteran

    It is not, and this is why you need to have back-ups. All admins tell you so. We offer free backup space. Now, with the cloud, it can also be done automatically with a bit of point and click.
    There is really no excuse for not having your own backup. Shit happens and will keep happening; as the "ad" says, past performance is no guarantee for the future.
    It is much less likely with a RAID level where the R really means redundancy (anything other than 0), but it is not unheard of for the RAID controller to malfunction, or for power failures to take arrays out even with a BBU. You can take precautions, but you can never be sure.
    Even before the cloud we offered a bit of SAN storage, because it is much less likely to fail than a single RAID controller: it has 2, active-active, and is also built for the purpose, specially engineered to offer the best data protection. It can still go up in flames if there is a fire, or an earthquake that triggers one, etc., so you had better have back-ups, no matter how sure the provider and the storage seem.

  • @jcaleb said:
    concerto49 Maounique I just worry because the reaction of the sysadmins in this thread is that it's not uncommon.

    It's just chance, and it depends on a lot of factors. If you oversell a lot and constantly use 100% of the HDD all the time, that adds to the wear and tear. If your node is not stable and has to constantly reboot, that might be another factor. It all adds up.

    Sometimes it might not happen for years and sometimes it might happen every month. All we can do is minimize those risks.

  • TheLinuxBug Member
    edited September 2013

    I mean, I completely understand that shit happens, and I am sure people appreciate the explanations @Maounique. However, it is more interesting that CVPS hasn't even had time to come here and comment on the thread. Why are you doing their dirty work for them? They need to come and take the blame they're due and explain this stuff to their customers, instead of letting other people make excuses for them, regardless of why it happened.

    With a provider that is overselling and pumping up the 2GB promotions as much as CVPS does, it is surprising to me, as more of these incidents happen, that people haven't started to get the idea that you are trading the reliability of the server for those extra resources. The more users on the node, the more the drives get thrashed and the quicker they fail... and as I understand it, CVPS specializes in stuffing as many users as they can onto a server. So my advice to anyone who chooses to use them: regardless of any claims of back-ups they might make, ALWAYS... ALWAYS keep your own somewhere offsite (not with the same provider) for situations like this that are bound to happen.

    Cheers!

  • Thanks for the explanations... I guess it is related to overselling

  • Maounique Host Rep, Veteran
    edited September 2013

    I always take the opportunity to remind people that backups are mandatory if you have important data.
    We take this extremely seriously and take backups even when we are not required to do so (though only on some plans: not the very low end or oversold ones, only biz and SSD). We offer free FTP backup, now even with SSH for old customers in good standing, also with static web serving, and the cloud can be configured to take snapshots automatically with just a little point-and-click work. There are no excuses.

    Bottom line: if your provider does not include backups in the contract AND does not specify that they are responsible for them, then if you lose data you have nobody to blame but yourself.
    Even in the case of automated backups, they are almost surely not taken exactly when you need them to be, so if you need consistent data at a good point in time, take your own too.
    I am not doing anyone's work, but it is too much to ask of the provider to have backups of your data taken when you need them, at the frequency you need, to be absolutely sure they are not borked somehow, and that hackers cannot reach them after compromising the node. I mean, it is possible, but it costs a lot of money, and this is not a managed service. Even with a lot of investment and no overselling, data can still be lost permanently; the more servers you have, the bigger the probability...
    BuyVM lost data, we lost some 12 hours when a RAID failed; everyone has lost data at least once. Therefore there are 2 kinds of people in the world: those that take regular backups and those that haven't lost any data yet. Unfortunately I see a third category around: those that have lost data and still don't take back-ups.
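
For anyone acting on the "take your own backups" advice above, a minimal self-contained sketch of a scripted archive-verify-prune routine (the paths and retention count are placeholders, and it assumes you then copy the archive offsite, e.g. to another provider):

```python
import os
import tarfile
import time

def make_backup(src_dir: str, dest_dir: str, keep: int = 7) -> str:
    """Create a timestamped .tar.gz of src_dir in dest_dir, verify it is
    readable, then prune the oldest archives beyond `keep` copies."""
    os.makedirs(dest_dir, exist_ok=True)
    name = time.strftime("backup-%Y%m%d-%H%M%S.tar.gz")
    path = os.path.join(dest_dir, name)
    with tarfile.open(path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    # Verify the archive opens and lists cleanly before trusting it.
    with tarfile.open(path, "r:gz") as tar:
        tar.getmembers()
    # Keep only the newest `keep` archives.
    archives = sorted(f for f in os.listdir(dest_dir) if f.startswith("backup-"))
    for old in archives[:-keep]:
        os.remove(os.path.join(dest_dir, old))
    return path
```

Run something like this from cron and then push the returned archive off the box; a backup that lives only on the same node (or in the same provider's "Central Backup") disappears with the node, which is exactly what happened here.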

  • That is understood, M. For me, I have 3-4 other VPS just to back up my 1 VPS. I just thought that RAID 10 would let me sleep much better at night, knowing I have a very small chance of needing to restore from my backups.

  • AnthonySmith Member, Patron Provider
    edited September 2013

    It happens, and sadly it is not one of those things that gives you even a few hours' notice, so it cannot always be prevented. I have had it happen twice: once on RAID 10, which spat out 3 drives at the same time but the data could be recovered, and just yesterday I lost a node in Germany on RAID 5, with about 95% data recovery but some corruption.

    In much the same way you can't know when a sudden engine failure is going to happen in a car, hard drives can fail with a second's notice, and no RAID is perfect. Depending on the nature of the failure, of which there are hundreds of kinds, the array may or may not cope. I have seen £250,000 EMC SANs completely fail in less than a month and have seen software RAID 0 last for 5+ years.

    The bottom line is, raid is not a backup system.

    It really makes your heart sink when these things happen and all you can do is your best.

    Edit: I also wanted to say, from a provider's perspective it is tough, because naturally you want to get everything back online in its original state, but with the size of arrays these days you have to weigh the benefit and potential failure of your recovery attempts against the downtime for customers. If it were not a VPS environment and you could spend 2-3 days working on it, you would probably have a great chance of complete data recovery, but no one wants a VPS down for 3 days.
