Data loss and 24khost

24khost Member
edited April 2013 in General

Unfortunately, I have to report that we have experienced catastrophic data corruption on one of our main servers in our Choopa datacenter location. This has left a lot of data unrecoverable. We will continue to attempt to recover what data we can; however, attempts have so far been unsuccessful. We have set up a new node, and all containers have been moved to it. Just to let you know, it was not only your data that was lost. We also had one of our backup nameservers on that box.

While we do have systems in place to prevent data loss, in this case they did not help. A corruption caused the sync between the master and the slave to corrupt the data on the slave as well. These things are very rare: we have not had an incident of data loss on our virtual servers in nearly 2 years, across the hundreds of virtual servers that we manage. We do apologize for any inconvenience this may cause you, and we plan on providing a free month of service for the servers that are affected.

Most people know that Devon from Rockmyweb manages our servers. They were also hit with this issue. We are still investigating and will post any updates we get.


Comments

  • We also had one of our backup nameservers on that box.

    word of advice: do not put a nameserver on the same box (or even in the same datacenter) as your production sites/nodes/etc.

  • We don't; it was one of our backup nameservers. We have one in our Choopa location and one in fiber hub, along with our 2 in colostore.

  • yeah, but even they can have a failure. murphy is back to work I guess!

  • SLA? joke =)

  • @24khost Yes, mine too. Fortunately I have a mirror at BuyVM, so I don't lose too much data.

    This happened to me before with BudgetVM & ChicagoVPS, so I think it is good to have a mirror ready to fire up.

    Anyway I have contacted Devon and he is trying hard to recover the data. "Just hope for the best, prepare for the worst" (Bourne Ultimatum)

  • So what exactly is corrupted? Hosting customers' data or your corp. data/boxes?

  • Around 14 hours ago, when I logged in to my VPS using ssh, the first thing I did was check the memory and directories. I could list a directory with ls -l, but when I tried to enter it, the session hung and the link was disconnected.

    I tried to log in again, but this time I barely got past key authentication before it was disconnected.

    I went to the VPS control panel and tried to get in via the shell console, but it didn't work, so I tried a restart.

    The next thing I did was fire up the mirror and open a support ticket. It seems that the system replicated corrupted data, so the good and bad data are mixed up.

    I had this experience at my day job, where it was caused by a faulty raid card; since then, manual backups have been reinstated.

  • @jcaleb our servers are up. It is odd, but 2 servers with the exact same setup, one of his and one of mine, both had a raid/disk issue.

  • @qhoster the raid array was corrupted.
    Devon saw that the server was experiencing high load and restarted it. Everything seemed fine, but then he started noticing corrupted files. As our servers are mirrored, they synced, and all the data on both servers became corrupted.
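
The failure mode described in the post above (a mirror faithfully copying corrupted data) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `mirror_sync`, the dictionaries, and the file names are hypothetical stand-ins and say nothing about the replication software 24khost actually runs. It shows why a live mirror protects against a dead box but not against bad data, while a separate point-in-time copy does.

```python
import copy

def mirror_sync(primary: dict, replica: dict) -> None:
    """Make the replica an exact copy of the primary (hypothetical mirror model)."""
    replica.clear()
    replica.update(copy.deepcopy(primary))

GOOD_DB = "INSERT INTO t VALUES (1);"

# Hypothetical container contents: file name -> file content.
primary = {"index.html": "<h1>hello</h1>", "db.sql": GOOD_DB}
replica = {}
snapshots = []  # point-in-time copies kept apart from the live mirror

mirror_sync(primary, replica)
snapshots.append(copy.deepcopy(primary))  # snapshot taken while the data is good

# Simulate silent corruption on the primary (for example, a failing RAID array).
primary["db.sql"] = "\x00\x00garbage"

# The next sync copies the damage to the replica as well.
mirror_sync(primary, replica)

mirror_is_clean = replica["db.sql"] == GOOD_DB
snapshot_is_clean = snapshots[0]["db.sql"] == GOOD_DB
print(mirror_is_clean, snapshot_is_clean)  # False True
```

The snapshot survives only because nothing overwrites it after it is taken; the mirror, by design, always tracks the current state of the primary.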

  • We are continuing to work around the clock to restore what we can from the server corruption. We have had some recent success but restoration is slow. We have successfully recovered all MySQL folders/data from the virtual servers and are now working to restore all other data. We focused on MySQL folders first as databases usually change the most and therefore have important data that is often not backed up immediately.

    Please send me an email or open a ticket if you do not have any backups to restore from and we will put a higher priority on your container.

    We have no guarantees on what we will be able to recover and if there will be minor corruption in the files that we do recover but we seem to have been successful in MySQL thus far.

    This was the message from Devon, received just a few minutes ago.

  • gbshouse Member, Host Rep

    @24khost - PM me if you want to move DNS to us

  • @gbshouse, we are fine. I just have to rebuild it and resync it. Not a big issue; small for us.

  • Nekki Veteran
    edited April 2013

    I shit my pants when I read the topic, just finished off a project on my 24khost VPS yesterday - fortunately I'm not in Choopa. Lucky I forgot to ask to be migrated there....

  • AnthonySmith Member, Patron Provider

    @24khost I feel for you, came close when the raid failed in epic fashion on a UK node, it is not a good feeling, I was lucky enough to be able to recover the data but I imagine I would have been pretty depressed had I had to make the 'data loss post of shame'

    Wish you the best of luck.

  • Maybe tell us what were those raid and disks?

  • 24khost Member
    edited April 2013

    @AnthonySmith It happens; it is how you deal with it. Only 6 customers were on that node and, thank god, only 7 containers were affected. Mine is not a critical issue and has already been reinstalled and is back to work.

  • AnthonySmith Member, Patron Provider
    edited April 2013

    @24khost indeed. Sadly, when I had my failure the raid 10 array spat out 3 disks at the same time. It took under 24 hours to do a full recovery of around 50 servers/disk images (XEN PV) and migrate them to another node, so it was dealt with well, but even if it was only 6 customers it is never nice to make this sort of post.

    Sounds like you dealt (are dealing) with it well.

  • bdtech Member
    edited April 2013

    @24khost what do you mean by servers are mirrored? Your vps containers are synced to another box?

  • After seeing how others dealt with these things. We put in ground rules! Follow what the leaders do.

  • Yes, our servers are high availability, and directly synced with a mirrored server.

  • Just an update, my VPS has been successfully recovered. Thanks to Devon and the team.

  • Maounique Host Rep, Veteran

    We also had this issue and had to restore from a 12h+ backup last week I think.
    It seems it was a case of an intermittent power problem in the MB, not the PSU (which is redundant anyway), which ended up breaking the raid and corrupting data. We had a failure before on the same node, but it was "orderly", i.e. it fell all at once and the storage vanished from under the server before being corrupted.
    We had no idea, since a reboot fixed it completely and fsck reported no serious error.
    Just a couple of days ago, another failure, this time just a drive and without corrupting data, forced us to move everyone from that node overnight, just to make sure it is not something similar to the other node, which has almost the same config and the same make.
    This being said, I wish to make it clear that we do not officially take any back-up of the data. We do not offer this service, just back-up space for people interested, even 10 GB free for that matter, because nobody knows better than the customer what to save and when.
    Backing up data is the customer's duty, EVEN if it is offered as a service by the host. A fire can take place, an earthquake, mirrored corruption, the host going bankrupt, servers seized in a police raid, terrorists hauling them out; many things can go wrong. Therefore, ppl, please take care of your data. If you don't, how can you expect others to?

  • @Maounique Great input. The mirrored corruption is what killed us.

  • @24khost I would just like to take this opportunity to say thank you. I do not currently have services with you, but in the past you have helped me. I have to give you a thumbs up for coming out in the open with this and not trying to hide behind anything. You have been honest and straight with every update you have given. I appreciate this even though I am not a customer. This shows me you, as a person, take your responsibility seriously. You are now on the short list for my projects and I look forward to testing your Indiana location since it is close to me in Michigan.

    I have used RockMyWeb services and have nothing but respect for Devon and his people. They are top notch and I will use them again.

    @Maounique I agree completely but only will add one thing more. Keep back ups of your back ups. Recently in another unrelated fiasco I had back ups of my sites only to find out they were also corrupted. Might be a good idea to keep an eye on those back ups and make sure they are good as well just to make sure. Murphy being the kind of guy he is and all.
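
One way to "keep an eye on those back ups", as the post above suggests, is to record a checksum for every file when the backup is made and re-check the files later. The following is a minimal sketch, assuming a backup that is just a directory of files; the helper names `build_manifest` and `verify_backup` and the sample file names are hypothetical and not taken from any tool mentioned in this thread.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_backup(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad

# Demo with a throwaway directory standing in for a backup location.
with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp)
    (backup / "site.tar").write_bytes(b"archive contents")
    (backup / "db.sql").write_bytes(b"CREATE TABLE t (id INT);")

    manifest = build_manifest(backup)        # recorded at backup time
    clean_result = verify_backup(backup, manifest)

    (backup / "db.sql").write_bytes(b"\x00\x00\x00")  # simulate later corruption
    damaged_result = verify_backup(backup, manifest)

print(clean_result, damaged_result)  # [] ['db.sql']
```

The manifest only helps if it is stored somewhere other than the backup it describes; otherwise the same fault can damage both.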

  • Thanks @AuroraZ

  • Maounique Host Rep, Veteran
    edited April 2013

    @AuroraZ said: Keep back ups of your back ups.

    That will not solve the issue of corrupted backups being backed up.
    Incremental back-ups would. I.e. you can go back a certain time when it was a consistent back-up while keeping the size sane.
    This, however, must be supervised from time to time. Scripts can and will go wrong. At my former job I was religiously restoring the last backup every 1st of the month (well, 2nd or 3rd if there was a week-end or holiday) to make sure it works. That was after a near-miss when I had to restore a back-up partially from several tapes which were corrupted, salvaging the latest data from the most recent ones and older data from older ones. Almost everything was recovered, but it was very close; I was very lucky that time.
    This is what has to be done with very important data.
    That being said, sorry 24k for invading this thread, I think every opportunity to explain backups to ppl is an opportunity which has to be taken.
    Great openness there :)
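
The point about incremental back-ups above can be made concrete with a toy model: each increment stores only what changed, and a restore replays increments up to a chosen day, so a corrupted file that was backed up later does not overwrite the earlier good version. This is a sketch under stated assumptions; the `Increment` layout, the day numbers, and `restore_as_of` are hypothetical and do not describe any particular backup tool.

```python
from typing import Dict, List, Optional, Tuple

# One increment: (day number, {file name: new content, or None if deleted}).
Increment = Tuple[int, Dict[str, Optional[str]]]

def restore_as_of(increments: List[Increment], day: int) -> Dict[str, str]:
    """Rebuild the file set as it was at the end of the given day."""
    state: Dict[str, str] = {}
    for when, changes in sorted(increments, key=lambda inc: inc[0]):
        if when > day:
            break  # ignore everything recorded after the requested day
        for name, content in changes.items():
            if content is None:
                state.pop(name, None)   # file was deleted in this increment
            else:
                state[name] = content   # file was created or changed
    return state

increments = [
    (1, {"a.txt": "v1", "b.txt": "v1"}),  # baseline: every file
    (2, {"a.txt": "v2"}),                 # only the changed file is stored
    (3, {"b.txt": "\x00corrupt"}),        # corruption gets backed up too
]

latest = restore_as_of(increments, 3)
before_corruption = restore_as_of(increments, 2)
print(before_corruption)  # {'a.txt': 'v2', 'b.txt': 'v1'}
```

Restoring to day 2 recovers a consistent set, which is the periodic check described in the post: a restore has to be exercised, not assumed.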

  • @Maounique I wish it were just easy to say hey you should have kept backup. It just feels so JackAss-ish.

  • Maounique Host Rep, Veteran

    I know, this is why I stepped up and said it for you :)

  • @Maounique thanks for being the jackass for me!

  • @24khost - Our vps with you in Choopa has been great and hasn't had any trouble through any of this. Thanks for the transparency and quick response on this issue though. Gives me confidence that if my vps was affected, you'd be working this hard for me!

    Best of luck with the recovery effort.
