ZxHost Failure - OpenVZ

I've been with ZxHost for two years, and I've never had a problem.

On March 25 I received this email:

Since then, I have not had any news, and the VPS does not work.

Did other people have the same problem?

Hello Stephane,

The node your OpenVZ VM operates on has been affected by a RAID failure before all migrations to our KVM environment could be fully completed.

We are working to restore the data from the RAID set if possible; however, to get you back online ASAP we are looking to set up your new KVM VM.

If you can please reply to this email with the OS you require along with any further requirements, we will be providing an extension of 1 month to all affected services.

Thanks,

ZXHost

Comments

  • NekkiNekki Veteran

    Did you reply to the email?

    Thanked by 1netomx
  • jarjar Patron Provider, Top Host, Veteran
    edited April 2017

    comeback said: Did other people have the same problem?

    Nope, you're the only one on the node! RAID failures generally only cause problems for one person. I've never heard of one happening before though.

    :)

  • BopieBopie Member

    @jarland I almost thought that your comment was from @nekki, had to check the names twice ;)

  • @jarland said:

    comeback said: Did other people have the same problem?

    Nope, you're the only one on the node! RAID failures generally only cause problems for one person. I've never heard of one happening before though.

    :)

    Sorry, I phrased that badly.

    Have you ever had this problem with another provider?

    Do you think they can fix it?

  • NekkiNekki Veteran

    @Bopie said:
    @jarland I almost thought that your comment was from @nekki, had to check the names twice ;)

    There would almost certainly be more swearing if it was me.

    Thanked by 1Bopie
  • HarambeHarambe Member, Host Rep

    @comeback said:

    Sorry, I phrased that badly.

    Have you ever had this problem with another provider?

    Yes.

    Do you think they can fix it?

    Who knows. Shit happens. This is why you always need backups.
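
For anyone in the OP's position, here is a minimal sketch of the kind of nightly off-node backup that turns a dead node into an inconvenience rather than a disaster. It is Python wrapping rsync; the source path and backup host are made up, and it assumes rsync is available on both ends:

```python
#!/usr/bin/env python3
"""Minimal nightly off-node backup sketch (hypothetical paths and host)."""
import datetime
import subprocess

# Hypothetical values -- point these at your own data and your own backup box.
SOURCE = "/var/www/"
DEST = "backup@backup.example.com:/backups/vps1/"

def run_backup() -> None:
    stamp = datetime.date.today().isoformat()
    # -a preserves permissions and timestamps, --delete mirrors removals on the target.
    result = subprocess.run(
        ["rsync", "-a", "--delete", SOURCE, DEST],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # In practice you'd alert (mail, webhook) rather than just print.
        print(f"{stamp}: backup FAILED: {result.stderr.strip()}")
    else:
        print(f"{stamp}: backup completed")

if __name__ == "__main__":
    run_backup()
```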

  • jarjar Patron Provider, Top Host, Veteran

    comeback said: Have you ever had this problem with another provider?

    Pretty much every provider. Depends on the reason the RAID failed. Honestly, you can never know who is going to be honest with you about why. Could be they were lazy replacing a drive and another went out, could be a controller going nuts. If a controller failed then I'd give it a 50/50 shot of recovery.
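
On the "lazy replacing a drive" point: the window for a second failure stays small only if a degraded array gets noticed right away. A rough sketch of a check you could cron on a Linux box using mdadm software RAID (hardware controllers need their vendor's CLI instead, and the /proc/mdstat parsing here is only a heuristic):

```python
#!/usr/bin/env python3
"""Degraded md-RAID check sketch (Linux software RAID only; parsing is a heuristic)."""
import re
import sys

def degraded_arrays(path: str = "/proc/mdstat") -> list:
    """Return the names of md arrays whose member status shows a missing drive."""
    try:
        text = open(path).read()
    except FileNotFoundError:
        return []  # no md arrays (or not Linux)
    bad = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        # Status lines look like "[4/4] [UUUU]"; an underscore marks a failed/missing member.
        if current and re.search(r"\[[U_]*_[U_]*\]", line):
            bad.append(current)
            current = None
    return bad

if __name__ == "__main__":
    bad = degraded_arrays()
    if bad:
        print("DEGRADED:", ", ".join(bad), "- replace the dead drive before its partner goes too")
        sys.exit(1)
    print("all md arrays report full membership")
```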

  • pbgbenpbgben Member, Host Rep

    @jarland said:

    comeback said: Have you ever had this problem with another provider?

    Pretty much every provider. Depends on the reason the RAID failed. Honestly, you can never know who is going to be honest with you about why. Could be they were lazy replacing a drive and another went out, could be a controller going nuts. If a controller failed then I'd give it a 50/50 shot of recovery.

    "What do you mean I can't buy these controllers anymore"

    Thanked by 1jar
  • NekkiNekki Veteran

    @comeback

    DID YOU RESPOND TO THE FUCKING EMAIL.

    Thanked by 1imok
  • BopieBopie Member

    @Nekki said:
    @comeback

    DID YOU RESPOND TO THE FUCKING EMAIL.

    And there is the real nekki

    Thanked by 1ErawanArifNugroho
  • FalzoFalzo Member

    I am also affected by this, but luckily I used this old storage node for backups only and don't need the data restored after all.

    To satisfy @Nekki: of course I replied, and I have waited patiently since then... ;-)

    Those services on the old Hetzner nodes were due to be transferred to ZXHost's Frankfurt location a while ago, but @AshleyUk probably couldn't get it done as fast as planned.

    I also noticed some recent changes, at least to the naming of the storage nodes in Frankfurt, where the newer nodes are located. So I assume Ashley is steadily working on this, including migrating services over, which might take quite some time depending on how many services there are and how much data was put into them...

    I'd also appreciate a tad more info or status updates in between - there's probably already a big ticket backlog anyway ^^

    Thanked by 1AshleyUk
  • Thanks for the tags! People who were affected and replied to the email got their new VM set up.

    I have been working my way through the migration of the VMs for a while now - quite a few storage nodes from Hetzner. It has taken longer for many reasons, including some people just not replying to emails :), and sadly this happened on one of the nodes before it was fully empty.

    I think I know who the OP is, as I received a reply to the email around the same time as this post. I'm awaiting further details and will happily resolve this for the OP.

    Thanked by 1Falzo
  • Ashley, thanks for posting. How is the RAID recovery going? Can I ask what the RAID level was? Are you saying you're migrating all your old Hetzner storage servers to Frankfurt?

  • @willie said:
    Ashley, thanks for posting. How is the RAID recovery going? Can I ask what the RAID level was? Are you saying you're migrating all your old Hetzner storage servers to Frankfurt?

    It was running RAID 10; it does not look too good to be honest, but we're still trying.

    And yes, we have been working on it for a while; we had nearly finished, and this just happened to be one of the last few servers with the issue.

  • AshleyUk said:

    It was running RAID 10; it does not look too good to be honest, but we're still trying.

    Oh yikes. Is this multiple drive failures, or a hardware controller, or what, if you don't mind my asking? (I have some storage with you, but it's in your Frankfurt Ceph cluster, which sounds safer than RAID 10.)

  • @willie said:

    AshleyUk said:

    It was running RAID 10; it does not look too good to be honest, but we're still trying.

    Oh yikes. Is this multiple drive failures, or a hardware controller, or what, if you don't mind my asking? (I have some storage with you, but it's in your Frankfurt Ceph cluster, which sounds safer than RAID 10.)

    Multiple drive failures.

  • rokokrokok Member

    AshleyUk said: Are you saying you're migrating all your old Hetzner storage servers to Frankfurt?

    I've got no issues on my old storage, but I need an answer: will the Frankfurt location have free incoming bandwidth like Hetzner?

  • HarambeHarambe Member, Host Rep

    @rokok said:

    AshleyUk said: Are you saying you're migrating all your old Hetzner storage servers to Frankfurt?

    I've got no issues on my old storage, but I need an answer: will the Frankfurt location have free incoming bandwidth like Hetzner?

    If it's the same as the ceph plans, then yeah, free inbound.

    Thanked by 1AshleyUk
  • williewillie Member
    edited April 2017

    AshleyUk said: Multiple drive failures.

    Thanks. I'm getting less enthusiastic about raid-10. Will try to aim for Raid-6, Ceph, ZFS etc. I'm liking my VPS on your Ceph cluster. It feels incredibly solid for some reason.

  • jarjar Patron Provider, Top Host, Veteran
    edited April 2017

    @willie said:

    AshleyUk said: Multiple drive failures.

    Thanks. I'm getting less enthusiastic about raid-10. Will try to aim for Raid-6, Ceph, ZFS etc. I'm liking my VPS on your Ceph cluster. It feels incredibly solid for some reason.

    I just don't understand how multiple drives fail at once unless you either get a really really unlucky dice roll or someone was lazy about replacing one of the bad drives because "it's fine, array is still alive."

    RAID10 is amazing. But you also have to consider that SO many people are running fleets of RAID10 that you're going to hear more failure stories about it than about a configuration that fewer servers are running.

    I say don't throw out the popular choice because you hear about the few times that one array fails. It's popular because you have to lose two drives to kill it, and if people are on top of things and not lazy, and controllers don't fail themselves in a spectacular way (a risk on any RAID array, don't buy shit controllers and keep spares), cases of failure should be exceptionally minimal.

    And let's be real, hosts don't have to tell you it's because they were slow replacing the first drive that failed. You won't know any different, you weren't there. So it's real easy to blame something else, and you'll never really know who is telling the truth.

    Thanked by 1MikeA
  • williewillie Member
    edited April 2017

    jarland said: It's popular because you have to lose two drives to kill it,

    Yes, and the same is true of RAID-5, which is deprecated with large drives these days. The other alternatives I mentioned can survive every possible 2-drive failure, or even 3-drive (etc.) depending on configuration; see the quick arithmetic sketch after this comment. They were designed for the specific reason that 2-drive failures aren't all that rare. Remember also that in a RAID-10 rebuild you're pounding the crap out of the surviving member of the pair that had a failure. That increases its own likelihood of failure.

    From what I understand, Online's Enterprise C14 product ($$$$) is a distributed software RAID with something like 47 servers (rack of 1U's I guess) and up to something like 20 can fail. For the non-enterprise version the number is lower but it's still a lot compared to what we're used to.
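
The quick arithmetic referenced above, under the idealised assumption that the second failure hits a uniformly random surviving drive (which rebuild stress on the dead drive's mirror partner actually works against): in RAID-10 a second loss is only fatal if it hits that partner, while RAID-6 tolerates any two concurrent losses.

```python
"""Back-of-the-envelope odds for the 'two drives kill it' point.
Idealised: treats the second failure as hitting a uniformly random surviving drive."""

def raid10_second_failure_fatal(n_drives: int) -> float:
    # After one drive dies, only its mirror partner is fatal:
    # 1 fatal candidate out of the (n_drives - 1) survivors.
    return 1 / (n_drives - 1)

def raid6_second_failure_fatal(n_drives: int) -> float:
    # RAID-6 keeps two parity blocks, so any two concurrent losses are survivable.
    return 0.0

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(f"{n:>2} drives: RAID-10 second failure fatal "
              f"{raid10_second_failure_fatal(n):.0%}, RAID-6 {raid6_second_failure_fatal(n):.0%}")
```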

  • daffydaffy Member

    We just had a batch of 12 drives where 2 went belly up within 48 hours. 1.5-year-old Toshiba enterprise drives. Luckily, the rebuild finished on the first drive just before the second one decided to die.

  • This is one of the things we look at when we build our servers: making sure that drives aren't all from the same batch. We have been caught out by HP drives failing at the same time. We make sure that, at the very minimum, the hot spare is from a different batch, as that gives us a chance to correct things if it turns out to be a model issue. We have also experienced drives failing during a rebuild. The RAID model isn't so great on really big new drives because of the rebuild times... Is there anything as good as it yet? Not as far as I have seen.
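
As a small companion to the batch point above, a rough sweep of SMART health and serial numbers makes same-batch drives easy to spot. This sketch assumes smartmontools is installed, root privileges, and plain /dev/sdX device naming; the PASSED/OK string match is only a heuristic:

```python
#!/usr/bin/env python3
"""SMART health/serial sweep sketch (assumes smartmontools, root, /dev/sdX naming)."""
import glob
import subprocess

for dev in sorted(glob.glob("/dev/sd[a-z]")):
    # -i prints identity info (including the serial), -H prints the overall health verdict.
    out = subprocess.run(["smartctl", "-i", "-H", dev],
                         capture_output=True, text=True).stdout
    serial = next((line.split(":", 1)[1].strip()
                   for line in out.splitlines()
                   if line.lower().startswith("serial number")), "?")
    healthy = "PASSED" in out or "OK" in out
    # Matching serial prefixes across drives is a quick hint they came from the same batch.
    print(f"{dev}  serial={serial}  health={'ok' if healthy else 'CHECK ME'}")
```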
