Automatic failover of a VPS

Raymii · January 2018

So how many providers have a system like Hyper-V's failover cluster, where, with shared storage, the VPS is automatically booted on another node when a hypervisor goes down? (Not like VMware's Fault Tolerance, where a second instance is also running with CPU/RAM kept in sync.)

This costs money, in the sense that you must reserve spare capacity that could otherwise be sold.
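To put a rough number on that reserve, here is a minimal sketch. It assumes identical hypervisors and equally sized VPSes, which real clusters rarely have; the function names are made up for illustration.

```python
def reserve_fraction(total_nodes: int, tolerated_failures: int = 1) -> float:
    """Fraction of cluster capacity that must stay unsold so the VPSes from
    `tolerated_failures` dead hypervisors still fit on the survivors.
    Assumes identical nodes (a simplification)."""
    if not 0 <= tolerated_failures < total_nodes:
        raise ValueError("tolerated_failures must be in [0, total_nodes)")
    return tolerated_failures / total_nodes


def sellable_vps(total_nodes: int, vps_per_node: int, tolerated_failures: int = 1) -> int:
    """Number of VPSes that can be sold while keeping the failover reserve."""
    if not 0 <= tolerated_failures < total_nodes:
        raise ValueError("tolerated_failures must be in [0, total_nodes)")
    return (total_nodes - tolerated_failures) * vps_per_node


if __name__ == "__main__":
    # Four hypervisors, 50 VPSes each, surviving one hypervisor failure.
    print(reserve_fraction(4))      # share of capacity held back
    print(sellable_vps(4, 50))      # VPSes that can actually be sold
```

Under these assumptions the reserve shrinks as the cluster grows, so the relative cost is highest for small clusters.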

A Dutch provider with four virtualization platforms (Xen, Hyper-V, and two OpenStack deployments) offers this: almost all of its platforms provide failover to protect against hypervisor failure. OpenStack, for example, can use https://github.com/ntt-sic/masakari, and Hyper-V has it built in.
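The general idea behind this kind of failover can be sketched as a toy simulation. This is not how Masakari or Hyper-V actually implement it; the `Host` class, the `failover` function, and the heartbeat timeout are all hypothetical, and it assumes every VPS disk lives on shared storage so any surviving host can boot it.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int                 # max number of VPSes this host can run
    last_heartbeat: float         # timestamp (seconds) of last heartbeat seen
    vms: list = field(default_factory=list)
    fenced: bool = False          # True once the host is declared dead


def failover(hosts, now, timeout=30.0):
    """Fence hosts whose heartbeat is stale and restart their VPSes on the
    surviving host with the most free slots. Returns {vm: new_host_name}."""
    for h in hosts:
        if not h.fenced and now - h.last_heartbeat > timeout:
            h.fenced = True       # real systems power-fence here to avoid split brain

    alive = [h for h in hosts if not h.fenced]
    moved = {}
    for dead in (h for h in hosts if h.fenced):
        for vm in list(dead.vms):
            target = max(alive, key=lambda a: a.capacity - len(a.vms), default=None)
            if target is None or len(target.vms) >= target.capacity:
                break             # no spare capacity left: the VPS stays down
            dead.vms.remove(vm)
            target.vms.append(vm)
            moved[vm] = target.name
    return moved


if __name__ == "__main__":
    cluster = [
        Host("hv1", capacity=2, last_heartbeat=0.0, vms=["vps-a", "vps-b"]),
        Host("hv2", capacity=3, last_heartbeat=95.0, vms=["vps-c"]),
        Host("hv3", capacity=3, last_heartbeat=99.0),
    ]
    print(failover(cluster, now=100.0))
```

Note the `break` branch: without reserved capacity on the surviving hosts, detection alone does not bring anything back up.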

If other providers do this, how and why? What were your considerations, and did costs increase?

Providers that don't offer this: why not, apart from the cost and increased complexity?

IonSwitch_Stan · January 2018

From my experience it's considerably easier to engineer a platform that is horizontally scalable than it is to set up failover clustering. For many "low end" activities, it's simply more economical, and gives better availability, to set up two servers to host your WordPress site, your application, etc. This is the pattern generally suggested by the big three cloud providers (AWS, GCE, Azure), even though they do offer shared storage and automatic recovery from hardware failure.

A true failover cluster usually requires (at minimum) two hypervisors, two storage controllers, redundant storage on enterprise disks, etc. Some folks have tried Ceph at LEB scale and had fairly catastrophic failures (ZXHost?). The cost to run this sort of infrastructure is also more than 2x a single VPS; it's likely more like 4x, and there is still little evidence that it is fully set up and tested.

If you are willing to pay 2-4x a single VPS price (and probably 10-20x a LEB price), there are likely providers that offer this, but it's really an anti-pattern on shared infrastructure.

raindog308 · January 2018

IonSwitch_Stan said: From my experience it's considerably easier to engineer a platform that is horizontally scalable than it is to set up failover clustering.

So true it brought a tear to my eye.

Raymii said: Providers that don't offer this: why not, apart from the cost and increased complexity?

What other reasons do you need?

RR DNS + web servers + replicated DB = LowEndHA. 99% of people don't need anything else. 99% of people probably don't even need that.
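For the RR DNS part, the client side of that pattern looks roughly like the sketch below: resolve every address behind the name and use the first one that accepts a connection. The helper names and the two-second timeout are assumptions for illustration; most browsers and HTTP libraries already do something similar on their own.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct IP address published for hostname, in DNS order."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def first_reachable(addresses, port, timeout=2.0, connect=socket.create_connection):
    """Try each address in turn and return the first that accepts a TCP
    connection, or None if all of them are down."""
    for addr in addresses:
        try:
            conn = connect((addr, port), timeout)
        except OSError:
            continue              # this web server is down, try the next record
        conn.close()
        return addr
    return None


if __name__ == "__main__":
    # example.com is a placeholder; substitute a name with several A records.
    ips = resolve_all("example.com", 80)
    print(ips, first_reachable(ips, 80))
```

The weak spot is that a dead server stays in DNS until someone removes the record, so clients that do not retry will still hit it some of the time.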

Raymii said: shared storage

Shared storage = SAN = massive increase in cost/complexity. Shared storage means now you have to monitor the paths leading to it (which also must be HA), performance within the SAN, etc. My F500 employer does exactly as you describe - big VMware farms on top of the best SSD SAN money can buy, but we're not selling $5 VMs and have a team of VM people.

Raymii said: If other providers do this, how and why?

OTOH, I believe DO, Vultr, etc. do not use SANs and favor a replication-based approach. Someone from Azure told me once that Microsoft does not deploy SANs in any of their DCs. And they do as you describe - if an Azure VM fails, it is restarted on a different physical node.

hostdare · January 2018

raindog308 said: OTOH, I believe DO, Vultr, etc. do not use SANs and favor a replication-based approach. Someone from Azure told me once that Microsoft does not deploy SANs in any of their DCs. And they do as you describe - if an Azure VM fails, it is restarted on a different physical node.

Does that mean two running VMs replicating each other in real time?

IonSwitch_Stan · January 2018

Azure, Google, and Amazon all offer in-host storage options that are lower cost, higher performance, or larger, and all of these are considered non-HA. They also offer off-hypervisor storage on a storage cluster, so that if the host fails the VM can be restarted elsewhere. Amazon calls this EBS, or Elastic Block Store. Any EC2 instance that boots from EBS can survive a hypervisor failure.

While this isn’t technically “SAN” with a big name like EMC/NetApp/etc on it, it is shared storage with redundant access.

Running two Proxmox hosts with Ceph would give the same type of host failover. (Until Ceph fails and you lose all your customer data.)

sureiam · January 2018

I've wanted to do a VPS cluster for a while. I just wish there was a Linux distribution with this feature built in. But I guess I'm being too lazy.

willie · January 2018

Raymii said:

So how many providers have a system like Hyper-V's failover cluster, where, with shared storage, the VPS is automatically booted on another node when a hypervisor goes down?

OVH Cloud VPS does this using Ceph, and has been around for a while. There have been occasional bad performance snags, but it appears to mostly work OK.

The new Hetzner cloud product claims to also do it, but it just became available last week, so reliability is yet to be seen.
