How would YOU achieve 100% uptime?

Blanoz Member

Greetings,

I'd like to bring up a subject that doesn't seem to get much attention. I'd appreciate it if you could share a setup that you have personally tested or, even better, already implemented.

Say you have two servers, dedicated or VPSes, spread around the globe, or at least not in the same datacenter. What is your solution for having literally no downtime when things go wrong?

Downtime is frustrating on personal servers and extremely annoying on production servers. Your hints or actual setups will be greatly appreciated.

PS: if this has been discussed already, please point me to the thread; I haven't been able to find one.

Comments

  • 100% is a myth, or too expensive. Everybody has downtime at some point.

  • linuxthefish Member
    edited July 2014

    Redundant DNS servers, round robin DNS, and high-availability proxies (rough sketch of the proxy idea below)! It all depends on what 100% uptime means to you; for example, if my servers and backups are available to me when I need them, that's fine for me!

    Thanked by 1: Blanoz
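
    A rough sketch of the "high-availability proxy" idea in Python, using nothing beyond the standard library (the backend addresses are made up for illustration): accept connections on one front-end port and hand each one to the first backend that still answers a quick TCP health check.

      # Toy HA proxy: forward each incoming connection to the first backend
      # that accepts a TCP health check. Backend addresses are placeholders.
      import socket
      import threading

      BACKENDS = [("203.0.113.10", 80), ("198.51.100.20", 80)]  # hypothetical servers

      def first_healthy_backend():
          for addr in BACKENDS:
              try:
                  socket.create_connection(addr, timeout=2).close()
                  return addr
              except OSError:
                  continue  # this backend is down, try the next one
          return None

      def pipe(src, dst):
          # Copy bytes one way until either side closes.
          try:
              while True:
                  data = src.recv(4096)
                  if not data:
                      break
                  dst.sendall(data)
          except OSError:
              pass
          finally:
              src.close()
              dst.close()

      def serve(listen_port=8080):
          srv = socket.socket()
          srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
          srv.bind(("", listen_port))
          srv.listen()
          while True:
              client, _ = srv.accept()
              backend = first_healthy_backend()
              if backend is None:
                  client.close()  # nothing healthy to send this connection to
                  continue
              upstream = socket.create_connection(backend, timeout=5)
              threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
              threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

      if __name__ == "__main__":
          serve()

    A real deployment would use HAProxy, nginx or keepalived rather than hand-rolled sockets, but the failover logic has the same shape: health-check the backends and only send traffic to the ones that respond.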
  • Blanoz Member
    edited July 2014

    linuxthefish said: round robin DNS

    Wouldn't round robin DNS cause ~50% of requests to fail?

    linuxthefish said: Redundant DNS servers

    So a third "master" server would be necessary to point clients to whichever server is available.

  • ./god/configure --disable-downtime --prefix=/proc/heaven && cd god && make && make install

    Thanked by 2: Hybrid, jetchirag
  • linuxthefish Member
    edited July 2014

    Blanoz said: Wouldn't round robin DNS cause ~50% of requests to fail?

    Not normally; most browsers and such are smart enough to move on to the next IP (see the sketch below).

    Blanoz said: So a third "master" server would be necessary to point clients to whichever server is available.

    If the master server is down, the slaves still serve DNS requests.
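
    Roughly what that fall-through looks like, sketched in Python with only the standard library (example.com is just a placeholder hostname): resolve every address the round-robin DNS hands back and try each one until a connection succeeds.

      # Resolve all A/AAAA records for a name and try each address in turn,
      # the way a browser falls through to the next IP when one is dead.
      import socket

      def connect_any(hostname, port=80, timeout=3):
          last_error = None
          for *_, sockaddr in socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM):
              try:
                  # sockaddr[:2] is (ip, port) for both IPv4 and IPv6
                  return socket.create_connection(sockaddr[:2], timeout=timeout)
              except OSError as exc:
                  last_error = exc  # dead IP: fall through to the next record
          raise last_error or OSError("no usable addresses")

      sock = connect_any("example.com")
      print("connected to", sock.getpeername())
      sock.close()

    With two servers and round-robin A records, roughly half the lookups will hand out the dead IP first while one box is down, but a client that retries like this still gets through; the visible cost is a connection timeout, not a failed request.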

  • Go with ColoCrossing ? lol

  • Chuck Member

    @XxNisseGamerxX said:
    Go with ColoCrossing ? lol

    ColoCrossing is a myth.

  • If I had an actual business need for 100%, anycast with VRRP/CARP on the local side + some sort of load balancing would probably be the way to go.

    pfsync with pf if a firewall is required locally, and so on.

    Thanked by 1: Blanoz
  • rds100 Member

    Even with a failover setup, etc., there is still some technical time needed for the switchover to happen. So zero downtime is not possible, but keeping the maximum downtime small (a couple of minutes or so) should be possible (rough numbers below).

    Thanked by 1: Dylan
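
    To put numbers on "a couple of minutes", the arithmetic for how much downtime each uptime percentage allows per year is simple:

      # Allowed downtime per year for a given uptime percentage.
      MINUTES_PER_YEAR = 365 * 24 * 60

      for pct in (99.0, 99.9, 99.99, 99.999, 100.0):
          allowed = MINUTES_PER_YEAR * (1 - pct / 100)
          print(f"{pct}% uptime allows {allowed:.1f} minutes of downtime per year")

    Even "five nines" allows only about five minutes per year, so a couple of minutes per switchover burns through that budget fast.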
  • rds100 said: there is still some technical time needed for the switchover to happen

    Depends on what kind of switchover you're talking about.

    DNS-based ones, absolutely. Routing-based ones, where multiple routes exist to multiple fully redundant gateways? Nope.

    Thanked by 2: Zen, Blanoz
  • rds100 Member
    edited July 2014

    @Wintereise even if you anycast it and withdraw the routes for one of the locations from BGP, there is still some time needed for the routes to reconverge. BGP updates are not instant.

  • rds100 said: @Wintereise even if you anycast it and withdraw the routes for one of the locations from BGP, there is still some time needed for the routes to reconverge. BGP updates are not instant.

    The point is that routes should never have to be withdrawn, so reconvergence should never happen -- period.

    This part is easily achievable with redundant edge/border routing and some sort of redundancy protocol like HSRP/VRRP/CARP. Use diverse fiber paths and, if possible, different buildings to house the core gear, and you're well on the way to a full 100%.

    Now, of course, if a tornado knocks down a whole city, then you're gonna have to deal with reconvergence -- but the likelihood of that happening is little to nil for most DCs, and in those cases I think a few seconds to reconverge is perfectly acceptable.

  • AnthonySmith Member, Patron Provider
    edited July 2014

    At a previous job we had 2 smallish DCs side by side with fiber between them for SAN <> SAN replication (these were $250,000 EMC SANs), with cross-site clusters that could take something silly like an 8-in-10 failure before service was impacted, etc.

    This was a money-is-no-object project; it probably cost about $15,000,000 for file storage, app hosting and email that simply could not go down.

    It was based on an island, and the number of cables run with different carriers was insane.

    Sadly, about 2 weeks after it was signed off as complete, the whole island lost power due to an offshore disaster. After 4 days of people literally running back and forth night and day with diesel for the generators, a fire broke out and the whole thing completely failed.

    So nothing is 100%, because something will always happen; it is fair to say it is achievable, but only with good luck :)

    That is why I like lowendspirit: €13.00 per year for 5 servers in different countries/DCs with HAProxy for failover, so even if 4 of the 5 DCs blew up at the same time you would still stay up :) (rough odds on that below)

    In the above example, had they taken the initial advice and replicated to the mainland as well, this would not have happened -- but I think it was an extra $7,000,000 for the cross connect run under the water.
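
    Some quick numbers on why spreading across locations works, assuming (optimistically) that the sites fail independently and each is up 99% of the time on its own:

      # Probability that ALL N independent locations are down at the same moment,
      # assuming each one is up 99% of the time (both numbers are assumptions).
      per_site_uptime = 0.99

      for n in (1, 2, 3, 5):
          all_down = (1 - per_site_uptime) ** n
          print(f"{n} site(s): P(everything down at once) = {all_down:.0e}")

    With 5 sites that comes out around 1e-10, which is why multi-site failover over cheap boxes can beat one expensive, heavily redundant location -- as long as the failures really are independent (the island story above is exactly the case where they weren't).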

  • rds100 Member
    edited July 2014

    Even if you go to extremes with redundancy at your end, your upstreams will fail you sooner or later. They all do service-affecting maintenance sometimes, and they all screw up without any warning sometimes. I've seen this from all kinds of carriers - both premium like Level3 and shitty like Cogent. So reconvergence will be necessary sometimes; it is unavoidable. And since IP is a "best effort" protocol, that's OK too - you just have to accept it. You can minimize it, but you can never 100% guarantee that it won't happen.

    Thanked by 2: Zen, Wintereise
  • rds100 said: And since IP is a "best effort" protocol, that's OK too - you just have to accept it. You can minimize it, but you can never 100% guarantee that it won't happen.

    Correct, all I'm trying to say is that it's possible to minimize that window down to near zero levels.

    A downtime that nobody noticed isn't a downtime, etc ;)

    Thanked by 2: rds100, fisle
  • I'd definitely use proxies and go multi-site; it will always beat a single location, no matter how good the hardware/UPS/generators/etc.

    Thanked by 1: AnthonySmith