Spartan Host Seattle knocked offline - no tears

yoursunny Member, IPv6 Advocate
edited March 2021 in Outages

My main website is normally deployed on vps4, located in the @SpartanHost Seattle data center.
This morning I'm getting multiple alerts that the server is offline.
According to reports in other forums, I'm not the only one affected.
It seems that someone managed to knock the DDoS-protected network offline.


After the downtime reached one hour, I decided to activate my disaster recovery plan and re-deploy the website to vps6 on the Nexril network.
https://yoursunny.com was online again 15 minutes later, avoiding losing billions per hour.

My steps for recovery include:

  1. Install Caddy, nginx, and PHP.
  2. Edit IP address in the configuration files.
  3. rsync content and configuration from laptop.
  4. Change DNS records.
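
Step 2 is the easiest one to get wrong when done by hand. As a minimal sketch (not the author's actual tooling), the Python below swaps one IP address for another in the text of a configuration file. The function name `replace_ip` and the sample addresses, taken from the documentation ranges, are assumptions for illustration only.

```python
import re


def replace_ip(config_text: str, old_ip: str, new_ip: str) -> str:
    """Replace whole occurrences of old_ip with new_ip in config_text.

    The lookarounds keep 192.0.2.10 from matching inside a longer
    address such as 192.0.2.100 or 1.192.0.2.10.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)")
    # A function replacement avoids any backslash interpretation in new_ip.
    return pattern.sub(lambda _match: new_ip, config_text)


# Hypothetical config fragment; the addresses are documentation examples.
sample = "listen 192.0.2.10:443;\nallow 192.0.2.100;\n"
updated = replace_ip(sample, "192.0.2.10", "198.51.100.20")
print(updated)
```

Only the exact address is rewritten, so a neighbouring address that merely shares a prefix is left alone.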

Today's experience revealed a shortcoming in my disaster recovery plan:
The installation steps are written for Ubuntu and involve installing packages from an Ubuntu PPA, but most of my servers except vps4 and vps6 are set up with Debian.
In the future, I need to change nginx and PHP to use Docker containers, which would eliminate this problem.

Thanked by 1xaoc

Comments

  • SpartanHost Member, Host Rep

    Hi

    The intermittent downtime was not related to a conventional DDoS attack but rather a DoS vulnerability in the version of Junos we were using on our Juniper MX240. The router has had a software update and everything should be fine now.

    Apologies for the inconvenience caused.

  • skorupion Member, Host Rep

    @yoursunny said: avoiding losing billions per hour.

    You should have everything ready, so you would need to only change the IP address on DNS (less downtime)

    Thanked by 2yoursunny TimboJones
  • yoursunny Member, IPv6 Advocate

    @SpartanHost said:
    The intermittent downtime was not related to a conventional DDoS attack but rather a DoS vulnerability in the version of Junos we were using on our Juniper MX240

    Is it a software bug, or is it a Solarflare type attack?


    everything should be fine now.

    Not really.
    My VM is still unreachable.

    Strangely, if I enable VNC in Virtualizor, the VM is reachable (on both IPv4 and IPv6); if I disable VNC, the VM becomes inaccessible.
    Of course, correlation is not causation.

    Did you happen to upgrade Virtualizor recently?
    We all know that Virtualizor updates give VPS Providers grey hair and sleep deprivation.

  • yoursunny Member, IPv6 Advocate
    edited March 2021

    @skorupion said:

    @yoursunny said: avoiding losing billions per hour.

    You should have everything ready, so you would need to only change the IP address on DNS (less downtime)

    It's a trade-off between how much time I spend maintaining 2x servers and how much time would be needed to re-deploy.
    Currently, if I have to re-deploy from scratch including OS installation, it would take 1 hour.
    If I make the decision after 1 hour of downtime, the total downtime would be no more than 2 hours.
    Not too bad.

    However, if I'm out geocaching when the server goes offline, the downtime would be longer.

    I am thinking about an automatic failover procedure:

    1. The secondary server periodically rsyncs content and configuration from the primary, so that I don't need to upload twice from the laptop.
    2. The secondary frequently checks whether the primary is accessible.
    3. If the secondary cannot reach the primary, and the UptimeRobot API also reports downtime, DNS records are changed via the Cloudflare API.
    4. To avoid potential issues with TLS certificates, the Cloudflare MITM proxy will be enabled when the secondary server is in use.
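
The decision rule in steps 2 and 3 could be sketched roughly as below, in Python. This is only an illustration under assumptions: `Observation`, `should_fail_over`, and the three-consecutive-checks threshold are hypothetical and not from the post. The actual probing, the UptimeRobot query, and the Cloudflare DNS update are deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Observation:
    """One check cycle as seen from the secondary server."""
    primary_reachable: bool      # result of the secondary's own probe
    monitor_reports_down: bool   # what the external monitor says


def should_fail_over(history: Sequence[Observation],
                     required_consecutive: int = 3) -> bool:
    """Return True only if both signals agree for the last N cycles.

    Requiring several consecutive cycles is an added assumption here,
    meant to avoid switching DNS on a single dropped probe.
    """
    if len(history) < required_consecutive:
        return False
    recent = history[-required_consecutive:]
    return all(
        (not obs.primary_reachable) and obs.monitor_reports_down
        for obs in recent
    )


down = Observation(primary_reachable=False, monitor_reports_down=True)
local_glitch = Observation(primary_reachable=False, monitor_reports_down=False)

print(should_fail_over([down, down, down]))
print(should_fail_over([down, local_glitch, down]))
```

Requiring both signals means a network problem on the secondary alone would not trigger a DNS change, which matches the intent of step 3.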
  • SpartanHost Member, Host Rep

    @yoursunny said:

    @SpartanHost said:
    The intermittent downtime was not related to a conventional DDoS attack but rather a DoS vulnerability in the version of Junos we were using on our Juniper MX240

    Is it a software bug, or is it a Solarflare type attack?


    everything should be fine now.

    Not really.
    My VM is still unreachable.

    Strangely, if I enable VNC in Virtualizor, the VM is reachable (on both IPv4 and IPv6); if I disable VNC, the VM becomes inaccessible.
    Of course, correlation is not causation.

    Did you happen to upgrade Virtualizor recently?
    We all know that Virtualizor updates give VPS Providers grey hair and sleep deprivation.

    Juniper Software bug.

    Virtualizor hasn't had an update recently. I haven't seen that specific issue with VNC and the VPS's network before. Will do some checks to see if I can replicate it and will PM you.

  • bulbasaur Member
    edited March 2021

    So much engineering, for what is essentially a meme website.

    But, as far as the commands are concerned, I suggest that you use Ansible. That way, you can just run ansible-playbook -i inventory playbook.yml to get up and running within minutes.

    Thanked by 3lanefu ddvu bdl
  • @skorupion said: You should have everything ready, so you would need to only change the IP address on DNS (less downtime)

    What about automating everything? The moment it detects anything wrong, it would auto-deploy and call the DNS server's API to change the record.

  • yoursunny Member, IPv6 Advocate

    @SpartanHost said:

    @yoursunny said:
    Not really.
    My VM is still unreachable.

    Strangely, if I enable VNC in Virtualizor, the VM is reachable (on both IPv4 and IPv6); if I disable VNC, the VM becomes inaccessible.
    Of course, correlation is not causation.

    Did you happen to upgrade Virtualizor recently?
    We all know that Virtualizor updates give VPS Providers grey hair and sleep deprivation.

    Virtualizor hasn't had an update recently. I haven't seen that specific issue with VNC and the VPS's network before. Will do some checks to see if I can replicate it and will PM you.

    Ticket 521048.


    @stevewatson301 said:
    So much engineering, for what is essentially a meme website.

    My website has more than memes.
    I have 15 years of technical articles on https://yoursunny.com/t/ , including the top result of "compile one kernel module" and the second result of "install opencv3 on raspberry pi zero".

    Also, push-ups aren't even on this server.
    Frontend is Netlify.
    Video repository is on vps5 (VirMach BUF); it's a single point of failure at the moment, because I deleted box6 last week and haven't deployed the planned replica.

    But, as far as the commands are concerned, I suggest that you use Ansible. That way, you can just run ansible-playbook -i inventory playbook.yml to get up and running within minutes.

    Yes, the global NDN network is controlled by Ansible.
    However, it's not as easy as it sounds if there are slight differences between required configurations.


    @Boogeyman said:
    What about automating everything? The moment it detects anything wrong, it would auto-deploy and call the DNS server's API to change the record.

    I'll get there, eventually.

  • @yoursunny said:

    My steps for recovery include:

    and PHP.

    and PHP to use.

    So you do push ups but still use a language for little girls?

    Thanked by 2skorupion TimboJones
  • Fork you router. Get this update and work like you should. This is SPARTAAAAAAAAAA!

  • @yoursunny said:

    @Boogeyman said:
    What about automating everything? The moment it detects anything wrong, it would auto-deploy and call the DNS server's API to change the record.

    I'll get there, eventually, probably, never.

    Thanked by 1yoursunny
  • yoursunny Member, IPv6 Advocate

    @Jona4s said:

    @yoursunny said:

    My steps for recovery include:

    and PHP.

    and PHP to use.

    So you do push ups but still use a language for little girls?

    yoursunny.com was created in 2006.
    There were no Node.js or Go or Flask at that time; Django was only 1 year old.
    The popular choices were:

    • Active Server Pages 3.0
    • Microsoft .Net Framework 2.0
    • Personal Homepage Preprocessor 5.0

    I had pages in all three technologies, deployed on a Windows 2003 dedicated server.
    By 2009, most pages were turned into PHP.
    I switched to Linux shared hosting, then VPS in 2011.

    If you travel back to 2006 and release a good framework at that time, I would be happy to consider.


    The push-ups site is a static site.
    Video repository is Node.js: https://github.com/yoursunny/NDNts-video-server

    Thanked by 2lentro brueggus
  • @yoursunny said: If you travel back to 2006 and release a good framework at that time, I would be happy to consider.

    I fear someone from js will get the idea of timemachine.js from here.

  • If the PPA packages you wanted to install are Sury PHP, they provide a Debian repo as well.
    I've written a script to install some popular repos on both Debian and Ubuntu.
    https://gist.github.com/bohanyang/a789784012ee1da5949f609195ac2d1c
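
As a rough illustration of handling both distributions in one script (not the gist's actual contents), the Python sketch below reads the `ID` field from `/etc/os-release` text to decide which kind of repository to set up. The function names and the returned labels are hypothetical.

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the style of /etc/os-release."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def repo_kind_for(os_release_text: str) -> str:
    """Return a label for the repository type to use (labels are made up)."""
    distro = parse_os_release(os_release_text).get("ID", "")
    if distro == "ubuntu":
        return "ppa"
    if distro == "debian":
        return "debian-repo"
    raise ValueError(f"unsupported distribution: {distro!r}")


# Sample inputs; on a real host the text would come from /etc/os-release.
ubuntu_text = 'NAME="Ubuntu"\nID=ubuntu\nID_LIKE=debian\n'
debian_text = 'PRETTY_NAME="Debian GNU/Linux"\nID=debian\n'
print(repo_kind_for(ubuntu_text), repo_kind_for(debian_text))
```

Branching on `ID` rather than `ID_LIKE` matters here, because Ubuntu reports `ID_LIKE=debian` and would otherwise be treated as Debian.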

    Thanked by 1yoursunny
  • ddvu Member

    Do more push-ups, that will give you the muscles to compile PHP from source like a real man. No ppa required.

    Might add another hour of downtime though

    Thanked by 2yoursunny webcraft
  • It was offline again, for about SIX minutes in total. I am crying...

  • yoursunny Member, IPv6 Advocate

    The issue where disabling VNC took the VM offline was resolved. The technical reason deserves its own article.


    @ddvu said:
    Do more push-ups, that will give you the muscles to compile PHP from source like a real man. No ppa required.

    Might add another hour of downtime though

    If I attempt to compile software, I'm going to trigger the high CPU suspension hammer.


    @codydoby said:
    It was offline again, for about SIX minutes in total. I am crying...

    Where's your (or your renter's) disaster recovery plan?

    Thanked by 1dahartigan
  • @yoursunny said: If I attempt to compile software, I'm going to trigger the high CPU suspension hammer.

    Why don't you compile it on a VM or your local system and then package the binaries up, and then deploy them on the required hosts?

  • yoursunny Member, IPv6 Advocate

    @stevewatson301 said:

    @yoursunny said: If I attempt to compile software, I'm going to trigger the high CPU suspension hammer.

    Why don't you compile it on a VM or your local system and then package the binaries up, and then deploy them on the required hosts?

    My local system is ARMv7.
    Packages compiled here wouldn't work on VPS, except Scaleway C1 and DataIdeas.

    .deb packages I produced are no different from .deb packages from publishers.

    I have a GitHub workflow for building Named Data Networking packages every week, because the publisher only updates their packages twice a year.

  • codydoby Member
    edited April 2021

    @yoursunny said: Where's your (or your renter's) disaster recovery plan?

    My code is simple enough. There are only a few steps...
    1. Modify the GitHub Actions secrets (IP, Port, Username, Secret Key for SSH)
    2. Trigger the GitHub Action
    3. Complete || Failed && Notify me

    But the problem is, I might be sleeping when the server is disconnected. So it is not smart enough atm.
