BREAKING: Multiple global websites shown as offline, including Amazon Web Services and Reddit - Page 2

Comments

  • In today’s episode of “I will use big providers because they never have downtime. Small webhosting company Blyat”

  • jon617 Veteran

    Did anyone try turning it off and on? Power cycling usually fixes the Internet for me.

  • DataIdeas-Josh Member, Patron Provider

    @PeterP said:
    To bring this thread back on-topic, has Fastly mentioned anything yet as to what caused the outage this morning? I'm yet to see anything as I just got home for the night, but I'm sure I'll come across something if not here.

    probably was bgp...

  • PeterP Member, Host Rep

    @DataIdeas-Josh said:

    @PeterP said:
    To bring this thread back on-topic, has Fastly mentioned anything yet as to what caused the outage this morning? I'm yet to see anything as I just got home for the night, but I'm sure I'll come across something if not here.

    probably was bgp...

    Swap out DNS for BGP and you've got yourself a new meme :lol:

  • raindog308 Administrator, Veteran

    I'm confused...why would Amazon Web Services use Fastly? They have their own CDN, CloudFront.

    Thanked by 2: lentro, Tony40
  • lentro Member, Host Rep

    @raindog308 said:
    I'm confused...why would Amazon Web Services use Fastly? They have their own CDN, CloudFront.

    “Amazon’s own retail website actually runs through Fastly, rather than CloudFront, and has done since May 2020.”

    https://www.theguardian.com/technology/2021/jun/08/edge-cloud-error-tuesday-internet-outage-fastly-speed

    :lol:

    AWS is even too expensive for Amazon lol, I can’t come up with another explanation. Reminds me of a time when an employee of a big tech company with a respected cloud service rented dedicated servers from me because their employee discount still cost 2x low-end rates lol.

    Thanked by 2: raindog308, iNK79
  • @raindog308 said:
    I'm confused...why would Amazon Web Services use Fastly? They have their own CDN, CloudFront.

    More POPs:
    https://aws.amazon.com/about-aws/global-infrastructure/ vs
    https://www.fastly.com/network-map

    Thanked by 1: raindog308
  • I suspect it was Chia.

  • Levi Member

    @raindog308 said:
    I'm confused...why would Amazon Web Services use Fastly?

    Fastly is more advanced? And if your own cdn will go down what will you do?

  • Maounique Host Rep, Veteran

    @LTniger said: And if your own cdn will go down what will you do?

    And web site. Use Twitter... If it is not on AWS :pensive:
    So that is why Paypal wouldn't take my password? It shouldn't be related...

  • @sdglhm said:
    I suspect it was Chia.

    The Chia virus got into their servers

  • jsg Member, Resident Benchmarker

    @PeterP said:
    To bring this thread back on-topic, has Fastly mentioned anything yet as to what caused the outage this morning?

    Fastly SVP said:
    What happened?
    On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.

    Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.

    Translation: They incompetently committed two arch-sins:

    • they don't properly check user input/user provided data
    • their design isn't properly compartmentalized and so one single customer could bring down virtually the whole service by (presumably inadvertently) making a change in their config.
  • Maounique Host Rep, Veteran
    edited June 2021

    @jsg said: they don't properly check user input/user provided data

    I don't see anything like that in the post-mortem. In fact they specify the config was valid, just applying it triggered a bug previously introduced (long ago, in fact, so it was a very rare thing/combination of things, presumably).
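
The distinction drawn here, a configuration that passes validation yet still trips a latent bug when applied, can be illustrated with a minimal sketch. Everything in it is hypothetical: `ServiceConfig`, its fields, and the divide-by-zero bug are invented for illustration and say nothing about Fastly's actual software or the real trigger.

```python
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    # Hypothetical fields, not taken from any real CDN configuration.
    ttl_seconds: int  # 0 is documented as "do not cache", so it is valid
    shards: int


def validate(cfg: ServiceConfig) -> bool:
    """Input validation: every field is inside its documented range."""
    return 0 <= cfg.ttl_seconds <= 86_400 and 1 <= cfg.shards <= 64


def apply_config(cfg: ServiceConfig) -> int:
    """Compute refreshes per hour across all shards.

    Latent bug (deliberate, for illustration): the code forgets that
    ttl_seconds == 0 is a valid value and divides by it.
    """
    return cfg.shards * 3600 // cfg.ttl_seconds
```

With `ServiceConfig(ttl_seconds=0, shards=4)`, `validate` returns `True` and `apply_config` raises `ZeroDivisionError`. The input was checked and accepted; the fault sits in the code that consumes it, which is why stricter input validation alone would not necessarily have caught a case like this.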

  • @jsg nice throwing around the words "incompetent", but:

    • they don't properly check user input/user provided data

    This seems like an edge case in a valid configuration change, how do you prevent that from happening with input validation.

    • their design isn't properly compartmentalized and so one single customer could bring down virtually the whole service by (presumably inadvertently) making a change in their config.

    PoPs are usually shared with event loop based software running, how do you perform isolation with such software? (Please don't say threads or per customer IPs. None of those scale for a CDN service.)

  • yoursunny Member, IPv6 Advocate

    @stevewatson301 said:

    • their design isn't properly compartmentalized and so one single customer could bring down virtually the whole service by (presumably inadvertently) making a change in their config.

    PoPs are usually shared with event loop based software running, how do you perform isolation with such software? (Please don't say threads or per customer IPs. None of those scale for a CDN service.)

    Run code that requires isolation in eBPF or WebAssembly.
    I can reach 100Gbps on six CPU cores using eBPF (and a lot of non-eBPF code).
    Also, my software is based on continuous polling, not event loops.
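
As a rough illustration of the isolation question, here is a minimal sketch of per-customer fault containment inside one shared event loop. It is an assumption-laden stand-in: it uses Python's `asyncio` rather than eBPF or WebAssembly, the handler names and status strings are made up, and it is not a description of how any CDN is actually built.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Handler = Callable[[str], Awaitable[str]]


async def good_handler(path: str) -> str:
    return f"200 {path}"


async def buggy_handler(path: str) -> str:
    # Stands in for a latent bug triggered by one customer's configuration.
    raise RuntimeError("latent bug")


async def slow_handler(path: str) -> str:
    await asyncio.sleep(1.0)
    return f"200 {path}"


async def serve(handlers: Dict[str, Handler], customer: str, path: str,
                timeout: float = 0.05) -> str:
    """Run one customer's handler and keep its failure local to that customer."""
    try:
        return await asyncio.wait_for(handlers[customer](path), timeout)
    except asyncio.TimeoutError:
        return "504 handler timed out"
    except Exception:
        return "500 handler failed"


async def main() -> Dict[str, str]:
    # Hypothetical customers "a", "b", "c" sharing a single event loop.
    handlers = {"a": good_handler, "b": buggy_handler, "c": slow_handler}
    results = await asyncio.gather(
        *(serve(handlers, customer, "/index") for customer in handlers)
    )
    return dict(zip(handlers, results))
```

Calling `asyncio.run(main())` serves customer "a" normally while "b" and "c" get error responses, and the loop itself keeps running. This guard only contains exceptions and slow awaits. A CPU-bound infinite loop or a crash in native code would still take down the shared process, which is the gap that sandboxes such as eBPF or WebAssembly are meant to close.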

    Thanked by 1: bulbasaur
  • jsg Member, Resident Benchmarker
    edited June 2021

    Fastly said:
    On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances.

    Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.

    I accept that that may sound normal/OK to many. From an IT-security and proper system and software design POV however this sounds like a confession of bloody incompetence.

    Now, one may discuss whether the bug is the problem or the fact that one customer "pushing a config" can bring down virtually the whole network or ... but to me it all says the same: too much marketing and large corp blabla and too little tech competence.

    Note that I did not say "run away!" or "switch over to competitor XYZ!". Simple reason: while one may exist I do not know of any major CDN provider with sound engineering down to the core.

  • @jsg said: too little tech competence.

    And yet I'm about to hear from you how you would prevent such a situation, apart from giving each customer their own thread, IP (or maybe containers), none of which actually scale for a CDN service.

    If a certain engineering methodology is nearly impossible to put in practice, you can hardly claim incompetence.

  • jsg Member, Resident Benchmarker
    edited June 2021

    @stevewatson301 said:

    @jsg said: too little tech competence.

    And yet I'm about to hear from you how you would prevent such a situation,

    By properly designing and engineering.

    If a certain engineering methodology is nearly impossible to put in practice, you can hardly claim incompetence.

    When they have bugs and when a bug can bring down virtually their whole operation, one can claim incompetence.

    Btw, Stockholm syndrome? Or why do we have this discussion? Hello, earth to some users here: they f_cked up. Big time. And your position is "No, one must not call them incompetent!" and protecting them?

    Sorry but contrary to, so it seems, popular belief, bugs and shoddy designing and engineering are not somehow God-given and unavoidable.

  • jsg be like:

  • default Veteran

    The end is nigh.

    Thanked by 2: alilet, dahartigan
  • jsg Member, Resident Benchmarker
    edited June 2021

    @stevewatson301

    • I did and do not promote HostSolutions - no matter what you feel the reality to be
    • You repeatedly try to treat me as the matter, which I'm not. This thread has a topic.
    • Thanks for demonstrating your lack of arguments and your way of dealing with it: attacking the guy who disagrees with you.
    • Thanks for demonstrating your "logic": you don't care about the fact that Fastly had a clusterf_ck but you promote them based on them being a 6 bln$ company.
    • Btw, I did not say "don't buy from Fastly". Explicitly. But hey, don't let reality get in your way ...

    [self censored for the sake of politeness]

  • alilet Member

    @stevewatson301 said:
    jsg be like:

    jsg be like:

  • jsg Member, Resident Benchmarker

    Thanks a lot for confirming my sig.

  • Evoxt Member

    Oh wow, I always thought Amazon uses their own CDN. Didn't know they are using Fastly as well

  • @jsg said:

    @stevewatson301 said:

    @jsg said: too little tech competence.

    And yet I'm about to hear from you how you would prevent such a situation,

    By properly designing and engineering.

    If a certain engineering methodology is nearly impossible to put in practice, you can hardly claim incompetence.

    When they have bugs and when a bug can bring down virtually their whole operation, one can claim incompetence.

    Btw, Stockholm syndrome? Or why do we have this discussion? Hello, earth to some users here: they f_cked up. Big time. And your position is "No, one must not call them incompetent!" and protecting them?

    Sorry but contrary to, so it seems, popular belief, bugs and shoddy designing and engineering are not somehow God-given and unavoidable.

    That's why there's QA, because non-zero bugs are unavoidable and QA is there to catch and prevent catastrophic events. This is a QA failure, not an engineering failure.

    Did it work as designed or did it not? If it worked according to design but led to catastrophe, it was a design fault. If it didn't function according to design, it's QA's fault. The way it was described is a QA fail.
