[154280] in North American Network Operators' Group
Re: FYI Netflix is down
daemon@ATHENA.MIT.EDU (Todd Underwood)
Sat Jun 30 16:25:45 2012
In-Reply-To: <CACnPsNVHpp4T9q7vufjM0L9rSo3FJJaxpd1XY3UDAYjZ9RpZKA@mail.gmail.com>
From: Todd Underwood <toddunder@gmail.com>
Date: Sat, 30 Jun 2012 16:24:41 -0400
To: Scott Howard <scott@doc.net.au>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
scott,
>>
>> This was not a cascading failure.  It was a simple power outage
>>
>> Cascading failures involve interdependencies among components.
>
>
> Not always.  Cascading failures can also occur when there is zero dependency
> between components.  The simplest form of this is where one environment
> fails over to another, but the target environment is not capable of handling
> the additional load and then "fails" itself as a result (in some form or
> other, but frequently different to the mode of the original failure).
indeed. and that is an interdependency among components. in
particular, it is a capacity interdependency.
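
a minimal sketch of that capacity interdependency, assuming a hypothetical model in which a failed site's load is spread evenly across the survivors and any site pushed past its capacity fails in turn. the function name, site names, loads, and capacities are all made up for illustration; none of them are figures from the amazon outage.

```python
def cascade(loads, capacities, first_failure):
    """Return the sites that end up failed, in order of failure.

    Hypothetical model: a failed site's load is split evenly across the
    surviving sites, and any survivor pushed past its capacity fails too.
    """
    loads = dict(loads)                  # work on a copy
    failed = [first_failure]
    orphaned = loads.pop(first_failure)  # load that must move elsewhere

    while loads:
        share = orphaned / len(loads)
        for site in loads:
            loads[site] += share

        overloaded = [s for s in loads if loads[s] > capacities[s]]
        if not overloaded:
            break                        # survivors absorbed the load

        # Overloaded sites fail, and their load is orphaned in turn.
        orphaned = sum(loads.pop(s) for s in overloaded)
        failed.extend(overloaded)

    return failed


# Two sites at 70% each: the survivor is asked to carry 140% and fails too.
print(cascade({"east": 70, "west": 70}, {"east": 100, "west": 100}, "east"))
# → ['east', 'west']

# Three sites at 60% each: the two survivors rise to 90% and hold.
print(cascade({"a": 60, "b": 60, "c": 60}, {"a": 100, "b": 100, "c": 100}, "a"))
# → ['a']
```

the two sites share no components at all, yet the second one fails because of the first. the only link between them is the headroom each one has to absorb the other's load.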
> Whilst the Amazon outage might have been a "simple" power outage, it's
> likely that at least some of the website outages caused were a combination
> of not just the direct Amazon outage, but also the flow-on effect of their
> redundancy attempting (but failing) to kick in - potentially making the
> problem worse than just the Amazon outage caused.
i think you over-estimate these websites. most of them simply have no
redundancy (and obviously have no tested, effective redundancy) and
were simply hoping that amazon didn't really go down that much.
hope is not the best strategy, as it turns out.
i suspect that randy is right though: many of these businesses do not
promise perfect uptime and can survive these kinds of failures with
little loss to business or reputation. twitter has branded its early
failures with a whale that not only didn't hurt it but helped endear
the service to millions. when your service fits these criteria, why
would you bother doing the complicated systems and application
engineering necessary to actually have functional redundancy?
it simply isn't worth it.
t
>
>   Scott