[181903] in North American Network Operators' Group


Re: United Airlines is Down (!) due to network connectivity problems

daemon@ATHENA.MIT.EDU (Patrick W. Gilmore)
Thu Jul 9 03:19:01 2015

X-Original-To: nanog@nanog.org
From: "Patrick W. Gilmore" <patrick@ianai.net>
In-Reply-To: <3D6FF3DA-722A-4710-8B83-2EEA3DD53A94@baylink.com>
Date: Wed, 8 Jul 2015 15:31:06 -0400
To: NANOG list <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org

I’m with Ferg-dog.

I can’t tell you the number of times someone (yes, including me) has
designed, purchased, and installed a system with multiple backups,
failovers, redundancies, etc., and some vital piece fails in a weird way
which sends the whole thing into a tailspin.

Take UA as an example, since we have the most information (FSVO “most”)
there, namely that it was a “bad router”. Let’s assume they had multiple
routers configured with VRRP, BGP, OSPF, and an alphabet soup of other
ways to detect and route around failures. Now further assume one of
those routers has a software or hardware bug which doesn’t take the
router out of service, but leaves it up, replying to pings, answering
SNMP polls, speaking BGP or OSPF, sending VRRP hellos, etc., etc. - but
also eats half of all packets going _through_ the router. That can
happen, I’ve seen it first hand.

All those redundant systems do nothing, since the “bad router” is doing
everything a good router would do. The systems designed to catch such
problems all think things are fine, but they are not. Is it an attack?
No, it’s bad luck.

Now some will claim - and perhaps rightfully - that UA should have
systems which monitor for exactly this type of failure as well. Perhaps
they should have, or perhaps the problem was nothing like what I
explained. Either way, the point still stands that a company can have
multiple redundancies in place and still experience a failure mode
which causes exactly the problem described.
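
To make the failure mode concrete, here is a minimal sketch in Python. It
is a toy simulation, not a description of UA's network or of any real
monitoring product: the SimulatedRouter class, its drop_every_nth
parameter, and the 1% loss threshold are all assumptions made up for
illustration. It shows why checks aimed at the router itself keep
passing while a probe sent through the router exposes the loss.

```python
from dataclasses import dataclass


@dataclass
class SimulatedRouter:
    """Toy router model (hypothetical, for illustration only)."""
    drop_every_nth: int = 0  # 0 = forwards everything; 2 = eats every second packet
    _seen: int = 0           # count of transit packets handled so far

    def answers_ping(self) -> bool:
        # Control plane: traffic addressed *to* the router still gets a reply,
        # which is all that pings, SNMP polls, and protocol hellos exercise.
        return True

    def forward(self, packet: str) -> bool:
        # Data plane: traffic going *through* the router may be silently dropped.
        self._seen += 1
        if self.drop_every_nth and self._seen % self.drop_every_nth == 0:
            return False
        return True


def control_plane_healthy(router: SimulatedRouter, probes: int = 10) -> bool:
    """Liveness check: does the router itself respond?"""
    return all(router.answers_ping() for _ in range(probes))


def through_path_loss(router: SimulatedRouter, probes: int = 100) -> float:
    """Fraction of probe packets lost when sent through the router."""
    delivered = sum(router.forward(f"probe-{i}") for i in range(probes))
    return 1 - delivered / probes


def diagnose(router: SimulatedRouter, max_loss: float = 0.01) -> str:
    """Combine both checks; max_loss is an assumed threshold."""
    if not control_plane_healthy(router):
        return "down"
    if through_path_loss(router) > max_loss:
        return "up but dropping transit traffic"
    return "healthy"


good = SimulatedRouter()
bad = SimulatedRouter(drop_every_nth=2)
print(diagnose(good))
print(diagnose(bad))
```

In this sketch the liveness check alone would call both routers fine;
only the through-path probe separates them. Whether a real network can
cheaply run such probes across every redundant path is a separate
question.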


At this point, we move on to: “All three simultaneously?!? NO WAY!!” To
which I would point out they were not simultaneous. UA was back up
before NYSE went down. But even if they were simultaneous, sometimes
stuff happens. The human mind is very good at seeing connections, even
when there are none. Absent other evidence, I’m going to believe the
companies’ public statements that this was not a hack. Perhaps I am
being naive, but as I said, absent other evidence, it is a perfectly
plausible explanation.

-- 
TTFN,
patrick


> On Jul 08, 2015, at 14:56 , Jay Ashworth <jra@baylink.com> wrote:
>
> UA, WSJ /and/ NYSE all in the same day?
>
> Once is an accident; twice is a coincidence...
>
> Three times is enemy action.
>
> On July 8, 2015 1:18:47 PM EDT, Paul Ferguson <fergdawgster@mykolab.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Given that the Internet is held together with paper clips, baling
>> twine, and bubblegum, I'd prefer to take these organizations' initial
>> word for the fact that there is nothing obviously malicious in these
>> outages.
>>
>> The mainstream press, on the other hand, seems to want it to be a hack
>> or data breach or... something other than a "glitch". :-)
>>
>> - - ferg
>>
>>
>> On 7/8/2015 10:15 AM, Mel Beckman wrote:
>>
>>> It's important to not form an opinion too early, especially anyone
>>> involved with forensic analysis of these systems. This is a
>>> classic fault in amateur investigation: an early opinion will lead
>>> you into confirmation bias, irrationally accepting data agreeing
>>> with your opinions and rejecting that disproving it.
>>>
>>> -mel beckman
>>>
>>>> On Jul 8, 2015, at 10:07 AM, Paul Ferguson
>>>> <fergdawgster@mykolab.com> wrote:
>>>>
>>> NYSE: "The issue we are experiencing is an internal technical issue
>>> and is not the result of a cyber breach."
>>>
>>> https://twitter.com/NYSE/status/618818929906085888
>>>
>>> United Air statement CNBC: "An issue with a router degraded network
>>> connectivity for various applications. We fixed the router."
>>>
>>> https://twitter.com/barronstechblog/status/618816643821633536
>>>
>>> - ferg
>>>
>>
>>
>> - -- 
>> Paul Ferguson
>> PGP Public Key ID: 0x54DC85B2
>> Key fingerprint: 19EC 2945 FEE8 D6C8 58A1 CE53 2896 AC75 54DC 85B2
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v2
>>
>> iF4EAREIAAYFAlWdW3cACgkQKJasdVTchbLr/wD/aBNnLFv+MU+QI1ja7dd9LiSN
>> Zkum4lSIutxFn1NmaYoBAIgO/Ig7FxD4vRzQK8bUturn4YGw9FXMT+EzVTKhIbVG
>> =/yYp
>> -----END PGP SIGNATURE-----
>
> -- 
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.

