[181978] in North American Network Operators' Group

Re: United Airlines is Down (!) due to network connectivity problems

daemon@ATHENA.MIT.EDU (Dovid Bender)
Thu Jul 9 04:19:06 2015

X-Original-To: nanog@nanog.org
In-Reply-To: <5B17ED50-A8E5-464F-A889-FE2C35035147@ox.com>
Date: Wed, 8 Jul 2015 17:40:42 -0400
From: Dovid Bender <dovid@telecurve.com>
To: Matthew Huff <mhuff@ox.com>
Cc: nanog2 <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org

Other than for an emergency repair, who rolls out a software update in
the middle of the week? We test, test, and then test some more, and only
then roll out on weekends. Our maintenance window is 00:00 - 01:00 Sunday
mornings for software updates etc.


On Wed, Jul 8, 2015 at 3:02 PM, Matthew Huff <mhuff@ox.com> wrote:

> Traders on the floor are being told that it’s a software glitch from new
> software that was rolled out Tuesday night. Nothing official has been
> said.  The only thing I know for sure is that if the NYSE was hacked, they
> wouldn’t tell anyone the details for a long time, if ever.
>
> The impact of the NYSE being down is much less significant than it used to
> be since most stocks are multiple-listed on other exchanges.
>
> The lack of information through official channels is unusual though. In
> previous situations, there has been at least a little hand-holding. So far,
> nada. In fact, other than financial service providers’ emails, there have
> been no emails so far today from the NYSE, including the announcement of
> resumption of service. According to the NYSE web page, trading will resume
> at 3:05pm EST today with primary specialist, and 3:10 for everyone.
>
>
>
>
> > On Jul 8, 2015, at 2:33 PM, Brett Frankenberger <rbf+nanog@panix.com> wrote:
> >
> > On Wed, Jul 08, 2015 at 01:55:43PM -0400, Valdis.Kletnieks@vt.edu wrote:
> >> On Wed, 08 Jul 2015 17:42:52 -0000, Matthew Huff said:
> >
> >>> Given that the technical resources at the NYSE are significant and
> >>> the lengthy duration of the outage, I believe this is more serious
> >>> than is being reported.
> >>
> >> My personal, totally zero-info suspicion:
> >>
> >> Some chuckleheaded NOC banana-eater made a typo, and discovered an
> >> entirely new class of wondrous BGP-wedgie style "We know how we got
> >> here, but how do we get back?" network misbehaviors....
> >
> > We don't know how long the underlying problem lasted, and how much of
> > the continued outage time is dealing with the logistics of restarting
> > trading mid-day.  Completely stopping and then restarting trading
> > mid-day is likely not a quick process even if the underlying technical
> > issue is immediately resolved.
> >
> >> (Such things have happened before - like the med school a few years ago
> >> that extended their ethernet spanning tree one hop too far, and discovered
> >> that merely removing the one hop too far wasn't sufficient to let it come
> >> back up...)
> >
> > No, but picking a bridge in the center, giving it priority sufficient
> > for it to become root, and then configuring timers[1] that would
> > support a much larger than default diameter, possibly followed by some
> > reboots, probably would have.
> >
> > From what has been publicly stated, they likely took a much longer and
> > more complicated path to service restoration than was strictly
> > necessary.  (I have no non-public information on that event.  There may
> > be good reasons, technical or otherwise, why that wasn't the chosen
> > solution.)
> >
> >     -- Brett
> >
> > [1] You only have to configure them on the root; non-root bridges use
> > what the root sends out, not what they have configured.
>
>
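For anyone wanting to put numbers on the timer tuning Brett describes, here
is a minimal sketch. It assumes the commonly cited tuning formulas for
classic 802.1D spanning tree (max age = 4*hello + 2*diameter - 2, forward
delay = (4*hello + 3*diameter) / 2) and the usual 802.1D timer ranges; check
both against your vendor's documentation before relying on them. The
function name stp_timers and the range constants are made up for
illustration and do not come from the thread.

```python
import math

# Assumed 802.1D timer limits, in seconds (not taken from the thread).
HELLO_RANGE = (1, 10)
MAX_AGE_RANGE = (6, 40)
FORWARD_DELAY_RANGE = (4, 30)


def stp_timers(diameter, hello=2):
    """Return (hello, max_age, forward_delay) in seconds for a target diameter.

    diameter is the largest number of bridges on any path between two
    end stations. Uses the assumed tuning formulas from the lead-in.
    """
    if diameter < 2:
        raise ValueError("diameter must be at least 2 bridges")

    max_age = 4 * hello + 2 * diameter - 2
    # Round half-seconds up, since the timers are configured in whole seconds.
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)

    checks = (
        ("hello", hello, HELLO_RANGE),
        ("max_age", max_age, MAX_AGE_RANGE),
        ("forward_delay", forward_delay, FORWARD_DELAY_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(
                f"{name}={value}s is outside the assumed range {low}-{high}s"
            )
    return hello, max_age, forward_delay


if __name__ == "__main__":
    # Diameter 7 with a 2 s hello reproduces the familiar 20 s / 15 s defaults.
    print(stp_timers(7))
    # A wider network, e.g. 10 bridges across.
    print(stp_timers(10))
```

Per the footnote above, the resulting values would only need to be set on
the root bridge. With a 2 second hello, these assumed limits top out at a
diameter of 17; anything wider raises an error rather than returning
out-of-range timers.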
