[160508] in North American Network Operators' Group


Re: Level3 worldwide emergency upgrade?

daemon@ATHENA.MIT.EDU (Brett Watson)
Wed Feb 6 20:07:20 2013

From: Brett Watson <brett@the-watsons.org>
In-Reply-To: <51C66083768C2949A7EB14910C78B01701D94198@embgsr24.pateam.com>
Date: Wed, 6 Feb 2013 18:06:39 -0700
To: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Hell, we used to not have to bother notifying customers of anything, we
just fixed the problem. Reminds me of a story I've probably shared in
the past.

1995, IETF in Dallas. The "big ISP" I worked for at the time got tripped
up on a 24-day IS-IS timer bug (maybe all of them at the time did, I
don't recall) where all adjacencies reset at once. That's like, entire
network down. Working with our engineering team in the *terminal* lab
mind you, and Ravi Chandra (then at Cisco) we reloaded the entire
network of routers with new code from Cisco once they'd fixed the bug. I
seem to remember this being my first exposure to Tony Li's infamous
line, "... Confidence Level: boots in the lab."

Good times.

-b


On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:

> David. I am on an evening shift and am just now reading this thread.
>
> I was almost tempted to write an explanation that would have had
> identical content with yours based simply on Level3 doing something and
> keeping the information close.
>
> Responsible Vendors do not try to hide what is being done unless it is
> an Op Sec issue and I have never seen Level3 act with less than
> responsibility so it had to be Op Sec.
>
> When it is that, it is best if the remainder of us sit quietly on the
> sidelines.
>
> Ralph Brandt
>
>
> -----Original Message-----
> From: Siegel, David [mailto:David.Siegel@Level3.com]
> Sent: Wednesday, February 06, 2013 12:01 PM
> To: 'Ray Wong'; nanog@nanog.org
> Subject: RE: Level3 worldwide emergency upgrade?
>
> Hi Ray,
>
> This topic reminds me of yesterday's discussion in the conference around
> getting some BCOP's drafted.  It would be useful to confirm my own view
> of the BCOP around communicating security issues.  My understanding for
> the best practice is to limit knowledge distribution of security-related
> problems both before and after the patches are deployed.  You limit
> knowledge before the patch is deployed to prevent yourself from being
> exploited, but you also limit knowledge afterwards in order to limit
> potential damage to others (customers, competitors...the Internet at
> large).  You also do not want to announce that you will be deploying a
> security patch until you have a fix in hand and know when you will
> deploy it (typically, next available maintenance window unless the cat
> is out of the bag and danger is real and imminent).
>
> As a service provider, you should stay on top of security alerts from
> your vendors so that you can make your own decision about what action is
> required.  I would not recommend relying on service provider maintenance
> bulletins or public operations mailing lists for obtaining this type of
> information.  There is some information that can cause more harm than
> good if it is distributed in the wrong way and information relating to
> security vulnerabilities definitely falls into that category.
>
> Dave
>
> -----Original Message-----
> From: Ray Wong [mailto:rayw@rayw.net]
> Sent: Wednesday, February 06, 2013 9:16 AM
> To: nanog@nanog.org
> Subject: Re: Level3 worldwide emergency upgrade?
>
>>
>
> OK, having had that first cup of coffee, I can say perhaps the main
> reason I was wondering is I've gotten used to Level3 always being on top
> of things (and admittedly, rarely communicating). They've reached the
> top by often being a black box of reliability, so it's (perhaps
> unrealistically) surprising to see them caught by surprise. Anything
> that pushes them into scramble mode causes me to lose a little sleep
> anyway. The alternative to what they did seems likely for at least a few
> providers who'll NOT manage to fix things in time, so I may well be
> looking at longer outages from other providers, and need to issue
> guidance to others on what to do if/when other links go down for periods
> long enough that all the cost-bounding monitoring alarms start to scream
> even louder.
>
> I was also grumpy at myself for having not noticed advance
> communication, which I still don't seem to have, though since I
> outsourced my email to bigG, I've noticed I'm more likely to miss
> things. Perhaps giving up maintaining that massive set of procmail rules
> has cost me a bit more edge.
>
> Related, of course, just because you design/run your network to tolerate
> some issues doesn't mean you can also budget to be in a support contract
> as well. :) Knowing more about the exploit/fix might mean trying to find
> a way to get free upgrades to some kit to prevent more localized attacks
> on other types of gear, as well, though in this case it's all about
> Juniper PR839412 then, so vendor-specific, it seems?
>
> There are probably more reasons to wish for more info, too. There's
> still more of them (exploiters/attackers) than there are those of us
> trying to keep things running smoothly and transparently, so anything
> that smells of "OMG new exploit found!" also triggers my desire to share
> information. The network bad guys share information far more quickly and
> effectively than we do, it often seems.
>
> -R>
>


