[160534] in North American Network Operators' Group
RE: Level3 worldwide emergency upgrade?
daemon@ATHENA.MIT.EDU (Siegel, David)
Thu Feb 7 17:03:00 2013
From: "Siegel, David" <David.Siegel@Level3.com>
To: Brett Watson <brett@the-watsons.org>, "nanog@nanog.org" <nanog@nanog.org>
Date: Thu, 7 Feb 2013 21:19:22 +0000
In-Reply-To: <1BE2AC63-06D7-4FFA-B95A-7678E1C46A51@the-watsons.org>
I remember being glued to my workstation for 10 straight hours due to an OSPF bug that took down the whole of net99's network.
I was pretty proud of our size at the time...about 30Mbps at peak. Times are different and so are expectations. :-)
Dave
-----Original Message-----
From: Brett Watson [mailto:brett@the-watsons.org]
Sent: Wednesday, February 06, 2013 6:07 PM
To: nanog@nanog.org
Subject: Re: Level3 worldwide emergency upgrade?
Hell, we used to not have to bother notifying customers of anything; we just fixed the problem. Reminds me of a story I've probably shared in the past.
1995, IETF in Dallas. The "big ISP" I worked for at the time got tripped up by a 24-day IS-IS timer bug (maybe all of them at the time did, I don't recall) where all adjacencies reset at once. That's, like, the entire network down. Working with our engineering team in the *terminal* lab, mind you, and Ravi Chandra (then at Cisco), we reloaded the entire network of routers with new code from Cisco once they'd fixed the bug. I seem to remember this being my first exposure to Tony Li's infamous line, "... Confidence Level: boots in the lab."
Good times.
-b
On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:
> David, I am on an evening shift and am just now reading this thread.
>
> I was almost tempted to write an explanation that would have been
> identical in content to yours, based simply on Level3 doing something
> and keeping the information close.
>
> Responsible vendors do not try to hide what is being done unless it is
> an OpSec issue, and I have never seen Level3 act with less than
> responsibility, so it had to be OpSec.
>
> When it is that, it is best if the remainder of us sit quietly on the
> sidelines.
>
> Ralph Brandt
>
>
> -----Original Message-----
> From: Siegel, David [mailto:David.Siegel@Level3.com]
> Sent: Wednesday, February 06, 2013 12:01 PM
> To: 'Ray Wong'; nanog@nanog.org
> Subject: RE: Level3 worldwide emergency upgrade?
>
> Hi Ray,
>
> This topic reminds me of yesterday's discussion at the conference
> about drafting some BCOPs. It would be useful to confirm my
> own view of the BCOP for communicating security issues. My
> understanding of the best practice is to limit knowledge distribution
> of security-related problems both before and after the patches are
> deployed. You limit knowledge before the patch is deployed to prevent
> yourself from being exploited, but you also limit knowledge afterwards
> in order to limit potential damage to others (customers,
> competitors...the Internet at large). You also do not want to
> announce that you will be deploying a security patch until you have a
> fix in hand and know when you will deploy it (typically, the next
> available maintenance window, unless the cat is out of the bag and the
> danger is real and imminent).
>
> As a service provider, you should stay on top of security alerts from
> your vendors so that you can make your own decision about what action
> is required. I would not recommend relying on service provider
> maintenance bulletins or public operations mailing lists for obtaining
> this type of information. Some information can cause
> more harm than good if it is distributed in the wrong way, and
> information relating to security vulnerabilities definitely falls into
> that category.
>
> Dave
>
> -----Original Message-----
> From: Ray Wong [mailto:rayw@rayw.net]
> Sent: Wednesday, February 06, 2013 9:16 AM
> To: nanog@nanog.org
> Subject: Re: Level3 worldwide emergency upgrade?
>
> OK, having had that first cup of coffee, I can say perhaps the main
> reason I was wondering is that I've gotten used to Level3 always being on
> top of things (and, admittedly, rarely communicating). They've reached
> the top by often being a black box of reliability, so it's (perhaps
> unrealistically) surprising to see them caught off guard. Anything
> that pushes them into scramble mode causes me to lose a little sleep
> anyway. The alternative to what they did seems likely for at least a
> few providers who'll NOT manage to fix things in time, so I may well
> be looking at longer outages from other providers, and need to issue
> guidance to others on what to do if/when other links go down for
> periods long enough that all the cost-bounding monitoring alarms start
> to scream even louder.
>
> I was also grumpy at myself for not having noticed advance
> communication, which I still don't seem to have, though since I
> outsourced my email to bigG, I've noticed I'm more likely to miss
> things. Perhaps giving up maintaining that massive set of procmail
> rules has cost me a bit of my edge.
>
> Relatedly, of course, just because you design/run your network to
> tolerate some issues doesn't mean you can also budget to be under a
> support contract. :) Knowing more about the exploit/fix might mean
> trying to find a way to get free upgrades to some kit to prevent more
> localized attacks on other types of gear as well, though in this case
> it's all about Juniper PR839412, so it seems vendor-specific?
>
> There are probably more reasons to wish for more info, too. There are
> still more of them (exploiters/attackers) than there are of us
> trying to keep things running smoothly and transparently, so anything
> that smells of "OMG new exploit found!" also triggers my desire to
> share information. The network bad guys share information far more
> quickly and effectively than we do, it often seems.
>
> -R