[160536] in North American Network Operators' Group


Re: Level3 worldwide emergency upgrade?

daemon@ATHENA.MIT.EDU (Dorian Kim)
Thu Feb 7 17:12:28 2013

From: Dorian Kim <dorian@blackrose.org>
In-Reply-To: <970945E55BFD8C4EA4CAD74B647A9DC0B0740E@USIDCWVEMBX10.corp.global.level3.com>
Date: Thu, 7 Feb 2013 17:12:14 -0500
To: "Siegel, David" <David.Siegel@Level3.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

No one had hit the IS-IS bug before the IETF-enforced maintenance freeze
because no one in their right mind would be running three-week-old code
back then. I don't think things have changed that much. ;)

-dorian

On Feb 7, 2013, at 4:19 PM, Siegel, David wrote:

> I remember being glued to my workstation for 10 straight hours due to
> an OSPF bug that took down the whole of net99's network.
>
> I was pretty proud of our size at the time...about 30Mbps at peak.
> Times are different and so are expectations.  :-)
>
> Dave
>
>
> -----Original Message-----
> From: Brett Watson [mailto:brett@the-watsons.org]
> Sent: Wednesday, February 06, 2013 6:07 PM
> To: nanog@nanog.org
> Subject: Re: Level3 worldwide emergency upgrade?
>
> Hell, we used to not have to bother notifying customers of anything,
> we just fixed the problem. Reminds me of a story I've probably shared
> in the past.
>
> 1995, IETF in Dallas. The "big ISP" I worked for at the time got
> tripped up on a 24-day IS-IS timer bug (maybe all of them at the time
> did, I don't recall) where all adjacencies reset at once. That's like,
> entire network down. Working with our engineering team in the *terminal*
> lab, mind you, and Ravi Chandra (then at Cisco), we reloaded the entire
> network of routers with new code from Cisco once they'd fixed the bug. I
> seem to remember this being my first exposure to Tony Li's infamous
> line, "... Confidence Level: boots in the lab."
>
> Good times.
>
> -b
>
>
> On Feb 6, 2013, at 5:41 PM, Brandt, Ralph wrote:
>
>> David. I am on an evening shift and am just now reading this thread.
>>
>> I was almost tempted to write an explanation that would have had
>> identical content to yours, based simply on Level3 doing something
>> and keeping the information close.
>>
>> Responsible vendors do not try to hide what is being done unless it is
>> an Op Sec issue, and I have never seen Level3 act with less than
>> responsibility, so it had to be Op Sec.
>>
>> When it is that, it is best if the remainder of us sit quietly on the
>> sidelines.
>>
>> Ralph Brandt
>>
>>
>> -----Original Message-----
>> From: Siegel, David [mailto:David.Siegel@Level3.com]
>> Sent: Wednesday, February 06, 2013 12:01 PM
>> To: 'Ray Wong'; nanog@nanog.org
>> Subject: RE: Level3 worldwide emergency upgrade?
>>
>> Hi Ray,
>>
>> This topic reminds me of yesterday's discussion at the conference
>> around getting some BCOPs drafted.  It would be useful to confirm my
>> own view of the BCOP around communicating security issues.  My
>> understanding of the best practice is to limit knowledge distribution
>> of security-related problems both before and after the patches are
>> deployed.  You limit knowledge before the patch is deployed to prevent
>> yourself from being exploited, but you also limit knowledge afterwards
>> in order to limit potential damage to others (customers,
>> competitors...the Internet at large).  You also do not want to
>> announce that you will be deploying a security patch until you have a
>> fix in hand and know when you will deploy it (typically, the next
>> available maintenance window unless the cat is out of the bag and the
>> danger is real and imminent).
>>
>> As a service provider, you should stay on top of security alerts from
>> your vendors so that you can make your own decision about what action
>> is required.  I would not recommend relying on service provider
>> maintenance bulletins or public operations mailing lists for obtaining
>> this type of information.  There is some information that can cause
>> more harm than good if it is distributed in the wrong way and
>> information relating to security vulnerabilities definitely falls into
>> that category.
>>
>> Dave
>>
>> -----Original Message-----
>> From: Ray Wong [mailto:rayw@rayw.net]
>> Sent: Wednesday, February 06, 2013 9:16 AM
>> To: nanog@nanog.org
>> Subject: Re: Level3 worldwide emergency upgrade?
>>
>> OK, having had that first cup of coffee, I can say perhaps the main
>> reason I was wondering is I've gotten used to Level3 always being on
>> top of things (and admittedly, rarely communicating). They've reached
>> the top by often being a black box of reliability, so it's (perhaps
>> unrealistically) surprising to see them caught by surprise. Anything
>> that pushes them into scramble mode causes me to lose a little sleep
>> anyway. The alternative to what they did seems likely for at least a
>> few providers who'll NOT manage to fix things in time, so I may well
>> be looking at longer outages from other providers, and need to issue
>> guidance to others on what to do if/when other links go down for
>> periods long enough that all the cost-bounding monitoring alarms start
>> to scream even louder.
>>
>> I was also grumpy at myself for having not noticed advance
>> communication, which I still don't seem to have, though since I
>> outsourced my email to bigG, I've noticed I'm more likely to miss
>> things. Perhaps giving up maintaining that massive set of procmail
>> rules has cost me a bit more edge.
>>
>> Related, of course, just because you design/run your network to
>> tolerate some issues doesn't mean you can also budget to be in support
>> contract as well. :) Knowing more about the exploit/fix might mean
>> trying to find a way to get free upgrades to some kit to prevent more
>> localized attacks to other types of gear, as well, though in this case
>> it's all about Juniper PR839412 then, so vendor specific, it seems?
>>
>> There are probably more reasons to wish for more info, too. There's
>> still more of them (exploiters/attackers) than there are those of us
>> trying to keep things running smoothly and transparently, so anything
>> that smells of "OMG new exploit found!" also triggers my desire to
>> share information. The network bad guys share information far more
>> quickly and effectively than we do, it often seems.
>>
>> -R>
