[31922] in North American Network Operators' Group
RE: IS-IS protocol implementation problem
daemon@ATHENA.MIT.EDU (rdobbins@netmore.net)
Mon Oct 30 09:58:12 2000
Message-ID: <7BDBFDCDD02AD311AB2700104BC4F3F7B6665B@atshost001>
From: rdobbins@netmore.net
To: nanog@merit.edu
Cc: sean@donelan.com, neil@colt.net
Date: Mon, 30 Oct 2000 06:56:07 -0800
I had a bizarre event occur on Thursday night/Friday morning, and this is
likely the culprit.
I peer with AS701. At approximately 11:15PM PDT (2:15AM EDT), the 7507
which provides my connectivity to uu.net went belly-up in a very strange
way. The BGP session with 701 showed as up, with full tables; existing TCP
connections stayed up. However, it appeared that all new connections
inbound from 701 were being dropped on the floor, and my outbound traffic
with them dropped from 40mb/sec down to about 5kb/sec. The same router was
also handling a secondary connection to pbi.net; because the BGP session
stayed up and in a supposedly functional state, traffic didn't get
routed in that direction as it should've been.
I had to reload the router to get it to function properly. Very odd.
Nothing in the logs, etc. The router just essentially went on strike, and
I've no idea why.
I don't run IS-IS, needless to say, especially with a foreign AS. This
particular 7507 was running an 11.3.x CC-train IOS, and hadn't had any of
the ISO/CLNS family of protocols enabled, ever.
This is a bit earlier than the timeframe Sean cited, but I don't think it
was a coincidence, either.
-----------------------------------------------------------
Roland Dobbins <rdobbins@netmore.net> // 818.535.5024 voice
-----Original Message-----
From: Neil J. McRae [mailto:neil@COLT.NET]
Sent: Monday, October 30, 2000 1:28 AM
To: sean@donelan.com
Cc: smd@clock.org; nanog@merit.edu
Subject: Re: IS-IS protocol implementation problem
> At approximately 7:37am EDT on Friday, about 258 Cisco 12000's on UUNET's
> primary backbone reloaded. This appeared to be isolated to routers
> in ASN 701. It disrupted reachability to about 15% of the world-wide Internet
> based on data from Matrix measurements. A contributing cause was a bad
> IS-IS packet which confused certain IOS versions in the 12.0 IOS software
> train. I haven't heard what the root cause was or what originated the
> bad IS-IS packet. The Cisco bug id is CSCdr05779. Any provider running the
> affected IOS version may be vulnerable depending on what the root cause
> turns out to be.
>
> Although the bad IS-IS packet didn't propagate to other providers, several
> other providers did report BGP resets and route flaps about the same time.
If a large AS such as AS701 starts flapping, I wouldn't be surprised
if other ASes start seeing BGP resets and route flaps. It could be
that crud routing information was exchanged when that chaos started
[jeez, 258 routers; I'd hate to have been the on-duty NOC guy that
morning :-)]
Interestingly, though, we still see a lot of routes with bad AS-PATH
information. People should be setting more stringent configurations on the
routes they learn and subsequently pass on to avoid this.
Regards,
Neil.
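As a rough illustration of the kind of stricter checking on learned routes
suggested above, here is a minimal Python sketch of AS-PATH sanity checks.
It is not anyone's production filter and not router configuration: the
function name, the rule set, the max_len default, and the local AS number
(64500, taken from the documentation range) are all assumptions made for
the example.

```python
# Hypothetical AS-PATH sanity checks; the rules and names are illustrative
# assumptions, not a vendor feature or an agreed operational standard.

def as_path_problems(as_path, peer_asn, local_asn, max_len=50):
    """Return a list of reasons to distrust an AS path (empty list = looks sane).

    as_path   -- list of 16-bit AS numbers, nearest neighbour first
    peer_asn  -- AS number of the eBGP peer the route was learned from
    local_asn -- our own AS number
    max_len   -- arbitrary cut-off for implausibly long paths (assumption)
    """
    if not as_path:
        return ["empty AS path"]

    problems = []
    # A route learned from an eBGP peer should start with that peer's AS.
    if as_path[0] != peer_asn:
        problems.append("first AS is not the peer")
    # Our own AS appearing in the path indicates a routing loop.
    if local_asn in as_path:
        problems.append("local AS in path (loop)")
    # AS 0 is reserved, and 64512-65535 is the private/reserved block;
    # neither should appear in routes received from another provider.
    if any(asn == 0 or 64512 <= asn <= 65535 for asn in as_path):
        problems.append("reserved or private AS in path")
    if len(as_path) > max_len:
        problems.append("path too long")
    return problems


if __name__ == "__main__":
    # A clean path learned from AS701, and one leaking a private AS.
    print(as_path_problems([701, 1239], peer_asn=701, local_asn=64500))
    print(as_path_problems([701, 65001, 1239], peer_asn=701, local_asn=64500))
```

A route whose path returns a non-empty list would be rejected on input, or
at least not passed on to other peers. On a real router the same intent
would be expressed in the platform's own AS-path filter configuration.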