[31922] in North American Network Operators' Group


RE: IS-IS protocol implementation problem

daemon@ATHENA.MIT.EDU (rdobbins@netmore.net)
Mon Oct 30 09:58:12 2000

Message-ID: <7BDBFDCDD02AD311AB2700104BC4F3F7B6665B@atshost001>
From: rdobbins@netmore.net
To: nanog@merit.edu
Cc: sean@donelan.com, neil@colt.net
Date: Mon, 30 Oct 2000 06:56:07 -0800
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Errors-To: owner-nanog-outgoing@merit.edu



I had a bizarre event occur on Thursday night/Friday morning, and this is
likely the culprit.

I peer with AS701.  At approximately 11:15PM PDT or thereabouts (2:15AM
EDT), the 7507 which provides my connectivity to uu.net went belly-up in a
very strange way.  The BGP session with 701 showed active, with full tables;
existing TCP connections stayed up.  However, it appeared that all new
connections inbound from 701 were being dropped on the floor, and my
outbound traffic with them dropped from 40mb/sec down to about 5kb/sec.  The
same router was also handling a secondary connection to pbi.net; because the
BGP stayed active and in a supposedly functional state, traffic didn't get
routed in that direction as it should've been.

I had to reload the router to get it to function properly.  Very odd.
Nothing in the logs, etc.   The router just essentially went on strike, and
I've no idea why.

I don't run IS-IS, needless to say, especially with a foreign AS.  This
particular 7507 was running an 11.3.x CC-train IOS, and hadn't had any of
the ISO/CLNS family of protocols enabled, ever.

This is a bit earlier than the timeframe Sean cited, but I don't think it
was a coincidence, either.

-----------------------------------------------------------
Roland Dobbins <rdobbins@netmore.net> // 818.535.5024 voice 

-----Original Message-----
From: Neil J. McRae [mailto:neil@COLT.NET]
Sent: Monday, October 30, 2000 1:28 AM
To: sean@donelan.com
Cc: smd@clock.org; nanog@merit.edu
Subject: Re: IS-IS protocol implementation problem



> At approximately 7:37am EDT on Friday, about 258 Cisco 12000's on UUNET's
> primary backbone reloaded. This appeared to be isolated to routers
> in ASN 701. It disrupted reachability to about 15% of the world-wide
> Internet based on data from Matrix measurements.  A contributing cause was a bad
> IS-IS packet which confused certain IOS versions in the 12.0 IOS software
> train. I haven't heard what the root cause was or what originated the
> bad IS-IS packet. The Cisco bug id is CSCdr05779. Any provider running the
> affected IOS version may be vulnerable depending on what the root cause
> turns out to be.
> 
> Although the bad IS-IS packet didn't propagate to other providers, several
> other providers did report BGP resets and route flaps about the same time.

If a large AS such as AS701 starts flapping I wouldn't be surprised
if other ASes start seeing BGP resets and route-flaps. Could be
that crud routing information was exchanged when that chaos started
[jeez 258 routers I'd hate to have been the on duty NOC guy on that
morning :-)]

Interestingly, though, we still see a lot of routes with bad AS-PATH
information; people should be setting more stringent configurations on the
routes they learn and subsequently pass on to avoid this.
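
As a rough illustration of the kind of AS-PATH sanity checking described
above, here is a minimal Python sketch. What counts as "bad" is an assumption,
not something spelled out in this thread: the sketch flags reserved AS numbers
(0 and 65535), 16-bit private-use AS numbers (64512-65534), the local AS
appearing in a received path, an AS repeated non-adjacently (plain prepending
is allowed), and paths longer than a hypothetical cutoff. The function name,
constants, and AS 1234 in the usage note are illustrative only; on a real
router this would be expressed as inbound and outbound filter policy, not
Python.

```python
from itertools import groupby

# Assumed policy values for illustration only.
PRIVATE_ASNS = range(64512, 65535)   # 64512..65534 (16-bit private use)
RESERVED_ASNS = {0, 65535}
MAX_PATH_LEN = 50                    # hypothetical cutoff


def as_path_problems(as_path, local_as):
    """Return a list of reasons to reject a route with this AS-PATH.

    as_path  -- list of AS numbers, nearest neighbor first
    local_as -- our own AS number
    An empty list means the path passed every check.
    """
    if not as_path:
        return ["empty AS-PATH"]

    problems = []
    if len(as_path) > MAX_PATH_LEN:
        problems.append("AS-PATH too long")
    if local_as in as_path:
        problems.append("local AS in path (loop)")
    if any(asn in RESERVED_ASNS for asn in as_path):
        problems.append("reserved AS in path")
    if any(asn in PRIVATE_ASNS for asn in as_path):
        problems.append("private-use AS in path")

    # Collapse runs of the same AS (ordinary prepending). If an AS still
    # appears more than once, it was repeated non-adjacently.
    collapsed = [asn for asn, _ in groupby(as_path)]
    if len(collapsed) != len(set(collapsed)):
        problems.append("AS repeated non-adjacently")

    return problems
```

For example, with a made-up local AS of 1234, the path [701, 701, 3561]
passes, while [701, 64512, 3561] is flagged for carrying a private-use AS.
Applying the same checks before re-advertising a route covers the "pass on"
half of the advice.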

Regards,
Neil.

