[30564] in North American Network Operators' Group


Re: flap flap: AS 10916

daemon@ATHENA.MIT.EDU (John Todd)
Fri Aug 11 17:03:17 2000

Mime-Version: 1.0
Message-Id: <v0421012fb5ba13452332@[172.16.2.77]>
In-Reply-To: <4.3.2.7.2.20000811150249.00cae3b0@mail.conti.nu>
Date: Fri, 11 Aug 2000 16:59:21 -0400
To: Kai Schlichting <kai@pac-rim.net>
From: John Todd <jtodd@loligo.com>
Cc: nanog@merit.edu
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
Errors-To: owner-nanog-outgoing@merit.edu


>A heavily flapping AS struck my curiosity: AS 10916.
>
>Somehow, AS 6138 constantly appears and disappears out of the
>path leading through AS 1239 - the other path to them via AS 701
>is barely flapping at all.
>
>#  sh ip bgp regexp 10916
>[...]
>   Network          Next Hop         Metric LocPrf Weight Path
>*d 63.87.96.0/24    censored             95    110      0 13789 1239 10916 i
>*>                  censored             98    100      0 701 10916 i
>[...]
>
>And every few minutes:
>
>*d 63.87.96.0/24    censored             95    110      0 13789 1239 6138 10916 i
>*>                  censored             98    100      0 701 10916 i
>[...]
>
>What could be the cause for an AS appearing/disappearing in a path
>every few minutes? Is it really AS 6138 that is flapping for 10916?
>For some reason they prefer the indirect route through 6138 to 1239
>(SprintLink), instead of their direct connection to 1239. These are
>the times when such a peer should be shut down for the sanity of the
>rest of the network.
>
>bye,Kai
>
>

A few easily identified reasons for such a flap (certainly not a
complete catalog) come to mind, knowing nothing about them beyond
what you describe above:

   (a) Router with insufficient memory for the full BGP table as seen
from that vantage point (it fills up to memory capacity, collapses,
BGP is reset, routes flap, wash, rinse, repeat)

   (b) Link that both BGP and traffic pass through is insufficient for 
continued keepalives once traffic moves in that direction (line 
becomes preferred by a large amount of traffic, traffic floods line, 
BGP keepalives fail, BGP session fails, traffic moves away, wash, 
rinse, repeat - see RED discussion archives some months ago for more 
detailed discussions on traffic flow dampening with similar patterns.)
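
To make the cycle in (b) concrete, here is a toy simulation in
Python.  It is only a sketch: the tick-based clock, the capacity and
offered-load numbers, and the hold/reconnect thresholds are all
invented for illustration and do not correspond to real BGP timers.

```python
def simulate(ticks, capacity=100, offered=150, hold_ticks=3, reconnect_ticks=2):
    """Toy model of failure mode (b); every parameter here is hypothetical.

    While the session is up, traffic follows the route. If the offered
    load exceeds the link capacity, keepalives are assumed lost; after
    hold_ticks consecutive losses the session drops and traffic moves
    away. The idle link then lets the session re-establish after
    reconnect_ticks, and the cycle starts over.
    """
    state = "up"
    missed = 0      # consecutive keepalives lost while congested
    down_for = 0    # ticks spent with the session down
    history = []
    for _ in range(ticks):
        if state == "up":
            congested = offered > capacity
            missed = missed + 1 if congested else 0
            if missed >= hold_ticks:
                state, missed, down_for = "down", 0, 0
        else:
            down_for += 1
            if down_for >= reconnect_ticks:
                state = "up"
        history.append(state)
    return history


def count_flaps(history):
    """Number of up -> down transitions in a state history."""
    return sum(1 for a, b in zip(history, history[1:]) if a == "up" and b == "down")


print(simulate(10))
print("flaps:", count_flaps(simulate(10)))
```

With the offered load below capacity (for example offered=50) the
same model never drops the session, which is the point: the flap is
driven by the traffic the route itself attracts.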

Quite a few people have problem type (a) happen more often than you 
might think - I've run across it several times in the dim past, 
either as a memory problem or with BGP implementations that choked on 
certain corrupted/unusual advertisements halfway through the table 
transfer.  If it's a memory problem, it's often an ACL issue that is 
related to someone removing the "sanity" ACL that otherwise would 
protect a smaller router from the failures that would occur with a 
full table update. ("Gee, this ACL seems to be preventing a full 
table from being sent to the customer.  I'm sure the customer really 
wanted a full table - I'll remove the list.")
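
As a rough illustration of what such a "sanity" filter buys the
smaller router, here is a Python sketch.  The function name, the
permitted list, and the table-capacity figure are hypothetical; a real
router would express this as a prefix list or ACL in its own
configuration language, not in Python.

```python
import ipaddress


def sanity_filter(announcements, permitted):
    """Keep only announcements that exactly match a permitted prefix."""
    allowed = {ipaddress.ip_network(p) for p in permitted}
    return [a for a in announcements if ipaddress.ip_network(a) in allowed]


# Hypothetical numbers: a small router that can hold only 2 routes.
TABLE_CAPACITY = 2

full_feed = ["0.0.0.0/0", "63.87.96.0/24", "192.0.2.0/24", "198.51.100.0/24"]
permitted = ["0.0.0.0/0"]          # customer is only meant to get a default

filtered = sanity_filter(full_feed, permitted)

print("with filter:   ", len(filtered), "routes, fits:", len(filtered) <= TABLE_CAPACITY)
print("filter removed:", len(full_feed), "routes, fits:", len(full_feed) <= TABLE_CAPACITY)
```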

This all being said, I'm willing to bet that neither (a) nor (b) is 
at the root of the problem here, but they're both possible.  Your 
question centered more on the path than on the cause, so I'll take a 
swing at it.

Since you're looking at the insertion of an AS into a path, one
possible situation is that AS10916 peers with AS1239 and also peers
with AS6138, which is itself a transit customer of AS1239.  Sprint
(AS1239) would prefer and re-advertise the route heard directly from
its customer AS10916 whenever it can hear it, and that would override
the less-preferred announcement coming via AS6138.  When (link 1)
goes away, then Sprint would prefer and re-announce the route being
heard from 10916 via (link 2).

701
    \
     \____________
                 |
                 |   link 2
        1239---10916--------6138
       /  \________________/
      /        link 1
13789
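
The two paths you see can be reproduced with a very simplified
best-path sketch in Python.  It assumes AS1239 gives both routes the
same local preference and then picks the shorter AS path; the real BGP
decision process has more steps, and the Route class and function
names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple          # AS numbers as received, nearest neighbor first
    local_pref: int = 100   # assumed equal for both routes


def best_route(candidates):
    """Simplified selection: highest local-pref, then shortest AS path."""
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


def advertised_path(local_as, candidates):
    """AS path a neighbor would see after local_as prepends itself."""
    best = best_route(candidates)
    return None if best is None else (local_as,) + best.as_path


direct = Route("63.87.96.0/24", (10916,))
via_6138 = Route("63.87.96.0/24", (6138, 10916))

# Both routes available: the direct route wins.
print(advertised_path(1239, [direct, via_6138]))   # (1239, 10916)

# Direct route withdrawn: the route via 6138 takes over.
print(advertised_path(1239, [via_6138]))           # (1239, 6138, 10916)
```

Each time the direct route comes or goes, AS1239 switches its best
path and re-announces, which a downstream view such as AS13789's
sees as 6138 appearing in and disappearing from the path.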


JT


