[124272] in North American Network Operators' Group

Re: BGP Update Report

daemon@ATHENA.MIT.EDU (Danny McPherson)
Sun Mar 28 15:52:46 2010

From: Danny McPherson <danny@tcb.net>
In-Reply-To: <226B7CBF-ED2E-40A1-A3BC-DD9C43C71C0D@gmail.com>
Date: Sun, 28 Mar 2010 13:51:46 -0600
To: "nanog@nanog.org list" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org


On Mar 28, 2010, at 12:00 PM, Anton Kapela wrote:

> I guess what I'm hinting at is precisely something finer-grained (path
> not prefix), as you suggest. Per-neighbor enabled, versus "entire bgp
> RIB" would be preferred. I'm also interested in the *chronic* nature of
> these apparent instabilities. An average of one flap per minute could
> imply that the end-site is not getting a lot of useful TCP moved, and as
> such, after something on the (n)-hour timescale, perhaps it's worth
> suppressing it.
>
> So, I'd ask for a long-timescale dampening function, indexed against
> per-path, and enforced per neighbor. Perhaps as-path lists could be
> combined with relaxed timers on existing implementations to achieve this
> today (in a VRF target/context).

It's not just AS_PATH. A lot of the reason so many duplicate updates
occur (nearly 50% of all updates at times, and often more during the
busiest times) is that implementations on the other end don't keep
egress advertisement state per attribute. For example, if cluster_list
length just triggered an internal transition, a new update is sent to
external peers with no new information, because the determining internal
attributes are stripped before the new update is transmitted. Yet those
*prefixes* might well be suppressed as a result of the implementation
and/or network architecture on the other end of the BGP connection.
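
To make the duplicate-update point concrete, here is a minimal sketch of
what keeping egress advertisement state could look like. It is not taken
from any real implementation: the names (EgressState, external_view,
INTERNAL_ONLY) and the attribute dictionary layout are assumptions made
up for illustration, and a real speaker carries far more attributes.

```python
from typing import Dict, Hashable, Tuple

# Illustrative set of attributes that are not propagated to external peers.
# The set and the key names are assumptions for this sketch only.
INTERNAL_ONLY = {"cluster_list", "originator_id", "local_pref"}

AttrView = Tuple[Tuple[str, Hashable], ...]


def external_view(attrs: Dict[str, Hashable]) -> AttrView:
    """Return the attributes an external peer would actually see, in a
    canonical, hashable form (internal-only attributes stripped)."""
    return tuple(sorted((k, v) for k, v in attrs.items() if k not in INTERNAL_ONLY))


class EgressState:
    """Remembers, per (peer, prefix), the last attribute set actually sent."""

    def __init__(self) -> None:
        self._sent: Dict[Tuple[str, str], AttrView] = {}

    def should_send(self, peer: str, prefix: str, attrs: Dict[str, Hashable]) -> bool:
        """True only if the peer-visible attributes differ from what was last sent."""
        view = external_view(attrs)
        key = (peer, prefix)
        if self._sent.get(key) == view:
            return False  # would be a duplicate update: nothing new for this peer
        self._sent[key] = view
        return True


state = EgressState()
before = {
    "as_path": (64500, 64501),
    "next_hop": "192.0.2.1",
    "cluster_list": ("10.0.0.1",),
}
# Internal transition: only cluster_list length changed.
after = dict(before, cluster_list=("10.0.0.1", "10.0.0.2"))

first = state.should_send("ebgp-peer", "203.0.113.0/24", before)
second = state.should_send("ebgp-peer", "203.0.113.0/24", after)
```

In this sketch `first` is True and `second` is False: the cluster_list
change alters the internal best-path input but not what the external
peer would see, so the second update is withheld. The cost is the extra
per-peer, per-prefix state, which is presumably the optimization
trade-off implementations are making when they skip it.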

Then couple that with what Joe was pointing out, where intermediate
nodes with consistently unstable links or "paths" result in penalizing
an entire prefix, not just the unstable paths, and it makes for more
brokenness than benefit when route flap damping is employed.
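
The prefix-versus-path difference can be sketched with a toy damper.
This is only an illustration under stated assumptions: the Damper class,
its key scheme, and the numbers (1000 penalty per flap, suppress at
2000, 900-second half-life) are picked for the example, and the reuse
threshold and maximum suppress time of real damping are left out.

```python
from typing import Dict, Hashable, Tuple


class Damper:
    """Toy exponential-decay flap penalty, indexed by an arbitrary key."""

    def __init__(self, half_life_s: float = 900.0,
                 flap_penalty: float = 1000.0,
                 suppress_at: float = 2000.0) -> None:
        self.half_life_s = half_life_s
        self.flap_penalty = flap_penalty
        self.suppress_at = suppress_at
        # key -> (penalty at last update, time of last update in seconds)
        self._state: Dict[Hashable, Tuple[float, float]] = {}

    def penalty(self, key: Hashable, now: float) -> float:
        """Current penalty for key, decayed by half for every half-life elapsed."""
        value, when = self._state.get(key, (0.0, now))
        return value * 0.5 ** ((now - when) / self.half_life_s)

    def flap(self, key: Hashable, now: float) -> None:
        """Record one flap: decay the old penalty, then add the per-flap penalty."""
        self._state[key] = (self.penalty(key, now) + self.flap_penalty, now)

    def suppressed(self, key: Hashable, now: float) -> bool:
        return self.penalty(key, now) >= self.suppress_at


prefix = "203.0.113.0/24"
unstable = ("neighbor-A", prefix, (64500, 64510, 64501))  # path via a flapping link
stable = ("neighbor-A", prefix, (64500, 64520, 64501))    # alternate, quiet path

per_prefix = Damper()  # keyed by prefix only
per_path = Damper()    # keyed by (neighbor, prefix, AS path)

# The unstable path flaps once a minute, three times.
for t in (0.0, 60.0, 120.0):
    per_prefix.flap(prefix, t)
    per_path.flap(unstable, t)

prefix_suppressed = per_prefix.suppressed(prefix, 120.0)
unstable_suppressed = per_path.suppressed(unstable, 120.0)
stable_suppressed = per_path.suppressed(stable, 120.0)
```

With prefix-only keys the whole prefix ends up suppressed, including the
quiet alternate path. With (neighbor, prefix, path) keys only the
flapping path is penalized and the stable one stays usable, which is
roughly the per-path, per-neighbor behavior Anton is asking for.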

It's not that people haven't studied this or don't understand why it
occurs. The issue is that implementation optimizations always seem to
win out today over systemic state effects (i.e., that "be conservative
in what you send" thing doesn't seem to apply in practice,
unfortunately).

-danny

