[129216] in North American Network Operators' Group


Re: Did your BGP crash today?

daemon@ATHENA.MIT.EDU (Thomas Mangin)
Sun Aug 29 16:12:48 2010

From: Thomas Mangin <thomas.mangin@exa-networks.co.uk>
In-Reply-To: <AANLkTinn-jcEDEu=sh7mHQCaK4205OA1XthGyymeqZuP@mail.gmail.com>
Date: Sun, 29 Aug 2010 22:12:35 +0200
To: Paul Ferguson <fergdawgster@gmail.com>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

> It would seem to me that there should actually be a better option, e.g.
> recognizing the malformed update, and simply discarding it (and sending the
> originator an error message) instead of resetting the session.
>
> Resetting of BGP sessions should only be done in the most dire of
> circumstances, to avoid a widespread instability incident.


I had the same thought before giving up on it.

Negotiating a new error message could be a per-peer option. BGP has
capabilities for this exact reason.
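
As a rough sketch of what "a per-peer option" means on the wire: capabilities
travel in the OPEN message as an optional parameter of type 2, holding a list
of (code, length, value) entries. The capability code 250 below is a made-up
value chosen only for illustration, not an assigned code, and the helper names
are hypothetical.

```python
import struct

CAPABILITIES_PARAM_TYPE = 2    # optional parameter type for capabilities
ROUTE_REFRESH_CAP = 2          # existing capability, zero-length value
HYPOTHETICAL_SOFT_ERROR_CAP = 250  # ASSUMPTION: illustrative code, not assigned


def encode_capability(code: int, value: bytes = b"") -> bytes:
    """Encode one capability as code (1 octet), length (1 octet), value."""
    return struct.pack("!BB", code, len(value)) + value


def encode_capabilities_parameter(capabilities) -> bytes:
    """Wrap a list of (code, value) pairs in a type-2 optional parameter."""
    body = b"".join(encode_capability(c, v) for c, v in capabilities)
    return struct.pack("!BB", CAPABILITIES_PARAM_TYPE, len(body)) + body


def peer_supports(parameter: bytes, wanted_code: int) -> bool:
    """Return True if the peer's capabilities parameter lists wanted_code."""
    param_type, param_len = struct.unpack("!BB", parameter[:2])
    if param_type != CAPABILITIES_PARAM_TYPE:
        return False
    body = parameter[2:2 + param_len]
    offset = 0
    while offset + 2 <= len(body):
        code, length = body[offset], body[offset + 1]
        if code == wanted_code:
            return True
        offset += 2 + length  # skip this capability's value
    return False


param = encode_capabilities_parameter(
    [(ROUTE_REFRESH_CAP, b""), (HYPOTHETICAL_SOFT_ERROR_CAP, b"")]
)
```

The new behaviour would only be enabled when both sides advertise the code,
so a peer that does not know it keeps the existing session-reset handling.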

However, for this to make sense you would need to find a resynchronisation
point so that only the one faulty message is excluded. Initially I thought
that the last received KEEPALIVE (for the receiver of the error message)
could do - but you find yourself with race conditions - so perhaps two
KEEPALIVEs back?
Each TCP packet can contain multiple messages, so the messages would then
have to be split and ACKed individually to find the faulty one. EOR could
be used for that purpose.
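
To illustrate the framing point, here is a minimal sketch of splitting a
received byte buffer into individual BGP messages using the fixed 19-byte
header (16-byte all-ones marker, 2-byte length, 1-byte type). The function
names are hypothetical, and this is not how any particular implementation
does it. It only shows that message boundaries come from the length field
and not from TCP packet boundaries.

```python
import struct

BGP_MARKER = b"\xff" * 16
HEADER_LEN = 19      # marker (16) + length (2) + type (1)
MAX_MESSAGE_LEN = 4096
TYPE_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def split_messages(buffer: bytes):
    """Split buffer into complete BGP messages.

    Returns (messages, remainder): messages is a list of (type_name, body)
    tuples, remainder holds trailing bytes of a not yet complete message.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER_LEN:
        marker = buffer[offset:offset + 16]
        length, msg_type = struct.unpack(
            "!HB", buffer[offset + 16:offset + HEADER_LEN]
        )
        if marker != BGP_MARKER or not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
            # Framing itself is broken: no safe way to find the next message.
            raise ValueError(f"bad BGP header at offset {offset}")
        if len(buffer) - offset < length:
            break  # message continues in a later read
        body = buffer[offset + HEADER_LEN:offset + length]
        name = TYPE_NAMES.get(msg_type, f"UNKNOWN({msg_type})")
        messages.append((name, body))
        offset += length
    return messages, buffer[offset:]


def keepalive() -> bytes:
    """Build a KEEPALIVE, which is just the 19-byte header."""
    return BGP_MARKER + struct.pack("!HB", HEADER_LEN, 4)


# Two KEEPALIVEs and the start of a third arriving in one read.
stream = keepalive() + keepalive() + keepalive()[:10]
messages, remainder = split_messages(stream)
```

As long as the header is intact, a message with a bad body can be skipped by
its length and the stream stays in sync; if the length field itself is
corrupt, there is no such resynchronisation point.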

Still, it adds lots of complexity to the conversation - are we not going
to introduce bugs in that little-used and little-tested code path as well?
Unless you have a new "ACK" capability for each message - another idea,
but those are clearly discussions for outside NANOG.

Thomas




