[129216] in North American Network Operators' Group
Re: Did your BGP crash today?
daemon@ATHENA.MIT.EDU (Thomas Mangin)
Sun Aug 29 16:12:48 2010
From: Thomas Mangin <thomas.mangin@exa-networks.co.uk>
In-Reply-To: <AANLkTinn-jcEDEu=sh7mHQCaK4205OA1XthGyymeqZuP@mail.gmail.com>
Date: Sun, 29 Aug 2010 22:12:35 +0200
To: Paul Ferguson <fergdawgster@gmail.com>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
> It would seem to me that there should actually be a better option, e.g.
> recognizing the malformed update, and simply discarding it (and sending the
> originator an error message) instead of resetting the session.
>
> Resetting of BGP sessions should only be done in the most dire of
> circumstances, to avoid a widespread instability incident.
I had the same thought before giving up on it.
Negotiating a new error message could be a per-peer option. BGP has capabilities for this exact reason.
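To make the capability idea a little more concrete, here is a rough Python sketch, not a real implementation. It assumes the usual (code, length, value) capability encoding carried in OPEN, and the capability code CAP_DISCARD_MALFORMED is purely hypothetical: no such capability is assigned, the value is picked only for illustration. The helper names are mine as well.

```python
import struct

# Hypothetical capability code, picked only for illustration (not an assigned value).
CAP_DISCARD_MALFORMED = 239


def parse_capabilities(data):
    """Parse a run of (code, length, value) capability triples into a dict."""
    caps = {}
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated capability header")
        code, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated capability value")
        caps[code] = value
        offset += length
    return caps


def both_support(local, remote, code):
    """The new behaviour is enabled only if both peers advertised the code."""
    return code in parse_capabilities(local) and code in parse_capabilities(remote)
```

With this, a speaker would keep resetting the session as today unless both_support(...) is true for that peer, which is what makes it a per-peer option.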
However, for this to make sense you would need to find a resynchronisation point so that only the one faulty message is excluded. Initially I thought that the last received KEEPALIVE (for the receiver of the error message) could do - but you then find yourself with race conditions - so perhaps two KEEPALIVEs back?
Each TCP packet can contain multiple messages, so the messages would then have to be split and ACKed individually to find the faulty one. EOR could be used for that purpose.
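The splitting part, at least, is simple. As a rough Python sketch (assuming the standard 19-byte BGP header: 16-byte all-ones marker, 2-byte length, 1-byte type, and a 4096-byte maximum message size), a receiver can cut whatever bytes it has read into individual messages like this. The function name and return shape are my own choice for illustration.

```python
import struct

HEADER_LEN = 19            # 16-byte marker + 2-byte length + 1-byte type
MARKER = b"\xff" * 16
MAX_LEN = 4096             # classic maximum BGP message size


def split_messages(buffer):
    """Return (messages, remainder).

    messages is a list of (type, body) for every complete message found;
    remainder is the trailing bytes of a message that is not complete yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER_LEN:
        marker = buffer[offset:offset + 16]
        length, msg_type = struct.unpack_from("!HB", buffer, offset + 16)
        if marker != MARKER or not HEADER_LEN <= length <= MAX_LEN:
            raise ValueError("bad header at offset %d" % offset)
        if len(buffer) - offset < length:
            break  # message not fully received yet, wait for more bytes
        messages.append((msg_type, buffer[offset + HEADER_LEN:offset + length]))
        offset += length
    return messages, buffer[offset:]
```

Note that this only isolates messages whose header is sane. If the length field itself is corrupt there is no way to find the next message boundary, which is one reason a resynchronisation point is needed at all.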
Still, it adds a lot of complexity to the conversation - are we not going to introduce bugs in that rarely used and rarely tested code path as well?
Unless you have a new "ACK" capability for each message - another idea, but that is clearly a discussion for outside NANOG.
Thomas