[193338] in North American Network Operators' Group
RE: Soliciting your opinions on Internet routing: A survey on BGP convergence
Wed Jan 11 14:13:24 2017
From: "Jakob Heitz (jheitz)" <jheitz@cisco.com>
To: "nanog@nanog.org" <nanog@nanog.org>, "baldur.norddahl@gmail.com"
<baldur.norddahl@gmail.com>
Date: Wed, 11 Jan 2017 19:13:14 +0000
When you simply bring down an eBGP session, withdrawals will propagate throughout the network.
Soon after, the alternate routes will propagate. In the interim, some routers will lose connectivity.
Graceful shutdown solves this problem.
However, it only works for planned shutdowns.
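The following is a minimal sketch, in Python, of the graceful shutdown idea. It assumes the well-known GRACEFUL_SHUTDOWN community 65535:0 from the gshut draft, and it assumes the receiving network lowers LOCAL_PREF to 0 for routes that carry it. It also models the AS-path prepend fallback for peers that ignore the community. The `Route` type, the function names and the default of three prepends are hypothetical illustrations, not any vendor's configuration.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

# Well-known GRACEFUL_SHUTDOWN community (65535:0), stored as (ASN, value).
GSHUT: Tuple[int, int] = (65535, 0)


@dataclass(frozen=True)
class Route:
    """Hypothetical, simplified view of a BGP route for one prefix."""
    prefix: str
    as_path: Tuple[int, ...]          # AS path as advertised to the peer
    communities: FrozenSet[Tuple[int, int]]
    local_pref: int = 100


def prepare_for_shutdown(route: Route, local_as: int,
                         peer_honours_gshut: bool, prepends: int = 3) -> Route:
    """Sender side: re-advertise a route tagged with GSHUT before a planned
    shutdown, and prepend our AS for peers that ignore the community."""
    tagged = replace(route, communities=route.communities | {GSHUT})
    if not peer_honours_gshut:
        # Fallback: make the path look longer so the peer prefers alternates.
        tagged = replace(tagged, as_path=(local_as,) * prepends + tagged.as_path)
    return tagged


def import_policy(route: Route) -> Route:
    """Receiver side: de-prefer any route that carries GSHUT."""
    if GSHUT in route.communities:
        return replace(route, local_pref=0)
    return route


if __name__ == "__main__":
    r = Route("203.0.113.0/24", (64500,), frozenset())
    print(import_policy(prepare_for_shutdown(r, 64500, peer_honours_gshut=True)))
```

The tagged routes lose best-path selection while the session is still up. Traffic therefore moves to the alternates before the withdrawals are sent, so the "withdraw, then wait for alternates" gap does not occur.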
The interim can last many minutes because of the advertisement-interval (MRAI) timer.
A possible solution to reduce this interim to seconds instead of minutes is to set the MRAI timer to 0 on all routers.
A potential problem with that is that any BGP instability in the network will cause some serious flapping.
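Here is a back-of-the-envelope sketch, in Python, of why the MRAI timer stretches the interim to minutes and what setting it to 0 gains. The model is an assumption made for illustration. The replacement route crosses a chain of `hops` routers, and each of them has just sent an update for the prefix, so it holds the new one for a full MRAI interval. `exploration_rounds` is a crude stand-in for path exploration. 30 s is the eBGP value suggested in RFC 4271. The hop count, processing delay and number of rounds are made-up inputs.

```python
def worst_case_convergence(hops: int, mrai_s: float, processing_s: float = 0.5,
                           exploration_rounds: int = 1) -> float:
    """Rough upper-bound estimate, in seconds, of the interim outage.

    Hypothetical model: every hop waits a full MRAI interval plus a fixed
    processing delay, and path exploration repeats this `exploration_rounds`
    times before the final route arrives.
    """
    if hops < 0 or mrai_s < 0 or processing_s < 0 or exploration_rounds < 1:
        raise ValueError("invalid parameters")
    per_round = hops * (mrai_s + processing_s)
    return exploration_rounds * per_round


if __name__ == "__main__":
    for mrai in (30.0, 0.0):
        t = worst_case_convergence(hops=5, mrai_s=mrai, exploration_rounds=3)
        print(f"MRAI={mrai:>4.0f}s -> about {t / 60:.1f} min")
```

With these illustrative inputs, the estimate drops from several minutes to a few seconds. The model ignores the downside raised above: with MRAI at 0, every intermediate state is sent immediately, which is where the extra churn and flapping come from.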
Another alternative is to use BGP add-path (RFC 7911) to distribute backup routes.
This avoids the MRAI problem, but it requires more memory on routers.
Unlike graceful shutdown, it also works for unplanned (accidental) shutdowns.
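The sketch below, in Python, shows why add-path helps with unplanned failures. A router that has already received a backup path, distinguished by its RFC 7911 path identifier, can switch to it as soon as the primary is withdrawn. A router that only ever heard the sender's best path has nothing until a replacement arrives, possibly after an MRAI wait. The `Path` and `PrefixTable` types, the two-step decision process and the addresses are hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Path:
    path_id: int      # RFC 7911 path identifier; only meaningful with add-path
    next_hop: str
    local_pref: int
    as_path_len: int


@dataclass
class PrefixTable:
    """Hypothetical Adj-RIB-In for ONE prefix from ONE BGP neighbor."""
    add_path: bool
    paths: Dict[int, Path] = field(default_factory=dict)

    def receive(self, path: Path) -> None:
        if not self.add_path:
            # Without add-path a session carries one path per prefix:
            # a new advertisement implicitly replaces the previous one.
            self.paths.clear()
        self.paths[path.path_id] = path

    def withdraw(self, path_id: int) -> None:
        self.paths.pop(path_id, None)

    def best(self) -> Optional[Path]:
        if not self.paths:
            return None
        # Simplified decision process: highest LOCAL_PREF, then shortest AS path.
        return max(self.paths.values(),
                   key=lambda p: (p.local_pref, -p.as_path_len))


def failover_demo() -> Tuple[Optional[Path], Optional[Path]]:
    primary = Path(1, "192.0.2.1", local_pref=200, as_path_len=2)
    backup = Path(2, "198.51.100.1", local_pref=100, as_path_len=3)

    plain = PrefixTable(add_path=False)
    plain.receive(primary)        # the sender advertises only its best path

    addp = PrefixTable(add_path=True)
    addp.receive(primary)
    addp.receive(backup)          # the backup path is distributed as well

    # The primary transit session fails and the sender withdraws that path.
    plain.withdraw(primary.path_id)
    addp.withdraw(primary.path_id)
    return plain.best(), addp.best()


if __name__ == "__main__":
    without, with_add_path = failover_demo()
    print("without add-path:", without)
    print("with add-path:   ", with_add_path)
```

The extra entries held in `paths` are the memory cost mentioned above.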
Thanks,
Jakob.
> -----Original Message-----
> From: Jakob Heitz (jheitz)
> Sent: Tuesday, January 10, 2017 11:52 AM
> To: nanog@nanog.org; 'baldur.norddahl@gmail.com' <baldur.norddahl@gmail.com>
> Subject: RE: Soliciting your opinions on Internet routing: A survey on BGP convergence
>
> Hi Baldur,
>
> Have you tried graceful shutdown?
> You need redundant links, but they don't have to go to the same transit.
> https://tools.ietf.org/html/draft-ietf-grow-bgp-gshut-06
> The draft has expired, but several vendors have actually implemented it.
>
> I implemented this.
> http://www.slideshare.net/bduvivie/bgp-graceful-shutdown-ios-xr
> I added an option to configure AS-path prepends in case the gshut community was not supported by peers.
>
> Thanks,
> Jakob.
>
>=20
> > Date: Tue, 10 Jan 2017 03:51:04 +0100
> > From: Baldur Norddahl <baldur.norddahl@gmail.com>
> >
> > Hello
> >
> > I find that the type of outage that affects our network the most is
> > neither of the two options you describe. As is probably typical for
> > smaller networks, we do not have redundant uplinks to all of our
> > transits. If a transit link goes down, for example because we had to reboot a
> > router, traffic is supposed to reroute to the remaining transit links.
> > Internally, our network handles this fairly fast for egress traffic.
> >
> > However, the problem is the ingress traffic: it can be 5 to 15 minutes
> > before everything has settled down. That is how long it takes everyone
> > else on the internet to process that they have to switch to your
> > alternate transit.
> >
> > The only solution I know of is to have redundant links to all transits.
> > Going forward, I will make sure we have this, because it is a huge
> > disadvantage not being able to take a router out of service without
> > causing downtime for all users. Not to mention a router crash or
> > link failure, which should take seconds at most to reroute, but
> > instead causes at least 5 minutes of unstable internet.
> >
> > Regards,
> >
> > Baldur