[173930] in North American Network Operators' Group


Re: Shaw routing issue 12 Aug 2014

daemon@ATHENA.MIT.EDU (Pete Lumbis)
Wed Aug 13 21:07:16 2014

X-Original-To: nanog@nanog.org
In-Reply-To: <m2a978t57u.fsf@localhost.localdomain>
Date: Wed, 13 Aug 2014 21:07:08 -0400
From: Pete Lumbis <alumbis@gmail.com>
To: Geoffrey Keating <geoffk@geoffk.org>
Cc: Leah Ungstad <leah.ungstad@gmail.com>, "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org

Yep. Most of the time I've seen this, it's two data centers that both go
into TCAM exception. You reboot DC1, and when it comes back up you reboot
DC2. That means DC1 has no iBGP-learned routes, so it is fine. DC2 is fine
too, until the iBGP peer comes back, and then it starts all over again.
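
As a rough illustration of that sequence, here is a toy model. It is only a
sketch under stated assumptions: the route counts are made up, "512k" is
assumed to mean 512 * 1024 entries, and each router is assumed to program
its peer's iBGP-learned routes as extra entries on top of its own. The names
(routes_to_program, in_tcam_exception, reboot_sequence) are hypothetical and
do not come from any vendor tooling.

```python
# Toy model of the reboot loop; all route counts are made-up assumptions.
TCAM_LIMIT = 512 * 1024  # assumed reading of "512k": 524,288 entries

EBGP_ROUTES = 300_000  # hypothetical routes each DC learns from its own upstreams
IBGP_ROUTES = 300_000  # hypothetical extra routes learned from the other DC


def routes_to_program(ibgp_up):
    """Entries a router tries to install, with or without its iBGP peer."""
    return EBGP_ROUTES + (IBGP_ROUTES if ibgp_up else 0)


def in_tcam_exception(ibgp_up, limit=TCAM_LIMIT):
    """True when the router wants more entries than the TCAM can hold."""
    return routes_to_program(ibgp_up) > limit


def reboot_sequence():
    """Return (step, dc1_exception, dc2_exception); None means the box is down."""
    return [
        ("both up, iBGP established", in_tcam_exception(True), in_tcam_exception(True)),
        ("DC1 back up, DC2 rebooting", in_tcam_exception(False), None),
        ("DC2 back up, iBGP still down", in_tcam_exception(False), in_tcam_exception(False)),
        ("iBGP re-established", in_tcam_exception(True), in_tcam_exception(True)),
    ]


if __name__ == "__main__":
    for step, dc1, dc2 in reboot_sequence():
        print(f"{step}: DC1 exception={dc1}, DC2 exception={dc2}")
```

With these numbers each box fits on its own and overflows only once the
iBGP session is back, which is why rebooting without changing the TCAM
allocation just moves the problem around.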


On Wed, Aug 13, 2014 at 6:06 PM, Geoffrey Keating <geoffk@geoffk.org> wrote:

> Pete Lumbis <alumbis@gmail.com> writes:
>
> > Maybe related to the 512k route issue?
> > http://www.bgpmon.net/what-caused-todays-internet-hiccup/
> >
> > I've seen people reboot to recover from TCAM exception without adjusting
> > TCAM size only to run into the issue all over again. It's a fun way to
> > watch the problems roll around the network.
>
> In this case, it would probably have "helped" in the same way as
> rebooting or waving a rubber chicken or whatever sometimes "helps": the
> route issue was caused initially by a problem at Verizon that
> caused them to deaggregate, which they fixed, so by the time someone had
> identified the problem, paged someone, gotten them to the data center,
> had a teleconference, rebooted the device, waited for it to come back
> up...  Verizon would have fixed it, so when it came back up it'd be
> back under 512k again.
>
