[138700] in North American Network Operators' Group
Re: Cisco IOS MPLS VPN Bug
daemon@ATHENA.MIT.EDU (Jason Lixfeld)
Sat Mar 12 11:35:36 2011
From: Jason Lixfeld <jason@lixfeld.ca>
In-Reply-To: <AANLkTikftk=TXc8vdTx9WqCToSQ7i=8DS1CF9BFe-WO3@mail.gmail.com>
Date: Sat, 12 Mar 2011 11:34:52 -0500
To: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
On 2011-03-12, at 2:31 AM, Joe Renwick wrote:
> These routers
> are configured as BGP route-reflectors.
...
> Neither
> soft nor hard clears on the BGP neighbors worked, only the config removal.
> Once re-applied life was good.
...
> The bug itself was with the BGP updates sent by the RR. During the outage
> these updates did not include the Route Target Extended Community required
> by the route-reflector clients which identifies which VRF the route belongs
> to.
...
> Notice the mysterious disappearance of the RT community.
...
> Looking to see if anyone has seen this issue particularly with this version
> of code. TAC is trying to tell me that this was a bug in a previous version
> but is fixed in the code I am running.
Interesting. I recently closed off a TAC case on a similar issue, but
not an identical one. In my case, it was 12.2(52)EY on an ME3600, and
in my particular topology an ME3600 wasn't announcing a plain ol' BGP
community to one of its two RRs. The extended communities were fine,
though. Also, the announcements were being placed into two different
update groups: the ME that was sending the 'good' announcement was
announcing updates to update-groups 1 and 2, and the ME that was
sending the 'bad' announcement was announcing updates to update-group
1 only.
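
As a rough illustration of the comparison described above (what one RR
received versus the other), here is a minimal Python sketch. Everything
in it is hypothetical: the function name, the prefix, and the community
values are placeholders chosen for illustration, not output from the
routers in question, and it assumes the per-prefix communities have
already been collected by some other means.

```python
def missing_communities(reference, candidate):
    """Compare two views of the same announcements.

    Both arguments map a prefix (str) to the set of community strings
    seen for it. Returns a dict of prefix -> communities that appear in
    `reference` but not in `candidate`. A prefix absent from `candidate`
    altogether is reported with the value None.
    """
    missing = {}
    for prefix, ref_comms in reference.items():
        cand_comms = candidate.get(prefix)
        if cand_comms is None:
            # Prefix was not seen at all in the candidate view.
            missing[prefix] = None
            continue
        diff = set(ref_comms) - set(cand_comms)
        if diff:
            missing[prefix] = diff
    return missing


# Hypothetical, hand-written sample data (documentation prefix and
# private ASN), standing in for what each RR received.
seen_by_rr1 = {"192.0.2.0/24": {"65000:100", "RT:65000:1"}}
seen_by_rr2 = {"192.0.2.0/24": {"RT:65000:1"}}

print(missing_communities(seen_by_rr1, seen_by_rr2))
# {'192.0.2.0/24': {'65000:100'}}
```

With the sample data, the standard community is flagged as missing on
the second RR while the RT-style extended community matches on both,
which mirrors the symptom described above.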
We didn't spend as much time as you clearly have troubleshooting the
issue because we caught it before it was customer-affecting. That said,
at the time I noticed the same thing: hard clearing the sessions didn't
fix it. I didn't try to unconfigure the neighbour, though. I was
running EY on this switch, the ME3600s are so new, EY1 was available,
and I knew that I'd have to reboot anyway to clear the issue, so I
decided to upgrade to EY1, and that seemed to clear up the problem.
I haven't seen this resurface since. EY1 was available as soon as we
started receiving our ME3600s, so as a policy we upgraded every one
before it went into the field, except that I had missed this one in
particular.
There were no open bugs pointing to my issue that the TAC engineer could
find, but if you could pass me the case number, I'd like to give it to
my engineer so he can see if your issue is somehow related to mine, just
manifested in a slightly different way.