[112345] in North American Network Operators' Group

Re: Illegal header length in BGP error

daemon@ATHENA.MIT.EDU (Paul Cosgrove)
Tue Feb 24 12:28:17 2009

Date: Tue, 24 Feb 2009 17:26:13 +0000
From: Paul Cosgrove <paul.cosgrove@heanet.ie>
To: "Mills, Charles" <cmills@accessdc.com>
In-Reply-To: <58E0B21FC367B24485855A1DBD96B0BB11C056DD@adc-prd-exch1.internal.accessdc.com>
Cc: nanog@nanog.org
Errors-To: nanog-bounces@nanog.org

Are you using PMTUD?

We saw this on a couple of our route reflectors and on one occasion 
caught it in a packet capture, so I can say that the issue is caused by 
bad packets actually being sent, rather than by an inaccurate error.  It 
can be reported differently depending on where the corruption occurs 
(e.g. unsupported message type, malformed update, etc.). 
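To illustrate why the same kind of corruption can surface as different errors, here is a minimal Python sketch of the message-header checks that RFC 4271 (section 6.1) asks a BGP speaker to apply to each received message. It is not how IOS implements them; the function and constant names are hypothetical, and only the four message types defined in RFC 4271 are recognised (ROUTE-REFRESH and other later types are deliberately left out). Under these rules a corrupted Length field produces Message Header Error subcode 2 (Bad Message Length) with the 2-byte Length field as its data, which is consistent with the "1/2 (illegal header length) 2 bytes" log in the original report, while a corrupted Type field produces subcode 3 instead. Because BGP uses the Length field to find the next message in the TCP byte stream, one bad length can also cause later bytes to be parsed as a bogus header.

```python
import struct

# Message Header Error code and subcodes (RFC 4271, section 4.5).
MESSAGE_HEADER_ERROR = 1
CONNECTION_NOT_SYNCHRONIZED = 1
BAD_MESSAGE_LENGTH = 2
BAD_MESSAGE_TYPE = 3

HEADER_LEN = 19     # 16-byte marker + 2-byte length + 1-byte type
MAX_MSG_LEN = 4096  # largest BGP message allowed by RFC 4271

# Minimum total length per type: OPEN, UPDATE, NOTIFICATION, KEEPALIVE.
# Other message types are intentionally not modelled in this sketch.
MIN_LEN = {1: 29, 2: 23, 3: 21, 4: 19}


def check_header(msg: bytes):
    """Return None if the header passes, else the (code, subcode, data)
    a receiver would put in its NOTIFICATION (hypothetical helper)."""
    if len(msg) < HEADER_LEN:
        raise ValueError("need at least the 19-byte header")
    marker = msg[:16]
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if marker != b"\xff" * 16:
        return (MESSAGE_HEADER_ERROR, CONNECTION_NOT_SYNCHRONIZED, b"")
    if length < HEADER_LEN or length > MAX_MSG_LEN:
        # Data carries the erroneous 2-byte Length field.
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_LENGTH, msg[16:18])
    if msg_type not in MIN_LEN:
        # Data carries the erroneous 1-byte Type field.
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_TYPE, msg[18:19])
    if length < MIN_LEN[msg_type] or (msg_type == 4 and length != HEADER_LEN):
        # Too short for its type, or a KEEPALIVE that is not exactly 19 bytes.
        return (MESSAGE_HEADER_ERROR, BAD_MESSAGE_LENGTH, msg[16:18])
    return None


keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
corrupt = keepalive[:16] + struct.pack("!H", 5000) + keepalive[18:]
print(check_header(keepalive))  # None
print(check_header(corrupt))    # (1, 2, b'\x13\x88')
```

Here a KEEPALIVE whose Length field has been corrupted to 5000 (0x1388) is rejected as code 1, subcode 2, with the two raw length bytes as data.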

Two production BGP sessions were affected at different times: one 
showed errors every few days, the other only every few weeks.  Both 
sessions ran from route reflectors to other routers receiving full 
tables, and both traversed multiple hops.  All other sessions on these 
routers were fine.  Whilst investigating, we found that the interfaces 
at each end of these sessions were using different MTUs.  The session 
on which we saw most errors also had lower MTUs on intervening links, 
so PMTUD was suspected to be a factor. 
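The MTU situation described above can be reasoned about with a simplified model, sketched here in Python under stated assumptions: IPv4 with no IP or TCP options (so MSS = MTU - 40), each router advertising an MSS derived from its own interface MTU, and made-up link MTUs. The helper names are hypothetical. TCP's MSS negotiation only sees the two endpoint MTUs, so when some link in between has a smaller MTU, full-sized segments sent with DF set only get through once PMTUD has lowered the effective segment size in response to ICMP "fragmentation needed" messages.

```python
def path_mtu(link_mtus):
    """Path MTU is the smallest MTU of any link on the path (first and last hop included)."""
    return min(link_mtus)


def ipv4_tcp_mss(mtu, ip_options=0, tcp_options=0):
    """Largest TCP payload per IPv4 packet: MTU minus 20-byte IP and 20-byte TCP headers."""
    return mtu - (20 + ip_options) - (20 + tcp_options)


def relies_on_pmtud(end_a_mtu, end_b_mtu, link_mtus):
    """True if the MSS negotiated from the endpoint MTUs is larger than the path allows."""
    # Each side advertises an MSS based on its own interface MTU, so the
    # negotiated ceiling is the smaller of the two endpoint values.
    negotiated_mss = min(ipv4_tcp_mss(end_a_mtu), ipv4_tcp_mss(end_b_mtu))
    # A smaller link in the middle is invisible to MSS negotiation;
    # only PMTUD can bring the segment size down to fit it.
    return negotiated_mss > ipv4_tcp_mss(path_mtu(link_mtus))


# Hypothetical MTUs, not taken from the thread.
print(relies_on_pmtud(4470, 4470, [4470, 1500, 4470]))  # True
print(relies_on_pmtud(1500, 1500, [1500]))              # False
```

In this model a direct link with identical MTUs at both ends (the second call) never depends on PMTUD, which matches the observation that replacing a multi-hop path with a direct link left PMTUD with nothing to do.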

I replaced one of the paths with a direct link using identical MTUs, 
and that stopped the errors on that session (since PMTUD no longer had 
anything to do).  Just to be sure, we recreated a multi-hop topology 
from our production route reflectors to isolated lab routers, with low 
MTUs on the intervening links and ACLs to keep out other unwanted 
traffic; this also produced the same error on those sessions (but only 
once each over three months). 

After we corrected all the MTUs in the production network, the errors 
ceased completely.  Our test routers shared these links but also used 
an additional low-MTU link, which we deliberately left unfixed; as it 
turned out, we did not see the error again there either, so the exact 
trigger was never entirely clear.

One other thing to note is that, at the time, we were seeing some other 
problems with these production routers, which Cisco believed might have 
been due to SNMP polling of BGP statistics.  If you have changed that 
recently, I would also consider it a possibility.

Paul.



Mills, Charles wrote:
> I ran into exactly the same thing during a code upgrade a few weeks ago.
>
> I wrote it off as a BGP bug and rolled back the code until a new release was out.  I was also running 12.4(22)T
> on an NPE-G2.
>
> Chuck
>
> -----Original Message-----
> From: Renaud RAKOTOMALALA [mailto:renaud@rakotomalala.com] 
> Sent: Tuesday, February 24, 2009 10:49 AM
> To: Matthew Huff; 'nanog@nanog.org'
> Subject: Re: Illegal header length in BGP error
>
> Hello Matthew,
>
> We swapped the motherboard in one of our Cisco 7206VXRs from an 
> NPE-G1 to an NPE-G2.
>
> Because the NPE-G2 is incompatible with IOS 12.3(4r)T3, we upgraded 
> this router to 12.4(12.2r)T. We ended up with the same problem as you 
> between one of our 7200s running 12.3 and the new one running 12.4.
>
> We solved the problem by upgrading the Cisco from IOS 12.4(12.2r) 
> to 12.4(4)XD10, and the BGP session came back up.
>
> So now everything works fine between our 7200 running IOS 12.3 and 
> the other 7200 running IOS 12.4(4)XD10.
>
> I hope this helps.
>
> Cheers,
> Renaud
>
>
> Matthew Huff wrote:
>   
>> One of our upstream providers flapped this morning, and since then they have
>> been sending corrupted BGP data. I'm running 12.4(22)T on Cisco 7200s. I'm
>> getting no BGP errors from that provider, and the number of routes and basic
>> sanity checks look okay. However, when the router tries to redistribute the
>> BGP routes via iBGP to our other border routers, we get:
>>
>> 003372: Feb 24 09:17:13.963 EST: %BGP-5-ADJCHANGE: neighbor x.x.x.x Down BGP Notification sent
>> 003373: Feb 24 09:17:13.963 EST: %BGP-3-NOTIFICATION: sent to neighbor x.x.x.x 1/2 (illegal header length) 2 bytes
>>
>>
>> All routers have identical hardware and IOS versions. My Google and Cisco
>> search-fu leads me to the AS path length bug, but interestingly, since we have
>> "bgp maxas-limit 75" configured and a recent IOS, we didn't have the problem
>> earlier when other people were reporting issues. I've also looked at the path
>> MTU issue, and although we hadn't had a problem with it before, I disabled BGP
>> path MTU discovery, but I still see the same issues.
>>
>> Is anyone else seeing something like this today, and/or does anyone have a
>> suggestion for finding out more specific information (which AS path, for
>> example, so I can filter it)?

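Matthew's configuration includes `bgp maxas-limit 75`, which tells IOS to discard routes whose AS path contains more than the given number of ASNs. As a rough illustration of that kind of filter (not IOS's actual code), here is a Python sketch that computes AS_PATH length over RFC 4271 path segments (type 1 = AS_SET, type 2 = AS_SEQUENCE). It assumes every ASN in an AS_SEQUENCE counts once, prepends included, and that an AS_SET counts as one, as in RFC 4271's route-selection path length; whether IOS counts sets the same way is an assumption, and the function names and ASNs are hypothetical.

```python
AS_SET = 1        # RFC 4271 path segment type codes
AS_SEQUENCE = 2


def as_path_length(segments):
    """segments: list of (segment_type, [asn, ...]) tuples in path order."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            length += len(asns)   # every ASN counts, including prepends
        elif seg_type == AS_SET:
            length += 1           # a whole set counts as a single hop
        else:
            raise ValueError(f"unknown segment type {seg_type}")
    return length


def exceeds_limit(segments, limit=75):
    """True if a 'drop paths longer than limit' rule would discard the route."""
    return as_path_length(segments) > limit


# A heavily prepended path, using documentation ASNs from RFC 5398.
prepended = [(AS_SEQUENCE, [64496] * 80 + [64497])]
print(as_path_length(prepended), exceeds_limit(prepended))  # 81 True
```

A filter like this only helps if the corrupted or oversized paths arrive as well-formed UPDATEs; it cannot catch a message whose header is already damaged on the wire.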

