[126877] in North American Network Operators' Group
Re: BGP convergence problem
daemon@ATHENA.MIT.EDU (Kevin Hodle)
Tue Jun 8 12:50:39 2010
In-Reply-To: <AANLkTikpfQ7jCXpqhOP3O8eFhMyMWvbx-O9zjnSf3BH7@mail.gmail.com>
Date: Tue, 8 Jun 2010 11:50:19 -0500
From: Kevin Hodle <kevin.hodle@gmail.com>
To: "Andy B." <globichen@gmail.com>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
Hi Andy,
We have had similar problems with the s720/3BXL on exchanges with large
numbers of peers. Exact same symptoms, and they can be triggered by any
significant UPDATE flux, even iBGP-originated path hunts. The problem
is compounded if you are taking full tables on the same device, to the
extent that the BGP Scanner and BGP I/O processes grind the
control plane to a halt, causing IS-IS/OSPF adjacencies to drop, SNMP
and SSH to become unresponsive, etc. The same behavior is seen
regardless of IOS train.
As others have pointed out, the sad fact of the matter is that the
s720/3BXL simply does not have the CPU power to cope with hundreds of
neighbor sessions and the growing numbers of paths. Here are some
things that we tried, with varying success, to remedy BGP deadlock on
this platform:
* Lower process-max-time to prevent the BGP Scanner/BGP I/O processes
from completely consuming the control plane
* Take soft-reconfiguration off of neighbors/peer-groups where you
can; this will help tremendously
* Split the load of neighbor sessions between multiple devices, and
move full table feeds to other devices
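
As a rough illustration of the first two items, a configuration sketch
might look like the following. The AS number 64512, the peer-group name
IX-PEERS, and the 50 ms value are placeholders chosen for this example,
not values from any production router; check which values your IOS
release accepts before applying anything like this.

```
! Hypothetical values throughout: AS 64512, peer-group IX-PEERS, 50 ms.
!
! Make processes yield the CPU sooner than the default, so BGP Scanner
! and BGP I/O cannot starve the IGP, SNMP and SSH.
process-max-time 50
!
router bgp 64512
 ! Stop storing a separate copy of every route received from these
 ! peers.
 no neighbor IX-PEERS soft-reconfiguration inbound
```

Without soft-reconfiguration inbound, re-applying inbound policy
without resetting a session depends on the peer supporting route
refresh. "show processes cpu sorted" is one way to check whether the
BGP processes still dominate the CPU after a change.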
The long-term fix is simply to replace this platform with a newer,
more powerful alternative, and there are numerous candidates :)
Best Regards,
Kevin Hodle
On Tue, Jun 8, 2010 at 4:58 AM, Andy B. <globichen@gmail.com> wrote:
> Hi,
>
> This morning there was an Ethernet loop problem on DECIX, causing many
> BGP sessions to flap throughout the entire platform.
> While this can happen, I am now facing BGP convergence
> problems on our DECIX router (SUP720-3BXL with IOS SXI3).
>
> The DECIX loop was resolved two hours ago, but my BGP sessions are
> still flapping and not converging at all. This has been flooding our
> logs, and is still going on:
>
> Jun  8 11:47:03 x.x.x.131 239447: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.32 Up
> Jun  8 11:47:03 x.x.x.131 239448: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.231 Up
> Jun  8 11:47:03 x.x.x.131 239449: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.109 Up
> Jun  8 11:47:03 x.x.x.131 239450: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.50 Up
> Jun  8 11:47:03 x.x.x.131 239451: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.81 Up
> Jun  8 11:47:03 x.x.x.131 239452: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.28 Up
> Jun  8 11:47:03 x.x.x.131 239453: Jun  8 11:48:38.364 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.212 Up
> Jun  8 11:47:03 x.x.x.131 239454: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.147 Up
> Jun  8 11:47:03 x.x.x.131 239455: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.74 Up
> Jun  8 11:47:03 x.x.x.131 239456: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.241 Up
> Jun  8 11:47:03 x.x.x.131 239457: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.5 Up
> Jun  8 11:47:03 x.x.x.131 239458: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.40 Up
> Jun  8 11:47:03 x.x.x.131 239459: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::1A44:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239460: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::8605:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239461: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::1A0B:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239462: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::3029:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239463: Jun  8 11:48:38.368 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::6E4:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239464: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::CB0:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239465: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::21C8:0:1 Up
> Jun  8 11:47:03 x.x.x.131 239466: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::8463:0:2 Up
> Jun  8 11:47:04 x.x.x.131 239467: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::31AA:0:1 Up
> Jun  8 11:47:04 x.x.x.131 239468: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.29 Up
> Jun  8 11:47:04 x.x.x.131 239469: Jun  8 11:48:38.372 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::62BF:0:1 Up
> Jun  8 11:47:04 x.x.x.131 239470: Jun  8 11:48:39.656 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.101 Down BGP Notification sent
> Jun  8 11:47:04 x.x.x.131 239471: Jun  8 11:48:39.656 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.101 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:47:07 x.x.x.131 239472: Jun  8 11:48:41.696 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.104 Up
> Jun  8 11:47:10 x.x.x.131 239473: Jun  8 11:48:44.488 CEST:
> %BGP-3-BGP_NO_REMOTE_READ: 80.81.193.187 connection timed out - has
> not accepted a message from us for 20000ms (hold time), 1 messages
> pending transmition.
> Jun  8 11:47:10 x.x.x.131 239474: Jun  8 11:48:44.488 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.187 Down BGP Notification sent
> Jun  8 11:47:10 x.x.x.131 239475: Jun  8 11:48:44.488 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.193.187 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:47:10 x.x.x.131 239476: Jun  8 11:48:44.900 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.61 Up
> Jun  8 11:47:10 x.x.x.131 239477: Jun  8 11:48:44.900 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.149 Up
> Jun  8 11:47:10 x.x.x.131 239478: Jun  8 11:48:44.900 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.136 Up
> Jun  8 11:47:10 x.x.x.131 239479: Jun  8 11:48:44.904 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::8463:0:1 Up
> Jun  8 11:47:10 x.x.x.131 239480: Jun  8 11:48:46.352 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::6268:0:1 Up
> Jun  8 11:47:14 x.x.x.131 239481: Jun  8 11:48:48.084 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.78 Up
> Jun  8 11:47:14 x.x.x.131 239482: Jun  8 11:48:49.172 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.239 Up
> Jun  8 11:47:14 x.x.x.131 239483: Jun  8 11:48:49.172 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.24 Up
> Jun  8 11:47:17 x.x.x.131 239484: Jun  8 11:48:52.160 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.45 Up
> Jun  8 11:47:17 x.x.x.131 239485: Jun  8 11:48:52.160 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.108 Up
> Jun  8 11:47:17 x.x.x.131 239486: Jun  8 11:48:52.160 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.164 Up
> Jun  8 11:47:17 x.x.x.131 239487: Jun  8 11:48:52.164 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.49 Up
> Jun  8 11:47:17 x.x.x.131 239488: Jun  8 11:48:52.164 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.139 Up
> Jun  8 11:47:17 x.x.x.131 239489: Jun  8 11:48:52.164 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::1536:0:1 Up
> Jun  8 11:47:17 x.x.x.131 239490: Jun  8 11:48:52.164 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::8601:0:1 Up
> Jun  8 11:47:17 x.x.x.131 239491: Jun  8 11:48:53.788 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.45 Up
> Jun  8 11:47:17 x.x.x.131 239492: Jun  8 11:48:53.788 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::A2DC:0:1 Up
> Jun  8 11:47:21 x.x.x.131 239493: Jun  8 11:48:55.056 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.91 Down BGP Notification sent
>
>
>
> Jun  8 11:49:04 x.x.x.131 239583: Jun  8 11:50:37.684 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.14 Down Peer closed the session
> Jun  8 11:49:04 x.x.x.131 239584: Jun  8 11:50:38.656 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.120 Down BGP Notification sent
> Jun  8 11:49:04 x.x.x.131 239585: Jun  8 11:50:38.656 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.120 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:04 x.x.x.131 239586: Jun  8 11:50:38.656 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.229 Down BGP Notification sent
> Jun  8 11:49:04 x.x.x.131 239587: Jun  8 11:50:38.656 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.229 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:04 x.x.x.131 239588: Jun  8 11:50:38.656 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.108 Down BGP Notification sent
> Jun  8 11:49:04 x.x.x.131 239589: Jun  8 11:50:38.656 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.108 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:07 x.x.x.131 239590: Jun  8 11:50:41.944 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.73 Down BGP Notification sent
> Jun  8 11:49:07 x.x.x.131 239591: Jun  8 11:50:41.944 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.73 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:07 x.x.x.131 239592: Jun  8 11:50:41.944 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::20AD:0:2 Down BGP Notification
> sent
> Jun  8 11:49:07 x.x.x.131 239593: Jun  8 11:50:41.944 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::20AD:0:2 4/0 (hold
> time expired) 0 bytes
> Jun  8 11:49:07 x.x.x.131 239594: Jun  8 11:50:41.944 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.115 Down BGP Notification sent
> Jun  8 11:49:07 x.x.x.131 239595: Jun  8 11:50:41.944 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.193.115 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:07 x.x.x.131 239596: Jun  8 11:50:44.124 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.3 Down BGP Notification sent
> Jun  8 11:49:11 x.x.x.131 239597: Jun  8 11:50:44.124 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.3 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:11 x.x.x.131 239598: Jun  8 11:50:45.200 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.215 Down BGP Notification sent
> Jun  8 11:49:11 x.x.x.131 239599: Jun  8 11:50:45.200 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.215 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:11 x.x.x.131 239600: Jun  8 11:50:47.336 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.141 Down BGP Notification sent
> Jun  8 11:49:11 x.x.x.131 239601: Jun  8 11:50:47.336 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.141 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:14 x.x.x.131 239602: Jun  8 11:50:48.432 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::3B41:0:1 Down BGP Notification
> sent
> Jun  8 11:49:14 x.x.x.131 239603: Jun  8 11:50:48.432 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::3B41:0:1 4/0 (hold
> time expired) 0 bytes
> Jun  8 11:49:14 x.x.x.131 239604: Jun  8 11:50:49.720 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.239 Down BGP Notification sent
> Jun  8 11:49:14 x.x.x.131 239605: Jun  8 11:50:49.720 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.239 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:17 x.x.x.131 239606: Jun  8 11:50:50.976 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:2000:3080:B4::1 Down Peer closed the
> session
> Jun  8 11:49:17 x.x.x.131 239607: Jun  8 11:50:52.976 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.20 Down BGP Notification sent
> Jun  8 11:49:17 x.x.x.131 239608: Jun  8 11:50:52.976 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.20 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:17 x.x.x.131 239609: Jun  8 11:50:54.044 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.193.21 Down BGP Notification sent
> Jun  8 11:49:17 x.x.x.131 239610: Jun  8 11:50:54.044 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.193.21 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:20 x.x.x.131 239611: Jun  8 11:50:56.204 CEST:
> %BGP-5-ADJCHANGE: neighbor 2001:7F8::1A0B:0:1 Down BGP Notification
> sent
> Jun  8 11:49:20 x.x.x.131 239612: Jun  8 11:50:56.204 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 2001:7F8::1A0B:0:1 4/0 (hold
> time expired) 0 bytes
> Jun  8 11:49:23 x.x.x.131 239613: Jun  8 11:50:58.400 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.63 Down BGP Notification sent
> Jun  8 11:49:23 x.x.x.131 239614: Jun  8 11:50:58.400 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.63 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:23 x.x.x.131 239615: Jun  8 11:50:59.448 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.97 Down BGP Notification sent
> Jun  8 11:49:23 x.x.x.131 239616: Jun  8 11:50:59.448 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.97 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:27 x.x.x.131 239617: Jun  8 11:51:01.664 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.131 Down BGP Notification sent
> Jun  8 11:49:27 x.x.x.131 239618: Jun  8 11:51:01.664 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.131 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:27 x.x.x.131 239619: Jun  8 11:51:03.872 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.83 Down BGP Notification sent
> Jun  8 11:49:27 x.x.x.131 239620: Jun  8 11:51:03.872 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.83 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:27 x.x.x.131 239621: Jun  8 11:51:03.872 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.192.156 Down BGP Notification sent
> Jun  8 11:49:30 x.x.x.131 239622: Jun  8 11:51:03.872 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.192.156 4/0 (hold time
> expired) 0 bytes
> Jun  8 11:49:30 x.x.x.131 239623: Jun  8 11:51:06.056 CEST:
> %BGP-5-ADJCHANGE: neighbor 80.81.194.50 Down BGP Notification sent
> Jun  8 11:49:30 x.x.x.131 239624: Jun  8 11:51:06.056 CEST:
> %BGP-3-NOTIFICATION: sent to neighbor 80.81.194.50 4/0 (hold time
> expired) 0 bytes
>
> CPU load is constantly at 100% doing BGP and more BGP.
>
> We have around 200 BGP sessions on DECIX and I would not want to shut
> them down and bring them up individually.
>
> How can I get out of this deadlock?
>
>
> Andy
>
>
-- 
================================================================
:: :: Kevin Hodle | http://www.linkedin.com/in/kevinhodle
:: :: PGP Key ID | fingerprint
:: :: 0x803F24BE | 1094 FB06 837F 2FAB C86B E4BE 4680 3679 803F 24BE
"Elegance is not a dispensable luxury but a factor that decides
between success and failure."
-Edsger Dijkstra
================================================================