[186012] in North American Network Operators' Group
RE: route converge time
daemon@ATHENA.MIT.EDU (Spyros Kakaroukas)
Sat Nov 21 22:39:48 2015
X-Original-To: nanog@nanog.org
From: Spyros Kakaroukas <s.kakaroukas@connecticore.com>
To: 'Baldur Norddahl' <baldur.norddahl@gmail.com>, "nanog@nanog.org"
<nanog@nanog.org>
Date: Sat, 21 Nov 2015 15:14:25 +0000
In-Reply-To: <CAPkb-7CH7iR5ua=AkaULyeJLv_1R_DOwuP5a1UWnnApcd4G_OQ@mail.gmail.com>
Errors-To: nanog-bounces@nanog.org
Hey,
This is a complex problem and there are quite a few parts to consider.
Let's assume you want to optimize how fast you choose the right best exit
after a failure. The opposite ( how fast the internet chooses the best entry
point into your network after a failure ) is usually not that easy to
influence.
The first component of our total convergence time is how fast you can actually
detect the failure. If your bgp speaker is directly connected to the transit's
bgp speaker with no boxes in between, then you can detect the failure about as
fast as your end detects that the link is down, which is usually
pretty fast ( you could tune the carrier-delay if you want to ). If there are
any other boxes in between, you can't rely on that. The best solution in that
case, imho, would be to use bfd. If you can't do that, you may want to try
tuning the bgp keepalive/hold timers. Keep in mind that running aggressive
timers will consume cpu resources on both your and the provider's end.
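
To put rough numbers on the detection step: bfd declares a neighbor down after
about (interval x multiplier), while a bgp session relying only on its own
timers is not torn down until the hold time expires. The sketch below just does
that arithmetic. The timer values are illustrative assumptions, not
recommendations, and the class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdTimers:
    interval_ms: int  # negotiated packet interval (assumed value)
    multiplier: int   # missed packets tolerated before declaring the peer down

    def detection_ms(self) -> int:
        # Detection time is approximately interval * multiplier.
        return self.interval_ms * self.multiplier


@dataclass(frozen=True)
class BgpTimers:
    keepalive_s: int
    hold_s: int

    def worst_case_detection_ms(self) -> int:
        # The session drops once nothing has been heard for hold_s seconds.
        return self.hold_s * 1000


def packets_per_second(interval_ms: int) -> float:
    # Control-plane load grows as the interval shrinks.
    return 1000 / interval_ms


if __name__ == "__main__":
    bfd = BfdTimers(interval_ms=300, multiplier=3)
    relaxed = BgpTimers(keepalive_s=60, hold_s=180)
    aggressive = BgpTimers(keepalive_s=3, hold_s=9)
    print("bfd detection (ms):", bfd.detection_ms())
    print("bgp 60/180 worst case (ms):", relaxed.worst_case_detection_ms())
    print("bgp 3/9 worst case (ms):", aggressive.worst_case_detection_ms())
    print("bfd packets/s per session:", packets_per_second(bfd.interval_ms))
```

Even aggressive bgp timers stay in the range of seconds, which is why bfd is
usually the better tool when link-down detection is not available.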
The second component would be how much time it takes bgp to find the alternate
routes. As you're using l3vpn, there's an easy trick to apply here. You can
just set up a different rd on each router and both routers will end up with
routes from both providers in their bgp table. That will obviously consume
hardware resources ( usually ram, as not every route will make it to the fib
just yet ) so make sure your routers can handle it.
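
A toy model of why the rd matters: a vpnv4 route is identified by the pair
(rd, prefix), so any speaker that keeps or propagates only one best path per
route (a route reflector is the classic case) hides the second provider's path
when both routers use the same rd. The sketch assumes best path is decided by
local preference alone, which is a deliberate simplification, and the rd values
and router names are hypothetical.

```python
from typing import NamedTuple


class VpnRoute(NamedTuple):
    rd: str          # route distinguisher set on the advertising router
    prefix: str      # ipv4 prefix inside the vrf
    next_hop: str    # advertising router (hypothetical name)
    local_pref: int


def best_per_nlri(routes):
    """Keep one best path per vpnv4 route, i.e. per (rd, prefix) pair."""
    best = {}
    for route in routes:
        key = (route.rd, route.prefix)
        if key not in best or route.local_pref > best[key].local_pref:
            best[key] = route
    return best


def paths_for_prefix(routes, prefix):
    """Next hops still visible for one prefix after per-route selection."""
    survivors = best_per_nlri(routes).values()
    return sorted(r.next_hop for r in survivors if r.prefix == prefix)


if __name__ == "__main__":
    same_rd = [
        VpnRoute("65000:1", "192.0.2.0/24", "router-a", 200),
        VpnRoute("65000:1", "192.0.2.0/24", "router-b", 100),
    ]
    unique_rd = [
        VpnRoute("65000:1", "192.0.2.0/24", "router-a", 200),
        VpnRoute("65000:2", "192.0.2.0/24", "router-b", 100),
    ]
    print("same rd:  ", paths_for_prefix(same_rd, "192.0.2.0/24"))
    print("unique rd:", paths_for_prefix(unique_rd, "192.0.2.0/24"))
```

With unique rds both paths survive, at the cost of carrying roughly twice as
many vpnv4 routes, which is where the extra ram goes.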
The third component would be how much time it takes you to update the fib
itself. This is usually fast for a single route, but not as fast as you might
think for ~550k routes. What you can do to speed this up depends somewhat on
your hardware. Most big vendors do support some flavor of a hierarchical fib
( cisco calls theirs pic core ). Keep in mind that this will also eat up
hardware resources depending on the implementation itself. Make sure you read
up before you try anything as it could end up doubling your fib requirements,
which aren't light to begin with for full tables.
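
The difference between a flat and a hierarchical fib can be sketched as a data
structure question. In the flat case every prefix holds its own next hop, so a
failover rewrites every entry. In the hierarchical case prefixes point at a
shared object that already holds a backup, so a failover is a single write.
This is a toy model and not any vendor's actual implementation. The class
names are invented, and real tables will not have every prefix sharing one
primary/backup pair.

```python
class FlatFib:
    """Each prefix stores its own next hop."""

    def __init__(self, prefixes, primary, backup):
        self.entries = {p: primary for p in prefixes}
        self.backup = backup

    def fail_over(self) -> int:
        writes = 0
        for prefix in self.entries:
            self.entries[prefix] = self.backup  # one write per prefix
            writes += 1
        return writes

    def lookup(self, prefix):
        return self.entries[prefix]


class PathList:
    """Shared object holding the active next hop and a pre-installed backup."""

    def __init__(self, primary, backup):
        self.active = primary
        self.backup = backup


class HierarchicalFib:
    """Prefixes point at a shared PathList instead of a next hop."""

    def __init__(self, prefixes, primary, backup):
        self.path_list = PathList(primary, backup)
        self.entries = {p: self.path_list for p in prefixes}

    def fail_over(self) -> int:
        self.path_list.active = self.path_list.backup  # single shared write
        return 1

    def lookup(self, prefix):
        return self.entries[prefix].active


if __name__ == "__main__":
    prefixes = [f"prefix-{i}" for i in range(550_000)]  # synthetic labels
    flat = FlatFib(prefixes, "provider-a", "router-b")
    hier = HierarchicalFib(prefixes, "provider-a", "router-b")
    print("flat writes:        ", flat.fail_over())
    print("hierarchical writes:", hier.fail_over())
```

The pre-installed backup is also where the extra hardware cost comes from: the
box has to hold the alternate path before it is ever needed.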
Last but not least, keep scalability in mind when reading the last 2 paragraphs.
On newer boxes, tuning for fast convergence may be more than fine for 2
providers but practically impossible for, say, 6 or 8 of them.
As for the scenarios of local failure, first of all, really try to make sure
that the ibgp session between them ( or towards their RRs/etc ) is as robust
as it gets. Assuming that's taken care of, convergence should be about as much
time as it takes your igp to figure it out. Bfd and usual igp timer/feature
adjustments do apply. Next-hop tracking and fast peering detection ( assuming
cisco ) are also nice, though if you have defaults in your network, you might
want to exclude them from being used for either.
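
Why defaults get in the way of next-hop tracking: if the specific route to a
bgp next hop disappears but a default route still covers the address, a plain
longest-match lookup still "resolves" the next hop, and nothing is invalidated.
The sketch below shows that with a hypothetical min_prefix_len knob standing in
for whatever filter your platform offers. The addresses are made up.

```python
import ipaddress


def resolving_route(next_hop, rib, min_prefix_len=0):
    """Longest-prefix match for next_hop, ignoring routes shorter than
    min_prefix_len. Returns None if nothing acceptable covers the address."""
    addr = ipaddress.ip_address(next_hop)
    candidates = [
        net
        for net in (ipaddress.ip_network(p) for p in rib)
        if addr in net and net.prefixlen >= min_prefix_len
    ]
    return max(candidates, key=lambda net: net.prefixlen, default=None)


if __name__ == "__main__":
    next_hop = "10.0.0.2"  # loopback of the other router (assumed)
    healthy = ["0.0.0.0/0", "10.0.0.2/32"]
    after_failure = ["0.0.0.0/0"]  # igp has withdrawn the /32

    print("healthy:            ", resolving_route(next_hop, healthy))
    print("failed, defaults ok:", resolving_route(next_hop, after_failure))
    print("failed, /32 only:   ",
          resolving_route(next_hop, after_failure, min_prefix_len=32))
```

Only the last lookup reports the next hop as unreachable, which is the signal
that lets bgp react at igp speed instead of waiting for its own timers.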
My thoughts and words are my own.
Kind Regards,
Spyros
-----Original Message-----
From: NANOG [mailto:nanog-bounces@nanog.org] On Behalf Of Baldur Norddahl
Sent: Saturday, November 21, 2015 3:45 PM
To: nanog@nanog.org
Subject: route converge time
Hi
I got a network with two routers and two IP transit providers, each with the
full BGP table. Router A is connected to provider A and router B to provider
B. We use MPLS with a L3VPN with a VRF called "internet".
Everything happens inside that VRF.
Now if I interrupt one of the IP transit circuits, the routers will take
several minutes to remove the now bad routes and move everything to the
remaining transit provider. This is very noticeable to the customers. I am
looking into ways to improve that.
I added a default static route 0.0.0.0 to provider A on router A and did the
same to provider B on router B. This is supposed to be a trick that allows the
network to move packets before everything is fully converged.
Traffic might not leave the most optimal link, but it will be delivered.
Say I take down the provider A link on router A. As I understand it, the
hardware will notice this right away and stop using the routes to provider A.
Router A might know about the default route on router B and send the traffic
to router B. However this is not much help, because on router B there is no
link that is down, so the hardware is unaware until the BGP process is done
updating the hardware tables. Which apparently can take several minutes.
My routers also have multipath support, but I am unsure if that is going to be
of any help.
Anyone got any tricks or pointers to what can be done to minimize the downtime
in case of an IP transit link failure? Or the related case of one of my routers
going down or the link between them going down (the traffic would go a
non-direct way instead if the direct link is down).
Thanks,
Baldur