[168215] in North American Network Operators' Group
Re: best practice for advertising peering fabric routes
daemon@ATHENA.MIT.EDU (Clay Fiske)
Wed Jan 15 14:34:27 2014
From: Clay Fiske <clay@bloomcounty.org>
In-Reply-To: <CAP-guGU6bwRw975kimWy9aRmh5t6gLao-hjNX+UL+-kg2Z17Lw@mail.gmail.com>
Date: Wed, 15 Jan 2014 11:33:57 -0800
To: William Herrin <bill@herrin.us>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
On Jan 15, 2014, at 10:26 AM, William Herrin <bill@herrin.us> wrote:
>
> Of course working, monitorable and testable are three different
> things. If my NMS can't reach the IXP's addresses, my view of the IXP
> is impaired. And "the Internet is broken" is not a trouble report that
> leads to a successful outcome with customer support... it helps to be
> able to pin things down with some specificity.
This approach concerns me for a number of reasons.
First, having your NMS ping your upstream's IXP peers probably doesn't
scale. If I'm a peer of a reasonably large provider, I'm pretty sure I
don't want all their customers hammering my management plane. Even if
you're the only one doing it, you also don't know if I'm rate-limiting
pings for that or any other reason.
Second, what information do you get that you didn't already have? If
you saw the IP in a traceroute then you know it exists, is alive, and is
in the path, and you have a rough estimate of the latency. Pinging it
may even give you misleading information. Platforms vary and all, but in
my experience pinging a router, especially a potentially busy one
peering at an IXP, shows notably worse performance than "real" traffic
experiences (admittedly somewhat true of TTL Expired responses, but less
so in my experience). Now you're potentially seeing high latency and
packet loss which in reality might not even be there at all.
Third, you don't know that your ping to the peering IP is even taking
the same path as the packets addressed to the real destination. MTR for
example looks nice, but it would probably be more accurate if it simply
ran the traceroute over and over instead of pinging each hop directly.
You would also detect path changes for the real destination that pinging
intermediate hops wouldn't show you.
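To make the "run the traceroute over and over" idea concrete, here is a
minimal Python sketch. It assumes the hop lists have already been
collected by repeated traceroutes toward the real destination; the
function name path_changes and the sample addresses (documentation
ranges) are hypothetical, and nothing here actually sends packets.

```python
def path_changes(runs):
    """Compare successive traceroute results and report hops that changed.

    `runs` is a list of hop lists, one per traceroute run, each hop being
    the responding IP as a string (or None for no response).
    Returns a list of (run_index, hop_number, old_ip, new_ip) tuples.
    """
    changes = []
    for i in range(1, len(runs)):
        prev, cur = runs[i - 1], runs[i]
        # Walk the longer of the two paths so added/removed hops show up too.
        for hop in range(max(len(prev), len(cur))):
            old = prev[hop] if hop < len(prev) else None
            new = cur[hop] if hop < len(cur) else None
            if old != new:
                changes.append((i, hop + 1, old, new))
    return changes


# Hypothetical sample data: the second hop changes on the third run.
runs = [
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"],
    ["192.0.2.1", "198.51.100.77", "203.0.113.9"],
]
print(path_changes(runs))
```

Pinging 198.51.100.1 directly would keep succeeding in this example even
though traffic to the real destination no longer goes through it.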
While I appreciate the desire to be able to do as much of your own
detective work as possible, I can also see where you're now shifting
workload onto someone else's support organization when they're not
necessarily the problem either ("Hey, my NMS says your peering router
is causing latency and packet loss, fix it!").
I'm also not saying there isn't a troubleshooting gap caused by
this. I'm just not sure being able to ping the IXP hop solves that
problem either.
Semi-related tangent: Working in an IXP setting I have seen weird corner
cases cause issues in conjunction with the IXP subnet existing in BGP.
Say someone's got proxy ARP enabled on their router (sadly, more
common than it should be, and not just from noobs at startups). Now say
your IXP is growing and you expand the subnet. No matter how much you
harp on the customers to make the change, they don't all do it at
once. Someone announces the new, larger subnet in BGP. Now when anyone
ARPs for IPs in the new part of the range, proxy ARP guy (still on the
smaller subnet) says "hey I have a route for that, send it here".
That was fun to troubleshoot. :)
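For anyone who wants to see the failure mode spelled out, here is a
deliberately simplified Python model of it. The prefixes are made-up
documentation ranges, would_proxy_arp is a hypothetical helper, and real
proxy ARP implementations apply extra checks (for example, which
interface the route points out of) that this sketch ignores.

```python
import ipaddress


def would_proxy_arp(target, connected, routes):
    """Simplified proxy ARP decision for one router.

    Answers True if `target` is outside the subnet configured on the
    router's IXP interface (`connected`) but covered by some route the
    router holds (`routes`).
    """
    addr = ipaddress.ip_address(target)
    if addr in connected:
        # On-link from this router's point of view: the real owner answers.
        return False
    return any(addr in route for route in routes)


# Assumed example values, not taken from any real IXP.
old_subnet = ipaddress.ip_network("192.0.2.0/25")  # stale interface config
new_subnet = ipaddress.ip_network("192.0.2.0/24")  # expanded LAN, seen in BGP
target = "192.0.2.200"                             # lives in the new half

# Router still on the old mask, with the /24 learned via BGP: it answers.
print(would_proxy_arp(target, old_subnet, [new_subnet]))
# Router that has updated its mask: the target is on-link, so it stays quiet.
print(would_proxy_arp(target, new_subnet, [new_subnet]))
# Stale router, but the IXP subnet is not in BGP: no route, no proxy reply.
print(would_proxy_arp(target, old_subnet, []))
```

The third case is the point of the tangent: without the IXP prefix in
BGP, the stale router has nothing to proxy for.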
-c