[164552] in North American Network Operators' Group


RE: tools and techniques to pinpoint and respond to loss on a path

daemon@ATHENA.MIT.EDU (James Sink)
Tue Jul 16 11:59:01 2013

From: James Sink <james.sink@freedomvoice.com>
To: "nanog@nanog.org" <nanog@nanog.org>
Date: Tue, 16 Jul 2013 15:52:54 +0000
In-Reply-To: <9F4D4FC766780045A8E7ECEA533A1A8D0367BBC8@CORPTPMAIL03.corp.theplatform.com>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Have you looked into Cisco's OER?
-James

-----Original Message-----
From: Andy Litzinger [mailto:Andy.Litzinger@theplatform.com]
Sent: Monday, July 15, 2013 2:19 PM
To: nanog@nanog.org
Subject: tools and techniques to pinpoint and respond to loss on a path

Hi,

Does anyone have any recommendations on how to pinpoint and react to packet loss across the internet?  Preferably in an automated fashion.  For detection I'm currently looking at trying smoketrace to run from inside my network, but I'd love to be able to run traceroutes from my edge routers, triggered during periods of loss.  I have Juniper MX80s on one end, where I'm hopeful I'll be able to cobble together some combo of RPM and event scripting to kick off a traceroute.  We have Cisco 4900Ms on the other end and maybe the same thing is possible, but I'm not so sure.

I'd love to hear other suggestions and experience for detection and also for options on what I might be able to do when loss is detected on a path.

In my specific situation I control equipment on both ends of the path that I care about with details below.

We are a hosted service company and we currently have two data centers, DC A and DC B.  DC A uses Juniper MX routers, advertises our own IP space and takes full BGP feeds from two providers, ISPs A1 and A2.  At DC B we have a smaller installation and instead take redundant drops (and IP space) from a single provider, ISP B1, who then peers upstream with two providers, B2 and B3.

We have a fairly consistent bi-directional stream of traffic between DC A and DC B.  Both ISP A1 and A2 have good peering with ISP B2, so under normal network conditions traffic flows across ISP B1 to B2 and then to either ISP A1 or A2.

oversimplified ascii pic showing only the normal best paths:

      -- ISP A1----------------------ISP B2--
DC A--|                                     |--- ISP B1 ----- DC B
      -- ISP A2----------------------ISP B2--


With increasing frequency we've been experiencing packet loss along the path from DC A to DC B.  Usually the periods of loss are brief, 30 seconds to a minute, but they are total blackouts.

I'd like to be able to collect enough relevant data to pinpoint the trouble spot as much as possible so I can take it to the ISPs and request a solution.  The blackouts are so quick that it's impossible to log in and get a trace, hence the desire to automate it.
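
One way the automation described above could look when run from a host inside the network is sketched below. This is a minimal illustration only, not a tested production tool and not something proposed in the thread: it assumes a Linux host with the iputils `ping` and a standard `traceroute` binary on the PATH, and the names `LossWatcher`, `ping_once`, `run_traceroute` and `watch`, as well as the three-consecutive-losses threshold, are hypothetical choices made for the example.

```python
import subprocess
import time
from datetime import datetime, timezone


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo is answered.

    Assumes Linux iputils ping, where -W is a timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_traceroute(host):
    """Return traceroute output as text.

    -n skips reverse DNS, -w 1 and -q 1 keep each hop to one short probe,
    so the trace has a chance of finishing inside a brief blackout.
    """
    result = subprocess.run(
        ["traceroute", "-n", "-w", "1", "-q", "1", host],
        capture_output=True,
        text=True,
    )
    return result.stdout


class LossWatcher:
    """Fire one traceroute per outage, after `threshold` consecutive losses."""

    def __init__(self, host, threshold=3, probe=ping_once, trace=run_traceroute):
        self.host = host
        self.threshold = threshold
        self.probe = probe    # injectable so the logic can be tested offline
        self.trace = trace
        self.consecutive_losses = 0
        self.captures = []    # list of (UTC timestamp, traceroute text)

    def step(self):
        """Send one probe; return a capture tuple if a trace was triggered."""
        if self.probe(self.host):
            self.consecutive_losses = 0
            return None
        self.consecutive_losses += 1
        # Trigger exactly once per outage, when the threshold is first reached.
        if self.consecutive_losses == self.threshold:
            stamp = datetime.now(timezone.utc).isoformat()
            capture = (stamp, self.trace(self.host))
            self.captures.append(capture)
            return capture
        return None


def watch(host, interval_s=1.0):
    """Probe forever, printing a timestamped trace whenever loss is detected."""
    watcher = LossWatcher(host)
    while True:
        capture = watcher.step()
        if capture:
            print(f"--- loss to {host} at {capture[0]} ---\n{capture[1]}", flush=True)
        time.sleep(interval_s)
```

Calling `watch("192.0.2.10")` (a documentation address standing in for a host at the far data center) would probe once per second and print a timestamped trace a few seconds into each blackout. A trace from one end only shows the forward path, so running the same thing at both DCs would be needed to see both directions.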

I can provide more details off list if helpful- I'm trying not to vilify anyone- especially without copious amounts of data points.

As a side question, what should my expectation be regarding packet loss when sending packets from point A to point B across multiple providers across the internet?  Is 30 seconds to a minute of blackout between two destinations every couple of weeks par for the course?  My directly connected ISPs offer me an SLA, but what should I reasonably expect from them when one of their upstream peers (or a peer of their peers) has issues?  If this turns out to be BGP reconvergence or similar do I have any options?

many thanks,
-andy


