[164558] in North American Network Operators' Group


RE: tools and techniques to pinpoint and respond to loss on a path

daemon@ATHENA.MIT.EDU (Andy Litzinger)
Tue Jul 16 14:13:38 2013

X-Barracuda-Envelope-From: Andy.Litzinger@theplatform.com
From: Andy Litzinger <Andy.Litzinger@theplatform.com>
To: Blake Dunlap <ikiris@gmail.com>
Date: Tue, 16 Jul 2013 18:12:32 +0000
In-Reply-To: <9F4D4FC766780045A8E7ECEA533A1A8D03682184@CORPTPMAIL03.corp.theplatform.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

> From: Blake Dunlap [mailto:ikiris@gmail.com]
> While any provider will attempt to fix peer / upstream issues as they can, any
> SLA you would have is between two points on their private network, not
> from point A to point Z that they have no control over across multiple peers
> and the public internet itself.

makes sense- thanks for confirming

> The much more common design is using a single
> provider for each thread between sites. Then at least you have an
> end-to-end SLA in effect, as well as a single entity that is responsible for
> the entire link in question.
>
> This sounds like you're trying to achieve private link IGP / FRR level site to site
> failover/convergence across the public internet. Perhaps you should rethink
> your goals here or your design?

Kind of- I can actually tolerate the blips, but I want to be able to measure
and track them in such a way that I know where the loss is occurring. If a
particular path is reconverging more often than should be reasonably expected
I want to be able to prove it within reason.
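
One way to get a trace during a blip that is too short to log in for is to let a probe loop trigger it. What follows is a minimal host-based sketch in Python, not the Junos RPM / event-script approach discussed in this thread. It assumes a Linux host with iputils `ping` and the standard `traceroute` installed. The names `ping_once`, `run_traceroute`, and `watch`, and the three-miss threshold, are illustrative choices rather than anything from the original message.

```python
import subprocess
import time
from typing import Callable, Iterator, Optional, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo; True if a reply arrived (assumes Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-n", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_traceroute(host: str) -> str:
    """Capture a numeric traceroute (assumes the Linux traceroute flags -n, -w, -q)."""
    result = subprocess.run(
        ["traceroute", "-n", "-w", "1", "-q", "1", host],
        capture_output=True,
        text=True,
    )
    return result.stdout


def watch(
    host: str,
    probe: Callable[[str], bool] = ping_once,
    trace: Callable[[str], str] = run_traceroute,
    threshold: int = 3,                # consecutive misses that count as a blip (assumed value)
    interval_s: float = 1.0,
    max_probes: Optional[int] = None,  # None means keep probing until interrupted
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[float, str]]:
    """Yield (timestamp, traceroute output) once per outage, when the threshold is first hit."""
    misses = 0
    sent = 0
    while max_probes is None or sent < max_probes:
        sent += 1
        if probe(host):
            misses = 0  # path is back; the next run of misses is a new outage
        else:
            misses += 1
            if misses == threshold:
                yield clock(), trace(host)
        sleep(interval_s)
```

Iterating with `for ts, text in watch("192.0.2.10"): ...` (a documentation address standing in for the far-end host) would hand back one timestamped trace per outage. Running the same loop at both ends gives a trace in each direction, which matters because the forward and return paths can differ.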

We also have a customer who happens to host at DC B with the same connectivity.
Every time there is one of these blips their alerting fires off a thousand messages
and they open a ticket with us. I'd like to be able to show them some good data
on the path during the blip so we can back a discussion along the lines
of "live with it, or pay to privately connect to us".
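
As a rough illustration of what "good data on the path during the blip" could look like, the sketch below collapses a log of probe results into outage windows (start, last lost probe, number of probes lost). The input format, a sequence of `(timestamp, ok)` pairs, and the `min_losses` cutoff are assumptions made for this example, not something described in the thread.

```python
from typing import Iterable, List, Tuple


def outage_windows(
    samples: Iterable[Tuple[float, bool]],
    min_losses: int = 3,  # ignore isolated drops shorter than this (assumed cutoff)
) -> List[Tuple[float, float, int]]:
    """Return (first_lost_ts, last_lost_ts, probes_lost) for each run of lost probes."""
    windows = []
    start = None      # timestamp of the first lost probe in the current run
    last_loss = None  # timestamp of the most recent lost probe
    count = 0
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
                count = 0
            count += 1
            last_loss = ts
        else:
            if start is not None and count >= min_losses:
                windows.append((start, last_loss, count))
            start = None
    # A run of losses that is still open when the log ends also counts.
    if start is not None and count >= min_losses:
        windows.append((start, last_loss, count))
    return windows
```

With one probe per second, a window of 30 to 60 lost probes would line up with the blackouts described here, and the timestamps can be matched against the customer's alert times.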

-andy

> -Blake
>
> On Mon, Jul 15, 2013 at 4:18 PM, Andy Litzinger
> <Andy.Litzinger@theplatform.com> wrote:
> Hi,
>
> Does anyone have any recommendations on how to pinpoint and react to
> packet loss across the internet? Preferably in an automated fashion. For
> detection I'm currently looking at trying smoketrace to run from inside my
> network, but I'd love to be able to run traceroutes from my edge routers
> triggered during periods of loss. I have Juniper MX80s on one end- which I'm
> hopeful I'll be able to cobble together some combo of RPM and event
> scripting to kick off a traceroute. We have Cisco 4900Ms on the other end and
> maybe the same thing is possible but I'm not so sure.
>
> I'd love to hear other suggestions and experience for detection and also for
> options on what I might be able to do when loss is detected on a path.
>
> In my specific situation I control equipment on both ends of the path that I
> care about with details below.
>
> we are a hosted service company and we currently have two data centers,
> DC A and DC B. DC A uses juniper MX routers, advertises our own IP space
> and takes full BGP feeds from two providers, ISPs A1 and A2. At DC B we
> have a smaller installation and instead take redundant drops (and IP space)
> from a single provider, ISP B1, who then peers upstream with two providers,
> B2 and B3
>
> We have a fairly consistent bi-directional stream of traffic between DC A and
> DC B. Both of ISP A1 and A2 have good peering with ISP B2 so under normal
> network conditions traffic flows across ISP B1 to B2 and then to either ISP A1
> or A2
>
> oversimplified ascii pic showing only the normal best paths:
>
>         -- ISP A1----------------------ISP B2--
> DC A --|                                       |--- ISP B1 ----- DC B
>         -- ISP A2----------------------ISP B2--
>
> with increasing frequency we've been experiencing packet loss along the
> path from DC A to DC B. Usually the periods of loss are brief, 30 seconds to a
> minute, but they are total blackouts.
>
> I'd like to be able to collect enough relevant data to pinpoint the trouble
> spot as much as possible so I can take it to the ISPs and request a
> solution. The blackouts are so quick that it's impossible to log in and get a
> trace- hence the desire to automate it.
>
> I can provide more details off list if helpful- I'm trying not to vilify anyone-
> especially without copious amounts of data points.
>
> As a side question, what should my expectation be regarding packet loss
> when sending packets from point A to point B across multiple providers
> across the internet? Is 30 seconds to a minute of blackout between two
> destinations every couple of weeks par for the course? My directly
> connected ISPs offer me an SLA, but what should I reasonably expect from
> them when one of their upstream peers (or a peer of their peers) has
> issues? If this turns out to be BGP reconvergence or similar do I have any
> options?
>
> many thanks,
> -andy
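
For the "pinpoint the trouble spot" goal in the quoted message, one simple first pass over a saved trace is to find the last hop that answered. The sketch below assumes the output format of Linux `traceroute -n -q 1` (hop number, then either an address or `*`); `parse_hops` and `last_responding_hop` are hypothetical helper names, and the sample addresses used with it should come from the documentation ranges or your own captures.

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, Optional[str]]  # (hop number, responding address or None)


def parse_hops(trace_text: str) -> List[Hop]:
    """Parse `traceroute -n -q 1` style output into (hop, address-or-None) pairs."""
    hops = []
    for line in trace_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip the header line and any blank lines
        addr = None if fields[1] == "*" else fields[1]
        hops.append((int(fields[0]), addr))
    return hops


def last_responding_hop(trace_text: str) -> Optional[Hop]:
    """Return the deepest hop that replied, or None if nothing answered."""
    answered = [hop for hop in parse_hops(trace_text) if hop[1] is not None]
    return answered[-1] if answered else None
```

Comparing this value between a normal trace and one taken during a blackout suggests where the forward path stopped answering. It is a hint rather than proof: routers may rate-limit or drop ICMP replies, and the return path may differ from the forward one, so traces from both ends are more convincing than either alone.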


