[190542] in North American Network Operators' Group
Re: packet loss question
daemon@ATHENA.MIT.EDU (Mel Beckman)
Thu Jul 7 17:50:10 2016
X-Original-To: nanog@nanog.org
From: Mel Beckman <mel@beckman.org>
To: Ken Chase <math@sizone.org>
Date: Thu, 7 Jul 2016 21:50:02 +0000
In-Reply-To: <20160707213452.GF3241@sizone.org>
Cc: "nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org
Ken,
I should have made clear I wasn't replying to you. I was replying to Brielle's comment:
> Is it bad that the first thing that came to mind is "Oh FFS, another troll"?
-mel beckman
> On Jul 7, 2016, at 2:35 PM, Ken Chase <math@sizone.org> wrote:
>
> On Thu, Jul 07, 2016 at 08:32:19PM +0000, Mel Beckman said:
>> Yes. It indicates that there was never a time when you did not know everything :)
>>
>> -mel beckman
>
> The issue isn't knowing everything; it's making accusations about problems while you still
> don't know how much you don't know. (~D. Rumsfeld) -- My customers in a nutshell
> (they pay to be able to yell about random stuff, I guess, and I provide that service!).
>
> The OP didn't make any accusations, however, and just asked what was going on (sorry
> if I sounded harsh in my reply). Once, a local failure of 8.8.8.8 on Google's
> (anycast?) DNS servers resulted in dozens of calls to us: "Your server
> hosting our site must be down!! Our website isn't working! People are calling us!"
>
> Most of my work in these situations is spent proving it's not our fault.
> Mtr makes this very hard because it's a very subtle tool and gives only partial
> information. (I still think mtr is a killer app, though!)
>
> Consider this (fake, example) trace:
>
>     HOST                             Loss%  Snt
> 6.  100ge13-1.core1.chi1.he.net       0.0%   10
> 7.  100ge14-1.core2.chi1.he.net       0.0%   10
> 8.  100ge3-1.core1.sjc2.he.net       30.0%   10
> 9.  ???
> 10. UNKNOWN-216-115-101-X.yahoo.com  10.0%   10
> 11. routerer-ext.ysv.freebsd.org     20.0%   10
> 12. wfe0.ysv.freebsd.org             30.0%   10
>
> First off, the OP may have asked "whose fault is hop 9, Yahoo or HE?" and seen it
> as an issue. Ignoring that for now, the rest of the packet loss is an issue --
> where is the problem, though?
>
> This is very tricky. It looks like hop 8 is at fault, of course, but is it
> just dropping ICMP as it's allowed to? How did hop 10 get only 10% loss, then, if
> 8 has 30%? Is 8 then dropping ~20% (not statistically correct...) of ICMP just cuz
> it can, and then having a 'real' 10% loss on top of that?
>
> Or is it hop 11? But hop 12 has more PL; perhaps hop 12 is the issue
> all along and 8, 10 and 11 are just dropping ICMP? Or are 8, 11 and 12 each
> doing ~10%? (Not statistically correct.)
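The "(not statistically correct)" asides come from the fact that independent loss rates combine multiplicatively rather than adding: a probe gets through only if it survives every drop point. A minimal Python sketch, assuming each drop point behaves independently (the helper name `combined_loss` is hypothetical, and the 20%/10% split at hop 8 is just the scenario posed above):

```python
from math import prod


def combined_loss(per_hop_losses):
    """Probability a probe is lost when each drop point acts independently.

    Assumption: drops are independent; real networks need not honour this.
    """
    # A probe survives only if it survives every drop point.
    survive = prod(1.0 - p for p in per_hop_losses)
    return 1.0 - survive


# Hop 8: hypothetical 20% ICMP rate limiting plus a 'real' 10% forwarding loss.
print(f"{combined_loss([0.20, 0.10]):.2f}")        # 0.28, not 0.30
# Hops 8, 11 and 12 each losing 10% of forwarded traffic.
print(f"{combined_loss([0.10, 0.10, 0.10]):.3f}")  # 0.271, not 0.30
```

So "20% + 10%" shows up as about 28%, and three 10% drops as about 27%, which is close enough to the observed 30% that 10 probes cannot tell the stories apart.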
>
> We can't say for sure; it's a probabilities game. And to be completely correct
> about it, hop 6 isn't blameless either (just very unlikely to be at fault
> statistically, though not impossible with only 10 pings per hop; a statistician
> can calculate it for us).
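To put numbers on "a statistician can calculate it for us": if each probe is lost independently with probability p, the chance that n probes all get through is (1 - p)^n. After seeing 0 losses in n probes, the exact one-sided (Clopper-Pearson) upper bound on p is the value solving (1 - p)^n = 1 - confidence. A sketch assuming independent probes; the helper names are hypothetical:

```python
def p_zero_loss_seen(true_loss, probes):
    """Chance that `probes` independent probes all succeed despite `true_loss`."""
    return (1.0 - true_loss) ** probes


def upper_bound_zero_loss(probes, confidence=0.95):
    """One-sided exact upper bound on the loss rate after 0 of `probes` lost.

    Solves (1 - p) ** probes == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / probes)


print(f"{p_zero_loss_seen(0.10, 10):.3f}")  # 0.349
print(f"{upper_bound_zero_loss(10):.3f}")   # 0.259
print(f"{upper_bound_zero_loss(100):.3f}")  # 0.030
```

With 10 probes, a hop with a true 10% loss still shows 0% about 35% of the time, and a clean 0/10 result is consistent with up to roughly 26% loss at 95% confidence. At 100 probes that bound falls to about 3%, close to the "rule of three" approximation 3/n.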
>
> This is why more pings are required to be sure of the situation. I like to run
> with -i 0.1 -c 100 so it completes quickly, before conditions change. Then you
> can make a statistically valid pronouncement of where the problem MIGHT BE
> within a useful confidence interval. However, without the return route we're
> still largely in the dark as to the actual location of the issue. You can't be
> '100% sure' with this stuff; technically speaking, it's all 'luck of the draw'.
>
> (Beware: this one time, at band camp, some EtherChannel or equivalent at HE was
> showing PL only for specific IPs in any target subnet, because they were XORing
> the source and target IPs to load-balance and one channel was wonky. Fun times
> debugging that one: "WFM from here, what's your issue?")
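A toy model of that failure: if a link bundle picks a member link by XORing source and destination addresses, one bad member only hurts particular source/destination pairs. This sketch assumes a simplified hash (XOR of the full IPv4 addresses, modulo the number of links); real EtherChannel/LAG hashing is vendor- and configuration-specific, and the addresses are documentation prefixes, not the ones from the incident:

```python
import ipaddress


def lag_member(src, dst, links):
    """Pick a member link by XORing source and destination IPv4 addresses.

    Hypothetical, simplified hash; real switches use vendor-specific inputs.
    """
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % links


BAD_MEMBER = 1  # assume member link 1 is the "wonky" one
src = "192.0.2.10"
for host in ipaddress.IPv4Network("198.51.100.0/28").hosts():
    member = lag_member(src, str(host), links=4)
    flag = "  <- loss" if member == BAD_MEMBER else ""
    print(f"{src} -> {host}: member {member}{flag}")
```

From 192.0.2.10 over four links, only .3, .7 and .11 in the /28 land on member 1. A tester at a different source address gets a different XOR result for the same destination and may land on a healthy member, which is the "WFM from here" effect.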
>
> /kc
> -- 
> Ken Chase - ken@heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
> Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.