[45308] in North American Network Operators' Group
Re: representativeness of flow data based on samples
daemon@ATHENA.MIT.EDU (Jake Khuon)
Wed Jan 30 14:47:29 2002
Message-Id: <200201301942.g0UJgnGY002155@llama.wooj.com>
From: "Jake Khuon" <khuon@NEEBU.Net>
To: Joe Abley <jabley@automagic.org>
Cc: nanog@merit.edu
In-reply-to: Joe Abley's message of Wed, 30 Jan 2002 14:02:30 -0500.
	     <20020130140229.B49185@buffoon.automagic.org> 
Reply-To: khuon@NEEBU.Net (Jake Khuon)
Date: Wed, 30 Jan 2002 11:42:49 -0800
Errors-To: owner-nanog-outgoing@merit.edu
### On Wed, 30 Jan 2002 14:02:30 -0500, Joe Abley <jabley@automagic.org>
### casually decided to expound upon nanog@merit.edu the following thoughts
### about "representativeness of flow data based on samples":
JA> For example, if I am trying to rank the top traffic sinks for my
JA> network beyond an attached peer (i.e. an ordinal rather than cardinal
JA> measurement), will I get different answers if I use a sampling rate
JA> of 1:1000 compared to 1:50, given a statistically "long enough"
JA> measurement period?
I suspect the sampling rate will mostly just determine the smoothness of your
statistics over the long run, which I assume is what you're interested in.
It will also depend on the ballpark expected packet flow.  One might ask the
question of "how close do things seem/need to be?"  You have to assume the
measurement period is long enough, relative to the sampling interval, to
collect a large number of samples.  The absolute sampling error grows as the
square root of the number of samples, so the relative error shrinks as one
over that square root.  Two sinks whose traffic differs by less than that
error may well swap places in the ranking.  So what does a per-sample loss
mean to you?  And how much error can you tolerate?  Figure that out and you
can narrow in on an appropriate sampling period.
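
To put rough numbers on the square-root argument, here is a minimal sketch.
It assumes each packet is sampled independently with probability 1/N (a
binomial model, which real router sampling only approximates), and the
function names relative_error and packets_needed are made up for
illustration, not taken from any flow-export tool.

```python
import math


def relative_error(true_packets, sampling_interval):
    """Approximate relative standard error of a scaled-up packet count.

    Assumes each of true_packets packets is sampled independently with
    probability 1/sampling_interval (binomial model, an assumption).
    """
    p = 1.0 / sampling_interval
    expected_samples = true_packets * p
    # Std dev of the sample count is sqrt(C*p*(1-p)); dividing by the
    # expected count C*p gives the relative error of the estimate.
    return math.sqrt((1.0 - p) / expected_samples)


def packets_needed(target_rel_error, sampling_interval):
    """Packets a sink must receive before its estimate reaches the target."""
    p = 1.0 / sampling_interval
    # Invert relative_error(): C = (1 - p) / (p * e^2).
    return math.ceil((1.0 - p) / (p * target_rel_error ** 2))


if __name__ == "__main__":
    for interval in (50, 1000):
        err = relative_error(1_000_000, interval)
        print(f"1:{interval} over 1M packets: +/- {err:.1%}")
    print("packets for 5% error at 1:1000:", packets_needed(0.05, 1000))
```

Under this model, a sink that receives a million packets in the measurement
period is estimated to within roughly 0.7% at 1:50 and roughly 3.2% at
1:1000.  The ranking should come out the same at both rates for sinks whose
traffic differs by clearly more than that, and it can flip for sinks that
are closer together.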
--
/*===================[ Jake Khuon <khuon@NEEBU.Net> ]======================+
 | Packet Plumber, Network Engineers     /| / [~ [~ |) | | --------------- |
 | for Effective Bandwidth Utilisation  / |/  [_ [_ |) |_| N E T W O R K S |
 +=========================================================================*/