[134050] in North American Network Operators' Group


Re: TCP congestion control and large router buffers

daemon@ATHENA.MIT.EDU (Fred Baker)
Wed Dec 22 12:14:54 2010

From: Fred Baker <fred@cisco.com>
In-Reply-To: <4D122BD6.5070503@freedesktop.org>
Date: Wed, 22 Dec 2010 09:14:33 -0800
To: Jim Gettys <jg@freedesktop.org>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org


On Dec 22, 2010, at 8:48 AM, Jim Gettys wrote:
> I don't know if you are referring to the "RED in a different light"
> paper: that was never published, though an early draft escaped and can
> be found on the net.

Precisely.

> "RED in a different light" identifies two bugs in the RED algorithm,
> and proposes a better algorithm that only depends on the link output
> bandwidth.  That draft still has a bug.
>
> There is an (almost completed) version of the paper that never got
> published; Van has retrieved it from backup, and I'm trying to pry it
> out of his hands to get it converted to something we can read today
> (it's in FrameMaker).
>
> In the meanwhile, turn on (W)RED!  For routers run by most people on
> this list, it's always way better than nothing, even if Van doesn't
> think classic RED will solve the home router bufferbloat problem (where
> we have 2 orders of magnitude variation of wireless bandwidth along with
> highly variable workload).  That's not true in the internet core.
>
>>> But yes, I agree that we'd all be much helped if manufacturers of
>>> both ends of all links had the common decency of introducing a WRED
>>> (with ECN marking) AQM that had 0% drop probability at 40ms and 100%
>>> drop probability at 200ms (and linear increase between).
>>
>> so, min-threshold=40 ms and max-threshold=200 ms. That's good on
>> low speed links; it will actually control queue depths to an average of
>> O(min-threshold) at whatever value you set it to. The problem with 40 ms
>> is that it interacts poorly with some applications, notably voice and
>> video.
>>
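The drop curve quoted above (0% at 40 ms, 100% at 200 ms, linear in between) can be sketched in a few lines. This is only an illustration: the function name and parameters are hypothetical, and it takes queueing delay directly as input. Real RED implementations work on a smoothed average of queue depth and typically cap the probability below 100% at max-threshold, which this sketch ignores.

```python
def wred_drop_probability(queue_delay_ms, min_th_ms=40.0, max_th_ms=200.0):
    """Linear drop/mark probability between two delay thresholds.

    Hypothetical helper: the defaults are the 40 ms / 200 ms values
    quoted in the thread, not a recommendation.
    """
    if max_th_ms <= min_th_ms:
        raise ValueError("max threshold must exceed min threshold")
    if queue_delay_ms <= min_th_ms:
        return 0.0  # below min-threshold: never drop or mark
    if queue_delay_ms >= max_th_ms:
        return 1.0  # at or above max-threshold: always drop or mark
    # Linear ramp between the two thresholds.
    return (queue_delay_ms - min_th_ms) / (max_th_ms - min_th_ms)


if __name__ == "__main__":
    for delay in (20, 40, 120, 200, 300):
        print(delay, "ms ->", wred_drop_probability(delay))
```

With ECN marking enabled, the same probability would decide whether to mark a packet instead of dropping it.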
>> It also doesn't match well to published studies like
>> http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf.
>> In that study, a min-threshold of 40 ms would have cut in on only six
>> events, each a few seconds long, in the course of a five-hour sample.
>> If 40 ms is on the order of magnitude of a typical RTT, it suggests
>> that you could still have multiple retransmissions from the same
>> session in the same queue.
>>
>> A good photo of buffer bloat is at
>>       ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
>>       ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html
>>
>> The first is a trace I took overnight in a hotel I stayed in. Never
>> mind the name of the hotel, it's not important. The second is the delay
>> distribution, which is highly unusual - you expect to see delay
>> distributions more like
>>
>>       ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html
>
> Thanks, Fred!  Can I use these in the general bufferbloat talk I'm
> working on with attribution?  It's a far better example/presentation in
> a graphic form than I currently have for the internet core case (where I
> don't even have anything other than memory of probing the hotel's ISP's
> network).

Yes. Do me a favor and remove the name of the hotel. They don't need the
bad press.

>>
>> (which actually shows two distributions - the blue one is fairly
>> normal, and the green one is a link that spends much of the day
>> chock-a-block).
>>
>> My conjecture re 5.html is that the link *never* drops, and at times
>> has as many as nine retransmissions of the same packet in it. The spikes
>> in the graph are about a TCP RTO timeout apart. That's a truly worst
>> case. For N-1 of the N retransmissions, it's a waste of storage space
>> and a waste of bandwidth.
>>
>> AQM is your friend. Your buffer should be able to temporarily buffer
>> as much as an RTT of traffic, which is to say that it should be large
>> enough to ensure that if you get a big burst followed by a silent period
>> you should be able to use the entire capacity of the link to ride it
>> out. Your min-threshold should be at a value that makes your median
>> queue depth relatively shallow. The numbers above are a reasonable
>> guide, but as in all things, YMMV.
>
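To put rough numbers on "an RTT of traffic", a delay figure can be converted into bytes at a given link rate. The sketch below is illustrative only: the helper name, the 10 Mbit/s link rate and the 100 ms RTT are assumptions, not values from the thread; the 40 ms and 200 ms thresholds are the ones discussed above.

```python
def bytes_for_delay(link_bps, delay_ms):
    """Bytes a link of link_bps bits/s transmits in delay_ms milliseconds.

    Hypothetical helper; integer arithmetic avoids float rounding.
    """
    if link_bps < 0 or delay_ms < 0:
        raise ValueError("link rate and delay must be non-negative")
    # bits = link_bps * delay_ms / 1000; bytes = bits / 8
    return link_bps * delay_ms // 8000


if __name__ == "__main__":
    link_bps = 10_000_000  # assumed 10 Mbit/s link
    rtt_ms = 100           # assumed RTT, for illustration only

    print("buffer for one RTT:", bytes_for_delay(link_bps, rtt_ms), "bytes")
    print("min-threshold (40 ms):", bytes_for_delay(link_bps, 40), "bytes")
    print("max-threshold (200 ms):", bytes_for_delay(link_bps, 200), "bytes")
```

The same thresholds in milliseconds translate to very different byte counts as link rate changes, which is one reason to think of them in terms of delay.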
> Yup. AQM is our friend.
>=20
> And we need it in many places we hadn't realised we did (like our
> OS's).
>                          - Jim
>


