[134016] in North American Network Operators' Group
Re: TCP congestion control and large router buffers
daemon@ATHENA.MIT.EDU (Fred Baker)
Tue Dec 21 16:25:06 2010
From: Fred Baker <fred@cisco.com>
In-Reply-To: <alpine.DEB.1.10.1012210758140.27193@uplift.swm.pp.se>
Date: Tue, 21 Dec 2010 13:24:38 -0800
To: Mikael Abrahamsson <swmike@swm.pp.se>
Cc: nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
On Dec 20, 2010, at 11:18 PM, Mikael Abrahamsson wrote:
> On Mon, 20 Dec 2010, Jim Gettys wrote:
>
>> Common knowledge among whom? I'm hardly a naive Internet user.
>
> Anyone actually looking into the matter. The Cisco "fair-queue" command was introduced in IOS 11.0 according to <http://www.cisco.com/en/US/docs/ios/12_2/qos/command/reference/qrfcmd1.html#wp1098249> to somewhat handle the problem. I have no idea when this was in time, but I guess the early 90s?

1995. I know the guy who wrote the code. Meet me in a bar and we can share war stories. The technology actually helps pretty effectively with the kind of problem RFC 6057 addresses.

>> is a good idea, you aren't old enough to have experienced the NSFnet collapse during the 1980's (as I did). I have post-traumatic stress disorder from that experience; I'm worried about the confluence of these changes, folks.
>
> I'm happy you were there; I was under the impression that routers had large buffers back then as well?

Not really. Yup, several of us were there. The common routers on the NSFNET and related networks were fuzzballs, which had 8 (count them, 8) 576-byte buffers, Cisco AGS/AGS+, and Proteon routers. The Cisco routers of the day generally had 40 buffers on each interface by default, and might have had configuration changes; I can't comment on the Proteon routers. For a 56 kbps line, given 1504 bytes per message (1500 bytes of IP+data plus four bytes of HDLC overhead), that's theoretically 8.5 seconds of queue. But given that messages were in fact usually 576 bytes of IP data (cf. "fuzzballs" and Unix behavior for off-LAN communications) and interspersed with TCP control messages (ACKs, SYNs, FINs, RSTs), real queue depths were more like two seconds at a bottleneck router. The question would be the impact of a sequence of routers all acting as bottlenecks.
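
The arithmetic above is just total queued bits divided by line rate; a quick sketch (the numbers come from the text, the function name is mine):

```python
# Worst-case drain time for a full queue of fixed-size packets.
# From the text: 40 buffers of 1504 bytes draining at 56 kbps.

def queue_delay_seconds(buffers: int, packet_bytes: int, line_bps: int) -> float:
    """Time to drain a full queue: total queued bits / line rate."""
    return buffers * packet_bytes * 8 / line_bps

print(queue_delay_seconds(40, 1504, 56_000))  # ~8.59 s, the "8.5 seconds" above
print(queue_delay_seconds(40, 576, 56_000))   # ~3.29 s with 576-byte packets,
                                              # before counting small control packets
```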

IMHO, AQM (RED or whatever) is your friend. The question is what to set min-threshold to. Kathy Nichols (Van's wife) did a lot of simulations. I don't know that the paper was ever published, but as I recall she wound up recommending something like this:
  line rate    RED min-threshold
  (Mbps)       (ms of queue depth)
       2           32
      10           16
     155            8
     622            4
   2,500            2
  10,000            1
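
Since RED implementations typically take min-threshold in bytes or packets rather than milliseconds, the table converts for a given line rate as bytes = rate × delay / 8. A sketch (the helper name is mine, not a Cisco or ns-2 API):

```python
# Convert a min-threshold expressed as queue delay (ms) into bytes:
# bytes = line_rate_bps * (ms / 1000) / 8.

def min_threshold_bytes(line_mbps: float, threshold_ms: float) -> int:
    return int(line_mbps * 1e6 * threshold_ms / 1000 / 8)

# The table's recommendations: line rate (Mbps) -> min-threshold (ms).
recommended = {2: 32, 10: 16, 155: 8, 622: 4, 2500: 2, 10000: 1}

for mbps, ms in recommended.items():
    print(f"{mbps:>6} Mbps: {ms:>2} ms -> {min_threshold_bytes(mbps, ms):>9} bytes")
```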

> But yes, I agree that we'd all be much helped if manufacturers of both ends of all links had the common decency of introducing a WRED (with ECN marking) AQM that had 0% drop probability at 40 ms and 100% drop probability at 200 ms (and a linear increase between).

So, min-threshold = 40 ms and max-threshold = 200 ms. That's good on low-speed links; it will actually control queue depths to an average of O(min-threshold) at whatever value you set it to. The problem with 40 ms is that it interacts poorly with some applications, notably voice and video.
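
The quoted proposal amounts to a simple piecewise-linear drop/mark curve. A minimal sketch (real RED/WRED operates on an EWMA of the queue size and usually caps the drop probability below 100% at max-threshold; this uses instantaneous delay for clarity):

```python
# Linear WRED drop/mark probability: 0% at min_th, 100% at max_th,
# linear in between. Queue occupancy measured as delay in ms.

def wred_drop_probability(queue_ms: float,
                          min_th: float = 40.0,
                          max_th: float = 200.0) -> float:
    if queue_ms <= min_th:
        return 0.0
    if queue_ms >= max_th:
        return 1.0
    return (queue_ms - min_th) / (max_th - min_th)

print(wred_drop_probability(120))  # 0.5: halfway between the two thresholds
```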

It also doesn't match well to published studies like http://www.pittsburgh.intel-research.net/~kpapagia/papers/p2pdelay-analysis.pdf. In that study, a min-threshold of 40 ms would have cut in on only six few-second events in the course of a five-hour sample. If 40 ms is on the order of magnitude of a typical RTT, it suggests that you could still have multiple retransmissions from the same session in the same queue.

A good photo of buffer bloat is at
ftp://ftpeng.cisco.com/fred/RTT/Pages/4.html
ftp://ftpeng.cisco.com/fred/RTT/Pages/5.html
The first is a trace I took overnight in a hotel I stayed in. Never mind the name of the hotel; it's not important. The second is the delay distribution, which is highly unusual - you expect to see delay distributions more like
ftp://ftpeng.cisco.com/fred/RTT/Pages/8.html
(which actually shows two distributions - the blue one is fairly normal, and the green one is a link that spends much of the day chock-a-block).

My conjecture re 5.html is that the link *never* drops, and at times has as many as nine retransmissions of the same packet in it. The spikes in the graph are about a TCP RTO apart. That's a truly worst case: for N-1 of the N retransmissions, it's a waste of storage space and a waste of bandwidth.

AQM is your friend. Your buffer should be able to temporarily hold as much as an RTT of traffic, which is to say it should be large enough to ensure that if you get a big burst followed by a silent period, you can use the entire capacity of the link to ride it out. Your min-threshold should be at a value that makes your median queue depth relatively shallow. The numbers above are a reasonable guide, but as in all things, YMMV.
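
That sizing rule is the classic bandwidth-delay product: one RTT's worth of traffic at line rate. A minimal sketch (the function name is mine):

```python
# Buffer sized to absorb one RTT of traffic at full line rate.

def buffer_bytes_for_rtt(line_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes."""
    return int(line_mbps * 1e6 * rtt_ms / 1000 / 8)

print(buffer_bytes_for_rtt(10, 100))  # 125000: a 10 Mbps link with a 100 ms RTT
```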