[96053] in North American Network Operators' Group
Re: Thoughts on increasing MTUs on the internet
daemon@ATHENA.MIT.EDU (Douglas Otis)
Sun Apr 15 00:06:52 2007
In-Reply-To: <13AF5522-C17E-497D-AAB2-63A05072E17C@muada.com>
From: Douglas Otis <dotis@mail-abuse.org>
Date: Sat, 14 Apr 2007 20:46:32 -0700
To: NANOG list <nanog@merit.edu>
Errors-To: owner-nanog@merit.edu
On Apr 14, 2007, at 1:10 PM, Iljitsch van Beijnum wrote:
> On 14-apr-2007, at 19:22, Douglas Otis wrote:
>>>
>>> 1500 byte MTUs in fact work. I'm all for 9K MTUs, and would
>>> recommend them. I don't see the point of 65K MTUs.
>>
>> Keep in mind that a 9KB MTU still reduces the Ethernet CRC
>> effectiveness by a fair amount.
>
> I can't find bit error rate specs for various types of ethernet
> real quick, but if you assume 10^-9 that means that ~ 1 in 10000
> 11454 byte packets has one bit error, so around 1 in 10^12 has three
> bit errors and has a _chance_ to defeat the CRC32. The naive
> assumption that only 1 in 2^32 of those packets with 3 flipped bits
> will have a valid CRC32 is probably incorrect, but the CRC should
> still catch most of those packets for a fairly large value of "most".
http://www.ietf.org/rfc/rfc3385.txt
http://citeseer.ist.psu.edu/koopman02bit.html
> For 1500 byte packets the fraction of packets with three bits
> flipped would be around 1 : 10^15, correcting for the larger number
> of packets per given amount of data, that's a difference of about
> 1 : 100.
>
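The quoted estimates can be reproduced with a short Python sketch. It assumes independent bit errors at a BER of 10^-9, as in the quoted text, and approximates the chance of three flipped bits as the cube of the per-packet error chance, ignoring the binomial factor. The function names are illustrative only.

```python
def p_one_error(packet_bytes, ber=1e-9):
    """Approximate chance that a packet carries a bit error (bits * BER)."""
    return packet_bytes * 8 * ber


def p_three_errors(packet_bytes, ber=1e-9):
    """Rough chance of three flipped bits: the single-error chance cubed."""
    return p_one_error(packet_bytes, ber) ** 3


large = p_three_errors(11454)  # on the order of 1 in 10^12 packets
small = p_three_errors(1500)   # on the order of 1 in 10^15 packets

# Correct for the larger number of small packets per given amount of data.
per_byte_ratio = (large / 11454) / (small / 1500)

print(f"11454 bytes: 1 in {1 / large:.2e} packets")
print(f" 1500 bytes: 1 in {1 / small:.2e} packets")
print(f"per-byte ratio: about {per_byte_ratio:.0f}")
```

Under these assumptions the per-byte ratio is the square of the packet size ratio, so it lands somewhat below the rounded 1 : 100 figure but in the same range.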
Quoting from "When The CRC and TCP Checksum Disagree" by Jonathan
Stone and Craig Partridge:
http://citeseer.ist.psu.edu/cache/papers/cs/21401/http:zSzzSzsigcomm.it.uu.sezSzconfzSzpaperzSzsigcomm2000-9-1.pdf/stone00when.pdf
"Traces of Internet packets from the past two years show that between
1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even
on links where link-level CRCs should catch all but 1 in 4 billion
errors. For certain situations, the rate of checksum failures can be
even higher: in one hour-long test we observed a checksum failure of
1 packet in 400. We investigate why so many errors are observed,
when link-level CRCs should catch nearly all of them.
We have collected nearly 500,000 packets which failed the TCP or UDP
or IP checksum. This dataset shows the Internet has a wide variety of
error sources which can not be detected by link-level checks. We
describe analysis tools that have identified nearly 100 different
error patterns. Categorizing packet errors, we can infer likely
causes which explain roughly half the observed errors. The causes
span the entire spectrum of a network stack, from memory errors to
bugs in TCP.
After an analysis we conclude that the checksum will fail to detect
errors for roughly 1 in 16 million to 10 billion packets. From our
analysis of the cause of errors, we propose simple changes to several
protocols which will decrease the rate of undetected error. Even so,
the highly non-random distribution of errors strongly suggests some
applications should employ application-level checksums or equivalents."
Hardware weaknesses within DSLAMs or various memory arrays, such as a
weak driver on some internal interface, can generate high levels of
multi-bit errors not detected by TCP checksums. When the errors affect the
same bit position on an interface, more than 1 in 100 may go undetected.
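As a hedged illustration of one way same-bit errors can slip past the TCP checksum, the Python sketch below uses a 16-bit one's-complement sum in the style of RFC 1071. The sample bytes are made up, and this shows only one cancellation pattern, not necessarily the failure mode of any particular DSLAM or memory array: when the same bit position is flipped 0 to 1 in one 16-bit word and 1 to 0 in another, the sum is unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload: two 16-bit words, 0x1234 and 0x5679.
original = bytes([0x12, 0x34, 0x56, 0x79])

# Lowest bit of each word flipped in opposite directions:
# 0x1234 -> 0x1235 (0 to 1) and 0x5679 -> 0x5678 (1 to 0).
corrupted = bytes([0x12, 0x35, 0x56, 0x78])

# A single flipped bit, for comparison.
single_flip = bytes([0x12, 0x35, 0x56, 0x79])

print(hex(internet_checksum(original)))
print(hex(internet_checksum(corrupted)))    # same value as the original
print(hex(internet_checksum(single_flip)))  # differs from the original
```

The two-bit corruption passes the checksum because the two changes cancel in the sum, while the single flip is caught.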
> That seems like a lot, but getting better quality fiber easily
> compensates for this. Expressed differently, the average amount of
> data transmitted where you see one packet with three flipped bits
> is around 10 petabytes for 11454 byte packets and some 1.3 exabytes
> for 1500 byte packets. For the large packets that would be one
> packet in three years at 1 Gbps, for the small ones one packet in
> 380 years.
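The data volumes and time spans quoted above follow from the rounded rates of 1 packet in 10^12 (11454 bytes) and 1 in 10^15 (1500 bytes). A minimal Python sketch, taking those rounded rates and a fully loaded 1 Gbps link as given; the helper names are illustrative only:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bytes_between_events(packet_bytes, packets_per_event):
    """Average data sent before one packet with three flipped bits."""
    return packet_bytes * packets_per_event


def years_at_rate(total_bytes, bits_per_second=1e9):
    """Time to send total_bytes over a link running at full rate."""
    return total_bytes * 8 / bits_per_second / SECONDS_PER_YEAR


large_bytes = bytes_between_events(11454, 1e12)
small_bytes = bytes_between_events(1500, 1e15)
large_years = years_at_rate(large_bytes)
small_years = years_at_rate(small_bytes)

print(f"11454 bytes: {large_bytes / 2**50:.1f} PiB, {large_years:.1f} years")
print(f" 1500 bytes: {small_bytes / 2**60:.2f} EiB, {small_years:.0f} years")
```

The quoted 10 petabytes and 1.3 exabytes appear to be in binary units (2^50 and 2^60 bytes); in decimal units the same volumes are about 11.5 PB and 1.5 EB.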
Consider that the CRC is not always carried with the packet between
interfaces.
-Doug