[2129] in linux-net channel archive
Reliability testing of 1.3.7x
daemon@ATHENA.MIT.EDU (Paul Gortmaker)
Sun Mar 17 00:34:59 1996
From: Paul Gortmaker <gpg109@rsphy1.anu.edu.au>
To: linux-net@vger.rutgers.edu
Date: Sun, 17 Mar 1996 16:30:08 +1100 (EST)
Here are some things I observed while doing durability testing on 1.3.74
(plus the fix that removes one line from skb_clone()), built with gcc-2.5.8.
1) IP fragment reassembly:
Given a typical 4MB (or less) machine, the memory fragmentation
even directly after boot is enough to make dev_alloc_skb requests larger
than 20kB fail. What this means is that some jerk doing "ping -s 30000 linux_box"
will cause your console and system logs to fill with messages like
IP: queue_glue: no memory for gluing queue XXXXXXX
After a bit of use on >= 8MB machines, you have enough memory
fragmentation to end up seeing the exact same problem.
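To put numbers on the ping case, here is a rough sketch of the arithmetic. It assumes a 1500 byte Ethernet MTU, a 20 byte IP header with no options, and the 8 byte ICMP echo header that ping adds; the function names are made up for illustration and are not kernel code.

```c
#include <assert.h>

/* Assumed sizes for illustration: option-less IPv4 header and the
 * ICMP echo header that ping prepends to its payload. */
#define IP_HDR   20
#define ICMP_HDR 8

/* Largest payload one fragment can carry: what fits after the IP header,
 * rounded down to a multiple of 8 because fragment offsets are counted
 * in 8-byte units. */
static int frag_payload(int mtu)
{
	assert(mtu > IP_HDR + 8);
	return ((mtu - IP_HDR) / 8) * 8;
}

/* Number of fragments needed to carry "ping -s ping_size". */
static int frag_count(int ping_size, int mtu)
{
	int total = ping_size + ICMP_HDR;  /* IP payload to be split */
	int per = frag_payload(mtu);

	return (total + per - 1) / per;    /* round up */
}

/* Size of the one linear buffer needed to glue the datagram together. */
static int reassembled_len(int ping_size)
{
	return IP_HDR + ICMP_HDR + ping_size;
}
```

With these assumptions, frag_count(30000, 1500) comes to 21 fragments, and the glued datagram needs a single buffer of 30028 bytes, which is well over the 20kB mark mentioned above.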
The easy fix is just to silently drop the queue on the floor, instead
of moaning about it. Or at least make it a KERN_DEBUG .
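As a rough illustration of what the KERN_DEBUG option buys, here is a small user-space model of the printk level check. It is only a sketch: the constants assume the usual "<N>" message prefix, a default message level of 4, and a console log level of 7, and the two helper functions are hypothetical, not the kernel's own.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, modelled on the usual printk conventions. */
#define KERN_DEBUG               "<7>"
#define DEFAULT_MESSAGE_LOGLEVEL 4   /* level given to untagged messages */
#define CONSOLE_LOGLEVEL         7   /* console shows levels below this */

/* Level of a message: taken from a leading "<N>" tag if there is one,
 * otherwise the default level. */
static int msg_level(const char *msg)
{
	assert(msg != NULL);
	if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
		return msg[1] - '0';
	return DEFAULT_MESSAGE_LOGLEVEL;
}

/* Nonzero if the message would be printed on the console. */
static int reaches_console(const char *msg)
{
	return msg_level(msg) < CONSOLE_LOGLEVEL;
}
```

Under this model the untagged queue_glue message reaches the console, while the same text tagged with KERN_DEBUG does not. Whether it still lands in the system logs depends on how syslog is set up, so dropping the message entirely is the only one of the two options that is guaranteed to be silent.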
2) IP evictor panic:
After spamming a freshly booted box with spurts of IP frags,
I have seen the following panic:
ip_evictor: memcount
shortly after boot. Ugh. Doesn't happen every time, but it has indeed
happened. What this means is that there were in excess of 256kB of
fragments floating around, but at that instant when ip_evictor
got called, ipqueue (which is the head of the linked list) was NULL.
Dunno how that can happen yet, as all the list juggling is done inside
cli/sti pairs...
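The condition behind the panic can be modelled in user space like this. It is a sketch of the behaviour described above, not the kernel source: the struct layout, the names, and the choice to return -1 instead of panicking are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One queue of fragments awaiting reassembly (hypothetical layout). */
struct frag_queue {
	struct frag_queue *next;
	long bytes;               /* memory charged to this queue */
};

struct frag_state {
	struct frag_queue *head;  /* stands in for ipqueue */
	long mem;                 /* stands in for the fragment memory count */
};

/* Unlink the head queue and give back the memory charged to it. */
static void free_head(struct frag_state *s)
{
	struct frag_queue *q = s->head;

	assert(q != NULL);
	s->head = q->next;
	s->mem -= q->bytes;
}

/* Evict queues until the count drops to thresh or below.
 * Returns 0 on success, -1 at the point where the kernel would
 * panic with "ip_evictor: memcount". */
static int evict(struct frag_state *s, long thresh)
{
	while (s->mem > thresh) {
		if (s->head == NULL)
			return -1;  /* count says memory is held, list is empty */
		free_head(s);
	}
	return 0;
}
```

In this model the -1 case can only be reached if the counter and the list have drifted apart, i.e. some path changed one without the other. That is the inconsistency to look for in the real code.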
3) Socket destroy delayed:
By simply doing a "ping -f -s 30000 some_victim" and then
hitting ^C, you will get a "Socket destroy delayed" message. Every
time. 100% reproducible.
4) Double lock on socket:
Under *heavy* net load, I have seen the "double lock on
socket" messages. These have originated from udp_sendto,
skb_recv_datagram, tcp_recvmsg, and tcp_sendmsg. No ill effects
other than the messages themselves. The code says that these aren't
necessarily bugs, but usually are...
5) Bogus Packet warnings from 8390.c (wd cards):
Sometimes directly after a receiver overrun, you see a message
like "eth0: bogus packet status=0x0 nxpg=0xNN size=XXXX".
I don't seem to get these on the same box with a v1.2.13 kernel.
The nxpg and the size are always valid, but the status byte (which
is the first byte of the 4 byte 8390 header) is always zero, which
is undefined. I'll have to double check that the overrun code is
doing the right thing. Fortunately overruns are rare, unless you
run an 8bit card on a 7MHz ISA bus with a 32k window.
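For reference, the 4 byte header in question can be picked apart as in the sketch below. The layout (status, next page, then a little-endian byte count) follows the 8390 receive buffer format; the struct and function names are made up here, and the status test is simplified to checking only the "received intact" bit rather than copying the driver's exact check.

```c
#include <assert.h>
#include <stddef.h>

#define RSR_RXOK 0x01  /* receive status bit 0: packet received intact */

/* The 4 byte header the 8390 stores in front of each received packet. */
struct pkt_hdr {
	unsigned char status;   /* copy of the receive status register */
	unsigned char next;     /* page where the next packet starts (nxpg) */
	unsigned short count;   /* byte count as stored by the chip */
};

/* Decode the header from the first four bytes in the ring buffer. */
static struct pkt_hdr parse_hdr(const unsigned char *raw)
{
	struct pkt_hdr h;

	assert(raw != NULL);
	h.status = raw[0];
	h.next = raw[1];
	h.count = (unsigned short)(raw[2] | (raw[3] << 8));  /* low byte first */
	return h;
}

/* Simplified check: a good packet must have the RXOK bit set. */
static int status_ok(const struct pkt_hdr *h)
{
	return (h->status & RSR_RXOK) != 0;
}
```

A header such as { 0x00, 0x4c, 0x40, 0x00 } decodes to a sane next page and size but fails status_ok(), which matches the symptom: nxpg and size look valid while the status byte is zero.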
Other than that, 1.3.74 held up well to about 5hrs of net abuse,
involving three other machines (without using any fragments tho)
holding a load average of about 7 for the duration. On to '75 ...
Paul.