[580] in linux-net channel archive


Corrupt skb lists with lance driver

daemon@ATHENA.MIT.EDU (Paul Gortmaker)
Tue Jun 27 22:14:12 1995

From: Paul Gortmaker <gpg109@rsphy1.anu.edu.au>
To: linux-net@vger.rutgers.edu
Date: Wed, 28 Jun 1995 11:21:44 +1000 (EST)


With respect to the "double lock on device queue" and "freed while locked"
events with the lance driver:

> Now here is what (I think) is happening. During an xmit, most drivers
> peel the data out of the skb via memcpy() or whatever, and then do
> a dev_kfree_skb(skb, FREE_WRITE) before exiting the xmit function.
> However, the lance (and the tulip) driver hold onto the Tx skb until
> the interrupt handler rec's a Tx-done interrupt. My two guesses are that
> the lance driver is munging its internal skb list under heavy Tx activity, 
> or something deep in the net code is shuffling the skb's after the Tx
> function completes, and hence the lance's personal Tx skb list goes 
> out of sync. Either way this is bad. I spent a while looking at the
> code, but nothing jumped off the page at me.

I later realized that the lance driver provides a convenient way to 
test this. If we pretend that all the tx skb's live above 16MB, then 
the driver will use "bounce buffers" and the skb will be freed in the 
tx function, just like all the other drivers. Running this test takes a
one-line change:

-	if ((int)(skb->data) + skb->len > 0x01000000) {
+	if (1) {	/* use bounce buffers for all skb's */
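
For anyone not familiar with that test, here is a minimal user-space
sketch of the decision it makes. The 0x01000000 constant is taken from
the line above; my reading is that it is the 16MB limit of 24-bit ISA
DMA addressing. The names needs_bounce() and force_all are made up for
illustration and do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* 16MB boundary, the same constant as in the driver test above. */
#define DMA_LIMIT 0x01000000u

/*
 * Hypothetical helper (not in the driver): returns true if a buffer
 * starting at addr and spanning len bytes ends above the 16MB limit
 * and must therefore be copied to a bounce buffer. Setting force_all
 * mimics the "if (1)" test change.
 */
static bool needs_bounce(uint32_t addr, uint32_t len, bool force_all)
{
    if (force_all)
        return true;
    /* Widen to 64 bits so addr + len cannot wrap around. */
    return (uint64_t)addr + len > DMA_LIMIT;
}
```

With force_all set, every packet takes the bounce buffer path regardless
of where it lives in memory.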

I have done this, and I get *zero* "freed while locked" events, and no
"double (un)lock on device queue" messages either. So this indicates
that one of my two guesses above appears to be correct (but which? ;-).
The tulip driver may be affected by this as well, as it also transmits 
the skb "in place", without freeing it until the Tx-done interrupt arrives.
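
To make the difference between the two paths concrete, here is a toy
user-space model of who owns the skb and when it gets freed. It is only
a sketch under my own assumptions: struct toy_skb, struct toy_tx_ring,
the toy_* functions and the buffer sizes are all invented for
illustration and are not the lance driver's code or the kernel's
sk_buff API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TX_RING_SIZE 4      /* arbitrary ring size for the model */
#define BOUNCE_SIZE  1544   /* arbitrary per-slot bounce buffer size */

/* Toy stand-in for a kernel sk_buff; not the real structure. */
struct toy_skb {
    const unsigned char *data;
    size_t len;
    bool freed;
};

struct toy_tx_ring {
    struct toy_skb *held[TX_RING_SIZE];  /* skbs kept until Tx-done */
    unsigned char bounce[TX_RING_SIZE][BOUNCE_SIZE];
    unsigned int cur_tx;    /* next slot to fill */
    unsigned int dirty_tx;  /* oldest slot still awaiting Tx-done */
};

static void toy_free_skb(struct toy_skb *skb)
{
    skb->freed = true;
}

/* Queue one packet; returns false if the ring is full or it won't fit. */
static bool toy_start_xmit(struct toy_tx_ring *r, struct toy_skb *skb,
                           bool use_bounce)
{
    unsigned int entry;

    if (r->cur_tx - r->dirty_tx >= TX_RING_SIZE)
        return false;
    entry = r->cur_tx % TX_RING_SIZE;

    if (use_bounce) {
        /* Copy path: the data is copied, the skb is freed right away. */
        if (skb->len > BOUNCE_SIZE)
            return false;
        memcpy(r->bounce[entry], skb->data, skb->len);
        r->held[entry] = NULL;
        toy_free_skb(skb);
    } else {
        /* In-place path: the ring keeps the skb until Tx-done. */
        r->held[entry] = skb;
    }
    r->cur_tx++;
    return true;
}

/* Model of the Tx-done interrupt for the oldest outstanding slot. */
static void toy_tx_done(struct toy_tx_ring *r)
{
    unsigned int entry;

    if (r->dirty_tx == r->cur_tx)
        return;                 /* nothing outstanding */
    entry = r->dirty_tx % TX_RING_SIZE;
    if (r->held[entry]) {
        toy_free_skb(r->held[entry]);
        r->held[entry] = NULL;
    }
    r->dirty_tx++;
}
```

In the in-place path there is a window between toy_start_xmit() and
toy_tx_done() during which the ring's private list still points at the
skb. In the copy path that window does not exist, which is why forcing
bounce buffers makes a useful test.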

Of course, using bounce buffers for every Tx packet increases the CPU 
overhead with the extra memcpy() per Tx, but I can still Tx > 850kB/s 
via ftp, so it is far from crippled. This is probably a good workaround
until the problem is solved properly. I will try to figure it out if
I get the free time and nobody else does it first.

Paul.

