[316] in linux-net channel archive

Re: ne2000 and 1.2.8

daemon@ATHENA.MIT.EDU (Paul Gortmaker)
Sat May 13 01:20:55 1995

From: Paul Gortmaker <gpg109@rsphy1.anu.edu.au>
To: rj@rainbow.in-berlin.de (Robert Joop)
Date: Sat, 13 May 1995 14:44:01 +1000 (EST)
Cc: linux-net@vger.rutgers.edu
In-Reply-To: <m0s9x8n-000fuPC@rainbow.in-berlin.de> from "Robert Joop" at May 12, 95 06:00:36 pm


> sadly, the ne2000 still appears to be broken. last night my machine
> hung again. but this time, the last line on the console didn't read
> eth0: DMAing conflict in ne_block_output.[DMAstat:2][irqlock:1][intr:0]
> as it did every time the machine hung before, but instead it read
> 
> eth0: Tx access conflict. irq=0 lock=1 tx1=0 tx2=0 last=20

Yes, the lock=1 message for the ne2000 is fatal. I am aware of that,
and am trying to figure out why it happens. I have only been able to
trigger it 3 times since 1.2.8 was released, which makes it very hard
to track down. Please stay tuned. In the meantime, here is a
"band-aid" fix that I am presently using, which makes the above
non-fatal. Note that this condition is relatively hard to trigger
(it takes me about a *week* of abuse with cross-mounted NFS servers
plus TCP/IP traffic to make it appear). Judging from the message
above, the problem appears to be caused by two back-to-back
dev_queue_xmit() calls (in dev.c) on an otherwise idle transmitter.
Hence bursts of traffic followed by idle time may prove to be the
trigger.
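
To illustrate why a stuck lock is fatal and what the band-aid changes,
here is a toy user-space model. It is only a sketch and not the driver
code: the toy_* names are hypothetical, and it assumes that the
ei_reset_8390() + NS8390_init() recovery leaves the lock flag cleared.

```c
/* Toy stand-in for the driver's per-device state. Only the lock flag
 * reported as "lock=" in the conflict message is modelled. */
struct toy_nic {
    int irqlock;   /* set while a transmit path owns the card */
    int resets;    /* how many times the recovery path ran */
};

/* Hypothetical recovery, standing in for ei_reset_8390() followed by
 * NS8390_init(dev, 1). ASSUMPTION: the lock is clear afterwards. */
void toy_recover(struct toy_nic *nic)
{
    nic->irqlock = 0;
    nic->resets++;
}

/* Returns 0 if the packet was accepted, 1 if the caller must retry.
 * 'bandaid' selects the original (0) or patched (1) conflict path. */
int toy_start_xmit(struct toy_nic *nic, int bandaid)
{
    if (nic->irqlock) {
        /* Conflict: the card appears to be owned by someone else. */
        if (bandaid)
            toy_recover(nic);
        return 1;
    }
    nic->irqlock = 1;   /* take the lock */
    /* ... the packet would be copied to the card here ... */
    nic->irqlock = 0;   /* release it again */
    return 0;
}

/* Start with the lock wedged at 1 and count successful sends. */
int toy_sends_after_stuck_lock(int bandaid, int tries)
{
    struct toy_nic nic = { 1, 0 };
    int ok = 0, i;

    for (i = 0; i < tries; i++)
        if (toy_start_xmit(&nic, bandaid) == 0)
            ok++;
    return ok;
}
```

In this model, toy_sends_after_stuck_lock(0, 5) never gets a packet
out, while toy_sends_after_stuck_lock(1, 5) loses only the first
attempt and then sends normally.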

This patch also enables some extra debugging info for the ne2k. If
you use it to avoid the (hopefully) hard-to-trigger "lock=1" hangs,
please mail me any eth0 printk output that you get. When the "lock=1"
condition happens, you won't hang, but you will get a dump of some
time values (in jiffies) that will help me reconstruct the scenario.
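
For anyone who wants to read the dump themselves, a small user-space
helper along these lines could decode it. This is a sketch, not part
of the patch: the struct and function names are made up here, and the
only thing taken from the patch is the printk format string. Assuming
HZ is 100, one jiffy is 10 ms.

```c
#include <stdio.h>

/* One decoded debug line; field names follow the printk format. */
struct dma_dump {
    int dma, tx, dir;              /* dir: 1 = Rx, 2 = Tx */
    long lstop, start, stop, now;  /* all in jiffies */
};

/* Parse the extra debug line printed by the patch.
 * Returns 1 on success, 0 if the line does not match. */
int parse_dma_dump(const char *line, struct dma_dump *d)
{
    return sscanf(line,
                  "dma=%d tx=%d dir=%d lstop=%ld start=%ld stop=%ld now=%ld",
                  &d->dma, &d->tx, &d->dir,
                  &d->lstop, &d->start, &d->stop, &d->now) == 7;
}

/* Jiffies between the last DMA stop and the conflict being logged. */
long idle_before_conflict(const struct dma_dump *d)
{
    return d->now - d->stop;
}

/* Jiffies between the two most recent DMA stops. */
long gap_between_stops(const struct dma_dump *d)
{
    return d->stop - d->lstop;
}
```

With an invented line such as
"dma=0 tx=0 dir=2 lstop=1000 start=1040 stop=1041 now=1050",
idle_before_conflict() gives 9 jiffies and gap_between_stops() gives
41. The helpers ignore jiffies wrap-around.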

NB: As this is just a "band-aid" fix, I don't want or expect this
patch to go into 1.2.9 -- however, as a stop-gap measure, we could
use something like the following in 1.2.9 if *lots* of people hit it:

 		dev->name, dev->interrupt, ei_local->irqlock, ei_local->tx1,
 		ei_local->tx2, ei_local->lasttx);
 -	restore_flags(flags);
 +	if (!dev->mem_start && ei_local->irqlock) {
 +               restore_flags(flags);
 +               ei_reset_8390(dev);
 +               NS8390_init(dev, 1);
 +       } else
 +               restore_flags(flags);
 	return 1;
     }
 

Paul.

diff -ur /opie/linux/drivers/net/8390.c linux/drivers/net/8390.c
--- /opie/linux/drivers/net/8390.c	Sat Apr 29 16:49:58 1995
+++ linux/drivers/net/8390.c	Tue May  9 19:38:42 1995
@@ -185,7 +185,16 @@
 	printk("%s: Tx access conflict. irq=%d lock=%d tx1=%d tx2=%d last=%d\n",
 		dev->name, dev->interrupt, ei_local->irqlock, ei_local->tx1,
 		ei_local->tx2, ei_local->lasttx);
-	restore_flags(flags);
+	if (ei_local->irqlock) {
+		printk("dma=%d tx=%d dir=%d lstop=%ld start=%ld stop=%ld now=%ld\n",
+			ei_local->dmaing, ei_local->txing, ei_local->lastdma,
+			ei_local->laststop, ei_local->dmastart, ei_local->dmastop, 
+			jiffies);
+		restore_flags(flags);
+		ei_reset_8390(dev);
+		NS8390_init(dev, 1);
+	} else
+		restore_flags(flags);
 	return 1;
     }
 
diff -ur /opie/linux/drivers/net/8390.h linux/drivers/net/8390.h
--- /opie/linux/drivers/net/8390.h	Tue May  9 18:01:40 1995
+++ linux/drivers/net/8390.h	Tue May  9 19:34:26 1995
@@ -56,6 +56,10 @@
   unsigned char reg0;		/* Register '0' in a WD8013 */
   unsigned char reg5;		/* Register '5' in a WD8013 */
   unsigned char saved_irq;	/* Original dev->irq value. */
+  unsigned char lastdma;	/* Direction of last DMA (1=Rx,2=Tx) */
+  unsigned long dmastart;	/* jiffies of last DMA start. */
+  unsigned long dmastop;	/* jiffies of last DMA stop. */
+  unsigned long laststop;	/* jiffies of 2nd last DMA stop. */
   /* The new statistics table. */
   struct enet_statistics stat;
 };
diff -ur /opie/linux/drivers/net/ne.c linux/drivers/net/ne.c
--- /opie/linux/drivers/net/ne.c	Tue May  9 18:01:44 1995
+++ linux/drivers/net/ne.c	Tue May  9 19:46:11 1995
@@ -86,7 +86,7 @@
 #define NESM_START_PG	0x40	/* First page of TX buffer */
 #define NESM_STOP_PG	0x80	/* Last page +1 of RX ring */
 
-#define NE_RDC_TIMEOUT	0x02	/* Max wait in jiffies for Tx RDC */
+#define NE_RDC_TIMEOUT	0x01	/* Max wait in jiffies for Tx RDC */
 
 int ne_probe(struct device *dev);
 static int ne_probe1(struct device *dev, int ioaddr);
@@ -368,6 +368,7 @@
 	return 0;
     }
     ei_status.dmaing |= 0x02;
+    ei_status.dmastart = jiffies;
     outb_p(E8390_NODMA+E8390_PAGE0+E8390_START, nic_base+ NE_CMD);
     outb_p(count & 0xff, nic_base + EN0_RCNTLO);
     outb_p(count >> 8, nic_base + EN0_RCNTHI);
@@ -409,6 +410,9 @@
     }
 #endif
     outb_p(ENISR_RDC, nic_base + EN0_ISR);	/* Ack intr. */
+    ei_status.lastdma = 0x01;			/* Last was a Rx */
+    ei_status.laststop = ei_status.dmastop;
+    ei_status.dmastop = jiffies;
     ei_status.dmaing &= ~0x03;
     return ring_offset + count;
 }
@@ -439,6 +443,7 @@
 	return;
     }
     ei_status.dmaing |= 0x04;
+    ei_status.dmastart = jiffies;
     /* We should already be in page 0, but to be safe... */
     outb_p(E8390_PAGE0+E8390_START+E8390_NODMA, nic_base + NE_CMD);
 
@@ -510,6 +515,9 @@
 	}
 
     outb_p(ENISR_RDC, nic_base + EN0_ISR);	/* Ack intr. */
+    ei_status.lastdma = 0x02;			/* Last was a Tx */
+    ei_status.laststop = ei_status.dmastop;
+    ei_status.dmastop = jiffies;
     ei_status.dmaing &= ~0x05;
     return;
 }

