[329] in linux-net channel archive


home	help	back	first	fref	pref	prev	next	nref	lref	last	post
More driver fixes for 1.2.8

daemon@ATHENA.MIT.EDU (Paul Gortmaker)
Wed May 17 04:58:52 1995

From: Paul Gortmaker <gpg109@rsphy1.anu.edu.au>
To: linux-net@vger.rutgers.edu
Date: Wed, 17 May 1995 17:11:24 +1000 (EST)
Cc: becker@cesdis1.gsfc.nasa.gov (Donald Becker),
        iialan@www.linux.org.uk (Alan Cox)


	Okay here are my latest efforts. Note that these fixes affect
*all* drivers (ether, slip, loopback, etc.) and hopefully make them 
more stable. This includes removal of the final remaining possible 
way to get a ne2000 to hang. (I hope.... :)

I will go through and explain what I have done, as trying to read patches
is no fun for anybody, and is a good way to have my work ignored.
(BTW, this is against 1.2.8, as you would have guessed anyway.)

In inet/dev.c -- there is a function called dev_queue_xmit() that calls
that particular driver xmit function dev->hard_start_xmit(). At present
the code calls hard_start_xmit even if the device's Tx is busy (dev->tbusy)
And dev_queue_xmit is used by all the inet protocols (arp/ip/udp/raw, etc)
(maeaning *lots* of calls, some very very very close together.)
Note that the individual dev->hard_start_xmit() functions of each driver
were never meant to be re-entrant, and will not behave well if you
don't keep them non-reentrant. (eg. ne2000 driver)

This is why each driver does an atomic set_bit on dev->tbusy, to avoid
having re-entrance on the xmit core of the driver. The driver is
also expected to check to see if the last Tx timed out when called with
dev->tbusy. However, this really pounds on the driver, when there is
a bit of network load and is wasted cycles calling the driver to check
the timeout/busy status, and then return after doing nothing. Then we put
the packet onto the queue to be sent later, usually flushed in net_bh().
In some cases, checking for a timeout is invasive (eg. reading ISR and
TSR on the ne2000 during a packet upload) and the atomicity doesn't
save us from shooting ourselves in the foot.

A more sensible solution is to immediately throw the packet onto
the queue if the driver is already busy with an xmit and save those
wasted cycles. Also, using dev->tbusy to indicate our state is asking 
for tons of race conditions, as it is set by the driver itself *after* 
the Tx timeout in the hard_start_xmit function, and usually cleared 
*before* hitting the bottom of the hard_start_xmit function.

For example, try this "ping -f localhost" -- and then look at
"cat /proc/net/dev" -- see those errors registered by the loopback 
driver? Each error represents a call to loopback_xmit() while it
was already processing an xmit.

What I have done is implemented proper locking in dev.c so that
dev->hard_start_xmit() is NEVER called while there is another
hard_start_xmit in progress (on a per dev basis of course.) This
guarantees the non-reentrance that somehow got lost along the way.
This locking is implemented with the variable dev->in_xmit,
which is meant to be internal to dev.c and is something that the
driver need not be aware, of, and should NEVER touch. Hence a driver
need not use atomic locks (e.g. set_bit) to set dev->tbusy, but 
still can where it is convenient. (i.e. less lines of source ;-)

Also, the driver Tx function is NEVER called if dev->tbusy is set, 
UNLESS dev.c realizes that the driver in question has timed out. 
This means that each driver does not need to have it's own copy of 
the code that checks the jiffies since the last Tx. Instead, 
the driver can assume that if it is called with dev->tbusy set, 
then a timeout has occurred. All the driver has to do is set 
dev->tx_timeout to the value (in jiffies) it wants to use, and dev.c 
will monitor for a timeout condition for it. This makes it trivial
to implement a dev->service_tx_timeout() function that is separate
from the dev->hard_start_xmit() function, which is something Donald
mentioned a long time ago as being a wise move. (BTW, dev->tx_timeout
is an int and not a char because the slip driver uses a timeout of 
2000 jiffies. Normal ethernet devices should use a value of 
about 20 jiffies.)

Note that this change is backwards compatible with the present drivers, 
in that all the present drivers will work with this change, and the 
excess code can be removed later. I have already done the 8390 driver, 
and the loopback driver, and will do others as time permits. Interested
parties can look at the changes to them as examples. 

I have found that our wd8013 and ne2k systems are very solid with
this in place, with hardly a whimper from the drivers under the
heaviest abuse. No loss of performace either, in case you thought
that was a concern. I am interested to hear results from other
cards/drivers.

NB: For those people with the 3c503 cards that "go deaf" and have done
so since time began, this may/may not aggravate the problem. I have got
an old 8bit 3c503, and have been able to reproduce the problem, which means
that I should be able to debug it. I hope to have a fix for that by
the weekend if time permits. (If you have been living with it for
over a year or so, another few days won't kill you...  ;-)

Another thing I modified is the loopback driver. It was still using the
old (obsolete) dev_rint() function call, and then calling the bottom
half directly which was a bit ugly. It now uses netif_rx() and is more
like a "normal" net driver. It is worth noting that the 3c505, de600,
ppp, and znet driver are still using the old dev_rint() and should
be updated. Only the ppp driver is not already using skb's and so the 
change to the other three to use netif_rx() is minimal.

Okay, I think that is it. Hopefully clear as mud. Please test it and 
post good/bad/any results back here. 

Paul.

diff -ur linux-oem/drivers/net/8390.c linux/drivers/net/8390.c
--- linux-oem/drivers/net/8390.c	Sat Apr 29 16:49:58 1995
+++ linux/drivers/net/8390.c	Tue May 16 12:30:52 1995
@@ -20,6 +20,9 @@
   memory cards. We need to follow these closely for neX000 cards.
   Plus other minor cleanups.   -- Paul Gortmaker
 
+  14/05/95 -- Remove some excess locking cruft now that ei_start_xmit() is
+  guaranteed to be single threaded. (See my changes in inet/dev.c) -- Paul
+
   */
 
 static char *version =
@@ -117,6 +120,7 @@
 		return ENXIO;
     }
     
+    dev->tx_timeout = TX_TIMEOUT;
     irq2dev_map[dev->irq] = dev;
     NS8390_init(dev, 1);
     dev->start = 1;
@@ -132,33 +136,35 @@
     unsigned long flags;
     
 /*
- *  We normally shouldn't be called if dev->tbusy is set, but the
- *  existing code does anyway. If it has been too long since the
- *  last Tx, we assume the board has died and kick it.
+ *  We will not be called if dev->tbusy is set, unless the code
+ *  in dev.c sees that we have timed out. If it has been too long 
+ *  since the last Tx, we assume the board has died and kick it.
+ *  Note that we don't require the atomicity of set_bit() here, but 
+ *  do so simply because it is convenient to use.
  */
  
-    if (dev->tbusy) {	/* Do timeouts, just like the 8003 driver. */
-		int txsr = inb(e8390_base+EN0_TSR), isr;
+    if (set_bit(0, (void*)&dev->tbusy) != 0) {
+		int txsr, isr;
 		int tickssofar = jiffies - dev->trans_start;
-		if (tickssofar < TX_TIMEOUT ||	(tickssofar < (TX_TIMEOUT+5) && ! (txsr & ENTSR_PTX))) {
-			return 1;
-		}
-		isr = inb(e8390_base+EN0_ISR);
+
 		if (dev->start == 0) {
 			printk("%s: xmit on stopped card\n", dev->name);
+			dev->tbusy = 0;
 			return 1;
 		}
-		printk(KERN_DEBUG "%s: transmit timed out, TX status %#2x, ISR %#2x.\n",
-			   dev->name, txsr, isr);
-		/* Does the 8390 thinks it has posted an interrupt? */
-		if (isr)
-			printk(KERN_DEBUG "%s: Possible IRQ conflict on IRQ%d?\n", dev->name, dev->irq);
-		else {
+
+		isr = inb(e8390_base+EN0_ISR);
+		txsr = inb(e8390_base+EN0_TSR);
+
+		printk(KERN_DEBUG "%s: Tx timed out, %s TSR=%#2x, ISR=%#2x, t=%d.\n",
+			dev->name, (txsr & ENTSR_ABT) ? "excess collisions." :
+			(isr) ? "IRQ conflict?" : "cable problem?", txsr, isr, tickssofar);
+
+		if (!isr && !ei_local->stat.tx_packets) {
 			/* The 8390 probably hasn't gotten on the cable yet. */
-			printk(KERN_DEBUG "%s: Possible network cable problem?\n", dev->name);
-			if(ei_local->stat.tx_packets==0)
-				ei_local->interface_num ^= 1; 	/* Try a different xcvr.  */
+			ei_local->interface_num ^= 1; 	/* Try a different xcvr.  */
 		}
+
 		/* Try to restart the card.  Perhaps the user has fixed something. */
 		ei_reset_8390(dev);
 		NS8390_init(dev, 1);
@@ -177,14 +183,20 @@
     if (skb->len <= 0)
 		return 0;
 
+    /*
+     * This whole cli/sti section is just to be safe. The
+     * fixes to dev.c will ensure that we are non-reentrant,
+     * and hence irqlock *can't* be set. Having dev->interrupt
+     * set *may* be possible if we were already in the interrupt
+     * handler before masking 8390 interrupts. 
+     */
+
     save_flags(flags);
     cli();
 
-    /* Block a timer-based transmit from overlapping. */
-    if ((set_bit(0, (void*)&dev->tbusy) != 0) || ei_local->irqlock) {
-	printk("%s: Tx access conflict. irq=%d lock=%d tx1=%d tx2=%d last=%d\n",
-		dev->name, dev->interrupt, ei_local->irqlock, ei_local->tx1,
-		ei_local->tx2, ei_local->lasttx);
+    if (ei_local->irqlock || dev->interrupt) {
+	printk("%s: Tx access conflict. lock=%d, in_isr=%d\n",
+		dev->name, ei_local->irqlock, dev->interrupt);
 	restore_flags(flags);
 	return 1;
     }
@@ -500,7 +512,7 @@
 		ei_local->current_page = next_frame;
 		outb_p(next_frame-1, e8390_base+EN0_BOUNDARY);
     }
-    /* If any worth-while packets have been received, dev_rint()
+    /* If any worth-while packets have been received, netif_rx()
        has done a mark_bh(NET_BH) for us and will work on them
        when we get to the bottom-half routine. */
 
diff -ur linux-oem/drivers/net/loopback.c linux/drivers/net/loopback.c
--- linux-oem/drivers/net/loopback.c	Thu Jan  5 06:16:05 1995
+++ linux/drivers/net/loopback.c	Tue May 16 15:43:29 1995
@@ -12,6 +12,7 @@
  *		Donald Becker, <becker@cesdis.gsfc.nasa.gov>
  *
  *		Alan Cox	:	Fixed oddments for NET3.014
+ *		Paul Gortmaker	:	Use netif_rx() not old dev_rint()
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -40,43 +41,48 @@
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
 
+/*
+ * Even though this is a Tx routine, is is more like a hybrid between a
+ * standard Tx routine and the tail end of a Rx routine. The last few
+ * lines look similar to what you see in the Rx routine of a shared
+ * memory ethercard.
+ */
 
 static int
 loopback_xmit(struct sk_buff *skb, struct device *dev)
 {
+
+  struct sk_buff *rx_skb;
   struct enet_statistics *stats = (struct enet_statistics *)dev->priv;
-  int done;
 
   if (skb == NULL || dev == NULL) return(0);
 
-  cli();
-  if (dev->tbusy != 0) {
-	sti();
+  if (set_bit(0, (void*)&dev->tbusy) != 0) {
+	printk("%s: Tx timeout is a no-op.\n", dev->name);
 	stats->tx_errors++;
 	return(1);
   }
-  dev->tbusy = 1;
-  sti();
   
   /* FIXME: Optimise so buffers with skb->free=1 are not copied but
      instead are lobbed from tx queue to rx queue */
 
-  done = dev_rint(skb->data, skb->len, 0, dev);
-  dev_kfree_skb(skb, FREE_WRITE);
-
-  while (done != 1) {
-	done = dev_rint(NULL, 0, 0, dev);
-  }
   stats->tx_packets++;
 
-  dev->tbusy = 0;
-
-  if (!intr_count && (bh_active & bh_mask)) {
-	start_bh_atomic();
-	do_bottom_half();
-	end_bh_atomic();
+  rx_skb = alloc_skb(skb->len, GFP_ATOMIC);
+  if (rx_skb == NULL)  {
+	printk("%s: Couldn't allocate a sk_buff of size %ld.\n", dev->name, skb->len);
+	stats->rx_dropped++;
+	return(1);
   }
 
+  dev->trans_start = jiffies;
+  rx_skb->len = skb->len;
+  rx_skb->dev = dev;
+  memcpy(rx_skb->data, skb->data, skb->len);
+  netif_rx(rx_skb);
+  dev_kfree_skb(skb, FREE_WRITE);
+  stats->rx_packets++;
+  dev->tbusy = 0;
   return(0);
 }
 
@@ -101,6 +107,7 @@
   dev->mtu		= 2000;			/* MTU			*/
   dev->tbusy		= 0;
   dev->hard_start_xmit	= loopback_xmit;
+  dev->tx_timeout	= 10;			/* jiffies, is heaps.	*/
   dev->open		= NULL;
 #if 1
   dev->hard_header	= eth_header;
diff -ur linux-oem/include/linux/netdevice.h linux/include/linux/netdevice.h
--- linux-oem/include/linux/netdevice.h	Sun Feb  5 22:48:05 1995
+++ linux/include/linux/netdevice.h	Mon May 15 20:09:13 1995
@@ -70,7 +70,7 @@
   /* I/O specific fields - FIXME: Merge these and struct ifmap into one */
   unsigned long		  rmem_end;		/* shmem "recv" end	*/
   unsigned long		  rmem_start;		/* shmem "recv" start	*/
-  unsigned long		  mem_end;		/* sahared mem end	*/
+  unsigned long		  mem_end;		/* shared mem end	*/
   unsigned long		  mem_start;		/* shared mem start	*/
   unsigned long		  base_addr;		/* device I/O address	*/
   unsigned char		  irq;			/* device IRQ number	*/
@@ -92,6 +92,10 @@
 
   struct enet_statistics* (*get_stats)(struct device *dev);
 
+  /* If it has been this long since the last Tx, then we boot the
+     driver to let it know it has fallen asleep. */
+  unsigned short	  tx_timeout;		/* in jiffies */
+
   /*
    * This marks the end of the "visible" part of the structure. All
    * fields hereafter are internal to the system, and may change at
@@ -101,6 +105,7 @@
   /* These may be needed for future network-power-down code. */
   unsigned long		  trans_start;	/* Time (in jiffies) of last Tx	*/
   unsigned long		  last_rx;	/* Time of last Rx		*/
+  unsigned char		  in_xmit;	/* hard_start_xmit() is active	*/
 
   unsigned short	  flags;	/* interface flags (a la BSD)	*/
   unsigned short	  family;	/* address family ID (AF_INET)	*/
diff -ur linux-oem/net/inet/dev.c linux/net/inet/dev.c
--- linux-oem/net/inet/dev.c	Mon Mar  6 20:22:08 1995
+++ linux/net/inet/dev.c	Tue May 16 12:15:29 1995
@@ -29,6 +29,8 @@
  *					you start doing multicast video 8)
  *		Alan Cox	:	Rewrote net_bh and list manager.
  *		Alan Cox	: 	Fix ETH_P_ALL echoback lengths.
+ *		Paul Gortmaker	:	Added dev->in_xmit to stop drivers from
+					suffering re-entrant Tx abuse.
  *
  *	Cleaned up and recommented by Alan Cox 2nd April 1994. I hope to have
  *	the rest as well commented in the end.
@@ -190,6 +192,13 @@
   		ret = dev->open(dev);
 
 	/*
+	 * Assign a Tx timeout if the driver didn't. (it should!)
+	 */
+
+	if (!dev->tx_timeout)
+		dev->tx_timeout = 15;		/* jiffies */
+
+	/*
 	 *	If it went open OK then set the flags
 	 */
 	 
@@ -409,17 +418,45 @@
 			}
 		}
 	}
-	if (dev->hard_start_xmit(skb, dev) == 0) {
-		/*
-		 *	Packet is now solely the responsibility of the driver
-		 */
-		return;
+
+	/*
+	 *	No point in annoying the device if it is busy. Just
+	 *	throw it on the queue if dev is busy, and dev_tint()
+	 *	will get it there eventually. Using dev->in_xmit makes sure
+	 *	that dev->hard_start_xmit remains non-reentrant. Trying
+	 *	to do the same job with only dev->tbusy opens up a whole
+	 *	can of race conditions and really thrashes the drivers.
+	 */
+
+	if (set_bit(0, (void*)&dev->in_xmit) == 0) {
+		if ((dev->tbusy == 0) && (dev->hard_start_xmit(skb, dev) == 0)) {
+			/*
+			 *	Packet is now the responsibility of the driver
+			 */
+			dev->in_xmit = 0;
+			return;
+		} else if (((jiffies - dev->trans_start) > dev->tx_timeout) 
+				&& (dev->hard_start_xmit(skb, dev) == 0)) {
+			/*
+			 *	Check to see if the Tx timed out. If so, then
+			 *	call hard_start_xmit() anyway, which will let
+			 *	the device driver know we are not impressed.
+			 *	It will then handle the timeout accordingly.
+			 */
+			dev->in_xmit = 0;
+			return;
+		}
+		dev->in_xmit = 0;
 	}
 
 	/*
-	 *	Transmission failed, put skb back into a list. Once on the list it's safe and
-	 *	no longer device locked (it can be freed safely from the device queue)
+	 *	Device is busy, or we are already in hard_start_xmit() or the 
+	 *	transmission failed, so put skb back into a list. Once on the 
+	 *	list it's safe and no longer device locked.
+	 *	(i.e. it can be safely freed from the device queue)
 	 */
+
+	save_flags(flags);
 	cli();
 #ifdef CONFIG_SLAVE_BALANCING
 	skb->in_dev_queue=1;
@@ -772,7 +809,6 @@
 	struct sk_buff *skb;
 	unsigned long flags;
 	
-	save_flags(flags);	
 	/*
 	 *	Work the queues in priority order
 	 */
@@ -782,8 +818,8 @@
 		/*
 		 *	Pull packets from the queue
 		 */
-		 
 
+		save_flags(flags);	
 		cli();
 		while((skb=skb_dequeue(&dev->buffs[i]))!=NULL)
 		{
@@ -800,8 +836,9 @@
 			/*
 			 *	If we can take no more then stop here.
 			 */
-			if (dev->tbusy)
+			if (dev->in_xmit || dev->tbusy)
 				return;
+			save_flags(flags);	
 			cli();
 		}
 	}

home	help	back	first	fref	pref	prev	next	nref	lref	last	post
[329] in linux-net channel archive

More driver fixes for 1.2.8

daemon@ATHENA.MIT.EDU (Paul Gortmaker)Wed May 17 04:58:52 1995

daemon@ATHENA.MIT.EDU (Paul Gortmaker)
Wed May 17 04:58:52 1995