[787] in linux-net channel archive


Re: Strange behaviour with NFS

daemon@ATHENA.MIT.EDU (Rob Janssen reading Linux mailingl)
Sat Jul 29 12:57:39 1995

From: linux@pe1chl.ampr.org (Rob Janssen reading Linux mailinglist)
To: becker@cesdis1.gsfc.nasa.gov (Donald Becker)
Date: Wed, 26 Jul 1995 23:25:58 +0200 (MET DST)
Cc: linux-net@vger.rutgers.edu
In-Reply-To: <9507260415.AA10825@cesdis.gsfc.nasa.gov> from "Donald Becker" at Jul 26, 95 00:15:09 am
Reply-To: linux-vger@wab-tis.rabobank.nl

According to Donald Becker:
> >Whenever the nfs server isn't reachable the process on the client
> >machine just hangs around, partially in 'D' status which means
> >non-interruptable.
> 
> I also encountered this problem today.  This bug hung a few nodes on our
> Linux cluster.  Luckily one still had a few process slots left to
> do a 'ps'.  Here are a few notes:
> 	1. The processes were in the 'D' disk-wait state.  Most were
> 	swapped out.

This is (unfortunately) similar to what happens after a disk read error
on a local drive.  Processes stay in the 'D' state after a failed
filesystem access.
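
For anyone who wants to spot such processes without relying on 'ps', here
is a minimal sketch in C.  It assumes the usual /proc/<pid>/stat layout
("pid (comm) state ..."), where the state letter follows the last ')'.
The function names stat_state and is_uninterruptible are made up for this
example and are not part of any kernel or libc interface.

```c
#include <string.h>

/*
 * Return the one-letter state field from a line in the
 * /proc/<pid>/stat format, or '\0' if the line is malformed.
 * The command name may itself contain ')' or spaces, so we
 * search for the LAST ')' rather than the first.
 */
char stat_state(const char *line)
{
    const char *rparen = strrchr(line, ')');

    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '\0';
    return rparen[2];
}

/* Non-zero if the stat line describes a process in 'D' state. */
int is_uninterruptible(const char *line)
{
    return stat_state(line) == 'D';
}
```

Feeding it the contents of each /proc/<pid>/stat file would give the same
'D' column that 'ps' shows, provided a process slot is still free to run
the program.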

> 	2. The processes counted toward the load average, but didn't consume
> 	CPU time.  The load average on the still-working machine was >35.

This is why I always apply the following patch:

*** linux/kernel/sched.c~	Fri Dec 10 00:38:26 1993
--- linux/kernel/sched.c	Fri Dec 10 01:10:50 1993
***************
*** 441,447 ****
  
  	for(p = &LAST_TASK; p > &FIRST_TASK; --p)
  		if (*p && ((*p)->state == TASK_RUNNING ||
- 			   (*p)->state == TASK_UNINTERRUPTIBLE ||
  			   (*p)->state == TASK_SWAPPING))
  			nr += FIXED_1;
  	return nr;
--- 441,446 ----

Unfortunately, Linus has not accepted it into the standard kernel.
I think a sleeping process should not count towards the CPU load average.
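
To illustrate the difference, here is a small user-space sketch of the
counting rule, not the kernel code itself.  The enum and the function
count_active are stand-ins invented for this example; the real loop walks
the task table and adds FIXED_1 per task, and the result is then fed into
the decaying load average.

```c
#include <stddef.h>

/* Stand-in task states for illustration only; the names mirror the
 * kernel's, but the values are not taken from the kernel headers. */
enum task_state {
    TASK_RUNNING,
    TASK_INTERRUPTIBLE,
    TASK_UNINTERRUPTIBLE,
    TASK_SWAPPING
};

/*
 * Count the tasks that contribute to the load figure.
 * count_uninterruptible != 0 models the stock rule (D state counts);
 * count_uninterruptible == 0 models the rule with the patch applied.
 */
size_t count_active(const enum task_state *tasks, size_t n,
                    int count_uninterruptible)
{
    size_t nr = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tasks[i] == TASK_RUNNING ||
            tasks[i] == TASK_SWAPPING ||
            (count_uninterruptible &&
             tasks[i] == TASK_UNINTERRUPTIBLE))
            nr++;
    }
    return nr;
}
```

With one runnable task, one ordinary sleeper, and three tasks stuck in 'D'
state, the stock rule counts 4 and the patched rule counts 1.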

> 	3. Doing 'kill -1' and 'kill -9' had no effect.  The processes
> 	didn't even turn into zombies.

Yes, that is really unfortunate.  It causes the system shutdown to
fail (the disks cannot be unmounted because they are busy), so the
next boot will have to run fsck.

Rob

-- 
+------------------------------------+--------------------------------------+
| Rob Janssen         rob@knoware.nl | AMPRnet:   rob@pe1chl.ampr.org       |
| e-mail: pe1chl@wab-tis.rabobank.nl | AX.25 BBS: PE1CHL@PI8WNO.#UTR.NLD.EU |
+------------------------------------+--------------------------------------+
