[2984] in SIPB_Linux_Development


Hanging install

daemon@ATHENA.MIT.EDU (Greg Hudson)
Sun Sep 17 19:13:57 2000

Date: Sun, 17 Sep 2000 19:13:51 -0400
Message-Id: <200009172313.TAA06200@egyptian-gods.MIT.EDU>
From: Greg Hudson <ghudson@MIT.EDU>
To: linux-dev@mit.edu

I've been testing the install on nephthys.mit.edu, an e-machine (Cyrix
processor, 32MB memory) at my house, which talks to the net through a
ted-williams tunnel.  The rtt from my machine to MITnet is usually in
the 20-30ms range, but sometimes a little worse.  The rtt to charon is
somewhat worse, probably on account of 18.181 being a busy net.  A
recent ping reported 30.57/76.7/142.6ms (min/avg/max).

The install seems to like to hang.  The first time I tried, it hung
after installing ~1 package; the second time, it hung slightly earlier
(while displaying that it was formatting target partitions); and the
third time, it hung after installing all the packages and after I
clicked next on the "make boot disk" screen.

Each time, I could move the mouse cursor, and I could switch to vt2
and verify that the NFS mount was still quite healthy and that I could
still write to the target disk.  I have a 100MB swap partition and
"free" reports 81MB free in it, so I'm not running out of swap.

Oh, just noticed some relevant kernel messages:

	nfs: server sipb-nfs.mit.edu not responding, still trying
	nfs: task 8325 can't get a request slot
	nfs: task 8326 can't get a request slot
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu not responding, still trying
	nfs: task 8515 can't get a request slot
	nfs: server sipb-nfs.mit.edu OK
	nfs: task 8516 can't get a request slot
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu not responding, still trying
	nfs: server sipb-nfs.mit.edu OK
	nfs: server sipb-nfs.mit.edu not responding, still trying
	nfs: server sipb-nfs.mit.edu OK

I can't correlate "task N" with the running python processes; numbers
like 8325 are definitely not pids, since the current pid has only
gotten up to 700 or so in my current shell.  But I'm willing to bet
that those tasks belong to the anaconda processes.

I guess the problem results from a combination of:

	* NFS over UDP using an 8K block size performs miserably over
	  a tunnel.  The packets have to be fragmented and tcpdump
	  frequently shows IP fragment reassembly timeouts.

	* The Linux kernel doesn't seem to recover properly from NFS
	  timeouts.
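
To put rough numbers on the fragmentation point, here is a minimal
sketch (not a measurement) of how many IP fragments one NFS-over-UDP
request or reply becomes, and how per-fragment loss compounds.  The
MTU values and the 1% loss rate are assumptions for illustration; I
haven't measured the tunnel's actual MTU or loss rate, and the sketch
ignores the RPC/NFS header bytes, which only make things slightly
worse.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def fragment_count(udp_payload, mtu):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    ip_payload = UDP_HEADER + udp_payload
    max_last = mtu - IP_HEADER        # the final fragment may use all of this
    if ip_payload <= max_last:
        return 1                      # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = max_last // 8 * 8
    remaining = ip_payload - max_last
    return 1 + -(-remaining // per_fragment)   # integer ceiling division


def datagram_loss(fragment_loss, fragments):
    """Chance the whole datagram is lost if any one fragment is lost."""
    return 1 - (1 - fragment_loss) ** fragments


if __name__ == "__main__":
    # Hypothetical MTUs: plain Ethernet, and a smaller path through a tunnel.
    for mtu in (1500, 576):
        for block in (8192, 1024):
            n = fragment_count(block, mtu)
            loss = datagram_loss(0.01, n)   # assumed 1% per-fragment loss
            print(f"mtu={mtu} block={block}: {n} fragment(s), "
                  f"datagram loss {loss:.1%}")
```

Since losing any single fragment forces the whole 8K request to be
retransmitted after an RPC timeout, a smaller block size that fits in
one packet should fare much better over a lossy path.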
