[2984] in SIPB_Linux_Development
Hanging install
daemon@ATHENA.MIT.EDU (Greg Hudson)
Sun Sep 17 19:13:57 2000
Date: Sun, 17 Sep 2000 19:13:51 -0400
Message-Id: <200009172313.TAA06200@egyptian-gods.MIT.EDU>
From: Greg Hudson <ghudson@MIT.EDU>
To: linux-dev@mit.edu
I've been testing the install on nephthys.mit.edu, an e-machine (Cyrix
processor, 32MB memory) at my house, which talks to the net through a
ted-williams tunnel. The rtt from my machine to MITnet is usually in
the 20-30ms range, but sometimes a little worse. The rtt to charon is
somewhat worse, probably on account of 18.181 being a busy net. I get
30.57/76.7/142.6ms (min/avg/max) from a recent ping.
The install seems to like to hang. The first time I tried, it hung
after installing ~1 package; the second time, it hung slightly before
then (while displaying that it was formatting target partitions), and
the third time it hung after installing all the packages and after I
clicked next on the "make boot disk" screen.
Each time, I could move the mouse cursor, and I could switch to vt2
and verify that the NFS mount was still quite healthy and that I could
still write to the target disk. I have a 100MB swap partition and
"free" reports 81MB free in it, so I'm not running out of swap.
Oh, just noticed some relevant kernel messages:
nfs: server sipb-nfs.mit.edu not responding, still trying
nfs: task 8325 can't get a request slot
nfs: task 8326 can't get a request slot
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu not responding, still trying
nfs: task 8515 can't get a request slot
nfs: server sipb-nfs.mit.edu OK
nfs: task 8516 can't get a request slot
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu not responding, still trying
nfs: server sipb-nfs.mit.edu OK
nfs: server sipb-nfs.mit.edu not responding, still trying
nfs: server sipb-nfs.mit.edu OK
I can't correlate "task N" with the running python processes; numbers
like 8325 are definitely not pids, since the current pid has only
gotten up to 700 or so in my current shell. But I'm willing to bet
that those are the anaconda processes.
I guess the problem results from a combination of:
* NFS over UDP using an 8K block size performs miserably over
  a tunnel. The packets have to be fragmented and tcpdump
  frequently shows IP fragment reassembly timeouts.
* The Linux kernel doesn't seem to recover properly from NFS
  timeouts.
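
To put rough numbers on the first point, here is a minimal sketch of the fragmentation arithmetic. It assumes a 1500-byte MTU, IPv4 headers without options, and it ignores the RPC/NFS header overhead (so the real datagrams are slightly larger). The 1% per-fragment loss rate is purely illustrative and is not a measurement from this link; the function names are made up for the example.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes


def fragments_per_datagram(payload_bytes, mtu=1500):
    """Number of IP fragments needed to carry one UDP datagram."""
    # The UDP header is part of the data that IP has to fragment.
    datagram = payload_bytes + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


def datagram_loss(fragment_loss, fragments):
    """Chance that a datagram is lost, given independent fragment loss.

    Losing any single fragment means the whole datagram is discarded
    once the reassembly timer expires.
    """
    return 1 - (1 - fragment_loss) ** fragments


if __name__ == "__main__":
    for block in (8192, 1024):
        n = fragments_per_datagram(block)
        p = datagram_loss(0.01, n)  # 1% fragment loss is illustrative only
        print(f"{block}-byte block: {n} fragment(s), "
              f"datagram loss about {p:.1%}")
```

Under these assumptions an 8K block needs six fragments where a 1K block needs one, so the same per-packet loss rate costs several times as many whole-request retransmissions.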