[757] in Release_7.7_team


Problems with SGI install

daemon@ATHENA.MIT.EDU (Bill Cattey)
Wed Oct 16 20:14:21 1996

Date: Wed, 16 Oct 1996 20:13:56 -0400 (EDT)
From: Bill Cattey <wdc@MIT.EDU>
To: dcns-cluster@MIT.EDU
Cc: ops@MIT.EDU, jis@MIT.EDU, mbarker@MIT.EDU, release-team@MIT.EDU,
        ghudson@MIT.EDU, vrt@MIT.EDU

It turns out that the WHIRR outage took out the partition that
contains the bits for phase two of the SGI install.  Two machines
(and probably STROBE as well, which we will confirm tomorrow) completed
phase one and then died because most of the files were unavailable for
install.

To detect whether a machine was hurt during an install by whirr being
offline:

Boot the machine single-user and check the disk usage on the root
partition.  It should be greater than 500 MB.  If it is only about
160 MB, the machine missed the bits that whirr would have served.

Here is the procedure (developed by Greg Hudson, amended by me):

	* Reboot the machine.  As it comes up, click on "Stop for
	  Maintenance."

	* Choose the option to go to the command monitor.

	* Type "single"

	* Type the root password at the single-user prompt.  You
	  should get a shell.

	* Type "df -k /"

If the number under "use" (in KB) is around 160000 instead of over
500000, then the whirr outage is the most likely cause of the problem.
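
The check above can be sketched as a small Bourne shell function.  This
is only an illustration: the name classify_root_usage is made up here,
and treating everything at or below 500000 KB as "incomplete" is an
assumption, since the message only gives the two figures 160000 and
500000.

```shell
# Classify the "use" figure (in KB) read from `df -k /`.
# The 500000 KB threshold comes from the message; the function name and
# the handling of in-between values are assumptions for illustration.
classify_root_usage() {
    used_kb=$1
    # Reject empty or non-numeric input rather than guessing.
    case $used_kb in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac
    if [ "$used_kb" -gt 500000 ]; then
        echo "complete"
    else
        echo "incomplete"
    fi
}
```

For example, "classify_root_usage 160000" prints "incomplete".  On a
system whose df supports the POSIX -P flag, the figure could be taken
from the third column of the second line of "df -kP /"; the column
layout on the SGI may differ, so there it is safer to read the number
by eye.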

You can recover by reinstalling the machine.

If there are precious bits on the disk that a reinstall would blow
away, you can do the following to fool the machine into completing the
in-progress install:

	* Run the startup scripts whose names begin with S0 through
	  S4 (csh syntax):

		cd /etc/rc2.d
		foreach i (S[0-4]*)
			./$i start
		end

	* Edit /etc/athena/version and change "8.0H" to "8.0F".

	* Run:

		/srvd/update_ws reactivate; reboot
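
For anyone who would rather not hand-edit the version file, the edit
step above could be sketched in Bourne shell as follows.  The function
name set_version_tag is hypothetical, and the sketch assumes the string
"8.0H" appears literally in the file; the file's full format is not
shown in this message.

```shell
# Replace "8.0H" with "8.0F" (first occurrence on each line) in the
# given file, keeping a backup copy next to it.
# Assumption: the version string appears literally as "8.0H".
set_version_tag() {
    file=$1
    cp "$file" "$file.orig" || return 1
    # Write to a temporary file first; older sed has no in-place option.
    sed 's/8\.0H/8.0F/' "$file.orig" > "$file.new" && mv "$file.new" "$file"
}
```

It would be invoked as "set_version_tag /etc/athena/version" before
running the update_ws step.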
