[17401] in Athena Bugs


Re: Solaris clients with filesystem corruption

daemon@ATHENA.MIT.EDU (Greg Hudson)
Thu Dec 2 07:52:55 1999

Message-Id: <199912021252.HAA05719@small-gods.mit.edu>
To: Garry Zacheiss <zacheiss@MIT.EDU>
cc: bugs@MIT.EDU
In-Reply-To: Your message of "Wed, 01 Dec 1999 22:11:40 EST."
             <199912020311.WAA93287@oliver.mit.edu> 
Date: Thu, 02 Dec 1999 07:52:47 -0500
From: Greg Hudson <ghudson@MIT.EDU>

> At the time, I speculated that this was a syncconf bug, and I still
> think it seems likely.

After some inspection, I think you're wrong about it being a bug in
syncconf, but I think you're right that syncconf is encouraging this
behavior.

The general sequence of events is this:

	* Public machine boots and runs syncconf

	* syncconf moves aside the four network configuration files
	  and writes out new versions.  There is a short window of
	  time between moving them aside and writing out new versions
	  during which time a reboot would hose us (/etc/hostname.hme0
	  doesn't exist), but we're not getting bitten by that as far
	  as we know.

	* Standard FFS semantics are that the inode and directory
	  entry for the new versions of the files get written
	  synchronously out to disk, so on disk the four files are
	  created with size 0.  Since the files were not fsync()'d
	  during writing, the contents of the files are not
	  synchronously written to disk.  So there is another window
	  of indefinite length during which time an unclean reboot
	  will hose us (/etc/hostname.hme0 has zero length).
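To make the second window concrete, here is a minimal sketch of writing a file so that its contents, not just its inode and directory entry, are on disk before the writer moves on. It is in Python for brevity and is not syncconf's code (syncconf is a shell script); the function name write_synced is hypothetical. The relevant call is os.fsync(), which wraps fsync(2).

```python
import os

def write_synced(path, data):
    """Write bytes to path and force them to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        # Push Python's userspace buffer down to the kernel.
        f.flush()
        # Ask the kernel to write the file's data blocks to disk.
        # Without this, the file can exist on disk with size 0 until
        # the kernel flushes it on its own schedule.
        os.fsync(f.fileno())
```

This only narrows the zero-length window. The earlier window, where the old file has been moved aside and the new one does not exist yet, is untouched by fsync().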

I can make the problem significantly better by making syncconf create
new versions of the files and only move them into place if they differ
from the old versions.  I'm not sure if I can fix the problem entirely
without writing C code, since I don't know how to get a shell script
to fsync() a file.
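A rough sketch of that approach, again in Python rather than shell or C, and under stated assumptions: the helper name install_if_changed and the ".new" suffix are made up for illustration, and the temporary file is assumed to live in the same directory as the target so that the rename stays within one filesystem.

```python
import filecmp
import os

def install_if_changed(path, new_data):
    """Write new_data next to path; move it into place only if it differs.

    Returns True if path was replaced, False if it was left alone.
    """
    tmp = path + ".new"  # hypothetical temporary name
    with open(tmp, "wb") as f:
        f.write(new_data)
        f.flush()
        os.fsync(f.fileno())  # contents reach disk before any rename

    # Byte-for-byte comparison against the existing file, if there is one.
    if os.path.exists(path) and filecmp.cmp(tmp, path, shallow=False):
        os.unlink(tmp)  # nothing changed: the old file is never touched
        return False

    # rename(2) replaces the target atomically, so there is no moment
    # when path is missing.
    os.rename(tmp, path)
    return True
```

In the common case where the configuration has not changed, the existing file is never touched, so neither window opens. When it has changed, the new file is fsync()'d before it is renamed over the old one, which is the part a plain shell script cannot easily do.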

Incidentally, yesterday I gave cluster (well, Lou and Chris)
instructions on how to fix machines with this problem without
reinstalling them.

