[17399] in Athena Bugs


Solaris clients with filesystem corruption

daemon@ATHENA.MIT.EDU (Greg Hudson)
Wed Dec 1 21:41:22 1999

Date: Wed, 1 Dec 1999 21:41:14 -0500 (EST)
Message-Id: <199912020241.VAA03418@small-gods.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: bugs@mit.edu

Today I looked at a couple of machines in the field which had
experienced corruption on the root filesystem, probably after an
unclean shutdown of some kind.

The first machine was an Ultra 5.  It had a four-letter name I don't
recall right now.  At boot time, it got stuck on "retrying host
configuration", which is a message from /etc/init.d/rootusr.  The
cause of the problem was that /etc/hostname.hme0 had been truncated to
zero length.  There was a pile of fsck errors on the / and /var
partitions (I would guess also on the /usr partition, but I didn't try
that).  Mounting the root partition read-write manually and running
syncconf caused the machine to come up okay.

The second machine was hudson, a Sparc 5.  It was booting fine but
xlogin was complaining "workstation failed to activate successfully."
The machine wasn't getting cluster information.  I found two
instances of corruption: /etc/athena/version had a bunch of NUL bytes
appended to it, and /etc/named.conf had been truncated to zero length.
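
The two symptoms seen on these machines (files truncated to zero
length, and files with NUL bytes appended) are easy to test for
mechanically.  What follows is only a rough sketch in Python, not an
existing Athena tool: the function name check_file and the list of
paths are assumptions made for illustration, the paths being just the
three files mentioned in this report.

```python
import sys


def check_file(path):
    """Return a list of corruption symptoms found in the file at path.

    Hypothetical helper: it looks only for the two symptoms described
    in this report (zero length, trailing NUL bytes).
    """
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError as exc:
        return ["unreadable: %s" % exc]

    problems = []
    if len(data) == 0:
        problems.append("zero length")
    elif data.endswith(b"\0"):
        # Count how many NUL bytes were tacked onto the end of the file.
        padding = len(data) - len(data.rstrip(b"\0"))
        problems.append("%d trailing NUL bytes" % padding)
    return problems


if __name__ == "__main__":
    # Illustrative default list: the files named in this report.
    default_paths = [
        "/etc/hostname.hme0",
        "/etc/athena/version",
        "/etc/named.conf",
    ]
    for path in sys.argv[1:] or default_paths:
        for problem in check_file(path):
            print("%s: %s" % (path, problem))
```

Run with no arguments it checks the three files above; any other
paths can be given on the command line.  A clean file produces no
output.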

Lou claims that the first problem is happening very frequently in the
field since the last patch release.  I don't know what the patch
release could have to do with it, really.  I have no idea what is
causing these (presumed) unclean shutdowns, or why the local
filesystems are experiencing corruption as a result.
