[7259] in testers
Re: 9.4.9 linux machines with trashed cache partitions
daemon@ATHENA.MIT.EDU (Jonathon Weiss)
Thu Jul 14 09:01:56 2005
Message-Id: <200507141301.j6ED1i6h002881@distraction.mit.edu>
From: Jonathon Weiss <jweiss@MIT.EDU>
To: Kevin Chen <kchen@MIT.EDU>
cc: Jonathon Weiss <jweiss@MIT.EDU>, testers@MIT.EDU
In-reply-to: Your message of "Thu, 14 Jul 2005 08:33:00 EDT."
<Pine.LNX.4.62L.0507140832001.19175@scyther.mit.edu>
Date: Thu, 14 Jul 2005 09:01:44 -0400
> On Thu, 14 Jul 2005, Jonathon Weiss wrote:
>
> > Something rebooted a bunch of the w20 cluster early-test linux
> > machines shortly after noon yesterday (they were still running 9.4.9,
> > mail on that soon). Almost all of them failed to come back up on
> > their own. The common failure was a failed fsck on the afs cache
> > partition, because files contain "Illegal blocks". I mkfs'd the cache
> > partition, since that was even easier than fsck -y and they all came
> > up fine. I'm a little concerned about the underlying cause, and we
> > should probably do a little testing on something that has updated to
> > 9.4.10.
>
> I needed to do this every time I rebooted as a result of a kernel oops.
> Whether that's not supposed to happen, I don't know, of course.
Oh, it's fairly clear this isn't *supposed* to happen. The real
question is whether it is another symptom of the same problem as the
OOPSes or, failing that, another bug eliminated by backing out the
newest AFS client.
Jonathon