[7928] in testers
Re: 3 comments
daemon@ATHENA.MIT.EDU (Geoffrey Thomas)
Fri Mar 13 17:08:59 2009
Date: Fri, 13 Mar 2009 17:08:09 -0400 (EDT)
From: Geoffrey Thomas <geofft@MIT.EDU>
To: Christine L Moulen <orbitee@MIT.EDU>
cc: Michael Khusid <mkhusid@MIT.EDU>, testers@MIT.EDU
In-Reply-To: <1236975520.813.83.camel@guru.mit.edu>
Message-ID: <alpine.LRH.2.00.0903131700580.10779@oliver.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
On Fri, 13 Mar 2009, Christine L Moulen wrote:
> On Fri, 2009-03-13 at 15:27 -0400, Michael Khusid wrote:
>> 1. I just ran into a hung Debathena workstation m66-080-1. I can see a
>> kernel panic on the screen. The machine has not rebooted.
>
> I've had my workstation lock up like this too. Network was flaky, lost
> AFS, and it could not unmount AFS on a reboot attempt. I eventually
> power-cycled it.

There's a purported fix for this in

  /mit/debathena/openafs-testing/openafs-modules-intrepid.deb

The patch applied, if you're curious (or don't run Intrepid with a
-generic kernel), is

  /afs/andrew/usr/cg2v/cbr-only-free-what-you-alloc.diff

It's installed on about half of the cluster machines, and it seems to
significantly reduce how often you hit this error, but there are reports
it doesn't eliminate it entirely.

I'd be interested in seeing the backtraces of machines that crash with
this patch applied, if any.

--
Geoffrey Thomas
geofft@mit.edu