[7928] in testers

Re: 3 comments

daemon@ATHENA.MIT.EDU (Geoffrey Thomas)
Fri Mar 13 17:08:59 2009

Date: Fri, 13 Mar 2009 17:08:09 -0400 (EDT)
From: Geoffrey Thomas <geofft@MIT.EDU>
To: Christine L Moulen <orbitee@MIT.EDU>
cc: Michael Khusid <mkhusid@MIT.EDU>, testers@MIT.EDU
In-Reply-To: <1236975520.813.83.camel@guru.mit.edu>
Message-ID: <alpine.LRH.2.00.0903131700580.10779@oliver.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII

On Fri, 13 Mar 2009, Christine L Moulen wrote:
> On Fri, 2009-03-13 at 15:27 -0400, Michael Khusid wrote:
>> 1. I just ran into a hung Debathena workstation m66-080-1.  I can see a
>> kernel panic on the screen.  The machine has not rebooted.
>
> I've had my workstation lock up like this too.  Network was flaky, lost
> AFS, and it could not unmount AFS on a reboot attempt.  I eventually
> power-cycled it.

There's a purported fix for this in
   /mit/debathena/openafs-testing/openafs-modules-intrepid.deb

The patch applied, if you're curious (or don't run Intrepid with a 
-generic kernel), is
   /afs/andrew/usr/cg2v/cbr-only-free-what-you-alloc.diff

It's installed on about half of the cluster machines, and it seems to 
significantly reduce how often you hit this error, but there are reports 
that it doesn't eliminate it entirely.

I'd be interested in seeing the backtraces of machines that crash with 
this patch applied, if any.

-- 
Geoffrey Thomas
geofft@mit.edu
