[27207] in Athena Bugs


Re: 9.4.46 linux VMs hanging

daemon@ATHENA.MIT.EDU (andrew m. boardman)
Thu Jun 19 18:30:43 2008

Message-Id: <200806192230.m5JMUU3t024407@pothole.mit.edu>
To: Jonathon Weiss <jweiss@mit.edu>
In-Reply-To: Your message of "Mon, 05 May 2008 16:35:31 EDT."
	<200805052035.m45KZVV9008393@speaker-for-the-dead.mit.edu> 
Date: Thu, 19 Jun 2008 18:30:30 -0400
From: "andrew m. boardman" <amb@mit.edu>
X-Spam-Flag: NO
X-Spam-Score: 0.00
Cc: mmanley@mit.edu, bugs@mit.edu
Errors-To: bugs-bounces@mit.edu


> Both Mark Manley and I have noted that newly installed (since the
> installer changes last week) 9.4.46 virtual linux machines (in my
> case, on VMware server, in his, on 2 different versions of Xen)
> consistently hang in the middle of running mkserv ops.

This is very odd.  This:

> do_IRQ: stack overflow: 336
>  [<c010795b>]

...happens for me some of the time, but I also get a variety of different
kernel panics with stack dumps, all generally related to AFS-generated
network traffic.  This is not surprising, as the actual crash during
"mkserv ops" is consistently happening during
"cp -p ${ops}/bin/gtar /var/ops/bin/gtar".  I can actually trigger a
crash just with "ls ~~ops/bin/gtar", though not (so far) with any other
file.

Also note that occasionally it just works, after which all is well.

Running 9.4.43 won't necessarily save you; I've also seen it there,
occasionally with the default pcnet32 driver, frequently with the e1000
driver, and (so far) always with the "native" vmxnet driver.  (For
enhanced crashability during testing, I recommend clearing the AFS cache
at boot time and removing /var/ops.)

I've also tested with OpenAFS 1.4.7, and it still crashes.  I need to set
this aside for a bit, but I plan to dig into the AFS side of things more
and see if I can figure out what's going on.  (There may be a fix in 1.5
somewhere.)
