[27207] in Athena Bugs
Re: 9.4.46 linux VMs hanging
daemon@ATHENA.MIT.EDU (andrew m. boardman)
Thu Jun 19 18:30:43 2008
Message-Id: <200806192230.m5JMUU3t024407@pothole.mit.edu>
To: Jonathon Weiss <jweiss@mit.edu>
In-Reply-To: Your message of "Mon, 05 May 2008 16:35:31 EDT."
<200805052035.m45KZVV9008393@speaker-for-the-dead.mit.edu>
Date: Thu, 19 Jun 2008 18:30:30 -0400
From: "andrew m. boardman" <amb@mit.edu>
X-Spam-Flag: NO
X-Spam-Score: 0.00
Cc: mmanley@mit.edu, bugs@mit.edu
Errors-To: bugs-bounces@mit.edu
> Both Mark Manley and I have noted that newly installed (since the
> installer changes last week) 9.4.46 virtual linux machines (in my
> case, on VMware server, in his, on 2 different versions of Xen)
> consistently hang in the middle of running mkserv ops.
This is very odd. This:
> do_IRQ: stack overflow: 336
> [<c010795b>]
...happens for me some of the time, but I also get a variety of different
kernel panics with stack dumps, all generally related to AFS-generated
network traffic. This is not surprising, as the crash during "mkserv ops"
consistently happens at the
"cp -p ${ops}/bin/gtar /var/ops/bin/gtar" step. I can actually trigger a
crash just with "ls ~~ops/bin/gtar", though not (so far) with any other
file.
Also note that occasionally it just works, after which all is well.
Running 9.4.43 won't necessarily save you; I've also seen it there,
occasionally with the default pcnet32 driver, frequently with the e1000
driver, and (so far) always with the "native" vmxnet driver. (For
enhanced crashability during testing, I recommend clearing the AFS cache
at boot time and removing /var/ops.)
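For anyone trying to reproduce this, here is a minimal sketch of a probe loop. The function names (probe, probe_loop) and the default repeat count are made up for illustration and are not part of mkserv. It deliberately does not clear the AFS cache or remove /var/ops; do those by hand first, since the right commands depend on the install. All it does is log before and after each access, so that if the VM hangs, the last line on the console shows which access was in flight.

```shell
#!/bin/sh
# Hypothetical repro helper: names and defaults are assumptions for
# illustration, not anything shipped with Athena or mkserv.

# Access one path, logging before and after. If the machine hangs, the
# last "probe start" line printed identifies the access that triggered it.
probe() {
    path=$1
    echo "probe start: $path"
    # Best-effort flush so the log line is more likely to survive a hang.
    sync 2>/dev/null || true
    if ls -l "$path" >/dev/null 2>&1; then
        echo "probe ok: $path"
    else
        echo "probe failed: $path"
        return 1
    fi
}

# Repeat the probe, since the crash does not happen on every attempt.
# Usage: probe_loop PATH [COUNT]   (COUNT defaults to 5)
probe_loop() {
    path=$1
    count=${2:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        probe "$path" || return 1
        i=$((i + 1))
    done
}
```

With ${ops} set the way the mkserv script sets it, something like `probe_loop "${ops}/bin/gtar" 10` would exercise the same file as the failing cp step. Running it against a different file in the same directory gives a comparison case.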
I've also tested with OpenAFS 1.4.7, and it still crashes. I need to set
this aside for a bit, but I plan to dig into the AFS side of things more
and see if I can figure out what's going on. (There may be a fix in 1.5
somewhere.)