[3625] in Release_7.7_team
9.1.19 blocking on Linux kernel issue
daemon@ATHENA.MIT.EDU (Greg Hudson)
Tue Dec 10 16:50:28 2002
Date: Tue, 10 Dec 2002 16:50:22 -0500
Message-Id: <200212102150.QAA00514@error-messages.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@MIT.EDU
When I went to sanity-check the 9.1.19 release for Linux, I found that
the update hung after:

  Beginning update from 9.1.18 to 9.1.19 at Tue Dec 10 15:49:24 EST 2002.
  Preparing: ###########################################
  + athena-ntp ########################################### [ 5%]
  + athena-athneteventd ########################################### [ 10%]
  + athena-krb5 ########################################### [ 15%]
  + athena-athstatusd ########################################### [ 21%]
  + athena-zephyr ########################################### [ 26%]
  + kernel ###########################################

(The machine isn't hung, just the update process.) A process listing
reveals that the hanging command is:

  root 14153 99.6 0.0 1372 416 pts/1 R 15:50 39:40 umount /tmp/initrd.mnt.VedGyI

An strace of this process doesn't reveal any system calls going on.
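
As a rough illustration of how to read that listing: state R at 99.6% CPU, with strace showing no system calls, is consistent with the process spinning inside a single call in the kernel rather than sleeping. The sketch below parses a `ps aux` row and applies that heuristic. The helper names and the 90% threshold are made up for illustration, and the standard 11-column `ps aux` layout is assumed.

```python
from typing import NamedTuple


class PsEntry(NamedTuple):
    user: str
    pid: int
    cpu_percent: float
    state: str
    command: str


def parse_ps_aux_line(line: str) -> PsEntry:
    """Parse one data row of `ps aux` output."""
    # Assumed columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    # Split at most 10 times so the command keeps its own spaces.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError(f"not a ps aux row: {line!r}")
    return PsEntry(
        user=fields[0],
        pid=int(fields[1]),
        cpu_percent=float(fields[2]),
        state=fields[7],
        command=fields[10].rstrip(),
    )


def looks_like_busy_loop(entry: PsEntry, cpu_threshold: float = 90.0) -> bool:
    """Heuristic only: runnable (R) and pegging the CPU.

    This cannot tell a user-space loop from a kernel-space one; that
    takes something like strace, as described in the message.
    """
    return entry.state.startswith("R") and entry.cpu_percent >= cpu_threshold


if __name__ == "__main__":
    row = "root 14153 99.6 0.0 1372 416 pts/1 R 15:50 39:40 umount /tmp/initrd.mnt.VedGyI"
    entry = parse_ps_aux_line(row)
    print(entry.pid, entry.command, looks_like_busy_loop(entry))
```

A sleeping or uninterruptible process (state S or D) would not match, so the check separates "stuck waiting" from "stuck spinning", which is the distinction that matters here.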
If you try to reboot the machine in this state, then either (a) if
it's a lilo machine, the old kernel will boot with the old initrd.
This will work fine, but the update will hang again, assuming the
rpm database is sane enough to allow an update; or (b) if it's a grub
machine, the new kernel will try to boot with the old initrd, which
might or might not work.
Derek pointed out a similar problem in source-reviewers [9854] ("the
build sequence hangs on 'umount' of the initrd and puts the machines
into a weird state"; he later noted that he didn't experience the
problem in a RH7.3 VMware install, but he didn't say what kernel that
was with). So the problem isn't limited to my test machine; however, I
can't find anything in Red Hat bugzilla about it.
Some conceivable ways out:
* If we could identify a source patch for the problem, we could
conceivably build fixed vmlinuz images and have them subbed in by
the update script (followed by a reboot) before the update is
done. That's assuming an initrd will continue working if the
kernel code is modified slightly.
* We could make the update build a user-mode Linux kernel with a
working loop filesystem (hopefully the new Red Hat kernel
qualifies), "boot" that, and perform the mkinitrd there.
* We could (temporarily) install a Red Hat kernel RPM which contains
a modified /sbin/mkinitrd, which produces the initrd in some other
way. The "some other way" is hard to identify; I suppose we could
have pre-built initrds in AFS for various configurations, and have
the modified mkinitrd simply figure out which one to grab. Or
maybe it turns out that the kernel changes are minimal enough that
simply using the old initrd works in all cases.
* Naturally, we can put off the problem by simply not including the
kernel upgrade as part of 9.1.19. But that won't make it go away.
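
To make the pre-built-initrd option a little more concrete, here is a minimal Python sketch of the "figure out which one to grab" step. Everything about it is an assumption: the directory layout (`<root>/<kernel-version>/initrd-<modules>.img`), the idea of keying a configuration by its list of boot modules, and the function names are all hypothetical, and nothing like this exists in the release today.

```python
import os
from typing import Iterable, Optional


def prebuilt_initrd_path(root: str, kernel_version: str, modules: Iterable[str]) -> str:
    """Build the path where a pre-built initrd would live.

    Hypothetical layout: <root>/<kernel_version>/initrd-<sorted modules>.img
    Sorting makes the name independent of the order modules are listed in.
    """
    config = "-".join(sorted(modules)) or "none"
    return os.path.join(root, kernel_version, f"initrd-{config}.img")


def pick_prebuilt_initrd(root: str, kernel_version: str, modules: Iterable[str]) -> Optional[str]:
    """Return the matching pre-built image, or None if there isn't one.

    A caller would need a fallback for None (for example, refusing to
    install the new kernel) instead of booting without an initrd.
    """
    path = prebuilt_initrd_path(root, kernel_version, modules)
    return path if os.path.isfile(path) else None
```

The hard part this sketch skips is deciding which module sets need images at all; a machine whose configuration has no pre-built match gets None, and the update would have to handle that case explicitly.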
It's kind of a depressing situation overall. Speak up if you have any
information I don't.