[3625] in Release_7.7_team
9.1.19 blocking on Linux kernel issue
daemon@ATHENA.MIT.EDU (Greg Hudson)
Tue Dec 10 16:50:28 2002
Date: Tue, 10 Dec 2002 16:50:22 -0500
Message-Id: <200212102150.QAA00514@error-messages.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@MIT.EDU
When I went to sanity-check the 9.1.19 release for Linux, I found that
the update hung after:

    Beginning update from 9.1.18 to 9.1.19 at Tue Dec 10 15:49:24 EST 2002.
    Preparing:                  ###########################################
    + athena-ntp                ########################################### [  5%]
    + athena-athneteventd       ########################################### [ 10%]
    + athena-krb5               ########################################### [ 15%]
    + athena-athstatusd         ########################################### [ 21%]
    + athena-zephyr             ########################################### [ 26%]
    + kernel                    ###########################################

(The machine isn't hung, just the update process.)  A process listing
reveals that the hanging command is:

    root     14153 99.6  0.0  1372  416 pts/1    R    15:50  39:40 umount /tmp/initrd.mnt.VedGyI

An strace of this process doesn't reveal any system calls going on.
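
For anyone who wants to check other machines for the same symptom, here
is a rough sketch (not something we ship) that scans "ps aux"-style
output for a umount process that is runnable and eating nearly all of a
CPU, as in the listing above.  The function names and the 90% threshold
are my own assumptions, and it assumes the usual eleven-column "ps aux"
layout.

```python
import subprocess


def parse_ps_line(line):
    """Parse one line of `ps aux` output; return None if it does not parse.

    Assumed column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    fields = line.split(None, 10)
    if len(fields) < 11:
        return None
    try:
        return {
            "user": fields[0],
            "pid": int(fields[1]),
            "cpu": float(fields[2]),
            "stat": fields[7],
            "time": fields[9],
            "command": fields[10].rstrip("\n"),
        }
    except ValueError:
        # Header line or anything else with non-numeric PID/%CPU.
        return None


def spinning_umounts(ps_lines, cpu_threshold=90.0):
    """Return umount processes in state R at or above cpu_threshold percent."""
    hits = []
    for line in ps_lines:
        proc = parse_ps_line(line)
        if proc is None:
            continue
        is_umount = proc["command"].split()[0] == "umount"
        if is_umount and proc["stat"].startswith("R") and proc["cpu"] >= cpu_threshold:
            hits.append(proc)
    return hits


if __name__ == "__main__":
    # Hypothetical usage: inspect the live process table.
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    for proc in spinning_umounts(out.splitlines()):
        print(proc["pid"], proc["cpu"], proc["command"])
```

A process that shows up here with no system calls visible under strace
is most likely spinning inside the kernel, which matches what I saw.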
If you try to reboot the machine in this state, then either (a) if
it's a lilo machine, the old kernel will boot with the old initrd.
This will work fine, but the update will hang again, assuming the
rpm database is sane enough to allow an update; or (b) if it's a grub
machine, the new kernel will try to boot with the old initrd, which
might or might not work.
Derek pointed out a similar problem in source-reviewers [9854] ("the
build sequence hangs on 'umount' of the initrd and puts the machines
into a weird state"; he later noted that he didn't experience the
problem in a RH7.3 VMware install, but he didn't say what kernel that
was with).  So the problem isn't limited to my test machine; however, I
can't find anything in Red Hat bugzilla about it.
Some conceivable ways out:
  * If we could identify a source patch for the problem, we could
    conceivably build fixed vmlinuz images and have them subbed in by
    the update script (followed by a reboot) before the update is
    done.  That's assuming an initrd will continue working if the
    kernel code is modified slightly.
  * We could make the update build a user-mode Linux kernel with a
    working loop filesystem (hopefully the new Red Hat kernel
    qualifies), "boot" that, and perform the mkinitrd there.
  * We could (temporarily) install a Red Hat kernel RPM which contains
    a modified /sbin/mkinitrd, which produces the initrd in some other
    way.  The "some other way" is hard to identify; I suppose we could
    have pre-built initrds in AFS for various configurations, and have
    the modified mkinitrd simply figure out which one to grab.  Or
    maybe it turns out that the kernel changes are minimal enough that
    simply using the old initrd works in all cases.
  * Naturally, we can put off the problem by simply not including the
    kernel upgrade as part of 9.1.19.  But that won't make it go away.
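
To make the pre-built initrd idea a little more concrete, here is a
minimal sketch of the "figure out which one to grab" step.  Everything
in it is hypothetical: the catalog, the key format (kernel version plus
the set of modules the initrd must carry), and the function names are
illustrations, not anything that exists in the release today.

```python
def config_key(kernel_version, modules):
    """Build a lookup key from the kernel version and required modules.

    Module order and duplicates are ignored, so two machines needing the
    same set of modules map to the same pre-built image.
    """
    return (kernel_version, tuple(sorted(set(modules))))


def pick_prebuilt_initrd(kernel_version, modules, catalog):
    """Return the path of a matching pre-built initrd, or None.

    `catalog` is a hypothetical dict mapping config_key(...) results to
    image paths.  None means no pre-built image covers this
    configuration, so the caller needs some other fallback.
    """
    return catalog.get(config_key(kernel_version, modules))
```

The catch is visible in the None case: any machine whose module set
isn't in the catalog still needs a working mkinitrd, so this only helps
if the number of distinct configurations is small.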
It's kind of a depressing situation overall.  Speak up if you have any
information I don't.