[7033] in testers
lilo failures under 9.4 on equal-rites
daemon@ATHENA.MIT.EDU (Greg Hudson)
Tue May 24 17:06:04 2005
Date: Tue, 24 May 2005 17:05:51 -0400
Message-Id: <200505242105.j4OL5pif011297@men-at-arms.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: testers@MIT.EDU
equal-rites, one of my work machines, has been getting stuck at "LI"
upon rebooting after patch releases which involve kernel upgrades. It
is clear that lilo is getting run successfully (twice, even), but for
some reason the result doesn't take effect at the next boot.
If, after the boot failure, I boot off an Athena install CD, mount the
drives, chroot to the machine environment, run "lilo", and reboot,
that restores functionality until the kernel is upgraded again.
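For anyone who needs to repeat the recovery, here is a rough sketch of
the sequence above as a small Python helper. The device names and the
/mnt/sysimage mount point are assumptions for illustration, not what
equal-rites actually uses, and the helper defaults to a dry run that
only prints the commands.

```python
import subprocess


def lilo_rescue_plan(root_dev, boot_dev, mount_point="/mnt/sysimage"):
    """Return, in order, the commands for re-running lilo from rescue media.

    root_dev, boot_dev and mount_point are caller-supplied; the defaults
    and example values here are hypothetical.
    """
    return [
        ["mount", root_dev, mount_point],
        ["mount", boot_dev, mount_point + "/boot"],
        # Run the installed system's lilo against its own lilo.conf.
        ["chroot", mount_point, "/sbin/lilo", "-v"],
        ["umount", mount_point + "/boot"],
        ["umount", mount_point],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical devices; substitute the machine's real partitions.
run_plan(lilo_rescue_plan("/dev/hda2", "/dev/hda1"))
```

Calling run_plan(..., dry_run=False) as root from the rescue
environment would actually execute the steps, after which a reboot
completes the procedure.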
Red Hat has grown support for automatically updating lilo.conf and
re-running lilo when a kernel is upgraded, which means the rpmupdate
support for doing the same thing is vestigial. But the rpmupdate code
is not the cause here; I can reproduce the problem using just "rpm
-Uvh" to upgrade the kernel RPM, with or without running lilo by hand
afterwards.
/ and /boot are ext3 filesystems (although the machine seems very fond
of fscking the / filesystem regardless, so maybe it has no journal);
/usr/vice/cache is an ext2 filesystem.
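Whether / really has a journal can be checked rather than guessed.
Below is a minimal sketch that looks for the has_journal feature in the
text printed by "tune2fs -l <device>". The sample text is made up for
illustration; on a real machine the input would come from running
tune2fs as root against the actual root device.

```python
def has_journal(tune2fs_output):
    """Return True or False from `tune2fs -l` text, or None if the
    "Filesystem features" line is missing."""
    for line in tune2fs_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Filesystem features":
            return "has_journal" in value.split()
    return None


# Made-up excerpt in the format tune2fs -l prints.
SAMPLE = """\
Filesystem volume name:   /
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean
"""

print(has_journal(SAMPLE))
```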
http://seclists.org/lists/linux-kernel/2003/May/2410.html suggests
that I'm not totally alone here, and the problem might be related to
the journaled nature of ext3. There was no closure on that thread,
however.
I'm not sure if one can safely and/or effectively migrate from ext3 to
ext2, but I tried unmounting /boot, changing it to ext2 in /etc/fstab,
and remounting it, and then performing a kernel upgrade. It still
failed, so either the kernel was still treating the filesystem as ext3
or the problem does not derive from ext3.
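One way to tell those two cases apart would be to look at what the
kernel reports in /proc/mounts after the remount, instead of trusting
/etc/fstab. A minimal sketch follows; the sample text is invented, and
on a live system the input would be the contents of /proc/mounts.

```python
def mounted_fstype(mounts_text, mount_point):
    """Return the filesystem type the kernel reports for mount_point,
    or None if it is not mounted.

    Each line of /proc/mounts reads: device mountpoint fstype options ...
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # A later entry for the same mount point shadows earlier
            # ones (for example "rootfs" followed by the real root).
            fstype = fields[2]
    return fstype


# Invented sample; device names are hypothetical.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/dev/hda1 /boot ext2 rw 0 0
"""

print(mounted_fstype(SAMPLE_MOUNTS, "/boot"))
```

On the machine itself this would be called as
mounted_fstype(open("/proc/mounts").read(), "/boot").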
I don't know if this problem is restricted to equal-rites; I suspect
not, since I think we saw similar problems on astrophel. Lacking any
good way of debugging this problem, I think we are going to be stuck
with automatically migrating lilo users to grub in 9.4. I haven't
found any Red Hat code to perform this migration automatically (I
looked at the Anaconda Python code and the functionality of grubby),
although "grubby --info" will do a little bit of parsing for us.
So, our options as I see them:
  * Write code to translate lilo.conf to grub.conf (possibly using
    grubby to parse lilo.conf entries for Linux kernels), and run
    grub-install.
  * Write code to create a grub.conf from scratch (possibly adapting
    code from the installer's phase2.backend), and run grub-install.
  * Deny 9.4 upgrades to people using lilo, and tell them to
    reinstall.
I don't think we currently have any way to measure the impact of the
third option.
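To give a feel for the size of the first option, here is a rough
sketch of a lilo.conf to grub.conf translation. It parses lilo.conf
directly rather than going through grubby, skips "other=" sections,
and strips comments naively. The GRUB root "(hd0,0)", the assumption
that /boot is its own partition, the 10-second timeout, and the sample
file contents are all placeholders for illustration, not values taken
from a real machine. It does not run grub-install.

```python
def parse_lilo(text):
    """Split lilo.conf text into (global options, list of image= sections)."""
    global_opts, images = {}, []
    target = global_opts
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key == "image":
            target = {"image": value}
            images.append(target)
        elif key == "other":
            target = {}  # non-Linux section: its options are discarded
        else:
            target[key] = value if sep else True  # bare flags, e.g. read-only
    return global_opts, images


def strip_prefix(path, prefix):
    """Drop the /boot prefix, as needed when /boot is a separate partition."""
    if prefix and path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path


def render_grub_conf(global_opts, images, grub_root="(hd0,0)",
                     boot_prefix="/boot", timeout=10):
    """Render GRUB (legacy) grub.conf text; grub_root is a placeholder."""
    labels = [img.get("label", img["image"]) for img in images]
    default = global_opts.get("default")
    default_index = labels.index(default) if default in labels else 0
    lines = [f"default={default_index}", f"timeout={timeout}"]
    for img, label in zip(images, labels):
        args = []
        if img.get("read-only") or global_opts.get("read-only"):
            args.append("ro")
        root_dev = img.get("root") or global_opts.get("root")
        if root_dev:
            args.append(f"root={root_dev}")
        append = img.get("append") or global_opts.get("append")
        if append:
            args.append(append)
        kernel = strip_prefix(img["image"], boot_prefix)
        lines.append(f"title {label}")
        lines.append(f"\troot {grub_root}")
        lines.append("\tkernel " + " ".join([kernel] + args))
        if "initrd" in img:
            lines.append("\tinitrd " + strip_prefix(img["initrd"], boot_prefix))
    return "\n".join(lines) + "\n"


# Invented lilo.conf; kernel names and devices are hypothetical.
SAMPLE_LILO_CONF = """\
boot=/dev/hda
prompt
timeout=50
default=linux-old
image=/boot/vmlinuz-NEW
	label=linux
	initrd=/boot/initrd-NEW.img
	read-only
	root=/dev/hda2
other=/dev/hda3
	label=dos
image=/boot/vmlinuz-OLD
	label=linux-old
	read-only
	root=/dev/hda2
	append="console=tty0"
"""

print(render_grub_conf(*parse_lilo(SAMPLE_LILO_CONF)))
```

The hard part that this sketch sidesteps is working out the correct
GRUB root device and path prefix for each machine, which is presumably
where adapting the installer's code would help.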