[2884] in Release_7.7_team

Linux update post-mortem

daemon@ATHENA.MIT.EDU (Greg Hudson)
Mon Jul 30 13:44:02 2001

Date: Mon, 30 Jul 2001 13:43:57 -0400
Message-Id: <200107301743.NAA14362@egyptian-gods.MIT.EDU>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@mit.edu

This is a status note about problems people saw in the Linux update
from 8.4 to 9.0.

The most serious problem was that some (relatively small) number of
machines did not update their lilo.conf files due to a bug in the
update system.  The bug goes like this:

	* One of the RPM scriptlets failed.  In the case I know about,
	  it was the athena-ws trigger for finger-server, but it's
	  possible that other scriptlets failed on other machines.

	* rpmupdate doesn't deal with this error very well because
	  it's hard to distinguish it from other errors (like "none of
	  the update happened at all").  As a consequence, it exits
	  without updating lilo.conf or running lilo.
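
As a rough illustration of the behavior one would want instead, here is
a minimal Python sketch.  It is not rpmupdate's actual code: the
UpdateResult type, the post_update_steps function, and the step strings
are all hypothetical, and it assumes the hard part (telling "a scriptlet
failed" apart from "none of the update happened") has already been
solved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateResult:
    """Hypothetical summary of one update run (not a real rpmupdate type)."""
    packages_applied: bool                      # did the package set change at all?
    failed_scriptlets: List[str] = field(default_factory=list)


def post_update_steps(result: UpdateResult) -> List[str]:
    """Return the follow-up steps to perform after the RPM transaction."""
    if not result.packages_applied:
        # Nothing was updated, so the boot configuration must stay as it is.
        return []
    # Packages changed, so the boot loader must be pointed at the new kernel
    # even if some scriptlet failed along the way.
    steps = ["update lilo.conf", "run lilo"]
    if result.failed_scriptlets:
        steps.append("report failed scriptlets: " + ", ".join(result.failed_scriptlets))
    return steps
```

The point of the sketch is only that a scriptlet failure gets reported
without skipping the lilo.conf update.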

Anyway, the machine will take the update mostly normally (the
scriptlet failure is usually harmless in and of itself) and then boot
with the old 2.2.x kernel.  (Even though the kernel file has been
unlinked, its data generally still resides in the same disk blocks as
before, since they usually haven't been overwritten, and lilo knows
where those disk blocks are.)  The old kernel will be unable to load
any modules, so the machine might not make it onto the net.

Only some machines were affected by this problem because the failing
athena-ws scriptlet only fails if athena-ws is updated before
finger-server, and that doesn't seem to have been the normal case.

I did a scan of the public-linux machines and found only three in the
losing state (9.0.x version with a 2.2.x kernel): m12-182-13,
m56-129-27, and nesthorn.  nesthorn is running kernel 2.2.18, so that
one might be deliberate.  I know that some other machines were
affected by the problem and were reinstalled, such as (reportedly) all
of the machines in the 6.001 lab.
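
The "losing state" test used in the scan can be written down as a small
predicate.  This is only a sketch in Python, not the scan that was
actually run: the function name is_losing_state is made up, and it
assumes the Athena release and the running kernel release are available
as dotted version strings.

```python
import re


def is_losing_state(athena_version: str, kernel_release: str) -> bool:
    """True for a machine on a 9.0.x release that is still running a 2.2.x kernel.

    Hypothetical helper; both arguments are assumed to be dotted version
    strings such as "9.0.13" and "2.2.18".
    """
    # Require a "." or end-of-string after the prefix so that, for example,
    # "2.20.1" is not mistaken for a 2.2.x kernel.
    on_new_release = re.match(r"9\.0(\.|$)", athena_version) is not None
    on_old_kernel = re.match(r"2\.2(\.|$)", kernel_release) is not None
    return on_new_release and on_old_kernel
```

Note that a machine like nesthorn would still be flagged by this check;
whether its 2.2.18 kernel is deliberate has to be settled by hand.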

The specific scriptlet problem I know about will be fixed in 9.0.14.
rpmupdate should definitely handle this situation better; I don't know
how many pitfalls I will run into trying to make that happen, so I
don't know how long it will take.

A second problem is that some machines came up without working X.  In
the 9.0 release, we install both the XFree86 3.3.6 suite of servers
and the XFree86 4.0 server.  An 8.4 machine upgrading to 9.0 will
continue to run an XFree86 3.3.6 server; however, at least some of the
XFree86 3.3.6 servers in Red Hat 7.1 appear to have changed relative
to the versions in Red Hat 6.2 even though the upstream version hasn't
changed.  This problem might be specific to the Cirrus driver (Cirrus
being one of the chipsets supported by XF86_SVGA); I'm not certain.
