[3233] in Release_7.7_team

Serious but extremely rare problem with Red Hat Kernel version

daemon@ATHENA.MIT.EDU (Bill Cattey)
Thu Apr 4 21:40:54 2002

From: Bill Cattey <wdc@MIT.EDU>
To: release-announce@mit.edu
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: 04 Apr 2002 21:40:45 -0500
Message-Id: <1017974445.3511.92.camel@tokata.mit.edu>
Mime-Version: 1.0

Summary:

On a very small number of systems, the kernel that Red Hat shipped and
that Athena standardized on starting with Athena Release 9.0.25
hangs after displaying:

	Uncompressing Linux ... Ok, booting the kernel.

If this should happen to you with Linux Athena, email me (wdc@mit.edu),
and either I or someone who works for me will come out and get you back
in business.

If you have a large installation of identical hardware, you should put
one system in our Beta cluster.  Then you will get early Athena updates
and be able to identify issues like this one and minimize the risk to
your site.

Detail:

As Linux Athena shifts to supporting a wider variety of hardware,
alerts like this one may become more frequent.

Shortly after deploying the Athena update with the version 2.4.9-31
Linux Kernel from Red Hat, we got a report from a customer that a
machine took the update to Athena 9.0.25 and then hung.
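
For sites that want to inventory which machines are running the kernel
named above, here is a minimal sketch in Python.  It assumes the kernel
release string (as printed by "uname -r") begins with 2.4.9-31, possibly
followed by a suffix such as "smp"; the function name is hypothetical and
not part of any Athena tool.  Running this kernel does not mean a machine
will hang; the check only shows which kernel version is in use.

```python
import platform
import re

# Kernel version named in this announcement. Matching a trailing suffix
# (for example "smp") is an assumption about Red Hat's naming.
AFFECTED = re.compile(r"^2\.4\.9-31(?!\d)")


def is_affected_kernel(release: str) -> bool:
    """Return True if a `uname -r` style string names the 2.4.9-31 kernel."""
    return AFFECTED.match(release) is not None


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` reports
    if is_affected_kernel(running):
        print(f"{running}: this is the kernel version from the announcement")
    else:
        print(f"{running}: not the 2.4.9-31 kernel")
```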

Andrew Boardman and I visited the system ourselves, and sure enough, the
kernel begins to start up and displays:

	Uncompressing Linux ... Ok, booting the kernel.

and then hangs.

We discovered that this problem has been reported by someone else to Red
Hat:  http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=61340

My guess is that this problem will stump Red Hat, and that the fix will
come when the next officially blessed kernel from Red Hat comes out and
fails to reproduce the problem.  In the meantime, the problem appears
to be exceedingly rare.

I've seen it happen on an HP Vectra VEi8, but every other hardware type
I've tried has worked just fine.

The Athena UNIX Platform Team and Red Hat work very hard to try to
prevent these sorts of problems from ever being seen in the field.  But
with the extreme diversity of hardware, untried combinations are
inevitable.

If you have a large number of systems at your site, DEFINITELY put one
in the Beta cluster to act as a final check on problems with updates
before your whole site gets bitten by a bug unique to your configuration.

To get a system into the Beta cluster, send mail to hesreq@mit.edu
with the hostname and platform type (in this case Athena Linux).
Thereafter that system will get the beta release of Athena patches
and releases.  For more details on the role of the Beta cluster, see
steps 9 and 10 in the testing section of the Athena Change Policy
web page:
http://web.mit.edu/release/www/change-policy.html#testing

We apologize for any inconvenience, and hope that this information is
useful to you.  We continue to monitor the situation, and will deploy a
fix as soon as one becomes available.  In the meantime, the number of
affected systems is expected to be so small, and the procedure to recover
so obscure, that I've personally volunteered to make any required site
visits.

Thank you for using Athena Linux.

-Bill Cattey
Team Leader
Athena UNIX Platform Team


