[3212] in Release_7.7_team
Cleanup plan for 9.0.25
daemon@ATHENA.MIT.EDU (Greg Hudson)
Fri Mar 29 11:04:36 2002
Date: Fri, 29 Mar 2002 11:01:07 -0500
Message-Id: <200203291601.LAA29900@error-messages.mit.edu>
From: Greg Hudson <ghudson@MIT.EDU>
To: release-team@mit.edu
The 9.0.25 release for Linux had two serious problems. One has been
discussed at great length; the other has not.
1. Our frobbing of the EEPROM settings on GX150s put many machines
in a worse state than they were before, knocking some off the net
entirely and leaving others in a state where network performance
is very slow and lossy.
Bill did some research and determined that we want to (1) set the
EEPROM values to be identical to the values as they come from Dell,
and (2) reset the MII unit after setting EEPROM values. Garry
verified his findings. I have put out a 9.0.26 patch release to the
dev cell which does those things. We can put it out to the Athena
cell this weekend; we will determine the exact timing when Bill and
Jonathon are around.
Machines which are in the "slow network" state will hopefully be able
to take the update and be fixed by Monday. Machines in the "no
network" state will obviously need to be visited. There are four
machines in W20 which work fine but were pegged at 10Mbps by Garry;
these machines should go back to the pristine state with the 9.0.26
patch release.
I will leave it to Bill to identify who will visit the "no network"
machines and fix them. If it's cluster services, they will need to be
issued carefully written instructions, of course. This should happen
before students return from spring break on Monday, and perhaps sooner.
2. The list-9.0.25 file contained two RPMs from
redhat-7.1/RedHat/RPMs which had been obsoleted by update RPMs
from Red Hat, specifically dump-static and
XFree86-ISO8859-2-Type1-fonts. As a result, our list file
contained conflicts. We found this out when Garry noticed that
public workstation verification was spewing errors.
The update did not fail because "Obsoletes:" headers in the new RPMs
caused the old RPMs to be removed on machines taking the update, even
though the list files did not specify those removals. Machines which
updated to 9.0.25 wound up with the proper set of RPMs, but their RPM
set did not match /var/athena/release-rpms. Not a major catastrophe.
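The mismatch described above is a set difference between what a machine actually has installed and what the release list says it should have. A minimal sketch of that comparison follows. The function name and the plain-name inputs are assumptions for illustration; the real format of /var/athena/release-rpms is not shown in this message.

```python
# Hypothetical sketch: compare_rpm_sets and its inputs are illustrative,
# not part of the Athena update or verification scripts.

def compare_rpm_sets(installed, expected):
    """Report packages missing from, or extra on, a machine.

    installed: iterable of package names found on the machine.
    expected:  iterable of package names from the release list.
    """
    installed, expected = set(installed), set(expected)
    return {
        "missing": sorted(expected - installed),  # listed but not installed
        "extra": sorted(installed - expected),    # installed but not listed
    }


if __name__ == "__main__":
    # A machine that took the update: the obsoleted RPM was removed by
    # rpm even though the list still names it.
    report = compare_rpm_sets(
        installed=["dump", "glibc"],
        expected=["dump", "dump-static", "glibc"],
    )
    print(report)
```

On a machine in the state described, the obsoleted packages would show up under "missing" even though nothing is actually wrong with the machine.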
Installs of 9.0.25 did not fail because we install with
--replacefiles. However, they wound up with a set of conflicting
RPMs. A future update which removed the dump-static and font RPMs
would presumably result in important files being missing from the
machine. There is no mechanism in the update to avoid this problem.
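To make the worry concrete, here is a rough sketch that lists the files in question: paths owned both by a package slated for removal and by a package that stays. The ownership map, package names, and paths are all hypothetical, and the sketch says nothing about what rpm itself would do with such files on removal.

```python
# Hypothetical sketch: the ownership data and names are made up for
# illustration; this does not query a real RPM database.

def files_at_risk(owned_files, to_remove):
    """Return {path: [remaining owners]} for shared files.

    owned_files: dict mapping package name -> iterable of file paths.
    to_remove:   iterable of package names a future update would remove.
    """
    removed = set(to_remove)
    at_risk = {}
    for pkg in removed:
        for path in owned_files.get(pkg, ()):
            # Packages that stay installed and also claim this path.
            keepers = sorted(
                other
                for other, files in owned_files.items()
                if other not in removed and path in files
            )
            if keepers:
                at_risk[path] = keepers
    return at_risk


if __name__ == "__main__":
    owned = {
        "old-pkg": ["/sbin/tool", "/usr/doc/old-pkg/README"],
        "new-pkg": ["/sbin/tool"],
    }
    print(files_at_risk(owned, ["old-pkg"]))
```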
Recognizing the possibility for disaster here, Garry and I acted this
morning. Garry produced a list of all Linux machines which have
booted since Monday and I scanned them to determine which machines had
been installed at 9.0.25. Fortunately, only four machines have been:
two test machines in W92 and two public machines in M1. The public
machines are not a concern because public workstation verification
will rectify any missing files.
To prevent any more machines from being installed in the broken state
between now and 9.0.26, I edited list-9.0.25 to remove the obsoleted
RPMs and Garry propagated that change to the Athena cell.
When Andrew gets back from vacation, he should:
* Modify the install not to use --replacefiles. Using it is just
asking for trouble.
* Modify his procedures for building lists so that he can detect
when RPMs have been obsoleted, so that we don't get into this
situation again.
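One possible shape for the second check, as a sketch only: given the package names in a list file and the names each package declares in its "Obsoletes:" header, flag any listed package that another listed package obsoletes. The function, its inputs, and the example names are assumptions, not Andrew's actual procedure, and version constraints on Obsoletes entries are ignored here.

```python
# Hypothetical sketch: find_obsoleted and its data layout are assumed
# for illustration; how the Obsoletes headers are extracted is left out.

def find_obsoleted(list_entries, obsoletes_by_package):
    """Return {obsoleted_name: [listed packages that obsolete it]}.

    list_entries:         iterable of package names in the release list.
    obsoletes_by_package: dict mapping a package name to the names in
                          its "Obsoletes:" header (versions ignored).
    """
    listed = set(list_entries)
    conflicts = {}
    for pkg in sorted(listed):
        for old in obsoletes_by_package.get(pkg, ()):
            # Only a problem if the obsoleted package is still listed.
            if old in listed and old != pkg:
                conflicts.setdefault(old, []).append(pkg)
    return conflicts


if __name__ == "__main__":
    # "new-tool" and "old-tool" are placeholder names.
    entries = ["old-tool", "new-tool", "other"]
    obsoletes = {"new-tool": ["old-tool"]}
    for old, by in find_obsoleted(entries, obsoletes).items():
        print(f"{old} is obsoleted by {', '.join(by)}; drop it from the list")
```

A non-empty result would mean the list file contains the kind of conflict described in problem 2 and should not be released as is.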