[6923] in Release_7.7_team


Lucid status update

daemon@ATHENA.MIT.EDU (Jonathan Reed)
Fri Aug 27 08:20:48 2010

From: Jonathan Reed <jdreed@MIT.EDU>
Date: Fri, 27 Aug 2010 08:20:41 -0400
Message-Id: <7E82F7A5-C5DD-44D8-AD6B-AD7338D580E7@mit.edu>
To: "release-team@MIT.EDU" <release-team@mit.edu>

At 8:01am, the desync period expired, so any machines that are going to update today have already begun the process.  

As of 8:15, there are:

- 221 completed Lucid installs
- 193 in-progress Lucid installs (this number may be inflated, as it includes any public machine that refuses athinfo connections)
- 31 non-upgraded Jaunty machines (I'll visit a sample of them later today to look at the upgrade.log, but I suspect these are machines that had someone logged in)
- 16 "other" (Athena 9, no route to host, etc.)

... out of a total of 461 machines (including clusters, quickstations, dorm machines and podium machines). The in-progress installs should finish over the next 1-8 hours; the upper end applies to 2-032 and 38-370, each of which managed to install only 3 machines over a period of 6 hours. The remaining Jaunty machines should upgrade tonight (Friday).

Lessons learned:
- People use the clusters, even at 5am.
- Installation takes way too long, and we should either start earlier or desync less.
- Installation on 10/half (10 Mbit/s half-duplex) networks is rapidly becoming unsustainable.
- Retry sooner than 24 hours if the upgrade fails because someone is logged in (#694)
- Add an athinfo query for the upgrade log
- Having a modified athinfo daemon (which answers all queries with "Installation started at $timestamp") running during installation would help distinguish "connection refused" because the machine is installing from "connection refused" because it's broken.
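For illustration, here is a minimal sketch of such a stand-in daemon in Python. It assumes the athinfo protocol is a single query line sent over TCP and answered with plain text before the server closes the connection, and it assumes athinfo listens on port 49155; both are assumptions, as are the hypothetical names `install_banner` and `serve`, which are not part of athinfo.

```python
import socket
import time


def install_banner(started_at: float) -> bytes:
    """Build the fixed reply sent for every query (hypothetical helper)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(started_at))
    return f"Installation started at {stamp}\n".encode("ascii")


def serve(port: int, started_at: float) -> None:
    """Answer every incoming athinfo-style query with the same banner.

    Assumes a client sends one query line and then waits for text;
    the query itself is read and ignored.
    """
    banner = install_banner(started_at)
    with socket.create_server(("", port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                conn.settimeout(5)
                try:
                    conn.recv(1024)  # read and discard the query name
                except socket.timeout:
                    pass  # reply anyway if the client sends nothing
                conn.sendall(banner)
```

Started early in the install (for example `serve(49155, time.time())`, port assumed), a monitoring script would then see either this banner, meaning "installing", or a refused connection, meaning the machine likely needs a visit.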


