[6923] in Release_7.7_team
Lucid status update
daemon@ATHENA.MIT.EDU (Jonathan Reed)
Fri Aug 27 08:20:48 2010
From: Jonathan Reed <jdreed@MIT.EDU>
Date: Fri, 27 Aug 2010 08:20:41 -0400
Message-Id: <7E82F7A5-C5DD-44D8-AD6B-AD7338D580E7@mit.edu>
To: "release-team@MIT.EDU" <release-team@mit.edu>
At 8:01am, the desync period expired, so any machines that are going to update today have already begun the process.
As of 8:15, there are:
- 221 completed Lucid installs
- 193 in-progress Lucid installs (number may be high, as it includes any public machine which refuses athinfo connections)
- 31 non-upgraded Jaunty machines (I'll visit a sampling of them later today to look at upgrade.log, but I suspect they are machines that had someone logged in.)
- 16 "other" (Athena 9, no route to host, etc.)
... out of a total of 461 machines (including clusters, quickstations, dorm machines and podium machines). The in-progress installs should finish over the next 1-8 hours (the higher values apply to 2-032 and 38-370, each of which managed to install only 3 machines over a period of 6 hours). The remaining Jaunty machines should upgrade tonight (Friday).
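
As a quick sanity check on the numbers above, here is a small sketch that sums the four categories and computes each one's share of the total. The category labels and the breakdown() helper are just illustrative names, not part of any existing tool.

```python
# Counts as reported in the 8:15 snapshot above.
counts = {
    "completed Lucid installs": 221,
    "in-progress Lucid installs": 193,
    "non-upgraded Jaunty machines": 31,
    "other": 16,
}

total = sum(counts.values())


def breakdown(counts):
    """Return each category's share of the total, as a percentage
    rounded to one decimal place (hypothetical helper)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


shares = breakdown(counts)
for name, n in counts.items():
    print("%-30s %4d  %5.1f%%" % (name, n, shares[name]))
print("%-30s %4d" % ("total", total))
```

The four categories do add up to the 461 machines quoted, with a bit under half of them finished as of 8:15.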
Lessons learned:
- People use the clusters, even at 5am.
- Installation takes way too long, and we should either start earlier or desync less.
- Installation on 10 Mbps half-duplex ("10/half") networks is rapidly becoming unsustainable.
- We should retry sooner than 24 hours if the upgrade fails because someone is logged in (#694).
- We should add an athinfo query for the upgrade log.
- Having a modified athinfo daemon (which answers all queries with "Installation started at $timestamp") running during installation would be helpful, to differentiate "connection refused" because it's installing from "connection refused" because it's broken.
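
To make the last item concrete, here is a rough sketch of what such a placeholder daemon could look like. It is not the real athinfo daemon and makes several assumptions: that clients send a one-line query over TCP, that port 49155 is the one athinfo clients connect to, and that a plain-text one-line reply is acceptable. All of the names (make_reply, make_handler, make_server, PlaceholderServer) are made up for illustration.

```python
import socketserver
import time

# Assumption: the TCP port athinfo clients normally connect to.
ATHINFO_PORT = 49155


def make_reply(started_at):
    """Build the fixed reply for a given install start time (epoch seconds)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(started_at))
    return ("Installation started at %s\n" % stamp).encode("ascii")


def make_handler(started_at):
    """Return a handler class that answers every query with the same line."""

    class PlaceholderHandler(socketserver.StreamRequestHandler):
        timeout = 5  # do not hang forever on a client that sends nothing

        def handle(self):
            try:
                self.rfile.readline()  # read the query, then ignore it
            except OSError:
                pass  # timed out or connection reset; still try to reply
            self.wfile.write(make_reply(started_at))

    return PlaceholderHandler


class PlaceholderServer(socketserver.TCPServer):
    # Allow a quick restart if the installer relaunches the daemon.
    allow_reuse_address = True


def make_server(started_at, host="", port=ATHINFO_PORT):
    """Create (but do not start) the placeholder server."""
    return PlaceholderServer((host, port), make_handler(started_at))
```

The installer would call something like make_server(time.time()).serve_forever() at the start of the install and stop it once the real daemon is back. With this running, "connection refused" would only mean the machine is actually broken, while a machine mid-install would answer with its start time.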