[838] in athena10


Re: Auto-updating clusters

daemon@ATHENA.MIT.EDU (William Cattey)
Tue Jan 13 18:09:19 2009

In-Reply-To: <496D1E47.1030002@mit.edu>
Mime-Version: 1.0 (Apple Message framework v753.1)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <0F4213C5-6F84-423C-8618-5A23E2FB2F9D@mit.edu>
Cc: debathena@mit.edu
Content-Transfer-Encoding: 7bit
From: William Cattey <wdc@MIT.EDU>
Date: Tue, 13 Jan 2009 18:07:39 -0500
To: Evan Broder <broder@mit.edu>

Thinking back to ancient experience with Athena updates, what you
propose sounds like a sane and sensible change to me.

-Bill

On Jan 13, 2009, at 6:05 PM, Evan Broder wrote:

> Jon and I decided that we thought this was reasonable, but I wanted to
> bring it up here just in case others had input.
>
> Currently debathena-auto-update runs twice an hour, using desync with a
> range of 0-1000 seconds (about 17 minutes). This means that if there's a
> large update (a new version of OOo or something unfortunate like that),
> it's very conceivable that you could end up with an entire cluster
> downed by the update process (since you can't log in while updates are
> running).
>
> I think that we should change the cron job to run every 2 hours instead
> of twice an hour, and adjust the argument to desync to space updates
> across that full 2 hour period.
>
> I figure that (ignoring upgrades from one Ubuntu release to another) our
> worst case scenario update is bounded by how many changes show up in the
> apt repos in a 2 hour period. Given that, the worst case is probably an
> update that takes about half an hour to install (since Ubuntu doesn't do
> point-releases like Debian does). A half-hour install period desynced
> over 2 hours means at most about 1/4 of the heads in a cluster are
> updating at once, leaving about 3/4 usable at any given time during this
> worst-case update.
>
> Given that such a worst-case update is relatively unlikely to happen
> normally, meaning that the average percentage of heads downed by an
> update would be much smaller, I think this is a reasonable period of time.
>
> Do people object to me making that change?
>
> - Evan
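
For anyone who wants to check the arithmetic in Evan's estimate, here is a rough sketch in Python. It is not part of debathena-auto-update. It assumes start times are spread uniformly across the desync window and that every machine takes the same time to install; the function name and the 30-minute install time are illustrative only.

```python
def max_fraction_down(install_minutes, window_minutes):
    """Peak fraction of machines updating at the same moment.

    Assumes start times are spread uniformly over the desync window and
    every machine takes the same time to install (both simplifications).
    """
    return min(1.0, install_minutes / window_minutes)


# Hypothetical worst case from the thread: a 30-minute install.
current = max_fraction_down(30, 1000 / 60)  # desync range of 0-1000 seconds
proposed = max_fraction_down(30, 120)       # desync across a full 2 hours

print(f"current:  up to {current:.0%} of heads down at once")
print(f"proposed: up to {proposed:.0%} down, {1 - proposed:.0%} usable")
```

Under those assumptions the current window is shorter than the install itself, so every machine can be mid-update at the same moment, while the 2 hour window caps the overlap at a quarter of the cluster.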

