[868] in athena10


Re: Auto-updating clusters

daemon@ATHENA.MIT.EDU (Evan Broder)
Tue Jan 20 17:24:07 2009

Message-ID: <49764EC6.9050106@mit.edu>
Date: Tue, 20 Jan 2009 17:23:02 -0500
From: Evan Broder <broder@MIT.EDU>
MIME-Version: 1.0
To: Jonathon Weiss <jweiss@mit.edu>
CC: debathena@mit.edu
In-Reply-To: <200901202209.n0KM9qmV025408@speaker-for-the-dead.mit.edu>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Currently the auto-update framework desyncs before checking to see if an
update is available, so I plan to set the desync interval equal to the
interval of the cron job.
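
To make the before/after distinction concrete, here is a minimal sketch of the two orderings. It is not the actual debathena-auto-update code; the function names, the `update_available` and `apply_update` callbacks, and the injectable `sleep` are all hypothetical, chosen only for illustration.

```python
import random


def desync_then_check(update_available, apply_update, window_s, sleep, rng=random):
    """Desync first: sleep a random offset, then check for an update.

    Every run pays the delay whether or not an update exists, so a run
    can last up to window_s seconds. That is why the cron interval should
    not be much shorter than the desync window in this ordering.
    """
    delay = rng.uniform(0, window_s)
    sleep(delay)
    if update_available():
        apply_update()
    return delay


def check_then_desync(update_available, apply_update, window_s, sleep, rng=random):
    """Check first: only sleep when there is actually something to install.

    Runs with nothing to do return immediately, so cron can fire often
    (say every few minutes) while installs are still spread over window_s.
    """
    if not update_available():
        return 0.0
    delay = rng.uniform(0, window_s)
    sleep(delay)
    apply_update()
    return delay
```

Passing `time.sleep` as `sleep` gives the real behaviour; passing a stub makes the ordering easy to inspect without waiting.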

- Evan

Jonathon Weiss wrote:
> With athena 9 we invoke update_ws every 6 minutes and desync over 4
> hours.  I'm happier with the 4 hours (14400 seconds)
> desynchronization.  As we recently demonstrated with a
> desynchronization failure, currently the server infrastructure is more
> likely to be the bottleneck than the cluster networks.
>
> Are you planning to do the desynchronization before or after you
> determine whether there are actually any updates to apply?  If it is
> after, you can certainly run the cron job every 5 minutes and desync
> for 4 hours and have the same schedule we do now.  If you're desyncing
> before figuring out if there is an update, then you probably don't
> want to run the cron job a lot more often than the desync interval,
> but a little more often might be okay.
>
> 	Jonathon
>
>> Jon and I decided that we thought this was reasonable, but I wanted to
>> bring it up here just in case others had input.
>>
>> Currently debathena-auto-update runs twice an hour, using desync with a
>> range of 0-1000 seconds (about 17 minutes). This means that if there's a
>> large update (a new version of OOo or something unfortunate like that),
>> it's very conceivable that you could end up with an entire cluster
>> downed by the update process (since you can't log in while updates are
>> running).
>>
>> I think that we should change the cron job to run every 2 hours instead
>> of twice an hour, and adjust the argument to desync to space updates
>> across that full 2 hour period.
>>
>> I figure that (ignoring upgrades from one Ubuntu release to another) our
>> worst case scenario update is bounded by how many changes show up in the
>> apt repos in a 2 hour period. Given that, the worst case is probably an
>> update that takes about 1/2 an hour to install (since Ubuntu doesn't do
>> point-releases like Debian does). A 1/2 hour install period desynced
>> over 2 hours results in about 3/4 of the heads in a cluster being usable
>> at any given time during this worst-case update.
>>
>> Such a worst-case update is relatively unlikely to happen normally, so
>> the average percentage of heads downed by an update would be much
>> smaller. Given that, I think this is a reasonable period of time.
>>
>> Do people object to me making that change?
>>
>> - Evan
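
The worst-case estimate in the quoted proposal can be checked with a few lines of arithmetic. This is only a rough sketch under simplifying assumptions that are not in the thread: every machine's cron job fires at the same moment, start offsets are uniform over the desync window, and every install takes the same fixed time. The function name is made up for illustration.

```python
def peak_fraction_updating(install_s, desync_s):
    """Largest fraction of machines updating at the same moment.

    With starts spread uniformly over desync_s seconds and each install
    taking install_s seconds, the machines busy at time t are those that
    started within the previous install_s seconds, capped at everyone.
    """
    if desync_s <= 0:
        return 1.0
    return min(1.0, install_s / desync_s)


if __name__ == "__main__":
    install_s = 30 * 60  # the half-hour worst-case install from the proposal
    for label, window_s in [
        ("current 0-1000 s desync", 1000),
        ("proposed 2 hour desync", 2 * 60 * 60),
        ("athena 9 style 4 hour desync", 4 * 60 * 60),
    ]:
        down = peak_fraction_updating(install_s, window_s)
        print(f"{label}: up to {down:.1%} of heads updating at once")
```

For the proposed 2 hour window the function returns 0.25, which matches the "about 3/4 of the heads usable" figure above; with the current 1000 second window a half-hour install can overlap on every machine.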
