[1409] in Release_7.7_team


Re: 8.2.8 slowdown

daemon@ATHENA.MIT.EDU (Anne Salemme)
Fri Jul 31 12:04:21 1998

To: Greg Hudson <ghudson@MIT.EDU>
Cc: Bill Cattey <wdc@MIT.EDU>,
        "Ask,    and it will be given you" <mbarker@MIT.EDU>,
        release-team@MIT.EDU, kcr@MIT.EDU, jhawk@MIT.EDU, ops@MIT.EDU,
        network@MIT.EDU
In-Reply-To: Your message of "Thu, 30 Jul 1998 18:00:19 EDT."
             <199807302200.SAA21707@small-gods.mit.edu> 
Date: Fri, 31 Jul 1998 12:04:11 EDT
From: Anne Salemme <salemme@MIT.EDU>

i suggest that there's a simpler way to deal with controlling the number
of workstations that take an update at the same time. if i understand what
happened recently, we tried to make each system add some random time offset
to when it starts the update. while this may solve some problems, i think
it caused a serious operational one: namely, we couldn't predict on a
per-cluster basis when all the systems in a cluster would be finished and
back in service.

we could manage the updating process on a per-cluster basis by putting
workstations in smaller clusters in moira. so, instead of having one huge
"public" sun cluster which includes workstations all over campus, we could
put them into smaller hesiod clusters and thereby manage the update
process based on groups of workstations.

(the update process would still need some time offset on a per workstation
basis to guarantee that even a small group of workstations doesn't start the
update at precisely the same time.)
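
a rough sketch of the idea, not how the update scripts actually work: each
cluster gets its own start time, and each workstation within a cluster gets
a small fixed offset based on its position in the sorted host list, so no
two systems start at the same moment and the end of each cluster's window
stays predictable. the cluster names, host names, and gap values below are
made up for illustration; nothing here talks to moira or hesiod.

```python
def schedule(clusters, cluster_gap_min=60, host_gap_sec=30):
    """Map each host to a start time, in seconds after the release opens.

    clusters: dict of cluster name -> list of host names (hypothetical data;
    in practice this would come from the cluster database).
    cluster_gap_min, host_gap_sec: assumed spacing values, chosen arbitrarily.
    """
    plan = {}
    for c_index, name in enumerate(sorted(clusters)):
        # Every cluster starts at a known, fixed time.
        base = c_index * cluster_gap_min * 60
        for h_index, host in enumerate(sorted(clusters[name])):
            # Stagger hosts inside the cluster by a fixed step so no two
            # hosts in the same cluster begin at the same second.
            plan[host] = base + h_index * host_gap_sec
    return plan


# Hypothetical clusters and hosts.
example = {"cluster-a": ["ws-a2", "ws-a1"], "cluster-b": ["ws-b1"]}
plan = schedule(example)
print(plan)  # {'ws-a1': 0, 'ws-a2': 30, 'ws-b1': 3600}
```

with fixed offsets like these, the last system in a cluster starts at a
known time, so operations can estimate when the whole cluster is back in
service. a random offset would spread the load too, but gives up that
predictability.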

just a suggestion for next time...
					anne

