[880] in arla-drinkers

RE: Replicated servers, milko?

daemon@ATHENA.MIT.EDU (Lyle Seaman)
Fri Jun 18 17:57:57 1999

Message-ID: <B5C3AE5DDF77D211BD2500A0C9D3A6A11BFFD6@fs1.stormsystems.com>
From: Lyle Seaman <LSeaman@stormsystems.com>
To: arla-drinkers@stacken.kth.se
Subject: RE: Replicated servers, milko?
Date: Fri, 18 Jun 1999 17:51:33 -0400

> Hmm....  Interesting idea.  Of course, it's asking a lot for the clients to
> keep track of changes until replication happens.  There are also security
> implications if a server failure occurs after the client's authentication
> has gone away.  Perhaps some kind of Ubik-like quorum-based thing is needed
> to handle nearly-simultaneous updates to multiple read-write copies of a
> volume. Unfortunately, I don't know how well this would perform in an
> environment where there are actually lots of writes going on.

It depends on how stringent you want your synchronization semantics to be.
If you want guaranteed read-write replication, the client has to keep track
of changes until the changes become "durable", in whatever form that takes.
At present, "durable" simply means resident in the sole server's buffer
cache.  If you use a quorum model of durability, then the client has to keep
track of all the changes until the quorum is reached.
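
As a rough illustration of the quorum case, here is a minimal Python sketch
of a client-side log that holds each change until a majority of the
read-write copies have acknowledged it.  Every name in it (ClientWriteLog,
PendingWrite, ack, to_replay) is hypothetical, and the simple-majority rule
is an assumption; none of this is taken from AFS, Ubik, or Arla.

```python
from dataclasses import dataclass, field


@dataclass
class PendingWrite:
    """One client-side change that is not yet durable (hypothetical)."""
    data: bytes
    acked_by: set = field(default_factory=set)


class ClientWriteLog:
    """Keeps changes on the client until a quorum of replicas acknowledges them."""

    def __init__(self, replicas):
        self.replicas = set(replicas)
        # Assumption: quorum is a simple majority of the read-write copies.
        self.quorum = len(self.replicas) // 2 + 1
        self.pending = {}  # write id -> PendingWrite
        self.next_id = 0

    def write(self, data):
        """Record a change; it must be retained until it becomes durable."""
        wid = self.next_id
        self.next_id += 1
        self.pending[wid] = PendingWrite(data)
        return wid

    def ack(self, wid, replica):
        """Note one replica's acknowledgement; return True once durable."""
        if replica not in self.replicas:
            raise ValueError(f"unknown replica: {replica}")
        entry = self.pending.get(wid)
        if entry is None:
            if 0 <= wid < self.next_id:
                return True  # already durable and discarded
            raise KeyError(f"unknown write id: {wid}")
        entry.acked_by.add(replica)
        if len(entry.acked_by) >= self.quorum:
            del self.pending[wid]  # quorum reached: safe to forget
            return True
        return False

    def to_replay(self):
        """Changes the client would have to resend after a server failure."""
        return [self.pending[w].data for w in sorted(self.pending)]
```

With three replicas, a write stays in to_replay() after the first
acknowledgement and is dropped after the second.  The cost the quoted
message worries about is visible here: the log grows with every write that
has not yet reached a quorum.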

> > needed is a callback and lock recovery protocol, so the new server can
> > learn about which callbacks and locks have been granted already.  One way
>
> A simple way to accomplish this would be for the new server to impose a
> minimum time before it accepts new locks or writes, starting when it lost
> contact with the previous read-write master.  The idea is to ensure that
> any locks or callbacks issued by the previous master have timed out by the
> time the new master starts issuing locks or allowing changes.

Yes, that's the Sprite and DFS model. The problem with doing only that in
AFS is that the maximum callback interval is 4.5 hours.  Forcing servers to
be effectively read-only for the first 4+ hours of uptime is untenable.
Even with the DFS model, the server goes through a "token recovery period"
for a few minutes during which it is less than completely useful.  For
maximal availability, that recovery period needs to be minimal.
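
To put numbers on the wait-it-out approach, here is a small Python sketch
of a takeover server that refuses new locks and writes until every grant
the old master could have issued has expired.  The class and method names
are hypothetical, and the only figure taken from this thread is the 4.5
hour maximum callback interval.

```python
MAX_CALLBACK_LIFETIME = 4.5 * 3600  # seconds; the AFS figure quoted above


class TakeoverServer:
    """New read-write master that waits out the old master's grants."""

    def __init__(self, max_grant_lifetime=MAX_CALLBACK_LIFETIME):
        self.max_grant_lifetime = max_grant_lifetime
        self.lost_contact_at = None

    def master_lost(self, now):
        """Record when contact with the previous master was lost."""
        self.lost_contact_at = now

    def grace_remaining(self, now):
        """Seconds until all of the old master's grants must have timed out."""
        if self.lost_contact_at is None:
            raise RuntimeError("previous master has not been declared lost")
        deadline = self.lost_contact_at + self.max_grant_lifetime
        return max(0.0, deadline - now)

    def may_grant(self, now):
        """True once new locks, callbacks, or writes can safely be accepted."""
        return self.grace_remaining(now) == 0.0
```

With the default lifetime the server is still refusing writes four hours
after the failure, which is the untenable case above.  Passing a
max_grant_lifetime of a few minutes models the shorter recovery window.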