[529] in Athena User Interface
Re: Medusa doesn't play well with AFS
daemon@ATHENA.MIT.EDU (Rebecca Schulman)
Mon Dec 18 13:05:10 2000
Message-Id: <3A3E6F6E.C968EE78@eazel.com>
Date: Mon, 18 Dec 2000 12:11:26 -0800
From: Rebecca Schulman <rebecka@eazel.com>
Mime-Version: 1.0
To: Richard Tibbetts <tibbetts@MIT.EDU>, aui@MIT.EDU
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Oops, forgot to reply to all.
Richard Tibbetts wrote:
> On 12/18 Beland wrote:
> > Ok, so I did a little more research into how Medusa works.
>
> I have done none, and have never used Medusa, so anything I say is
> based on conjecture. But I am probably right or very nearly so.
>
> > I'm curious how it deals with changes to files since the last full
> > index? Does something check to see what files have changed since
> > then, or do I just lose?
>
> It should almost certainly check the timestamp, which would make later
> indexing very cheap, especially on something as small as an AFS volume
> (usually 50 to 100 megs).
>
> > - Run an indexer for the entire athena.mit.edu cell, and allow
> > clients to connect to it via the network. (We do not trust root on
> > local workstations.) This would provide functionality not unlike a
> > web search, except somewhat more intrusive, since users often put
> > files in their Public and www directories without telling anyone or
> > any indexing services about them. It would also probably provide for
> > very inefficient searching. Who knows how long the indexing process
> > would take. This would also represent a radical change in operation
> > from Medusa's current architecture, and would require teaching it how
> > AFS permissions work.
>
> This is bad. Ops will laugh at you if you propose it. It is too
> invasive, and adds another point of failure for the security of
> people's data.
>
> > - Have an automatically-created index directory at the base of each
> > locker; configure Medusa to search for this directory when searching
> > files in that locker. Problem: Will only work in lockers that don't
> > have more restrictive permissions set lower down, unless a fancy
> > kludge is constructed to make multiple indexes, depending on access
> > rights. (Ick!)
>
> Ick is right. This also requires implementing a daemon, and doesn't
> really add much value over the third option.
>
> > - Allow users to run the indexer at some point when they are logged
> > in. (Either automatically in the background after a certain interval
> > since the last index, or perhaps on demand.) This solves the
> > permissions problem, though does introduce a greater load on the
> > servers if everyone is always indexing their files. If done on
> > demand, users will lose the benefit of a fast search on a whim.
> > There's also the question of whether or not to index files in group
> > lockers the user accesses frequently.
>
> I think that we should let the user index whatever lockers they want,
> including their own. Medusa may need to be hacked to not explode when
> it can't read something because of AFS but the unix permissions say it
> can. GMC had this bug. Indexing should not be expensive after the
> first time, so it can probably run in the background on every login,
> assuming it doesn't mind being truncated if the user only logs in for
> a few minutes.
>
> There might be significant architectural barriers to this. If so, I
> think that Eazel did it wrong. But if that is the case I don't think
> that the value it presents to users is worth having MIT rearchitect
> the system. However, Eazel may decide that this functionality is in
> their best interest, since AFS will be showing up in more and more
> locations now that it is free.
A couple of comments:
1. The current version of Medusa, the one we will be shipping with
Nautilus 1.0, doesn't support incremental indexing. This means that a
reindex does a complete rewrite. Yes, this is bad, and no, it won't
stay this way. But Medusa was one of the last large features added to
Nautilus, so it's a bit behind in terms of the maturity of its feature
set.
2. I agree that it's a ridiculous concept to think about MIT
rearchitecting their system to make indexing work. And at Eazel we are
interested in making Medusa work in a more distributed environment for
a number of reasons, including, as you said, better compatibility with
network file system environments (including AFS and NFS), and also so
you can do searches on your own materials that may be distributed
across a network. I'd be glad to hear advice from anyone at I/S who
has thoughts about different designs for this feature. I've thought
briefly about some, but I'm hardly an expert at distributed design.
3. Abandoning the multi-user, one-index model is probably a bad idea.
One of the benefits of such a system is that you are not duplicating
indices. If you go and look, you will see the index is big (around
1-2% of the total size of a text-heavy system, less on a system with
less text, but still around 0.5%). Having an index owned by a user
without permission to see everything on a volume would leave out some
files by necessity, and this would require a second index. I am more a
fan of the index-per-volume idea, where Medusa can handle meting out
files through either the standard UNIX model or through one of the
many more recent ACL-based models.
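
To make the index-per-volume idea in point 3 a bit more concrete, here
is a rough Python sketch. It is not how Medusa works today; the index
layout (a term mapped to a set of paths) and the names search() and
can_open() are made up for illustration. It assumes the query runs
with the searching user's credentials, and that simply trying to open
a file is an acceptable access check, so the file system (AFS ACLs or
Unix mode bits) stays the final authority.

```python
def can_open(path):
    """Hypothetical access check: ask the file system by trying to open.

    On AFS the Unix mode bits can claim a file is readable when the ACL
    says otherwise, so an actual open() is the more honest test.
    """
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def search(index, term, can_read=can_open):
    """Look up term in one shared per-volume index.

    index maps a term to the set of paths containing it (assumed layout).
    Hits the caller cannot read are dropped at query time, so a single
    index can serve users with different access rights.
    """
    return sorted(path for path in index.get(term, ()) if can_read(path))
```

The point of filtering at query time is that nobody needs a second,
per-user copy of the index just because they can't see every file on
the volume.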
Rebecca
>
>
> tibbetts
>
> -*- http://www.mit.edu/~tibbetts -*- finger tibbetts@monk.mit.edu -*-
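
For reference, the timestamp check discussed earlier in the thread
could look roughly like the Python sketch below. Medusa does not do
this as of this message; files_to_reindex() and the saved snapshot
(a path mapped to its last recorded mtime) are hypothetical. Files
that cannot be stat'ed, for example because an AFS ACL denies access,
are skipped instead of aborting the run.

```python
import os


def files_to_reindex(root, last_index):
    """Compare the tree under root against the previous run's snapshot.

    last_index maps path -> mtime recorded at the last indexing run
    (assumed format). Returns (changed, removed, current), where current
    is the new snapshot to save for next time.
    """
    changed = []
    current = {}
    # os.walk ignores unreadable directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # unreadable or vanished: skip, don't explode
            current[path] = mtime
            if last_index.get(path) != mtime:
                changed.append(path)  # new file, or modified since last run
    removed = [p for p in last_index if p not in current]
    return changed, removed, current
```

Only the paths in changed would need to be re-read, which is why
indexing after the first run should be cheap.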