[6145] in www-talk@info.cern.ch


Re: No Nasty Robots! (long)

daemon@ATHENA.MIT.EDU (Paul Everitt)
Thu Oct 13 10:18:49 1994

Date: Thu, 13 Oct 1994 15:07:39 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: paul@cminds.com
From: Paul Everitt <paul@cminds.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>


For a solution to this problem, look at Harvest:
	http://rd.cs.colorado.edu/harvest/

Why does this help? 

*Distributed Indexing

Collect all of your index information locally using a "gatherer", which
then answers requests from the brokers that compile it into queryable
indices.  Thus, you are only putting the _index_ over the wire, not the
full text.  Moreover, you are only transmitting the *changes* to that
index, and these changes are sent gzip-compressed.
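To make the "only send the changes, compressed" idea concrete, here is a minimal sketch in Python. It is not Harvest's code or wire format: the function names (compute_delta, encode_delta, apply_delta) and the use of a dict of URL -> summary record serialized as JSON are assumptions for illustration only.

```python
import gzip
import json

# Hypothetical gatherer side: an index is a dict mapping URL -> summary record.
def compute_delta(old_index, new_index):
    """Return only the entries that were added, changed, or removed."""
    changed = {url: rec for url, rec in new_index.items() if old_index.get(url) != rec}
    removed = [url for url in old_index if url not in new_index]
    return {"changed": changed, "removed": removed}

def encode_delta(delta):
    """Serialize the delta (JSON is an assumption) and gzip it for the wire."""
    return gzip.compress(json.dumps(delta, sort_keys=True).encode("utf-8"))

# Hypothetical broker side: apply a received delta to its copy of the index.
def apply_delta(index, payload):
    delta = json.loads(gzip.decompress(payload).decode("utf-8"))
    updated = dict(index)
    updated.update(delta["changed"])
    for url in delta["removed"]:
        updated.pop(url, None)
    return updated

old = {"http://a/x.html": {"title": "X", "keywords": ["www"]},
       "http://a/y.html": {"title": "Y", "keywords": ["robots"]}}
new = {"http://a/x.html": {"title": "X", "keywords": ["www", "harvest"]},
       "http://a/z.html": {"title": "Z", "keywords": ["indexing"]}}

payload = encode_delta(compute_delta(old, new))
print(compute_delta(old, new)["removed"])  # ['http://a/y.html']
```

Only the summaries that differ travel over the wire; the full text of unchanged pages never leaves the server.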

*Configurable Indices

Run a local "broker", which goes to various net "gatherers" and builds a
local index of the topics you want.  Or, go to other brokers on the net
that build datasets that interest you. 
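A broker that pulls from several gatherers and keeps only the topics you care about might look roughly like this. It is a sketch under stated assumptions: each gatherer is modeled as an in-memory dict of URL -> record with a "keywords" list, and build_broker_index and search are hypothetical names, not part of Harvest.

```python
# Hypothetical broker: merge records from several gatherers, keeping only
# those whose keywords overlap the topics of interest.
def build_broker_index(gatherers, topics):
    wanted = {t.lower() for t in topics}
    index = {}
    for gatherer in gatherers:
        for url, rec in gatherer.items():
            keywords = {str(k).lower() for k in rec.get("keywords", [])}
            if keywords & wanted:
                index[url] = rec
    return index

def search(index, term):
    """Return sorted URLs whose title or keywords mention the term."""
    term = term.lower()
    return sorted(url for url, rec in index.items()
                  if term in str(rec.get("title", "")).lower()
                  or term in {str(k).lower() for k in rec.get("keywords", [])})

site_a = {"http://a/faq.html": {"title": "Robots FAQ", "keywords": ["robots", "www"]}}
site_b = {"http://b/cook.html": {"title": "Recipes", "keywords": ["food"]},
          "http://b/idx.html": {"title": "Indexing notes", "keywords": ["indexing"]}}

local = build_broker_index([site_a, site_b], ["robots", "indexing"])
print(sorted(local))  # ['http://a/faq.html', 'http://b/idx.html']

# The broker's output has the same shape as a gatherer's, so another
# broker can use it as a source.
upstream = build_broker_index([local], ["robots"])
print(sorted(upstream))  # ['http://a/faq.html']
```

Keeping the broker's output in the same shape as a gatherer's is what lets you "go to other brokers on the net" instead of the original gatherers.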

*Customized Indices

By selecting the types of info you want (FAQs, HTML files, etc.) and using
filters for these types, you get __structured indices__ for pertinent
info!  Much nicer than full-text indices.  Moreover, you can write your own
summarizer for new types (IAFA templates, cc:Mail directory listings,
etc.).  You can even "explode" tar files and index the files inside.
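The per-type summarizer idea, including "exploding" a tar file, can be sketched as a registry keyed by file suffix. This is illustrative only: the registry, the suffix-based type detection, and the record fields are assumptions, not Harvest's actual summarizer interface.

```python
import io
import re
import tarfile

# Hypothetical registry: file suffix -> summarizer function.
SUMMARIZERS = {}

def summarizer(suffix):
    """Decorator that registers a summarizer for a new file type."""
    def register(func):
        SUMMARIZERS[suffix] = func
        return func
    return register

@summarizer(".html")
def summarize_html(name, text):
    m = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
    return {"type": "HTML", "title": m.group(1).strip() if m else name}

@summarizer(".faq")
def summarize_faq(name, text):
    questions = [line[2:].strip() for line in text.splitlines() if line.startswith("Q:")]
    return {"type": "FAQ", "questions": questions}

def summarize(name, data):
    for suffix, func in SUMMARIZERS.items():
        if name.endswith(suffix):
            return func(name, data.decode("utf-8", errors="replace"))
    return None  # unknown type: skip it rather than full-text index it

def summarize_tar(tar_bytes):
    """'Explode' a tar archive and summarize each regular file inside."""
    records = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            record = summarize(member.name, archive.extractfile(member).read())
            if record is not None:
                records[member.name] = record
    return records

def make_tar(files):
    """Build an in-memory tar archive for the example."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()

bundle = make_tar({"docs/index.html": "<html><title>Harvest notes</title></html>",
                   "docs/robots.faq": "Q: What is a robot?\nA: A page fetcher.\nQ: Why index locally?\n",
                   "docs/image.gif": "GIF89a"})
records = summarize_tar(bundle)
```

Adding a new type (say, an IAFA template) would just mean registering another function with @summarizer; files with no registered summarizer are left out of the index.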

*Other interesting subsystems

The Harvest project has other interesting technologies.  For instance, 
using its replication features, you could move the entire object into 
Harvest (i.e. the full text, not just the index) and copy it around, 
with changes merged back into it.  There is also an Object Cache with 
strong performance characteristics.  Finally, there is the forthcoming 
Harvest Object System, using object-oriented extensions to define types 
and methods.

Sorry for the verbosity.  The real point is that there is a mechanism to 
dramatically lower the load from indexing, while dramatically raising the 
functionality.

Disclaimer: I am not part of the Harvest Development Team, merely a 
ludicrously-happy beta tester.

Paul Everitt             V 703.785.7384  Email Paul.Everitt@cminds.com
Connecting Minds, Inc.   F 703.785.7385  WWW   http://www.cminds.com/ 

