[139] in Information Retrieval


Fwd: Essence prototype announcement

daemon@ATHENA.MIT.EDU (Jerome H Saltzer)
Mon Jan 25 11:12:10 1993

Date: Mon, 25 Jan 1993 11:10:23 -0500
To: elibdev@MIT.EDU
From: Jerome H Saltzer <Saltzer@MIT.EDU>

Another entry in the WAIS/gopher/WWW field...

>Date: Tue, 12 Jan 1993 16:51:15 -0700
>From: Mike Schwartz <schwartz@latour.cs.colorado.edu>
>To: Essence-announcement-list@latour.cs.colorado.edu
>Subject: Essence prototype announcement
>
>Essence is a resource discovery system that exploits file semantics to
>index both textual and binary files.  Essence generates summaries that
>can be used to browse files before retrieving them across slow network
>links, as well as space-efficient indexes.  Essence understands nested
>file structures (such as uuencoded, compressed, "tar" files), and
>recursively unravels such files to generate summaries for them.  These
>features make Essence useful in settings such as anonymous FTP
>archives.  The prototype generates WAIS-compatible
>indexes, allowing WAIS users to take advantage of the Essence indexing
>methods.
>
>WAIS users can try Essence using the ".src" file enclosed below.  This file
>also describes where to get the prototype source code and a paper about this
>system.
>
>	Darren Hardy and
>	Michael Schwartz
>	Dept. of Computer Science
>	Univ. of Colorado - Boulder
>
>-------------------------------------------------------------------------------
>
>(:source
>   :version		3
>   :ip-address		"128.138.243.151"
>   :ip-name		"ftp.cs.colorado.edu"
>   :tcp-port		8000
>   :database-name 	"aftp-cs-colorado-edu"
>   :cost 		0.00
>   :cost-unit 		:free
>   :maintainer 		"hardy@cs.colorado.edu"
>   :description 	
>"You can use this WAIS server to search and retrieve files from the
>anonymous ftp archive on ftp.cs.colorado.edu [128.138.243.151].  We
>used Essence, a resource discovery system based on semantic file
>indexing, to build the WAIS index for this server.  As explained below,
>Essence currently only allows the retrieval of file summaries through
>WAIS.  To retrieve entire files, use anonymous ftp on ftp.cs.colorado.edu.
>
>Essence exploits file semantics to index both textual and binary
>files.  By exploiting semantics, Essence extracts keywords that
>summarize a file, and generates a compact yet representative index.
>Essence understands nested file structures (such as uuencoded,
>compressed, ``tar'' files), and recursively unravels such files to
>generate summaries for them.  Essence generates indexes that are ten
>times smaller than WAIS indexes, but retain the fine-grained
>information access that WAIS's full-text indexes provide.
>
>Furthermore, Essence generates WAIS-compatible indexes, allowing WAIS
>users to make use of Essence's indexing capabilities.  This is one of
>the ways that the Networked Resource Discovery Project at the
>University of Colorado has extended the kinds of information that
>WAIS handles.
>
>If you would like to learn more about Essence, you can obtain the
>source to the Essence prototype and a paper which appears in the 1993
>Winter USENIX Technical Conference, San Diego, CA, January 1993, 
>pp. 361-374.  Both the paper and the prototype are available via 
>anonymous ftp from ftp.cs.colorado.edu in /pub/cs/distribs/essence.  
>Or search for the keyword 'Essence' using this WAIS server to find all 
>of the files on ftp.cs.colorado.edu that are related to Essence; you 
>will find the files for both the paper and the prototype.
>
>This WAIS server was created in December 1992 by Darren R. Hardy and
>Michael F. Schwartz as part of the Networked Resource Discovery
>Project.  You may reach them at the Department of Computer Science,
>University of Colorado, Boulder, CO  80309-0430, or via email at
>hardy@cs.colorado.edu and schwartz@cs.colorado.edu.
>
>Below is some more information about the WAIS interface to Essence.
>
>	Essence exports its indexes through WAIS's search and
>	retrieval interface, allowing users to use tools such as
>	waissearch and the X Windows-based graphical user interface
>	xwais.  In order to generate WAIS-compatible indexes,
>	Essence uses WAIS's indexing software to index the Essence
>	summary files.  This mechanism generates full-text WAIS
>	indexes from the Essence summary files.
>
>	We modified the WAIS indexing mechanism to understand the
>	format of the Essence summary files, so that it generates
>	meaningful WAIS headlines.  These headlines provide users
>	with a short description of a single file, usually a
>	filename.  With Essence, headlines represent a file's core
>	filename, its actual filename, and its file type.
>
>	To support additional file types, WAIS must be recompiled
>	with new procedures that understand these file types.  With
>	Essence, one need only write a new summarizer, add its name
>	to a configuration file, and add new heuristics for
>	identifying the file type; no recompilation is necessary.
>	In this sense, Essence modularizes the typed-file indexing
>	extensions that WAIS can use, because it removes the
>	keyword extraction process from WAIS and places it instead
>	in Essence.  Essence is better suited to incorporating new
>	file types, and can be quickly adapted to become a
>	comprehensive indexing system.
>
>	The following waissearch output shows an example search of
>	an index generated by Essence of the ftp.cs.colorado.edu
>	anonymous FTP file system.  It shows an ordered list of the
>	ten files that best match the keyword netfind.  Netfind is
>	an Internet user directory service.  The headlines have up
>	to three fields representing the matching file: the core
>	filename, the filename (if different from the core
>	filename), and the file type.
> 
>------------------------------------------------------------
>
>csh% waissearch netfind
>   1:  /cs/ftp/techreports/schwartz/PostScript/Techniques.Wide.Area.ps.Z 
>       Techniques.Wide.Area.ps PostScript
>
>   2:  /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z 
>       PostScript/Techniques.Wide.Area.ps PostScript
>
>   3:  /cs/ftp/distribs/netfind/netfind3.10.tar.Z ServerShell/nsh.c C
>
>   4:  /cs/ftp/distribs/netfind/README  README
>
>   5:  /cs/ftp/distribs/netfind/netfind3.10.tar.Z README README
>
>   6:  /cs/ftp/distribs/netfind/netfind3.10.tar.Z Doc/netfind.1 ManPage
>
>   7:  /cs/ftp/techreports/schwartz/PostScript/Proj.Overview.ps.Z 
>       Proj.Overview.ps PostScript
>
>   8:  /cs/ftp/techreports/schwartz/PostScript/RD.Comparison.ps.Z 
>       RD.Comparison.ps PostScript
>
>   9:  /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z 
>       PostScript/Proj.Overview.ps PostScript
>
>   10: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z 
>       PostScript/RD.Comparison.ps PostScript
>csh%
>
>------------------------------------------------------------
>
>	Consider the effectiveness of the example search shown
>	above.  The best match is a PostScript paper that discusses
>	a number of techniques for distributed information systems,
>	with particular emphasis on techniques demonstrated by
>	Netfind; the second match is the same file, but found in
>	the compressed tar distribution ALL.PS.tar.Z.  The third
>	match is the C source code for the interactive user
>	interface to Netfind.  The fourth match is the README file
>	found in the Netfind distribution directory; the fifth
>	match is the same file, but found in the compressed tar
>	distribution netfind3.10.tar.Z.  The sixth match is the
>	UNIX manual page for Netfind.  The remaining matches are
>	PostScript papers in which Netfind is discussed.
>
>	In WAIS, a user retrieves files by selecting a matching
>	headline.  With Essence, if the headline represents a file
>	hidden within a nested file (such as the first headline in the
>	example), the summary file is retrieved, instead of retrieving
>	the hidden file itself.  If the headline represents a plain
>	file (such as the fourth headline in the example), the summary
>	file is also retrieved.  This functionality requires allocating
>	storage for both the required summary files and the index.
>	However, it allows users to browse through remote file systems
>	by retrieving and viewing small summary files without having to
>	retrieve complete files.  This is useful when trying to decide
>	whether to transfer large files across a slow network.  
>" 
>)
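
The announcement says Essence "recursively unravels" nested files before
summarizing them.  The following is only a rough sketch of that idea, not
the Essence implementation: it uses gzip and tar from the Python standard
library as stand-ins for the compress and uuencode layers named above, and
the names unravel, looks_like_tar and make_demo_archive are hypothetical.

```python
import gzip
import io
import tarfile


def looks_like_tar(data):
    # POSIX and GNU tar headers carry the magic "ustar" at byte offset 257.
    return data[257:262] == b"ustar"


def unravel(names, data, depth=0, max_depth=8):
    """Yield (names, leaf_bytes) for every plain file nested inside data.

    names is a tuple of file names from the outermost container inward.
    """
    if depth < max_depth and data[:2] == b"\x1f\x8b":      # gzip magic
        # Decompression removes a layer but does not add a new name.
        yield from unravel(names, gzip.decompress(data), depth + 1, max_depth)
    elif depth < max_depth and looks_like_tar(data):
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            for member in tar:
                if member.isfile():
                    payload = tar.extractfile(member).read()
                    yield from unravel(names + (member.name,), payload,
                                       depth + 1, max_depth)
    else:
        # A leaf: this is what a summarizer would be run on.
        yield names, data


def make_demo_archive():
    """Build a small gzip-compressed tar archive in memory (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for member_name, text in [("README", b"netfind readme"),
                                  ("Doc/netfind.1", b".TH NETFIND 1")]:
            info = tarfile.TarInfo(member_name)
            info.size = len(text)
            tar.addfile(info, io.BytesIO(text))
    return gzip.compress(buf.getvalue())


if __name__ == "__main__":
    for names, leaf in unravel(("demo.tar.gz",), make_demo_archive()):
        print(" ".join(names), len(leaf), "bytes")
```

Each leaf keeps the chain of container names, which is the information a
headline needs in order to show both the archive and the file inside it.
The max_depth limit is an added safeguard against pathological nesting and
is not something the announcement describes.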
>
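
The waissearch listing above prints each headline as a rank followed by two
or three whitespace-separated fields, sometimes wrapped onto a second,
indented line.  As a sketch only, output of that shape could be parsed as
shown below.  The Headline type and its field names (location, member,
file_type) are hypothetical, the parser assumes file names contain no
spaces, and it makes no claim about which field the announcement calls the
"core filename".  Strip the leading ">" quote marks before feeding it text
copied from this message.

```python
import re
from typing import NamedTuple, Optional


class Headline(NamedTuple):
    rank: int
    location: str            # first field: path of the file in the archive
    member: Optional[str]    # middle field, present only in 3-field headlines
    file_type: str           # last field, e.g. "PostScript", "C", "README"


_RANK_LINE = re.compile(r"^\s*(\d+):\s*(.*)$")


def parse_headlines(text):
    """Parse waissearch-style output into a list of Headline tuples."""
    entries = []  # each entry is (rank, list_of_tokens)
    for line in text.splitlines():
        match = _RANK_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2).split()))
        elif entries and line.strip() and line[0].isspace():
            # Indented continuation of the previous headline.
            entries[-1][1].extend(line.split())
        # Anything else (shell prompts, blank lines) is ignored.

    headlines = []
    for rank, tokens in entries:
        if len(tokens) == 2:
            headlines.append(Headline(rank, tokens[0], None, tokens[1]))
        elif len(tokens) == 3:
            headlines.append(Headline(rank, tokens[0], tokens[1], tokens[2]))
        else:
            raise ValueError(f"unexpected headline for rank {rank}: {tokens}")
    return headlines
```

A caller could then, for example, keep only the headlines whose file_type is
"PostScript", or group results by location to see which matches come from
the same archive.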

