Fwd: Essence prototype announcement
daemon@ATHENA.MIT.EDU (Jerome H Saltzer)
Mon Jan 25 11:12:10 1993
Date: Mon, 25 Jan 1993 11:10:23 -0500
To: elibdev@MIT.EDU
From: Jerome H Saltzer <Saltzer@MIT.EDU>
Another entry in the WAIS/gopher/WWW field...
>Date: Tue, 12 Jan 1993 16:51:15 -0700
>From: Mike Schwartz <schwartz@latour.cs.colorado.edu>
>To: Essence-announcement-list@latour.cs.colorado.edu
>Subject: Essence prototype announcement
>
>Essence is a resource discovery system that exploits file semantics to
>index both textual and binary files. Essence generates summaries that
>can be used to browse files before retrieving them across slow network
>links, as well as space-efficient indexes. Essence understands nested
>file structures (such as uuencoded, compressed, "tar" files), and
>recursively unravels such files to generate summaries for them. These
>features allow Essence to be used in a number of useful settings, such
>as anonymous FTP archives. The prototype generates WAIS-compatible
>indexes, allowing WAIS users to take advantage of the Essence indexing
>methods.
>
>WAIS users can try Essence using the ".src" file enclosed below. This file
>also describes where to get the prototype source code and a paper about this
>system.
>
> Darren Hardy and
> Michael Schwartz
> Dept. of Computer Science
> Univ. of Colorado - Boulder
>
>-------------------------------------------------------------------------------
>
>(:source
> :version 3
> :ip-address "128.138.243.151"
> :ip-name "ftp.cs.colorado.edu"
> :tcp-port 8000
> :database-name "aftp-cs-colorado-edu"
> :cost 0.00
> :cost-unit :free
> :maintainer "hardy@cs.colorado.edu"
> :description
>"You can use this WAIS server to search and retrieve files from the
>anonymous ftp archive on ftp.cs.colorado.edu [128.138.243.151]. We
>used Essence, a resource discovery system based on semantic file
>indexing, to build the WAIS index for this server. As explained below,
>Essence currently only allows the retrieval of file summaries through
>WAIS. To retrieve entire files, use anonymous ftp on ftp.cs.colorado.edu.
>
>Essence exploits file semantics to index both textual and binary
>files. By exploiting semantics, Essence extracts keywords that
>summarize a file, and generates a compact yet representative index.
>Essence understands nested file structures (such as uuencoded,
>compressed, ``tar'' files), and recursively unravels such files to
>generate summaries for them. Essence generates indexes that are ten
>times smaller than WAIS indexes, but retain the fine-grained
>information access that WAIS's full-text indexes provide.
>
>Furthermore, Essence generates WAIS-compatible indexes, allowing WAIS
>users to make use of Essence's indexing capabilities. This is one of
>the ways in which the Networked Resource Discovery Project at the
>University of Colorado has extended the types of information that
>WAIS can handle.
>
>If you would like to learn more about Essence, you can obtain the
>source for the Essence prototype and a paper that appears in the
>Proceedings of the 1993 Winter USENIX Technical Conference, San Diego,
>CA, January 1993, pp. 361-374. Both the paper and the prototype are
>available via anonymous ftp from ftp.cs.colorado.edu in
>/pub/cs/distribs/essence. Alternatively, search for the keyword
>'Essence' using this WAIS server to find all of the files on
>ftp.cs.colorado.edu that are related to Essence; you will find the
>files for both the paper and the prototype.
>
>This WAIS server was created in December 1992 by Darren R. Hardy and
>Michael F. Schwartz as part of the Networked Resource Discovery
>Project. You may reach them at the Department of Computer Science,
>University of Colorado, Boulder, CO 80309-0430, or via email at
>hardy@cs.colorado.edu and schwartz@cs.colorado.edu.
>
>Below is some more information about the WAIS interface to Essence.
>
> Essence exports its indexes through WAIS's search and
> retrieval interface, allowing users to use tools such as
> waissearch and the X Windows-based graphical user interface
> xwais. In order to generate WAIS-compatible indexes,
> Essence uses WAIS's indexing software to index the Essence
> summary files. This mechanism generates full-text WAIS
> indexes from the Essence summary files.
>
> We modified the WAIS indexing mechanism to understand the
> format of the Essence summary files, so that it generates
> meaningful WAIS headlines. These headlines provide users
> with a short description of a single file, usually a
> filename. With Essence, headlines represent a file's core
> filename, its actual filename, and its file type.
>
> To support additional file types, WAIS must be recompiled
> with new procedures that understand these file types. With
> Essence, one need only write a new summarizer, add its name
> to a configuration file, and add new heuristics for
> identifying the file type; no recompilation is necessary.
> In this sense, Essence modularizes the typed-file indexing
> extensions that WAIS can use, because it removes the
> keyword extraction process from WAIS and places it instead
> in Essence. Essence is better suited to incorporating new
> file types, and can be quickly adapted to become a
> comprehensive indexing system.
>
> The following waissearch output shows an example search of
> an index generated by Essence of the ftp.cs.colorado.edu
> anonymous FTP file system. It shows an ordered list of the
> ten files that best match the keyword netfind. Netfind is
> an Internet user directory service. The headlines have up
> to three fields representing the matching file: the core
> filename, the filename (if different from the core
> filename), and the file type.
>
>------------------------------------------------------------
>
>csh% waissearch netfind
> 1: /cs/ftp/techreports/schwartz/PostScript/Techniques.Wide.Area.ps.Z
> Techniques.Wide.Area.ps PostScript
>
> 2: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z
> PostScript/Techniques.Wide.Area.ps PostScript
>
> 3: /cs/ftp/distribs/netfind/netfind3.10.tar.Z ServerShell/nsh.c C
>
> 4: /cs/ftp/distribs/netfind/README README
>
> 5: /cs/ftp/distribs/netfind/netfind3.10.tar.Z README README
>
> 6: /cs/ftp/distribs/netfind/netfind3.10.tar.Z Doc/netfind.1 ManPage
>
> 7: /cs/ftp/techreports/schwartz/PostScript/Proj.Overview.ps.Z
> Proj.Overview.ps PostScript
>
> 8: /cs/ftp/techreports/schwartz/PostScript/RD.Comparison.ps.Z
> RD.Comparison.ps PostScript
>
> 9: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z
> PostScript/Proj.Overview.ps PostScript
>
> 10: /cs/ftp/techreports/schwartz/PostScript/ALL.PS.tar.Z
> PostScript/RD.Comparison.ps PostScript
>csh%
>
>------------------------------------------------------------
>
> Consider the effectiveness of the example search shown
> above. The best match is a PostScript paper that discusses
> a number of techniques for distributed information systems,
> with particular emphasis on techniques demonstrated by
> Netfind; the second match is the same file, but found in
> the compressed tar distribution ALL.PS.tar.Z. The third
> match is the C source code for the interactive user
> interface to Netfind. The fourth match is the README file
> found in the Netfind distribution directory; the fifth
> match is the same file, but found in the compressed tar
> distribution netfind3.10.tar.Z. The sixth match is the
> UNIX manual page for Netfind. The remaining matches are
> PostScript papers in which Netfind is discussed.
>
> In WAIS, a user retrieves files by selecting a matching
> headline. With Essence, if the headline represents a file
> hidden within a nested file (such as the first headline in the
> example), the summary file is retrieved instead of the hidden
> file itself. If the headline represents a plain file (such as
> the fourth headline in the example), the summary file is
> likewise retrieved. This requires allocating storage for both
> the summary files and the index.
> However, it allows users to browse through remote file systems
> by retrieving and viewing small summary files without having to
> retrieve complete files. This is useful when trying to decide
> whether to transfer large files across a slow network.
>"
>)
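
The description says that supporting a new file type in Essence takes a new summarizer, an entry in a configuration file, and a heuristic for identifying the type, with no recompilation. A minimal Python sketch of that arrangement follows. It is an illustration only, not the prototype's design: the names SUMMARIZERS, TYPE_RULES, identify_type and summarize, and the keyword rules themselves, are all hypothetical.

```python
import re


def summarize_text(data):
    """Return each distinct word of four or more letters, in first-seen order."""
    words = re.findall(r"[A-Za-z]{4,}", data.decode("latin-1"))
    return list(dict.fromkeys(w.lower() for w in words))


def summarize_c(data):
    """Return each distinct identifier that is directly followed by '('."""
    names = re.findall(r"\b([A-Za-z_]\w*)\s*\(", data.decode("latin-1"))
    return list(dict.fromkeys(names))


# Stand-in for the configuration file: file type -> summarizer.
SUMMARIZERS = {"README": summarize_text, "C": summarize_c}

# Stand-in for the type heuristics: (filename pattern, file type).
TYPE_RULES = [(r"(^|/)README$", "README"), (r"\.[ch]$", "C")]


def identify_type(filename):
    """Return the first file type whose pattern matches, or None."""
    for pattern, file_type in TYPE_RULES:
        if re.search(pattern, filename):
            return file_type
    return None


def summarize(filename, data):
    """Return (file_type, keywords); unknown types get an empty keyword list."""
    file_type = identify_type(filename)
    summarizer = SUMMARIZERS.get(file_type)
    if summarizer is None:
        return file_type, []
    return file_type, summarizer(data)
```

In this sketch, adding a type means adding one entry to SUMMARIZERS and one rule to TYPE_RULES; the summarize function itself does not change.
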
>
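
The headlines in the waissearch example carry up to three fields: the core filename, the filename within it (if different), and the file type. A small Python sketch of splitting a headline into those fields is given below. It assumes the fields are separated by whitespace and contain none themselves, which holds for the example output but is not stated by the authors. The names Headline and parse_headline are invented for this illustration, and the rank prefix (such as "3:") is expected to have been removed already.

```python
from typing import NamedTuple, Optional


class Headline(NamedTuple):
    core: str              # file as stored in the archive
    member: Optional[str]  # file nested inside it, if any
    file_type: str         # type assigned by the indexer


def parse_headline(text):
    """Split a two- or three-field headline; wrapped lines are joined by split()."""
    fields = text.split()
    if len(fields) == 2:
        core, file_type = fields
        return Headline(core, None, file_type)
    if len(fields) == 3:
        return Headline(*fields)
    raise ValueError("expected 2 or 3 fields, got %d" % len(fields))
```

For the third match in the example this gives a core of /cs/ftp/distribs/netfind/netfind3.10.tar.Z, a member of ServerShell/nsh.c, and a type of C.
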