[184] in Information Retrieval
Literature pointer - AT&T's Ferret system
daemon@ATHENA.MIT.EDU (Mitchell N Charity)
Mon Jul 19 14:11:02 1993
Date: Mon, 19 Jul 93 14:09:53 EDT
From: mcharity@hq.lcs.mit.edu (Mitchell N Charity)
To: elibdev@Athena.MIT.EDU
Reply-To: mcharity@lcs.mit.edu
Literature pointer -
The Ferret Document Browser
Katseff, Howard P.
London, Thomas B.
AT&T Bell Laboratories (Holmdel)
USENIX Summer 1993, pages 101-110.
[My thanks to Win Treese for pointing this out...]
---------------------------------------------------------------------
NOTES
21,000 AT&T Bell Labs internal technical memoranda (>=1989).
Decoupled search and image service.
100bpi,1bit multi-page G4 TIFF, with optional grayscale enhancement.
Emphasis on speed over image quality.
Very simple browser user interface.
Authors
Katseff, Howard P.
hpk@research.att.com
Broadband networks.
London, Thomas B.
tbl@research.att.com
OS, multiprocessor programming & systems,
communication intensive services & systems.
Storage & Scanning
AT&T Bell Labs technical memoranda
21,000 documents
30Gbytes.
Representations stored:
400bpi,1bit scanned image
100bpi,1bit resampled from scanned image, or generated from PostScript.
Occasional PostScript document.
Experimental distributed BroadBand FileSystem with 36 disks (40GB).
40 documents / day.
400 bpi, 1bit.
Multi-page TIFF with G4 fax compression.
40:1 compression ratio.
1.4Mbytes / document (average) [calculated-mcharity]
New scans moved nightly to fileserver over local area net.
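The storage figures above can be cross-checked with a little arithmetic. This is only a rough sketch: it assumes US Letter pages (8.5 x 11 inches) and decimal units, neither of which the notes state, and the 30Gbyte total covers every stored representation, not just the 400bpi scans.

```python
# Back-of-envelope check of the storage figures in the notes.
# Assumptions (not from the notes): US Letter pages, 1 Gbyte = 10**9 bytes.

ARCHIVE_BYTES = 30 * 10**9   # "30Gbytes"
DOCUMENTS = 21_000           # "21,000 documents"
SCAN_BPI = 400               # "400 bpi, 1bit"
COMPRESSION_RATIO = 40       # "40:1 compression ratio"


def raw_page_bytes(bpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * bpi) * round(height_in * bpi)
    return pixels * bits_per_pixel // 8


avg_doc_bytes = ARCHIVE_BYTES / DOCUMENTS
raw_page = raw_page_bytes(SCAN_BPI)
compressed_page = raw_page / COMPRESSION_RATIO

print(f"average document:       {avg_doc_bytes / 1e6:.2f} Mbytes")
print(f"raw 400bpi page:        {raw_page / 1e6:.2f} Mbytes")
print(f"compressed 400bpi page: {compressed_page / 1e3:.1f} Kbytes")
```

The average per-document figure agrees with the 1.4Mbytes calculated in the notes; the per-page numbers are estimates under the stated page-size assumption.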
PostScript documents:
PS documents are stored both as PS and as a derived 100bpi,1bit image.
PS advantages/disadvantages:
+ smaller
+ device independence
- viewers not always available; may be slow, have few fonts
- not always portable, perhaps due to bugs in docs or viewers
AT&T archive photos
9,000 documents.
3Gbytes on 2 external magnetic SCSI disks.
Sun workstation server.
Scanned as time permits.
62dpi, 8bit grayscale.
TIFF with LZW compression.
200Kbytes / 8x10 photograph.
Architecture
Two systems
Production service
AT&T corporate internet (local ethernet, 64kbit/sec intersite links).
Provide speed instead of beauty
<1/2 sec next-page delay
Experimental system
High speed filesystem and network:
BroadBand FileSystem research effort
to support: - data-intensive applications (e.g. HDTV),
- distributed and parallel computing.
Liaison network multimedia workstation
Flipping pages as in a book.
15 pages/sec (working towards 30pgs/sec).
Information retrieval system largely decoupled from image viewing system.
multiple image stores, multiple search clients.
img dbs at different locations, maintained by different orgs
servers may be accessed by db systems other than LINUS
AT&T LINUS (LIbrary Network User Service) system.
AT&T employees
online databases, inc technical memoranda and photograph db
memoranda db includes abstracts and keywords.
Slimmer information retrieval system allows various types of search.
Provides: search, selection, authorization
Getting a document
search for and select document using an information retrieval system
client determines corresponding image database
copies multipage TIFF file with entire document
uses TCP socket to image database
Speed - 25 page doc: ethernet - <2sec
64kbit/s - 1 min
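The quoted transfer times are consistent with the 15Kbytes / page figure given under Image Handling. A minimal sketch of the estimate follows; the 10 Mbit/s Ethernet rate is an assumption (the notes only say "ethernet"), and protocol overhead is ignored.

```python
# Rough transfer-time estimate for a multi-page document.
# PAGE_BYTES comes from the notes ("15Kbytes / page"); the Ethernet
# rate is an assumption, and protocol overhead is not modelled.

PAGE_BYTES = 15 * 1000
ETHERNET_BPS = 10_000_000   # assumed 10 Mbit/s local Ethernet
INTERSITE_BPS = 64_000      # "64kbit/sec intersite links"


def transfer_seconds(pages, link_bits_per_sec, page_bytes=PAGE_BYTES):
    """Seconds needed to move `pages` compressed pages over the link."""
    return pages * page_bytes * 8 / link_bits_per_sec


for name, bps in (("ethernet", ETHERNET_BPS), ("64kbit/s", INTERSITE_BPS)):
    print(f"25 page doc over {name}: {transfer_seconds(25, bps):.1f} sec")
```

Under these assumptions the intersite link needs a bit under a minute and the Ethernet case well under 2 seconds, in line with the numbers reported above.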
Image Handling
Normally use 100bpi,1bit.
Preprepared multi-page G4 TIFF file copied from server.
Each page decompressed as needed.
File use begins while file still being copied.
Stats:
Page decompress time <1/3 sec on Sun IPX workstation
15Kbytes / page
20 page doc is 300Kbytes, thus likely in fs cache - no disk access.
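The "decompress as needed" and "use begins while still being copied" behaviour can be sketched as below. This is an illustration only, not the Ferret code: `LazyDocument` and its methods are hypothetical names, zlib stands in for G4 fax decoding (which the Python standard library does not provide), and caching decoded pages is an assumption the notes do not confirm.

```python
import zlib


class LazyDocument:
    """Compressed pages arrive one at a time; each is decoded on first view."""

    def __init__(self):
        self._compressed = []   # compressed pages received so far
        self._decoded = {}      # page index -> decoded bytes (assumed cache)
        self.decode_count = 0   # how many decompressions were actually run

    def receive(self, compressed_page):
        """Called as each page arrives from the (hypothetical) image server."""
        self._compressed.append(compressed_page)

    def pages_available(self):
        return len(self._compressed)

    def view(self, index):
        """Return a decoded page, decompressing only if not seen before."""
        if not 0 <= index < len(self._compressed):
            raise IndexError("page not received yet")
        if index not in self._decoded:
            self._decoded[index] = zlib.decompress(self._compressed[index])
            self.decode_count += 1
        return self._decoded[index]


# Simulated three-page document; zlib is only a stand-in codec.
pages = [b"page-%d " % i * 1000 for i in range(3)]

doc = LazyDocument()
doc.receive(zlib.compress(pages[0]))
first = doc.view(0)             # page 0 is viewable before the copy finishes
for page in pages[1:]:
    doc.receive(zlib.compress(page))
doc.view(0)                     # served from the cache, no second decode
print(doc.pages_available(), "pages received,", doc.decode_count, "decoded")
```

Pages that are never viewed are never decompressed, which is the point of the design described in the notes.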
Why 100bpi?
adequate
"lowest resolution readily visible on our workstations"
printed part of doc fits on 1152x900 screen of Suns.
higher res rejected as panning considered cumbersome
small fonts hard to read
Enhancement:
100bpi,grayscale (400bpi mono -> 100bpi gray; each 4x4 block of mono pixels -> 1 gray pixel)
takes "several seconds" / page
slower transfer, decompress, and X display
Approach:
normally use mono.
can request "detail" any time
"detail mode" first does mono, then overlays detail.
[Why isn't this default? Multi-page (transfer cost) lossage? -mcharity]
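The 4-by-4 reduction from 400bpi mono to 100bpi gray can be sketched as follows. This is one plausible reading, assuming the gray level is simply the count of ink pixels in each block (17 levels, 0 to 16); the notes do not say how the paper maps blocks to gray values, and `mono_to_gray` is a hypothetical name.

```python
def mono_to_gray(mono, block=4):
    """Reduce a 1-bit image to grayscale by counting ink pixels per block.

    mono: list of rows, each a list of 0/1 pixels (1 = ink).
    Returns rows of gray levels in the range 0..block*block.
    """
    height, width = len(mono), len(mono[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    gray = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            ink = sum(
                mono[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(ink)
        gray.append(row)
    return gray


# 8x8 mono sample -> 2x2 gray result.
sample = (
    [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(6)]
    + [[0, 0, 0, 0, 1, 0, 0, 0]]
    + [[0, 0, 0, 0, 0, 0, 0, 0]]
)
print(mono_to_gray(sample))
```

Each output pixel summarizes 16 input pixels, so a 3400x4400 scan shrinks to 850x1100 while keeping some of the detail that plain 1-bit resampling discards.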
User Interface Appearance
simple - image, and slidebar with number
850x1100 image
mouse based
R,L - forward,back page
M - slider to any page
Ctrl R,L - continuous forward,back page
Ctrl M - quit
User can configure to view PS when available.
X, OpenLook or Motif
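The mouse bindings listed above amount to a small dispatch table. The sketch below is illustrative only: it assumes R, L, and M mean the right, left, and middle mouse buttons, treats "continuous" paging as the same single step repeated by the caller while the button is held, and uses hypothetical names throughout (`BINDINGS`, `Viewer`, `handle`).

```python
from dataclasses import dataclass

# (ctrl_held, button) -> action. Button letters follow the notes;
# reading them as right/left/middle is an assumption.
BINDINGS = {
    (False, "R"): "next_page",
    (False, "L"): "prev_page",
    (False, "M"): "goto_page",
    (True, "R"): "flip_forward",
    (True, "L"): "flip_backward",
    (True, "M"): "quit",
}


@dataclass
class Viewer:
    page_count: int
    page: int = 1
    running: bool = True

    def handle(self, button, ctrl=False, slider_page=None):
        """Apply one mouse event and return the action that was taken."""
        action = BINDINGS[(ctrl, button)]
        if action in ("next_page", "flip_forward"):
            self.page = min(self.page + 1, self.page_count)
        elif action in ("prev_page", "flip_backward"):
            self.page = max(self.page - 1, 1)
        elif action == "goto_page" and slider_page is not None:
            self.page = max(1, min(slider_page, self.page_count))
        elif action == "quit":
            self.running = False
        return action


viewer = Viewer(page_count=20)
viewer.handle("R")                    # forward one page
viewer.handle("M", slider_page=15)    # slider to page 15
viewer.handle("L")                    # back one page
print(viewer.page, viewer.running)
```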
Printing
Central printing + company mail.
Sun SPARCprinter (low cost).
Connected to internal bus of a Sparc workstation.
400 dpi page images sent to printer uncompressed.
11 pages/min (rated printer speed).
FAX interface
400->200bpi, faxed in "fine" mode.
photos ->200bpi, dithered
several minutes cpu
look surprisingly good
Experimented with printing to local PostScript printers.
Saw 5 to 15 minutes/page at 400 bpi.
[This seems excessive for ethernet'ed PS printers -mcharity]
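To put the 5 to 15 minutes/page figure in perspective, the effective data rate it implies can be estimated. This sketch assumes US Letter pages and counts only the raw 1-bit bitmap; any PostScript encoding overhead (which would enlarge the data sent) is ignored.

```python
# Effective throughput implied by "5 to 15 minutes/page at 400 bpi".
# Assumptions: US Letter page, raw 1-bit bitmap, no PostScript overhead.


def raw_page_bytes(bpi, width_in=8.5, height_in=11.0):
    """Uncompressed size of a 1-bit page image, in bytes."""
    return round(width_in * bpi) * round(height_in * bpi) // 8


page = raw_page_bytes(400)
for minutes in (5, 15):
    rate = page / (minutes * 60)
    print(f"{minutes:2d} min/page -> about {rate / 1000:.1f} Kbytes/sec")
```

A few Kbytes per second is far below what an Ethernet link can carry, which suggests the bottleneck was somewhere other than the network.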
Users
Currently used by nearly 1000 people.
Survey of 140 users: (internal survey, April 1992)
>95% read documents online, instead of requesting print.
This was contrary to developer expectations.
User remarks: - faster and more efficient to read from screen
- no longer felt need to keep own paper copies
One use / week for most users.
Most found system easy to use.
[Omitted:
- description of experimental high performance network/browser
- some process/programming level details
]