[184] in Information Retrieval


Literature pointer - AT&T's Ferret system

daemon@ATHENA.MIT.EDU (Mitchell N Charity)
Mon Jul 19 14:11:02 1993

Date: Mon, 19 Jul 93 14:09:53 EDT
From: mcharity@hq.lcs.mit.edu (Mitchell N Charity)
To: elibdev@Athena.MIT.EDU
Reply-To: mcharity@lcs.mit.edu

Literature pointer - 

 The Ferret Document Browser
  Katseff, Howard P.
  London, Thomas B.
  AT&T Bell Laboratories (Holmdel)
  USENIX Summer 1993, pages 101-110.

[My thanks to Win Treese for pointing this out...]

---------------------------------------------------------------------
NOTES

    21,000 AT&T Bell Labs internal technical memoranda (>=1989).
    Decoupled search and image service.
    100bpi,1bit multi-page G4 TIFF, with optional grayscale enhancement.
    Emphasis on speed over image quality.
    Very simple browser user interface.

Authors

  Katseff, Howard P.
    hpk@research.att.com
    Broadband networks.

  London, Thomas B.
    tbl@research.att.com
    OS, multiprocessor programming & systems,
    communication intensive services & systems.

Storage & Scanning

  AT&T Bell Labs technical memoranda

    21,000 documents
    30Gbytes.
    Representations stored:
      400bpi,1bit scanned image
      100bpi,1bit resampled from scanned image, or generated from PostScript.
      Occasional PostScript document.
    Experimental distributed BroadBand FileSystem with 36 disks (40GB).

    40 documents / day.
    400 bpi, 1bit.
    Multi-page TIFF with G4 fax compression.
    40:1 compression ratio.
    1.4Mbytes / document (average) [calculated-mcharity]
    New scans moved nightly to fileserver over local area net.
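
A quick check of the storage figures above, as a sketch only. It assumes
8.5 x 11 inch pages and decimal units (1 Gbyte = 10^9 bytes); neither is
stated in these notes. The 30Gbyte total covers all stored representations,
so the per-document average does too.

```python
# Sanity check of the storage figures in the notes.
# Assumptions (not from the paper): letter-size 8.5 x 11 inch pages,
# decimal units (1 Gbyte = 1e9 bytes).

TOTAL_BYTES = 30e9          # 30 Gbytes of memoranda
DOCUMENTS = 21_000
BPI = 400
PAGE_INCHES = (8.5, 11.0)   # assumed page size
COMPRESSION_RATIO = 40      # G4 ratio quoted in the notes


def average_document_bytes(total_bytes, documents):
    """Average stored bytes per document."""
    return total_bytes / documents


def raw_page_bytes(bpi, page_inches, bits_per_pixel=1):
    """Uncompressed size of one scanned page in bytes."""
    width_px = page_inches[0] * bpi
    height_px = page_inches[1] * bpi
    return width_px * height_px * bits_per_pixel / 8


avg_doc = average_document_bytes(TOTAL_BYTES, DOCUMENTS)
raw_page = raw_page_bytes(BPI, PAGE_INCHES)
compressed_page = raw_page / COMPRESSION_RATIO

print(f"average document: {avg_doc / 1e6:.2f} Mbytes")
print(f"raw 400bpi page:  {raw_page / 1e6:.2f} Mbytes")
print(f"G4 page at 40:1:  {compressed_page / 1e3:.2f} Kbytes")
```

The average agrees with the 1.4Mbytes / document figure above.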

    PostScript documents:
      PS documents are stored: as PS, and derived 100bpi,1bit image.
      PS advantages/disadvantages:
        + smaller
        + device independence
        - viewers not always available; may be slow, have few fonts
        - not always portable, perhaps due to bugs in docs or viewers

  AT&T archive photos

    9,000 documents.
    3Gbytes on 2 external magnetic SCSI disks.
    Sun workstation server.

    Scanned as time permits.
    62dpi, 8bit grayscale.
    TIFF with LZW compression.
    200Kbytes / 8x10 photograph.

Architecture

  Two systems
    Production service
      AT&T corporate internet (local ethernet, 64kbit/sec intersite links).
      Provide speed instead of beauty
      <1/2 sec next-page delay
    Experimental system
      High speed filesystem and network:
        BroadBand FileSystem research effort
           to support: - data-intensive applications (ex HDTV),
                       - distributed and parallel computing.
        Liaison network multimedia workstation
      Flipping pages as in a book.
      15 pages/sec (working towards 30pgs/sec).

  Information retrieval system largely decoupled from image viewing system.
    multiple image stores, multiple search clients.
    img dbs at different locations, maintained by different orgs
    servers may be accessed by db systems other than LINUS

  AT&T LINUS (LIbrary Network User Service) system.
    AT&T employees
    online databases, inc technical memoranda and photograph db
    memoranda db includes abstracts and keywords.
    Slimmer information retrieval system allows various types of search.
    Provides: search, selection, authorization
  
  Getting a document
    search for and select document using an information retrieval system
    client determines corresponding image database
    copies multipage TIFF file with entire document
      uses TCP socket to image database
      Speed - 25 page doc: ethernet - <2sec
                           64kbit/s - 1 min
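
The speed figures above can be roughly reproduced from the 15Kbytes / page
figure given under Image Handling. A sketch under stated assumptions: a
10 Mbit/s Ethernet, 1 Kbyte = 1000 bytes, and no protocol overhead or
contention. None of these come from the paper.

```python
# Rough transfer-time estimate for copying a whole document.
# Assumptions (not from the paper): 10 Mbit/s Ethernet,
# 1 Kbyte = 1000 bytes, no protocol overhead or contention.

PAGE_BYTES = 15_000  # 15 Kbytes per 100bpi G4 page (from the notes)

LINKS_BITS_PER_SEC = {
    "ethernet (assumed 10 Mbit/s)": 10_000_000,
    "intersite link (64 kbit/s)": 64_000,
}


def transfer_seconds(pages, link_bits_per_sec, page_bytes=PAGE_BYTES):
    """Seconds needed to copy `pages` pages over the given link."""
    return pages * page_bytes * 8 / link_bits_per_sec


for name, rate in LINKS_BITS_PER_SEC.items():
    print(f"{name}: {transfer_seconds(25, rate):.1f} s for a 25 page doc")
```

Both results are in line with the quoted "<2sec" and "1 min".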

Image Handling

  Normally use 100bpi,1bit.
  Preprepared multi-page G4 TIFF file copied from server.
  Each page decompressed as needed.
  File use begins while file still being copied.
  Stats:
    Page decompress time <1/3 sec on Sun IPX workstation
    15Kbytes / page
    20 page doc is 300Kbytes, thus likely in fs cache - no disk access.
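
A minimal sketch of "decompress as needed, start before the copy finishes".
This is not the Ferret code. zlib stands in for G4 fax decoding, and the
class name, method names, and the cache of decoded pages are all hypothetical.

```python
import zlib


class LazyDocument:
    """Holds compressed pages; decompresses a page only when it is viewed.

    Hypothetical illustration: zlib stands in for G4 decoding, and keeping
    decoded pages in a dict is an assumption, not something the notes state.
    """

    def __init__(self):
        self._compressed = []   # pages received so far, still compressed
        self._decoded = {}      # page index -> decompressed bytes

    def receive(self, compressed_page):
        """Called as each page arrives from the image server."""
        self._compressed.append(compressed_page)

    def pages_available(self):
        """Number of pages that have arrived so far."""
        return len(self._compressed)

    def page(self, index):
        """Return decompressed page data, decoding on first access only."""
        if not 0 <= index < len(self._compressed):
            raise LookupError(f"page {index} has not arrived yet")
        if index not in self._decoded:
            self._decoded[index] = zlib.decompress(self._compressed[index])
        return self._decoded[index]


doc = LazyDocument()
doc.receive(zlib.compress(b"page one bitmap"))
first = doc.page(0)          # usable before the rest of the file arrives
doc.receive(zlib.compress(b"page two bitmap"))
second = doc.page(1)
```

The viewer can show page 1 as soon as it arrives; later pages cost nothing
until the user turns to them.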

  Why 100bpi?
    adequate
    "lowest resolution readily visible on our workstations"
    printed part of doc fits on 1152x900 screen of Suns.
    higher res rejected as panning considered cumbersome
    small fonts hard to read
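
The screen-fit argument can be checked with a little arithmetic. A sketch,
assuming 8.5 x 11 inch paper with 1 inch margins on every side (both are
assumptions; the notes give only the screen size).

```python
# Does the printed area of a page fit on the screen at a given resolution?
# Assumptions (not from the paper): 8.5 x 11 inch paper, 1 inch margins.

SCREEN = (1152, 900)          # Sun screen size from the notes
PAPER_INCHES = (8.5, 11.0)    # assumed
MARGIN_INCHES = 1.0           # assumed


def printed_area_pixels(bpi, paper=PAPER_INCHES, margin=MARGIN_INCHES):
    """Pixel size of the page area inside the margins."""
    width = (paper[0] - 2 * margin) * bpi
    height = (paper[1] - 2 * margin) * bpi
    return int(width), int(height)


def fits(size, screen=SCREEN):
    """True if an image of `size` pixels fits on the screen without panning."""
    return size[0] <= screen[0] and size[1] <= screen[1]


for bpi in (100, 200, 400):
    area = printed_area_pixels(bpi)
    print(bpi, "bpi:", area, "fits" if fits(area) else "needs panning")
```

Under these assumptions only 100bpi avoids panning; the full 850x1100 page
is still taller than the screen, hence "printed part".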

  Enhancement:
    100bpi,grayscale  (400bpi mono -> 100bpi gray; each 4x4 pixel block -> 1 gray pixel)
      takes "several seconds" / page
      slower transfer, decompress, X
    Approach:    
      normally use mono.
      can request "detail" any time
      "detail mode" first does mono, then overlays detail.
      [Why isn't this default?  Multi-page (transfer cost) lossage? -mcharity]
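
One way the 4x4 reduction might work, as a sketch. The notes do not say how
the gray value is computed; counting black pixels per tile (17 levels, 0..16)
and dropping ragged edges are assumptions, and the function name is
hypothetical.

```python
def mono_to_gray(mono, block=4):
    """Reduce a 1-bit image by `block` in each direction.

    `mono` is a list of rows of 0/1 values (1 = black ink). Each output
    pixel is the count of black pixels in one block x block tile, so for
    block=4 the result has 17 gray levels (0..16).
    """
    # Drop rows/columns that do not fill a whole tile (an assumption).
    height = len(mono) - len(mono) % block
    width = len(mono[0]) - len(mono[0]) % block if mono else 0
    gray = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            ink = sum(
                mono[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(ink)
        gray.append(row)
    return gray


# 8x8 test pattern: left half black, right half white.
pattern = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
gray = mono_to_gray(pattern)
```

Each 400bpi tile maps to one 100bpi pixel, so the output has 1/16 as many
pixels but more than 1 bit each, consistent with the slower transfer noted
above.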

User Interface Appearance

  simple - image, and slidebar with number
  850x1100 image
  mouse based
    R,L - forward,back page
    M   - slider to any page
    Ctrl R,L - continuous forward,back page
    Ctrl M   - quit
  User can configure to view PS when available.
  X, OpenLook or Motif

Printing

  Central printing + company mail.
  Sun SPARCprinter (low cost).
  Connected to internal bus of a Sparc workstation.
  400 dpi page images sent to printer uncompressed.
  11 pages/min (rated printer speed).

  FAX interface
    400->200bpi, faxed in "fine" mode.
    photos ->200bpi, dithered
      several minutes cpu
      look surprisingly good

  Experimented with printing to local PostScript printers.
    Saw 5 to 15 minutes/page at 400 bpi.
    [This seems excessive for ethernet'ed PS printers -mcharity]

Users

  Currently used by nearly 1000 people.
  Survey of 140 users:  (internal survey, April 1992)
    >95% read documents online, instead of requesting print.
      This was contrary to developer expectations.
    User remarks: - faster and more efficient to read from screen
                  - no longer felt need to keep own paper copies
    One use / week for most users.
    Most found system easy to use.

[Omitted:
   - description of experimental high performance network/browser
   - some process/programming level details
]   
