[7077] in www-talk@info.cern.ch

Introducing FFW - Freetext search For Web

daemon@ATHENA.MIT.EDU (Baard Haafjeld)
Sat Dec 31 18:50:44 1994

Date: Sun, 1 Jan 1995 00:37:50 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: Bard.Hafjeld@nta.no
From: Baard Haafjeld <Bard.Hafjeld@nta.no>
To: Multiple recipients of list <www-talk@www0.cern.ch>

FFW version 1.0
---------------

FFW (Freetext search For Web) is a package that provides easy-to-use
freetext searching over HTML documents (and, as a special case, plain
text documents). Its output is intended as input to the scripts that
provide the user interface, typically CGI scripts.

FFW is intended to replace similar solutions based on the Wais search
engine, and it solves some of the problems we experienced when using
the Wais engine.

FFW is developed by the MultiTorg project at TeleNor Research, Norway.

The FFW info pages are at http://www.nta.no/produkter/ffw/ffw.html
Sources are at ftp://ftp.nta.no/pub/ffw

The sources have been compiled ONLY under SunOS 4.1.3 with gcc 2.6.2, so
those using other systems might encounter problems. This IS version 1.0 :)
However, I do not expect big problems making it compile on other systems.

FFW features:

- Traditional inverted index, considerably smaller than a Wais index.
  On test datasets we have seen FFW indexes at 1/3 the size of a Wais index.
  This of course will depend on data set size and content.

- Full HTML parsing on input; reserved HTML words are not indexed.
  The input parser can easily be replaced with a parser for other formats.

- Words with low semantic content, such as and, or, not and if, can be
  filtered out of the index to reduce its size. This is done by providing
  exclusion lists.

- Flexible indexer that can take its document list from input, stdin or
  parameter files.

- Memory-conserving merge program (ffwmerge) allows efficient incremental
  building of huge indexes. Building a huge index in one pass can be a
  problem because the indexer program outgrows the machine's physical
  memory, leading to excessive paging. With ffwmerge, two FFW indexes can
  be quickly merged into one, so a huge index can be built in pieces.

- Can search in several indexes at the same time.

- Self-contained index: URLs and document titles are stored directly in
  the index, so the search result can be presented to the user without
  access to the source files. The index server can therefore be totally
  independent of the server holding the documents.

- Written in compiled C++ for efficiency.

- Searching supports a formal expression grammar with AND, OR, NOT and ().

- Program messages are kept in a single separate file for easy
  localisation. Norwegian and English versions are provided.

- Support for using several indexes with one CGI script, no need to use one
  script for each searchable area.

- 8-bit characters are fully supported. HTML character escape codes are
  changed into their 8-bit ISO8859-1 equivalents where possible, which
  makes words containing escape codes searchable.
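
To make several of the features above more concrete (inverted index,
exclusion lists, escape-code decoding, index merging and AND/OR/NOT
queries), here is a minimal Python sketch. It is NOT FFW's code and says
nothing about FFW's index format or command-line interface: every name in
it (STOP_WORDS, tokenize, build_index, merge, search) is hypothetical, the
tag stripping is a crude regular expression rather than a full HTML parser,
and the operator precedence (NOT binds tighter than AND, AND tighter than
OR) is an assumption, since the grammar is not spelled out here.

```python
import html
import re
from collections import defaultdict

# Hypothetical exclusion list; a real one would be read from a file.
STOP_WORDS = {"and", "or", "not", "if", "the", "a"}

TAG_RE = re.compile(r"<[^>]*>")   # crude tag stripper, not a real HTML parser
WORD_RE = re.compile(r"\w+")


def tokenize(page):
    """Drop markup, decode escapes such as &aring;, lowercase, filter words."""
    text = html.unescape(TAG_RE.sub(" ", page))
    return [w for w in WORD_RE.findall(text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Inverted index: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, page in docs.items():
        for word in tokenize(page):
            index[word].add(url)
    return index


def merge(a, b):
    """Combine two indexes by taking the union of their posting sets."""
    merged = defaultdict(set)
    for index in (a, b):
        for word, urls in index.items():
            merged[word] |= urls
    return merged


def search(index, all_urls, query):
    """Evaluate a query with AND, OR, NOT and () by recursive descent."""
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        if peek() == "NOT":
            take()
            return all_urls - parse_not()   # complement within known URLs
        if peek() == "(":
            take()
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        return set(index.get(take().lower(), ()))

    return parse_or()
```

A usage example under the same assumptions: build two small indexes
separately, merge them, and query the result.

```python
part1 = build_index({
    "http://a/1": "<title>Bl&aring;b&aelig;r</title><p>Fresh fish and bread</p>",
    "http://a/2": "<p>Fish soup</p>",
})
part2 = build_index({"http://b/1": "<p>Bread and butter</p>"})
index = merge(part1, part2)
urls = {"http://a/1", "http://a/2", "http://b/1"}
print(sorted(search(index, urls, "(bread OR soup) AND NOT fish")))
```

Note that NOT needs the full set of known URLs to compute a complement,
which is why the sketch passes it in explicitly.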

I wish you all a Merry Christmas and a Happy New Year!

 				       |
Baard Haafjeld			       | When you give a wolf a poodle cut, you 
Norwegian TeleNor Research	       | don't get a show dog but a pissed wolf.
SMTP-mail: Baard.Haafjeld@tf.tele.no   |                        -Robert Asprin

