[3513] in WWW Security List Archive
Alta Vista may or may not harvest unadvertised documents
daemon@ATHENA.MIT.EDU (Prentiss Riddle)
Mon Nov 11 14:38:09 1996
From: Prentiss Riddle <riddle@is.rice.edu>
To: www-security@ns2.rutgers.edu
Date: Mon, 11 Nov 1996 10:32:05 -0600 (CST)
Errors-To: owner-www-security@ns2.rutgers.edu
This item from a recent Risks Digest caught my eye:
http://catless.ncl.ac.uk/Risks/18.58.html#subj8
| Date: Wed, 6 Nov 96 12:32:35 EST
| From: skill@qucis.queensu.ca (David Skillicorn)
| Subject: Web search engines find connected components
|
| The altavista search engine at least finds files that do not have
| URLs pointing to them, as long as they are in directories that it
| visits for other reasons. I discovered this when a search of a
| well-known CS repository turned up files containing all sorts of
| administrative information, not intended for public consumption.
|
| It seems sensible to keep only things you want seen in directories
| that web servers can access. Having permissions set properly will
| prevent web servers seeing other files, but fouling up permissions is
| an easy error.
In other words, when processing a URL like:
http://www.foo.com/somepath/somefilename.html
...it is alleged that the Alta Vista harvester will truncate the URL
to:
http://www.foo.com/somepath/
...in hopes that an automatically generated index of the directory
will turn up files for which there is no explicit HREF link.
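To make the alleged technique concrete, here is a minimal Python sketch of what such a harvester could do. It is not a description of Alta Vista's code: Skillicorn's report is second hand and Alta Vista denies it. The sketch drops the last path segment of a URL and then collects every link from a directory listing. The listing HTML and the file name "admin-notes.txt" are hypothetical, and real servers format their automatic indexes differently. No network access is involved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


def parent_directory(url: str) -> str:
    """Truncate a URL to its containing directory, keeping the trailing slash."""
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    # Keep the path up to and including its last '/'; an empty path becomes '/'.
    directory = path[: path.rfind("/") + 1] or "/"
    return urlunsplit((scheme, netloc, directory, "", ""))


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_index(base_url: str, index_html: str) -> list[str]:
    """Resolve every link in a directory listing against the directory URL."""
    collector = LinkCollector()
    collector.feed(index_html)
    return [urljoin(base_url, href) for href in collector.links]


# Hypothetical auto-generated listing for http://www.foo.com/somepath/
SAMPLE_INDEX = """<html><body><h1>Index of /somepath</h1>
<a href="/">Parent Directory</a>
<a href="somefilename.html">somefilename.html</a>
<a href="admin-notes.txt">admin-notes.txt</a>
</body></html>"""

start = "http://www.foo.com/somepath/somefilename.html"
directory = parent_directory(start)
known = {start}
# Files that appear in the listing but were never linked from anywhere.
unadvertised = [u for u in links_in_index(directory, SAMPLE_INDEX)
                if u not in known and not u.endswith("/")]
```

In this sample, `unadvertised` ends up holding only the hypothetical `admin-notes.txt` URL. The whole trick depends on the server being willing to generate the listing at all. A directory that has an index page, or that has automatic indexing turned off, gives such a harvester nothing to find.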
I've been in touch with both Alta Vista and David Skillicorn. Alta
Vista denies that their harvesters behave this way. Skillicorn says
that his information is second hand but comes from someone he trusts.
I've tried testing the behavior of the Alta Vista harvester with a
small experiment; so far its results support Alta Vista's claim, but
since I don't know how quickly their harvesters operate, I can't be
sure.
Regardless of whether the Alta Vista harvester is this aggressive,
other harvesters (or individual human users) might be, so the prudent
thing is never to put files in a world-readable web tree that you can't
afford for the world to see. Other recent RISKS postings include a few
horror stories on this theme.
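One way to act on this advice is to audit the web tree from time to time for files that any local user, and therefore probably the web server, can read. Here is a minimal Python sketch of that audit. It assumes a Unix-style permission model and assumes that the server reads files through the "other" permission bits. Both assumptions depend on how your server is configured. The function name `world_readable_files` is illustrative only.

```python
import os
import stat


def world_readable_files(doc_root: str) -> list[str]:
    """Return sorted paths of regular files under doc_root with the
    world-read (S_IROTH) bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not reported.
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or cannot be inspected; skip it
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                found.append(path)
    return sorted(found)
```

Call it as `world_readable_files("/path/to/docroot")`, replacing the placeholder with your server's document root, and go through the list by hand. The check is deliberately conservative. It also reports files sitting in directories that others cannot traverse, and it does not account for symlinks that the server itself might follow.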
-- Prentiss Riddle ("aprendiz de todo, maestro de nada") riddle@rice.edu
-- RiceInfo Administrator, Rice University / http://is.rice.edu/~riddle
-- Opinions expressed are not necessarily those of my employer.