[3524] in WWW Security List Archive


RE: Alta Vista may or may not harvest unadvertised documents

daemon@ATHENA.MIT.EDU ("A. Ömer Kö)
Wed Nov 13 08:59:10 1996

From: "A. Ömer Köker" <Omer@superonline.net>
To: "'riddle@is.rice.edu'" <riddle@is.rice.edu>,
        "'John Cronin'"
	 <John.Cronin@oit.gatech.edu>
Cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Date: Wed, 13 Nov 1996 13:56:27 +0200
Errors-To: owner-www-security@ns2.rutgers.edu


Even if the Alta Vista crawler does behave as described, it still
fully obeys the robots exclusion standard (at least in my experience).

So simply by including a robots.txt file you can tell robots which parts
of your webspace may be publicly indexed.  You can also include meta
information in a document marking it as local.  In my experience,
robots.txt is the more widely implemented of the two.
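
As a rough illustration of how a well-behaved robot reads such a file,
here is a small Python sketch using the standard library's
urllib.robotparser.  The robots.txt contents, the /private/ and /drafts/
paths, and the "ExampleBot" agent name are made up for the example; they
are not taken from Alta Vista or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of two
# directories; everything else remains indexable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def is_indexable(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant robot named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.foo.com/somepath/somefilename.html",
        "http://www.foo.com/private/notes.html",
    ):
        print(url, is_indexable(ROBOTS_TXT, url))
```

Note that this only keeps out robots that choose to honor the file; it
is a request, not an access control.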

For more details, see:

http://info.webcrawler.com/mak/projects/robots/norobots.html


Regards,

A.Omer Koker
omer@superonline.net

>----------
>From: 	John Cronin[SMTP:John.Cronin@oit.gatech.edu]
>Sent: 	Tuesday, 12 November 1996 16:17
>To: 	riddle@is.rice.edu
>Cc: 	www-security@ns2.rutgers.edu
>Subject: 	Re: Alta Vista may or may not harvest unadvertised documents
>
>Once upon a time, Prentiss Riddle told me this tale:
>->
>->This item from a recent Risks Digest caught my eye:
>->
>->   http://catless.ncl.ac.uk/Risks/18.58.html#subj8
>->
>[details deleted]
>->
>->In other words, when processing a URL like:
>->
>->	http://www.foo.com/somepath/somefilename.html
>->
>->...it is alleged that the Alta Vista harvester will truncate the URL
>->to:
>->
>->	http://www.foo.com/somepath/
>->
>->...in hopes that an automatically generated index of the directory
>->will turn up files for which there is no explicit HREF link.
>
>Come on, haven't we all done this at one time or another individually?
>I do it all the time.  Often, I have just used a web searcher to find
>an interesting link, and I decide I want to go up to the main page,
>and there is no convenient button to take me there.  I just go up
>into the "Location: " box and delete the last item in the path.  Repeat
>until you find what you want or are convinced it is not there.
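
The truncation described above is easy to reproduce.  The following
Python sketch lists the parent directory URLs that a person or a
harvester could try, one path component at a time.  The function name
parent_urls is hypothetical, and nothing here is claimed to be how the
Alta Vista harvester actually works.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list[str]:
    """Return the directory URLs obtained by repeatedly deleting the
    last item in the path, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()  # delete the last item in the path
        path = "/" + "/".join(segments) + ("/" if segments else "")
        # Query string and fragment are dropped along with the file name.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    for candidate in parent_urls("http://www.foo.com/somepath/somefilename.html"):
        print(candidate)
```

Any of these URLs that returns an automatically generated directory
listing exposes every file in that directory, linked or not.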
>
>->Regardless of whether the Alta Vista harvester is this aggressive,
>->other harvesters (or individual human users) might be, so the prudent
>->thing is never to put files in a world-readable web tree that you can't
>->afford for the world to see.  Other recent RISKS postings include a few
>->horror stories on this theme.
>
>This is the crux of it.  Either that, or be VERY careful about always
>putting in an appropriate index file for each directory AND make sure
>all your permissions are set properly.
>
>-- 
>John Cronin
>Office of Information Technology Customer Support Center 0710
>Georgia Institute of Technology, Atlanta Georgia, 30332
>Internet: john.cronin@oit.gatech.edu
>phone: (404) 894-7563
>
