[4983] in WWW Security List Archive

Re: ROBOTS

daemon@ATHENA.MIT.EDU (Irving Reid)
Sat Apr 5 17:33:46 1997

To: Deep Summer - Home of Web Site Designs Extraordinare <frank@deepsummer.com>
cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
In-reply-to: frank's message of "Tue, 01 Apr 1997 19:23:58 -0500".
	 <97Apr1.223658est.11652@janus.border.com> 
From: "Irving Reid" <irving@border.com>
Date: Sat, 5 Apr 1997 13:40:37 -0500
Errors-To: owner-www-security@ns2.rutgers.edu

>      If you want a copy it's at http://www.deepsummer.com/robots.txt
>      (should be able to do a shift-click on it to retrieve). If
>      not, let me know and I'll mail you a copy.
>  
>      Also, if anyone wants to take a peek at it and let me know if
>      you see anything I might have done better then by all means
>      do so.
>  
>      -frank

Here's an excerpt from your robots.txt file:

    # /robots.txt for http://www.deepsummer.com
    # comments to webmaster@deepsummer.com

    User-agent: *
    Disallow: /azure/

You've just given me the exact path name for a directory you don't want 
the web crawlers to know about.

Stupid Net Trick #47: If you want to see things that people think are 
hidden, look for "Disallow" lines in their robots.txt files.
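
For instance, a few lines of Python (a modern sketch, nothing from the
original post; "example.com" is just a placeholder host) are enough to
pull the Disallow entries out of any site's robots.txt:

    # Sketch: fetch a site's robots.txt and list the paths its owner
    # asked robots to stay away from.  "example.com" is a placeholder.
    import urllib.request

    def disallowed_paths(host):
        # robots.txt always lives at the root of the site
        with urllib.request.urlopen("http://%s/robots.txt" % host) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        paths = []
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()   # drop comments
            if line.lower().startswith("disallow:"):
                path = line.split(":", 1)[1].strip()
                if path:   # a bare "Disallow:" means "allow everything"
                    paths.append(path)
        return paths

    for path in disallowed_paths("example.com"):
        print(path)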

The right thing to do is deny _all_, and then explicitly allow the 
files you want indexed.  That way you don't leak information to nosy 
people.
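
Something like this (a sketch only; the "Allow" line comes from the
1996 robots-exclusion Internet-Draft and isn't honoured by every robot,
and the allowed paths here are made up):

    User-agent: *
    Allow: /index.html
    Allow: /pages/
    Disallow: /

Robots that only understand the original 1994 convention will see the
"Disallow: /" and stay out entirely, which at least fails safe; either
way, the file no longer names anything you wanted kept quiet.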

 - irving -

(A really nasty person might write a crawler/indexer that _only_ 
indexed pages reached from people's "Disallow" lines.  I'm not sure if 
I'm not nasty enough, or just too lazy...)

