
Re: ROBOTS


Date: Mon, 7 Apr 97 08:57:06 -0400
From: Christopher Petrilli <petrilli@amber.org>
To: "Irving Reid" <irving@border.com>,
        "Deep Summer - Home of Web Site Designs Extraordinare" <frank@deepsummer.com>
cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Errors-To: owner-www-security@ns2.rutgers.edu

In reply to Irving Reid at irving@border.com:

>Here's an excerpt from your robots.txt file:
>
>    # /robots.txt for http://www.deepsummer.com
>    # comments to webmaster@deepsummer.com
>
>    User-agent: *
>    Disallow: /azure/
>
>You've just given me the exact path name for a directory you don't want 
>the web crawlers to know about.
>
>Stupid Net Trick #47: If you want to see things that people think are 
>hidden, look for "Disallow" lines in their robots.txt files.
>
>The right thing to do is deny _all_, and then explicitly allow the 
>files you want indexed.  That way you don't leak information to nosy 
>people.

Anyone who depends on robots.txt to give them "security" is getting what 
they paid for.  Simply obscuring your URL (i.e., security through 
obscurity) is silly; I've been known to hunt around in web sites myself 
to find the real pages behind broken links.  You should also turn off 
the server's automatic directory listings.
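(On an Apache-style server, for instance, directory listings can be
switched off in httpd.conf with something like the following; the
directory path is hypothetical:)

    # httpd.conf sketch: disable automatic directory indexes
    <Directory /usr/local/etc/httpd/htdocs>
        Options -Indexes
    </Directory>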

If you don't want the general public (which, quite honestly, is what a 
robot is) to see something, then put it behind a security domain and 
require a user ID and password.  We can argue about the strength of such 
systems, but it's highly unlikely a robot will get past one.
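(As one concrete, if debatable, example of such a system: HTTP Basic
authentication on Apache might be set up roughly as below.  The realm
name and password-file path are made up, and Basic auth sends the
password essentially in the clear, which is exactly the kind of strength
argument I mean.)

    # .htaccess sketch, assuming Apache's standard Basic authentication
    AuthType Basic
    AuthName "Members Only"                   # hypothetical realm name
    AuthUserFile /usr/local/etc/httpd/users   # hypothetical password file
    Require valid-user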

Christopher
