[4991] in WWW Security List Archive
Re: ROBOTS
Date: Mon, 7 Apr 97 08:57:06 -0400
From: Christopher Petrilli <petrilli@amber.org>
To: "Irving Reid" <irving@border.com>,
"Deep Summer - Home of Web Site Designs Extraordinare" <frank@deepsummer.com>
cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Errors-To: owner-www-security@ns2.rutgers.edu
In reply to Irving Reid at irving@border.com:
>Here's an excerpt from your robots.txt file:
>
> # /robots.txt for http://www.deepsummer.com
> # comments to webmaster@deepsummer.com
>
> User-agent: *
> Disallow: /azure/
>
>You've just given me the exact path name for a directory you don't want
>the web crawlers to know about.
>
>Stupid Net Trick #47: If you want to see things that people think are
>hidden, look for "Disallow" lines in their robots.txt files.
>
>The right thing to do is deny _all_, and then explicitly allow the
>files you want indexed. That way you don't leak information to nosy
>people.
That said, anyone who depends on robots.txt to give them "security" is
getting what they paid for. Simply obscuring a URL (i.e., security
through obscurity) is silly; I've been known to hunt around in web
sites to find the real pages behind broken links. You should also turn
off directory listings.
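Under Apache, for instance, that is a one-line option (a sketch assuming
an Apache-style server; the directory path is hypothetical):

 # httpd.conf or a per-directory .htaccess: disable automatic
 # directory listings so a missing index file doesn't expose the tree
 <Directory /home/www/docs>
     Options -Indexes
 </Directory>

With listings off, a request for a directory that has no index file
gets an error instead of an inventory of its contents.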
If you don't want the general public (which, quite honestly, is what a
robot is) to see something, then put it behind a security domain and
require a user-id and password. We can argue about the strength of such
systems, but it's highly unlikely a robot will get past them.
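A minimal sketch of that, assuming Apache-style HTTP Basic
authentication (the realm name and paths are placeholders):

 # .htaccess for the protected directory
 AuthType Basic
 AuthName "Members Only"
 AuthUserFile /home/www/conf/.htpasswd
 Require valid-user

 # create the password file and a first user:
 #   htpasswd -c /home/www/conf/.htpasswd alice

Basic auth only base64-encodes the password on the wire, which is one
reason the strength is arguable, but it is more than enough to turn
away a crawler.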
Christopher