[4984] in WWW Security List Archive
RE: ROBOTS
daemon@ATHENA.MIT.EDU (DeepSummer-HomeofWebSiteDesignsExt)
Sun Apr 6 01:36:40 1997
From: Deep Summer - Home of Web Site Designs Extraordinare
<frank@deepsummer.com>
To: Deep Summer - Home of Web Site Designs Extraordinare
<frank@deepsummer.com>,
"'Irving Reid'" <irving@border.com>
Cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Date: Sat, 5 Apr 1997 20:17:34 -0700
Errors-To: owner-www-security@ns2.rutgers.edu
If I had proprietary info within the domain space of any of
my Disallows, then yes, you have a very good point.
I don't, however. I just noticed that without the Disallow
statements I end up finding things in search engines that
clutter what I would rather have show up.
So, for my purposes anyway, this is much easier, and
any 'evil bot' would merely be disappointed at what it
found by following the Disallow clauses to such things
as http://www.deepsummer.com/azure/auto_reply.txt, which
says something like (paraphrasing here):
Thanks for stopping by and signing in!
Sincerely,
-John Q. Legenda Admin
However, for anything I'd truly prefer not be seen,
I'd complement my .htaccess no-index clause with
exactly what you pointed out for robots.txt. Anything
of even higher security (very, very secret) would have
no read/write/exec access for anyone but myself, and
would live in a directory above what a bot would
see as '/'.
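For the directories I'd rather no one reach at all, a minimal
.htaccess sketch, assuming the Apache access directives of the
day (this is illustrative, not my actual file), would be:

```
# Hypothetical .htaccess for a private directory: refuse all
# HTTP requests outright (Apache Order/Deny syntax).
Order deny,allow
Deny from all
```

With that in place the server refuses requests regardless of
what robots.txt does or doesn't say.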
Thanks for the advice.
Sincerely,
-frank
----------
From: Irving Reid[SMTP:irving@border.com]
Sent: Saturday, April 05, 1997 11:40 AM
To: Deep Summer - Home of Web Site Designs Extraordinare
Cc: 'www-security@ns2.rutgers.edu'
Subject: Re: ROBOTS
> If you want a copy it's at http://www.deepsummer.com/robots.txt
> (should be able to do a shift-click on it to retrieve). If
> not, let me know and I'll mail you a copy.
>
> Also, if anyone wants to take a peek at it and let me know if
> you see anything I might have done better then by all means
> do so.
>
> -frank
Here's an excerpt from your robots.txt file:
# /robots.txt for http://www.deepsummer.com
# comments to webmaster@deepsummer.com
User-agent: *
Disallow: /azure/
You've just given me the exact path name for a directory you don't want
the web crawlers to know about.
Stupid Net Trick #47: If you want to see things that people think are
hidden, look for "Disallow" lines in their robots.txt files.
The right thing to do is deny _all_, and then explicitly allow the
files you want indexed. That way you don't leak information to nosy
people.
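A minimal robots.txt in that spirit might read as follows - assuming
a crawler that honors the Allow extension, which isn't part of the
original exclusion standard, so mileage varies (paths here are
made up for illustration):

```
# Deny everything by default; list only what may be indexed.
User-agent: *
Disallow: /
Allow: /index.html
Allow: /public/
```

Crawlers that only know Disallow will simply skip the whole site,
which still leaks nothing.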
- irving -
(a really nasty person might write a crawler/indexer that _only_
indexed pages reached from people's "Disallow" lines. I'm not sure if
I'm not nasty enough, or just too lazy...)
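(For the record, the first step of that nasty crawler is a few
lines of scripting - here's a sketch in Python that pulls the
disallowed paths out of a robots.txt body; all names are
illustrative, not from any real crawler:)

```python
def disallowed_paths(robots_txt):
    """Return the paths named in Disallow lines of a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything"
                paths.append(path)
    return paths

sample = """\
# /robots.txt for http://www.deepsummer.com
User-agent: *
Disallow: /azure/
"""

print(disallowed_paths(sample))  # -> ['/azure/']
```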