[4984] in WWW Security List Archive


RE: ROBOTS

daemon@ATHENA.MIT.EDU (DeepSummer-HomeofWebSiteDesignsExt)
Sun Apr 6 01:36:40 1997

From: Deep Summer - Home of Web Site Designs Extraordinare
	 <frank@deepsummer.com>
To: Deep Summer - Home of Web Site Designs Extraordinare
	 <frank@deepsummer.com>,
        "'Irving Reid'" <irving@border.com>
Cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Date: Sat, 5 Apr 1997 20:17:34 -0700
Errors-To: owner-www-security@ns2.rutgers.edu


    If I had proprietary info within the domain space of any of
    my Disallows, then yes, you have a very good point.

    I don't, however. I just notice that without the Disallow
    statements I end up finding things in search engines that
    clutter what I would rather have show up.

    So, for my purposes, anyway, this is much easier, and
    any 'evil bot' would merely be disappointed at what it
    found by following the Disallow clauses to such things
    as http://www.deepsummer.com/azure/auto_reply.txt which
    says something like (paraphrasing here):

        Thanks for stopping by and signing in!

        Sincerely,

        -John Q. Legenda Admin

    However, for anything I'd truly prefer not to be seen,
    I'd complement my .htaccess 'no index' clause with
    exactly what you pointed out for robots.txt. If anything
    were of even higher security (very, very secret) it'd have
    no read/write/exec access for anyone but myself, and
    would live in a directory above what a bot
    would see as '/'.
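
    For the directory case, a hypothetical .htaccess along
    these lines (Apache syntax; assumes the server's
    AllowOverride setting permits Limit and Options overrides)
    would refuse every web request outright:

        # deny all web access to this directory
        Options -Indexes     # no auto-generated listings
        order deny,allow     # evaluate deny rules first
        deny from all        # refuse every request

    With that in place the contents never reach a browser or
    a bot, whatever robots.txt says.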

    Thanks for the advice.

    Sincerely,

    -frank

----------
From: 	Irving Reid[SMTP:irving@border.com]
Sent: 	Saturday, April 05, 1997 11:40 AM
To: 	Deep Summer - Home of Web Site Designs Extraordinare
Cc: 	'www-security@ns2.rutgers.edu'
Subject: 	Re: ROBOTS

>      If you want a copy it's at http://www.deepsummer.com/robots.txt
>      (should be able to do a shift-click on it to retrieve). If
>      not, let me know and I'll mail you a copy.
>  
>      Also, if anyone wants to take a peek at it and let me know if
>      you see anything I might have done better then by all means
>      do so.
>  
>      -frank

Here's an excerpt from your robots.txt file:

    # /robots.txt for http://www.deepsummer.com
    # comments to webmaster@deepsummer.com

    User-agent: *
    Disallow: /azure/

You've just given me the exact path name for a directory you don't want 
the web crawlers to know about.

Stupid Net Trick #47: If you want to see things that people think are 
hidden, look for "Disallow" lines in their robots.txt files.

The right thing to do is deny _all_, and then explicitly allow the 
files you want indexed.  That way you don't leak information to nosy 
people.
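
As a sketch, a deny-all version of the file above might look like
this (note that "Allow" is a nonstandard extension to the original
robots exclusion standard; crawlers that recognize it generally
apply the first matching rule, so the Allow lines come before the
blanket Disallow, and crawlers that ignore Allow will simply skip
the whole site):

    # /robots.txt for http://www.example.com
    User-agent: *
    Allow: /index.html
    Allow: /public/
    Disallow: /
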

 - irving -

(a really nasty person might write a crawler/indexer that _only_ 
indexed pages reached from peoples' "Disallow" lines.  I'm not sure if 
I'm not nasty enough, or just too lazy...)
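
(For the record, the harvesting step would not take much code.  A
minimal sketch in Python, with hypothetical robots.txt contents
inlined where a real crawler would fetch http://<host>/robots.txt:

```python
# Sketch of the "Disallow harvester": collect the paths a site
# advertises in its robots.txt.  The file contents are inlined
# here so the example is self-contained; a real crawler would
# fetch them over HTTP first.
ROBOTS_TXT = """\
# /robots.txt for http://www.example.com
User-agent: *
Disallow: /azure/
Disallow: /private/
"""

def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, comments stripped."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:                           # empty Disallow means "allow all"
                paths.append(path)
    return paths

print(disallowed_paths(ROBOTS_TXT))  # ['/azure/', '/private/']
```

Feeding those paths straight into an indexer is left as an
exercise for the sufficiently nasty.)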



