[5004] in WWW Security List Archive

RE: ROBOTS

daemon@ATHENA.MIT.EDU (DeepSummer-HomeofWebSiteDesignsExt)
Tue Apr 8 21:40:26 1997

From: Deep Summer - Home of Web Site Designs Extraordinare
	 <frank@deepsummer.com>
To: Irving Reid <irving@border.com>,
        Deep Summer - Home of Web Site Designs Extraordinare <frank@deepsummer.com>,
        "'Christopher Petrilli'" <petrilli@amber.org>
Cc: "'www-security@ns2.rutgers.edu'" <www-security@ns2.rutgers.edu>
Date: Tue, 8 Apr 1997 16:08:16 -0600
Errors-To: owner-www-security@ns2.rutgers.edu


    In my second reply to 'Stupid Net Trick #47...': if I
    understand the syntax correctly, you first list an
    Allow: line for every entity you want indexed, and then
    follow them with Disallow: /. If Disallow: / comes first,
    the bot simply goes on to another site. Though I did
    modify my robots.txt to reflect this (my site is rather
    small), I'd find it ridiculous to do that for a large
    site. Can you say 'M A I N T  H E L L'? Read the spec
    (it's included as a comment in my robots.txt); I'm
    pretty certain this is how the syntax works.
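    The ordering Frank describes is what a "first match wins" rule produces. As far as I know, that is how the 1996 robots.txt Internet-Draft, which introduced Allow:, told robots to evaluate a record. The sketch below is a minimal illustration under stated assumptions (a single "User-agent: *" group, plain prefix matching, no wildcards). The file contents, paths and function names are hypothetical. For contrast it also implements the longest-match rule later standardised in RFC 9309, under which the order of the lines no longer matters.

```python
# Minimal sketch: evaluate robots.txt rules two ways.
# Assumptions (hypothetical simplifications): one "User-agent: *" group,
# plain prefix matching, no wildcards, no other record types.

ALLOW_FIRST = """\
User-agent: *
Allow: /index.html
Allow: /products/
Disallow: /
"""

DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /index.html
Allow: /products/
"""


def parse_rules(robots_txt: str) -> list[tuple[bool, str]]:
    """Return (allowed, path_prefix) pairs in the order they appear."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        # An empty Disallow means "allow everything", so it adds no
        # restriction; skip empty values entirely.
        if sep and value and field in ("allow", "disallow"):
            rules.append((field == "allow", value))
    return rules


def first_match(rules: list[tuple[bool, str]], path: str) -> bool:
    """1996-draft style: the first rule whose prefix matches decides."""
    for allowed, prefix in rules:
        if path.startswith(prefix):
            return allowed
    return True  # nothing matched: allowed


def longest_match(rules: list[tuple[bool, str]], path: str) -> bool:
    """RFC 9309 style: the longest matching prefix decides; Allow wins ties."""
    best_len, verdict = -1, True
    for allowed, prefix in rules:
        if path.startswith(prefix):
            if len(prefix) > best_len or (len(prefix) == best_len and allowed):
                best_len, verdict = len(prefix), allowed
    return verdict


if __name__ == "__main__":
    for name, text in (("allow first", ALLOW_FIRST), ("disallow first", DISALLOW_FIRST)):
        rules = parse_rules(text)
        for path in ("/index.html", "/azure/"):
            print(f"{name:15} {path:12} first-match={first_match(rules, path)} "
                  f"longest-match={longest_match(rules, path)}")
```

    With the allow-first file both rules allow /index.html and block /azure/. With the disallow-first file, first_match blocks everything (the bot "simply goes on to another site"), while longest_match still allows /index.html.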

    In reply to Christopher, you hit the nail squarely on
    the head. I'd find it totally absurd to consider using
    robots.txt as a 'see only what I want you to see'
    peephole. As I said in my last reply, I use it for one
    purpose: to guide search engine bots in indexing the
    things I want indexed, and to keep them from indexing
    things that would obscure my entries in search engines
    (lovely to find my site on a search for 'This is an
    autoresponder test'... NOT.)

    However, I think the main issue (it's fading) had to
    do with how well robots.txt files work. For benevolent
    bots, my logs indicate that robots.txt works wonderfully.
    For evil bots (remember Arnold? Okay, so he was a
    cybernetic organism...) there are other ways of dealing
    with security that have nothing at all to do with
    robots.txt. In hindsight (I see my chair) it might have
    been best had I offered that as my two cents in the
    first place rather than issuing anything resembling
    philosophy, so I'll take the hit for that.

    Anyway, I've now done the dastardly deed of obscuring my
    robots.txt, mainly just to sate my curiosity about
    proper syntax, but I still agree with Christopher that
    that sort of thinking can lead to lots of problems,
    especially if security through obscurity is carried
    into other realms of life (painting my house like a
    police car isn't going to make it any harder to rob).

    -frank

----------
From: 	Christopher Petrilli[SMTP:petrilli@amber.org]
Sent: 	Monday, April 07, 1997 6:57 AM
To: 	Irving Reid; Deep Summer - Home of Web Site Designs Extraordinare
Cc: 	'www-security@ns2.rutgers.edu'
Subject: 	Re: ROBOTS 

In reply to Irving Reid at irving@border.com:

>Here's an excerpt from your robots.txt file:
>
>    # /robots.txt for http://www.deepsummer.com
>    # comments to webmaster@deepsummer.com
>
>    User-agent: *
>    Disallow: /azure/
>
>You've just given me the exact path name for a directory you don't want 
>the web crawlers to know about.
>
>Stupid Net Trick #47: If you want to see things that people think are 
>hidden, look for "Disallow" lines in their robots.txt files.
>
>The right thing to do is deny _all_, and then explicitly allow the 
>files you want indexed.  That way you don't leak information to nosy 
>people.
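Irving's trick is trivial to automate, which is exactly why he warns about it. The sketch below uses a hypothetical function name, disallowed_paths, to pull every Disallow path out of a robots.txt body. Run against the excerpt quoted above, it reveals /azure/. A deny-all file (the hypothetical DENY_ALL below) reveals only "/". Its Allow lines still name the indexed files, but those are meant to be public anyway.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path: what a nosy reader learns."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# The excerpt quoted in the message above.
EXCERPT = """\
# /robots.txt for http://www.deepsummer.com
# comments to webmaster@deepsummer.com

User-agent: *
Disallow: /azure/
"""

# A hypothetical deny-all file in the style Irving recommends.
DENY_ALL = """\
User-agent: *
Allow: /index.html
Disallow: /
"""

if __name__ == "__main__":
    print(disallowed_paths(EXCERPT))   # ['/azure/']
    print(disallowed_paths(DENY_ALL))  # ['/']
```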

Anyone who depends on robots.txt to give them "security" is getting what
they paid for.  Simply obscuring your URLs (i.e. security through obscurity)
is silly, as I've been known to hunt around in web sites to find the real
pages behind links that are broken.  You should also turn off directory
listings.
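Turning off directory listings is normally a switch in the web server's configuration, and the message doesn't say which server is meant. As a self-contained stand-in, here is a sketch using Python's standard http.server. It defines a handler subclass that still serves files but answers 403 Forbidden instead of generating an index for a directory. NoListingHandler and make_server are hypothetical names.

```python
import functools
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoListingHandler(SimpleHTTPRequestHandler):
    """Serve files as usual, but never generate a directory index."""

    def list_directory(self, path):
        # SimpleHTTPRequestHandler calls this for a directory that has no
        # index.html/index.htm; refuse instead of listing its contents.
        self.send_error(HTTPStatus.FORBIDDEN, "Directory listing disabled")
        return None


def make_server(root: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a localhost server rooted at `root`."""
    handler = functools.partial(NoListingHandler, directory=root)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Started with make_server(".").serve_forever(), it still serves a file such as /hello.txt. A request for a directory without an index page, such as /, gets a 403 instead of a file listing.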

If you don't want the general public (which, quite honestly, is what a
robot is) to see something, then put it behind a security domain and
require a user ID and password.  We can argue about the strength of such
systems, but it's highly unlikely a robot can get past it.
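In HTTP terms, a "security domain" of this kind is a realm protected by an authentication scheme such as Basic. The client sends "Authorization: Basic " followed by the base64 encoding of user:password. The sketch below shows only the server-side check. The function names and the credentials in the usage note are hypothetical. Basic credentials are merely encoded, not encrypted, so on an unencrypted connection anyone watching the wire can read them. That is one reason the strength of such systems is arguable.

```python
import base64
import binascii
import hmac
from typing import Optional, Tuple


def parse_basic_auth(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Decode an 'Authorization: Basic ...' value into (user, password)."""
    if not header:
        return None
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded.strip():
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def is_authorized(header: Optional[str], user: str, password: str) -> bool:
    """True only if the header carries exactly the expected credentials."""
    creds = parse_basic_auth(header)
    if creds is None:
        return False  # a robot sending no credentials ends up here
    # compare_digest avoids a timing side channel on where the strings differ.
    user_ok = hmac.compare_digest(creds[0].encode(), user.encode())
    pass_ok = hmac.compare_digest(creds[1].encode(), password.encode())
    return user_ok and pass_ok
```

A server would call is_authorized on every request, for example with hypothetical credentials "alice" and "s3cret". When it returns False, the server answers 401 Unauthorized with a header such as WWW-Authenticate: Basic realm="private", which is where a robot without a password stops.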

Christopher
