[4965] in WWW Security List Archive
Question: robots.txt
daemon@ATHENA.MIT.EDU (Anthony Cuykens)
Tue Apr 1 16:30:56 1997
Date: Tue, 01 Apr 1997 10:18:05 +0200
From: Anthony Cuykens <acuykens@ulb.ac.be>
Reply-To: acuykens@ulb.ac.be
To: Robert P Cunningham <bob@lava.net>
Cc: jeffm@sgiserv3.aws.waii.com, www-security@ns2.rutgers.edu
Errors-To: owner-www-security@ns2.rutgers.edu
In the following article, Robert speaks about the 'robots.txt' file.
Could anybody tell me what it does and where I could find information
about it?
Robert P Cunningham wrote:
>
> >Is there any threat caused by allowing web indexing robots to enter your site?
> >...
>
> No more than allowing browsers to enter your site. Probably less.
> In general, robots will not execute JavaScript nor Java, and will
> ignore image maps (and often framesets as well). And they don't
> POST anything. Most robots will try not to trigger CGI programs
> if they can help it. Plus, all major indexing robots will obey a
> robots.txt file in your server root. That file gives you a great
> deal of control to tell robots what they can visit on your site,
> and what they cannot.
>
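For readers unfamiliar with the file mentioned above, here is a minimal
sketch of how a well-behaved robot might consult it. The robots.txt
contents, the paths, the host www.example.com and the robot names
"ExampleBot" and "BadBot" are all made up for illustration; only the
User-agent/Disallow line format and Python's standard
urllib.robotparser module are real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (illustrative paths and robot names only).
# A real robot would fetch this from the server root, e.g. /robots.txt.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# Any robot may fetch ordinary pages, but not the CGI directory.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
print(parser.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/search"))

# The robot singled out by name is shut out of the whole site.
print(parser.can_fetch("BadBot", "http://www.example.com/index.html"))
```

Note that the file is only a request: it controls robots that choose to
obey it, and does nothing to stop a client that ignores it.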
> Robots usually will not probe your site very deeply. Different
> robots have different cutoffs (and details are usually proprietary),
> but going much deeper than 4 levels (more precisely: following a chain
> of linked pages for that long) would be unusual.
>
> There was a problem with some early robots, which would try to get
> as much as possible, as quickly as possible, from sites, which could
> overload some servers. But the current crop of robots--at least those
> of the major search sites--are much better-behaved. They will check
> a few pages from your site, then take a break, then check a few more, etc.
> (Actually, they're time-slicing between sites...).
>
> And there were some other problems having to do with circular links.
> Most, probably all, of the current robots now avoid those obvious traps.
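
The depth cutoff and the circular-link problem described above can be
sketched as a breadth-first walk that remembers which pages it has
already seen. This is only an illustration under stated assumptions:
the function crawl, the in-memory SITE link table and its paths are
hypothetical, the cutoff of 4 is taken from the rough figure quoted
above, and real robots keep their actual strategies proprietary.

```python
from collections import deque


def crawl(links, start, max_depth=4):
    """Walk `links` (a dict mapping page -> list of linked pages)
    breadth-first from `start`, visiting each page at most once and
    following link chains no longer than `max_depth`."""
    visited = {start}            # pages already queued; defeats circular links
    order = []                   # pages in the order they were visited
    queue = deque([(start, 0)])  # (page, number of links followed to reach it)
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue             # cutoff reached: do not follow links further
        for target in links.get(page, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: one long chain of pages and one link back to the root.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/2"],
    "/a/2": ["/a/3"],
    "/a/3": ["/a/4"],
    "/a/4": ["/a/5"],
    "/b": ["/"],  # circular link
}

print(crawl(SITE, "/"))
```

With the default cutoff the walk stops at /a/3 and never reaches /a/4 or
/a/5, and the link from /b back to the root is skipped because the root
has already been seen.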
--
--------------------------------------------------------------
Anthony Cuykens
Researcher in Computer Science, Security
Free University of Brussels (ULB), Belgium
e-mail: mailto://acuykens@ulb.ac.be
url: http://litpc12.ulb.ac.be/owner.html
ftp: ftp://litpc12.ulb.ac.be
phone: +32 2 650.56.01
fax: +32 2 650.56.09
s-mail: Boulevard du triomphe, cp 212
1050 Bruxelles
Belgium
-------------------------------------------------------------
Axiom of optimality: When all else has failed,
read the manual.