[4965] in WWW Security List Archive


Question: robots.txt

daemon@ATHENA.MIT.EDU (Anthony Cuykens)
Tue Apr 1 16:30:56 1997

Date: Tue, 01 Apr 1997 10:18:05 +0200
From: Anthony Cuykens <acuykens@ulb.ac.be>
Reply-To: acuykens@ulb.ac.be
To: Robert P Cunningham <bob@lava.net>
Cc: jeffm@sgiserv3.aws.waii.com, www-security@ns2.rutgers.edu
Errors-To: owner-www-security@ns2.rutgers.edu

In the article below, Robert mentions the 'robots.txt' file.
Could anybody tell me what it does and where I could find information
about it?

Robert P Cunningham wrote:
> 
> >Is there any threat caused by allowing web indexing robots to enter your site?
> >...
> 
> No more than allowing browsers to enter your site.  Probably less.
> In general, robots will not execute JavaScript or Java, and will
> ignore image maps (and often framesets as well).  And they don't
> POST anything.  Most robots will try not to trigger CGI programs
> if they can help it.  Plus, all major indexing robots will obey a
> robots.txt file in your server root.  That file gives you a great
> deal of control to tell robots what they can visit on your site,
> and what they cannot.
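
For readers with the same question: robots.txt is a plain-text file served
from the root of the site (e.g. http://www.example.com/robots.txt) that
lists, per robot name ("User-agent"), the URL path prefixes that robot
should not fetch ("Disallow"). Below is a minimal sketch in Python of how
a well-behaved robot might consult that file before fetching a page, using
the standard library's urllib.robotparser. The file contents, the robot
names ExampleBot and BadBot, and the host www.example.com are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. A real robot would first fetch this
# from http://<host>/robots.txt before requesting anything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def make_parser(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if robot `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleBot", "http://www.example.com/index.html"),
        ("ExampleBot", "http://www.example.com/cgi-bin/search"),
        ("BadBot", "http://www.example.com/index.html"),
    ]:
        print(agent, url, allowed(rp, agent, url))
```

Here ExampleBot falls under the "*" record, so it is kept out of /cgi-bin/
and /private/ but may fetch everything else, while BadBot is excluded from
the whole site. Compliance is voluntary: the file tells cooperating robots
what to skip, but it does not block access.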
> 
> Robots usually will not probe your site very deeply.  Different
> robots have different cutoffs (and details are usually proprietary),
> but going much deeper than 4 levels (more precisely: following a chain
> of linked pages for that long) would be unusual.
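
A depth cutoff of this kind can be sketched as a breadth-first walk that
stops following links once a page is max_depth hops from the starting
page. This is only an illustration, since real robots' cutoffs are
proprietary; the link graph LINKS and the function pages_within_depth are
hypothetical, and the limit of 4 simply mirrors the 4 levels mentioned
above.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/a1.html"],
    "/a1.html": ["/a2.html"],
    "/a2.html": ["/a3.html"],
    "/a3.html": ["/a4.html"],
    "/a4.html": [],
    "/b.html": [],
}


def pages_within_depth(links, start, max_depth):
    """Return {page: depth} for every page reachable from `start` by a
    chain of at most `max_depth` links, visiting pages breadth-first."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # cutoff reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in depth:  # first (shortest) chain wins
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


if __name__ == "__main__":
    print(pages_within_depth(LINKS, "/", 4))
```

With max_depth=4, /a3.html (four links from the root) is still visited,
but its link to /a4.html is never followed.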
> 
> There was a problem with some early robots that would try to get
> as much as possible, as quickly as possible, from sites.  That could
> overload some servers.  But the current crop of robots--at least those
> of the major search sites--are much better-behaved.  They will check
> a few pages from your site, then take a break, then check a few more, etc.
> (Actually, they're time-slicing between sites...).
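
The time-slicing between sites can be illustrated with a simple
round-robin schedule: take a small batch of URLs from one host, then move
on to the next host before coming back. This is a hypothetical sketch
(round_robin_schedule and batch_size are illustrative names; how the major
search sites actually schedule their robots is not described in this
thread). A real robot would also wait some minimum interval between
requests to the same host, which is not modeled here.

```python
from collections import deque


def round_robin_schedule(queues, batch_size=2):
    """Return the order in which (host, path) pairs would be fetched,
    taking at most `batch_size` paths from one host per turn."""
    pending = {host: deque(paths) for host, paths in queues.items()}
    hosts = deque(host for host in queues if pending[host])
    order = []
    while hosts:
        host = hosts.popleft()
        for _ in range(batch_size):
            if not pending[host]:
                break
            order.append((host, pending[host].popleft()))
        if pending[host]:
            hosts.append(host)  # still has work: go to the back of the line
    return order


if __name__ == "__main__":
    plan = round_robin_schedule(
        {"a.example": ["/1", "/2", "/3"], "b.example": ["/x"]}, batch_size=2
    )
    for host, path in plan:
        print(host, path)
```

No single host sees more than batch_size requests in a row while other
hosts are still waiting, which is the "check a few pages, take a break"
pattern described above.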
> 
> And, there were some other problems having to do with circular links.
> Most, probably all, of the current robots now avoid those obvious traps.
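
Avoiding the circular-link trap mostly comes down to remembering which
URLs have already been fetched. A minimal sketch, assuming a hypothetical
link graph LOOP in which /a.html, /b.html and /c.html link back to one
another; the function name crawl is also illustrative. Without the
visited set, the walk below would cycle through those pages forever.

```python
# Hypothetical pages that link to each other in a loop.
LOOP = {
    "/a.html": ["/b.html"],
    "/b.html": ["/c.html", "/a.html"],
    "/c.html": ["/a.html"],
}


def crawl(links, start):
    """Depth-first walk that records every page already fetched, so a
    chain such as A -> B -> A is followed only once."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        page = stack.pop()
        if page in visited:
            continue  # already fetched: a naive robot would loop here
        visited.add(page)
        order.append(page)  # a real robot would fetch the page here
        # push links in reverse so they are explored in document order
        stack.extend(reversed(links.get(page, [])))
    return order


if __name__ == "__main__":
    print(crawl(LOOP, "/a.html"))
```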

-- 
--------------------------------------------------------------
                         Anthony Cuykens

              Researcher in Computer Science, Security
              Free University of Brussels (ULB), Belgium

e-mail:	mailto:acuykens@ulb.ac.be
url:	http://litpc12.ulb.ac.be/owner.html
ftp:	ftp://litpc12.ulb.ac.be

phone:	+32 2 650.56.01		s-mail:	Boulevard du triomphe, cp 212
fax:	+32 2 650.56.09			1050 Bruxelles
					Belgium

-------------------------------------------------------------
Optimality axiom: 	When all else has failed,
			read the manual.
