[4968] in WWW Security List Archive
Re: ROBOTS
daemon@ATHENA.MIT.EDU (Charles McKenzie)
Tue Apr 1 22:00:39 1997
Date: Tue, 1 Apr 1997 17:43:51 -0600 (CST)
From: Charles McKenzie <charlesm@cs.wisc.edu>
To: Jeff Middleton <jeff.middleton@waii.com>
cc: www-security@ns2.rutgers.edu
In-Reply-To: <9703310750.ZM15252@sgiserv3.aws.waii.com>
Errors-To: owner-www-security@ns2.rutgers.edu
On Mon, 31 Mar 1997, Jeff Middleton wrote:
> Is there any threat caused by allowing web indexing robots to enter your site?
> I know there is a file "robots.txt" that can be setup, but is it necessary?
> i.e. Will all robots respond to the file anyway?
Generally, unless you have a lot of pages that you don't want indexed
(list archives, for example, at some sites), it shouldn't be a problem
to leave your site searchable. The only threats from web robots that I
know of are load spikes caused by a poorly configured robot that fails
to pause between documents, and spambots, which try to collect all the
e-mail addresses on your web site for use in bulk mail. Spambots probably
aren't going to listen to robots.txt anyway, so its main use is
regulating what the good robots can index.
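For illustration, here is a minimal Python sketch of what "good robot" behavior looks like: check robots.txt before fetching, and pause between documents. It uses the standard library's urllib.robotparser. The robots.txt contents, the /archives/ path, the "ExampleBot" agent name, the example.com URLs, and the fetch callback are all hypothetical. Crawl-delay is a nonstandard extension that not every robot honors. A spambot would simply skip the can_fetch() check, which is why robots.txt is no protection against it.

```python
import time
from urllib import robotparser

# Hypothetical robots.txt: keep every robot out of the list archives and
# ask for a pause between requests (Crawl-delay is a nonstandard extension).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
Crawl-delay: 5
"""


def make_parser(text):
    """Parse robots.txt content that has already been fetched."""
    rp = robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def polite_fetch_plan(rp, agent, urls, default_delay=1.0):
    """Return (allowed URLs, seconds to pause between fetches)."""
    allowed = [u for u in urls if rp.can_fetch(agent, u)]
    delay = rp.crawl_delay(agent)
    return allowed, (delay if delay is not None else default_delay)


def crawl(rp, agent, urls, fetch, sleep=time.sleep):
    """Fetch only permitted URLs, sleeping between documents so the
    server does not see a load spike. `fetch` is a hypothetical callback."""
    allowed, delay = polite_fetch_plan(rp, agent, urls)
    pages = []
    for i, url in enumerate(allowed):
        if i:
            sleep(delay)  # pause before every document after the first
        pages.append(fetch(url))
    return pages
```

With the sample file above, a robot calling crawl() would skip anything under /archives/ and wait 5 seconds between the pages it does fetch.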
Information on robots.txt and other exclusion methods is available at:
http://info.webcrawler.com/mak/projects/robots/exclusion.html
Chuck McKenzie
charlesm@cs.wisc.edu
UW-Madison CS Webmaster
Boycott Internet Spam - http://spam.abuse.net/