[95524] in North American Network Operators' Group
Re: Yahoo! clue (Slightly OT: Spiders)
daemon@ATHENA.MIT.EDU (Kradorex Xeron)
Thu Mar 29 10:19:13 2007
From: Kradorex Xeron <admin@digibase.ca>
Reply-To: admin@digibase.ca
To: Valdis.Kletnieks@vt.edu
Date: Thu, 29 Mar 2007 10:17:50 -0400
Cc: NANOG list <nanog@nanog.org>
In-Reply-To: <200703291315.l2TDFcR4019396@turing-police.cc.vt.edu>
Errors-To: owner-nanog@merit.edu
On Thursday 29 March 2007 09:15, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 29 Mar 2007 09:05:12 EDT, Kradorex Xeron said:
> > Slightly OT: Does anyone know what is with the web spiders from
> > Yahoo/Inktomi? I've been seeing reports and have seen a problem with them
> > opening 10 to 100 connections to any specific site.
>
> And 10 concurrent connections (or 100) causes a production-quality
> webserver difficulties, how, exactly?
True - however:
It may cause certain sites to go over their transfer quota (even if you do
rate limit the bots via robots.txt). It could also cause servers that limit
the number of simultaneous connections (e.g. some public file hosting servers
that don't allow more than x users at once) to lock out legitimate requests.
If you don't control the robots.txt of such a site, you would be unable to
access it.
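
For reference, a minimal sketch of the robots.txt rate limiting mentioned
above. It assumes the crawler honors the non-standard Crawl-delay directive
and identifies itself as "Slurp"; the file contents and the helper name
crawl_delay_for are made up for illustration. Note that Crawl-delay only
spaces out requests; it does nothing about total bytes transferred or the
number of open connections.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only. "Slurp" is assumed to be
# the user-agent token of the Yahoo/Inktomi crawler.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""


def crawl_delay_for(agent, robots_text=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_text.splitlines())
    return parser.crawl_delay(agent)
```

With this file, a bot matching "Slurp" is asked to wait 10 seconds between
requests, while other bots get no delay at all.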
Another problem is that the Yahoo/Inktomi search robots do not stop if no site
is present at that address. Thus, someone could register a DNS name and host
a site on it temporarily, just long enough for Yahoo/Inktomi's bots to
notice it, then point the name at any Internet host's address. The bots
would proceed to that host and access it over and over in succession,
wasting bandwidth at the user's end (which in most cases is monitored and
limited, sometimes severely, by the ISP) and wasting time at the bot's end
that could have been spent spidering other sites.
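
To illustrate the behavior I would expect instead, here is a rough sketch of
crawler-side bookkeeping that backs off and eventually gives up on a host
that keeps failing. This is not how Yahoo/Inktomi's bots (or any other
engine's) actually work; the class name HostBackoff, the 60 second base
delay and the failure limit are all assumptions picked for the example.

```python
from typing import Dict, Optional


class HostBackoff:
    """Hypothetical per-host failure tracker for a polite crawler."""

    def __init__(self, base_delay: float = 60.0, max_failures: int = 5) -> None:
        self.base_delay = base_delay      # seconds to wait with no failures
        self.max_failures = max_failures  # give up after this many in a row
        self._failures: Dict[str, int] = {}

    def record(self, host: str, ok: bool) -> None:
        """Record the outcome of one request to `host`."""
        if ok:
            # Any success resets the consecutive-failure count.
            self._failures.pop(host, None)
        else:
            self._failures[host] = self._failures.get(host, 0) + 1

    def next_delay(self, host: str) -> Optional[float]:
        """Seconds to wait before the next request, or None to stop crawling."""
        failures = self._failures.get(host, 0)
        if failures >= self.max_failures:
            return None
        # Double the wait for every consecutive failure.
        return self.base_delay * (2 ** failures)
```

A bot using something like this would stop hitting an address after a few
failed fetches instead of retrying indefinitely.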
People shouldn't need to protect themselves from search engine bots. The
Internet already has enough problems as it is, with spam and botnets among
others. Search engine bots with large pipes don't need to be on that
list of nuisances as well.
But that aside, from what I've seen, no other search engine is that
aggressive toward sites. I was just curious as to why Yahoo/Inktomi's
bots are so aggressive (even more than Google, MSN and such). I reviewed
the reason given on their site; however, the others do review
millions/billions as well.
Apologies if my postings are unclear.