[13214] in North American Network Operators' Group
Re: Spammer web harvesting tool countermeasures
daemon@ATHENA.MIT.EDU (Deepak Jain)
Thu Oct 30 23:24:46 1997
Date: Thu, 30 Oct 1997 23:27:03 -0500 (EST)
From: Deepak Jain <deepak@jain.com>
To: Jon Stevens <jon@clearink.com>
cc: "Jay R. Ashworth" <jra@scfn.thpl.lib.fl.us>, nanog@merit.edu,
"Brian L. Brush" <bbrush@ace.acomp.usf.edu>
In-Reply-To: <199710310304.TAA04206@clearink.com>
I didn't download it, but I looked at the first page. I figure that if
it relies on someone setting up robots.txt correctly, there will be a
lot of people who don't do it correctly, and we'll see installations of
the thing slow down search engines without good controls. Automatic META
tags would certainly help, except that the next generation of web scrapers
will be set to ignore them too.
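
To make the point concrete, here is a minimal Python sketch of the two
opt-out checks a well-behaved crawler performs before it fetches or
follows a page: the robots.txt rules and the robots META tag. The
robots.txt contents, the /trap/ path, the ExampleBot user agent, and the
helper names are all hypothetical and are not taken from the tool under
discussion. Both checks run on the crawler's side, so a harvester that
wants to ignore them just doesn't call them.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site operator might publish to keep
# compliant robots out of a trap directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        content = attr_map.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_fetch(robots_txt, agent, url):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def meta_directives(html):
    """Return the set of robots META directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://www.example.com/trap/a.html"))
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html"))
    print(sorted(meta_directives(page)))
```

A compliant search engine would skip any URL for which may_fetch() is
false and would not follow links from a page marked nofollow. If the
operator gets the Disallow line wrong, that same engine walks straight
into the generated pages.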
-Deepak.
On Thu, 30 Oct 1997, Jon Stevens wrote:
> "Deepak Jain" <deepak@jain.com> said the following at 10/30/97 6:56 PM:
>
> >And wouldn't we, in turn, see some kind of problems arise with legitimate
> >search engines because of this?
>
> If you downloaded it and looked at it, you would have noticed that it
> follows search engine guidelines by adding the appropriate <META> tag to
> the HTML, and that you can also use the robots.txt file to block it.
>
> Of course this also breaks down if spammer robots actually follow the
> rules...but how many of those do you think that there are? ;-)
>
> -jon
>
> Jon (no h) S. Stevens
> Web Engineer
> j@clearink.com
> Clear Ink and The Internet Weather Report
> <http://www.clearink.com/> | <http://www.internetweather.com/>