[13214] in North American Network Operators' Group

Re: Spammer web harvesting tool countermeasures

daemon@ATHENA.MIT.EDU (Deepak Jain)
Thu Oct 30 23:24:46 1997

Date: Thu, 30 Oct 1997 23:27:03 -0500 (EST)
From: Deepak Jain <deepak@jain.com>
To: Jon Stevens <jon@clearink.com>
cc: "Jay R. Ashworth" <jra@scfn.thpl.lib.fl.us>, nanog@merit.edu,
        "Brian L. Brush" <bbrush@ace.acomp.usf.edu>
In-Reply-To: <199710310304.TAA04206@clearink.com>


I didn't download it, but I looked at the first page. I figured that if 
it relies on someone setting up robots.txt correctly, a lot of people 
won't do it correctly, and we'll see installations of the thing slow down 
search engines without good controls. Automatic META tags would certainly 
help, except the next generation of web scrapers will be set to ignore 
them too. 
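
For reference, here is a rough sketch of the two checks a well-behaved 
robot would make before fetching or indexing a page: the robots.txt rules 
and the robots META tag. This is only an illustration in Python using the 
standard library; the /trap/ path, the ExampleBot agent name, and the 
example.com URLs are made up, and I am not claiming this is how the tool 
in question or any particular search engine does it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its harvester trap
# under /trap/ (the path is an assumption for illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for item in content.split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def may_fetch(robots_txt, agent, url):
    """True if the robots.txt text allows this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def meta_directives(html):
    """Return the set of robots META directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://www.example.com/trap/a.html"))
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html"))
    print(meta_directives('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'))
```

Both checks are entirely voluntary on the robot's side, which is the 
whole problem: a polite crawler runs them and stays out, and a scraper 
that skips them walks straight in. 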

-Deepak.

On Thu, 30 Oct 1997, Jon Stevens wrote:

> "Deepak Jain" <deepak@jain.com> said the following at 10/30/97 6:56 PM:
> 
> >And wouldn't we, in turn, see some kind of problems arise with legitimate 
> >search engines because of this?
> 
> If you downloaded it and looked at it, you would have noticed that it 
> follows search engine guidelines by adding the appropriate <META> tag to 
> the HTML, and that you can also use the robots.txt file to block it.
> 
> Of course this also breaks down if spammer robots actually follow the 
> rules...but how many of those do you think that there are? ;-)
> 
> -jon
> 
> Jon (no h) S. Stevens
> Web Engineer
> j@clearink.com
> Clear Ink and The Internet Weather Report
> <http://www.clearink.com/> | <http://www.internetweather.com/>
> 
> 
