[62955] in North American Network Operators' Group
The *.com/robots.txt
daemon@ATHENA.MIT.EDU (Guy Coslado (GC0111))
Wed Sep 24 15:03:51 2003
From: "Guy Coslado (GC0111) " <guy@coslado.com>
To: <nanog@merit.edu>
Date: Wed, 24 Sep 2003 20:52:24 +0200
Reply-To: guy@coslado.com
Errors-To: owner-nanog-outgoing@merit.edu
I've found inconsistencies in search engines, mainly with domain names
in a transient status. Such domain names inherit a new IP, the *.com IP (the
sitefinder IP).
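One way to check whether a name has fallen onto the wildcard is to compare its address with the address returned for a label that almost certainly is not registered. A minimal sketch in Python, assuming the wildcard answers every unregistered .com name with the same A record. The names `resolve_ipv4`, `wildcard_addresses` and `points_at_wildcard`, and the injectable `resolve` parameter, are made up for this illustration:

```python
import socket
import uuid


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses for hostname, empty if it does not resolve."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return set()
    return set(addresses)


def wildcard_addresses(tld="com", resolve=resolve_ipv4):
    """Resolve a random label that should not be registered.

    Assumption: any address returned for it comes from the TLD wildcard.
    """
    probe = "nx-" + uuid.uuid4().hex + "." + tld
    return resolve(probe)


def points_at_wildcard(hostname, tld="com", resolve=resolve_ipv4):
    """True if hostname resolves only to addresses the wildcard also returns."""
    wildcard = wildcard_addresses(tld, resolve)
    addresses = resolve(hostname)
    return bool(wildcard) and bool(addresses) and addresses <= wildcard
```

A crawler could call `points_at_wildcard("pallet-containers-unlimited.com")` before trusting anything served from that host. Passing a different `resolve` function makes the check usable without live DNS.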
And sitefinder itself has its own inconsistency.
Here is an example using Netscape or Mozilla (my IE6 configuration gives
other results):
http://www.pallet-containers-unlimited.com/bizdc.html
http://sitefinder.verisign.com/lpc?url=pallet-containers-unlimited.com/bizdc.html&host=pallet-containers-unlimited.com
That gives a link under:
Did You Mean ?
We did find these similar Web addresses.
http://www.pallet-containers-unlimited.com/bizdc.html
And now searching with sitefinder
http://sitefinder.verisign.com/spc?sb=pallet-containers-unlimited.com&searchboxtype=1&op=landing&search=Search
If VeriSign's sitefinder doesn't handle this case, what can we
expect from other search engines?
The query:
http://www.pallet-containers-unlimited.com/robots.txt
gives
User-agent: *
Disallow: /
This is also a false answer that can confuse a lot of HTTP agents
=>
as a simple example, sites whose domain name is in REDEMPTIONPERIOD can be
removed or blacklisted from search engine indexes for a while.
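The effect of that two-line answer on a well-behaved crawler can be shown with Python's standard `urllib.robotparser`. This is only a sketch: the helper name `allowed` and the agent name `ExampleBot` are hypothetical, and real search engines may apply their own policies on top of the file.

```python
from urllib.robotparser import RobotFileParser

# Body quoted above for http://www.pallet-containers-unlimited.com/robots.txt
WILDCARD_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this file, `allowed(WILDCARD_ROBOTS_TXT, "ExampleBot", "http://www.pallet-containers-unlimited.com/bizdc.html")` returns False for any agent name, so a crawler that obeys robots.txt has no permission to refetch pages it already indexed.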
Because nobody yet knows all the side effects,
I'm not sure having a robots.txt here is the best choice.
On the other hand, without the *.com/robots.txt, search engine indexes
could keep sites that no longer exist indefinitely.
Possibly the *.com redirect will give us other surprises with search engines.
Guy Coslado.
http://www.coslado.com Bots & Smart Agents
For the Guilde des metiers du logiciel: admin@fr.scguild.org
http://www.fr.scguild.com