[86690] in North American Network Operators' Group


Sorry! Here's the URL content (re. Paging Google...)

daemon@ATHENA.MIT.EDU (Matthew Elvey)
Mon Nov 14 16:57:05 2005

Date: Mon, 14 Nov 2005 13:56:30 -0800
From: Matthew Elvey <matthew@elvey.com>
To: nanog@merit.edu
Errors-To: owner-nanog@merit.edu


Doh!  I had no idea my thread would require a login and be hidden from 
general view!  (A robots.txt info site had directed me there...)   It seems 
I fell for an SEO scam... how ironic.  I guess that's why I haven't heard 
from Google...

Anyway, here's the page content (with some editing and paraphrasing):

Subject: paging google! robots.txt being ignored!

Hi. My robots.txt was put in place in August!
But Google still has tons of results that violate the file.

http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
doesn't complain (other than about the use of Google's nonstandard 
extensions described at
http://www.google.com/webmasters/remove.html)

The above page says it's OK for

#per [[AdminRequests]]
User-agent: Googlebot
Disallow: /*?*

to come last (after the User-agent: * block), and it seems to suggest 
the syntax is valid.
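
(For context, the file as a whole is laid out roughly like this; the 
/private/ path below is a placeholder, not my actual rules:

User-agent: *
Disallow: /private/

#per [[AdminRequests]]
User-agent: Googlebot
Disallow: /*?*

The Googlebot-specific block is the last one in the file, which is what 
I was asking about.)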

I also tried

User-agent: Googlebot
Disallow: /*?

but it hasn't helped.



I asked Google to review it via the automatic URL removal system 
(http://services.google.com/urlconsole/controller).
Result:
URLs cannot have wild cards in them (e.g. "*"). The following line 
contains a wild card:
DISALLOW: /*?

How insane is that?

Oh, and while /*?* wasn't taken verbatim from their example, it was 
legal per their syntax, same as /*? !
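
(If you want to check the equivalence yourself, the semantics Google 
documents ('*' matches any run of characters, and rules are prefix 
matches) are easy to model. This is my own sketch, not Google's actual 
matcher, and the sample URLs are made up:

import re

def robots_pattern_to_regex(pattern):
    # Translate a Disallow pattern that uses Google's wildcard
    # extension into a regex: '*' matches any run of characters,
    # a trailing '$' anchors at the end of the URL, and everything
    # else is literal.  Rules are prefix matches, which re.match()
    # already gives us.
    anchored = pattern.endswith('$')
    if anchored:
        pattern = pattern[:-1]
    body = re.escape(pattern).replace(r'\*', '.*')
    return re.compile('^' + body + ('$' if anchored else ''))

urls = ['/index.html', '/wiki?action=edit', '/a/b?x=1']
for pat in ('/*?', '/*?*'):
    rx = robots_pattern_to_regex(pat)
    print(pat, [u for u in urls if rx.match(u)])

Both patterns print the same list, just the URLs containing '?', so 
/*? and /*?* disallow exactly the same URLs.)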

The site has around 35,000 pages, and I don't think a robots.txt small 
enough to do what I want is possible without the wildcard extension.
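
(To see why: without the wildcard extension, Disallow is a plain prefix 
match, so blocking the query-string URLs would take one literal line 
per page, something like

Disallow: /wiki/SomePage?
Disallow: /wiki/AnotherPage?

times tens of thousands of pages. Those page names are made up, but 
that's the scale of it.)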






