[91326] in North American Network Operators' Group


Re: www.gigablast.com

daemon@ATHENA.MIT.EDU (Stephane Bortzmeyer)
Mon Jul 17 05:08:15 2006

Date: Mon, 17 Jul 2006 11:07:32 +0200
From: Stephane Bortzmeyer <bortzmeyer@nic.fr>
To: Jim Popovitch <jimpop@yahoo.com>
Cc: nanog <nanog@merit.edu>
In-Reply-To: <44B57688.4030407@yahoo.com>
Errors-To: owner-nanog@merit.edu


On Wed, Jul 12, 2006 at 06:24:08PM -0400,
 Jim Popovitch <jimpop@yahoo.com> wrote 
 a message of 32 lines which said:

> The strangeness is that some of their crawling is looking for URLs
> with multiple exclamation points, those URLs never existed. This may
> be indicative of a character translation on my system or theirs.

In my experience (and I talked with people - or at least intelligent
bots - at Gigablast), their HTML parser is seriously broken and quite
often generates URLs that do not exist. For instance, <a
href="http://www.example.fr/Cafe%20au%20lait"> will make their crawler
ask for "/Cafe".
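
To make the example concrete, here is a minimal Python sketch of what a
crawler would be expected to request for that link, next to a guess at
the faulty behaviour. The truncated_path() function is purely
hypothetical: it reproduces the symptom I observed (the path cut at the
first percent-escape), and I do not know what their parser actually
does internally. All function and class names are mine.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def request_path(url):
    """Path a crawler should send: percent-escapes are left intact."""
    return urlsplit(url).path or "/"


def truncated_path(url):
    """Hypothetical reproduction of the symptom: cut at the first '%'.

    This is an assumption that matches the observed request, not the
    crawler's known logic.
    """
    return request_path(url).split("%", 1)[0]


parser = LinkExtractor()
parser.feed('<a href="http://www.example.fr/Cafe%20au%20lait">menu</a>')
url = parser.links[0]

print(request_path(url))           # path as it should be requested
print(truncated_path(url))         # path as seen in the logs
print(unquote(request_path(url)))  # decoded form, for display only
```

The escaped path should go on the wire unchanged; decoding it with
unquote() is only useful for display or for mapping it to a file name
on the server side.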

I reported the problem months ago but got nothing back except the
standard "Thanks for telling us".

