[6169] in www-talk@info.cern.ch
Re: What if we offered a local spider?
daemon@ATHENA.MIT.EDU (Martijn Koster)
Fri Oct 14 03:32:04 1994
Date: Fri, 14 Oct 1994 08:28:17 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: m.koster@nexor.co.uk
From: Martijn Koster <m.koster@nexor.co.uk>
To: Multiple recipients of list <www-talk@www0.cern.ch>
> The robots discussion that I prompted with my indexing offer gave me an idea.
>
> If we built a free spider that operated only via the file system, which
> would build an index mapped to URL-space,
I suggested this to at least one robot author a while ago in the
context of URL checking (Hi Roy :-), but there are a number of
problems: CGI-script generated pages are excluded, access
authorisation is ignored, and you need to parse server config files to
look at URL mappings.
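
To make those problems concrete, here is a minimal sketch of such a
file-system spider. Everything in it is an assumption for illustration:
the function name map_files_to_urls, a single document root mapped
directly onto one base URL, and a "cgi-bin" directory name for scripts.
It shows exactly the weaknesses listed above: script output is skipped
rather than indexed, access restrictions are not consulted, and any
alias defined in the server config is invisible to it.

```python
import os
from urllib.parse import quote


def map_files_to_urls(doc_root, base_url, skip_dirs=("cgi-bin",)):
    """Walk doc_root and return a sorted list of URLs for the files found.

    Assumes doc_root maps one-to-one onto base_url. Aliases, per-user
    directories and access rules from the server config are NOT handled.
    """
    urls = []
    for dirpath, dirnames, filenames in os.walk(doc_root):
        # Prune script directories: their pages exist only when executed.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), doc_root)
            # Convert the OS path to a URL path and percent-encode it.
            url_path = quote(rel.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + url_path)
    return sorted(urls)
```

A file that the server refuses to serve would still appear in this
list, which is one way bogus URLs get out.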
> then offered to serve those indexes from here, would people use it?
Well, by just making the file available at a well-known place, anybody
can use a locally-generated map. Ehr, /ls-R.txt ?
> In other words, as was suggested here, you'd maintain your index locally,
> then ship it to Verity to be served by our Web server.
Or rather, you pull it whenever needed.
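
The pull model could look roughly like the sketch below. It is only an
illustration under assumptions: the /ls-R.txt location is the
tentative name floated above, the one-path-per-line file format is
invented here, and the helper names map_url, parse_map and pull_map
are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

MAP_PATH = "/ls-R.txt"  # hypothetical well-known location


def map_url(site):
    """Return the URL at which a site's map is assumed to live."""
    return urljoin(site, MAP_PATH)


def parse_map(text, site):
    """Turn a listing with one path per line into absolute URLs."""
    urls = []
    for line in text.splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        urls.append(urljoin(site, path))
    return urls


def pull_map(site, timeout=10):
    """Fetch and parse a site's map whenever the indexer needs it."""
    with urlopen(map_url(site), timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return parse_map(text, site)
```

An indexer would call pull_map("http://www.example.org/") on its own
schedule, so the site never has to ship anything.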
> Thoughts?
I think the problems identified above are rather non-trivial, and that
a trivial solution may give a significant number of bogus URLs. Even
with a local HTTP robot you have access-permission issues, but at
least you know that correct URLs get out.
-- Martijn
__________
Internet: m.koster@nexor.co.uk
X-400: C=GB; A= ; P=Nexor; O=Nexor; S=koster; I=M
X-500: c=GB@o=NEXOR Ltd@cn=Martijn Koster
WWW: http://web.nexor.co.uk/mak/mak.html