[7336] in www-talk@info.cern.ch

Re: Broken links, are we ever going to address them?

daemon@ATHENA.MIT.EDU (John Labovitz)
Wed Jan 25 13:40:15 1995

Date: Wed, 25 Jan 1995 18:57:25 +0100
Errors-To: listmaster@www0.cern.ch
Reply-To: johnl@ora.com
From: John Labovitz <johnl@ora.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>

[I tried to post this the other day, but it may
not have gotten through due to mail problems
at cern.ch.]

Paul Phillips <psphilli@sdcc8.UCSD.EDU> said:

> This isn't going to fly with the current Referer implementations.  Too 
> many browsers lie, especially all the Mozillas which constitute over half 
> the web clients currently.  Even if every version written from now on 
> were accurate, the sheer number of liars deployed will result in too 
> many false positives.  I get dozens of the MCOM home URL in my Referer 
> logs on a daily basis.

Could you explain exactly what you mean by lying 
browsers?  Are they putting incorrect URLs into 
the Referer field, or places the user hasn't 
actually been to, or what?

How widespread is this?  What other browsers lie?

We've been thinking about tracking Referer fields 
to see where people enter GNN and where they come 
from.  It sounds like that may not end up being as 
useful as I thought.

> There also needs to be a more reliable way of ascertaining the maintainer 
> of a page.  There are a few machine heuristics and a few more human ones 
> that can work, but no reliable method.  Even a ~user URL isn't 
> guaranteed to be able to receive mail at the same machine.

For many months now, we've used the <LINK> tag in 
all GNN documents.  The main reason is that the 
Lynx browser used to (maybe still does?) check for 
broken links and report them by email to the 
maintainer of the document containing the bad 
links.  The tricky part was that, unless otherwise 
specified, the errors would go to the maintainers 
of Lynx (hi, Lou!), *not* GNN.  Adding a tag like:

  <LINK REV=MADE HREF="mailto:johnl@ora.com">
 
caused the errors to be mailed to whoever was
specified in the HREF attribute.
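
As an illustration of how a link checker might pick up
that address, here is a minimal sketch in Python using the
standard html.parser module.  The names MadeLinkParser and
find_maintainer are hypothetical, and this is not how Lynx
itself does it; it only shows the idea of reading the HREF
from a <LINK REV=MADE> tag.

```python
from html.parser import HTMLParser
from typing import Optional


class MadeLinkParser(HTMLParser):
    """Remembers the HREF of the first <LINK REV=MADE ...> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.maintainer: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so
        # <LINK REV=MADE> and <link rev="made"> look the same here.
        if tag != "link" or self.maintainer is not None:
            return
        attr = dict(attrs)
        if (attr.get("rev") or "").lower() == "made" and attr.get("href"):
            self.maintainer = attr["href"]


def find_maintainer(html_text: str) -> Optional[str]:
    """Return the maintainer HREF (hypothetical helper), or None if absent."""
    parser = MadeLinkParser()
    parser.feed(html_text)
    return parser.maintainer
```

A checker that gets None back would have to fall back on
some other heuristic for finding the maintainer.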

This seems to have fixed the problem (at least we're 
not hearing from the Lynx folks ;), but some issues 
remain:

  - the errors were formatted to be readable 
    by humans, which makes tracking and 
    automation difficult

  - at least in the case of Lynx, apparent
    network problems were sometimes classified 
    as bad links; so we ignore the 'bad link' 
    messages that refer to, say, http://info.cern.ch ;)

On a related note, I've noticed that our error 
logs are sometimes filled with requests for URLs 
that don't exist in GNN but are close to ones that 
do.  For instance, /gnn/gnnhome.html is *probably* 
GNNhome.html, our home page.  

I'd guess that these are from people typing in URLs 
from newspaper articles or other print media.

It seems silly to simply return a 404 error for these.  
I've thought about hacking the server to instead bring 
up a nicer page that says, in effect, 'Sorry, you've 
dialed a wrong URL.  Perhaps you mean one of these...'
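
One way such a "perhaps you mean" list could be built is
sketched below in Python.  It assumes the server can supply
a list of the paths it actually serves; suggest_paths and
the example paths are hypothetical, and the 0.8 similarity
cutoff is an arbitrary starting point, not a tested value.

```python
import difflib


def suggest_paths(requested, known_paths, limit=3):
    """Return up to `limit` existing paths that resemble `requested`."""
    wanted = requested.lower()

    # Index the real paths by their lowercased form so that a
    # case-only mistake (gnnhome.html vs GNNhome.html) is caught first.
    by_lower = {}
    for path in known_paths:
        by_lower.setdefault(path.lower(), []).append(path)
    if wanted in by_lower:
        return by_lower[wanted][:limit]

    # Otherwise fall back to fuzzy matching to catch small typos.
    close = difflib.get_close_matches(wanted, list(by_lower), n=limit, cutoff=0.8)
    return [path for key in close for path in by_lower[key]][:limit]
```

An empty result would mean nothing similar was found, in
which case a plain 404 is still the honest answer.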

Has anyone experimented with a more user-friendly
approach to handling errors like this?  With the right 
interface, perhaps there could be a button that says 
'Report this URL as a bad link,' which could then link 
to a CGI script that could catalog the bad link.
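
A rough sketch of the cataloging side, in Python, might look
like the following.  Everything here is an assumption made
for illustration: the "url" query parameter, the function
names, and the tab-separated log layout (chosen so that the
reports are easy to process by machine rather than being
formatted for humans).

```python
from urllib.parse import parse_qs


def _one_line(text):
    # Collapse tabs and newlines so each report stays on one line.
    return " ".join((text or "").split())


def format_report(query_string, referer, timestamp):
    """Build one tab-separated log line, or None if no URL was given.

    `query_string` is assumed to look like "url=<percent-encoded URL>".
    """
    fields = parse_qs(query_string)
    bad_url = _one_line(fields.get("url", [""])[0])
    if not bad_url:
        return None
    return "\t".join([timestamp, bad_url, _one_line(referer) or "-"])


def record_bad_link(query_string, referer, timestamp, log_path):
    """Append the report to `log_path`; return True if one was written."""
    line = format_report(query_string, referer, timestamp)
    if line is None:
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
    return True
```

A real CGI script would read the query string and Referer
from its environment and pass them in.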

--
John Labovitz
Technical Services Manager, Global Network Navigator <http://gnn.com/>
O'Reilly & Associates, Sebastopol, California, USA (+1 707 829 0515)
