[5232] in www-talk@info.cern.ch

Re: Caching Servers Considered Harmful (was: Re: Finger URL)

daemon@ATHENA.MIT.EDU (John Labovitz)
Mon Aug 22 12:57:12 1994

Date: Mon, 22 Aug 1994 18:55:21 +0200
Errors-To: listmaster@www0.cern.ch
Reply-To: johnl@ora.com
From: John Labovitz <johnl@ora.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>

[Rob Raisch:]
> You can provide no guarantee that the versions that you present to your
> users are accurate or timely.  Further, I have no idea of the number of
> consumers who view my content through your cache or what they view, how
> and when. 
> [...]
> Of course, I could be wrong.  I have only been peripherally associated 
> with publishers.  Anyone from O'Reilly wish to comment?

Sure.  Note that I don't know much about the mechanics
of caching servers, so if I'm off base in some way,
please let me know.

Our issue with caching servers has to do with accounting
of use in GNN (Global Network Navigator).  To make GNN
freely available, we sell advertising.  In order for an
advertiser to feel that they are making a worthwhile
investment, they want to know how many people are reading
their content.  We can determine the number of `hits' on
a given part of GNN, but only if we have access to usage
logs.  When people reach GNN through a caching server,
we see only the cache's own fetches: one hit per document,
plus one more each time the cache entry expires and is
fetched again.
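
As a rough illustration of that undercount, here is a minimal
sketch assuming a single cache with a fixed expiry time per
document (real caches may behave differently).  The function
name and the sample request times are hypothetical.

```python
def hits_seen_by_origin(request_times, ttl):
    """Count the requests a simple fixed-expiry cache forwards to the origin.

    request_times: times at which readers ask the cache for one document.
    ttl: how long a cached copy is served before it expires (same unit).
    """
    forwarded = 0
    expires_at = None  # no cached copy yet
    for t in sorted(request_times):
        if expires_at is None or t >= expires_at:
            # Cache miss or expired entry: the origin server logs this hit.
            forwarded += 1
            expires_at = t + ttl
        # Otherwise the cache answers and the origin never sees the request.
    return forwarded


# Hypothetical traffic: one reader request per day for ten days.
daily_requests = list(range(10))
print(hits_seen_by_origin(daily_requests, ttl=2))   # two-day expiry
print(hits_seen_by_origin(daily_requests, ttl=14))  # two-week expiry
```

With a two-day expiry the origin logs half of these ten
requests; with a two-week expiry it logs only the first.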

In Neil Smith's paper `What can Archives offer the World 
Wide Web,' there's a table (fig. 7) that lists `the most 
popular remote sites accessed via the UNIX HENSA cache.'  
Our main GNN site, nearnet.gnn.com, is up there at the 
top, with approximately 4000 accesses.  I haven't gone
through our logs to check specifically for accesses 
from the HENSA caching server, but I would guess that
the number our logs show is well under 4000.  (From the
paper, the HENSA server will expire GNN non-GIF files
after two days, and GIF files after two weeks.  Here's
a real-life ramification of caching: for those using 
the HENSA server, our daily Dilbert comic strip changes
only once every two weeks.)

One solution would be for caching servers to generate
a summary of hits on URLs `belonging' to particular 
servers, and to email that summary to a standard 
email address at those servers.  So even though we 
at GNN may not receive the level of detail that we 
get from our own logs (timestamp, hostnames, URLs), 
we could at least receive from the caching servers 
an approximation which we could integrate into our 
reports back to our advertisers.
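
The summary idea above could be sketched roughly as follows.
This is only an illustration, not a description of how any
existing caching server works: it assumes the cache can list
the URLs it served from its own store, and the `www-usage`
mailbox name, the function names, and the sample paths are
all hypothetical.  It builds the report text but does not
send any mail.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def summarize_hits(cached_urls):
    """Group URLs served from the cache by origin host and count each URL."""
    per_host = defaultdict(Counter)
    for url in cached_urls:
        host = urlsplit(url).hostname
        if host:  # skip anything without a recognizable host
            per_host[host][url] += 1
    return per_host


def format_report(host, counts, mailbox="www-usage"):
    """Build a plain-text summary addressed to an assumed standard mailbox."""
    lines = [
        f"To: {mailbox}@{host}",
        f"Subject: cache hit summary for {host}",
        "",
    ]
    for url, n in sorted(counts.items()):
        lines.append(f"{n:6d}  {url}")
    return "\n".join(lines)


# Hypothetical list of requests answered from the cache.
served = [
    "http://nearnet.gnn.com/gnn/news.html",
    "http://nearnet.gnn.com/gnn/news.html",
    "http://nearnet.gnn.com/gnn/comic.gif",
]
for host, counts in summarize_hits(served).items():
    print(format_report(host, counts))
```

A report like this carries only per-URL counts, with no
timestamps or hostnames, which matches the reduced level of
detail described above.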

Comments?

--
John Labovitz
Global Network Navigator <http://gnn.com/>
O'Reilly & Associates, Sebastopol, California, USA (+1 707 829 0515)
