[5113] in www-talk@info.cern.ch

Re: Dienst, A Protocol for a Distributed Digital Document Library

daemon@ATHENA.MIT.EDU (Daniel W. Connolly)
Mon Aug 8 16:21:05 1994

Date: Mon, 8 Aug 1994 22:18:53 +0200
Errors-To: listmaster@www0.cern.ch
Reply-To: connolly@hal.com
From: "Daniel W. Connolly" <connolly@hal.com>
To: Multiple recipients of list <www-talk@www0.cern.ch>

In message <9408081907.AA17028@martin.cs.cornell.edu>, Carl Lagoze writes:

>Jim Davis and I recently submitted an Internet Draft describing Dienst, a
>protocol for communication with distributed digital library servers. 
>This protocol is embedded within HTTP.  You are invited to look at the
>protocol document, available in HTML at
>http://cs-tr.cs.cornell.edu/Info/dienst_protocol.html or in ASCII at
>ftp://nis.nsf.net/internet/documents/internet-drafts/draft-lagoze-dienst
>protocol-00.txt.  You might also want to look at a prototype
>implementation of dienst at http://cs-tr.cs.cornell.edu.  We welcome
>your comments.

Excellent! Great stuff! This is a great way to address three items on
my WWW Architecture Wishlist: resource discovery, replication,
and a compound document architecture.

Those three are:

	* resource discovery -- how do I find stuff? If I've got a good
	description of a given document (author, publisher, pub date, that
	sort of thing) and an internet connection, I should be able to
	submit a query and search the whole docuverse in one RPC (which
	will probably cascade into many RPCs, but as far as I'm
	concerned...).

	If I have only a vague description of the document I'm interested
	in, I should be able to conduct the same search, but it may take
	several RPCs, with some user interaction at each iteration.
	(e.g. What libraries are available? ... Ok, from those three,
	what databases relate to quantum physics? ...)

	* replication -- documents should be highly available; that is,
	given authorized access to sufficient connectivity, compute
	resources, and disk space, I should be _able_ (not required)
	to publish a document in such a way that there is no single
	point of failure between me, the producer, and any of my consumers.

	The USENET model addresses this, but because it operates completely
	asynchronously, it lacks adequate fault detection mechanisms
	(e.g. I can't determine whether my message made it to foo.com).

	Another limitation of USENET is that documents are immutable.

	* compound document architecture -- long story. But Dienst's
	support for printing pages is one application of it.
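
The "one RPC that cascades into many" idea from the resource discovery
item can be sketched roughly as below. This is only an illustration, not
Dienst's actual protocol: the Library class, its search method, the server
names, and the sample record are all made up, and the "RPC" is an ordinary
in-process method call.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical server: holds some records and knows some peer servers."""
    name: str
    records: list = field(default_factory=list)  # each record is a dict
    peers: list = field(default_factory=list)    # other Library objects

    def search(self, query, seen=None):
        """Answer one query, cascading to peers; each server is asked once."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return []
        seen.add(self.name)
        # A record matches when every field named in the query is equal.
        hits = [r for r in self.records
                if all(r.get(k) == v for k, v in query.items())]
        for peer in self.peers:
            hits.extend(peer.search(query, seen))
        return hits


# Made-up data: two servers that point at each other.
library_b = Library("library-b", [{"author": "A. Author", "title": "Example Report"}])
library_a = Library("library-a", peers=[library_b])
library_b.peers.append(library_a)  # a cycle; `seen` stops repeat visits

results = library_a.search({"author": "A. Author"})
```

The caller makes a single call on library_a; the fan-out to library_b
happens behind it.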

The items on my wishlist that Dienst doesn't cover are:

	* data integrity/fault detection: If fred says "see XXX for info on
	apples," and I get XXX and it has info on oranges, I can't tell
	if there was a fault, let alone where it was. A reference/link/citation
	should be _able_ (not required) to contain integrity information
	of various levels of reliability:

		"see rfc822.txt; you'll know you've got the right rfc822.txt if
		it came from ds.internic.net any time since 1990"
			(allows replication by caching)

		"see foo.tar.Z; you'll know you've got the right foo.tar.Z if
		it has 1210921 bytes."

		"see foo.tar.Z; you'll know you've got the right foo.tar.Z if
		it has a GNU cksum checksum of 1203980123"

		"see XXX; you'll know you've got the right XXX if
		it has an MD5 checksum of 2342345234lksjw34"

		"see XXX; you'll know you've got the right XXX if
		it's been RSA signed by fred@foo.com"
			(works for documents that change)

	* democratic publishing model -- anybody should be able to
	spontaneously create a lasting name for a document without doing
	an RPC with a naming authority. Authorized users should be
	able to create a name within a naming authority's namespace
	by doing an authenticated RPC. Document names must be associated
	with copyright owners only -- not service providers etc. (Witness
	the recent issues with 1-800 number portability between AT&T and MCI).
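
The weaker integrity levels in the data integrity item (byte count,
checksum, MD5) can be sketched as below, assuming the citation carries
whichever values its author chose to include. The function name
check_reference and the citation layout are hypothetical. zlib's CRC-32
stands in for the cksum example; it is a different algorithm from cksum and
will not produce the same numbers. The RSA-signed case is left out because
the Python standard library has no RSA support.

```python
import hashlib
import zlib


def check_reference(data, size=None, crc32=None, md5=None):
    """Return the names of the checks that failed; [] means all passed.

    Only the checks the citation actually carries are applied, so a
    citation may be as weak or as strong as its author wants.
    """
    failures = []
    if size is not None and len(data) != size:
        failures.append("size")
    if crc32 is not None and zlib.crc32(data) != crc32:
        failures.append("crc32")
    if md5 is not None and hashlib.md5(data).hexdigest() != md5:
        failures.append("md5")
    return failures


# Made-up document and a citation computed from it.
document = b"info on apples\n"
citation = {
    "size": len(document),
    "crc32": zlib.crc32(document),
    "md5": hashlib.md5(document).hexdigest(),
}

ok = check_reference(document, **citation)
bad = check_reference(b"info on oranges\n", **citation)
```

Here ok is empty, while bad names the checks that caught the oranges, which
at least tells me there was a fault (though still not where it happened).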

Dan
