The Use of UR* in a CSTR Library

Abstract

This document outlines a possible approach to using Uniform Resource Names (URNs) in a digital library, specifically a library of computer science technical reports (CSTRs). The ideas below lean toward what we have defined as a "navigation" service and how it can be implemented using UR*.

1.0 Syntax

It is not the goal of this document to create a completely new syntax for Uniform Resource Names (URNs). The current drafts produced by the IETF outline this syntax quite adequately. A URN is something of the form:

<URN:naming_scheme:authority_id:opaque_string>

All text in a URN should be in some standard format without the use of highly specialized characters.

It should be possible to take two URNs, remove spaces, tabs, CRs, and LFs, convert all A-Z to lower case, and then do a string comparison on the two results. This simple comparison guarantees that a URN entered with different capitalization or formatting still points to the same object.
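The comparison rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function names normalize_urn and same_urn are made up for this example and are not part of any URN draft.

```python
def normalize_urn(urn: str) -> str:
    """Remove spaces, tabs, CRs and LFs, then convert A-Z to lower case."""
    stripped = "".join(ch for ch in urn if ch not in " \t\r\n")
    # Only ASCII A-Z is folded, as the text specifies; other characters are untouched.
    return "".join(chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch for ch in stripped)


def same_urn(a: str, b: str) -> bool:
    """Plain string comparison of the two normalized URNs."""
    return normalize_urn(a) == normalize_urn(b)


if __name__ == "__main__":
    print(same_urn("<URN:cstr:LCS.MIT.EDU:TR-520>", "<urn:cstr:lcs.mit.edu:\n   tr-520>"))
```

Because the rule removes whitespace everywhere, a URN that was wrapped across lines when it was transcribed still compares equal to the original.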

1.1 Naming_scheme

The naming scheme is something of the form: "dns," or "isbn," or "xascii," or "cstr" (see below).

1.2 Authority_id

The authority id can be anything as long as it satisfies the following conditions:

1.3 Opaque_string

The opaque string should be one of the two following forms:

1.4 Document Identifier

It is the job of the naming authority to assign a unique name to the document. The uniqueness of the name should only be enforced locally. That is, two different authorities may each have a "tr-550" in their databases. What should be unique is the combination of the naming authority id and the document id. This ensures that the following two URNs (although containing the same opaque string) actually point to two different objects.

<URN:naming_scheme:physics.mit.edu:tr-550>

<URN:naming_scheme:chem.mit.edu:tr-550>
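As a minimal sketch of this uniqueness rule, the Python below splits a URN of the form given in section 1.0 into its three parts and uses the (authority id, opaque string) pair as the lookup key. The names Urn, parse_urn and global_key are assumptions made for this example.

```python
from typing import NamedTuple, Tuple


class Urn(NamedTuple):
    naming_scheme: str
    authority_id: str
    opaque_string: str


def parse_urn(text: str) -> Urn:
    """Split '<URN:naming_scheme:authority_id:opaque_string>' into its parts."""
    body = text.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    # maxsplit=3 keeps any further colons inside the opaque string.
    prefix, scheme, authority, opaque = body.split(":", 3)
    if prefix.lower() != "urn":
        raise ValueError(f"not a URN: {text!r}")
    return Urn(scheme, authority, opaque)


def global_key(urn: Urn) -> Tuple[str, str]:
    """The pair that must be globally unique; the opaque string alone need not be."""
    return (urn.authority_id, urn.opaque_string)


if __name__ == "__main__":
    physics = parse_urn("<URN:naming_scheme:physics.mit.edu:tr-550>")
    chem = parse_urn("<URN:naming_scheme:chem.mit.edu:tr-550>")
    print(physics.opaque_string == chem.opaque_string)  # same local name
    print(global_key(physics) == global_key(chem))      # different objects
```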

It is irrelevant what the actual opaque string is, but it would be beneficial to the human user if it were easily transcribable and somewhat understandable. An id of the form "lcs-tr-520" is therefore better than the id "AJXX9312321BC".

1.5 Digital Document Form Identifier

This identifier is a unique way to distinguish between different forms of a document (PostScript, text, HTML, etc.). For reasons I shall outline below, it is insufficient for the pair authority_id:opaque_string to be unique for a digital object. For a digital object, it would be beneficial for the opaque_string alone to be unique. This can be done by the use of an MD5 checksum on the digital object. The opaque_string therefore becomes:

checksum[document-id]
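A minimal Python sketch of building and splitting such an opaque string follows. It assumes the checksum is written as the 32-character hexadecimal MD5 digest, which this document does not specify; the function names are likewise made up for the example.

```python
import hashlib
import re
from typing import Optional, Tuple


def object_opaque_string(data: bytes, document_id: Optional[str] = None) -> str:
    """Return 'checksum[document-id]', or just 'checksum' when no id is given."""
    digest = hashlib.md5(data).hexdigest()  # assumption: hex encoding of the digest
    return f"{digest}[{document_id}]" if document_id else digest


_OPAQUE = re.compile(r"^([0-9a-f]{32})(?:\[([^\]]*)\])?$")


def split_opaque_string(opaque: str) -> Tuple[str, Optional[str]]:
    """Split 'checksum[document-id]' into (checksum, document-id or None)."""
    match = _OPAQUE.match(opaque)
    if match is None:
        raise ValueError(f"not a checksum opaque string: {opaque!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    page = b"bytes of one digital form of the report"
    print(object_opaque_string(page, "tr-520"))
    print(split_opaque_string(object_opaque_string(page)))
```

Since the checksum depends only on the bytes of the object, two repositories holding the same file derive the same opaque string without consulting each other.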

An important point is that document-id is not a required field. It is therefore sufficient to make the request "get checksum" to the URN->URC service. This system has the following benefits:

2.0 URNs for CSTRs

For our purposes it might be beneficial to define a specific type of URN called "cstr" or something appropriate.

<URN:cstr:mit.edu:mitlcs//tr-520>

or <URN:cstr:mit.edu:mitlcs:tr-520>

or <URN:cstr:lcs.mit.edu:tr-520>

The basic idea behind this is that we have a naming authority id (mit.edu) which can be looked up by use of DNS. We have "mitlcs", which is the publisher id. For this purpose I will define the publisher as the sub-organization that is responsible for the publication of a specific series of documents, for example mitlcs or mitai. The benefit of such a system is that resolver requests within a naming authority can be delegated to lower-level organizations in a hierarchical fashion.
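The delegation idea can be sketched as follows, taking the first two candidate forms above, in which the publisher id is part of the opaque string. The table contents and the resolver names in it are purely hypothetical, as are the function names; the sketch only shows an authority splitting off the publisher id and handing the rest to a lower-level resolver.

```python
from typing import Dict, Tuple

# Hypothetical table kept by the naming authority "mit.edu":
# publisher id -> name of the lower-level resolver that handles it.
PUBLISHER_RESOLVERS: Dict[str, str] = {
    "mitlcs": "lcs-resolver",
    "mitai": "ai-resolver",
}


def split_publisher(opaque_string: str) -> Tuple[str, str]:
    """Split 'mitlcs//tr-520' or 'mitlcs:tr-520' into (publisher id, document id)."""
    for separator in ("//", ":"):
        publisher, found, document_id = opaque_string.partition(separator)
        if found and publisher and document_id:
            return publisher, document_id
    raise ValueError(f"no publisher id in {opaque_string!r}")


def delegate(opaque_string: str) -> Tuple[str, str]:
    """Return (resolver to forward to, document id to ask it about)."""
    publisher, document_id = split_publisher(opaque_string)
    if publisher not in PUBLISHER_RESOLVERS:
        raise ValueError(f"unknown publisher: {publisher!r}")
    return PUBLISHER_RESOLVERS[publisher], document_id


if __name__ == "__main__":
    print(delegate("mitlcs//tr-520"))
    print(delegate("mitai:tr-1"))
```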

3.0 Uniform Resource Characteristics

The specification for URCs is very loose, so below I will define them to meet the specific requirements of a library system.

3.1 URCs for the Document


URN:urn:cstr:lcs.mit.edu:tr-520 {
Id: lcs-tr-520

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
computation. Our main contributions are the following: A consistent
framework for monotone computation, including monotone analogues of
many standard computational models. 

<URN:cstr:lcs.mit.edu:MD5-checksum1[tr-520]>
<URN:cstr:lcs.mit.edu:MD5-checksum2[tr-520]>
<URN:cstr:lcs.mit.edu:MD5-checksum3[tr-520]>
}
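One way to hold such a record in a program is sketched below in Python. The class DocumentUrc and its render method are assumptions made for this example; the field names are taken from the sample above, and the blank lines between fields in the sample are not reproduced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentUrc:
    urn: str            # e.g. "urn:cstr:lcs.mit.edu:tr-520"
    doc_id: str
    title: str
    author: str
    organization: str
    abstract: str
    # URNs of the digital objects (forms) of this document, stored verbatim.
    object_urns: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Write the record in the brace-delimited layout of the sample."""
        lines = [
            f"URN:{self.urn} {{",
            f"Id: {self.doc_id}",
            f"Title: {self.title}",
            f"Author: {self.author}",
            f"Organization: {self.organization}",
            f"Abstract: {self.abstract}",
        ]
        lines.extend(self.object_urns)
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    urc = DocumentUrc(
        urn="urn:cstr:lcs.mit.edu:tr-520",
        doc_id="lcs-tr-520",
        title="Structure in Monotone Complexity",
        author="M. Gringi",
        organization="Massachusetts Institute of Technology, LCS",
        abstract="In this work we study complexity classes in monotone computation.",
        object_urns=["<URN:cstr:lcs.mit.edu:MD5-checksum1[tr-520]>"],
    )
    print(urc.render())
```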

3.2 URC for the objects (formats)

Sample:
URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {
Id: lcs-tr-520

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
computation. Our main contributions are the following: A consistent
framework for monotone computation, including monotone analogues of
many standard computational models. 

content-type: application/tiff

page: 1

Signature: some-encrypted-security-measure

Comment: some-comments

URN->URL resolver: this field can include the name of a resolver
                   service that will look up the URN and return
                   URLs
}

The URC for the object may also contain a file size field.

4.0 Storage of UR* information

Disclaimer: Everything in this section is my own view of how this should work:

It is the responsibility of the naming authority to hold original copies of the URNs and URCs of a document. These records may only be updated and changed by the naming authority. Each record will be time-stamped and can include a signature (for security purposes). It is unnecessary for the authority to hold a copy of the URLs since they are not trustworthy.

It is also the responsibility of the naming authority to pass on client requests to the appropriate resolution service.

There is a requirement for at least two kinds of resolution services (one that returns URCs and one that returns URLs). These act as the navigation service by translating a URN to its characteristics or location. This does not imply that there cannot be more than two services involved. For example, we can have "discovery" services which act as search tools. A discovery service may submit a request for UR* information held by the naming authority, and maintain a catalog of this information.

It should be noted that only "trusted" repositories should be returned by the URN->URL resolver. It is not necessary to list all possible locations of the object. A URL resolver (by definition of URLs) should not be trusted as URLs can change over time. A reliable resolver (or naming authority) should make an attempt to keep itself as updated as possible.

Replication of resolver services should be encouraged. The services should act as mirror sites for the naming authority. They should either update themselves by querying the naming authority for "what's new" or "what's updated" or, if they are run by the naming authority, have such information "pushed" onto them.
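The "what's updated" style of replication can be sketched as below. Everything here is an assumption made for illustration: the class and method names are invented, records are kept in memory, and a simple counter stands in for the time stamp that the naming authority puts on each record.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    urn: str
    urc_text: str
    timestamp: int  # stand-in for the authority's time stamp


class NamingAuthority:
    """Holds the original records; only the authority may change them."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}
        self._clock = 0

    def publish(self, urn: str, urc_text: str) -> None:
        self._clock += 1
        self._records[urn] = Record(urn, urc_text, self._clock)

    def updated_since(self, timestamp: int) -> List[Record]:
        """Answer a mirror's "what's updated" query."""
        return [r for r in self._records.values() if r.timestamp > timestamp]


class Mirror:
    """A replicated resolver that pulls changes from the naming authority."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}
        self.last_sync = 0

    def pull(self, authority: NamingAuthority) -> int:
        changes = authority.updated_since(self.last_sync)
        for record in changes:
            self.records[record.urn] = record
            self.last_sync = max(self.last_sync, record.timestamp)
        return len(changes)


if __name__ == "__main__":
    authority = NamingAuthority()
    authority.publish("urn:cstr:lcs.mit.edu:tr-520", "first version of the URC")
    mirror = Mirror()
    print(mirror.pull(authority), "record(s) copied")
```

A "push" arrangement would invert the call: the authority would hand new records to each mirror it runs instead of waiting to be asked.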

5.0 Usage

A user has a URN. The user's client obtains host information for the naming authority by doing a DNS lookup of the authority_id and submits the request (in whois++ format).
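The client's first steps can be sketched in Python as follows, using the request layout that appears in the examples in this section. The function names are made up, the DNS lookup itself is left out, and treating the square brackets around the specific-information field as literal characters is an assumption.

```python
from typing import Optional


def authority_host(urn: str) -> str:
    """Return the authority id of 'urn:scheme:authority:opaque' for the DNS lookup."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return parts[2]


def build_request(urn: str, template: str = "URC",
                  specific_info: Optional[str] = None) -> str:
    """Build a request line such as 'template=URC;URN=urn:cstr:lcs.mit.edu:tr-520'."""
    # Assumption: the optional field is written inside literal square brackets.
    selector = f"{template}[{specific_info}]" if specific_info else template
    return f"template={selector};URN={urn}"


if __name__ == "__main__":
    urn = "urn:cstr:lcs.mit.edu:tr-520"
    print(authority_host(urn))
    print(build_request(urn))
    print(build_request(urn, specific_info="form"))
```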

Case 1

The URN is for a document. Example:
template=URC[Specific_info];URN=urn:cstr:lcs.mit.edu:tr-520
The [Specific_info] field can contain object-specific information (e.g. "form," "cost," "signature," etc.). If a user has specific requirements for objects, only this specific information is returned rather than a "full" URC. Some scenarios for this usage are:

From here, there are two sub-cases:

Case 2

The URN is for an object. Example:
template=URC;URN=urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520]
In the second case, where a user knows the URN for a digital object, it is simply a matter of returning the URC with the appropriate URL lookup. The client gets back:
URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
+ computation. Our main contributions are the following: A consistent
+ framework for monotone computation, including monotone analogues of
+ many standard computational models. 

content-type: application/tiff

page: 1

Signature: some-encrypted-security-measure

<URL:http://lcs.mit.edu/pub/tr520-1.tiff>
<URL:ftp://some.reposit.ory/pub/foo.tiff>
}
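A client might take such a response apart as sketched below. The parser is an illustration only: it assumes the layout of the example response (one "Name: value" field per line, continuation lines starting with "+", and locations given as <URL:...> lines), and the function name parse_urc is made up.

```python
from typing import Dict, List, Optional, Tuple


def parse_urc(text: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Return (urn, fields, urls) from a brace-delimited URC response."""
    urn = ""
    fields: Dict[str, str] = {}
    urls: List[str] = []
    last_key: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "}":
            continue
        if line.endswith("{"):
            # Header line: "URN:<urn> {"
            urn = line[:-1].strip()
            if urn.upper().startswith("URN:"):
                urn = urn[4:]
        elif line.startswith("<URL:") and line.endswith(">"):
            urls.append(line[5:-1])
        elif line.startswith("+") and last_key is not None:
            # Continuation of the previous field's value.
            fields[last_key] += " " + line[1:].strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            fields[last_key] = value.strip()
    return urn, fields, urls


if __name__ == "__main__":
    response = """URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {
Title: Structure in Monotone Complexity
content-type: application/tiff
<URL:http://lcs.mit.edu/pub/tr520-1.tiff>
<URL:ftp://some.reposit.ory/pub/foo.tiff>
}"""
    urn, fields, urls = parse_urc(response)
    print(urn)
    print(fields["content-type"])
    print(urls)
```

Keeping the URLs in a separate list from the fields reflects the point made in section 4.0 that locations are less trustworthy than the rest of the record and may change over time.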

6.0 Notes

This document is based largely on the architectural notes that Jeremy and Andrew wrote, as well as discussions with Mitchell. I also used a lot of the ideas developed by the IETF taskforce on URIs, and I'll include a bibliography as soon as I can dig up all my references.

Unresolved questions

1. Who cleans up the URC for the user? Is the viewer responsible for cleaning up the URC for itself (for example, Mosaic converting it to HTML), or is it the server (the server is given client information and returns the URC in a nice format)?

2. Should a URL be returned automatically if the user doesn't specifically request it? The URN to URL server can also work in the whois++ format and can accept requests like:

template=URL;URN=urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520]

Comments to eytan@mit.edu
modified 2/21/95