The Use of UR* in a CSTR Library

Abstract

The purpose of this document is to outline a possible approach to using Uniform Resource Names in a digital library (specifically, one of computer science technical reports). The ideas outlined below lean more towards what we have defined as a "navigation" service and how it can be implemented using UR*.

1.0 Syntax
- 1.1 Naming_scheme
- 1.2 Authority_id
- 1.3 Opaque_string
- 1.4 Document Identifier
- 1.5 Digital Document Form Identifier
2.0 URNs for CSTRs
3.0 Uniform Resource Characteristics
- 3.1 URCs for Documents
- 3.2 URC for the objects (formats)
4.0 Storage of UR* information
5.0 Usage
6.0 Notes

1.0 Syntax

It is not the goal of this document to create a completely new syntax for Uniform Resource Names (URNs). The current drafts produced by the IETF outline this syntax quite adequately. A URN is something of the form:

<URN:naming_scheme:authority_id:opaque_string>

All text in a URN should be in some standard format without the use of highly specialized characters.

It should be possible to take two URNs, remove spaces, tabs, CRs, and LFs, convert all A-Z to lower case, and then do a string comparison on the two resulting URNs. This simple comparison process guarantees that a URN entered with different capitalization or formatting still points to the same object.
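The comparison rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function names `normalize_urn` and `same_urn` are made up for this example and are not part of any draft.

```python
import string

# Delete spaces, tabs, CRs and LFs, and map only A-Z to a-z,
# exactly as the comparison rule describes.
_NORMALIZE = str.maketrans(
    string.ascii_uppercase, string.ascii_lowercase, " \t\r\n"
)


def normalize_urn(urn: str) -> str:
    """Return the canonical form of a URN used for comparison."""
    return urn.translate(_NORMALIZE)


def same_urn(first: str, second: str) -> bool:
    """Two URNs name the same object if their canonical forms match."""
    return normalize_urn(first) == normalize_urn(second)
```

With this sketch, `same_urn("<URN:cstr:LCS.MIT.EDU:TR-520>", "<urn:cstr:lcs.mit.edu:\n tr-520>")` is true, while URNs that differ in any other character remain distinct.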

1.1 Naming_scheme

The naming scheme is something of the form: "dns," or "isbn," or "xascii," or "cstr" (see below).

1.2 Authority_id

The authority id can be anything as long as it satisfies the following conditions:

- It is unique among the list of authority ids. That is, one id points to one authority.
- There is a method (for example, an alias in the DNS server) to find the naming authority. That is, if we have mit.edu as the text in the URN, but the host responsible for naming is name_auth.mit.edu, there should be a way to route the requests for
<URN:naming_scheme:mit.edu:lcs-tr-512>
to the proper host (name_auth.mit.edu).

1.3 Opaque_string

The opaque string should be one of the two following forms:

- Documents: Using Jim Davis' definition (although not necessarily the correct one), a document is a single piece of intellectual content which may exist in several forms of representation.
- Digital Document Forms: Digital document forms are the representations of a document (e.g. PostScript, text, HTML, etc.).

1.4 Document Identifier

It is the job of the naming authority to assign a unique name to the document. The uniqueness of the name should only be enforced locally. That is, it should be possible to have a "tr-550" in the databases of two different authorities. What should be unique is the combination of the naming authority id and the document id. This ensures that the following two URNs (although containing the same opaque string) actually point to two different objects.

<URN:naming_scheme:physics.mit.edu:tr-550>

<URN:naming_scheme:chem.mit.edu:tr-550>

It is irrelevant what the actual opaque string is, but it would be beneficial to the human user if it were easily transcribable and somewhat understandable. An id of the form "lcs-tr-520" is therefore better than the id "AJXX9312321BC".
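The local-uniqueness rule from this section can be sketched as follows. The class `NamingAuthority` and its `assign` method are hypothetical names chosen for illustration; the only behaviour taken from the text is that a document id must be unique within one authority, while the same id may be reused by another.

```python
class NamingAuthority:
    """Hypothetical registry that enforces uniqueness only locally."""

    def __init__(self, authority_id: str) -> None:
        self.authority_id = authority_id
        self._documents = {}  # document id -> record held by this authority

    def assign(self, document_id: str, record: dict) -> str:
        """Register a document id and return the resulting URN."""
        if document_id in self._documents:
            raise ValueError(
                f"{document_id!r} is already assigned by {self.authority_id}"
            )
        self._documents[document_id] = record
        # The pair (authority id, document id) is what makes the URN unique.
        return f"<URN:naming_scheme:{self.authority_id}:{document_id}>"
```

Two instances created for physics.mit.edu and chem.mit.edu can both assign "tr-550" and still produce different URNs; a second "tr-550" within the same instance is rejected.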

1.5 Digital Document Form Identifier

This identifier is a unique way to distinguish between different forms of a document (PostScript, text, HTML, etc.). For reasons I shall outline below, it is insufficient for the pair authority_id:opaque_string to be unique for a digital object. For a digital object, it would be beneficial for the opaque_string alone to be unique. This can be done by using an MD5 checksum of the digital object. The opaque_string therefore becomes:

checksum[document-id]

An important point is that document-id is not a required field. It is therefore sufficient to make the request "get checksum" to the URN->URC service. This system has the following benefits:

- Allows for URL->URN lookups: In a possible scenario, a user has a URL to an object and requires a URN. The client can fetch the digital object described by the URL and compute an MD5 checksum on it. The user could then submit the MD5 checksum to the naming authority/resolution service and obtain the appropriate URN. It is also possible to create an Archie-type server that accepts MD5 checksums and identifies the naming authority responsible for its creation. Admittedly, this is based on the assumption that objects are immutable (a somewhat improbable assumption). However, it can be argued that if an object changes and no longer returns the original checksum, it is no longer the original document.
- Verification of source: A user can take a digital object, compute an MD5 checksum, and compare the result to the URN string returned by the naming authority. This allows for a verification of a specific object's "credentials."
- Allows for distinction between digital objects: A digital library system should allow for the distinction between different formats of the same document. A user should be able to explicitly ask for a URL/URC for a specific form of a document.
- Replication: Daemons can be created to check the state of objects by comparing the URN to the checksum of the object. There would be no need to communicate with the authority.
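A minimal sketch of the checksum[document-id] form and of the verification step, using Python's standard `hashlib`. Writing the checksum as 32 lowercase hexadecimal digits is an assumption of this example (the text does not say how the checksum is encoded), and the function names are hypothetical.

```python
import hashlib
import re
from typing import Optional

# Assumed encoding: 32 hex digits, optionally followed by [document-id].
_OPAQUE = re.compile(r"^([0-9a-f]{32})(?:\[([^\]]*)\])?$")


def object_opaque_string(data: bytes, document_id: Optional[str] = None) -> str:
    """Build the opaque string for a digital object; document-id is optional."""
    checksum = hashlib.md5(data).hexdigest()
    return f"{checksum}[{document_id}]" if document_id else checksum


def verify_object(data: bytes, opaque_string: str) -> bool:
    """Check that the object's bytes still match the checksum in its name."""
    match = _OPAQUE.match(opaque_string.strip().lower())
    if match is None:
        return False
    return hashlib.md5(data).hexdigest() == match.group(1)
```

A replication daemon or a client verifying a source would call `verify_object` on the bytes it holds; neither needs to contact the naming authority to do so.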

2.0 URNs for CSTRs

For our purposes it might be beneficial to define a specific type of URN called "cstr" or something appropriate.

<URN:cstr:mit.edu:mitlcs//tr-520>

or <URN:cstr:mit.edu:mitlcs:tr-520>

or <URN:cstr:lcs.mit.edu:tr-520>

The basic idea behind this is that we have a naming authority id (mit.edu) which can be looked up by use of DNS. We have "mitlcs", which is the publisher id. For this purpose I will define a publisher as the sub-organization which is responsible for the publication of a specific series of documents, for example mitlcs or mitai. The benefit of such a system is that resolver requests within a naming authority can be delegated to lower-level organizations in a hierarchical fashion.
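As a rough illustration of the delegation idea, the sketch below splits a URN of the second form shown above (authority, publisher, and document id separated by colons) and picks a resolver per publisher. Choosing that form is an assumption made for the example, since the text leaves the choice open, and the resolver host names in the table are invented placeholders.

```python
def split_cstr_urn(urn: str):
    """Split <URN:cstr:authority:publisher:document-id> into its three parts."""
    body = urn.strip()
    if body.startswith("<") and body.endswith(">"):
        body = body[1:-1]
    parts = body.split(":", 4)
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "cstr":
        raise ValueError(f"not a cstr URN with a publisher id: {urn!r}")
    _, _, authority, publisher, document_id = parts
    return authority, publisher, document_id


# Hypothetical delegation table kept by the naming authority.
PUBLISHER_RESOLVERS = {
    ("mit.edu", "mitlcs"): "lcs-resolver.example",
    ("mit.edu", "mitai"): "ai-resolver.example",
}


def resolver_for(urn: str, default: str = "authority-resolver.example") -> str:
    """Delegate to the publisher's resolver, else fall back to the authority."""
    authority, publisher, _ = split_cstr_urn(urn)
    return PUBLISHER_RESOLVERS.get((authority.lower(), publisher.lower()), default)
```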

3.0 Uniform Resource Characteristics

The specification for URCs is very loose, so below I will define them to meet the specific requirements of a library system.

3.1 URCs for Documents


URN:urn:cstr:lcs.mit.edu:tr-520 {
Id: lcs-tr-520

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
computation. Our main contributions are the following: A consistent
framework for monotone computation, including monotone analogues of
many standard computational models. 

<URN:cstr:lcs.mit.edu:MD5-checksum1[tr-520]>
<URN:cstr:lcs.mit.edu:MD5-checksum2[tr-520]>
<URN:cstr:lcs.mit.edu:MD5-checksum3[tr-520]>
}

3.2 URC for the objects (formats)

Sample:

URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {
Id: lcs-tr-520

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
computation. Our main contributions are the following: A consistent
framework for monotone computation, including monotone analogues of
many standard computational models. 

content-type: application/tiff

page: 1

Signature: some-encrypted-security-measure

Comment: some-comments

URN->URL resolver: this field can include the name of a resolver
                   service that will look up the URN and return
                   URLs
}

The URC for the object may also contain a file size field.

4.0 Storage of UR* information

Disclaimer: Everything in this section is my own view of how this should work.

It is the responsibility of the naming authority to hold original copies of the URNs and URCs of a document. These records may only be updated and changed by the naming authority. Each record will be time stamped and can include a signature (for security purposes). It is unnecessary for the authority to hold a copy of the URLs since they are not trustworthy.

It is also the responsibility of the naming authority to pass on client requests to the appropriate resolution service.

There is a requirement for at least two kinds of resolution services (one that returns URCs and one that returns URLs). These act as the navigation service by translating a URN to its characteristics or location. This does not imply that there cannot be more than two services involved. For example, we can have "discovery" services which act as search tools. A discovery service may submit a request for UR* information held by the naming authority and maintain a catalog of this information.

It should be noted that only "trusted" repositories should be returned by the URN->URL resolver. It is not necessary to list all possible locations of the object. A URL resolver (by definition of URLs) should not be trusted as URLs can change over time. A reliable resolver (or naming authority) should make an attempt to keep itself as updated as possible.

Replication of resolver services should be encouraged. The services should act as mirror sites for the naming authority. A service should either update itself by querying the naming authority for "what's new" or "what's updated" or, if it is run by the naming authority, have such information "pushed" onto it.
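The "what's new" style of update can be sketched with the record time stamps mentioned earlier in this section. Everything here is illustrative: the classes `Record`, `Authority`, and `Mirror`, the integer time stamps, and the `updated_since` query are assumptions of the example, not a defined protocol.

```python
from dataclasses import dataclass


@dataclass
class Record:
    urn: str
    urc: dict
    timestamp: int  # assumed: a monotonically increasing time stamp


class Authority:
    """Holds the original, time-stamped records."""

    def __init__(self) -> None:
        self._records = {}

    def publish(self, record: Record) -> None:
        self._records[record.urn] = record

    def updated_since(self, timestamp: int) -> list:
        """Answer a "what's new / what's updated" query."""
        return [r for r in self._records.values() if r.timestamp > timestamp]


class Mirror:
    """A replicated resolver that pulls changes from the authority."""

    def __init__(self) -> None:
        self.records = {}
        self.last_sync = 0

    def pull(self, authority: Authority) -> int:
        changed = authority.updated_since(self.last_sync)
        for record in changed:
            self.records[record.urn] = record
            self.last_sync = max(self.last_sync, record.timestamp)
        return len(changed)
```

A mirror run by the naming authority itself could use the same `Record` objects but receive them through a push instead of calling `pull`.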

5.0 Usage

A user has a URN. The client that the user is working with obtains host information for the naming authority by doing a DNS lookup of the naming_authority and submits the request (in whois++ format).

Case 1

The URN is for a document. Example:

template=URC[Specific_info];URN=urn:cstr:lcs.mit.edu:tr-520

The field [Specific_info] can contain object-specific information (e.g. "form," "cost," "signature," etc.). If a user has specific requirements for objects, this lets the server return only that specific information rather than a "full" URC. Some scenarios for this usage are:

- To request "all" the information for every object, the client uses the template "URC_FULL".
- The user can only view text files. The user's client is aware of this and so requests the template "URC_FORM". The returned information includes the formats of the available objects for a specific document.
- The user wants the cheapest document. The client makes the request "URC_COST" and sorts the information by cost.
- The user wants the cheapest PostScript version. The client makes the request "URC_COST_FORM" (I haven't worked out all the details of how templates should be named; any suggestions?) and sorts this information appropriately.
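The scenarios above can be sketched from the client's side. The request string follows the template=...;URN=... shape used in this section; the helper names, and the idea that each returned object carries a numeric "cost" and a "content-type" entry, are assumptions made for the example.

```python
def build_request(template: str, urn: str) -> str:
    """Format a request in the template=...;URN=... shape shown above."""
    return f"template={template};URN={urn}"


def parse_request(request: str):
    """Split a request back into its template name and URN."""
    fields = dict(part.split("=", 1) for part in request.split(";"))
    return fields["template"], fields["URN"]


def cheapest(objects, content_type=None):
    """Client-side choice: the lowest-cost object, optionally of one format."""
    candidates = [
        obj for obj in objects
        if content_type is None or obj.get("content-type") == content_type
    ]
    return min(candidates, key=lambda obj: obj["cost"]) if candidates else None
```

For the last scenario, a client would send `build_request("URC_COST_FORM", "urn:cstr:lcs.mit.edu:tr-520")` and then call `cheapest(objects, "application/postscript")` on the objects it gets back.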

From here, there are two sub-cases:

Case 1.1 - user requests FULL info: Upon receiving this request, the naming authority passes it on to the appropriate resolver service. When a URC is requested for a document, the server will return the relevant top-level information for that document: title, author(s), organization, and abstract (and any other fields which are part of the document's characteristics). The URC for the document will also contain the URNs (the MD5-checksummed ones) for the objects, and the server will (invisibly to the user) submit those to the URN->URC resolver. The returned information for these objects will include all relevant document information (minus the URNs) plus MIME type, cost, and possibly a signature (for security purposes). The server will (at this bottom level of the hierarchy) find the URLs for the objects (by using the URN->URL resolver) and will incorporate the information into the URC. The server will also parse out the redundant information (there is no need to send the author twice). The client/user will get the following:


URN:urn:cstr:lcs.mit.edu:tr-520 {

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
+ computation. Our main contributions are the following: A consistent
+ framework for monotone computation, including monotone analogues of
+ many standard computational models. 

URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {

	content-type: application/tiff

	page: 1

	Signature: some-encrypted-security-measure

	<URL:http://lcs.mit.edu/pub/tr520-1.tiff>
	<URL:ftp://some.reposit.ory/pub/foo.tiff>
}

URN:urn:cstr:lcs.mit.edu:MD5-checksum2[tr-520] {

	content-type: application/postscript

	Signature: some-encrypted-security-measure

	<URL:http://lcs.mit.edu/pub/tr520-1.ps>
}

URN:urn:cstr:lcs.mit.edu:MD5-checksum3[tr-520] {

	content-type: application/postscript

	Comments: color postscript version

	Signature: some-encrypted-security-measure

	<URL:http://lcs.mit.edu/pub/tr520c.ps>
}}
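The server-side steps of Case 1.1 (fetch the document URC, resolve each object URN, drop the fields already sent at the top level, attach the URLs) can be sketched as below. The two resolvers are modelled as plain dictionaries and the key name "objects" is invented for the example; a real service would answer these lookups over the network.

```python
def assemble_full_urc(document_urn, urn_to_urc, urn_to_url):
    """Merge a document URC with its object URCs and their URLs.

    urn_to_urc and urn_to_url stand in for the URN->URC and URN->URL
    resolvers; both are assumed to behave like dictionaries.
    """
    document = dict(urn_to_urc[document_urn])     # top-level fields
    object_urns = document.pop("objects", [])     # checksummed object URNs
    objects = {}
    for object_urn in object_urns:
        object_urc = urn_to_urc[object_urn]
        # Parse out redundant information: skip fields whose value
        # was already sent at the document level (e.g. the author).
        trimmed = {
            key: value for key, value in object_urc.items()
            if not (key in document and document[key] == value)
        }
        trimmed["urls"] = list(urn_to_url.get(object_urn, []))
        objects[object_urn] = trimmed
    document["objects"] = objects
    return document
```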

Case 1.2 - user requests [Specific_info]: As in case 1.1, the URN for the document is submitted to the URN->URC resolver. The resolver obtains the URNs for the objects and makes a URN->URC request. In this case, however, the request is made with a "specific" template. That is,
```
template=URC_specific_info;URN=urn:cstr:lcs.mit.edu:tr-520
```
rather than
```
template=URC_FULL;URN=urn:cstr:lcs.mit.edu:tr-520
```
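One way to picture a "specific" template is as a filter over the fields of each object URC. The mapping from template names to field names below is purely an assumption for illustration, since the naming of templates is still an open question in this note.

```python
# Hypothetical mapping from template name to the URC fields it returns.
TEMPLATE_FIELDS = {
    "URC_FORM": ("content-type",),
    "URC_COST": ("cost",),
    "URC_COST_FORM": ("cost", "content-type"),
}


def apply_template(template: str, object_urc: dict) -> dict:
    """Return only the fields that the requested template asks for."""
    if template == "URC_FULL":
        return dict(object_urc)  # the full URC, unfiltered
    wanted = TEMPLATE_FIELDS[template]
    return {name: object_urc[name] for name in wanted if name in object_urc}
```

An unknown template name raises `KeyError` in this sketch; a real resolver would need to decide how to report that to the client.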

Case 2

The URN is for an object. Example:

template=URC;URN=urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520]

In the second case, where a user knows the URN for a digital object, it is simply a matter of returning the URC with the appropriate URL lookup. The client gets back:

URN:urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520] {

Title: Structure in Monotone Complexity

Author: M. Gringi

Organization: Massachusetts Institute of Technology, LCS

Abstract: In this work we study complexity classes in monotone
+ computation. Our main contributions are the following: A consistent
+ framework for monotone computation, including monotone analogues of
+ many standard computational models. 

content-type: application/tiff

page: 1

Signature: some-encrypted-security-measure

<URL:http://lcs.mit.edu/pub/tr520-1.tiff>
<URL:ftp://some.reposit.ory/pub/foo.tiff>
}

6.0 Notes

This document is based largely on the architectural notes that Jeremy and Andrew wrote, as well as discussions with Mitchell. I also used a lot of the ideas developed by the IETF taskforce on URIs, and I'll include a bibliography as soon as I can dig up all my references.

Unresolved questions

1. Who cleans up the URC for the user? Is the viewer responsible for cleaning up the URC for itself (example: Mosaic making it htmlified), or is it the server (the server is given client information and returns the URC in a nice format)?

2. Should a URL be returned automatically if the user doesn't specifically request it? The URN to URL server can also work in the whois++ format and can accept requests like:

template=URL;URN=urn:cstr:lcs.mit.edu:MD5-checksum1[tr-520]

comments to eytan@mit.edu

modified 2/21/95
