Library 2000 Architecture: Components and Definitions

Version of December 7, 1994

Here is an overview of the main components of a digital library, as
envisioned by the Library 2000 research group.  The library is
organized as a client/server system connected with a network such as
the Internet.

CLIENTS

       User
         the consumer of the library--looks for information and uses it
         one per desktop
         provides the user interface for discovery and browsing
         has minimum 1Kx1Kx8 display, color or grayscale
         has minimum 1Mb/sec port to network (Ethernet or T1 speeds)
       Publisher
         creates new documents to place in the library

       Service Manager, for each service
         manages the data that underlies the service
         of these, one has a special name, the
       Librarian
         The service manager of a Discovery service


DISCOVERY SERVICE

A discovery service helps a user client find documents of interest.
It provides search, bibliographic, and indexing services over some
collection of documents.  The discovery service may search simple
bibliographic information, abstracts, or the full text of some set of
documents.

The interface to the discovery service is minimal. A client submits a
query, such as a list of keywords, a complicated search expression, or
a Z39.50 query, and receives in return a list of documents that match
the query. The response may include up to three components:

1.  A list of permanent document identifiers.

2.  Optional information originally obtained from the archive service
(for example, bibliographic data such as author and title) to help the
client select which items to follow up.  The discovery service copy of
this information is not authoritative and the client should treat it
as a hint that should be checked with the archive.

3.  Optional value-added information, also to help the client select
which items to follow up.  Value-added information is not held by the
archive; it is instead developed by the management of the discovery
service and held only by the discovery service.

The scope of a discovery service is the set of documents that its
maintainer, a curator or reference librarian, decides to provide.
There may be many different discovery services, and an on-line
document is likely to be indexed by several discovery services.  A
discovery service is operated by a collection-maintaining organization
that is analogous to the traditional research library.  A discovery
service may index selected materials from many different repositories,
or all the materials of a single repository, or any combination,
depending on its objectives.

The interface to a discovery service provides

     a method for accepting queries and returning a list of permanent
              document id's that respond to that query, together with
              associated bibliographic and value-added information for
              each document.
     a method for asking Archive services "What's new?" (this method may
              belong in the Librarian's client)
     a method for indexing new documents and storing their
              bibliographic information, restricted to Librarian client
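
As a rough illustration of the query and indexing methods above, here is a
minimal in-memory sketch in Python.  It is not part of the Library 2000
design: the class and method names, the "librarian" role string, and the
choice to match documents that contain every keyword are all assumptions
made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One entry in a discovery response (hypothetical structure)."""
    doc_id: str                                      # permanent document identifier
    hints: dict = field(default_factory=dict)        # non-authoritative copy of archive data
    value_added: dict = field(default_factory=dict)  # held only by the discovery service


class DiscoveryService:
    """Toy keyword index standing in for a discovery service."""

    def __init__(self):
        self._index = {}   # keyword -> set of document ids
        self._hits = {}    # document id -> Hit

    def index_document(self, caller_role, doc_id, keywords,
                       hints=None, value_added=None):
        # Indexing is restricted to the Librarian's client.
        if caller_role != "librarian":
            raise PermissionError("only the Librarian client may index documents")
        self._hits[doc_id] = Hit(doc_id, dict(hints or {}), dict(value_added or {}))
        for word in keywords:
            self._index.setdefault(word.lower(), set()).add(doc_id)

    def query(self, keywords):
        """Return the hits that match every keyword, ordered by document id."""
        sets = [self._index.get(word.lower(), set()) for word in keywords]
        if not sets:
            return []
        matching = set.intersection(*sets)
        return [self._hits[doc_id] for doc_id in sorted(matching)]
```

A client would treat each Hit's hints as unverified and confirm them with
the archive before relying on them.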


ARCHIVE SERVICE

An archive service is a permanent, reliable storage place for on-line
documents.  It provides a simple interface that allows an authorized
client to retrieve a particular document, in a particular format, when
given that document's identifier.  The archive service provides a
strong guarantee that a retrieval operation on a specific document
will always return the same thing.  An archive may be internally
replicated to help maintain the integrity of the data it holds, but
this replication is minimally visible from outside the archive.

Discovery and awareness services need to know about additions and
changes to an archive.  For this purpose, an archive service also
provides a transaction-based interface for archive maintainers to add
and modify documents, and for clients of the archive to discover
changes to the archive. For example, when a maintainer adds a document
to an archive service, the system assigns a transaction number to the
action. Later, a discovery service can ask the archive "What's new?" by
providing a transaction number; the archive service responds with a list
of all the documents added or modified since that transaction.

There may be many different archive services, and an on-line document
may be stored by more than one.  An archive service is operated by an
organization that does not have a good analog in the traditional
paper-based world.  The closest analog is the depositary library.

The interface to an archive provides

      a method for accepting a permanent document ID and returning
               document properties (e.g., bibliographic information)
               or the document itself.
      a method for addition of new documents--accepts a document and
               returns a permanent document ID.  This method is restricted
               to the Archive Manager's client.
      a method for the query "What's new?" which accepts a transaction
               number and returns the list of document ID's added since that
               transaction, plus the current transaction number for use in the
               next "What's new?" call.  
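
The three archive methods above might be sketched as follows.  This is an
illustrative in-memory stand-in, not the actual Library 2000 interface: the
method names, the "archive-manager" role string, and the "doc-N" identifier
format are assumptions, and the sketch tracks additions only, not
modifications.

```python
class ArchiveService:
    """In-memory stand-in for an archive service (hypothetical names)."""

    def __init__(self):
        self._documents = {}   # permanent id -> (properties, content)
        self._log = []         # self._log[i] is the id added by transaction i + 1

    def add(self, caller_role, properties, content):
        # Adding documents is restricted to the Archive Manager's client.
        if caller_role != "archive-manager":
            raise PermissionError("only the Archive Manager client may add documents")
        doc_id = f"doc-{len(self._log) + 1}"
        self._documents[doc_id] = (dict(properties), content)
        self._log.append(doc_id)
        return doc_id

    def get_properties(self, doc_id):
        """Return a copy of the document properties, e.g. bibliographic data."""
        return dict(self._documents[doc_id][0])

    def get_document(self, doc_id):
        """Return the stored document itself."""
        return self._documents[doc_id][1]

    def whats_new(self, since_txn):
        """Return (ids added after since_txn, current transaction number)."""
        return list(self._log[since_txn:]), len(self._log)
```

A caller starts with transaction number 0 and passes the number returned
by each call into the next one, so no addition is reported twice.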
   

NAVIGATION SERVICE

       one per naming scheme
       method for accepting permanent document ID, returning
             current URL for document
       method for adding a document ID --> URL association, restricted
             to the Navigation Manager's client
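
A navigation service is essentially a restricted-write lookup table.  The
sketch below assumes a Python dictionary as the store; the class name,
method names, and "navigation-manager" role string are hypothetical.

```python
class NavigationService:
    """Maps permanent document ids to current URLs for one naming scheme."""

    def __init__(self, naming_scheme):
        self.naming_scheme = naming_scheme
        self._urls = {}   # permanent document id -> current URL

    def register(self, caller_role, doc_id, url):
        # Adding an association is restricted to the Navigation Manager's client.
        if caller_role != "navigation-manager":
            raise PermissionError("only the Navigation Manager client may add associations")
        # A later registration replaces the earlier URL; the id stays the same.
        self._urls[doc_id] = url

    def resolve(self, doc_id):
        """Return the current URL; raises KeyError for an unknown id."""
        return self._urls[doc_id]
```

Because only the URL side of the association changes, a document can move
between hosts without invalidating identifiers held by clients.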


INTERNET ENVIRONMENT

       TCP/IP packet interconnection
       Domain Name Service for host names
       Kerberos for authentication
       WWW (HTTP)


CONDITIONS SERVICE

       method for accepting permanent document ID, returning Conditions of use
          copyright owner
          permissions & restrictions
       method for registering a new document, restricted to the
             Conditions Manager's client
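
One possible shape for a Conditions-of-use record and its lookup is
sketched below.  The field names, the "conditions-manager" role string, and
the rule that a restriction overrides a permission are assumptions made for
the example, not part of the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    """Conditions of use for one document (hypothetical fields)."""
    copyright_owner: str
    permissions: frozenset    # actions allowed, e.g. {"view", "print"}
    restrictions: frozenset   # actions explicitly forbidden


class ConditionsService:
    def __init__(self):
        self._conditions = {}   # permanent document id -> Conditions

    def register(self, caller_role, doc_id, conditions):
        # Registration is restricted to the Conditions Manager's client.
        if caller_role != "conditions-manager":
            raise PermissionError("only the Conditions Manager client may register documents")
        self._conditions[doc_id] = conditions

    def conditions_of_use(self, doc_id):
        """Return the Conditions for doc_id; raises KeyError if unregistered."""
        return self._conditions[doc_id]


def is_permitted(conditions, action):
    """Client-side check: an explicit restriction overrides a permission."""
    return action in conditions.permissions and action not in conditions.restrictions
```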


PUBLISHER

       one per Publishing organization
       responsible for creating new documents and for advertising them
       has a contract with one or more Archive services to store the document

[Note from Mitchell Charity: what does it take to publish?  To be a
Publisher?  Is the concept a "light-weight" one, i.e., is a Publisher
nothing more than one who publishes?  Or does it entail authorization
(obtaining a controlled publisher identifier) or capital plant (as
when we discussed publishers having responsibility to provide
archiving)?  It seems desirable to keep the publishing
concept as light as possible.]


AWARENESS SERVICE

       one per entrepreneur who wishes to provide it (probably by field)
       subscribers are users and Librarians with specific interests
       arranges to receive advertisements from Publishers
       invokes "What's new?" service of archives
       forwards filtered results to users and Librarians
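
The polling half of an awareness service could look roughly like this.
The sketch is self-contained, so it includes a stub archive; every name in
it is hypothetical, subscriber interests are reduced to a single keyword
matched against the title, and advertisements from Publishers are left out.

```python
class StubArchive:
    """Minimal archive stand-in exposing only the calls used below."""

    def __init__(self):
        self._titles = []   # position i holds the title from transaction i + 1

    def add(self, title):
        self._titles.append(title)
        return f"doc-{len(self._titles)}"

    def whats_new(self, since_txn):
        ids = [f"doc-{i + 1}" for i in range(since_txn, len(self._titles))]
        return ids, len(self._titles)

    def title(self, doc_id):
        return self._titles[int(doc_id.split("-")[1]) - 1]


class AwarenessService:
    def __init__(self, archive):
        self._archive = archive
        self._last_txn = 0        # transaction number from the previous poll
        self._subscribers = {}    # subscriber name -> keyword of interest

    def subscribe(self, name, keyword):
        self._subscribers[name] = keyword.lower()

    def poll(self):
        """Ask the archive what is new and return {subscriber: [matching ids]}."""
        new_ids, self._last_txn = self._archive.whats_new(self._last_txn)
        notices = {}
        for name, keyword in self._subscribers.items():
            matches = [d for d in new_ids
                       if keyword in self._archive.title(d).lower()]
            if matches:
                notices[name] = matches
        return notices
```

Each call to poll() remembers the transaction number it was given, so a
document is forwarded at most once to each interested subscriber.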


DOCUMENT (ELECTRONIC)

       may be in any of many forms
           image preferred for human use
           text usually available for computer use
           other fads of the day (PostScript, Acrobat, LaTeX, etc.)
       may contain permanent document id's of other documents
       may have Conditions of use
       needs a "MIME type"
[Note:  the definition of a document needs a lot more work.]


THIRD-PARTY SERVICES

There are many services that are not provided by the basic
architecture. The core services described above can be used as
building blocks to compose more complicated services. The underlying
architecture is intended to support this composition.  There are many
possible third-party services, some of which involve simple on-the-fly
data or protocol conversion, others of which involve different
administrative entities.  Here are a few illustrative examples:

Format conversion: a service that converts a document from one format
to another on-the-fly;

Deduplication: a service that determines if two permanent document
identifiers refer to the same document;

Retrieval: a service that provides actual documents in response to
searches (a composition of search service and archive);

Search composition: a service that queries several distinct discovery
services and returns the results as a single response, possibly having
deduplicated them first;

Reference following: a service that converts a citation found in a
document entry into the permanent identifier of that document. (This
service actually constructs a specialized query and forwards it to
some set of discovery services.)

Caching: a service that holds copies of frequently used on-line
documents at a location that provides more rapid delivery or greater
availability to some set of clients;

Commerce service: a service that provides accounting, billing, usage
logging, or related information-gathering for purposes of commerce.
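
To make the idea of composition concrete, here is a small sketch of the
search composition example, with deduplication as a pluggable step.  The
function name and its parameters are hypothetical: each discovery service
is modeled as a callable that takes a keyword list and returns document
ids, and the deduplication service is modeled as a two-argument predicate.

```python
def composed_search(services, keywords, same_document=None):
    """Query every service and merge the results into a single list.

    services: callables taking a keyword list and returning document ids.
    same_document: optional callable(id_a, id_b) -> bool standing in for a
    deduplication service; by default two ids match only if they are equal.
    """
    if same_document is None:
        same_document = lambda a, b: a == b
    merged = []
    for search in services:
        for doc_id in search(keywords):
            # Keep an id only if it matches nothing already in the result.
            if not any(same_document(doc_id, seen) for seen in merged):
                merged.append(doc_id)
    return merged
```

The pairwise comparison is quadratic in the number of results, which is
acceptable for a sketch but would need an index for large result sets.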