Components and Definitions
Version of December 7, 1994
Here is an overview of the main components of a digital library, as
envisioned by the Library 2000 research group. The library is
organized as a client/server system connected with a network such as
the Internet.
CLIENTS
User
the consumer of the library--looks for information and uses it
one per desktop
provides the user interface for discovery and browsing
has a display of at least 1K x 1K pixels, 8 bits deep, color or grayscale
has a network port of at least 1 Mb/sec (Ethernet or T1 speeds)
Publisher
creates new documents to place in the library
Service Manager, for each service
manages the data that underlies the service
of these, one has a special name, the
Librarian
The service manager of a Discovery service
DISCOVERY SERVICE
A discovery service helps a user client find documents of interest.
It provides search, bibliographic, and indexing services over some
collection of documents. The discovery service may search simple
bibliographic information, abstracts, or the full text of some set of
documents.
The interface to the discovery service is minimal. A client submits a
query, such as a list of keywords, a complicated search expression, or
a Z39.50 query, and receives in return a list of documents that match
the query. The response may include up to three components:
1. A list of permanent document identifiers.
2. Optional information originally obtained from the archive service
(for example, bibliographic data such as author and title) to help the
client select which items to follow up. The discovery service copy of
this information is not authoritative and the client should treat it
as a hint that should be checked with the archive.
3. Optional value-added information, also to help the client select
which items to follow up. Value-added information is not held by the
archive; it is instead developed by the management of the discovery
service and held only by the discovery service.
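As an illustration, the three components of a response might be
represented as follows. This is a minimal Python sketch, not part of
the Library 2000 design: the names QueryResult, archive_hints,
value_added, and stale_hints, and the sample identifiers and values,
are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryResult:
    """One entry in a discovery response (hypothetical shape)."""
    document_id: str  # component 1: permanent document identifier
    # component 2: copy of archive information; a hint, not authoritative
    archive_hints: Dict[str, str] = field(default_factory=dict)
    # component 3: value-added information held only by the discovery service
    value_added: Dict[str, str] = field(default_factory=dict)


def stale_hints(result: QueryResult, authoritative: Dict[str, str]) -> List[str]:
    """Return the hint fields that disagree with the archive's own copy."""
    return sorted(
        key for key, value in result.archive_hints.items()
        if authoritative.get(key) != value
    )


# Example data, invented for illustration.
result = QueryResult(
    document_id="doc-0001",
    archive_hints={"author": "A. Author", "title": "Old Title"},
    value_added={"reviewer_note": "survey of the field"},
)
archive_copy = {"author": "A. Author", "title": "Corrected Title"}
outdated = stale_hints(result, archive_copy)  # fields to refresh from the archive
```

A client that follows up on an item would replace any field named in
outdated with the archive's value before relying on it.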
The scope of a discovery service is the set of documents that its
maintainer, a curator or reference librarian, decides to provide.
There may be many different discovery services, and an on-line
document is likely to be indexed by several discovery services. A
discovery service is operated by a collection-maintaining organization
that is analogous to the traditional research library. A discovery
service may index selected materials from many different repositories,
or all the materials of a single repository, or any combination,
depending on its objectives.
The interface to a discovery service provides
a method for accepting queries and returning a list of permanent
document IDs that respond to that query, together with
associated bibliographic and value-added information for
each document.
a method for asking Archive services "What's new?" (this method may
belong in the Librarian's client)
a method for indexing new documents and storing their
bibliographic information, restricted to Librarian client
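The query and indexing methods listed above could look roughly like
the following Python sketch. It is only one possible reading: the
class name DiscoveryService, the caller_role check, and the
all-keywords matching rule are assumptions, and the "What's new?"
method is left out.

```python
from typing import Dict, List, Set


class DiscoveryService:
    """In-memory keyword index over a collection (illustrative only)."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = {}        # keyword -> document IDs
        self._biblio: Dict[str, Dict[str, str]] = {}  # document ID -> hints

    def index_document(self, caller_role: str, document_id: str,
                       biblio: Dict[str, str], text: str) -> None:
        """Index a new document; restricted to the Librarian's client."""
        if caller_role != "librarian":
            raise PermissionError("indexing is restricted to the Librarian client")
        self._biblio[document_id] = dict(biblio)
        for word in text.lower().split():
            self._index.setdefault(word, set()).add(document_id)

    def query(self, keywords: List[str]) -> List[Dict[str, object]]:
        """Return the documents that match every keyword, with their hints."""
        sets = [self._index.get(word.lower(), set()) for word in keywords]
        matches = set.intersection(*sets) if sets else set()
        return [{"id": doc, "biblio": self._biblio[doc]} for doc in sorted(matches)]


service = DiscoveryService()
service.index_document("librarian", "doc-1", {"title": "Digital Libraries"},
                       "digital library architecture")
service.index_document("librarian", "doc-2", {"title": "Card Catalogs"},
                       "library card catalog")
hits = service.query(["library"])
narrow = service.query(["digital", "library"])
```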
ARCHIVE SERVICE
An archive service is a permanent, reliable storage place for on-line
documents. It provides a simple interface that allows an authorized
client to retrieve a particular document, in a particular format, when
given that document's identifier. The archive service provides a
strong guarantee that a retrieval operation on a specific document
will always return the same thing. An archive may be internally
replicated to help maintain the integrity of the data it holds, but
this replication is minimally visible from outside the archive.
Discovery and awareness services need to know about additions and
changes to an archive. For this purpose, an archive service also
provides a transaction-based interface for archive maintainers to add
and modify documents, and for clients of the archive to discover
changes to the archive. For example, when a maintainer adds a document
to an archive service, the system assigns a transaction number to the
action. Later, a discovery service can ask the archive "What's new?" by
providing a transaction number; the archive service responds with a
list of all the documents added or modified since that transaction.
There may be many different archive services, and an on-line document
may be stored by more than one. An archive service is operated by an
organization that does not have a good analog in the traditional
paper-based world. The closest analog is the depositary library.
The interface to an archive provides
a method for accepting a permanent document ID and returning
document properties (e.g., bibliographic information)
or the document itself.
a method for addition of new documents--accepts a document and
returns a permanent document ID. This method is restricted
to the Archive Manager's client.
a method for the query "What's new?" which accepts a transaction
number and returns the list of document IDs added since that
transaction, plus the current transaction number for use in the
next "What's new?" call.
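The transaction-numbered "What's new?" exchange can be sketched in
Python as below. This is an assumption-laden illustration rather than
the group's implementation: the class ArchiveService, the ID format,
and the role string are invented, and the sketch records additions
only, not modifications.

```python
from typing import Dict, List, Tuple


class ArchiveService:
    """Append-only store with a transaction log (illustrative only)."""

    def __init__(self) -> None:
        self._documents: Dict[str, bytes] = {}
        self._log: List[str] = []  # entry i is the ID added by transaction i + 1

    def add_document(self, caller_role: str, content: bytes) -> str:
        """Store a document and return its permanent ID (manager only)."""
        if caller_role != "archive-manager":
            raise PermissionError("restricted to the Archive Manager's client")
        transaction = len(self._log) + 1
        document_id = f"doc-{transaction:06d}"
        self._documents[document_id] = content
        self._log.append(document_id)
        return document_id

    def retrieve(self, document_id: str) -> bytes:
        """Return the stored document; the same ID always yields the same bytes."""
        return self._documents[document_id]

    def whats_new(self, since_transaction: int) -> Tuple[List[str], int]:
        """Return IDs added after since_transaction and the current number."""
        return self._log[since_transaction:], len(self._log)


archive = ArchiveService()
first = archive.add_document("archive-manager", b"first report")
second = archive.add_document("archive-manager", b"second report")
new_ids, checkpoint = archive.whats_new(0)   # caller starts from transaction 0
```

The caller keeps checkpoint and passes it to the next whats_new call,
so each poll reports only what was added since the previous one.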
NAVIGATION SERVICE
one per naming scheme
method for accepting permanent document ID, returning
current URL for document
method for adding a document ID --> URL association, restricted
to the Navigation Manager's client
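A navigation service amounts to a lookup table from permanent IDs to
current URLs. The Python sketch below assumes, beyond what is stated
here, that a later association for the same ID replaces the earlier
URL; the class name, role string, and example.org URLs are invented.

```python
from typing import Dict


class NavigationService:
    """Maps permanent document IDs to current URLs for one naming scheme."""

    def __init__(self) -> None:
        self._urls: Dict[str, str] = {}

    def associate(self, caller_role: str, document_id: str, url: str) -> None:
        """Record an ID --> URL association (Navigation Manager only)."""
        if caller_role != "navigation-manager":
            raise PermissionError("restricted to the Navigation Manager's client")
        self._urls[document_id] = url  # assumed: a later call replaces the old URL

    def resolve(self, document_id: str) -> str:
        """Return the current URL for a permanent document ID."""
        return self._urls[document_id]


navigation = NavigationService()
navigation.associate("navigation-manager", "doc-0001",
                     "http://archive.example.org/a/1")
before_move = navigation.resolve("doc-0001")
navigation.associate("navigation-manager", "doc-0001",
                     "http://mirror.example.org/b/1")
after_move = navigation.resolve("doc-0001")
```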
INTERNET ENVIRONMENT
TCP/IP packet interconnection
Domain Name System (DNS) for host names
Kerberos for authentication
WWW (HTTP)
CONDITIONS SERVICE
method for accepting permanent document ID, returning Conditions of use
copyright owner
permissions & restrictions
method for registering a new document, restricted to the
Conditions Manager's client
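One way to picture the conditions record and its lookup is the Python
sketch below. The field names follow the outline above, but the
representation of permissions and restrictions as sets of action
names, the helper is_permitted, and the sample values are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Conditions:
    """Conditions of use for one document (hypothetical shape)."""
    copyright_owner: str
    permissions: FrozenSet[str]
    restrictions: FrozenSet[str]


class ConditionsService:
    def __init__(self) -> None:
        self._conditions: Dict[str, Conditions] = {}

    def register(self, caller_role: str, document_id: str,
                 conditions: Conditions) -> None:
        """Register a new document (Conditions Manager only)."""
        if caller_role != "conditions-manager":
            raise PermissionError("restricted to the Conditions Manager's client")
        self._conditions[document_id] = conditions

    def conditions_of_use(self, document_id: str) -> Conditions:
        """Return the conditions of use for a permanent document ID."""
        return self._conditions[document_id]


def is_permitted(conditions: Conditions, action: str) -> bool:
    """An action is allowed if it is permitted and not also restricted."""
    return action in conditions.permissions and action not in conditions.restrictions


conditions_service = ConditionsService()
conditions_service.register(
    "conditions-manager", "doc-0001",
    Conditions("Example Press", frozenset({"view", "print"}), frozenset({"print"})),
)
terms = conditions_service.conditions_of_use("doc-0001")
```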
PUBLISHER
one per Publishing organization
responsible for creating new documents and for advertising them
has a contract with one or more Archive services to store the document
[Note from Mitchell Charity: what does it take to publish? To be a
Publisher? Is the concept a "light-weight" one, i.e., is a Publisher
nothing more than one who publishes? Or does it entail authorization
(obtaining a controlled publisher identifier) or capital plant (as
when we discussed publishers having responsibility to provide
archiving)? It seems desirable to keep the publishing
concept as light as possible.]
AWARENESS SERVICE
one per entrepreneur who wishes to provide it (probably by field)
subscribers are users and Librarians with specific interests
arranges to receive advertisements from Publishers
invokes "What's new?" service of archives
forwards filtered results to users and Librarians
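The poll-filter-forward cycle can be sketched as follows. This Python
example is hypothetical throughout: the archive is replaced by two
stand-in callables, interests are single keywords matched against
titles, and advertisements from Publishers are not modeled.

```python
from typing import Callable, Dict, List, Tuple

WhatsNew = Callable[[int], Tuple[List[str], int]]


class AwarenessService:
    """Polls an archive and filters new documents by subscriber interest."""

    def __init__(self, whats_new: WhatsNew, title_of: Callable[[str], str]) -> None:
        self._whats_new = whats_new
        self._title_of = title_of
        self._last_transaction = 0
        self._interests: Dict[str, str] = {}  # subscriber -> keyword

    def subscribe(self, subscriber: str, keyword: str) -> None:
        self._interests[subscriber] = keyword.lower()

    def poll(self) -> Dict[str, List[str]]:
        """Ask "What's new?" and return the matching IDs per subscriber."""
        new_ids, self._last_transaction = self._whats_new(self._last_transaction)
        notices: Dict[str, List[str]] = {name: [] for name in self._interests}
        for document_id in new_ids:
            title = self._title_of(document_id).lower()
            for name, keyword in self._interests.items():
                if keyword in title:
                    notices[name].append(document_id)
        return notices


# Stand-ins for an archive service, invented for the example.
titles = {"doc-1": "Optical Networks", "doc-2": "Digital Library Storage"}
log = ["doc-1", "doc-2"]


def fake_whats_new(since: int) -> Tuple[List[str], int]:
    return log[since:], len(log)


awareness = AwarenessService(fake_whats_new, titles.__getitem__)
awareness.subscribe("alice", "Library")
first_poll = awareness.poll()
second_poll = awareness.poll()  # nothing new since the first poll
```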
DOCUMENT (ELECTRONIC)
may be in any of many forms
image preferred for human use
text usually available for computer use
other fads of the day (PostScript, Acrobat, LaTeX, etc.)
may contain permanent document IDs of other documents
may have Conditions of use
needs a "MIME type"
[Note: the definition of a document needs a lot more work.]
THIRD-PARTY SERVICES
There are many services that are not provided by the basic
architecture. The core services described above can be used as
building blocks to compose more complicated services. The underlying
architecture is intended to support this composition. There are many
possible third-party services, some of which involve simple on-the-fly
data or protocol conversion, others of which involve different
administrative entities. Here are a few illustrative examples:
Format conversion: a service that converts a document from one format
to another on-the-fly.
Deduplication: a service that determines whether two permanent document
identifiers refer to the same document.
Retrieval: a service that provides actual documents in response to
searches (a composition of a discovery service and an archive service).
Search composition: a service that queries several distinct discovery
services and returns the results as a single response, possibly having
deduplicated them first.
Reference following: a service that converts a citation found in a
document entry into the permanent identifier of that document. (This
service actually constructs a specialized query and forwards it to
some set of discovery services.)
Caching: a service that holds frequently-used copies of on-line
documents at a location that provides more rapid delivery or greater
availability to some set of clients.
Commerce service: a service that provides accounting, billing, usage
logging, or related information-gathering for purposes of commerce.
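As one concrete case, search composition can be sketched in a few
lines of Python. The function composed_search and the two stand-in
discovery services are invented for the example, and the sketch only
drops repeated identifiers; deciding whether two different
identifiers name the same document is the harder job of the
deduplication service and is not attempted here.

```python
from typing import Callable, Iterable, List, Set

Search = Callable[[str], List[str]]  # query -> permanent document IDs


def composed_search(query: str, services: Iterable[Search]) -> List[str]:
    """Query each discovery service and merge results, keeping first-seen order."""
    seen: Set[str] = set()
    merged: List[str] = []
    for search in services:
        for document_id in search(query):
            if document_id not in seen:
                seen.add(document_id)
                merged.append(document_id)
    return merged


# Two stand-in discovery services returning fixed results.
def field_index(query: str) -> List[str]:
    return ["doc-3", "doc-1"]


def campus_index(query: str) -> List[str]:
    return ["doc-1", "doc-2"]


merged = composed_search("digital library", [field_index, campus_index])
```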