Library 2000 Proposal

Cover sheet items...

From:            M. I. T. Laboratory for Computer Science
                   Library 2000
                 M. I. T.  Library System
                   Distributed Library Initiative

To:              The Corporation for National Research Initiatives
                 1895 Preston White Drive
                 Reston, Virginia 22091
                 Tel. (703) 620-8990

Investigators:   Jerome H. Saltzer
                    Professor of Computer Science
                    M. I. T.  Laboratory for Computer Science
                    Room NE43-513, tel. 253-6016, e-mail Saltzer@mit.edu

                 T. Gregory Anderson
                    Associate Director for Systems and Planning
                    M. I. T. Library System
                    Room 14S-216, Tel. 253-5654, e-mail ganderso@mit.edu

Begin:           October 1, 1992
End:             June 30, 1995       
Budget:          $1,521,496






             A PROPOSAL FOR M. I. T. PARTICIPATION IN

                    AN ELECTRONIC LIBRARY PLAN 

                          10 November 1992



I.  OVERVIEW


A small group of research universities, currently consisting of
Carnegie-Mellon, Stanford, the University of California at Berkeley,
Cornell University, and M. I. T., proposes to create a working,
useful prototype of an online, electronic library consisting of the
technical reports of their respective Computer Science laboratories.
Each institution will create a network-attached server to hold scanned
images of its own technical reports, and create bibliographic records
that provide a descriptive index to the technical reports, including
pointers to the images.  The bibliographic records would be exchanged
with each of the other institutions; each site would make available
locally the set of shared bibliographic records using its preferred
catalog system.

The goals and characteristics of this project proposal are:

1.  to obtain early experience with a core function of the
    distributed electronic library of the future,

2.  to work with a database that is readily available, that has a
    critical time-sensitive value, and that is already well-known and
    valued by its target audience,

3.  to explore the architecture, design, and work-flow issues
    associated with making information available in digital form,

4.  to work within the research/prototype domain with a volume of
    information large enough to be useful and interesting and that can
    scale to an operational system,

5.  to provide an important service to an audience of researchers,
    faculty, and students who are motivated and likely to have access
    to appropriately powerful workstations to use the library from
    their offices.

This document outlines briefly the plan for M. I. T. participation in
the project.  Our overall approach is to build a full-scale but
flexible testbed using real data in which we can try out ideas of
system modularity, work flow, data integrity, and data linking.
The testbed will include capture of scanned images of technical
reports and bibliographic descriptions of them as they pass through
the usual library work procedures, capture of full text of those same
reports as they pass through laboratory publication procedures, a
storage service for scanned images, full text, and bibliographic
materials with an interface that allows indexing by a variety of index
services, and at least one example of an indexing service.

A statement of work will be found as Appendix A.


II.  M. I. T. PARTICIPATION PLAN


The M. I. T. component of this project involves research on system
structure for large quantities of data using advanced technology,
while at the same time maintaining an operational prototype loaded
with useful data available for real use both within M. I. T. and across
the Internet.  This blend of research and experimental service
delivery is intended to assure that the system design that results
from the research is workable in the real world. 

From a research point of view, the challenge is to organize a system
that contains quantities of data measured in terabytes, to be preserved
for periods measured in decades.  Several interesting ideas for
tackling these challenges will be tried as part of this project.

Organizationally, this project represents an opportunity for M. I. T.
to try out integration of its several information production and
dissemination activities in a way that has been frequently discussed
but never before feasible, because of differing goals, standards, and
levels of automation.  Currently, M. I. T. technical reports and
theses are handled by many different ad hoc mechanisms, most of which
involve being on-line at one stage or another, but none of which are
completely electronic.  The various laboratories and departments
produce reports, the M. I. T. Library System catalogs, archives, and
in some cases distributes them, and special offices such as Industrial
Liaison bring them to the attention of their member communities.
Students discover them, but often only accidentally and too late to be
of value.  A specific goal in this project at M. I. T. is to return to
the Institute, to the research community, and to sponsoring agencies
ready access to the knowledge that it creates.  This project is
feasible now because of advances in system architectures, network
connectivity, and a shared vision among the participants in this
proposal.  This project will enable M. I. T. to explore and establish
the structure which could be applied to a production service
extensible to other types of knowledge.

The medium of an electronic library with on-line images of the pages
of the technical reports is a vehicle that can bring these diverse
production and dissemination activities into an integrated and, we
hope, much more effective form.  By building this system, M. I. T.
expects to be able to step well ahead of where it would be by natural
evolution of the current diversified approach.  The system will serve
as a foundation for future development and expansion of information
services.

M. I. T.'s participation in the project will be a joint effort
between two recently-initiated projects, Library 2000 and the
Distributed Library Initiative.  Library 2000, at the M. I. T.
Laboratory for Computer Science, intends to explore how computer
technology might support the electronic library five to ten years from
now.  The Distributed Library Initiative, a project of the M. I. T.
Library System and M. I. T. Information Services, has a shorter-term
goal of exploring how to apply computer technology to improve library
access in the next one to three years.

The centerpiece of the Library 2000 project is a prototype system that
is initially implemented with 1992 technology but using an
architecture that is hypothesized to be suitable for the future.
Participation in the project will be accomplished partly by expanding
and extending this initial prototype system.  Technically, there will
be a focus on two areas:  the special problems of managing large-scale
archival storage, and the discovery of simple, low-effort, high-payoff
ideas for making immediate widespread use of the images and associated
text.  Operationally, the Library System will provide scanning
services, a real-life testbed for the organizational aspects, and
delivery of the services to the M. I. T. community.

Looking ahead, the Distributed Library Initiative provides a mechanism
to prepare M. I. T. for the time when this project is completed.  It
establishes a commitment at M. I. T. to continue this activity, not to
declare victory and pull out after three years.  If the goals of the
project are met, we will have built a structure that can encompass
Technical Reports, theses, and other technical information from all
areas of study, and the M. I. T. Library System will become the place
that will manage and operate that structure.




III.  THE RESEARCH COMPONENT

This project is organized around two time frames.  As described below,
in the short term, we intend to make available an online resource of
computer science technical reports.  For the longer term we hope to
develop an architecture that can scale up to deal with all research
publication, and eventually to an on-line library of any size.  While
ad hoc approaches may be able to deliver the short-term resource, we
believe that longer-term increases in scale will require that the
architecture be systematically organized with several additional
issues in mind, each of which requires research.  The research
component of the
project thus involves discovering solutions to these longer-term
problems in the context of the short-term delivery system.  Four
specific research problems we intend to address are system flow,
appropriate modularity, data integrity, and data linking.

    - System flow.  This term refers to understanding and solving the
problem of scale and production.  Our proposal goes beyond a
demonstration, to include building a framework for large-scale,
ongoing, production level control and delivery of TR's, with attention
to the organizational and workflow analysis to run such an operation.
We also intend to analyze how this material is used and how effective
it is for the audience in terms of timely distribution of TR's without
regard to physical location.  Although timeliness and independence of
location seem rather simple goals in an age of networks, there is a
raft of design issues to be settled in order to achieve them.  Our
method here will be to implement in our testbed a specific system flow
framework that captures the production requirements as seen by M. I. T.,
and at the same time plugs in to the overall distribution and access
system design worked out in concert with other project members.

    - Appropriate modularity.  It is not yet clear exactly what
functional division is appropriate in distributing presentation
management, indexing, storage, and collection management across
distinct network-attached components.  This modularity extends to the
organizational model for collection management--what segments of this
system/process are best centralized and what are best distributed;
what are the architectures necessary to ensure effective and viable
handling of collections and bibliographic control.  The client/server
model is a helpful starting point, but it does not provide guidance as
to which components should retain what kinds of state.  We believe
that robustness is best achieved if the server maintains the minimum
amount of information about its clients, and we will test this
hypothesis.  The client/server model also provides no guidance as to
how to exploit RAM costs that will soon be 100 times lower than they
are today.  We have two technology hypotheses we hope to confirm:
that the projected price of magnetic disk storage 8 years hence will
make page image storage on that medium cheaper than storage on paper,
and that the projected price of RAM storage 8 years hence will make it
clearly appropriate to place full-text indexes entirely in RAM.
Again, our method here will be to try out architectures that are based
on these two technology hypotheses in our testbed system.
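
The minimal-state hypothesis can be illustrated with a small sketch.
It is not a description of any existing Library 2000 software; the
class names, method names, and report identifiers are hypothetical.
Every request names the report and page it wants, so the server keeps
no record of its clients, and a restarted server or a second copy can
answer the same request.

```python
# Hypothetical sketch: a storage server that keeps no per-client state.
# PageRequest, StorageServer, and the report ids are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRequest:
    report_id: str   # e.g. a technical report number
    page: int        # 1-based page number


class StorageServer:
    def __init__(self, images):
        # images maps (report_id, page) -> bytes of the scanned page image
        self._images = dict(images)

    def fetch(self, request):
        # The request is self-describing; nothing about the caller is remembered.
        return self._images[(request.report_id, request.page)]


images = {("TR-0001", 1): b"page-1-bits", ("TR-0001", 2): b"page-2-bits"}
primary = StorageServer(images)
standby = StorageServer(images)   # stands in for a restarted or replica server

req = PageRequest("TR-0001", 2)
first = primary.fetch(req)
second = standby.fetch(req)       # same answer, with no session to hand over
```

Because no session exists, any state such as "the page this reader is
on" lives in the presentation client, which is one possible answer to
the question of which component should retain what kind of state.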

    - Data integrity.  There is a challenge in maintaining integrity
in a system that can potentially contain terabytes of material, in the
form of millions of files, over tens of years.  Copying that much data
accurately when a technology becomes obsolete and needs to be replaced
is already a challenge; keeping that much data accurate in the face of
media failures requires an approach very different from traditional
tape backup systems.  We intend to explore techniques of multiple
replication at sites that are widely separated, on the hypothesis that
this approach will prove much better-matched to the problem than are
traditional backup system designs.  To our knowledge, although many
workers have explored both the theory and practice of replication,
that work has had the goal of improving availability and performance
in the short term; no one has seriously proposed using replication in
place of traditional, complex, full- and incremental-backup systems.
We expect that the volume of information in a digital library will be
sufficiently large that the rate of decay in place may be comparable
to the rate of change and addition of new information; thus we expect
to provide an on-going process that continually reviews the data in
each replicant for integrity and makes any needed repairs by reference
to the other replicants.  Interestingly, adding such a process to the
system makes updates and additions to the database a challenge--the
integrity preserving process may throw out updates unless the update
procedure is carefully coordinated with it.  We intend to design and
implement in our testbed a coherent integrity-preserving system based
on replicants.
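
The continuous review process can be pictured with the following
minimal sketch.  It rests on assumptions the proposal does not make:
three replicants held as in-memory dictionaries, SHA-256 digests for
comparison, and a simple majority vote to choose the good copy.  The
function names are hypothetical.

```python
# Hypothetical sketch of a review-and-repair pass over replicated storage.
import hashlib
from collections import Counter


def digest(data):
    return hashlib.sha256(data).hexdigest()


def scrub(replicas):
    """Compare every object across replicas and repair minority copies.

    replicas: list of dicts mapping object name -> bytes.
    Returns a list of (replica_index, name) pairs that were repaired.
    """
    repaired = []
    names = set()
    for replica in replicas:
        names.update(replica)
    for name in sorted(names):
        copies = [replica.get(name) for replica in replicas]
        votes = Counter(digest(c) for c in copies if c is not None)
        winner, count = votes.most_common(1)[0]
        if count <= len(replicas) // 2:
            continue  # no majority agrees; leave the object untouched
        good = next(c for c in copies if c is not None and digest(c) == winner)
        for i, c in enumerate(copies):
            if c is None or digest(c) != winner:
                replicas[i][name] = good   # overwrite the bad or missing copy
                repaired.append((i, name))
    return repaired


site_a = {"tr1/p1": b"image-1", "tr1/p2": b"image-2"}
site_b = {"tr1/p1": b"image-1", "tr1/p2": b"imaXe-2"}   # decayed copy
site_c = {"tr1/p2": b"image-2"}                          # lost a page
fixed = scrub([site_a, site_b, site_c])
```

The sketch also shows the update hazard described above: a legitimate
change applied to only one replicant looks exactly like decay and would
be overwritten by the older majority copy, so updates must be
coordinated with the review process.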

    - Linking.  A link is a cross-reference, placed in a data object,
to another data object, which may be elsewhere in the network.  What
makes links challenging is that the cross-reference may not be invoked
until several years later, and during the intervening time the target
object may have been involved in physical, logical, or administrative
reorganizations, and the target document itself may have been updated.
In addition, since one may collect cross-references from many sources,
it is important to be able to figure out which ones are duplicates.
It is our hypothesis that some combination of unique identifiers and
names will be required to meet these requirements.  The Internet
Engineering Task Force has recently begun work on a concept called the
Universal Document Identifier that is intended to tackle a significant
part of the cross-reference problem.  Several other candidate ideas
for handling cross-reference have been suggested in related contexts
such as the World-Wide Web and the Wide Area Information Servers (WAIS).
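
As an illustration of the unique-identifier half of this hypothesis,
the sketch below stores a permanent identifier in each link and keeps
the current location in a separate table.  The identifier strings,
locations, and class names are invented for the example and do not
follow the syntax of any proposed standard.

```python
# Hypothetical sketch: links carry a permanent identifier, not a location.
class Resolver:
    def __init__(self):
        self._location = {}   # identifier -> current network location

    def register(self, identifier, location):
        self._location[identifier] = location

    def move(self, identifier, new_location):
        # A reorganization changes only this table, never the stored links.
        self._location[identifier] = new_location

    def resolve(self, identifier):
        return self._location[identifier]


def unique_links(links):
    """Drop duplicate cross-references, keeping first-seen order."""
    seen = set()
    result = []
    for identifier in links:
        if identifier not in seen:
            seen.add(identifier)
            result.append(identifier)
    return result


resolver = Resolver()
resolver.register("mit-lcs-tr-0001", "server-a:/reports/0001")
link = "mit-lcs-tr-0001"   # as stored inside a citing document
resolver.move("mit-lcs-tr-0001", "server-b:/archive/lcs/0001")
current = resolver.resolve(link)
collected = unique_links(["mit-lcs-tr-0001", "mit-ai-tr-0002", "mit-lcs-tr-0001"])
```

Duplicate detection here is a plain comparison of identifiers, which
works only if every source cites the same identifier for the same
document.  The sketch does not address updated target documents.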

As suggested by the hypotheses, ideas are in hand for tackling each of
these problem areas, but none of these hypotheses has yet been tried
out on anything approaching the needed scale.


IV.  THE DATABASE

M. I. T. will extend the basic project proposal, which calls for
accumulating scanned images and catalog records for computer science
technical reports published after the project begins, in four ways:

     1.  By retrospectively collecting catalog records, including
machine-readable abstracts, for most older technical reports and
technical memos from the Laboratory for Computer Science and the
Artificial Intelligence Laboratory.  Between the M. I. T. Library and
the laboratory publication offices, we have at least two independent
sources for many of these records, and we will explore the problem of
matching the sources and choosing the best available information.  We
thus expect to be able to provide to the project comprehensive and
accurate retrospective bibliographic records for these Technical
Reports.

     2.  By gradually scanning the entire 25-year collection of M. I. T.
computer science technical reports.

     3.  By including all library-deposited theses in computer
science.  In physical volume, this is a modest extension, because a
majority of such theses are already issued as technical reports.
However, it offers an opportunity to settle additional issues of
ownership and distribution rights and to get an advance look at the
possibility of putting online the M. I. T. Library System's comprehensive
collection of all M. I. T. theses.
 
     4.  By capturing full machine-readable text, when available, to
permit experimentation with full-text search.
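
The source-matching problem mentioned in the first extension can be
sketched as follows.  The field names, the report numbers, and the rule
of keeping the longer non-empty value are all assumptions made for
illustration; choosing a good rule is part of the work proposed.

```python
# Hypothetical sketch: merge two catalog sources keyed by report number.
def normalize(report_number):
    # Treat "MIT/LCS/TR-12" and "mit-lcs-tr-12" as the same key.
    return "".join(ch for ch in report_number.lower() if ch.isalnum())


def merge_records(library, publications):
    merged = {}
    for source in (library, publications):
        for record in source:
            key = normalize(record["number"])
            best = merged.setdefault(key, {})
            for field, value in record.items():
                # Assumed rule: keep the longer non-empty value for each field.
                if value and len(value) > len(best.get(field, "")):
                    best[field] = value
    return merged


library = [{"number": "MIT/LCS/TR-12", "title": "A Sample Report",
            "abstract": ""}]
publications = [{"number": "mit-lcs-tr-12", "title": "Sample Report",
                 "abstract": "An abstract held only by the publications office."}]
catalog = merge_records(library, publications)
```

In this example the two sources collapse into one record that takes its
title from the library and its abstract from the publications office.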

Some of the materials of the extensions have long existed in
machine-readable form, but in different systems and for different
purposes.  The thing that is new in this proposal is to bring them
together under a common, systematic architecture.

In addition, if time and resources permit, M. I. T. also intends to
include two additional report series that are widely looked for but
hard to locate, L.C.S. Technical Memos and A. I. Laboratory Memos, as
well as a small number of monographs that have status equivalent to
laboratory technical reports.



V.  TIMETABLE

Because the project is a testbed, we expect that research on system
flow, modularity, linking, and data integrity will be iterative--it
will begin as soon as the testbed implementation is mature enough to
try out ideas, and several different iterations may be needed in each
area to discover which methods work out best.  The following timetable
therefore mentions only the first opportunity to begin work on each of
these research goals.

Year one:  Trial distribution of bibliographic records, followed by
production distribution.  Develop architectural prototype of retrieval
system, using initial modularity hypotheses.  Develop first example of
storage server protocol.  Upgrade scanning hardware systems in
library.  Initial tests of page images.  Work out initial system flow
plan.  Implement initial linking ideas.  Install replicated storage
server hardware using 2 Gbyte disks.

Year two:  Develop storage server protocol and window-based
presentation clients.  Begin trial service of page images, followed by
production service for page images on small scale.  Provide
retrospective bibliographic records for all older LCS and AI technical
reports.  Expand storage service hardware and geographically separate
it to begin experimenting with replication using geographic diversity.

Year three:  Upgrade storage service hardware to use 20 Gbyte disks.
Implement and try out data integrity algorithms.  Begin scanning of
retrospective collection of LCS and AI technical reports.  Evaluate
the way the system is used in practice, both by M. I. T. customers and
by remote users.  Prepare project final report.



VI.  AVAILABILITY POLICY

M. I. T. will make its software and the resulting data bases of
images, full text, and bibliographic records available to other
members of the DARPA-sponsored technical reports project as described
in the following paragraphs.  It will also acquire the bibliographic
data bases of other project participants and make them and the page
images provided by the other participants available to the M. I. T.
community.

This policy draws upon other related policies already in place at the 
Institute.  It applies only to the components associated with the 
DARPA project, though it may be used as a basis for developing similar 
policies for similar projects and services in the future.

M. I. T. reserves the right to modify these availability policies in the 
future.


1.  Bibliographic records

All bibliographic records generated by M. I. T. will be publicly
available.  They are provided "as is," and M. I. T. makes no
representation or warranties, express or implied, about them.


2.  Software

The software associated with delivering the services of the project 
falls into two categories -- software developed at M. I. T. and software 
licensed by M. I. T. from a third party.  

2.1  Software developed at M. I. T.

M. I. T. will grant to participants in the project license to use,
copy and modify, without fee or royalty, the software developed at
M. I. T. for this project, provided that the licensee satisfies 
specific conditions.  The text of the license appears in Appendix B.

2.2  Software that M. I. T. licenses from a third party

To the extent that third party software may be used in delivering the 
services of the project, M. I. T. will comply with the license terms and 
conditions governing availability of the software to other 
participants.  By way of example but not limitation, M. I. T. makes no 
representations or warranties that software developed by M. I. T. will 
be operable without associated third party software that participants 
must license separately.

    
3.  Page Images

As an academic institution, M. I. T. is inclined to set policies that 
make the page images of technical reports as widely available as its 
ownership interests, the interests of the authors and sponsors, and 
the availability of its resources permit.  There are, however, a 
number of issues to be resolved about ownership and distribution 
policy both at the Institute and in concert with the other 
participants in the project, in order to formulate a statement of 
availability.

At M. I. T., we expect to pursue these issues energetically in the next
few months, so that satisfactory resolution of them for the purposes
of this project can occur expeditiously.  We hope that the other
project participants will actively engage in discussion of these issues on
their individual campuses and create a forum for exploring them with
one another.


4.  Full Text

The full text form of technical reports is expected to be somewhat
problematical in that it may contain residual markup text, it may omit
graphics, it may contain errors arising from OCR, it may be derived
from a version of the original text different from the version used to
generate the page images, and in some cases it may be partially or
completely absent.  For these reasons, we expect to be somewhat
cautious about making full text generally available to end users.
However, M. I. T. will make the full text available to the other
project participants for the duration of the project under the same
conditions as page images, to allow research on alternate delivery
methods, alternate indexing strategies, and methods of dealing with
problems of correspondence between text and image.



VII.  STAFF

Planning, high-level design, and supervision of the project will be
the joint responsibility of senior members of the Laboratory for
Computer Science and the M. I. T. Library System.  Most of the research
and advanced development activity will be carried out by graduate and
undergraduate students of the Laboratory for Computer Science.
Design, implementation, and day-to-day operation of document scanning
activities will be done by staff of the micro-reproduction laboratory
of the M. I. T. Library System.  Staff technical support is also
needed to set up and run the electronic systems, and, finally, staff
time is required to do cataloguing.  The following staff levels are
anticipated:

(Percentages are of full-time equivalents):

  Library System
     20%    technical person, expert on scanning equipment
     80%    technical assistant for scanning and processing documents
     20%    supervision, Microreproduction Laboratory
      5%    supervision, Library System

  Laboratory for Computer Science
     50%    programmer and computer system wizard
    100%    1 graduate research assistant
     40%    3 undergraduate student programmers
     20%    coordinator at reading room
     25%    faculty supervision
     12%    support staff
     10%    bibliographic assistant in publications office

  Artificial Intelligence Laboratory
     10%    bibliographic assistant in publications office

Appendix C provides the names of staff members and brief descriptions
of their roles.


VIII.  EQUIPMENT

The funds proposed will allow acquisition of three different kinds of
optical scanning equipment and associated workstations, a set of three
16 Gigabyte networked servers for image storage, memory upgrade for
three currently existing index/catalog servers, and eight development
and two access workstations.  The number of access workstations is
small because we anticipate that most access will use workstations
already in place at M. I. T. for other purposes.  In addition, funds
are included to contract for vendor standard hardware maintenance for
the duration of the project.

The amount of image storage hardware proposed is based on the current
Technical Report volume of the computer science laboratories at M. I.
T., about 200 publications per year, with a total of 12,000 scanned
pages.  We have also incorporated a data volume estimate from
Carnegie-Mellon University of 0.1 Mbyte/page when using Group IV FAX
compression algorithms.  We intend to provide space at the outset to
hold acquisitions for three years as well as retrospective scanning of
a 72,000 page backlog.  An increase of disk space by a factor of
three, accomplished by upgrading disk technology, is planned for the
third year, to allow incorporation of computer science theses and
technical memoranda of the two laboratories.
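
The sizing arithmetic behind this estimate can be checked directly.
The figures are those given in the paragraph above; the variable names
and the convention of 1000 Mbyte per Gbyte are ours.

```python
# Storage estimate from the figures quoted in the text.
PUBLICATIONS_PER_YEAR = 200    # current technical report volume
PAGES_PER_YEAR = 12_000        # scanned pages in those publications
BACKLOG_PAGES = 72_000         # retrospective scanning backlog
YEARS = 3                      # acquisitions to hold at the outset
MBYTE_PER_PAGE = 0.1           # Carnegie-Mellon estimate, Group IV FAX compression

pages_per_report = PAGES_PER_YEAR / PUBLICATIONS_PER_YEAR
total_pages = PAGES_PER_YEAR * YEARS + BACKLOG_PAGES
total_mbytes = total_pages * MBYTE_PER_PAGE
total_gbytes = total_mbytes / 1000   # assuming 1 Gbyte = 1000 Mbyte

print(f"{total_pages} pages, about {total_gbytes:.1f} Gbyte per complete copy")
```

Under these assumptions the initial collection comes to 108,000 pages,
or roughly 10.8 Gbyte, which fits within the capacity proposed for each
server if every server holds a complete copy.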

Appendix D provides detailed equipment schedules and estimated costs,
and these costs are also incorporated in the complete project budget,
Appendix E.  The costs listed for equipment other than scanners are
calculated assuming contributions in the form of deep discounts from
hardware vendors in return for joint participation in the project.
Approvals for purchase of specific hardware items will be requested
after participation agreements are negotiated with these vendors and
actual costs are better known.  Negotiations are underway with both
Digital Equipment Corporation and IBM Corporation.  Since both vendors
have already contributed to predecessor projects, it is anticipated
that both will contribute to this project.


APPENDIX A

STATEMENT OF WORK

1.  The M. I. T. Laboratory for Computer Science and the M. I. T.
Library System will jointly develop and implement a network-attached
testbed system for an electronic digital library.  The design of this
system will be developed in conjunction with four other universities
under the coordination of the Corporation for National Research
Initiatives.

2.  Using this testbed, M. I. T. will carry out a research program in
four areas:  system flow, system modularity, data integrity, and data
linking.

3.  M. I. T. will place in this testbed a database of technical
reports from the M. I. T.  Laboratory for Computer Science and the
M. I. T.  Artificial Intelligence Laboratory, including page images,
full text, bibliographic descriptions, and abstracts, and make this
information available to the other project members, and also (with
restrictions appropriate for the various categories of information
involved) to the Internet community.

4.  M. I. T. will attempt to expand the database to include previously
published technical reports of the two laboratories, M. I. T. theses
in computer science, and any other related materials that M. I. T.
deems appropriate.

5.  M. I. T. will work with CNRI and the other project members to
develop and refine Terms and Conditions for use of M. I. T.-provided
materials and more generally to develop models for such Terms and
Conditions that might be widely applicable.

APPENDIX B

LICENSE



Here is the text of the license, applicable to other participants, for
use of software developed for this project at M. I. T.:  [When the 
font permits, the international copyright symbol -- c in a circle -- 
must be inserted in the third paragraph between the word "Copyright" 
and the date.]

This software is being provided to you, the LICENSEE, by the 
Massachusetts Institute of Technology (M. I. T.) under the following 
license.  By obtaining, using, or copying this software, you agree 
that you have read, understood, and will comply with these terms and 
conditions:

Permission to use, copy, and modify this software and its 
documentation for any purpose and without fee or royalty is hereby 
granted, provided that you agree to comply with the following 
copyright notice and statements, including the disclaimer, and that 
the same appear on ALL copies of the software and documentation, 
including modifications that you make for your use:

Copyright 199_ by the Massachusetts Institute of Technology.  All 
rights reserved.

THIS SOFTWARE IS PROVIDED "AS IS" AND M. I. T. MAKES NO REPRESENTATION
OR WARRANTIES, EXPRESS OR IMPLIED.  BY WAY OF EXAMPLE, BUT NOT 
LIMITATION, M. I. T. MAKES NO REPRESENTATION OR WARRANTIES OF 
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE 
OF THE LICENSED SOFTWARE WILL NOT INFRINGE ON ANY THIRD PARTY PATENTS, 
COPYRIGHTS, TRADEMARKS, OR OTHER RIGHTS. 

This software is provided "as is" and M. I. T. makes no representation 
or warranties, express or implied.  By way of example, but not 
limitation, M. I. T. makes no representation or warranties of 
merchantability or fitness for any particular purpose or that the use
of the licensed software will not infringe any third party patents, 
copyrights, trademarks, or other rights.

The name of the Massachusetts Institute of Technology or M. I. T. may 
NOT be used in advertising or publicity pertaining to use of 
the software without explicit, written, prior permission.  Title to 
copyright in this software and any associated documentation shall at 
all times remain with M. I. T., and LICENSEE agrees to preserve same.

APPENDIX C

STAFF



The investigation under this proposal will be carried out under the
supervision of:

For the M. I. T. Library System, T. Gregory Anderson, Associate
Director for Systems and Planning.

For the M. I. T. Laboratory for Computer Science, Jerome H. Saltzer,
Professor of Computer Science.

The other personnel involved will be:

Mitchell Charity, Staff Programmer, Laboratory for Computer Science.
Primary designer and implementer of all software support systems for
the project.  Systems manager for storage and index servers and
development workstations.  Responsible for supervision of
undergraduate research students.

Maria Sensale, Supervisor, LCS/AI Reading Room.  Responsible for
coordinating participation of the LCS/AI reading room and integration
of the bibliographic records from this project with bibliographic
records from other sources.

Keith Glavash, Head, M. I. T. Microreproduction Laboratory.
Responsible for overall management and supervision of the scanning
activity.  This includes systems planning, communication with the
other parties involved, maintenance of production statistics, and
fulfillment and reporting of project goals.

Michael Cook, Advanced Microfilmer, Microreproduction Laboratory.
Responsible for report scanning, indexing (or index linking), quality
control, and transmission of image and index files to network servers.
Will carry out these functions both for new reports received in hard
copy and, retrospectively, for reports already in microfiche format.
Responsible for maintenance of scanning statistics.

Lindsay J. Eisan, Production Supervisor, Microreproduction Laboratory.
Responsible for all technical matters relating to the scanning system
design.  Will plan and coordinate hardware and software setup and network
interfaces, determine maintenance needs, and serve as the technical
contact with LCS and Libraries personnel.

Lisa Eastman, Senior Secretary, Laboratory for Computer Science.
Administrative support for the research group.

Undergraduate Students.  To be assigned.

Graduate Students.  To be assigned.

APPENDIX D

ESTIMATED HARDWARE BUDGET



The following estimates are based on 1992 catalog list prices.

1.  Three image storage servers (Digital Equipment Corporation)
                                                       1/1/93  7/1/93  7/1/94
    Each server                                price
      DECSystem 5000/240 w/64 MB RAM servers  16,995
      2 SCSI turbochannels                     3,150
      10 RZ58 (1.4 Gbyte) magnetic disks      55,500
          in SZ16 enclosures
      TLZ204 DAT drive                         5,500
      VT420 Terminal & Keyboard                  629
                          one server          81,774
                          three servers      245,322
    Accessories
      RRD42 CD-ROM drive                         995
      system Documentation on CD-ROM             690
      
    Total server price                       247,007
    Purchase schedule, with discount                   61,945

2.  Third-year upgrade to 3.6 Gbyte disks

    30 disk drives @ 5K                      150,000
    Estimated cost after discount                                     37,500

3.  Index servers (IBM Corporation)

    RAM and CPU upgrades for three
         current index servers               200,000
    Estimated cost after discount                          0

4.  Workstations, for development and
    demonstration access
    (Digital Equipment Corporation)
    PM300-EK DECstation 5000/25 w/ 8MB RAM,
    426MB disk, VRT19"color, HX               10,645
    MS01-CA 16MB RAM                           2,560
                          one workstation     13,205
                          six workstations    79,230
    Estimated cost after discount                       6,603  13,205

5.  Scanning equipment

    1 Alosview CorVette Scan Station          25,000
    1 Seaport Imaging Scanner Controller       7,040
    Hardware integration and software (est.)   4,000
    1 low volume, high-resolution scanning
         system, vendor TBD                    8,000
          Total cost of scanning systems               25,000  19,040

Total hardware cost, by year                           93,548  32,245  37,500



Hardware maintenance for duration of project
    following warranty expiration, estimated
    as 5% of list price per year in force
         Year 1 (7 mo, list price $200,000)             5,833
         Year 2 (6 mo, list price $200,000,
                 6 mo, list price $698,610)                    22,465
         Year 3 (12 mo, list price $770,470)                           38,523
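
The maintenance figures follow from the stated rule of 5% of list price
per year in force, prorated by month.  The short check below uses the
list prices and month counts shown in the table; the function name is
ours.

```python
# Maintenance estimate: 5% of list price per year in force, prorated by month.
RATE = 0.05


def maintenance(periods):
    """periods: list of (months, list_price) pairs within one project year."""
    return sum(RATE * price * months / 12 for months, price in periods)


year1 = maintenance([(7, 200_000)])
year2 = maintenance([(6, 200_000), (6, 698_610)])
year3 = maintenance([(12, 770_470)])

for label, amount in (("Year 1", year1), ("Year 2", year2), ("Year 3", year3)):
    print(f"{label}: {amount:,.2f}")
```

With fractions of a dollar dropped, the three results agree with the
5,833, 22,465, and 38,523 entered in the table.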



APPENDIX E

Budget

                             10/92-6/93   7/93-6/94  7/94-6/95     Total
Salaries (Faculty,
staff, students,               133,067    173,263    178,453     484,783
and support staff)

Employee benefits
 @ 41.65, 41.9, 42.15, 42.4%    55,446     63,710     65,803     184,959

Other direct costs
  Equipment                     93,548     32,245     37,500     163,293
  U.S. Travel                    4,283      9,583     10,130      23,996
  Foreign Travel                     0          0      2,113       2,113
  Materials & Services           3,000      4,200      4,410      11,610
  Communications                 6,500      8,400      8,820      23,720
  Computer Resource Services    20,833     38,215     55,061     114,109
  Lab M&S allocation             4,775      5,524      6,391      16,690

Total direct costs             321,452    335,140    368,681   1,025,273
                              ==========================================
Modified Total Direct Costs    204,793    277,415    304,364     786,572

Indirect costs
 @62, 62.5, 63, 63.5%          126,972    173,385    191,749     492,106

Total costs                   $448,424   $508,525   $560,430  $1,517,379


Supporting information, not to be included in proposal.

1.  Expected Travel

Professional conferences to present papers and keep up with other
research work in the area (specific conferences and locations not yet
determined).  CNRI meetings to coordinate with other project
participants:  1-day meetings in Reston, Va. or at one of the project
sites.  Budgeted at $500 for east coast trips, $1000 for west coast
trips, $1500 for Europe.  Allocation:  2/3 LCS, 1/3 Library.

Year 1
     1 professional conference, 1 person (1 east coast trip)         500
     2 CNRI meetings, 2 persons (4 east coast trips)                2000

Year 2
     3 professional conferences, 1 person (2 east, 1 west coast)    2000
     3 CNRI meetings, 2 persons (4 east, 2 west coast)              4000

Year 3
     3 professional conferences, 1 person (2 east, 1 west coast)    2000
     1 professional conference, 1 person (1 Foreign trip/Germany)   1500
     3 CNRI meetings, 2 persons (4 east, 2 west coast trips)        4000