README02.TXT, a description file included on every CSTR CD.


M. I. T. Laboratory for Computer Science
M. I. T. Libraries Document Services
Computer Science Technical Report Project

By Jerome H. Saltzer, Lindsay J. Eisan, and Mitchell N. Charity
June 5, 1996


Background

The Computer Science Technical Report (CS-TR) project is a joint
undertaking of Carnegie-Mellon University, Cornell University, Stanford
University, the University of California at Berkeley, the Corporation
for National Research Initiatives, and M. I. T. to create a cooperative
on-line library of scanned page-images of Computer Science technical
reports.  

Over a period of two years, M. I. T. has done high-resolution (400
pixels per inch, 8 bits per pixel, grey scale) scanning of about 1000
technical reports and technical memoranda of the M. I. T. Laboratory for
Computer Science and the M. I. T. Artificial Intelligence Laboratory.
This collection of CD's preserves the raw, uncompressed scanned images.
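
The scanning parameters above determine the size of each raw image.
The following is a rough sketch of that arithmetic, not a description
of the scanning software.  It assumes a US Letter page (8.5 by 11
inches), which this file does not state, and it ignores TIFF header
overhead.  The function name raw_image_bytes is hypothetical.

```python
PIXELS_PER_INCH = 400   # scanning resolution given in the text
BYTES_PER_PIXEL = 1     # 8 bits per pixel, grey scale


def raw_image_bytes(width_in, height_in):
    """Bytes of uncompressed pixel data for one scanned page."""
    width_px = round(width_in * PIXELS_PER_INCH)
    height_px = round(height_in * PIXELS_PER_INCH)
    return width_px * height_px * BYTES_PER_PIXEL


# ASSUMPTION: US Letter page size; the source does not give page sizes.
page_bytes = raw_image_bytes(8.5, 11.0)
print(page_bytes)        # 14960000
print(43 * page_bytes)   # 643280000
```

Under these assumptions one page is about 15 million bytes and 43
pages are about 643 million bytes, which may be why the images are
grouped 43 to a folder.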

This work was supported in part by the IBM Corporation, in part by the
Digital Equipment Corporation, and in part by the Corporation for
National Research Initiatives, using funds from the Advanced Research
Projects Agency of the United States Department of Defense under grant
MDA972-92-J1029.


Processing overview

All the pages of a single document were scanned into the memory and
thence to the disk of an Apple Macintosh Quadra 840AV running system 7.1
or 7.5.  A scan record describing the scanning process, the scanned
document, and the resulting digital images was prepared as an Excel
spreadsheet, then written to disk as a text file. Checksums of all
images and the scan record were calculated and written to disk as a text
file. Finally, the collection of images and supporting files for a
technical report was copied to Digital Audio Tape using the Dantz
Retrospect Remote backup system.  A second, non-archival, copy of the
data was processed by reducing its resolution to match available
displays and printers, and made available for on-line delivery.

For preparation of this preservation CD, Retrospect restored the
contents of a Digital Audio Tape to disk.  An AppleScript program
regrouped the files into folders each containing up to 43 images.  The
script also prepared UNIX and DOS versions of each text file
(substituting end-of-line and double-quote characters), renamed the
files as described below, and replaced the scan specification document
with a more readable version.  The resulting files were recorded to CD
using a Yamaha CDE100 recorder operating at 4X speed and operated by
Toast 2.5.9.
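
The text-file conversion described above can be sketched as follows.
This is an illustration in Python, not the AppleScript program that
was actually used.  It assumes the Macintosh originals end lines with
a carriage return, and it assumes that the double-quote substitution
replaces the Macintosh curly quotes (bytes 0xD2 and 0xD3 in the Mac
Roman character set) with the ASCII double quote; this file does not
say which characters were substituted.  All function names are
hypothetical.

```python
def convert_text(mac_bytes, newline):
    """Return a copy of a Macintosh text file with new line endings."""
    # Normalise every line ending to LF first, so that input which
    # already contains CR LF pairs is not converted twice.
    unified = mac_bytes.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # ASSUMPTION: curly double quotes (Mac Roman 0xD2, 0xD3) become
    # the plain ASCII double quote.
    unified = unified.replace(b"\xd2", b'"').replace(b"\xd3", b'"')
    return unified.replace(b"\n", newline)


def to_unix(mac_bytes):
    """UNIX version: lines separated by LF."""
    return convert_text(mac_bytes, b"\n")


def to_dos(mac_bytes):
    """DOS version: lines separated by CR LF."""
    return convert_text(mac_bytes, b"\r\n")
```

Because the converted copies differ byte for byte from the
originals, a checksum calculated on a Macintosh text file will not
match its UNIX or DOS version.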


Contents of this CD

The CD contains a single session, consisting of a file system written in
strict ISO 9660 level one format, with file type and file creator codes.
A list of the codes used appears below.  The file system is laid
out with five directories (folders) at the root level as follows:

 INFO_MAC               CR separates lines
      README02.TXT;1   This file
      SCANREC.TXT;1    ASCII description of the scanned document
      CHECKSUM.TXT;1   ASCII file containing checksums calculated 
                       shortly after scanning.
      OPTIX.BIN;1      Binary summary file produced by scanning software
      SCANSP           folder containing specifications of SCANREC.TXT
        SCANSP14.TXT;1   ASCII description, version 1.4
        SCANSP13.TXT;1   ASCII description, version 1.3
        SCANSP12.TXT;1   ASCII description, version 1.2
        REVHIST.TXT;1    ASCII description of revision history

 INFO_DOS               CR LF separates lines
      (same contents as INFO_MAC)
   
 INFO_UNX               LF separates lines
      (same contents as INFO_MAC)

 K_OF_N                 empty folder; the digit "K" being the number
                        of this disk in this report and "N" being the
                        number of CD's this report occupies.

 IMAGES01      numbered image scans, in TIFF format, uncompressed
      001.TIF;1
      002.TIF;1
      etc.
   
For reports that comprise more than 43 scanned images, each CD contains
a complete set of README and INFO files and an empty K_OF_N folder. The
first CD, in folder IMAGES01, contains images 001-043; the second CD, in
folder IMAGES02, contains images 044-086, etc.
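
As an illustration of this layout, the sketch below computes where a
given image should be found.  It assumes that images are split into
consecutive, non-overlapping groups of 43, one group per CD, and that
folder and file numbers are zero-padded as in the listing above.  The
function name locate_image is hypothetical.

```python
IMAGES_PER_CD = 43   # maximum number of images per folder, from the text


def locate_image(image_number, total_images):
    """Return the marker folder, image folder and file name for an image."""
    if not 1 <= image_number <= total_images:
        raise ValueError("image number out of range")
    # N: number of CDs the report occupies (ceiling division).
    n = -(-total_images // IMAGES_PER_CD)
    # K: number of the CD that holds this image.
    k = (image_number - 1) // IMAGES_PER_CD + 1
    return {
        "marker_folder": f"{k}_OF_{n}",
        "image_folder": f"IMAGES{k:02d}",
        "file_name": f"{image_number:03d}.TIF;1",
    }


print(locate_image(44, 100))
# {'marker_folder': '2_OF_3', 'image_folder': 'IMAGES02', 'file_name': '044.TIF;1'}
```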

Note that the file names appearing in SCANREC.TXT and CHECKSUM.TXT are
those of the original Macintosh files, not the ISO 9660 file names used
on the CD.  Here is a typical mapping:

   mit-ai-tr-974-srec.txt     -> SCANREC.TXT;1
   mit-ai-tr-974-image-12     -> 012.TIF;1
   mit-ai-tr-974-0            -> OPTIX.BIN;1
   scanrec.cstr.1.3.txt       -> SCANSP13.TXT;1
   mit-ai-tr-974-checksums    -> CHECKSUM.TXT;1
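
When checking entries in SCANREC.TXT or CHECKSUM.TXT against the CD,
the mapping above can be applied mechanically.  The sketch below
generalizes from the five examples shown; the patterns are inferred
from those examples and are an assumption, not a specification of the
renaming script.  The function name iso_name is hypothetical.

```python
import re

# Each rule pairs a pattern for the original Macintosh name with a
# function that builds the ISO 9660 name.  More specific rules come
# first so that "-image-0" is not mistaken for the "-0" summary file.
_RULES = [
    (re.compile(r"^scanrec\.cstr\.(\d+)\.(\d+)\.txt$"),
     lambda m: f"SCANSP{m.group(1)}{m.group(2)}.TXT"),
    (re.compile(r"-srec\.txt$"), lambda m: "SCANREC.TXT"),
    (re.compile(r"-checksums$"), lambda m: "CHECKSUM.TXT"),
    (re.compile(r"-image-(\d+)$"), lambda m: f"{int(m.group(1)):03d}.TIF"),
    (re.compile(r"-0$"), lambda m: "OPTIX.BIN"),
]


def iso_name(mac_name):
    """Map an original Macintosh file name to its name on the CD."""
    for pattern, build in _RULES:
        match = pattern.search(mac_name)
        if match:
            return build(match) + ";1"   # ";1" is the ISO 9660 version
    raise ValueError(f"unrecognized file name: {mac_name}")


print(iso_name("mit-ai-tr-974-image-12"))   # 012.TIF;1
```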

Note also that the checksums in CHECKSUM.TXT are valid only for the
image files and the Macintosh version of SCANREC.TXT.


Disk naming convention:  Each CD is assigned an ISO-compliant name
consisting of two letters that indicate the source and series, the
digits of the technical report number (including any letter suffix), an
underscore (if space is available), and a one-digit disk number.

Sources:
   A    M.I.T. Artificial Intelligence Laboratory
   L    M.I.T. Laboratory for Computer Science (formerly Project MAC)
Series:
   R    technical report
   M    technical memorandum

Thus the second CD of MIT-AI-TR-974 is named AR974_2.
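
The naming rule can be sketched as a short function.  Several details
here are assumptions that this file does not state: that report
identifiers follow the pattern of MIT-AI-TR-974 (with LCS and TM as
the other lab and series labels), and that "space is available" means
the name fits in eight characters.  The names disk_name and max_len
are hypothetical.

```python
SOURCE_LETTER = {"AI": "A", "LCS": "L"}   # ASSUMPTION: "LCS" label
SERIES_LETTER = {"TR": "R", "TM": "M"}    # ASSUMPTION: "TM" label


def disk_name(report_id, disk_number, max_len=8):
    """Build a CD name such as AR974_2 from a report identifier."""
    if not 1 <= disk_number <= 9:
        raise ValueError("disk number must be a single digit")
    # "MIT-AI-TR-974" -> lab "AI", series "TR", number "974"
    _, lab, series, number = report_id.upper().split("-", 3)
    prefix = SOURCE_LETTER[lab] + SERIES_LETTER[series] + number
    with_underscore = f"{prefix}_{disk_number}"
    # ASSUMPTION: the underscore is dropped when the name would
    # otherwise exceed max_len characters.
    if len(with_underscore) <= max_len:
        return with_underscore
    return f"{prefix}{disk_number}"


print(disk_name("MIT-AI-TR-974", 2))   # AR974_2
```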


File types and creator codes:

                   File type    creator

text files          TEXT         ttxt
image files         TIFF         PIXL
OPTIX.BIN           PIXD         PIXL


Revision history:

README01.TXT -> README02.TXT

1.  Change Disk naming convention to use name derived from TR number.

2.  Clarify description of K_OF_N folder naming convention.

--------------------------- END -------------------------------------

