Scanned document record, version 1.0 (Second Draft)
May 27, 1994
By Jerry Saltzer

The objective of this document is to define the information that should be
captured when a document is scanned, as an on-line record that becomes a
component of the scanned form of the document.  The format of the
information is not defined here, only its content; even that is defined
only by example.  The format should probably be records with named fields.

The objective of the scanning record is to capture information that is not
explicit in the scanned image, yet is needed:

1.  to view, display, or print the image properly.

2.  to understand how to interpret the image.

3.  to meet contractual or legal requirements.

The information that the scanning record captures falls into four general
categories:  scanning conditions, document source information, an array of
image map entries, and miscellany.  The first item captured is the version
of the scanned document record, to accommodate changes in standards for
capturing scanning information.  This being a preliminary proposal, it
probably omits information items that should be captured, and some of the
information items suggested may be unnecessary.


Scanning record version:  1.0

I.  Scanning conditions:

Name of scanning condition set:  standard
Scanner:  Fujitsu 3076G with 8191 document feeder
Software:  Optix version 3.1
Resolution:  400 dpi, 8-bit grey
Settings:  automatic brightness/contrast
Organization:  M.I.T. Document Services
Operator:  Jack Eisan
Date:  March 17, 1994
End of set:  standard

Name of scanning condition set:  color
Scanner:  HP IIcx
Software:  Photoshop 2.5 LE and DeskScan version 2.0
Resolution:  400 dpi, 24-bit color
Settings:  automatic brightness/contrast/color-balance/gamma
Organization:  M.I.T. Document Services
Operator:  Jack Eisan
Date:  March 17, 1994
Notes:  Scanner replaced by factory March 15, 1994.
End of set:  color
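
Since the format is deliberately left open here, the following is only a
sketch of how records with named fields might be read back.  It assumes the
"Field:  value" line layout used in this example, with each set opened by a
"Name of ..." field and closed by an "End of ..." field.  The function name
parse_sets and the dictionary representation are hypothetical and not part
of this proposal.

```python
def parse_sets(text):
    """Parse 'Field:  value' lines into a dict of named sets (assumed layout)."""
    sets = {}
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and lines without a field name
        field, value = line.split(":", 1)
        field, value = field.strip(), value.strip()
        if field.lower().startswith("name of"):
            current = {}           # open a new named set
            sets[value] = current
        elif field.lower().startswith("end of"):
            current = None         # close the current set
        elif current is not None:
            current[field] = value
    return sets


# A shortened copy of the two condition sets in the example above.
EXAMPLE = """\
Name of scanning condition set:  standard
Scanner:  Fujitsu 3076G with 8191 document feeder
Resolution:  400 dpi, 8-bit grey
End of set:  standard

Name of scanning condition set:  color
Scanner:  HP IIcx
Resolution:  400 dpi, 24-bit color
End of set:  color
"""

conditions = parse_sets(EXAMPLE)
print(conditions["color"]["Resolution"])  # prints: 400 dpi, 24-bit color
```

The same function would read the document source records of the next
section, since they follow the same "Name of ..." and "end of ..." pattern.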

II.  Document source information

Document Label:  MIT LCS TR-87

Name of source:  Source1
Description:  typed sheets on thesis bond, intended for duplex reproduction.
Size:  8.5 by 11 inches
Count:  121 sheets
Duplex:  no
From:  LCS publications office.
Date:  August, 1964
Notes:  Page three includes a tipped-in color photograph.
end of source:  Source1

Name of source:  Source2
Description:  offset reproduction
Size:  8.5 by 11 inches
Count:  61 sheets
Duplex:  yes
From:  MIT Library system, Archive copy.
Date:  circa 1980
end of source:  Source2

Name of source:  greytest
Description:  IEEE standard GS-1994.2 grey-scale target
Size:  8.5 by 11 inches
end of source:  greytest

Name of source:  colortest
Description:  Kodak sQ-13 color separation card
Size:  3 by 8 inches
end of source:  colortest

Name of source:  blank
Description:  Document Services standard blank page replacement
end of source:  blank

III.  Scanned image map:

image name           source    sheet/side   original    condition set  notes

MIT LCS TR-87 1      greytest                  --         standard
MIT LCS TR-87 2      Source1      1/1           1         standard
MIT LCS TR-87 3      Source1      2/1           2         standard
MIT LCS TR-87 4      Source1      3/1           3         standard
MIT LCS TR-87 5      colortest                 --         color
MIT LCS TR-87 6      Source1      3/1           3         color           1
MIT LCS TR-87 7      blank                     (4)        standard
MIT LCS TR-87 8      Source2      4/1           5         standard
MIT LCS TR-87 249    Source1    121/2         241         standard
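
Each row of the map names a source and a condition set that are defined
earlier in the record.  As a sketch only, the rows could be held as typed
records and checked for names that were never defined.  The class name
ImageMapEntry, its field names, and the function undefined_references are
hypothetical; only a few rows of the example are copied here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageMapEntry:
    image_name: str
    source: str                  # must match a "Name of source" value
    sheet_side: Optional[str]    # e.g. "3/1"; None for targets and blanks
    original_page: str           # "1", "(4)" if implied, "--" if none
    condition_set: str           # must match a scanning condition set name
    note: Optional[int] = None   # number of a note listed with the record


IMAGE_MAP = [
    ImageMapEntry("MIT LCS TR-87 1", "greytest", None, "--", "standard"),
    ImageMapEntry("MIT LCS TR-87 2", "Source1", "1/1", "1", "standard"),
    ImageMapEntry("MIT LCS TR-87 5", "colortest", None, "--", "color"),
    ImageMapEntry("MIT LCS TR-87 6", "Source1", "3/1", "3", "color", note=1),
    ImageMapEntry("MIT LCS TR-87 7", "blank", None, "(4)", "standard"),
]

SOURCES = {"Source1", "Source2", "greytest", "colortest", "blank"}
CONDITION_SETS = {"standard", "color"}


def undefined_references(entries, sources, condition_sets):
    """Return the entries that name an undefined source or condition set."""
    return [e for e in entries
            if e.source not in sources or e.condition_set not in condition_sets]


print(undefined_references(IMAGE_MAP, SOURCES, CONDITION_SETS))  # prints: []
```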

IV.  Miscellany

Copyright notice:  none

Credits:  This document was scanned under a grant from the Carnegie Foundation.

1.  Image outside the outline of the color photograph was digitally masked out.


Comments about the above example...

The example image map indicates that sheet 3 of Source1 (the one with the
tipped-in color photo) was scanned twice, once with the grey-scale scanner
and once with the color scanner; both images are included in the scanned
version of the document.  (Question:  in the grey-scale image, should the
picture be replaced with a note saying that there is a color image
available?  Ideally, we should have enough information here that a clever
browser can put the page back together again on the screen.  Suggestions
are in order!)
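
One piece of what such a browser would need can already be read from the
map:  which sides were scanned more than once, and under which condition
sets.  The sketch below groups rows by source and sheet/side.  The function
name multiple_scans and the tuple layout are hypothetical, and the sketch
does not address where on the page the color image belongs.

```python
from collections import defaultdict

# (image name, source, sheet/side, condition set), copied from the example map.
ENTRIES = [
    ("MIT LCS TR-87 2", "Source1", "1/1", "standard"),
    ("MIT LCS TR-87 3", "Source1", "2/1", "standard"),
    ("MIT LCS TR-87 4", "Source1", "3/1", "standard"),
    ("MIT LCS TR-87 6", "Source1", "3/1", "color"),
    ("MIT LCS TR-87 8", "Source2", "4/1", "standard"),
]


def multiple_scans(entries):
    """Map each (source, sheet/side) scanned more than once to its images."""
    by_side = defaultdict(list)
    for image_name, source, sheet_side, condition_set in entries:
        by_side[(source, sheet_side)].append((image_name, condition_set))
    return {side: images for side, images in by_side.items() if len(images) > 1}


print(multiple_scans(ENTRIES))
```

For the rows above, the only side reported is sheet 3, side 1 of Source1,
with its standard and color images.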

The back side of sheet 3 of Source2 was blank, and was therefore replaced
with a blank target.  Since the next sheet carried the page number 5, page
number 4 is implied, which is indicated by placing it in parentheses.

If the next sheet had carried the page number 4, the original pagination
would have instead been shown as "--".
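
The two cases just described can be written as a small rule.  This is a
sketch under the assumption that page numbers are plain integers; the
function name blank_page_label is hypothetical, and treating every gap
other than exactly one missing number as "--" is an assumption that goes
beyond the two cases given here.

```python
def blank_page_label(previous_page, next_page):
    """Original-pagination entry for a blank side between two numbered pages."""
    if next_page - previous_page == 2:
        # Exactly one number was skipped, so the blank side implies it.
        return "(%d)" % (previous_page + 1)
    # Numbering continues without a gap (or the gap is not a single page).
    return "--"


print(blank_page_label(3, 5))  # prints: (4)
print(blank_page_label(3, 4))  # prints: --
```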

It is not clear whether a blank target would have been appropriate if the
original typed form had been intended for single-sided reproduction, with
all pages consecutively numbered, but the reproduction was done on two
sides, with blank, unnumbered pages introduced as necessary to get chapters
to start on right-hand sides.  No doubt such cases will show up.

Acknowledgement:  This note is expanded from a set of ideas originally
developed at a Library 2000 group meeting on March 17, 1994.  Discussants:
Jack Eisan, Mitchell Charity, Ali Alavi, Sally Richter, Mary Anne Ladd,
Jeremy Hylton, Geoff Seyon, Eytan Adar, Greg Anderson, Jerry Saltzer.