Scanned document record, version 1.0 (Second Draft)
May 27, 1994
By Jerry Saltzer

The objective of this document is to define the information that should be captured when a document is scanned, as an on-line record that becomes a component of the scanned form of the document. The format of the information is not defined here, only its content; even that is defined only by example. The format should probably be records with named fields.

The objective of the scanning record is to capture information that is not explicit in the scanned image, yet is needed:

1. to view, display, or print the image properly.
2. to understand how to interpret the image.
3. to meet contractual or legal requirements.

The information that the scanning record captures falls into four general categories: scanning conditions, document source information, an array of image map entries, and miscellany. The first item captured is the version of the scanned document record, to accommodate changes in the standards for capturing scanning information.

Because this is a preliminary proposal, it probably omits some information items that should be captured, and some of the items suggested may be unnecessary.

0. Scanning record version: 1.0

I. Scanning conditions:

   Name of scanning condition set: standard
   Scanner: Fujitsu 3076G with 8191 document feeder
   Software: Optix version 3.1
   Resolution: 400 dpi, 8-bit grey
   Settings: automatic brightness/contrast
   Organization: M.I.T. Document Services
   Operator: Jack Eisan
   Date: March 17, 1994
   Notes:
   End of set: standard

   Name of scanning condition set: color
   Scanner: HP IIcx
   Software: Photoshop 2.5 LE and DeskScan version 2.0
   Resolution: 400 dpi, 24-bit color
   Settings: automatic brightness/contrast/color-balance/gamma
   Organization: M.I.T. Document Services
   Operator: Jack Eisan
   Date: March 17, 1994
   Notes: Scanner replaced by factory March 15, 1994.
   End of set: color

II. Document source information

   Document Label: MIT LCS TR-87

   Name of source: Source1
   Description: typed sheets on thesis bond, intended for duplex reproduction.
   Size: 8.5 by 11 inches
   Count: 121 sheets
   Duplex: no
   From: LCS publications office.
   Date: August, 1964
   Notes: Page three includes a tipped-in color photograph.
   End of source: Source1

   Name of source: Source2
   Description: offset reproduction
   Size: 8.5 by 11 inches
   Count: 61 sheets
   Duplex: yes
   From: MIT Library system, Archive copy.
   Date: circa 1980
   End of source: Source2

   Name of source: greytest
   Description: IEEE standard GS-1994.2 grey-scale target
   Size: 8.5 x 11 inches
   End of source: greytest

   Name of source: colortest
   Description: Kodak sQ-13 color separation card
   Size: 3 x 8 inches
   End of source: colortest

   Name of source: blank
   Description: Document Services standard blank page replacement
   End of source: blank

III. Scanned image map:

   image name         source     sheet/side  original    condition set  notes
                                             pagination
   MIT LCS TR-87 1    greytest               --          standard
   MIT LCS TR-87 2    Source1    1/1         1           standard
   MIT LCS TR-87 3    Source1    2/1         2           standard
   MIT LCS TR-87 4    Source1    3/1         3           standard
   MIT LCS TR-87 5    colortest              --          color
   MIT LCS TR-87 6    Source1    3/1         3           color          1
   MIT LCS TR-87 7    blank                  (4)         standard
   MIT LCS TR-87 8    Source2    4/1         5           standard
   ...
   MIT LCS TR-87 249  Source1    121/2       241         standard

IV. Miscellany

   Copyright notice: none
   Credits: This document was scanned under a grant from the Carnegie Foundation.
   Notes:
      1. The image area outside the outline of the color photograph was digitally masked out.

------------------------------------------------

Comments about the above example...

The example image map indicates that sheet 3 of Source1 (the one with the tipped-in color photo) was scanned twice, once with the grey-scale scanner and once with the color scanner; both images are included in the scanned version of the document.

Question: in the grey-scale image, should the picture be replaced with a note saying that a color image is available?
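The question above depends on a viewer being able to find every image made from one physical sheet. The image map already carries what is needed: entries that share a source and a sheet/side, such as images 4 and 6, are scans of the same side. The following is a minimal sketch, assuming the map has been read into one record per row; the class `ImageMapEntry` and the functions `images_by_sheet` and `undeclared_references` are hypothetical names, not part of this proposal, and the fields simply mirror the columns of the example map.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ImageMapEntry:
    """One row of the scanned image map (hypothetical in-memory layout)."""
    image_name: str             # e.g. "MIT LCS TR-87 6"
    source: str                 # a name declared in section II, e.g. "Source1"
    sheet_side: Optional[str]   # "3/1" = sheet 3, side 1; None for targets/blanks
    pagination: str             # "3", "(4)" if implied, "--" if none
    condition_set: str          # a name declared in section I, e.g. "color"
    notes: str = ""             # number of a Miscellany note, if any


def images_by_sheet(
    entries: List[ImageMapEntry],
) -> Dict[Tuple[str, str], List[ImageMapEntry]]:
    """Group entries by (source, sheet/side), keeping image-map order, so a
    viewer can find every scan of one physical side (e.g. grey and color)."""
    groups: Dict[Tuple[str, str], List[ImageMapEntry]] = defaultdict(list)
    for entry in entries:
        if entry.sheet_side is not None:  # targets and blanks have no sheet
            groups[(entry.source, entry.sheet_side)].append(entry)
    return dict(groups)


def undeclared_references(
    entries: List[ImageMapEntry], sources: Set[str], condition_sets: Set[str]
) -> List[Tuple[str, str]]:
    """List (image name, name) pairs whose source or condition set is undeclared."""
    problems = []
    for entry in entries:
        if entry.source not in sources:
            problems.append((entry.image_name, entry.source))
        if entry.condition_set not in condition_sets:
            problems.append((entry.image_name, entry.condition_set))
    return problems


# A few rows of the example map.
entries = [
    ImageMapEntry("MIT LCS TR-87 4", "Source1", "3/1", "3", "standard"),
    ImageMapEntry("MIT LCS TR-87 5", "colortest", None, "--", "color"),
    ImageMapEntry("MIT LCS TR-87 6", "Source1", "3/1", "3", "color", "1"),
    ImageMapEntry("MIT LCS TR-87 7", "blank", None, "(4)", "standard"),
]
scans = images_by_sheet(entries)[("Source1", "3/1")]
print([(e.image_name, e.condition_set) for e in scans])
# [('MIT LCS TR-87 4', 'standard'), ('MIT LCS TR-87 6', 'color')]
print(undeclared_references(
    entries,
    {"Source1", "Source2", "greytest", "colortest", "blank"},
    {"standard", "color"},
))
# []
```

Grouping by (source, sheet/side) leaves out the test targets and blank replacements, which have no sheet of their own. The cross-reference check is one way a browser might reject a record whose image map names a source or condition set that sections I and II never declare.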
Ideally, the scanning record should contain enough information that a clever browser can put the page back together again on the screen. Suggestions are in order!

The back side of sheet 3 of Source2 was blank, and was therefore replaced with a blank target. Since the next sheet carried the page number 5, page number 4 is implied, which is indicated by placing it in parentheses. If the next sheet had carried the page number 4, the original pagination would instead have been shown as "--".

It is not clear whether a blank target would have been appropriate if the original typed form had been intended for single-sided reproduction, with all pages numbered consecutively, but the reproduction had been done on two sides, with blank, unnumbered pages inserted as necessary so that chapters start on right-hand pages. No doubt such cases will show up.

Acknowledgement: This note expands on a set of ideas originally developed at a Library 2000 group meeting on March 17, 1994. Discussants: Jack Eisan, Mitchell Charity, Ali Alavi, Sally Richter, Mary Anne Ladd, Jeremy Hylton, Geoff Seyon, Eytan Adar, Greg Anderson, Jerry Saltzer.
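The pagination convention described in the comments above is precise enough to sketch in code. This is a minimal sketch, assuming printed page numbers are plain integers; `blank_side_label` and `parse_pagination` are hypothetical helper names. Treating any other gap as an error is also an assumption, since the note describes only two cases: the next sheet skips exactly one page number, or it carries the very next one.

```python
from typing import Optional, Tuple


def blank_side_label(previous_page: int, next_page: int) -> str:
    """Original-pagination entry for a blank side replaced by a blank target.

    previous_page: number printed on the last numbered side before the blank.
    next_page: number printed on the next sheet after the blank.
    """
    if next_page == previous_page + 2:
        # One number was skipped, so the blank side implicitly carried it.
        return f"({previous_page + 1})"
    if next_page == previous_page + 1:
        # Nothing was skipped: the blank side had no page number of its own.
        return "--"
    # The note does not say what to do for any other gap.
    raise ValueError(f"unexpected page gap: {previous_page} -> {next_page}")


def parse_pagination(label: str) -> Tuple[Optional[int], bool]:
    """Read an original-pagination entry back as (page number, implied?)."""
    if label == "--":
        return None, False               # no page number at all
    if label.startswith("(") and label.endswith(")"):
        return int(label[1:-1]), True    # implied page number
    return int(label), False             # printed page number


print(blank_side_label(3, 5))    # (4)
print(blank_side_label(3, 4))    # --
print(parse_pagination("(4)"))   # (4, True)
print(parse_pagination("--"))    # (None, False)
```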