Within each document directory, there are several items. Almost every TR available has an RFC1357 bibliographic record stored there. If a Postscript version is available, it too will be stored in this directory. If the document was scanned, a copy of the scanned document record will be available here. Finally, a subdirectory exists for each different format for which images are available. By looking at the contents of this directory, it is possible to determine what formats a TR is available in and to access each of these parts.
Next, the documents are processed and placed on-line. Postscript versions are compressed and put into the proper directory. Scanned documents, by contrast, are first processed into several resolutions and then stored in the proper directory. This process also generates a PIF, described below, for each resolution. Although the image processing was already part of the Library 2000 system, the PIFs were created specifically to support generation of digiments.
Finally, when a digiment is requested, the server runs the create-digiment program, which generates a proper digiment and returns it. The rest of this chapter describes the functioning of this program.
Version: 1.0
Processing-comments: Tue Apr 11 09:57:37 1995
Image: 1 image/gif;dpi=100;bits=5 unnumbered doccontrol
Image: 2 image/gif;dpi=100;bits=5 unnumbered doccontrol
Image: 3 image/gif;dpi=100;bits=5 unnumbered unnumbered;title page
Image: 4 image/gif;dpi=100;bits=5 2 numbered 2
Image: 5 image/gif;dpi=100;bits=5 3 numbered 3
A similar PIF would be created for the 600 DPI TIFF images.
The create-digiment program generates page-lists, page-maps, and body-lists, as well as a part-list and a bibliography part.
The pif2digiment program scans through the entire PIF, generating a page-list and a page-map. Although these two parts are described separately, the actual processing takes place simultaneously so that the program does not need to scan through the PIF twice. When the program has scanned the entire PIF file, it prints out the page-map part, and then the page-list part, separated by an encapsulation boundary.
http://server.name.here/Server/TR/technical-report-ID/Page/#?type=data/type
The values in italics are replaced by the values as explained in Table 5.1 on page 52.
Table 5.1: Values for Dienst URL to access a page
--------------------------------------------------------------------------------------
Value name            Meaning
--------------------------------------------------------------------------------------
server.name.here      The name of the Dienst server that stores the document
technical-report-ID   The CSTR ID for a technical report that is stored on the server
#                     The image number requested
data/type             The format in which the image is requested
--------------------------------------------------------------------------------------
The portion of the URL that is identical for all images, and can therefore be placed in the URL-stem header, is everything up to the page number. A typical URL-stem for an MIT TR would look like this:
URL-stem: http://cstr-http.lcs.mit.edu/Server/TR/AI-TM-1066/Page/
Next, the content-type header is created. Since all the images are assumed to be of the same type for this program, the content-type from the first entry in the PIF is extracted and used for this field. The last header, page-map, contains the Content-ID of the page-map that is simultaneously generated from the same PIF file.
The rest of the page-list contains an entry for each image as specified in section 4.4.5 on page 40. Each entry is constructed from the information in the PIF. The image number from the PIF is used as the VSN for the page. The remainder of the URL is constructed by concatenating the image number, the string "?type=", and the format type to complete a Dienst URL as described above. The page description field is derived from the image format: for a 600 DPI image the field is set to "Printable image", and for a 100 DPI image it is set to "Viewable image".
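
The construction of a single page-list entry can be illustrated with a minimal Python sketch. It is not the pif2digiment source: the function names are hypothetical, the entry is returned as a plain tuple rather than in the on-the-wire layout of section 4.4.5, and the 600 DPI format string used in the usage note is an assumed example.

```python
from typing import Optional, Tuple


def dpi_of(image_format: str) -> Optional[int]:
    """Extract the dpi parameter from a format such as 'image/gif;dpi=100;bits=5'."""
    for param in image_format.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "dpi":
            return int(value)
    return None


def page_list_entry(image_number: int, image_format: str) -> Tuple[int, str, str]:
    """Return (VSN, URL remainder, page description) for one PIF image."""
    # The PIF image number doubles as the VSN.
    vsn = image_number
    # Remainder of the Dienst URL: image number + "?type=" + format type.
    url_rest = f"{image_number}?type={image_format}"
    dpi = dpi_of(image_format)
    if dpi == 600:
        description = "Printable image"
    elif dpi == 100:
        description = "Viewable image"
    else:
        # Assumption: the text only covers 600 and 100 DPI images.
        description = ""
    return vsn, url_rest, description
```

Appending the URL remainder to the URL-stem gives the full Dienst URL. With the stem shown above, image 4 of the 100 DPI GIF set resolves to http://cstr-http.lcs.mit.edu/Server/TR/AI-TM-1066/Page/4?type=image/gif;dpi=100;bits=5.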
The first field in a Map: entry is the same VSN that is used in the page-list, and also comes from the PIF image number. The next two fields are the page content and page description fields. These fields are created by looking at the page number field of the PIF. If the page is a numbered page, the page content field is filled with the page number, as is the page description field. If the page is unnumbered, the entry depends on the description field of the PIF. If the description field contains "title page", the page content field becomes "title page" and the page description field becomes "Title page". If the description field contains "blank", both the page content field and the page description field become "unnumbered". Any other value in the description field causes the page content field to become "supporting", while the page description field takes the value of the PIF description field. A summary of this is presented in Table 5.2 on page 53.
Table 5.2: Mapping from PIF to page-map fields
------------------------------------------------------------------------
Page number   Description     Page content        Page description
------------------------------------------------------------------------
any number    anything        Page number field   Description field
unnumbered    title page      title page          Title page
unnumbered    blank           unnumbered          unnumbered
unnumbered    anything else   supporting          Description field
------------------------------------------------------------------------
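
The rows of Table 5.2 can be read as a small decision function. The following Python sketch is illustrative only: the function name is hypothetical, the two PIF fields are assumed to be already parsed into strings, numbered pages follow the table row as printed, and "title page" is tested before "blank" because that is the order in which the cases are listed.

```python
from typing import Tuple


def page_map_fields(page_number: str, description: str) -> Tuple[str, str]:
    """Map PIF (page number, description) to (page content, page description)."""
    if page_number != "unnumbered":
        # Numbered page: row 1 of Table 5.2.
        return page_number, description
    if "title page" in description:
        return "title page", "Title page"
    if "blank" in description:
        return "unnumbered", "unnumbered"
    # Any other description marks supporting material.
    return "supporting", description
```

Applied to the sample PIF shown earlier, images 1 and 2 ("unnumbered", "doccontrol") map to supporting pages, and image 3 ("unnumbered", "unnumbered;title page") maps to the title page.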
http://server.name.here/Server/TR/technical-report-ID/Body?type=data/type
The values in italics are replaced by the values as explained in Table 5.1 on page 54.
Table 5.1: Values for Dienst URL to access a body
--------------------------------------------------------------------------------------
Value name            Meaning
--------------------------------------------------------------------------------------
server.name.here      The name of the Dienst server that stores the document
technical-report-ID   The CSTR ID for a technical report that is stored on the server
data/type             The format in which the body is requested
--------------------------------------------------------------------------------------
The portion of the URL that is identical for all body parts, and can therefore be placed in the URL-stem header, is everything up to the question mark. A typical URL-stem for an MIT TR would look like this:
URL-stem: http://cstr-http.lcs.mit.edu/Server/TR/AI-TM-1066/Body
Next, the content-type header is filled in. The current version of the program only allows a value of "application/postscript". Finally, a single Body: line is created with a VSN of 1, a URL value of "?type=application/postscript", and a Page Description field of "Postscript document". Since the URL-stem line uniquely identifies the TR, this line is the same for all TRs and looks like this:
Body: 1 ?type=application/postscript Postscript document
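
As a rough illustration, the body-list lines described above could be produced as follows. This Python sketch makes several assumptions: the function name is hypothetical, the content-type header is spelled "Content-type:", and fields are separated by single spaces as in the example line, which may not match the exact layout of a real body-list part.

```python
from typing import List


def body_list_lines(server: str, tr_id: str) -> List[str]:
    """Build the header and entry lines of a body-list part for one TR."""
    # Everything up to the question mark is shared and goes in the URL-stem.
    url_stem = f"http://{server}/Server/TR/{tr_id}/Body"
    return [
        f"URL-stem: {url_stem}",
        # Only Postscript bodies are supported by the program described here.
        "Content-type: application/postscript",
        # VSN 1, URL remainder, page description: identical for every TR.
        "Body: 1 ?type=application/postscript Postscript document",
    ]
```

Calling body_list_lines("cstr-http.lcs.mit.edu", "AI-TM-1066") reproduces the URL-stem and Body: lines shown above; only the first line changes from one TR to the next.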
Since digiments are created on the fly, the process does not start until a digiment has been requested. This is accomplished by requesting the body of a document from the server with a type of multipart/digiment, the base MIME type for a digiment. Thus, a typical URL requesting a digiment might look like this:
http://cstr-http.lcs.mit.edu/Server/TR/MIT-AILab:AIM-1066/Body?type=multipart/digiment
When the server is processing the request, it notices that the type requested is multipart/digiment. At this point, the server calls the program create-digiment, with the ID of the TR as an argument.
The create-digiment program first resolves the ID into the directory path where the document is stored locally. The program then checks the directory to make sure that it is actually a valid directory which contains a document. If not, it returns an error. If the directory does contain a document, the program continues by generating a bibliography part. This part is then put into a list of application/digiment parts.
The create-digiment program then looks at each of the subdirectories of the document directory to see if it contains a PIF. If a PIF exists for that subdirectory, the program inserts it into a list of PIFs. Then, the program calls pif2digiment for each PIF that was found. Each call returns a page-list and a page-map, both generated from that PIF. Each of these parts is put into the list of application/digiment parts.
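
The PIF discovery step can be sketched in a few lines of Python. This is an illustration, not the create-digiment source: the function name is hypothetical, and the file name "PIF" is an assumption, since the text does not say what the PIF is called on disk.

```python
from pathlib import Path
from typing import List


def find_pifs(document_dir: Path, pif_name: str = "PIF") -> List[Path]:
    """Return the PIF of every format subdirectory that has one."""
    pifs = []
    # Sorted only to make the result deterministic.
    for entry in sorted(document_dir.iterdir()):
        candidate = entry / pif_name
        # Plain files in the document directory and subdirectories
        # without a PIF are skipped.
        if entry.is_dir() and candidate.is_file():
            pifs.append(candidate)
    return pifs
```

Each path returned would then be handed to pif2digiment, and the resulting page-list and page-map appended to the list of application/digiment parts.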
Next, the program checks to see if a Postscript file exists in the document directory. If it does, the program generates a body-list part for the Postscript.
Finally, the program generates a part-list from all of the other application/digiment parts that have been created. It then creates a MIME header for the multipart/digiment document and returns this followed by each of the application/digiment parts, separated by the appropriate encapsulation boundaries. The server takes this return value, which is a legal multipart/digiment, and returns it to the client. The actual response time varies depending on the number of pages and formats of the document and the load of the server, but is typically on the order of a few seconds.
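
The final assembly step follows the general MIME multipart pattern, which can be sketched as below. The sketch assumes that each part is already a complete string (its own headers, a blank line, and its content), that parts are emitted in the order given, and that the boundary string does not occur inside any part. The function name and the example boundary are hypothetical.

```python
from typing import List


def assemble_multipart(parts: List[str], boundary: str) -> str:
    """Wrap application/digiment parts in a multipart/digiment envelope."""
    lines = [
        "MIME-Version: 1.0",
        f'Content-Type: multipart/digiment; boundary="{boundary}"',
        "",  # blank line ends the top-level header
    ]
    for part in parts:
        # Each part is preceded by an encapsulation boundary.
        lines.append(f"--{boundary}")
        lines.append(part)
    # The closing boundary carries two trailing hyphens.
    lines.append(f"--{boundary}--")
    return "\r\n".join(lines) + "\r\n"
```

With two parts and the boundary "digiment-boundary", the output contains two opening boundary lines and one closing boundary line, which is what a MIME-aware client needs in order to split the response back into its parts.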
The entire process is summarized briefly below: