By defining a digital document and creating a standard for its transmission, it is possible to solve these problems. By allowing the content to be in any format, but having the digital document describe the format, it is possible to use any format, present or future, for the content itself. By creating a standard way to describe the document's meta-information and content format, it is possible to transmit the document without having to worry about what format it is stored in. Finally, documents can be stored and delivered in multiple formats, making it possible to view both the scanned images and ASCII text of a document.
What exactly do we mean by a document? A document can be a book. It can be a magazine or periodical. It can be a bibliography of a technical field. A technical report is a document. Instruction manuals, TV listings, newspapers, rulebooks, and catalogs can all be considered documents. What do we mean by a digital document, or a document stored on a computer? A digital document can be composed of a collection of images in GIF, JPEG, TIFF, or another format. It can be a PostScript, LaTeX, or ASCII text file or set of files. It can be a set of images, the ASCII text, and a text-to-image map that relates the two. It can include sound. In short, there is no easy definition for a digital document. Nor do digital documents correspond well to what we call documents in the physical world.
For these reasons, we need to create a new type of object, called a digiment, for digital document. Defining a digiment will allow us to achieve some of the goals expressed above. It will provide a standard way to pass around digital documents, regardless of the format in which the content itself is stored. It will provide a standard for archiving and retrieving digital documents. It will provide for the use of multiple representations of a single "document" that can be linked together, as with a text-to-image map. It is intended to accommodate the use of future data formats transparently. And it will provide meta-information about a document, such as author, publisher, and copyrights. For comparison, in a book, the text on the pages of the book is the content, while the meta-information, such as the ISBN, copyright, and publisher, is found on the inside of the cover page.
The digiment standard is not a specific data format for storing or transmitting the content of digital documents. Rather, it is a container for transmitting or storing documents in arbitrary formats. A digiment consists of the data itself, which can be in any form, along with some associated meta-information and structural information. This additional information is what differentiates a digiment from a simple set of images or data files. The structural information specifies the format that the data itself is stored in, whether it is image, text, PostScript, etc. It also specifies how the different data parts of the digiment are related to each other. The meta-information includes bibliographic information about the digiment, such as the author, publisher, copyrights, distribution rights, and relationships between different formats. By specifying the digiment in this way, it is possible to use any type of format for storing the actual data, including formats which may be created in the future.
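As an illustration of this container idea, the following is a minimal sketch of a digiment as a data structure: the content parts, the structural information that names each part's format and relates the parts, and the bibliographic meta-information. All class and field names (`Part`, `Relation`, `Digiment`, `part_id`, and so on) and the sample values are assumptions made for this example; they are not taken from the digiment specification.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One representation of the content, in an arbitrary format."""
    part_id: str       # hypothetical label used to refer to this part
    content_type: str  # structural information: the format of the data
    data: bytes        # the content itself, opaque to the container


@dataclass
class Relation:
    """Structural information linking two parts, e.g. a text-to-image map."""
    kind: str
    source: str  # part_id of the first part
    target: str  # part_id of the second part


@dataclass
class Digiment:
    metadata: dict[str, str]  # bibliographic meta-information
    parts: list[Part] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def formats(self) -> list[str]:
        """List the formats in which the content is available."""
        return [part.content_type for part in self.parts]


# Placeholder values only; a real digiment would carry actual content.
report = Digiment(
    metadata={"author": "A. Author", "publisher": "Example Press"},
    parts=[
        Part("images", "image/tiff", b"<scanned pages>"),
        Part("text", "text/plain", b"<ASCII text>"),
    ],
    relations=[Relation("text-to-image-map", "text", "images")],
)
print(report.formats())  # ['image/tiff', 'text/plain']
```

Because the container treats `data` as opaque bytes and records the format only as a string, a format invented later needs no change to the container itself, which is the property the paragraph above relies on.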
By creating a digiment MIME type, we can separate the concept of a digiment from the actual representation that the data uses. The digiment MIME type contains a description of the format that the actual data uses, other formats that are available, how the formats are linked, and all the other meta-information that is contained in the digiment.
MIME types consist of a type and subtype pair, in the form "type/subtype". The type specifies the main class that the MIME object belongs to. The subtype specifies which specific type the object is within that class. The MIME standard already has a class that describes objects with multiple parts: "multipart". By defining a subtype "digiment", we are able to define a new MIME type, "multipart/digiment", that describes an object with multiple objects inside. We also define a new type that corresponds to one specific part of a digiment: "application/digiment".
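The type/subtype structure can be illustrated with Python's standard `email` package, which accepts arbitrary subtypes. This is only a sketch of how the two type names nest: the helper `split_mime_type` and the placeholder payload are assumptions for the example, and the body of a real "application/digiment" part is whatever the formal definition of the type prescribes.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two components (hypothetical helper)."""
    main_type, _, subtype = mime_type.partition("/")
    if not main_type or not subtype:
        raise ValueError(f"not a type/subtype pair: {mime_type!r}")
    return main_type.lower(), subtype.lower()


# A multipart container whose subtype is "digiment" ...
container = MIMEMultipart("digiment")
# ... holding one part of type application/digiment (placeholder payload).
container.attach(MIMEApplication(b"placeholder part body", "digiment"))

print(container.get_content_type())                    # multipart/digiment
print(container.get_payload()[0].get_content_type())   # application/digiment
print(split_mime_type("multipart/digiment"))           # ('multipart', 'digiment')
```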
A given MIME type/subtype specifies the format of the data that is being transferred. By specifying a data transfer as "multipart/digiment", a client would know to launch a digiment-aware browser that could then interpret the digiment as a true document, instead of simply a set of pages of text.
A typical scenario for the use of a digiment browser is as a helper application for Mosaic. When Mosaic (or any other WWW browser) receives data with a digiment MIME type, it passes the data stream to the digiment browser, which can then display and manipulate the digiment. A good digiment browser should be optimized for display speed and ease of use, and support the full digiment MIME specification.
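The hand-off described here can be sketched as a lookup from MIME type to helper. This is a simplified illustration, not a description of how Mosaic is implemented: the `HELPERS` table, the function names, and the fallback behaviour are all assumptions made for the example.

```python
from typing import Callable


def open_in_digiment_browser(data: bytes) -> str:
    """Stand-in for launching the digiment browser on the data stream."""
    return f"digiment browser received {len(data)} bytes"


def display_inline(data: bytes) -> str:
    """Stand-in for the WWW browser handling the data itself."""
    return f"displayed {len(data)} bytes inline"


# Hypothetical helper table: MIME type -> helper application.
HELPERS: dict[str, Callable[[bytes], str]] = {
    "multipart/digiment": open_in_digiment_browser,
}


def dispatch(content_type: str, data: bytes) -> str:
    """Pass the data to the helper registered for its MIME type, if any."""
    # Drop parameters such as 'boundary=...' and normalise case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    handler = HELPERS.get(mime_type, display_inline)
    return handler(data)


print(dispatch('multipart/digiment; boundary="x"', b"12345"))
# digiment browser received 5 bytes
```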
The digiment browser described in this document uses a WWW browser as its front end. The digiment browser itself runs as a script on a WWW server and takes a pointer to a digiment as an argument. The script then generates HTML pages that can be displayed by any WWW browser, together with buttons for navigating around the digiment. The advantage of this approach is that it is platform independent: the digiment can be viewed from any platform that has a WWW browser available.
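The page-generation step might look roughly like the following sketch, which renders one page of a digiment as HTML with navigation links back to the same server-side script. The function name `render_page`, the query parameter names `digiment` and `page`, the use of plain links in place of buttons, and the example URLs are all assumptions made for illustration; the browser described in this work is not necessarily structured this way.

```python
from html import escape
from urllib.parse import urlencode


def render_page(script_url: str, digiment_url: str,
                page: int, page_count: int, body_html: str) -> str:
    """Return an HTML page for one page of a digiment, with navigation."""

    def nav_link(label: str, target: int) -> str:
        # Out-of-range targets are shown as plain text instead of a link.
        if not 1 <= target <= page_count:
            return label
        query = urlencode({"digiment": digiment_url, "page": target})
        href = escape(f"{script_url}?{query}")  # '&' becomes '&amp;' in HTML
        return f'<a href="{href}">{label}</a>'

    nav = " | ".join([nav_link("Previous", page - 1),
                      nav_link("Next", page + 1)])
    return (
        f"<html><head><title>Page {page} of {page_count}</title></head>"
        f"<body>{body_html}<hr>{nav}</body></html>"
    )


# Hypothetical script location and digiment pointer.
html_page = render_page("/cgi-bin/digiment", "http://example.org/tr/1",
                        page=1, page_count=3, body_html="<p>page one</p>")
print(html_page)
```

Because the output is ordinary HTML, any WWW browser can display it, which is the source of the platform independence noted above.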
1. The system is designed to be widely and cheaply distributed in a short time frame. The browser will be made available at the conclusion of the research. This system is intended to be a standard for digital document use.
2. The system will be extensible to future data types and meta-information. This document describes version 1.0 of the digiment MIME type, which will be the current definition of a digital document. However, the standard will be written in such a way that it will be possible to create newer versions of the MIME type that incorporate additional data and types, transparently to existing servers and clients.
The remainder of this work consists of seven chapters plus two appendices. Chapter Two describes related work in this field. Chapter Three contains a description of what a digiment is at a conceptual level and the decisions that went into defining its format. In Chapter Four I define both the multipart/digiment and the application/digiment MIME types and describe how to use them. Chapter Five describes the system that I have created to generate digiments on the fly from on-line Technical Reports as part of the Library 2000 project. Chapter Six discusses the design and implementation of the WWW-based digiment browser. Chapter Seven is a discussion of future directions for this work and possible extensions. Chapter Eight analyses the process of defining a new standard and of trying to define what a digital document is and should be. Finally, Appendix A contains a formal description of the application/digiment and multipart/digiment MIME types, while Appendix B contains a sample digiment.