Chapter 8

Insights and Lessons Learned

This chapter describes the insights gained by building a model digiment browsing system.

8.1 General Conclusions

The digiment differs from other digital document systems in two key areas: it distinguishes between the semantic content of a document and its representation, and it defines a compound object that contains structural elements specifically designed for documents. Although some other systems implement one or both of these ideas to some extent, they emphasize one over the other, and in none of them is either idea central to the system.

In addition, several other key concepts are valuable when building a digital document system. The idea of a document ID, or handle, which does not point to a specific representation of a document but instead references a collection of representations, is essential for dealing with a document that may come in multiple forms. The use of MIME types allows easy integration with existing standards-based systems, such as the World-Wide Web. Finally, the ability of a digiment to include content by reference instead of embedding it allows all the information needed to browse a digiment to be sent with very low overhead. The browser can then choose which content parts it wishes to transfer, at its convenience.
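The by-reference idea can be illustrated with a minimal Python sketch. It is not the format used by the system described in this thesis: the names `PartRef` and `LazyBrowser`, the field layout, and the locations are all assumptions made for illustration. The sketch shows a browser that receives only lightweight references, chooses a part by MIME type, and transfers content on first use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PartRef:
    """A content part included by reference (hypothetical layout)."""
    mime_type: str   # e.g. "image/gif"
    location: str    # where the bytes live; nothing is fetched until needed
    size_bytes: int  # advertised size, so the browser can decide cheaply


class LazyBrowser:
    """Fetches only the parts it decides to display (illustrative only)."""

    def __init__(self, fetch: Callable[[str], bytes], supported: List[str]):
        self._fetch = fetch
        self._supported = supported          # MIME types, most preferred first
        self._cache: Dict[str, bytes] = {}

    def choose(self, parts: List[PartRef]) -> PartRef:
        # Pick the most preferred MIME type that some part offers.
        for mime in self._supported:
            for part in parts:
                if part.mime_type == mime:
                    return part
        raise LookupError("no part in a supported MIME type")

    def load(self, part: PartRef) -> bytes:
        # Transfer the content only on first use; later calls hit the cache.
        if part.location not in self._cache:
            self._cache[part.location] = self._fetch(part.location)
        return self._cache[part.location]


# Stand-in for a network transfer; records what was actually requested.
requested: List[str] = []


def fake_fetch(location: str) -> bytes:
    requested.append(location)
    return b"data for " + location.encode()


parts = [
    PartRef("image/tiff", "store/page1.tiff", 900_000),
    PartRef("image/gif", "store/page1.gif", 60_000),
]
browser = LazyBrowser(fake_fetch, supported=["image/gif", "text/plain"])
chosen = browser.choose(parts)
content = browser.load(chosen)
```

In this sketch the TIFF part is never transferred: the browser sees that it exists, but only the GIF it chose is ever requested.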

Although the digiment of the future may be based on other standards, such as OpenDoc, the key concepts and ideas that went into this system will be valuable in creating a generic, expandable system for transferring and viewing digital documents.

8.2 The Distinction Between the Semantics of a Document and its Representation

One of the most important ideas that was examined is the distinction between the semantic content of a document and its representation. Most systems that deal with "documents" either explicitly or implicitly define the document in terms of its representation. This leads to the problem of treating a GIF image version of a document and a TIFF image version as two separate documents. This thesis instead explores the idea that the true definition of a document should not be linked to a specific representation, but is rather the abstract information that the document conveys. This idea does not apply only to different image formats, or to PostScript versus text; it can be expanded to include any notion of "sameness". Thus, both English and French language versions of a particular text may be considered the same "document" in terms of the information that they are trying to convey.

Once we have separated the semantics of a document from its representations, we can talk about a document without having to choose a specific representation. However, to manipulate these new types of documents on-line, we need to define a new type of digital object. This new object is a digiment. The digiment contains all of the semantic information about a document, such as bibliographic information, as well as all of the representations of the document that are available. By referring to digiments, people can pass around the abstract notion of a document without having to be concerned with the representation, which is irrelevant to the information that the document is presenting.
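As a rough illustration of this separation, the following Python sketch models a digiment as one handle plus bibliographic information and a list of representations. The class names, fields, handle string, and locations are hypothetical and are not taken from the implementation described in this thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Representation:
    mime_type: str  # e.g. "image/tiff"
    language: str   # e.g. "en" or "fr"
    location: str   # where this rendering is stored


@dataclass
class Digiment:
    handle: str                    # names the document, not one rendering
    bibliographic: Dict[str, str]  # semantic information about the document
    representations: List[Representation] = field(default_factory=list)

    def pick(self, mime_types: List[str],
             language: Optional[str] = None) -> Optional[Representation]:
        """Return the first representation matching the caller's preferences."""
        for mime in mime_types:
            for rep in self.representations:
                if rep.mime_type == mime and language in (None, rep.language):
                    return rep
        return None


# One document, several renderings; all values here are made up.
report = Digiment(
    handle="example/report-1",
    bibliographic={"title": "An Example Report", "author": "A. Author"},
    representations=[
        Representation("image/gif", "en", "store/report-1.gif"),
        Representation("image/tiff", "en", "store/report-1.tiff"),
        Representation("text/plain", "fr", "store/report-1.fr.txt"),
    ],
)
english_image = report.pick(["image/tiff", "image/gif"], language="en")
french_text = report.pick(["text/plain"], language="fr")
```

The GIF, TIFF, and French text versions all hang off the same handle, so a reference to `example/report-1` stays valid regardless of which rendering a given reader ends up viewing.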

8.3 A Compound Digital Object Specifically Designed for Electronic Distribution and Browsing

The other important idea that has come out of this work is the usefulness of a compound digital object that is specifically designed for electronic distribution and browsing. Systems such as OpenDoc and OLE allow the creation of compound objects that contain arbitrary data types. An OpenDoc program can then work with these objects by manipulating the data types that it recognizes, while letting other OpenDoc programs manipulate the types that it doesn't. The digiment improves on this concept by defining an object that is structured in such a way that it preserves the types of relationships among parts that are necessary for browsing a document. In fact, digiments could conceivably be stored as OpenDoc documents with the appropriate additional information stored along with the different representations.

A true document does not just consist of arbitrary data types which can be viewed in any order. Documents have notions such as "next" and "previous", which determine the order in which information should be presented. In addition, documents have the notion of "pages" and "sections". The digiment allows its contents to be structured into pages and sections, and preserves the "next" and "previous" relationships between those pages and sections. In addition, the digiment extends this structure to alternate versions of the same data. For example, page 12 may have greyscale and color images, while page 13 may have greyscale and color images as well as a PostScript version of that page.
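The structure described above might be sketched in Python as follows. This is only one possible arrangement, with hypothetical class names, labels, and file names; the thesis does not prescribe this layout. It shows pages grouped into sections, "next" and "previous" links that follow reading order across section boundaries, and per-page alternate versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    number: int
    # Alternate versions of the same page: label -> location (made-up labels).
    alternates: Dict[str, str] = field(default_factory=dict)
    # Links are excluded from repr/compare to avoid following the chain.
    prev: Optional["Page"] = field(default=None, repr=False, compare=False)
    next: Optional["Page"] = field(default=None, repr=False, compare=False)


@dataclass
class Section:
    title: str
    pages: List[Page] = field(default_factory=list)


def link_pages(sections: List[Section]) -> None:
    """Set prev/next on every page in reading order, across sections."""
    previous: Optional[Page] = None
    for section in sections:
        for page in section.pages:
            page.prev = previous
            if previous is not None:
                previous.next = page
            previous = page


page12 = Page(12, {"greyscale": "p12-grey.gif", "color": "p12-color.gif"})
page13 = Page(13, {"greyscale": "p13-grey.gif", "color": "p13-color.gif",
                   "postscript": "p13.ps"})
sections = [Section("Results", [page12]), Section("Discussion", [page13])]
link_pages(sections)
```

A viewer built on such a structure could move from page 12 to page 13 through `next` without knowing that a section boundary lies between them, and could offer the PostScript alternate only on the page that has one.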

Furthermore, documents contain relationships between objects and sub-objects. Page 12 may have a figure embedded in it, which is also available in a separate, expanded image. A Table of Figures part could list all the figures available.

By defining structural elements that are specifically designed for documents, programs can edit and view digiments while preserving the actual structure of the document.
