I also prepared a discussion of text-image maps with a slightly different emphasis for the Workshop on HTML+ at the First International Conference on the World-Wide Web. The position paper is available online.
An electronic library may store two different representations of documents -- the ASCII text and the images of the pages. Each contains valuable information. The ASCII text, to a large extent, represents the primary information content of the document, the words themselves; the text can be stored cheaply and indexed easily. People, however, are used to reading _pages_ of text, which have a particular physical format that itself conveys information; documents can also contain information not representable as text, like photographs and diagrams. While ASCII text may be the computer's preferred way of dealing with documents, people prefer the physically familiar page images.
If text and images are each important in some situations, a means of coordinating the information contained in each is equally important. The text-image map provides that coordination: it lets a user view images of pages and perform operations on them, while the computer translates a click on a particular spot on the page into a click on the corresponding word of the ASCII text.
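The click-to-word translation can be illustrated with a minimal sketch. It assumes the map is stored as one bounding box per word, in page-image pixel coordinates, together with the word's character offset in the ASCII text. The names `WordBox` and `word_at` and the sample coordinates are hypothetical, chosen only for illustration; they do not describe the prototype's actual storage format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WordBox:
    """One word of the ASCII text and where it appears on the page image.

    Hypothetical record layout: the real storage format is not specified here.
    """
    text_offset: int  # character offset of the word in the ASCII text
    word: str
    left: int         # bounding box in page-image pixels
    top: int
    right: int        # right and bottom edges are exclusive
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def word_at(boxes: Sequence[WordBox], x: int, y: int) -> Optional[WordBox]:
    """Return the word whose box contains the click, or None for white space."""
    for box in boxes:
        if box.contains(x, y):
            return box
    return None


# Made-up boxes for the first three words of "An electronic library".
page = [
    WordBox(0, "An", 100, 50, 130, 70),
    WordBox(3, "electronic", 140, 50, 260, 70),
    WordBox(14, "library", 270, 50, 340, 70),
]

hit = word_at(page, 150, 60)
if hit is not None:
    print(hit.word, hit.text_offset)
```

A linear scan is enough for a single page; a library with many pages would probably want an index keyed by page and line, but that choice belongs to the storage-format question rather than to this sketch.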
The first prototype of a text-image map exposed several issues that I will explore during the remainder of this semester. The issues seem to fall into three general categories: preparing the data, generating a storage format, and defining an interface.
For the most part, existing documents that will be added to an electronic library will have to be scanned in and the scanned images used to produce the text; most documents are old enough that electronic text never existed or is prohibitively difficult to find or retrieve. Even when a document exists as electronic text with the formatting codes necessary to produce an image, it is often difficult to capture the needed position information.
A small part of my research will involve learning how to use existing optical character recognition (OCR) software to turn scanned pages into text and to capture position information about the words of the text.