Text-Image Maps:
Coordinating Scanned Images with Raw Text

Jeremy Hylton, jerhy@lcs.mit.edu
Last Update: 2 Feb 1995

During IAP '94 I developed a rough prototype of a system that coordinated bitmapped images of technical report pages with an ASCII file containing the text of those pages; the prototype used what I call a "text-image map." In the future I hope to flesh out that prototype and develop a good representation for the text-image maps.

I also prepared a discussion of text-image maps with a slightly different emphasis for the Workshop on HTML+ at the First International Conference on the World-Wide Web. The position paper is available online.

An electronic library may store two different representations of documents -- the ASCII text and the images of the pages. Each contains valuable information. The ASCII text, to a large extent, represents the primary information content of the document, the words themselves; the text can be stored cheaply and indexed easily. The page images, on the other hand, preserve things the text cannot: people are used to reading _pages_ of text, whose physical format itself conveys information, and documents can contain material not representable as text at all, like photographs and diagrams. While ASCII text may be the computer's preferred way of dealing with documents, people prefer the physically familiar page images.

If text and images are each important in some situations, a means of coordinating the information contained in each is equally important. The text-image map provides that coordination: it lets a user view images of pages and perform operations on them, while letting the computer turn a click on a particular spot on a page into a reference to a particular word of the ASCII text.

The first prototype of a text-image map exposed several issues that I will explore during the remainder of this semester. The issues fall into three general categories: preparing the data, generating a storage format, and defining an interface.

I. Preparing the data

To create a text-image map, we need both text and image, but getting a document in both forms is somewhat difficult, and generating the information that links the two is harder. Specifically, to build a text-image map requires knowing the exact location on the page of each word.

For the most part, existing documents that will be added to an electronic library will have to be scanned in, and the scanned images used to produce the text; most documents are old enough that electronic text never existed or is prohibitively difficult to find or retrieve. Even when a document exists as electronic text with the formatting codes necessary to produce an image, it is often difficult to capture the needed position information.

A small part of my research will involve learning how to use existing optical character recognition (OCR) software to turn scanned pages into text and to capture position information about the words of the text.
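To make this step concrete, the sketch below (in Python, purely for illustration) reads word positions into simple records. It assumes a hypothetical OCR output format -- one word per line, in reading order, giving the word, its page number, and a pixel bounding box -- since the actual OCR software and its output format have yet to be chosen.

    # Hypothetical OCR output: one word per line, in reading order, as
    #   word page left top right bottom
    from collections import namedtuple

    WordBox = namedtuple("WordBox",
                         ["word", "page", "left", "top", "right", "bottom"])

    def read_ocr_output(path):
        """Turn one word-position file into a list of WordBox records."""
        words = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) != 6:
                    continue  # skip blank or malformed lines
                word, page, left, top, right, bottom = fields
                words.append(WordBox(word, int(page), int(left), int(top),
                                     int(right), int(bottom)))
        return words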

II. Generating a storage format

Once we know all the words and their positions on the pages, we need to store that information in a way that makes it easy to (1) turn a reference to page location into a word and (2) turn a reference to a word into its location on the page. The storage format should allow both transformations to take place quickly and should take as little storage space as possible. It may be the case that the text-image map contains enough information about the text that there is no need to store the text separately from the map.
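As an illustration of what such a format must support, here is a minimal in-memory sketch (again in Python) built on the WordBox records above; the names and layout are assumptions, not a settled design. Because the words are kept in reading order, the ASCII text can be reconstructed from the map itself, which is why a separate text file may be unnecessary.

    class TextImageMap:
        """Minimal sketch of a text-image map over WordBox records
        kept in reading order."""

        def __init__(self, words):
            self.words = list(words)      # word index -> WordBox
            self.by_page = {}             # page number -> [(index, WordBox), ...]
            for i, w in enumerate(self.words):
                self.by_page.setdefault(w.page, []).append((i, w))

        def location_of(self, index):
            """Transformation (2): word index -> (page, bounding box)."""
            w = self.words[index]
            return w.page, (w.left, w.top, w.right, w.bottom)

        def word_at(self, page, x, y):
            """Transformation (1): a point on a page image -> word index, or None."""
            for i, w in self.by_page.get(page, []):
                if w.left <= x <= w.right and w.top <= y <= w.bottom:
                    return i
            return None

        def text(self):
            """Reconstruct the ASCII text from the map alone."""
            return " ".join(w.word for w in self.words)

A real storage format would have to do better than this sketch: the per-page linear scan would give way to some kind of spatial index, and the coordinates would need a compact on-disk encoding.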

III. Defining an interface

There are really two problems here. (1) There needs to be a page image viewer that allows users to exploit the text-image map. At a minimum, users should be able to highlight several words on a page image and use a cut-and-paste feature to paste the highlighted words into a text editor. Unfortunately, there are few (if any) applications that allow users to highlight regions of images in this way, so some work must eventually go into developing a user application that exploits text-image maps. (2) There must be a common set of commands that programmers use to perform transformations using text-image maps. Some thought must be put into a standard collection of commands that defines the functional appearance of a text-image map.
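To suggest what part of that command set might look like, here is a small sketch of the two operations a viewer would need for the highlight-and-paste feature, built on the hypothetical TextImageMap above; the names are illustrative only.

    def words_in_region(tmap, page, left, top, right, bottom):
        """Indices of words whose boxes fall entirely inside a
        highlighted rectangle on one page image."""
        return [i for i, w in tmap.by_page.get(page, [])
                if w.left >= left and w.right <= right
                and w.top >= top and w.bottom <= bottom]

    def paste_text(tmap, page, left, top, right, bottom):
        """ASCII text for a cut-and-paste of the highlighted region."""
        return " ".join(tmap.words[i].word
                        for i in words_in_region(tmap, page, left, top,
                                                 right, bottom))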
