This is the trip report. I'm sorry it took so long to finish. I took a three-week vacation starting at the end of June and that got me sidetracked. I talk about a few different things in the report, in this order:

  1. A report on the introductory talks
  2. Several short reports on interesting talks and on the workshops I attended
  3. A mention of some of the more interesting papers (and people) I came across, which didn't really fit in part 2

Jeremy Hylton
Library 2000

    Introductory Talks

    David Chaum, CWI and DigiCash
    Effective rules in cyberspace
    Today, people have one contact for each service. They have separate accounts with the bank, the insurance company, the cable company, etc. He sees two possible futures for services in cyberspace. The first, which is advocated by people like Barry Diller, is that a select group of services (cable companies, phone companies, etc.) will offer "head-end" services; they will act as intermediaries between users and all the services they want to use. The second option is an open network approach where users are free to communicate directly with the service providers.

    So Chaum asked whether, in the future, we will really have an open network or whether it will be controlled by a single point on the value chain. The answer, he said, depends on the infrastructure we build now. He suggested modeling our infrastructure on the solutions we have found in human societies, rather than in commercial markets. His "citizen model" implies, among other things, the existence of private communication and the ability to vote and take polls.

    The citizen model relies on a trusted 3rd party, and public key cryptography allows the simulation of the trusted party without the actual existence of the party. However, he suggested that a hierarchical model based on a single digital signature for each person was bad. For example, billing using credit cards encrypted over the network is an example of the head-end approach, not an alternative to it.

    Chaum offered a couple of criticisms of digital signatures:
    - Spoofing. People can keep messages and replay them.
    - Signatures are permanent. We sign some bits to authenticate them, but then we lose control over them. Do you like the idea of someone appearing 10 or 20 years later with a message you signed, particularly when that person can prove to anyone else that you signed it?

    Some alternatives:
    - Undeniable signatures (I forget exactly what this means; I believe it is a technical term in the cryptography field).
    - Anonymous payments / blind signatures.
    The punchline is really that cryptography can be used to create digital pseudonyms and credentials. A person can arrange to create credentials that will verify him only to the person or persons he specifies. Chaum didn't explain any of the technical details, only what we could do assuming this is possible.
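
    Since Chaum gave no technical details, the following is only my own sketch of the textbook RSA blind signature usually attributed to him, not anything presented at the talk; the toy key, message, and variable names are made up for illustration.

        # Minimal RSA blind-signature sketch (illustration only, not from the talk).
        # The signer never learns the message it signs, yet the requester ends up
        # with an ordinary RSA signature on that message.

        import random
        from math import gcd

        # Toy RSA key (far too small for real use).
        p, q = 61, 53
        n = p * q
        e = 17
        d = pow(e, -1, (p - 1) * (q - 1))   # signer's private exponent

        m = 42                              # the message (already hashed in practice)

        # Requester blinds the message with a random factor r.
        while True:
            r = random.randrange(2, n)
            if gcd(r, n) == 1:
                break
        blinded = (m * pow(r, e, n)) % n

        # Signer signs the blinded value without seeing m.
        blind_sig = pow(blinded, d, n)

        # Requester unblinds to obtain a valid signature on m.
        sig = (blind_sig * pow(r, -1, n)) % n

        assert sig == pow(m, d, n)          # identical to an ordinary RSA signature
        assert pow(sig, e, n) == m          # anyone can verify with the public key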


    Joseph Hardin
    NCSA Software Development
    Hardin's talk was pretty vacuous. It sounded like he used a stock "about NCSA" talk. He did mention a couple areas where the Mosaic team was going to be working:
    - an API for IPC development, so that Mosaic can communicate with other applications, Acrobat for example
    - a citation system (URC?) for the Web


    Tim Berners-Lee

    The talks got started late, and Chaum and Hardin used more time than was allotted to them, so Tim's remarks were cut short. He discussed some areas for future development:

    1. Semantic objects. Right now the Web is just a bunch of documents, without any semantics. We need semantics, because "semantics allow machines to manipulate reality." Semantic objects could allow:
    - logical arguments (link implies agreement, disagreement)
    - dependencies (project control)
    - physical position
    - structure of organizations

    Some of the difficulties of semantics:
    - keeping it person-friendly, i.e. right now people can write HTML or write a URL on the back of an envelope. Can the Web stay person-friendly in this way?
    - defining a general model for these objects
    - will it be stable?

    2. A constitution for cyberspace.
    - Example issue: GET and PUT should be idempotent. Only a POST should commit a user to anything. (A small sketch of this distinction appears after this list.)

    3. Persistent names.

    4. Real-time extensions to support video, IRC, MUDs, etc.

    (2, 3, & 4 were areas he would have discussed if he had more time. Instead he just mentioned them as areas of interest.)

    5. Dissemination versus collaboration, interaction. The Web was originally envisaged as a way for users to share and annotate information. It has become a means of disseminating information instead. Currently data tends to be controlled by a system administrator and any user interaction is mediated by the sys admin; this system limits the amount of interaction between users and overworks the sys admin.
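
    As a small illustration of the idempotency point in item 2, here is my own sketch with a made-up in-memory "server"; nothing like this was shown in the talk. Repeating a PUT leaves the resource unchanged, while repeating a POST commits the user again each time.

        # Idempotency sketch: PUT can be repeated safely; POST cannot.

        resources = {}          # the server's documents
        orders = []             # things a user has committed to

        def put(path, body):
            """Idempotent: doing this twice is the same as doing it once."""
            resources[path] = body

        def post(path, body):
            """Not idempotent: each request creates a new commitment."""
            orders.append((path, body))

        put("/doc", "version 1")
        put("/doc", "version 1")          # harmless repeat; /doc is unchanged
        post("/orders", "buy one book")
        post("/orders", "buy one book")   # the user has now bought two books

        assert resources["/doc"] == "version 1"
        assert len(orders) == 2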

    Finally, Tim offered an unofficial announcement of the WWW consortium that will be run out of CERN and MIT. He was vague about specifics, said it would be a lot like the X consortium, and told corporate attendees to talk to him immediately if they wanted to participate; he said a formal announcement and the actual creation of the consortium would probably be in July.


    Lunch discussion with Dan Connolly (Hal) and Martijn Koster (Nexor)

    Connolly had two points to make:

    1. Authors should be able to specify some end-to-end check that lets a user verify that he got the bits the author wanted him to get. Specifically, the links in a document should contain enough information for this kind of check, so that an author can sign a document and not worry about linked documents changing.

    He suggested a URL with a check field embedded in it, where the check was a Content-Length or a checksum or something like that. When a client program followed a link, it could use the check information to make sure that it got the same bits that the author linked to.

    2. Authors should be able to specify several alternatives within a single link. For example, the author ranks URLs 1, 2, & 3 as acceptable instances of the document he wants to link to. The client program tries URL #1; if it is unavailable or has moved, or if the client doesn't understand the representation, then it tries link #2, then #3.
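
    A minimal sketch of how a client might combine both ideas, assuming a hypothetical link that carries an MD5 checksum and a ranked list of alternative URLs (the format, URLs, and function names here are mine, not anything proposed at lunch):

        # Sketch: follow a link that carries a checksum and ranked alternative URLs.

        import hashlib
        import urllib.request

        def fetch_checked(urls, md5sum=None):
            """Try each URL in order; return the first body that is retrievable
            and, if a checksum was given, matches the bits the author linked to."""
            for url in urls:
                try:
                    with urllib.request.urlopen(url) as resp:
                        body = resp.read()
                except OSError:
                    continue        # unreachable or moved: try the next alternative
                if md5sum and hashlib.md5(body).hexdigest() != md5sum:
                    continue        # the document changed since the author made the link
                return body
            raise RuntimeError("no acceptable instance of the document was found")

        # Hypothetical link with three ranked alternatives and a checksum:
        # body = fetch_checked(
        #     ["http://host1/doc.html", "http://host2/doc.html", "http://mirror/doc.html"],
        #     md5sum="9e107d9d372bb6826bd81d3542a419d6",
        # )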

    I think Tim offered a different model for checking that the right document was retrieved: Some authentication takes place between client and server. Once the two are authenticated to each other, the client can trust that the server would always return the same document. (I assume that this doesn't necessarily mean a URL always returns the same bits, but that there is some guarantee that a URL will always resolve to the "same" thing, taking into account dynamic data and evolving documents.)

    I suggested that Connolly really needed a better namespace than URLs. With a persistent naming system, the naming authority could guarantee that (1) a name always resolves to the same thing and (2) that a client has some alternatives for locating a document. Connolly was pretty strongly opposed to using anything other than URLs in links, because URLs were so easy and inexpensive to use and something like URNs would require an extra step to resolve, which would be far too costly.


    Wednesday Workshops:

    (1) WWW interoperability and (2) the role of SGML

    The first workshop was pretty much a loss. Connolly wanted to get a group of vendors together to specify in detail what HTML and HTTP should look like for version 1.0. The few vendors there agreed that having a more specific standard than Tim's drafts was important for successful commercial ventures. A great deal of the discussion centered on the importance of building fully SGML-compliant parsers and detecting and correcting improperly written HTML.

    There was some overlap in the discussion between the two workshops. SGML and the document type definitions (DTDs) seem very important to a lot of people. (I don't really know much about SGML; the understanding I've developed is that SGML describes a syntax for adding semantic notation to text but doesn't actually say anything about the semantics.) The second workshop included a number of publishers, including Elsevier, who were very interested in developing SGML editors and parsers.

    (Later, while I was trying to figure out why SGML was so important to people, Tim mentioned that he would be just as happy if the Web dispensed with SGML entirely.)

    During the discussion, I mentioned some of the difficulties that using text and images presents. Someone from Oxford (Robinson, I believe) mentioned the Text Encoding Initiative (TEI). He said he had early Chaucer manuscripts that were recorded as images and also transcribed in electronic form; the transcription included formatting that recorded the positions of the words in the image. The TEI sounds a bit over-specified: there is a 1,300-page document that outlines the design of an SGML spec for TEI and describes the tags and their use. But it sounds like I should take a look at the work done there.

    I suggested that within a library it seemed that there were many different kinds of links that should be generated on-the-fly, so that a user outlines an area and then says, look this up in the catalog, or give me a dictionary definition of this word, or treat this as a page number. A possible solution to this was worked on at the Microcosm project at the Univ. of Southampton. Effectively, Microcosm maintains separate data and link files (a lot like having data and resource forks in Mac files). Any number of links can be specified for a particular section of the source data file, and the interface allows users to select the link to follow.
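
    To make the idea concrete, here is a rough sketch of keeping links in a separate file keyed to spans of the source data; this is only my reading of the approach, not Microcosm's actual format, and the link targets and names are made up.

        # Sketch: the document is stored untouched; a separate "link file" maps
        # character spans of the source to any number of typed links.

        document = "Whan that Aprille with his shoures soote ..."

        linkfile = {
            (10, 17): [                      # the span covering "Aprille"
                {"type": "dictionary", "target": "http://server/dict?word=aprille"},
                {"type": "catalog",    "target": "http://server/catalog?entry=chaucer"},
            ],
        }

        def links_for(offset):
            """Return every link whose span covers the given character offset."""
            return [link
                    for (start, end), links in linkfile.items()
                    if start <= offset < end
                    for link in links]

        # The interface can list the matching links and let the user choose:
        for link in links_for(12):
            print(link["type"], "->", link["target"])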


    Wednesday afternoon, hackers' roundtable

    After regular sessions ended on Wednesday, Tim gathered a bunch of the implementors and hackers for some drinks and discussion. I didn't really have the technical know-how and experience that most of these people did, but I tagged along anyway. There was a lot of interesting discussion; below is a brief outline of some of the issues raised:

    - Proxy servers. Many clients ask a proxy server to retrieve a URL for them. A proxy server provides a clean mechanism for communication through a firewall, because clients don't need to know about the firewall; the use of something like SOCKS is limited to the proxy server. Because all URL resolution goes through the proxy server, it also provides caching of URLs. There are a couple of good papers in the proceedings about these servers, [Glass] and [Luoto].

    - Detail: full URLs. A problem that was noticed when people started using proxy servers is that the HTTP request a server sees is missing a little information; when a user issues a request for http://ltt-www.lcs.mit.edu/ltt-www/foobar/, the client strips off the host, makes a connection to it, and issues the request "GET /ltt-www/foobar/". For proxy servers to work, clients currently need to be changed a bit so that they send a full URL to the proxy server. People seemed to conclude that future specs should require the HTTP request to look like "GET ltt-www.lcs.mit.edu /ltt-www/foobar". This way clients won't need to be modified to work with proxy servers, and it is easier to run several servers from one machine. Instead of requiring the extra service identifier "/ltt-www/", the machine running several servers can use the host name to determine the proper service. So a "GET /" can resolve to different files depending on whether the host is ltt-www or rr3 or whatever. (A small sketch of the two request styles follows this list.)

    - A lot of people would like to see a WWW tool that combines mail reading and news reading. The tool should provide an index of mail and news, a good interface for reading them, a way of following links between mail and news, and an HTML editor for generating messages.

    - Drag-and-drop interfaces. There is a lot of interest in browsers that have this kind of interface. For example, an ltt user could outline an area of an image and drag it onto the "Lookup in catalog" icon or the "Page number" icon, as needed. In the news/mail browser, you could compose a message and then drop it onto icons for newsgroups to post it to or addresses to mail it to.

    - Real-time extensions. Tim and Dave Raggett [Ragge] are interested in using the Web as a transport means for virtual reality and MUDs. Phil Hallam-Baker [Halle] also sees this as the future of the Web. One of Hallam-Baker's ideas for a commercial service is to set up a text-based MUD that has optional sound and graphics extensions; to get the sound and graphics, one subscribes to a service that sends a CD each month with new bits. Users can access the MUD for free, but pay for the CDs. Another real-time use is video conferencing.

    - There was some more discussion of the need for semantic information in the links and in the HTML documents. There didn't seem to be much concrete discussion about this, though.
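
    To illustrate the full-URL point above, here is a small sketch of the two request styles. I use the absolute-URL form that proxies came to expect rather than the exact "GET host /path" syntax mentioned at the roundtable, and the code is my own illustration, not anything shown there.

        # Sketch: building a request line for a direct connection vs. a proxy.
        # With a proxy, the request carries the full URL, since the proxy (not
        # the client) opens the connection to the origin host and can also use
        # the URL as a cache key.

        from urllib.parse import urlsplit

        def request_line(url, via_proxy=False):
            parts = urlsplit(url)
            if via_proxy:
                return f"GET {url} HTTP/1.0"
            # Talking to the origin server directly, the host is implicit in
            # the TCP connection, so only the path is sent.
            return f"GET {parts.path or '/'} HTTP/1.0"

        url = "http://ltt-www.lcs.mit.edu/ltt-www/foobar/"
        print(request_line(url))                  # GET /ltt-www/foobar/ HTTP/1.0
        print(request_line(url, via_proxy=True))  # GET http://ltt-www.lcs.mit.edu/ltt-www/foobar/ HTTP/1.0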


    Workshop on the same information via multiple services

    I gave a talk on The Tech's server at this workshop. There were really only a few people in attendance and it seemed to be the workshop least focused on the Web. The attendees seemed most interested in providing library-like services -- for example, collecting all the manuals and documentation for computer systems at the Univ. of Trieste Observatory -- but they had little experience in the area.


    Workshop on electronic publishing

    I caught the second half of Steve Pemberton's workshop on electronic publishing. The workshop drew an interesting mix of people with either technical background or publishing experience. There was a good discussion of the distinction between being an information provider and an archiver. Pemberton said he plans to put together a detailed report of this discussion -- and I jotted down only a long list of topics mentioned -- so I'll omit that list in favor of his report.


    Reception of text-image map paper

    The schedule of workshops was changed at the last minute and the workshop on HTML+, where I was going to give the talk on text-image maps, was rescheduled for Friday morning, right about the time I left for the airport.

    I did have a couple of conversations with Dave Raggett, who organized the workshop and is heading the HTML+ effort, about the paper. He agreed that the ISMAP interface that is currently implemented is a little limited. Dave was pretty enthusiastic about the paper and said he planned to add some of my suggestions to the HTML+ spec. He also said he planned to make some mention of my paper at his talk in the main auditorium. (He gave that talk after I left, too.)

    I also had a conversation with Andy Whitcroft and Tim Wilkinson of City University (UK) about the use of typed links and how they could be useful for text-image maps. In library applications, each piece of text should be linked to several things -- each word could have a dictionary entry, some groups of words constitute citations, etc. If each HREF link had a type and a particular hotspot was part of more than one link, the browser could present a list of link types and the user could choose which one to follow.
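
    A sketch of how this might look from the browser's side, assuming a hypothetical link record with a type field and a rectangular hotspot (this is my own illustration, not actual HTML+ markup or anything Whitcroft and Wilkinson proposed; the coordinates and URLs are invented):

        # Sketch: one hotspot of a text-image map can belong to several typed
        # links; the browser presents the types and follows the user's choice.

        from dataclasses import dataclass

        @dataclass
        class TypedLink:
            region: tuple       # (x1, y1, x2, y2) rectangle on the page image
            link_type: str      # e.g. "dictionary", "citation", "page-number"
            href: str

        links = [
            TypedLink((120, 40, 180, 60), "dictionary",  "http://server/dict?word=soote"),
            TypedLink((100, 40, 260, 60), "citation",    "http://server/catalog?entry=1234"),
            TypedLink((120, 40, 180, 60), "page-number", "http://server/page?n=17"),
        ]

        def choices_at(x, y):
            """All typed links whose hotspot contains the selected point."""
            return [l for l in links
                    if l.region[0] <= x <= l.region[2]
                    and l.region[1] <= y <= l.region[3]]

        # A click at (150, 50) falls inside all three hotspots, so the browser
        # can offer a menu of the three link types:
        for link in choices_at(150, 50):
            print(link.link_type, "->", link.href)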


    Notes on people I met and their papers

    [Glass] Steve Glassman. A Caching Relay for the World Wide Web. Works at DEC SRC. Designed a caching server for the DEC firewall. Good technical content in the paper.

    [Luoto] Ari Luotonen and Kevin Altis. World-Wide Web Proxies. Ari is a grad student who is working on the CERN httpd server for a year. Kevin is a big WWW proponent at Intel; works at the multimedia division in Oregon. The paper describes the CERN server, which the two of them did a lot of the work on.

    [Sato] Shin-ya Sato. Dynamic Rewriting of HTML Documents. From the Nippon Telegraph and Telephone research lab. Described representing HTML documents as LISP objects; these objects can be dynamically rearranged to cater to different kinds of reading styles, e.g. a top node with links to each chapter or a single document incorporating all the chapters; the objects can also contain information for features like ISMAP. Servers could also send the LISP objects to the client, where the client could perform some computation on the object at the user's request, which cuts down on the number of requests needed.

    [Ragge] Dave Raggett. A Review of the HTML+ Document Format. Dave works at HP Labs in England; hopes to come to CERN through HP and work on the W3 consortium. Is developing the specification for HTML+ (which is really HTML version 2.0) and writing a browser for it.

    [Halle] Phillip Hallam-Baker. Co-organizer of workshops on HTML+ (with Dave Raggett) and on specialized servers (with the TNS group). A CERN researcher, he isn't officially part of the WWW group. He is working on a lot of user authentication issues. Research interests include formal methods and cryptography. Has written WWW libraries, a server, and a browser in an FSM language he wrote; has a compiler for the language that produces C as output.

