Notes on architecture, by Mitchell Charity. Fall, 1994.

A big-picture document.

 * Contents

 data model - some thoughts
 board capture - Oct 20, data model bottom up
 board capture - Oct 18, functional decomposition
 What paper are we now quickly writing?
 naming - Oct 18, board capture
 overview doc - version mcharity Oct 06 15:53
 data model (v.0)
 Sep 21 high-level outline


 * data model - some thoughts

[converted stars to +'s]
From mcharity Thu Oct 20 15:28:00 EDT 1994
Subject: data model - some thoughts
X-Phone: NE43-512:(617)253-6023  fax:258-8682  home:497-1506

I scribbled on the board a bit this morning.
Here are some thoughts on creating a data model.

The approach here is functional decomposition, and subsequent
recombination.  I sketch the idea of decomposition, its encoding and
recombination, and parts of a decomposition for our data model.

 + Identify and describe modules of functionality (cluster the data)

 + Encoding and recombination

 - A note on excessive flattening
 There is the temptation to take multiple functional chunks and to
encode them with a single, flat, bag of name-value pairs.  What is
going on here, and is it desirable?  Consider two functional chunks,
each represented as a bag of name-value pairs.  Merging them into a
single bag can be seen as inheritance, with its various associated
issues.  There is no longer a simple way to refer to or reconstitute
the functional chunks.  Names in the two bags may collide.  Doing
something meaningful with collisions across functional boundaries is
difficult.  Avoidance of collisions requires either similarly
difficult coordination, or creating new names which encode the
originating chunk.  This latter is simply a somewhat crufty way of
_not_ merging them.
 The conclusion?  Probably insufficiently motivated above?  Expect
extensive explicit nesting.
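
 A tiny sketch of the collision argument (Python used purely for
illustration; the chunk names and values are hypothetical):

```python
# Two functional chunks, each a flat bag of name-value pairs.
doc_meta = {"title": "Barriers to Equality", "date": "1994"}
storage_meta = {"format": "PostScript", "date": "1994-10-20"}  # "date" collides

# Flat merge: the collision is silently resolved by overwriting.
flat = {**doc_meta, **storage_meta}
assert flat["date"] == "1994-10-20"  # doc_meta's "date" is lost

# Prefixing avoids collisions, but only by re-encoding the chunk
# boundary into every name -- a crufty way of NOT merging.
prefixed = {f"doc.{k}": v for k, v in doc_meta.items()}
prefixed.update({f"storage.{k}": v for k, v in storage_meta.items()})

# Explicit nesting keeps the chunks addressable and reconstitutable.
nested = {"doc": doc_meta, "storage": storage_meta}
assert nested["doc"]["date"] == "1994"
```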

 - A note on indirection
 Conclusion - expect a general model in which values are allowed to be
both real values and value references (identifiers like urns and urls,
and pairs of (identifier, subpart-selection)).
 Argument - one wants the possibility of indirection whenever a value
(such as a name-value-pair bag addressing some function) appears more
than once, or when one wishes to permit multiple providers.
Efficiency is not a counter argument - one may include/cache a version
of the object along with the identifier.
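
 A minimal sketch of the value / value-reference model, including the
cached-copy point (Python used purely for illustration; the urns and
resolver are hypothetical):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Literal:
    value: Any

@dataclass
class Reference:
    identifier: str                 # e.g. a urn or url
    subpart: Optional[str] = None   # optional subpart selection
    cached: Any = None              # optional included/cached copy

def deref(v, resolve):
    """Return the real value, preferring an included cache over resolution."""
    if isinstance(v, Literal):
        return v.value
    if v.cached is not None:
        return v.cached
    return resolve(v.identifier, v.subpart)

# Hypothetical resolver for illustration.
store = {"urn:example:tr-278": "report body"}
resolve = lambda ident, sub: store[ident]

assert deref(Literal(42), resolve) == 42
assert deref(Reference("urn:example:tr-278"), resolve) == "report body"
assert deref(Reference("urn:x", cached="local copy"), resolve) == "local copy"
```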

 - A note on keeping attributes well defined
 Attribute lists can have the problem of the attributes being
attributes of different objects, sometimes subtly different.  One
cause of this is excessive flattening.
 For illustration, in a bibliographic record, the "author" may be an
attribute of the work, "edition" of a version of the work, and "pages"
of a particular printing.  Trying to associate such a record with some
real or abstract document is trying.
 This suggests that a well formed attribute list is a tuple of an
object reference and a bag of attribute-value pairs.
 (I was tempted to call the first part the object-object, but resisted.)
Equivalently, written with a lis'thp:
 (attribute-list <object> (...))
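
 The bibliographic illustration above, sketched in Python purely for
illustration (the identifiers and attribute values are hypothetical):

```python
# A well-formed attribute list pairs the object it describes with its
# attribute-value pairs, instead of flattening everything into one bag.
work     = ("attribute-list", "urn:example:work",
            {"author": "Jane Doe"})
version  = ("attribute-list", "urn:example:work/v2",
            {"edition": "2nd"})
printing = ("attribute-list", "urn:example:work/v2/print1",
            {"pages": 113})

# Each attribute is now unambiguously an attribute OF something:
# "author" of the work, "edition" of the version, "pages" of the printing.
for tag, obj, attrs in (work, version, printing):
    assert tag == "attribute-list"

assert work[2]["author"] == "Jane Doe"
assert printing[2]["pages"] == 113
```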

 - A note on associating attributes with objects

[Ah well, mind is gone.  The rest of this doc is cursory.]

 Attributes can be connected to objects externally, as above, or
"in-line", with an attribute explicitly identifying the object-object.
External attachment has the advantage that the attributes can be
referred to explicitly, and can be associated with multiple objects.

 - A note on variable hiding (and explicit inheritance)
 Ex: biblio for a doc, with a modifying biblio for a version.

 + Some clusters

[The following is unannotated board capture, provided for whatever it
is worth.]

  data description (things which might also be present in object)
  data origin
  data analysis (ex: page breaks in ascii ocr)
  linkage to doc description (page->image map)
  doc description
  real-page characteristic/space/spec
  abstract -> real-page map
  doc structure
  doc structure -> page (abstract or real?)
  things to do with biblio...
  author authorities in biblio


 * board capture - Oct 20, data model bottom up

outline of a note on data model, bottom-up
 clusters (of data based on function)
  excessive flattening (inheritance)
  indirection (bob, urn, or <- + modifier)
  attrib-val-pair bag  "(avpb refered-to-object (avps...))"
  simple bag
  attachment - inline vs taped together
  variable hiding (ex: biblio for doc, then modifying biblio for version)
    explicit inheritance
 some clusters
  data description (things which might also be present in object)
  data origin
  data analysis (ex: page breaks in ascii ocr)
  linkage to doc description (page->image map)
  doc description
  real-page characteristic/space/spec
  abstract -> real-page map
  doc structure
  doc structure -> page (abstract or real?)
  things to do with biblio...
  author authorities in biblio

 * board capture - Oct 18, functional decomposition

 classification of names:
  nonresolvable - dead end - don't go further.
 to resolve a name, need
  recognize can use resolution mechanism
  lookup mechanism
  data maint, perhaps distributed
. bagging accessors
. accessors (1-1 access paths = urls)
re "."...
 identify and describe modules of functionality
  change spatial/temporal properties
  doc meta data
  doc rep meta data
  space translation

 * What paper are we now quickly writing?

From mcharity Tue Oct 18 21:58:59 EDT 1994
Subject: What paper are we now quickly writing?

Jeremy and I just talked for a few hours.  Here are some quick notes.
The hour is late.  I have tried for coherence.  Apologies for lapses.

It seems a functional decomposition (identifying and describing needed
nuggets of functionality) is useful in understanding the design space
(the range of possible system designs).  Two examples of nuggets are:
 - "branching" function - giving a name and getting back a bunch of
names which have roughly equivalent meaning.  For instance, trading a
urn for a url, when both identify the same object, and the url differs
only in being valid for a shorter time.  (This was a bad example as it
illustrates a combination of "branching" and "namespace translation".
Better something which mapped from a urn to other urns, say ones for
commercial rapid delivery of the same stuff.)
 - "document meta data" - sortof biblio info (though it looks like the
two should be separated out).
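
 A minimal sketch of the "branching" nugget (Python used purely for
illustration; the urns and the equivalence table are hypothetical):

```python
# "Branching": give a name, get back a bunch of names with roughly
# equivalent meaning -- e.g. urns for commercial rapid delivery of
# the same stuff.
equivalents = {
    "urn:example:tr-278": [
        "urn:example-mirror:tr-278",
        "urn:example-express:tr-278",   # commercial rapid delivery
    ],
}

def branch(name):
    """Return names with roughly equivalent meaning, including the input."""
    return [name] + equivalents.get(name, [])

assert branch("urn:example:tr-278")[0] == "urn:example:tr-278"
assert len(branch("urn:example:tr-278")) == 3
assert branch("urn:unknown") == ["urn:unknown"]
```
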
 So what does this model do for us?  It helps one say things like -
`The current URC spec is broken because it combines branching and doc
meta data in one seamless lump.  The two functions have different
constraints, and if/when they are both stored in one text object, as
with URCs, they must be separable.  Further, separating the various
needs makes it obvious that foo has been missed, and bar done.'

But that's not what I wanted to write about.

As we went to close down the conversation, it became clear that we
(at least I) were unclear on just what the scope of "the paper" was.
A _brief_ discussion yielded the following...

There seem to be a variety of objectives in the air.  One of them
seems to be to get a "paper" out to cs-tr by yesterday.  But what
should it cover?

One breakdown of the range of pending papers is:

  intro paper
    big picture (ala inria)
    the library-like subproblem ("research slice")
    our architecture decomposition
      brief justification of notion
      navigation (naming)
      authorization (included as demand great, but handwaved away
                     with an encryption or capability (ie, its mainly
                     someone else's problem) argument)
    illustrative example (perhaps should integrate with above)
      including social context
        the publisher+library+consumer model in particular
        this is a `satisfies actual needs' issue

  full architecture in detail paper
  needs / design space analysis paper
  subparts in great/research detail papers
    replication, mumble, ...

So is the outline labeled "intro paper" above an introductory paper,
or merely the introduction _to_ an introductory paper?

If we are putting together a done-by-yesterday intro paper, what is
the thrust?  Possibilities include:
 - a store/discover/nav decomposition is good and here's how a full
system might be built with them
 - above, plus discussion of stuff like other services.
 - above, plus input/output specs of modules.  Forces resolution of
some significant choices which can be glossed over above.
 - decomposition is good (why/how/implications). here's an example.
 - needs analysis for a libraryish system.  tradeoffs.  example.

These have come out in rough order of difficulty.

  In so far as we are currently trying to get "a paper" written,
  what/which paper is it?


 * naming - Oct 18 board capture

 top view
   tr lib data model
   different parts
   other library stuff

 naming ---
  compatible vs incompatible spaces
  syntax compatibility
  syntax compatibility as test of `what is a name'
   urns urls handles authors
   at least one
   pro , con
   at most one
   pro , con
  to same/diff stuff
  standard vs flexible typing
   : 1 unbound
     2 typed
     3 std/fixed/limited typed
   : self typing or no?
   comparison of distribution vs aliasing
    various - series, docs, versions, ...
    other meta data
    obj rep

 * overview doc - version mcharity Oct 06 15:53

From mcharity Thu Oct  6 15:58:31 EDT 1994
Subject: overview doc - version mcharity Oct 06 15:53

Should we be having another meeting to discuss the big picture?
It has been a few weeks now since the last one.


If people would find this more convenient as a web page than as email,
let me know...

----[version Oct 06 15:53]----
Our information infrastructure is improving, enabling the unbundling
of information services, and the specialization of their providers.
Specialization allows organizations to concentrate on addressing
specific narrow needs.  An environment which supports specialization
allows competing implementations and evolutionary improvement.  The
architectural challenge is to engineer a system which permits the
specialization of services, and their graceful composition into large
evolving systems.

What infrastructure technology is changing, and how does it enable
specialization?  Quality display, inexpensive computes, and improving
communication make it possible to deliver services to end user
equipment.  Inexpensive disk, ram, and computes make it technically
inexpensive to offer information services.  Inexpensive quality
communication means those services need not be centralized. <...>

But what are some examples of specialization, and how does one
determine where to cleave existing services into specialized parts?
Following are some examples, grouped by the type of cleavage, and
followed by a general discussion of cleavage selection.

One can specialize based on the kind of service.  For example, one can
separate storage from discovery, storage from authorization, and
namespace maintenance from name resolution.
 * Separating storage from discovery
 * Separating storage from authorization
 * Separating namespace maintenance from name resolution

One can specialize based on the "class of service" (Network-speak - is
there a more general way to say this?)
 * Separation of storage services optimized for persistence,
   availability, and speed.

One can vertically dis-integrate / decompose the creation of data.
 * Separate out "authority" (in the library sense)
 * Separate out quality control
 * Separate out format conversion
 * Separate out storage, distribution and awareness

    linkage (commerce, data conversion, protocol diplomacy)

   (a side effect of specialization)



 * data model (v.0)

From mcharity Fri Oct  7 12:40:13 EDT 1994
Subject: data model (v.0)

     Data, data, everywhere, and not a bit to flip.
        - from C Rime of the Ancient Mutator
        (or `a gremlin's frustrations with write-only persistent storage')

Well, it's bright and early on a sunny (hence the bright) Friday morn,
so it must be time for...

Gauging an OO data model (sorry)

This is an attempt to first explore the cognitive units in current
use, and then derive data types and functions.  It is roughly
organized into the administrative context, the artifact, ...

* The Administrative Context

Series, and publishers, and reports, oh my.

 - Series

What is a series?  Here is a simple model, and a general one.

A simple model is an ordered set of sequentially numbered reports.
Something with reports named along the lines of SERIES-<number>.

But as one looks around, exceptions accumulate.

MIT AI lab has a single number sequence which contains both AI Memos
and AI TRs.  So there are AIM-<number> and AI-TR-<number>, but never
two with the same number.  These could be viewed, and in some
administrative respects are, as a single typed series.  But officially
they are two series, sharing a sequence.  Then again, this may change
at the whim of the publications office(/person).  One series or two?

LCS has an unnumbered TR, "Barriers to Equality".  It is not ordered
with respect to the usual LCS-TR-<number>'s.

So a general model is a partially ordered set (bag?), of which one can
only, in general, ask whether a report is a member.
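
 The two models, sketched in Python purely for illustration (the
series size is hypothetical):

```python
# Simple model: an ordered set of sequentially numbered reports.
class SimpleSeries:
    def __init__(self, prefix, count):
        self.names = [f"{prefix}{n}" for n in range(1, count + 1)]
    def __contains__(self, name):
        return name in self.names

# General model: a partially ordered set of which one can only,
# in general, ask membership.
class GeneralSeries:
    def __init__(self, members):
        self.members = set(members)
    def __contains__(self, name):
        return name in self.members

lcs = SimpleSeries("LCS-TR-", 600)   # hypothetical size
assert "LCS-TR-278" in lcs

# The unnumbered TR fits only the general model.
general = GeneralSeries(lcs.names + ["Barriers to Equality"])
assert "Barriers to Equality" in general
assert "LCS-TR-278" in general
```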

 - Organizations and series

Organizations may have multiple series.

Series may have multiple organizations.  The set (ordered) of
organizations may vary with each report, though often (usually?)
there are one or more primary organizations which are generally

 - Report

Consider a report called LCS-TR-278.

One day, a report labeled LCS-TR-278.b comes out.  This suggests the
original was implicitly LCS-TR-278.a.

One day, you notice LCS-TR-278.b now has a different number of pages
and has been edited since you first got it.

So it looks like there are concepts
  Report-version  (Acknowledged-version?)
  Report-body     (Version?)
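
 A minimal sketch of the two concepts (Python used purely for
illustration; the page counts are hypothetical):

```python
from dataclasses import dataclass, field

# Report-version: an acknowledged version, e.g. LCS-TR-278.b.
# Report-body: the actual artifact, which may drift under the same label.
@dataclass
class ReportBody:
    pages: int

@dataclass
class ReportVersion:
    label: str                                  # e.g. "LCS-TR-278.b"
    bodies: list = field(default_factory=list)  # successive bodies

v = ReportVersion("LCS-TR-278.b")
v.bodies.append(ReportBody(pages=113))
v.bodies.append(ReportBody(pages=117))  # edited after release, same label

assert v.label == "LCS-TR-278.b"
assert v.bodies[0].pages != v.bodies[-1].pages  # the body drifted
```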

 - Reports and series

But wait, there is more.

A report may come out under multiple series.  The series may share the
entire document, or have distinct covers and share only the body.  The
series may be of grossly different organizations (w collaborations).

 - Reports and bibliographic information

Bibliographic information includes a mix of acknowledged information
and artifact description.  So while it often describes a version, the
body may drift out of sync, affecting fields like page count.

* The Artifact

So what about the Phys-Ob?

 - PostScript

One has the PostScript file of a report.  One has meta information
such as "this was generated using dvi2ps, so dont expect it to work
with ghostscript".  One has meta information such as "this is a
faithful copy of the report sans the figure on page 5 which was cut
and pasted".

Sometimes you have two PostScript files, as when the report is long or
multiple generators were used.  Some meta information is specific, and
some is shared.

Sometimes you have two PostScript files, and they are subtly different
versions of the same report.  One color and one grayscale, with the
text ("the .red.gray. line") modified to match.

 - A meta detour

Associated with documents is various structural meta information.
Pages are named ("i,ii,1,2,...").  There are gross divisions ("TOC,
Chapter1,Bibliography"), and fine divisions (footnotes, figures,
citations).  All(?) are in users' concept spaces, and thus things they
may wish made explicit.

One thus builds mappings between these conceptual structures and
document encodings.
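
 A minimal sketch of one such mapping (Python used purely for
illustration; the page-to-image numbers are hypothetical):

```python
# Map conceptual page names ("i", "ii", "1", "2", ...) to positions in
# a particular encoding, e.g. image numbers in a scanned document.
page_to_image = {"i": 1, "ii": 2, "1": 3, "2": 4}

# Gross divisions expressed against the conceptual pages.
divisions = {"TOC": ("i", "ii"), "Chapter1": ("1", "2")}

def images_for(division):
    """Return the encoding positions covering a conceptual division."""
    first, last = divisions[division]
    return range(page_to_image[first], page_to_image[last] + 1)

assert list(images_for("TOC")) == [1, 2]
assert list(images_for("Chapter1")) == [3, 4]
```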

PostScript, printed on paper pages, is perforce proper.
But as the media changes, things become less simple.

 - Images

Images force several issues latent with PostScript.

[Break.  Got distracted by cruft.  Will get back to this...]

(- Scanned text)?
(- Attachments/multimedia)?
* Services
* Glue
* A straw model (with glue)

Anyone want to play volleyball?


 * Sep 21 high-level outline

From mcharity Wed Sep 21 12:16:27 EDT 1994
Subject: high-level outline

Some brief thoughts...

display+communication+`powerful pcs'  =>
  `possible to deliver to end user equipment'

disk+ram => `very cheap to offer services'

communication => `those services need not be centralized'

decoupling/disentangling/decentralization =>
 `architecture requires exposure/shadowing mechanisms'.

`shadowing' => `requires ability to dump - iteration or id enumeration'

`efficient shadowing' => `requires full dump alternative - whats-new'
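
 A minimal sketch of a store offering both the full dump and the
whats-new alternative (Python used purely for illustration; the
interface is hypothetical):

```python
# A store supporting "shadowing": a full dump via id enumeration,
# plus an efficient incremental "whats-new since" alternative.
class Store:
    def __init__(self):
        self.objects = {}   # id -> object
        self.log = []       # (sequence-number, id), in update order
        self.seq = 0
    def put(self, oid, obj):
        self.seq += 1
        self.objects[oid] = obj
        self.log.append((self.seq, oid))
        return self.seq
    def enumerate_ids(self):        # full dump
        return list(self.objects)
    def whats_new(self, since):     # incremental dump
        return [oid for s, oid in self.log if s > since]

s = Store()
s.put("a", 1)
mark = s.put("b", 2)
s.put("c", 3)
assert s.enumerate_ids() == ["a", "b", "c"]
assert s.whats_new(mark) == ["c"]   # a shadow synced at `mark` fetches only "c"
```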

Decouple how? - Here is a division based on mumble mumble...

  storage, index, client, namestuffs,
  derivative services (third-party resell, proxys),
  linkage (commerce, data conversion, protocol diplomacy)

