Mitchell's Replication Notes, Sept 6

This note is an attempt to understand the research contributions we hope to make in replication.

I know of these areas:

  1. Replication and long-term storage in the face of technological obsolescence.
  2. Replication reliability by means of an `obviously correct' system.
  3. Local verification of replication system function.
  4. ... as yet unknown things which will turn up as we implement ...

Elaborations:

Long-term storage brings shifting storage media and encoding standards. These shifts have implications for a replication system, and replication can in turn help address them. Examples: an encoding change complicates integrity and identity testing; removal of an obsolete disk can simply be treated as a disk failure.
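To make the encoding example concrete, here is a small sketch of why a re-encoded object fails a naive integrity test yet can still pass an identity test. The function names and the choice of SHA-256 and Unicode NFC as the canonical form are assumptions made for illustration; nothing in these notes fixes them.

```python
import hashlib
import unicodedata


def byte_digest(raw: bytes) -> str:
    """Digest of the stored bytes; changes whenever the encoding changes."""
    return hashlib.sha256(raw).hexdigest()


def content_digest(raw: bytes, encoding: str) -> str:
    """Digest of a canonical form (assumed here: NFC text, UTF-8 bytes)."""
    text = raw.decode(encoding)
    canonical = unicodedata.normalize("NFC", text).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The same document stored under an old and a new encoding standard.
old_copy = "caf\u00e9".encode("latin-1")
new_copy = "caf\u00e9".encode("utf-8")

same_bytes = byte_digest(old_copy) == byte_digest(new_copy)
same_content = (
    content_digest(old_copy, "latin-1") == content_digest(new_copy, "utf-8")
)
```

Here `same_bytes` is false while `same_content` is true, so a replica that migrates to the new encoding needs the second kind of test to avoid being flagged as corrupt.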

It seems intuitive that one can maintain a replica by watching for changes and comparing it with other replicas, that this can be done with an algorithm so simple that its correctness is clear and its misimplementation unlikely, and that the emergent system behaviors are also simple and correct. Proof would be nice. :)
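As a rough sketch of how simple such an algorithm might be, the following compares a local replica with its peers and adopts the copy held by a strict majority. This is one possible reading of the idea, not a spec: the dict-based replicas, the `audit_and_repair` name, and the majority rule are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(local: dict, peers: list) -> list:
    """Compare `local` with peer replicas; adopt the strict-majority copy.

    Replicas are modeled as dicts mapping an object name to its bytes
    (an assumption for this sketch). Returns the names that were repaired.
    """
    repaired = []
    names = set(local)
    for peer in peers:
        names |= set(peer)
    for name in sorted(names):
        copies = [r[name] for r in [local, *peers] if name in r]
        votes = Counter(digest(c) for c in copies)
        winner, count = votes.most_common(1)[0]
        # Act only when more than half of all replicas agree.
        if count * 2 <= len(peers) + 1:
            continue
        if name not in local or digest(local[name]) != winner:
            local[name] = next(c for c in copies if digest(c) == winner)
            repaired.append(name)
    return repaired


local = {"a": b"good", "b": b"rotten"}
peers = [
    {"a": b"good", "b": b"fresh", "c": b"new"},
    {"a": b"good", "b": b"fresh", "c": b"new"},
]
repaired = audit_and_repair(local, peers)
```

After the call, `repaired` lists the damaged object `b` and the missing object `c`, and `local` holds the majority copies. With only two replicas in disagreement there is no strict majority, so nothing is overwritten.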

How is the correct operation of a replication system determined? Who is in a position to make claims? One approach: a replica makes a claim of reliability based on its own integrity and on constant testing of those replicas it depends upon for recovery in the event of its own failure. Other replicas can contribute to the estimate of its integrity. One can go a level further and have it test the others' testing.

The replica's claim is an experimental one. Rather than depending on out-of-band trust in the correctness of implementations at other sites, it tests them, and merely requires that any Byzantine behavior be limited to few enough sites. With this model, one can go on to have an open system which anyone can join, with sites which are partial replicas, which give varied service guarantees, and other amusements. The model may collapse if one can't justify the experimental principle, i.e., if failures are not augured. We may have to do something weird, like prohibiting high-reliability subsystems, in order to maintain experience with failure.
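One way the testing of other sites might look, as a minimal sketch: the replica sends a fresh random nonce, and a peer can answer correctly only if it actually holds the data at that moment. The `Peer` class, `probe_peers`, `claim_holds`, and the `max_faulty` threshold are hypothetical names introduced here; the notes do not specify a protocol.

```python
import hashlib
import os


def proof(nonce: bytes, data: bytes) -> str:
    """Answer to a challenge; cannot be precomputed without the data."""
    return hashlib.sha256(nonce + data).hexdigest()


class Peer:
    """Stand-in for a remote site (hypothetical; a real one is over a network)."""

    def __init__(self, name: str, store: dict):
        self.name = name
        self.store = store

    def respond(self, key: str, nonce: bytes):
        data = self.store.get(key)
        return None if data is None else proof(nonce, data)


def probe_peers(local: dict, peers: list, key: str) -> dict:
    """Challenge every peer on `key`; map peer name to pass/fail."""
    nonce = os.urandom(16)
    expected = proof(nonce, local[key])
    return {p.name: p.respond(key, nonce) == expected for p in peers}


def claim_holds(results: dict, max_faulty: int) -> bool:
    """The reliability claim stands while failures stay within the bound."""
    failures = sum(1 for ok in results.values() if not ok)
    return failures <= max_faulty


local = {"doc": b"contents"}
peers = [
    Peer("a", {"doc": b"contents"}),
    Peer("b", {"doc": b"corrupt"}),
    Peer("c", {}),
]
results = probe_peers(local, peers, "doc")
```

Here peer `a` passes while `b` and `c` fail, so the claim holds only if the replica was prepared to tolerate two faulty sites. Testing others' testing would add a further layer on top of this, which the sketch does not attempt.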

Implementing something forces thoughts to precision, and tends to turn up interestingness hiding in what used to be ambiguity. Thus one can implement for the purpose of turning up interesting thoughts.

Todo:

Perhaps: (1) needs a writeup to clarify what work, if any, still needs to go into it. (2) needs a spec, a proof, and an implementation. (3) needs detailed elaboration to determine how interesting it is, and whether to proceed to a spec. (4) is most likely an artifact of (2) or (3), though it could be pursued independently.

Pointers:

Kom's thesis. Yoav's summer notes. Yoav. Jerry. Me.

Mitchell


Last Update: 13 Sept 1994