Date: Wed, 6 Jul 1994 15:49:19 -0400
From: yoav@reading-room-4.lcs.mit.edu
Message-Id: <9407061949.AA15247@reading-room-4.lcs.mit.edu>
Subject: replication

Hi,
I'm going to attempt to explain what Mitchell and I have devised as a possible design for a replication system. Please forgive me if I mis-explain things or leave stuff out. I think the best way to explain and develop this is over continuous email, so here is the starting point:
By the way, neither Mitchell nor I sees this as a final draft, nor do we agree with every part of the concept as written here; it is just intended to start up discussion.
yoav
In designing the system, we solved several problems by imposing constraints. Firstly, we chose not to assume that any system would remain secure over the long time frame we were considering. We therefore made all items stored in the system immutable (although we came up with several schemes to support mutation when needed). Secondly, we chose to design a scheme that allows a consistent namespace across all replicas without the need for centralized name servers or concurrency control. We therefore chose to use something unique about the file itself as its identifier.
Every object in this replicated system will be identified by a special signature. Two basic assumptions must hold for this scheme to work. Firstly, the scheme must be designed so that the probability of an identifier collision is extremely low (the universe is more likely to end before a collision occurs, etc.). Secondly, the scheme must be non-reversible at the time of the file's deposit. To achieve this, the signature will consist of two parts. The first is a signature scheme identifier, and the second is the actual signature. The identifier should refer to a standardized scheme, and the signature should be the cryptographic signature generated by that scheme. In short, every object is a pair of a signature and its associated bits.
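To make the pairing concrete, here is a minimal sketch of identifier construction. The scheme registry and the use of SHA-256 are my own stand-ins for whatever standardized schemes would actually be adopted:

```python
import hashlib

# Hypothetical registry of standardized signature schemes.
# SHA-256 stands in for whatever the standard would actually specify.
SCHEMES = {
    "sha256": hashlib.sha256,
}

def make_identifier(bits: bytes, scheme: str = "sha256") -> tuple[str, str]:
    """Return the (scheme id, signature) pair that identifies an object.

    The pair is derived solely from the object's bits, so every replica
    computes the same identifier without any central name server."""
    digest = SCHEMES[scheme](bits).hexdigest()
    return (scheme, digest)

scheme, sig = make_identifier(b"the immutable bits of some object")
```

Because the identifier is a deterministic function of the bits, two replicas depositing the same object independently agree on its name for free.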
Finally, before delving into the implementation, one last issue must be covered in order to explain the operation of the system. We tried to understand the concept of "trust" and its implications. We defined trust to be a replica server's model of the outside world (so that it may decide which replicas' votes it will consider, and possibly with what weights). Using this, we envisioned the possibility of selective replication and of realms of replicas. There are, however, many complex issues that arise from these, and we probably have not analyzed them fully yet.
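As a rough illustration of the trust idea (the replica names and weights below are entirely hypothetical), a server's trust table might drive a weighted vote like this:

```python
# Hypothetical trust table: each replica this server knows about,
# with the weight it gives to that replica's vote.
trust = {
    "replica-a": 1.0,
    "replica-b": 1.0,
    "replica-c": 0.5,   # less trusted: its vote counts for half
}

def weighted_decision(votes: dict[str, bool]) -> bool:
    """Accept a claim if the trust-weighted 'yes' votes outweigh the 'no's.

    Votes from replicas not in the trust table are ignored (weight 0),
    which is one simple way a server's view of the outside world can
    shape what it believes."""
    yes = sum(trust.get(r, 0.0) for r, v in votes.items() if v)
    no = sum(trust.get(r, 0.0) for r, v in votes.items() if not v)
    return yes > no

# Two trusted replicas agree; the half-trusted one disagrees:
weighted_decision({"replica-a": True, "replica-b": True, "replica-c": False})  # → True
```

Realms of replicas could then be modeled as disjoint trust tables, though as noted above the full implications have not been analyzed.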
Now, I shall delve into the implementation and operation of the replication system:
Firstly, the static scenario:
In the static model, no file is intentionally added or deleted; all changes are errors. The following errors, therefore, can occur:
Now, for the dynamic scenario:
In the dynamic scenario, files are added to the system (though not deleted) by users and third parties, as well as by the servers themselves. For this scenario, I will analyze the possible situations (from a single server's point of view) and the remedial action to be taken:
There are, of course, other things needed to ensure correct operation. The server needs to regularly traverse its entire filesystem to detect possible errors. It also needs to check other trusted servers' lists of files to detect files it may be missing. We would also like to place a denial-of-service watchdog somewhere in the access path on the server (most likely at the presentation end, though this could change).
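The two maintenance tasks above can be sketched briefly. This assumes a toy store layout where each object lives in a file named by its signature (with SHA-256 again standing in for the real scheme), so a name/content mismatch is by definition a detected error:

```python
import hashlib
import os

def signature(path: str) -> str:
    """Recompute an object's signature from its stored bits
    (SHA-256 stands in for the real signature scheme)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def scan_for_errors(store_dir: str) -> list[str]:
    """Traverse the store. Each file is named by its signature, so any
    mismatch between the name and the recomputed signature is an error
    (e.g. bit rot or tampering) to be repaired from other replicas."""
    return [name for name in os.listdir(store_dir)
            if signature(os.path.join(store_dir, name)) != name]

def missing_from(local: set[str], remote_listing: set[str]) -> set[str]:
    """Signatures a trusted server holds that we do not:
    candidates to fetch in order to repair missing files."""
    return remote_listing - local
```

Because objects are immutable, repair is simple: refetch any object whose recomputed signature disagrees with its name, and verify the fetched bits the same way.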
Theoretically, this is enough to ensure reliable replication (though it should of course be tested first). It has been proposed that every {long time} a complete dump might be used to verify correctness (mostly of the implementations of the algorithms). This might be better handled by third parties that can inform sites of possible bugs, though if the dump approach is chosen, I could add it to a possible implementation.
Third Party Services:
To make this replication system behave more like systems the world is familiar with, several third-party services have been proposed that improve functionality at the cost of greater complexity and more possibilities of failure.
The first and most obvious is a Unix filesystem to replicated filesystem mapper. Ultimately, I envision this service operating just like NFS: a request for a file is sent out and the file is returned, without the user being aware of the mappings, signatures, etc. This is also one of the first third-party services I would like to implement, as it is necessary for our own use of this system. The extra file information can be stored as part of the "bits", and some simple encryption mechanism might be used to ensure modest security (though not reliably over the long time frame).
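A toy version of the mapper might look like the following. The dict-based store and directory are illustrative only; a real service would talk to replica servers over the protocol sketched later:

```python
import hashlib

def deposit(store: dict[str, bytes], directory: dict[str, str],
            path: str, data: bytes) -> str:
    """Deposit file bits into the (here, in-memory) replicated store and
    record the path -> signature mapping that gives users a Unix-like
    view. File metadata could be bundled into the bits before signing.
    SHA-256 stands in for the real signature scheme."""
    sig = hashlib.sha256(data).hexdigest()
    store[sig] = data
    directory[path] = sig
    return sig

def lookup(store: dict[str, bytes], directory: dict[str, str],
           path: str) -> bytes:
    """Resolve a pathname to a signature, then fetch the bits --
    the user never sees the mapping, much as with NFS."""
    return store[directory[path]]
```

Note that only the directory is mutable state; the store itself stays immutable, so the mapper layers familiar semantics on top without weakening the base system.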
Another possibly useful service is a "latest revision" service. This helps sites that would like to provide reliable services but that depend on some database file they store and could lose. The service receives regular reports of which ids each site wants associated with its latest files. (There are some security issues to address, but solutions for those already exist.) The likelihood of both the site and the service going down at roughly the same time is low, and even then, the worst that could happen is a delay while the server does an exhaustive search over the files. Since this is VERY rare, the tradeoff is worth it.
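The service's core state is just a registry mapping each site to its most recently reported id. A minimal sketch (in-memory, names hypothetical, and omitting the security layer mentioned above):

```python
# Hypothetical in-memory registry for the "latest revision" service:
# each participating site regularly reports the signature of its
# newest database file.
latest: dict[str, str] = {}

def report(site: str, sig: str) -> None:
    """A site's periodic report: 'my latest file has this signature'."""
    latest[site] = sig

def recover(site: str):
    """After a crash, a site asks which signature it last reported.
    None means the registry never heard from it -- the rare case
    that forces an exhaustive search over the store."""
    return latest.get(site)
```

Since the underlying objects are immutable, the registry never has to worry about a signature pointing at changed bits; it only tracks which immutable object is currently "latest".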
Many other services could be based on this system, but for now, I think I'd like to build the base system first.
Finally, the protocol design:
Although we haven't designed the protocol fully, the following must be supported:
1) Queries about the server:
   - media type (?)
   - servers it trusts
   - signature schemes it supports (deposits / reads)
   - available storage space (?)

2) Queries from the server:
   - list of all objects
   - an object by signature
   - existence of an object by signature (?)

I think this is enough information to start a conversation as to the functioning of the system. Please ask about any parts I have not made clear or that might have some conceptual flaw.
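Since the wire format is undecided, here is one possible shape for the object-related queries, with messages as simple tagged tuples. All the message names and the dict-based server state are illustrative only:

```python
def handle_query(server: dict, msg: tuple):
    """Dispatch the queries the protocol must support.
    Message names here are placeholders, not a settled format."""
    kind = msg[0]
    if kind == "trusted-servers":       # query about the server
        return server["trust"]
    if kind == "signature-schemes":     # query about the server
        return server["schemes"]
    if kind == "list-objects":          # query from the server
        return sorted(server["objects"])
    if kind == "get-object":            # fetch an object by signature
        return server["objects"].get(msg[1])
    if kind == "has-object":            # existence check by signature
        return msg[1] in server["objects"]
    raise ValueError("unknown query: %r" % (kind,))

server = {
    "trust": ["replica-a", "replica-b"],
    "schemes": ["sha256"],
    "objects": {"abc123": b"some immutable bits"},
}
```

The "queries about the server" let a replica decide whom to talk to and how; the "queries from the server" are what the error-detection traversal and missing-file checks described earlier would be built on.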
thanks,
yoav