Section: New Results

System support: Protocols for resilient systems

Participants : Vivien Quéma, Alessio Pace.

We have worked on replication protocols for P2P systems. In particular, we have worked on replication in so called Distributed Hash-Tables (DHTs). DHTs provide a simple high-level put/get abstraction that can be used to build efficient distributed storage systems. DHTs gained wide popularity in the last decade, fostering a large amount of interest in the academia, and inspiring the design of key/value distributed storage systems deployed in production.

DHTs provide a way to deterministically map objects to nodes and allow efficiently retrieving objects in a distributed fashion. Nodes and objects are logically arranged in a large numeric key-space, according to a given variant of consistent hashing. Typically, the node in charge of an object is the one whose position immediately follows the object in the key-space.

To guarantee that objects are reliably stored, DHTs rely on replication. A replication protocol is in charge of ensuring that, at any time, each object is replicated on a sufficiently large number of replicas. Several replication strategies have been proposed in the last years. The most efficient ones use predictions about the availability of nodes to reduce the number of object migrations that need to be performed: objects are preferably stored on highly available nodes.

We have proposed an alternative replication strategy. Rather than exploiting highly available nodes, we have designed a protocol that leverages nodes that exhibit regularity in their connection pattern. Roughly speaking, the strategy consists in replicating each object on a set of nodes that is built in such a way that, with high probability, at any time, there are always at least k nodes in the set that are available. We have evaluated this new replication strategy using traces of two real-world systems: eDonkey and Skype. Our evaluation showed that our regularity-based replication strategy induces a systematically lower network usage than existing state of the art replication strategies. This work has been published at the International Symposium on Reliable Distributed Systems, in October 2011.