Chord Review from Troy Ronda on 2005-11-07 (mbox)

From: Troy Ronda <ronda_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Mon, 7 Nov 2005 10:50:14 -0500

Chord Review
Review By: Troy Ronda

A P2P system is a distributed system where all nodes run software with
equivalent functionality and do not rely on centralized control. In these
systems, the main operation is efficient lookup of data, despite having
no central database. P2P systems must also handle the frequent arrival and
departure of nodes. They must also, for both routing efficiency and
scalability, load balance the amount of data each node contains. Chord
provides substrate to other applications desiring P2P functionality. Its
only function is to map a values key to one of the systems nodes. It has
provable correctness and performance. Chord uses consistent hashing to
load balance (with high probability) key values to nodes. In fact, with
high probability, when a node leaves the system, O(1/N) fraction of keys
are moved to another node. Each node must maintain information about O(log
N) other nodes. Each lookup requires O(log N) messages. Each node and
data key is hashed using SHA-1 with length m. The node hashes are ordered
in a circle modulo m. Keys are assigned to the first node whose hash is
equal to or follows the hash of the key. In the circle analogy it is the
first node clockwise from the hash of the key. This node is termed the
successor to the key. Each node must maintain the successor and
predecessor nodes. The simplest method to lookup a key is to pass the
request around the entire circle. In order to achieve the O(log N) bound,
nodes must also maintain a finger table with exponentially increasing
successors. Correctness proofs and algorithms are also given for nodes
joining and leaving the system.

This paper is proof that centralized or hierarchical control of a
distributed system is not necessary to maintain efficient lookups. DNS,
for
example, could be restructured using Chord to eliminate the root servers
requirement. The concept of passing messages around a ring is very simple.
Even the idea of using a finger table to obtain usable efficiency is
simple. This must be considered a strength as implementation should not be
too challenging. In fact, the authors even supply code so that anyone can
use Chord as their routing substrate. This paper also gives proofs of the
stated bounds, which gives the paper teeth. The paper also provides
empirical simulations that backup the earlier proofs. The algorithm does
not break, only degrades performance, when a nodes information is
incorrect. It was also helpful to read the example applications for which
Chord can be employed: Cooperative mirroring, time-shared storage,
distributed indexes, and large-scale combinatorial search. I have thought
of several of these problems in the past and here is a paper that shows
how to solve (or at least route) them.

Chord does not exploit network locality, as stated in both the Chord and
Pastry papers (i.e. Chord does not attempt to reduce the distance that
messages need to travel). This is an important property for a routing
protocol to support. Another item, stated under future work, deserves
attention, even in this review. Malicious or buggy Chord nodes could
threaten the availability of data in the current Chord algorithm. I liked
the empirical analysis in the Pastry paper much better than the one in
Chord (i.e. The results were much easier to read in Pastry, possibly due
to the style of the graphs). With NAT many IP addresses could be the same.
Would it be a good idea to use the port number as part of the hash? This
is not particularly important. In the Pastry paper, we will also see a
less rigid definition for successors, which will help exploit properties
like network distance. It is hard to come up with further ideas as the
authors have stated many of them already.
Received on Mon Nov 07 2005 - 10:50:31 EST

This archive was generated by hypermail 2.2.0 : Mon Nov 07 2005 - 10:50:31 EST