Pastry Review from Troy Ronda on 2005-11-07 (mbox)

From: Troy Ronda <ronda_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Mon, 7 Nov 2005 10:53:10 -0500

Pastry Review
Review by: Troy Ronda

The first generation of P2P applications, such as Napster, still used a
centralized database. The next generation of P2P decentralizes the
database. Each node maintains a portion of the data (and their keys). Each
node has identical capabilities and responsibilities and all communication
is symmetric. One of the key challenges (also stated in Chord) is to
provide an efficient algorithm for data lookups and routing. Pastry
provides a substrate for P2P data lookups that other applications can
build upon. Each node, in Pastry, has a unique numeric 128-bit identifier;
it is assigned randomly (or from a hashing function or any other
application defined method) when a node joins. Each key also has a numeric
identifier. Pastry routes messages to the node that has the closest
numeric value to a key. That is: during routing, a node will forward a
data message to a node whose identifier shares a prefix with the data key
at least one digit longer than the current node. If this is not possible
then the message is forwarded to another node with the same prefix but
whose identifier is numerically closer to the data key. It is expected
that the number of routing steps is O(log N), like Chord. In order to
accomplish this routing, Pastry maintains a routing table organized by
each n-length prefix of the nodes identifier along with the possible
entries for the n+1 digit and an IP address will be the value for each of
these entries. It also maintains a leaf set of the closest identifiers
above and below the current nodes identifier. Pastry also maintains a
closest-neighbor list so that network distance can also be used for
routing. The routing algorithm first checks if the key is in the range of
the leaf set. If so, the message is forwarded to the node in this leaf set
with numeric identifier value closest to the key. If not, it is sent to
the closest entry in the routing table. The self-organization is similar
to Chord, in that: the leaf, routing and neighborhood tables are based on
an existing node in the network. The difference is that Pastry also
updates its routing table, based also on information from nodes in its
neighborhood set, to better reflect some application defined network
proximity.

The Pastry method seems to have more flexibility than the Chord method
because "successor" (using this Chord term loosely, of course) is not so
rigidly defined with the Pastry identifiers. Pastry, for example, takes
into account network locality by adjusting nodes in its routing table.
Chord cannot handle this problem because there can only be one successor.
Properties, such as the identifier creation and network proximity metric,
are defined by the application using the Pastry substrate. This is another
example of its good flexibility. The paper also discusses malicious hosts
and suggests randomizing the selection of the node (from the set of nodes
with the closest prefix) as a possible solution. The empirical results
provided in this paper are very strong and intuitive. I had a much easier
time following these results than the Chord ones.

One of the weaknesses in this paper is the lack of proofs. No proof, for
example, is given for routing performance: only "it can be shown that".
Also
no proof is given for the network proximity performance, which I view as
an important difference between Chord and Pastry. A stronger experiment
proving that Pastry outperforms Chord for network proximity should be
considered. It would also be interesting if keys could be redistributed
due to demand. Perhaps some area of the network is using values more than
another area (according to the proximity values). This redistribution
should be possible, partly due to the flexibility of identifiers in
Pastry. It would also be good to know why the identifiers could not be
based more on the proximity value (yes, it is application defined but a
discussion would be nice). Granted load balancing the values is important
for identifier selection but perhaps some combination could be used (this
is a rough idea). The Pastry method is also not as intuitive to me as
Chord, not that this is a weakness but interesting to point out. I also
think the authors should have spent more time on the parameter selections.
Most of the experiments used the same set of parameters and suddenly one
of the graphs has a different set.
Received on Mon Nov 07 2005 - 10:53:20 EST

This archive was generated by hypermail 2.2.0 : Mon Nov 07 2005 - 10:53:21 EST