review of Pick Two

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Sun, 13 Nov 2005 23:01:05 -0500

This paper argues that the bandwidth needed to process queries and maintain
file redundancy in p2p systems is the key limitation of large-scale p2p
storage. The authors propose a simple resource usage model of the bandwidth
required for redundancy maintenance. Based on this model, they distinguish
between short-term and long-term offline peers, and compare the redundancy
needed by replication and by erasure coding. Their results show that
erasure codes use less storage and bandwidth than replication to achieve
the same availability.
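The replication versus erasure coding comparison can be illustrated with the
standard independent-availability formulas. With k full replicas, a file is
readable unless all k hosts are offline. With an erasure code that splits a
file into n fragments, any m of which suffice, the file is readable when at
least m fragments are online. This is a minimal sketch, not the paper's
code. The per-host availability a = 0.5, the 0.999 target, and m = 8 are
hypothetical values chosen for illustration. For each scheme it searches for
the smallest redundancy that meets the target and reports the storage
overhead, which is k for replication and n/m for erasure coding.

```python
from math import comb


def replication_availability(a: float, k: int) -> float:
    """P(file readable) with k full copies on independent hosts, each online with prob. a."""
    return 1.0 - (1.0 - a) ** k


def erasure_availability(a: float, n: int, m: int) -> float:
    """P(file readable) when any m of n fragments suffice to rebuild it."""
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(m, n + 1))


def min_replicas(a: float, target: float) -> int:
    """Smallest replica count k that reaches the availability target."""
    k = 1
    while replication_availability(a, k) < target:
        k += 1
    return k


def min_fragments(a: float, m: int, target: float) -> int:
    """Smallest fragment count n (for a fixed m) that reaches the availability target."""
    n = m
    while erasure_availability(a, n, m) < target:
        n += 1
    return n


if __name__ == "__main__":
    a = 0.5         # hypothetical per-host availability
    target = 0.999  # hypothetical availability goal ("three nines")
    m = 8           # hypothetical code parameter: any 8 fragments rebuild the file

    k = min_replicas(a, target)
    n = min_fragments(a, m, target)
    # Storage overhead = bytes stored per byte of user data.
    print(f"replication : {k} copies, storage overhead {k:.2f}x")
    print(f"erasure code: {n} fragments ({m} needed), storage overhead {n / m:.2f}x")
```

For these illustrative parameters, the erasure code meets the target with far
less storage overhead than ten full copies. If repair traffic is taken to be
proportional to the bytes stored, which is a simplification, maintenance
bandwidth shrinks by the same ratio. Both formulas rely on the independence
assumption questioned in the next paragraph.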

This work makes several assumptions that are common in p2p systems, such as
identical peers, a constant rate of peers joining and leaving, and
independence of peer behavior. We have discussed the limitations of these
assumptions. I want to discuss another assumption here: the paper assumes
that files stored in the p2p system are completely static. This is not true
in real systems, where files may be updated. Extra bandwidth is then needed
to propagate each update and to maintain consistency among the replicas.
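The following is a rough estimate of the extra update traffic under loudly
simplified assumptions. Every update is assumed to rewrite the changed bytes
in every stored copy, so the traffic scales with the redundancy factor: k for
replication, n/m for erasure coding. Real erasure-coded updates may cost more
or less depending on how the code lays out fragments. Consistency-protocol
messages are ignored. The `Workload` type, the update rate, and the file
sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    files: int                     # number of stored files
    file_bytes: int                # size of one file
    redundancy: float              # storage overhead: k (replication) or n/m (erasure coding)
    updates_per_file_day: float    # hypothetical average update rate per file
    changed_fraction: float = 1.0  # hypothetical fraction of a file rewritten per update


def update_bytes_per_day(w: Workload) -> float:
    """Extra network bytes per day to push updates to every stored copy.

    Simplification: the changed bytes are rewritten once per unit of redundancy.
    Messages needed to keep replicas consistent are not counted.
    """
    changed = w.files * w.file_bytes * w.changed_fraction * w.updates_per_file_day
    return changed * w.redundancy


if __name__ == "__main__":
    # Hypothetical workload: one million 1 MB files, each updated on average
    # once every ten days.
    for label, r in [("replication, k=10", 10.0), ("erasure code, n/m=4", 4.0)]:
        w = Workload(files=1_000_000, file_bytes=1_000_000, redundancy=r,
                     updates_per_file_day=0.1)
        print(f"{label}: {update_bytes_per_day(w) / 1e9:.0f} GB/day of update traffic")
```

This traffic comes on top of the churn-driven repair traffic in the paper's
model, which assumes static files and therefore does not count it.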

When estimating the number of hosts in a p2p system, most measurements
overestimate it because of DHCP and NATs, which can cause a single host to
appear under multiple IP addresses. Ranjita Bhagwan et al. point out this
problem and argue that peer probing should use host IDs rather than IP
addresses. Because an overestimated host count spreads the maintenance load
over more nodes than actually exist, the model may underestimate the
bandwidth each node needs.
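Counting distinct IP addresses inflates the host population when addresses
change between probes. The sketch below uses a made-up probe log in which
each machine also reports a stable host ID. The host-ID idea follows
Bhagwan et al., but the log format, function names, and traffic figure are
hypothetical. It shows how an inflated host count shrinks the per-node
bandwidth estimate when total maintenance traffic is divided evenly among
nodes.

```python
from typing import List, Tuple

# Hypothetical probe log: (probe_round, observed_ip, stable_host_id).
# Host "A" gets a new DHCP lease every round; host "B" keeps its address.
PROBES: List[Tuple[int, str, str]] = [
    (0, "192.0.2.1", "A"),
    (0, "192.0.2.2", "B"),
    (1, "192.0.2.3", "A"),
    (1, "192.0.2.2", "B"),
    (2, "192.0.2.4", "A"),
]


def count_by_ip(probes: List[Tuple[int, str, str]]) -> int:
    """Population estimate when every distinct IP address is treated as a host."""
    return len({ip for _, ip, _ in probes})


def count_by_host_id(probes: List[Tuple[int, str, str]]) -> int:
    """Population estimate when hosts are identified by a stable host ID."""
    return len({host for _, _, host in probes})


def per_node_bandwidth(total_bytes_per_day: float, n_nodes: int) -> float:
    """Maintenance traffic each node carries if the load is spread evenly."""
    return total_bytes_per_day / n_nodes


if __name__ == "__main__":
    total = 1e12  # hypothetical system-wide maintenance traffic, bytes/day
    n_ip, n_id = count_by_ip(PROBES), count_by_host_id(PROBES)
    print(f"hosts by IP: {n_ip}, by host ID: {n_id}")
    print(f"per-node bandwidth (IP count)     : {per_node_bandwidth(total, n_ip) / 1e9:.0f} GB/day")
    print(f"per-node bandwidth (host-ID count): {per_node_bandwidth(total, n_id) / 1e9:.0f} GB/day")
```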

The authors apply the same redundancy scheme to all files stored in the
system, but this is not necessary. For example, popular files are
replicated naturally as users request them and become widely distributed in
the network, while unpopular files receive few requests and do not need
many replicas. Therefore, the required redundancy should depend on a file's
popularity, which can be measured from user request traces. We can reduce
bandwidth usage by maintaining only the redundancy each file actually
needs.
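Here is a minimal sketch of the popularity-aware policy suggested above, in
which every parameter is hypothetical. A request rate from a trace selects an
availability target. Copies that users already hold are credited against the
replica count, and the system repairs only the remainder while keeping at
least one copy of its own. Crediting user-held copies assumes those hosts are
as available and as independent as storage nodes. That is an assumption, not
a measured fact.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    requests_per_day: float  # measured from a user request trace
    cached_copies: int       # copies users already hold after downloading the file


def target_availability(requests_per_day: float) -> float:
    # Hypothetical policy: rarely requested files get a lower availability target.
    return 0.99 if requests_per_day < 1.0 else 0.999


def replicas_needed(host_availability: float, target: float) -> int:
    """Smallest k with 1 - (1 - a)^k >= target, assuming independent hosts."""
    k = 1
    while 1.0 - (1.0 - host_availability) ** k < target:
        k += 1
    return k


def maintained_replicas(f: FileStats, host_availability: float) -> int:
    """Replicas the system itself must repair after crediting user-held copies.

    At least one system-maintained copy is kept because users may delete theirs.
    """
    k = replicas_needed(host_availability, target_availability(f.requests_per_day))
    return max(1, k - f.cached_copies)


if __name__ == "__main__":
    a = 0.5  # hypothetical per-host availability
    files = [
        FileStats("popular.iso", requests_per_day=500.0, cached_copies=40),
        FileStats("rare.pdf", requests_per_day=0.1, cached_copies=0),
    ]
    uniform = replicas_needed(a, 0.999) * len(files)
    adaptive = sum(maintained_replicas(f, a) for f in files)
    print(f"uniform policy: {uniform} maintained replicas, adaptive policy: {adaptive}")
```

With these made-up numbers, the adaptive policy maintains far fewer replicas
than it would by applying the uniform 0.999 target to both files.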
Received on Sun Nov 13 2005 - 23:01:16 EST
