Review - Understanding Availability

From: Ian Sin <ian.sinkwokwong_REMOVE_THIS_FROM_EMAIL_FIRST_at_utoronto.ca>
Date: Mon, 14 Nov 2005 01:03:56 -0500

This paper studies the availability of hosts in the Overnet P2P network,
a more complex matter than availability in other systems because of the
nature of P2P networks. It shows that IP aliasing (due to NAT and DHCP)
is a major factor in availability studies: identifying hosts by IP
address underestimates availability and overestimates the number of
hosts in the system. The study also shows that host availability is
almost entirely independent across hosts, that there is some time-of-day
dependence, and that the churn rate in the system is significant.
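
To make the aliasing effect concrete, here is a minimal sketch in Python.
The probe log, host IDs, and addresses are made up for illustration and
are not taken from the paper; the point is only that counting by IP
address splits one always-on host into two half-available "hosts".

```python
from collections import defaultdict

# Hypothetical probe log: (probe_round, host_id, ip_address) for every
# host that answered a probe. Host "A" is up in all 6 rounds but gets a
# new address halfway through (e.g. a DHCP lease change). Host "B" is up
# in 3 of the 6 rounds and keeps one address.
ROUNDS = 6
responses = [
    (0, "A", "10.0.0.1"), (1, "A", "10.0.0.1"), (2, "A", "10.0.0.1"),
    (3, "A", "10.0.0.7"), (4, "A", "10.0.0.7"), (5, "A", "10.0.0.7"),
    (0, "B", "10.0.0.2"), (2, "B", "10.0.0.2"), (4, "B", "10.0.0.2"),
]


def availability(records, key_index, rounds):
    """Fraction of probe rounds in which each key (host ID or IP) answered."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key_index]].add(record[0])
    return {key: len(hits) / rounds for key, hits in seen.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


by_host = availability(responses, 1, ROUNDS)  # keyed by stable host ID
by_ip = availability(responses, 2, ROUNDS)    # keyed by IP address

print(len(by_host), mean(by_host.values()))  # 2 hosts, mean availability 0.75
print(len(by_ip), mean(by_ip.values()))      # 3 apparent hosts, mean 0.5
```

Keyed by IP address, the same log reports more hosts and a lower mean
availability than it does when keyed by a stable host identifier.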

The strength of this paper is that it shows how the methodology used in
some previous studies to determine availability might have
underestimated availability in P2P systems. The study is also the first
to show the rate of arrival and rate of departure in the Overnet P2P
network, which it claims significantly affects the design of a P2P
system architecture, especially for systems that base the amount of
replication on availability.
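
As a rough illustration of why an underestimate matters for replication,
consider a simple textbook model (my assumption, not the paper's): each
replica sits on a host that is up independently with probability a, so k
replicas give object availability 1 - (1 - a)^k. The function name and
the numbers below are hypothetical.

```python
import math


def replicas_needed(host_availability, target):
    """Smallest k with 1 - (1 - a)^k >= target, assuming independent hosts."""
    if not (0 < host_availability < 1 and 0 < target < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - host_availability))


# Illustrative values only: a measured host availability of 0.5 versus a
# true value of 0.75, with a 99% target for the stored object.
print(replicas_needed(0.5, 0.99))   # 7
print(replicas_needed(0.75, 0.99))  # 4
```

Under this model, a system sized from the lower availability figure
would store almost twice as many replicas as it needs.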

The main shortcoming of this paper is that it studied a relatively
small, fixed set of hosts (2400). It is unclear whether this set is
representative of the overall Overnet system. Also, their method of
crawling the network to get a snapshot is not particularly convincing. I
believe the crawl will not give an accurate picture of the network, and
this will introduce some inaccuracies, though I can't quantify them. I am
also left wondering what the overhead of the prober is (mentioned in
Section 3.2) and on what grounds probing every 20 minutes is deemed
"good enough".
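
One way to frame the probing-interval question is to ask how many
sessions fall entirely between two probes. The sketch below uses made-up
session times (in minutes) and assumes probes fire at exact multiples of
the period; neither comes from the paper.

```python
def missed_sessions(sessions, period):
    """Count sessions [start, end) that contain no probe time.

    Probes are assumed to fire at t = 0, period, 2 * period, ...
    """
    missed = 0
    for start, end in sessions:
        # First probe at or after the session start (integer ceiling).
        first_probe = -(-start // period) * period
        if first_probe >= end:
            missed += 1
    return missed


# Hypothetical sessions as (start_minute, end_minute) pairs.
sessions = [(5, 15), (18, 45), (61, 79), (100, 140)]

print(missed_sessions(sessions, 20))  # 2: (5, 15) and (61, 79) are never seen
print(missed_sessions(sessions, 5))   # 0
```

Any session shorter than the probing period can be missed this way, so
the choice of period bounds how much churn the measurement can observe.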

A future direction of research would be a study of how these results
apply to other P2P systems. Does the nature of the P2P application (e.g.
file sharing vs. file storage vs. DNS records) change the results?
Presumably DNS peers will be up most of the time. Does geography change
the results? For example, developing countries rely mostly on dial-up
connections, and it is unclear whether the rate of 6.4 joins/leaves
accounts for this. Does architecture change the results, e.g. DHT vs.
KaZaA vs. Gnutella? How can we now use these results to design better
P2P systems?
Received on Mon Nov 14 2005 - 01:04:15 EST
