Understanding Availability Review from Troy Ronda on 2005-11-14 (mbox)

From: Troy Ronda <ronda_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Mon, 14 Nov 2005 10:49:53 -0500

Understanding Availability Review
Review by: Troy Ronda

Systems fail in the real world. They can fail for many reasons, including
configuration error, user error, and hardware error. Highly available
systems must tolerate each type of error and be able to recover from it.
Peer-to-peer storage systems must model the poorly understood
availability of each member host. Each member host, for example, can
become unavailable through software failure, partial or total
communication failure, or a user's decision to leave the system. P2P
storage systems nevertheless aim to provide efficient, highly available
file storage. This paper focuses on systems composed of hosts with highly
variable and relatively poor availability. The authors ran an experiment
that crawled host identifiers in the Overnet system to determine which
hosts are in the system at a given time. They found that IP aliasing is
a problem for previous studies. That is, a single host can be associated
with several IP addresses, so availability results based on IP address
underestimate true host availability. This can create design problems
because an underestimate of availability will cause systems to
over-replicate in
compensation. The authors also found that host availability decreases over
time. This creates a motivation to periodically refresh the files in the
system. They also found that the number of hosts in the system depends on
the time of day. In fact, individual hosts join or leave the system
multiple times per day. This can be a problem for systems that actively
replicate on join. The authors also found that a host's availability is
largely independent of that of other hosts in the system. The arrival
and departure rates in Overnet during their study were approximately the
same. Hence, the size of Overnet remained nearly constant over their
study.
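
As a rough illustration of the aliasing and over-replication points above,
here is a minimal Python sketch. The probe log, the host and IP names, and
the 99% target are all made up for illustration and are not taken from the
paper. The replica count uses the textbook formula 1 - (1 - a)^k >= target,
which assumes hosts fail independently; it is not a model from the paper.

```python
import math
from collections import defaultdict

# Hypothetical probe log (not from the paper): (probe_round, host_id, ip).
# One host is online in 6 of 8 rounds and changes IP address halfway through.
TOTAL_ROUNDS = 8
observations = [
    (0, "h1", "10.0.0.1"),
    (1, "h1", "10.0.0.1"),
    (2, "h1", "10.0.0.1"),
    (3, "h1", "10.0.0.2"),
    (4, "h1", "10.0.0.2"),
    (5, "h1", "10.0.0.2"),
]


def availability_by(key_index, obs, total_rounds):
    """Fraction of probe rounds in which each key (host ID or IP) was seen."""
    seen = defaultdict(set)
    for row in obs:
        seen[row[key_index]].add(row[0])
    return {key: len(rounds) / total_rounds for key, rounds in seen.items()}


def replicas_needed(host_availability, target):
    """Smallest k with 1 - (1 - a)^k >= target, assuming independent hosts."""
    if not 0 < host_availability < 1 or not 0 < target < 1:
        raise ValueError("availability and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - host_availability))


by_id = availability_by(1, observations, TOTAL_ROUNDS)  # keyed by host ID
by_ip = availability_by(2, observations, TOTAL_ROUNDS)  # keyed by IP address

print("availability by host ID:", by_id)
print("availability by IP:     ", by_ip)
print("replicas for 99% using host ID:", replicas_needed(by_id["h1"], 0.99))
print("replicas for 99% using IP:     ", replicas_needed(by_ip["10.0.0.1"], 0.99))
```

In this toy log the same host looks 75% available when tracked by
identifier but only 37.5% available per IP address, and a system sized
from the per-IP figure would store more than twice as many replicas.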

The main strength of this paper is the demonstration of IP aliasing. This
is important because several previous studies relied on IP addresses to
estimate availability. It is surprising (and interesting) that 40% of
hosts used more than one IP address in just one day. The authors also
demonstrate that the availability characteristics of IP addresses and host
identifiers are very different. This is important because previous studies
may have underestimated the usefulness of P2P for highly available
storage. The presented data and graphs are useful to my understanding of
P2P availability. I found it interesting that time of day had an impact on
availability. I would have liked to know the specific times at which it
had an effect. Does it happen, for example, when people turn their computers
off at night, or when they go to work, or both?

Ironically, I found the authors' definition of availability to be fuzzy.
They quote a dictionary definition of availability but stop short of
giving a precise one. This caused problems when I attempted to read the
graphs later in the paper. Does availability in these graphs mean the
percentage of time that each host is up? This became my assumption. I
found the paper poorly written, and I wonder whether I even understand
availability better. I have several unanswered questions right now. Will
crawling always pick up every host? Why is four hours the correct
granularity for the crawls? When do new random identifiers become assigned
in Overnet? What is the probability of a collision when randomly generating
an ID? I found the summary to be weak; no model is actually provided.
Future work should determine whether P2P systems are actually suitable for
highly available storage, as I do not think this question is
answered in this paper. Where is the justification that all P2P networks
will perform in a similar manner to Overnet?
Received on Mon Nov 14 2005 - 10:50:00 EST