review of Total Recall

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 24 Nov 2005 00:19:08 -0500

This paper presents a peer-to-peer storage system called TotalRecall. The
goal of TotalRecall is a distributed storage system built from highly
unavailable components. TotalRecall treats availability as a first-class
property of a p2p storage system. It allows users to specify an
availability target for a particular object (file) and uses an automated
availability management approach to meet that target in a dynamically
changing p2p environment.
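
To make the idea of an availability target concrete, here is a minimal
sketch of how a replica count could be derived from a target. It assumes
whole-file replication, independent host failures, and a single average
host availability; the function name and parameters are hypothetical and
are not taken from the paper.

```python
import math


def replicas_needed(host_availability: float, target: float) -> int:
    """Smallest replica count c such that 1 - (1 - a)^c >= target.

    Assumes every host is up independently with probability a
    (host_availability). Both arguments must lie strictly between 0 and 1.
    """
    if not 0.0 < host_availability < 1.0:
        raise ValueError("host_availability must be in (0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    # Solve (1 - a)^c <= 1 - target for c and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - host_availability))


# Hosts that are up half the time need 7 copies for a 99% target.
print(replicas_needed(0.5, 0.99))
```

Under these assumptions, the less available the hosts are, the more copies
a given target costs, which is why the choice of redundancy mechanism
matters.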

This paper describes the architecture, operation and evaluation of the
TotalRecall storage system. The key observation of the paper is that p2p
systems are highly dynamic in two ways: short-term variation and long-term
decay. Systems can take advantage of these two levels of availability by
using extra redundancy to mask short-term variation, which reduces the
overhead of maintaining redundant data. The automated availability
management has three components: availability prediction, redundancy
management, and dynamic repair. Availability prediction measures host
availability characteristics and predicts the future behavior of hosts.
Redundancy management tolerates transient host downtime by deciding which
redundancy mechanism is appropriate. Dynamic repair tolerates permanent
failures. TotalRecall proposes two repair schemes: eager repair and lazy
repair. Eager repair repairs data as soon as hosts leave the p2p overlay.
It is simple, but has high repair overhead. Lazy repair uses extra
redundancy to defer repair until it is necessary. It is more efficient,
but it must maintain state recording which hosts store which objects. The
results show that eager repair is better for small files, while lazy
repair works well with larger files.
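
The difference between the two repair schemes can be sketched as two
trigger conditions. This is only an illustration under simplifying
assumptions: redundancy is counted in whole copies, and the names
ObjectState, placed, required and live are hypothetical, not the paper's.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    placed: int    # copies originally written (required + extra redundancy)
    required: int  # copies needed to meet the availability target
    live: int      # copies on hosts currently believed to be reachable


def eager_should_repair(state: ObjectState) -> bool:
    # Eager: react to every departure, so any missing copy triggers repair.
    return state.live < state.placed


def lazy_should_repair(state: ObjectState) -> bool:
    # Lazy (assumed rule): wait until the extra copies are used up.
    return state.live <= state.required


one_host_left = ObjectState(placed=8, required=4, live=7)
print(eager_should_repair(one_host_left), lazy_should_repair(one_host_left))
```

With one host gone, the eager rule repairs immediately while the lazy rule
still has slack, which is where the lower repair overhead comes from. The
cost is that someone has to track the live count per object.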

However, this paper has two limitations. First, it assumes that host
failures are independent. This is not always true in practice. Host
failures may be correlated, for instance because of Internet worms. This
may affect the performance of the mechanisms presented in this paper.
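
A small worked example shows why the independence assumption matters. The
model below is my own simplification, not something from the paper: with
probability shock_probability a single event (say, a worm) takes down
every host holding a copy, and otherwise hosts fail independently.

```python
def unavailability(host_availability: float, copies: int,
                   shock_probability: float = 0.0) -> float:
    """Probability that no copy of an object is reachable.

    Hypothetical common-shock model: either one correlated event removes
    all copies at once, or each host is down independently with
    probability 1 - host_availability.
    """
    independent = (1.0 - host_availability) ** copies
    return shock_probability + (1.0 - shock_probability) * independent


# Independent failures only: extra copies keep helping.
print(unavailability(0.5, 7), unavailability(0.5, 50))
# With a 1% correlated event, unavailability never drops below 1%.
print(unavailability(0.5, 7, 0.01), unavailability(0.5, 50, 0.01))
```

In this model the correlated term puts a floor under unavailability that
no amount of added redundancy removes, so a redundancy level computed
under independence would overstate the availability actually delivered.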

Second, a general assumption in p2p systems is that peers have equal
capabilities. In reality, however, nodes differ in storage capacity,
memory, bandwidth and network delay. It is important to exploit this
resource heterogeneity to improve system performance and reliability.

An interesting direction may be to reduce the number of repairs by placing
replicas of a file on peers that are likely to be online when the file is
requested, provided that file request patterns and host join/leave
behavior can be predicted from historical data.
Received on Thu Nov 24 2005 - 00:19:19 EST