CSC 2231 Review of Total Recall

From: Jin Chen <jinchen_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 24 Nov 2005 03:20:12 -0500

This paper presents Total Recall, an automated architecture that adaptively
adjusts its redundancy policy. It uses replication for small files and
erasure coding for large files, and it uses eager repair for metadata and
lazy repair for file data. More importantly, it dynamically monitors host
availability and uses it to compute the degree of redundancy the system
needs.
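To make the redundancy computation concrete, here is a minimal Python sketch. It assumes each host is online independently with probability `mu` (the measured host availability), an assumption this review questions. For erasure coding it uses an exact binomial sum, which may differ from the approximation used in the paper. `SIZE_THRESHOLD`, the block count `m`, and all function names are hypothetical.

```python
import math

SIZE_THRESHOLD = 32 * 1024  # hypothetical small/large file cutoff in bytes, not from the paper


def _check(target: float, mu: float) -> None:
    # Guard against inputs for which no finite redundancy exists.
    if not (0 < target < 1 and 0 < mu < 1):
        raise ValueError("target and mu must lie strictly between 0 and 1")


def replicas_needed(target: float, mu: float) -> int:
    """Smallest k with 1 - (1 - mu)**k >= target (independent hosts assumed)."""
    _check(target, mu)
    k = 1
    while 1 - (1 - mu) ** k < target:
        k += 1
    return k


def erasure_availability(n: int, m: int, mu: float) -> float:
    """Probability that at least m of n fragments sit on online hosts."""
    return sum(math.comb(n, i) * mu**i * (1 - mu) ** (n - i) for i in range(m, n + 1))


def fragments_needed(target: float, mu: float, m: int) -> int:
    """Smallest n such that any m of n fragments meet the target availability."""
    _check(target, mu)
    n = m
    while erasure_availability(n, m, mu) < target:
        n += 1
    return n


def plan(size_bytes: int, target: float, mu: float, m: int = 4) -> tuple[str, int]:
    """Replicate small files; erasure-code large ones (hypothetical policy)."""
    if size_bytes < SIZE_THRESHOLD:
        return ("replication", replicas_needed(target, mu))
    return ("erasure", fragments_needed(target, mu, m))
```

With `mu = 0.5` and a 0.99 target, replication needs 7 copies, while erasure coding with `m = 4` needs 17 fragments, a redundancy factor of 4.25. This illustrates why erasure coding is attractive for large files.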

However, such a system has many parameters to tune. For example, the
probing period, the file size threshold, the number of blocks and fragments
used for erasure coding, and the repair threshold all affect the system's
performance. Moreover, computing the degree of redundancy from host
availability for a given target availability may be neither accurate nor
effective, because the system-wide average host availability cannot reflect
whether nodes are online during the same periods. Furthermore, host
availability can change quickly and is therefore hard to predict.
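The correlation concern can be illustrated with a small sketch that models each host's uptime as a hypothetical schedule of time slots (1 = online). Both groups have the same average host availability of 0.5. Yet a file replicated on all three hosts is available half the time in one case and all the time in the other. A formula that assumes independence would predict 1 - 0.5^3 = 0.875 for both. The schedules and function names are illustrative assumptions, not data from the paper.

```python
def mean_host_availability(schedules: list[list[int]]) -> float:
    """Average fraction of time slots in which each host is online."""
    return sum(sum(s) / len(s) for s in schedules) / len(schedules)


def file_availability(schedules: list[list[int]]) -> float:
    """Fraction of slots in which at least one replica's host is online."""
    slots = len(schedules[0])
    up = sum(1 for t in range(slots) if any(s[t] for s in schedules))
    return up / slots


# Hypothetical uptime schedules over four time slots.
aligned = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]    # all hosts online together
staggered = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]  # online periods barely overlap

for name, group in [("aligned", aligned), ("staggered", staggered)]:
    print(name, mean_host_availability(group), file_availability(group))
```

Running it prints `aligned 0.5 0.5` and `staggered 0.5 1.0`. The average alone does not determine the file's availability.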

This paper mentions both transient and long-term failures, but it is
unclear how Total Recall distinguishes between them. It seems that host
availability is measured based on long-term failures. As the authors
acknowledge, the system does not handle large-scale unpredicted failures;
this greatly weakens the impact of the paper, since such failures could
happen frequently in P2P systems. In addition, the protocol for keeping
metadata strictly consistent could lead to long delays during file
maintenance.

Received on Thu Nov 24 2005 - 03:20:19 EST