review of Glacier

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 24 Nov 2005 10:08:56 -0500

Most p2p systems assume that peers fail independently, which is not
realistic. This paper presents Glacier, a distributed storage system
designed to survive large-scale correlated failures. It adds extra
redundancy so that stored files remain available with high probability
even under correlated failures. The authors try to achieve this goal
while keeping storage overhead, bandwidth consumption and security
requirements to a minimum.

Glacier uses erasure codes and garbage collection to limit storage
overhead. When a new object is inserted, Glacier applies an erasure code
to it, attaches a manifest containing the hashes of the fragments, and
sends each fragment to a different node. Glacier also performs periodic
maintenance: a node asks a peer for its list of fragments and compares it
with its own local list in order to detect and recover any missing
fragments. Garbage collection reclaims storage that is no longer in use.
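
The insertion and maintenance steps described above can be sketched
roughly as follows. This is only an illustration and not Glacier's actual
code: a toy XOR parity code (k data fragments plus one parity fragment,
tolerating a single loss) stands in for the real erasure code, and all
function names are hypothetical.

```python
import hashlib
from functools import reduce


def encode(data: bytes, k: int) -> list[bytes]:
    """Toy stand-in for an erasure code: k data fragments + 1 XOR parity."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\0")       # pad so fragments are equal-sized
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))
    return frags + [parity]


def make_manifest(fragments: list[bytes]) -> list[str]:
    """Manifest: one hash per fragment, so a holder can verify what it stores."""
    return [hashlib.sha256(f).hexdigest() for f in fragments]


def verify(fragment: bytes, index: int, manifest: list[str]) -> bool:
    """Check a fragment against the hash recorded in the manifest."""
    return hashlib.sha256(fragment).hexdigest() == manifest[index]


def missing_keys(local_keys: set[str], peer_keys: set[str]) -> set[str]:
    """Maintenance step: keys the peer lists that are absent from the local list."""
    return peer_keys - local_keys


def recover(surviving: dict[int, bytes]) -> bytes:
    """With the XOR toy code, the one missing fragment is the XOR of the rest."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*surviving.values()))


fragments = encode(b"glacier review", k=4)
manifest = make_manifest(fragments)
# Pretend fragment 2 was lost, then rebuild it from the other four.
surviving = {i: f for i, f in enumerate(fragments) if i != 2}
rebuilt = recover(surviving)
```

A real deployment would use a code that tolerates the loss of many
fragments; the XOR code is used here only because it fits in a few lines.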

Messaging cost is reduced by aggregating small objects and by using a
loosely coupled maintenance protocol for redundant fragments. The paper
also discusses possible attacks and corresponding defenses. However, the
more complex a system is, the more vulnerable it is to attacks, and the
proposed defenses may themselves add to the complexity of the system.

For failure recovery, Glacier assumes that the underlying DHT can still
provide sufficient routing even under large-scale correlated failures.
Otherwise, recovery is hard. The DHT must therefore reorganize itself
quickly enough after such a failure for recovery to proceed.

Trading redundancy for availability under large-scale failures is
expensive. Although the overhead is optimized in terms of storage and
bandwidth, it remains high. Many of the techniques used in Glacier run
periodically, so the choice of period matters for system performance.
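
To make the cost of this trade-off concrete, here is a rough
back-of-the-envelope model. It rests on assumptions of my own: an object
is encoded into n fragments, any r of them suffice to rebuild it, and each
fragment is lost independently with probability f during a failure. The
parameter values below are purely illustrative.

```python
from math import comb


def survival_probability(n: int, r: int, f: float) -> float:
    """P(at least r of n fragments survive) when each is lost with probability f."""
    return sum(comb(n, k) * (1 - f) ** k * f ** (n - k) for k in range(r, n + 1))


def storage_overhead(n: int, r: int) -> float:
    """Bytes stored per byte of original data (manifests ignored)."""
    return n / r


# Illustrative values only: 48 fragments, any 5 rebuild the object,
# and 60% of the fragments are lost in the failure.
n, r, f = 48, 5, 0.6
print("storage overhead:", storage_overhead(n, r))
print("survival probability:", survival_probability(n, r, f))
```

Under this model, raising n improves the survival probability but the
storage overhead n/r grows linearly with it, which is the expense noted
above.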
Received on Thu Nov 24 2005 - 10:09:04 EST
