Glacier

From: Jing Su <jingsu_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 24 Nov 2005 09:41:07 -0500

This paper presents Glacier, a system designed to provide strong
availability guarantees even in the presence of large numbers of
correlated failures. The interesting aspect of Glacier's design is
that its creators boldly decided to sacrifice storage efficiency,
significantly, for the targeted availability.
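For concreteness, this availability-for-storage tradeoff can be sketched with a toy erasure-coding model. The parameters below (48 fragments, any 5 sufficient to reconstruct, a 60% correlated failure) are illustrative assumptions for this sketch, not Glacier's actual configuration:

```python
from math import comb

def availability(n, r, p_fail):
    """Probability that at least r of n independently stored fragments
    survive, given per-fragment failure probability p_fail.
    (Toy model: failures are assumed independent across fragments.)"""
    return sum(comb(n, k) * (1 - p_fail)**k * p_fail**(n - k)
               for k in range(r, n + 1))

# With n=48 fragments, any r=5 enough to reconstruct, and a failure
# event killing 60% of fragments (p_fail=0.6), the object almost
# certainly survives -- but storage overhead is n/r = 48/5 ~ 9.6x.
print(availability(48, 5, 0.6))
```

Under this model, availability stays near 1 even through a massive correlated failure, which is exactly the regime the paper targets, at the cost of nearly an order of magnitude in raw storage.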

However, Glacier depends on an environment that, I think, makes its
application less interesting. First, Glacier cannot be used by an
arbitrarily large group of peers. Glacier assumes that nodes stay
connected for significantly long periods of time (on the order of
many months) and have well-connected networks. Thus Glacier is most
applicable within an organization.

The application interface to Glacier is similar to that of tuple
storage systems like T-Spaces. Object lifetimes are controlled by
leases, which must be refreshed or else the objects become eligible
for garbage collection. Though the lease periods can be set long,
applications are responsible for refreshing these leases. However,
the paper does suggest that it is possible to set the lease period to
infinity. I suspect that most people, like myself, would simply
default to never letting leases expire. Thus the set of data which
Glacier must maintain only grows.
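The application-side burden this implies can be sketched as a periodic refresh loop. The `store.refresh(key, lease_secs)` call here is a hypothetical client API I made up for illustration, not Glacier's actual interface:

```python
class LeaseRefresher:
    """Client-side helper that keeps a set of objects alive by
    periodically renewing their leases. Any key the application
    forgets to track (or stops ticking for) eventually expires and
    becomes garbage-collectable by the store."""

    def __init__(self, store, lease_secs):
        self.store = store          # hypothetical lease-based store
        self.lease_secs = lease_secs
        self.keys = set()

    def track(self, key):
        self.keys.add(key)

    def tick(self):
        # Must be called well before lease_secs elapses.
        for key in self.keys:
            self.store.refresh(key, self.lease_secs)
```

The point of the sketch is that liveness of the data depends on this loop running forever, which is why defaulting to an infinite lease is so tempting.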

The effect of this is shown in Figure 9, which illustrates the large
amount of data Glacier manages to garbage-collect. In this
experiment, the lease time for all objects was 30 days, which even by
mail-storage standards is very short.

As for the load study, I'm not sure I learned anything from these
graphs; maybe it's just me. In the evaluation, they state that their
store held approximately 8 GB worth of email messages. To me this
doesn't seem like much data to play with, considering the
considerable complexity of running Glacier. At this data size, in
Section 7.6 they state that their recovery mechanism took
approximately one hour to recover from a nearly 60% failure scenario.
I still don't understand how this cost is expected to extrapolate to
recovery when the dataset is larger. I also don't understand the
significance of presenting graphs with message counts along the Y
axis. Nor am I sure how to compare Figures 14 and 16 in order to
contrast the effects of diurnal versus uncorrelated failures.
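The simplest extrapolation one could attempt is a crude linear scaling from the single data point the paper reports (~8 GB recovered in ~1 hour after a ~60% failure), assuming recovery is bandwidth-bound and scales with data volume; whether that assumption holds for Glacier is exactly what the paper leaves unclear:

```python
def recovery_hours(dataset_gb, baseline_gb=8.0, baseline_hours=1.0):
    """Back-of-envelope linear extrapolation from the paper's one
    reported data point. Purely illustrative: it ignores per-object
    overheads, aggregation effects, and network topology, any of
    which could make real recovery scale worse than linearly."""
    return baseline_hours * dataset_gb / baseline_gb

print(recovery_hours(800))   # 100.0 hours under this crude model
```

Even under this generous model, a 100x larger dataset would take days to recover, which is why a measurement at only 8 GB tells us so little.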