(no subject) from Guoli Li on 2005-09-18 (mbox)

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Sun, 18 Sep 2005 20:32:24 -0400

This paper discusses the important properties of giant-scale services
that are exposed to failures and proposes metrics that help in analyzing
and designing these services. Brewer defines a basic model for
giant-scale services and explores availability, online evolution, and
growth, which are challenges in designing and analyzing real-world
services.

The key strength of this paper is that its systematic technique for
evaluating giant-scale services that are open to failures is very
practical. Brewer reduces complicated services to a basic architectural
model, on which he bases his discussion of the high-availability
requirement of giant-scale systems. He defines uptime, yield, harvest, and
the DQ value, and shows how to evaluate design approaches with these
metrics. He also addresses online evolution and growth by comparing
existing approaches using DQ analysis.
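
As a rough illustration of the metrics named above, the sketch below
computes yield (queries completed over queries offered), harvest (data
available over complete data), and the DQ value (data per query times
queries per second). The function names and the `after_node_loss` helper
are my own, not Brewer's, and the helper assumes identical nodes and a
system running at peak load, so that losing nodes removes a proportional
share of DQ capacity.

```python
def yield_fraction(completed, offered):
    """Yield: fraction of offered queries that were completed."""
    return completed / offered


def harvest_fraction(data_available, data_total):
    """Harvest: fraction of the complete data reflected in an answer."""
    return data_available / data_total


def dq_value(data_per_query, queries_per_second):
    """DQ value: data moved per query times queries served per second."""
    return data_per_query * queries_per_second


def after_node_loss(nodes, failed, replicated):
    """Idealized effect of losing `failed` of `nodes` at peak load.

    Assumption (hypothetical helper): capacity drops in proportion to the
    nodes lost. A replicated system keeps all data but serves fewer
    queries; a partitioned system keeps serving but loses part of the data.
    """
    surviving = (nodes - failed) / nodes
    if replicated:
        return {"harvest": 1.0, "yield": surviving}
    return {"harvest": surviving, "yield": 1.0}


if __name__ == "__main__":
    print(after_node_loss(4, 1, replicated=True))
    print(after_node_loss(4, 1, replicated=False))
```

Under these assumptions the same fault shows up as lost yield in the
replicated case and as lost harvest in the partitioned case, which is the
kind of comparison the DQ analysis is meant to support.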

The following problems are not fully addressed in this paper. First, the
approaches for giant-scale systems are workload-dependent. For example, a
partitioning solution may be optimized for one kind of workload but be
less efficient for another. Could the relative DQ be used for dynamic data
repartitioning? Second, simply denying expensive queries is not a good
solution. The data required by queries varies, and an expensive query may
occupy more DQ bandwidth and affect system performance. The load manager,
for instance, could split such a query into several inexpensive ones to
improve harvest and yield. Third, online evolution is extremely useful in
real service upgrades. Besides the three approaches discussed in the
paper, I’m also concerned about data updates and failures that occur
during the upgrade. If a data update request is processed on only one or
a few nodes, can data consistency be guaranteed among all the data
replicas?
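
The second point above, splitting an expensive query instead of denying
it, could look roughly like the sketch below. This is my own
illustration, not something the paper specifies: `split_query`, `admit`,
the per-sub-query cap, and the notion of a fixed DQ budget measured in
abstract "data units" are all assumptions.

```python
def split_query(data_units, max_units_per_subquery):
    """Split one expensive query into sub-query costs within the cap."""
    if data_units <= 0 or max_units_per_subquery <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(data_units, max_units_per_subquery)
    return [max_units_per_subquery] * full + ([rest] if rest else [])


def admit(subqueries, dq_budget):
    """Admit sub-queries in order while the (hypothetical) budget lasts.

    Returns the admitted sub-query costs and the resulting harvest, i.e.
    the fraction of the originally requested data that gets answered.
    """
    admitted = []
    remaining = dq_budget
    for cost in subqueries:
        if cost > remaining:
            break
        admitted.append(cost)
        remaining -= cost
    harvest = sum(admitted) / sum(subqueries)
    return admitted, harvest


if __name__ == "__main__":
    parts = split_query(10, 4)
    print(parts, admit(parts, dq_budget=8))
```

With a budget smaller than the full query, the whole query would be
denied outright, whereas the split version still returns a partial answer
with reduced harvest.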

Overall, the availability metrics and the DQ principle are valuable for
analyzing system performance under faults. Even though Brewer points out
that the analysis is based on a basic model and that the DQ principle
applies to data-intensive sites, the ideas and metrics are still useful
and thought-provoking for the design of more complicated giant-scale
services.
Received on Sun Sep 18 2005 - 20:32:36 EDT
