[CSC2231] Paper Review: Lessons from Giant-Scale Services

From: Kai Yi Kenneth Po <kpo_REMOVE_THIS_FROM_EMAIL_FIRST_at_eecg.toronto.edu>
Date: Mon, 19 Sep 2005 05:43:06 -0400

This paper proposes the harvest and yield metrics and the DQ
Principle for reasoning about the availability of a service in
situations such as partial failures and system upgrades.

This paper offers a fresh perspective on how to evaluate availability.
Traditionally, the industry considers uptime to be the most important
metric. An uptime of 0.99999 means a service is almost always reachable,
but it says nothing about the service's capacity during partial failures
or upgrades. The harvest and yield metrics give an idea of the capacity
of the service under these harsh conditions.
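The paper's two metrics can be sketched directly from their definitions: yield is the fraction of offered queries that complete, and harvest is the fraction of the full data set reflected in each answer. The numbers below are illustrative only, chosen to mirror the paper's replication-versus-partitioning comparison when one of four nodes is offline.

```python
# Sketch of the paper's availability metrics (numbers are illustrative).
#   yield   = queries completed / queries offered
#   harvest = data available   / complete data

def service_yield(completed: int, offered: int) -> float:
    """Fraction of offered queries that were answered at all."""
    return completed / offered

def service_harvest(available_data: float, complete_data: float) -> float:
    """Fraction of the complete data set behind each answer."""
    return available_data / complete_data

# With one of four nodes down: a replicated service turns away some
# queries but answers the rest completely, while a partitioned service
# answers every query from only three quarters of the data.
print(service_yield(completed=75, offered=100))            # 0.75
print(service_harvest(available_data=3, complete_data=4))  # 0.75
```

Both failure modes score 0.75 here, but on different metrics, which is exactly the distinction uptime alone cannot express.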

My criticism is that the paper offers no rule of thumb for what harvest
and yield values indicate a highly available service. These metrics are
also harder to interpret on their own than the uptime metric. For
example, deploying an extra replica is supposed to improve the
service's availability, yet it actually lowers the yield value. In
addition, as a user of a service, I might not care about the harvest
and yield metrics at all. Instead, I think there should be a new metric
that captures the timeliness of the service's responses. A
response-time metric reflects how much load the service can handle:
the faster the response time, the higher the service's capacity. For
example, when demand approaches the capacity limit, requests queue up
and response times lengthen. Such a metric is also as straightforward
as the uptime metric.