Lessons from Giant-Scale Services
---------------------------------
Eric A. Brewer

This article presents a number of lessons useful in designing and analyzing 
giant-scale services. After making a few reasonable assumptions (e.g., 
"read-mostly" traffic), the author describes a basic model for a giant-scale 
service and a number of approaches to load management. The author's lessons 
follow from two facts: failures are a certainty in a giant-scale service, and 
this kind of system has a high availability requirement.

The main contributions of the paper are the availability metrics yield and 
harvest, and the DQ principle. Yield and harvest are better metrics than 
uptime in the sense that they "map to user experience". The author also 
relates these metrics to system design choices (replication and partitioning). 
Furthermore, a system can be designed and maintained with the DQ value in mind 
(graceful degradation, disaster tolerance, online evolution and growth), DQ 
being "measurable and tunable".
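
To make the metrics concrete, here is a minimal sketch of how I read them: 
yield is the fraction of offered queries that complete, harvest is the 
fraction of the complete data reflected in an answer, and the DQ value is the 
product of data per query and queries per second. The function names and the 
idealized failure model (a saturated, read-mostly system with evenly spread 
load and no redirection overhead) are my own assumptions, not code from the 
paper.

```python
def query_yield(completed: int, offered: int) -> float:
    """Fraction of offered queries that were completed."""
    return completed / offered


def harvest(available: float, complete: float) -> float:
    """Fraction of the complete data set reflected in an answer."""
    return available / complete


def after_failures(nodes: int, failed: int, design: str) -> tuple[float, float]:
    """Return (yield, harvest) at peak load after `failed` of `nodes` fail.

    Idealized assumption: the system was running at full capacity, so every
    lost node removes its share of either query capacity or data.
    """
    surviving = (nodes - failed) / nodes
    if design == "replicated":
        # Survivors still hold all the data, but can serve fewer queries.
        return surviving, 1.0
    if design == "partitioned":
        # All queries are still answered, but from a smaller part of the data.
        return 1.0, surviving
    raise ValueError(f"unknown design: {design!r}")


def dq_fraction(yield_: float, harvest_: float) -> float:
    """Remaining DQ relative to the healthy system (yield times harvest)."""
    return yield_ * harvest_
```

For a two-node system that loses one node, `after_failures(2, 1, "replicated")` 
gives a yield of 0.5 with full harvest, while `after_failures(2, 1, 
"partitioned")` gives full yield with a harvest of 0.5. In both cases the 
remaining DQ fraction is 0.5, which is the sense in which the design choice 
only decides where the loss shows up, not how large it is.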

One of the weaknesses of the paper is its lack of data. A claim like "DQ 
normally scales linearly with the number of nodes" should be backed up by 
data. The same applies to the Cost-based Admission Control in Inktomi. I think 
that a thorough analysis based on Inktomi would have strengthened the author's 
ideas. The smart client approach is only vaguely described. The author keeps 
referring to it (Disaster tolerance, Conclusions) without offering any real 
insight into the issue.

Overall, however, I find the paper to be a good one, mainly for its new availability 
metrics and for the DQ principle and its use in designing and maintaining a 
system.