Lessons from Giant-Scale Services
---------------------------------
Eric A. Brewer

This article presents a number of lessons useful in designing and analyzing giant-scale services. After making a few reasonable assumptions (e.g. "read-mostly" traffic), the author describes a basic model for a giant-scale service and a number of "load management"-oriented approaches. The author's lessons follow from two facts: failures are a certainty in a giant-scale service, and this kind of system has a high availability requirement.

The main contributions of the paper are the availability metrics yield (the fraction of queries completed) and harvest (the fraction of the data reflected in an answer), and the DQ principle. Yield and harvest are better metrics than uptime in the sense that they "map to user experience". The author also relates these metrics to system design choices (replication and partitioning). Furthermore, a system can be designed and maintained with the DQ value in mind (graceful degradation, disaster tolerance, online evolution and growth), DQ being "measurable and tunable".

One of the weaknesses of the paper is its lack of data. A statement like "DQ normally scales linearly with the number of nodes" should be backed up by data. The same applies to the cost-based admission control in Inktomi. I think that a thorough analysis based on Inktomi would have strengthened the author's ideas. The smart client approach is only vaguely described. The author keeps referring to it (Disaster tolerance, Conclusions) without offering any real insight into the issue.

However, I find the paper to be a good one, mainly for its new availability metrics and for the DQ principle and its use in designing and maintaining a system.
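
To make the relation between yield, harvest and replication versus partitioning concrete, here is a minimal sketch of my own, not code from the paper. It takes yield as queries completed over queries offered and harvest as data available over complete data. The function name `after_failure` and its parameters are hypothetical, and the model is deliberately simplified: nodes are identical, the offered load is expressed as a fraction of the full cluster's capacity, and the overhead of redirecting load is ignored.

```python
def after_failure(nodes, failed, replicated, offered_load=1.0):
    """Return (yield_fraction, harvest) after `failed` of `nodes` nodes go down.

    Simplifying assumptions (mine, for illustration only):
    - all nodes are identical;
    - offered_load is a fraction of the healthy cluster's capacity;
    - replicated=True means every node holds all of the data,
      replicated=False means the data is split evenly across nodes.
    """
    if not 0 <= failed < nodes:
        raise ValueError("need 0 <= failed < nodes")
    if offered_load <= 0:
        raise ValueError("offered_load must be positive")

    surviving = (nodes - failed) / nodes

    if replicated:
        # All data is still reachable, but fewer nodes share the queries,
        # so some queries are dropped once capacity is exceeded.
        harvest = 1.0
        yield_fraction = min(1.0, surviving / offered_load)
    else:
        # Every query is still answered, but only from the surviving
        # partitions, so each answer reflects less of the data.
        harvest = surviving
        yield_fraction = 1.0

    return yield_fraction, harvest


if __name__ == "__main__":
    # Two-node cluster running at full load, one node lost.
    print("replicated: ", after_failure(2, 1, replicated=True))
    print("partitioned:", after_failure(2, 1, replicated=False))
```

In this toy model a two-node cluster at full load that loses one node keeps full harvest but halves its yield when replicated, and keeps full yield but halves its harvest when partitioned. The product of the two is halved in both cases, which is the intuition behind treating DQ as the quantity a failure actually takes away.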