CSC 2231 Review for "Lessons" from Jin Chen on 2005-09-19

From: Jin Chen <jinchen_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Mon, 19 Sep 2005 01:15:23 -0400

This paper draws design lessons for giant-scale Internet applications by
analyzing scalability, availability, and evolution issues on large-scale
clusters. It further proposes yield, harvest, and the DQ principle to
quantify availability.
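
To make the three terms concrete, here is a minimal sketch of the metrics
as I understand them from the paper: yield is queries completed over
queries offered, harvest is data available over complete data, and the DQ
value is data per query times queries per second. The function names and
the zero-load convention are my own assumptions, not the author's.

```python
def yield_fraction(queries_completed: int, queries_offered: int) -> float:
    """Fraction of offered queries that were completed."""
    if queries_offered == 0:
        # Assumption: with no offered load, treat the service as fully available.
        return 1.0
    return queries_completed / queries_offered


def harvest_fraction(data_available: float, data_total: float) -> float:
    """Fraction of the complete data set reflected in a response."""
    return data_available / data_total


def dq_value(data_per_query: float, queries_per_second: float) -> float:
    """Data per query times queries per second (the DQ product)."""
    return data_per_query * queries_per_second
```

For example, a cluster that answers 90 of 100 offered queries has a yield
of 0.9 regardless of how much data each answer covers.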

However, the definition of availability for giant clusters seems quite
vague.

First, it is unclear how to define a failure when measuring uptime. If a
single machine fails, it does not seem proper to count it as a failure,
since the other machines are still alive. But if the data hosted by that
machine are not replicated elsewhere, some queries may be unable to
return correct results; from this view, we should count it as a failure.
So should we distinguish failures by the damage they cause and map them
to fractions?
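
One possible way to map a failure to a fraction is sketched below. It is
only an illustration of the question above, not something the paper
specifies: I assume equal-sized data partitions, and the placement map
and function name are hypothetical.

```python
def harvest_after_failure(placement: dict[str, set[str]],
                          failed_nodes: set[str]) -> float:
    """Fraction of partitions that still have at least one live copy.

    placement maps each partition name to the set of nodes holding a copy.
    Assumption: all partitions are the same size and equally important.
    """
    if not placement:
        return 1.0
    reachable = sum(1 for nodes in placement.values() if nodes - failed_nodes)
    return reachable / len(placement)


# Hypothetical layout: p0 and p1 are unreplicated, p2 and p3 have two copies.
placement = {
    "p0": {"a"},
    "p1": {"b"},
    "p2": {"c", "d"},
    "p3": {"c", "d"},
}
damage_if_a_fails = 1.0 - harvest_after_failure(placement, {"a"})
damage_if_c_fails = 1.0 - harvest_after_failure(placement, {"c"})
```

Under these assumptions, losing node "a" costs a quarter of the data,
while losing node "c" costs nothing, so the two failures would count
differently toward uptime.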

Second, is yield a good availability metric? In my understanding, yield
also captures performance degradation that occurs without any failure:
under overload, yield will not be 100 percent, since some queries may be
discarded. Also, in this case, query execution time may grow because of
queuing delay. After a long wait, users may already have switched to
alternative services. Thus, should we define "completed queries" as some
function of execution time that accounts for user patience? In addition,
different queries can have quite different execution times. In an extreme
case, queries with a "not found" result may return immediately. So if we
measure yield over a fixed time interval, its value may be influenced by
the query mix.
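
A minimal sketch of what a patience-aware yield could look like follows.
This is my own suggestion rather than a definition from the paper; the
fixed patience threshold, the use of None for discarded queries, and the
function name are all assumptions.

```python
from typing import Optional, Sequence


def patient_yield(latencies_s: Sequence[Optional[float]],
                  patience_s: float) -> float:
    """Yield where a query counts as completed only if it finished in time.

    latencies_s holds one entry per offered query: the response time in
    seconds, or None if the query was discarded.
    Assumption: every user gives up after the same patience_s seconds.
    """
    if not latencies_s:
        return 1.0
    in_time = sum(1 for t in latencies_s if t is not None and t <= patience_s)
    return in_time / len(latencies_s)


# Four offered queries: two fast, one slow, one discarded.
observed = [0.1, 0.5, 3.0, None]
plain = sum(1 for t in observed if t is not None) / len(observed)
adjusted = patient_yield(observed, patience_s=1.0)
```

Here the plain yield is 0.75, but only half of the queries finish within
one second, so the adjusted yield is 0.5. A smoother function of
execution time could replace the hard threshold.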

Another weak point is the analysis of replication vs. partitioning. The
author does not give a clear rule to guide data placement. If important
data are very large or need frequent updates, full replication may not
be feasible. The DQ principle does not seem helpful here.

In general, the author gives a good overview based on Internet search
engine systems. I wonder whether there are different lessons for other
large Internet systems, such as e-commerce, messaging, video on demand,
P2P file sharing, and streaming.
Received on Mon Sep 19 2005 - 01:15:35 EDT
