The Failure Study Paper review

From: Ali Akhavan <akhavan_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 29 Sep 2005 11:02:42 -0400

The paper reports on a failure study of three major Internet services, aimed at characterizing how failures are distributed across causes and across the places where they occur. The paper compares the effects of three failure causes (hardware, software, and service operators) and identifies operator error as both the most common cause and the one with the greatest impact. The authors also compare how the location of a failure affects the quality of the Internet service provided.

I think the major strength of the paper is that it is a grounded, realistic attempt at studying failures in Internet services. The authors analyzed the error log files of three working Internet services (rather than simulations) and drew their conclusions from that data. Although a reader could guess their results a priori, we need firm evidence before making claims about the major causes of failure. I say that because the hardware and software market has grown rapidly and can now build reliable nodes, at least, and networks as well; however, the custom system built on top of this reliable infrastructure is not as safe, precisely because it is custom! The authors' proposal to build more powerful tools for operators is a good one, but it will not make much sense unless the industry moves toward standards and widely accepted benchmarks for constructing such huge Internet service complexes.

As for the key weaknesses of the paper, in my view the authors' analysis of techniques for improving an Internet service rests on some strong assumptions. First, they partition the improvement techniques into highly overlapping categories. Since the operators of any web site will use a combination of these techniques, and given their overlapping effects, one cannot conclude which techniques to use. Second, the paper says nothing about how well each technique was carried out at the site. Each technique has many dimensions (correctness, completeness, ...), yet the paper assumes the quality to be identical. Finally, the number of failures studied in this section is too low to support a conclusion about which technique to use.

All in all, the paper is well written and communicates well with the reader.
Received on Thu Sep 29 2005 - 11:02:25 EDT
