Re: [CSC2231] Paper Review: Why do Internet services fail, and what can be done about it?

From: Kai Yi Kenneth Po <kpo_REMOVE_THIS_FROM_EMAIL_FIRST_at_eecg.toronto.edu>
Date: Thu, 29 Sep 2005 01:17:03 -0400

This paper examines service incident data from three large-scale
Internet services that have different service characteristics. The
findings suggest that operator errors are the primary cause of service
failures visible to users.

As an Internet user, I view the Internet as a best-effort service. If an
Internet service happens to be down when I browse, I simply cross my
fingers and retry some time later. However, as businesses rely more on
the Internet, they bring in service agreements to protect themselves
from downtime. This forces Internet services to become more robust in
order to meet those agreements. It is interesting to learn that many
service failures are caused by operator errors rather than machine or
network failures.

In the discussion section, the authors suggest two ways to help
operators make fewer mistakes: 1) software application designs should
weigh maintainability as heavily as other aspects such as correctness
and performance, and 2) a standardized service failure data repository
should be built to train operators with negative examples. These are
good suggestions, but I recommend that the authors investigate the cost
of implementing them in practice. The authors use the
aviation industry as the role model, in which a single “service failure”
incident costs not only money but also human lives. The incentive to
achieve 100% availability at any cost exists in aviation but is unlikely
to exist for Internet services.

The authors do not address security incidents in this investigation. I
find this decision inappropriate. As computer attacks become more
common, they will happen no less frequently than operator errors or
machine/network failures. Such intrusion attempts may also bring down
Internet services. So what can be done about them?
Received on Thu Sep 29 2005 - 01:17:16 EDT
