Review - Web Caching and Zipf-like Distributions

From: Jesse Pool <pool_REMOVE_THIS_FROM_EMAIL_FIRST_at_eecg.toronto.edu>
Date: Wed, 12 Oct 2005 20:36:44 -0400

The research community has performed several studies that attempt to either
demonstrate a Zipf distribution in web caching, or to demonstrate that web
caching does not follow this property. Breslau et al. further study the
potential for effective caching algorithms and attempt to characterize their
findings. Their results indicate that a simple model for an independent
request stream following a Zipf-like distribution is sufficient to capture
some properties of web caching.

This research effort amassed an enormous amount of data, where approximately
18 million cache requests where studied across various corporate and
research installations. The findings demonstrate that web caching does
follow a Zipf-like distribution very well. Using the large dataset, it was
also confirmed that there is a low correlation between access frequency and
the size of a document. Similarly, there is only a small correlation between
document access frequency and average modifications per request.

The value in this paper is not its effort to confirm Zipf behavior in web
caching, but rather the attempt to formulate a simple characterizing model.
While taking several assumptions into consideration, a simple and well
described model is presented in a convincing fashion. Although some of the
assumptions taken here do not seem to be realistic, such as the assumption
that requests are independent, it is shown that the model performs fairly
well in practice, providing a reasonable approximation.

While this paper does seem to correlate a simple model with a real world
example, the confirmation of Zipf-like behavior in the first half of the
paper does not seem necessary. Breslau et al. provide several examples of
Zipf-like behavior across six installations, but the motivation for this
analysis is unclear. Why not simply reference the previous work? It seems
that the same conclusion was drawn first from 100,000 requests, and then
from 500,000 requests, further studies were performed on 2,000,000, and so
on. The only research described in this paper that claims web caching does
not follow Zipf-like behavior is discredited. This leads the reader to
consider the possibility that the proposed model could not be confirmed with
existing data.

The contribution of this paper is the result indicating that an independent
reference model, a fairly simple model first suggested in 1980 for file
system paging, is good enough for characterizing some web caching behavior.
However, the confirmation of Zipf-like behavior seems to be a wasteful
exercise.
Received on Wed Oct 12 2005 - 20:36:55 EDT

This archive was generated by hypermail 2.2.0 : Wed Oct 12 2005 - 20:37:56 EDT