Re: [CSC2231] Paper Review: Web Caching and Zipf-like Distributions: Evidence and Implications

From: Kenneth Po <kpo_REMOVE_THIS_FROM_EMAIL_FIRST_at_eecg.toronto.edu>
Date: Wed, 12 Oct 2005 20:37:50 -0400

This paper is a study of traces of web requests from various
organizations. The authors show that web requests follow a Zipf-like
distribution, and they use this simple model to predict the performance
characteristics of web caching for these traces.
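
To make the prediction concrete, here is a minimal sketch of my own
(not the authors' code). It assumes an idealized cache that always
holds the most popular pages and requests that are drawn independently,
with page i requested with probability proportional to 1/i^alpha. The
function name and parameters are mine, chosen for illustration.

```python
def zipf_hit_ratio(cache_pages, total_pages, alpha=1.0):
    """Hit ratio of an idealized cache holding the `cache_pages` most
    popular pages out of `total_pages`, when the page of popularity
    rank i is requested with probability proportional to 1 / i**alpha.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    if not 0 <= cache_pages <= total_pages:
        raise ValueError("cache_pages must be between 0 and total_pages")
    # Unnormalized request probability for each popularity rank.
    weights = [1.0 / (rank ** alpha) for rank in range(1, total_pages + 1)]
    # Requests for the cached (most popular) ranks are hits.
    return sum(weights[:cache_pages]) / sum(weights)
```

Calling zipf_hit_ratio(c, 10000, alpha=0.8) for c = 10, 100, 1000 (the
value 0.8 is only an illustrative choice) shows the hit ratio growing
much more slowly than the cache size.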

In section 3 of the paper, the authors try to calculate the hit ratio of
web caches given some constraints. It may be more practical to look at
the hit ratio of a finite cache in terms of storage size instead of the
number of web pages. In my opinion, it may be even more interesting to
calculate the hit ratio for the individual bytes stored in the web
caches. Such a hit ratio would show whether it is more beneficial to
cache a small number of large objects or a large number of small
objects. For large objects such as video clips, web users may stop
watching in the middle of the clip. This makes the hit ratio vary
across different parts of the video clip, and therefore treating a
large object as one atomic unit is too coarse-grained.
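
As a rough illustration of the difference between the two metrics, the
sketch below simulates a cache whose capacity is given in bytes and
reports both the request hit ratio and the byte hit ratio. It is my own
toy example, not something from the paper: the LRU replacement policy,
the function name, and the (object id, size) trace format are all
assumptions, and each object is assumed to keep the same size across
requests.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay `requests`, an iterable of (object_id, size_bytes), against
    an LRU cache limited to `capacity_bytes`.

    Returns (request_hit_ratio, byte_hit_ratio).
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total = total_bytes = 0
    for obj, size in requests:
        total += 1
        total_bytes += size
        if obj in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(obj)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # the object can never fit, so it is not cached
        # Evict least recently used objects until the new one fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[obj] = size
        used += size
    request_ratio = hits / total if total else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio
```

This still treats every object as one atomic unit. To approximate the
finer-grained view, one could feed it a trace in which each large object
is split into fixed-size chunks with their own ids, so that only the
chunks a user actually fetched count as requests.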

I think the diagrams on page 5 of this paper are hard to read and
understand because one can hardly figure out the number of black dots in
a diagram when part of it is entirely black. I would suggest that the
authors use a gradient of gray to show the density of the dots.

Although web caching seems dead nowadays, Google and Yahoo actually
offer large web caches. These web caches exist for availability rather
than performance. They are useful if web users do not mind retrieving
potentially stale data while the requested content is unreachable. For
example, the CS web server was down on the morning of October 11 when I
was trying to get to the web sites of this and another CS course.
Eventually I turned to Google's cached pages to look for the
information. While these web caches are helpful on certain occasions, I
believe their hit ratio is terrible. It may be an interesting research
topic to study the web caches offered by search engines and understand
the purpose they serve.

Received on Wed Oct 12 2005 - 20:37:55 EDT