review of Zipf

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Wed, 12 Oct 2005 15:09:47 -0400

This paper studied the behavior of web requests. It addressed that the web requests from a fixed group follow a Zipf-like distribution, instead of Zipf¡¯s law. The Zip-like distribution indicates that the probability of a request for the i-th most popular page is proportional to, where 0<<1, and may vary from trace to trace.

 

The authors showed that there is no strong correlation between access frequency of a web page and its size. In other words, the designers of caching algorithm can ignore the correlation between the access frequency and the web page size, if there is any. The authors also proved that there is only a weak correlation between access frequency and the rate of web page changes. Based on the data traces, the correlation coefficients calculated in the paper are very small. Since the correlation may affect the designs of web cache consistency schemes, this answer may simplify the web cache consistency mechanisms by assuming there is no correlation between the web page popularity and the change rate.

 

This paper presented a simple model for web requests, where the requests are independent and follow a Zipf-like distribution. They studied the cache replacement algorithms in the simple model. The solid trace data analysis showed that the simple model is sufficient enough to capture the characteristics of web requests.

 

However, there are several limitations in this paper. First, the simple model is not perfect. It assumes that web requests are independent. It maybe not true in real web access patterns. How the dependency among requests would affect the cache performance is not clear yet. Second, although there is no strong correlation between access frequency and the rate of web page changes, the changes affect the cache hit-rate for sure. This problem is omitted in this paper. Third, cache replacement polices are not discussed in the model. Different policies may affect the performance of replacement algorithms. Furthermore, the Perfect-LFU is the best among the four replacement algorithms. However, the Perfect-LFU requires more overhead to maintain the page counters in terms of memory. It is less practical than In-Cache-LFU.

 

Overall, this paper is a good paper which explores interesting and important problems that are unresolved about web caching. We expect a more complicated and complete model for web requests to validate the answers in this paper.

Received on Wed Oct 12 2005 - 14:15:01 EDT

This archive was generated by hypermail 2.2.0 : Wed Oct 12 2005 - 17:24:59 EDT