The Zipf Paper Review

From: Ali Akhavan <akhavan_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 13 Oct 2005 10:59:42 -0400

This paper (1) presents a more thorough trace investigation of how web page requests are distributed over page rank at Internet proxies, (2) proposes a simplified statistical model of page requests in the WWW and infers from it three properties that are confirmed experimentally, and (3) finally tries to derive implications of the proposed model for web-cache design.

The key strength of the paper is its more in-depth examination of web traces compared with earlier work in the literature. The servers selected for trace analysis are varied enough to support reasonably general conclusions. More specifically, the traces span three major entry points into the Internet, namely ISPs, corporate proxies and university proxies. The authors were also able to explain the variation of \alpha in the Zipf distribution in terms of the diversity of the proxy servers' users. This pattern matches the intuition that researchers' older claims about the Zipf distribution of page requests might be related to the less diverse proxy servers of the past.
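To make the role of \alpha concrete, here is a minimal Python sketch of how such an exponent could be estimated from a trace. It assumes the usual Zipf-like form, in which the request probability of the page at rank i is proportional to 1 / i^\alpha, and fits a straight line to log(count) versus log(rank). The function names, the synthetic trace and the parameter values are illustrative assumptions of this sketch, not the authors' procedure or data.

```python
import math
import random
from collections import Counter


def zipf_like_weights(n_pages, alpha):
    """Unnormalized Zipf-like popularity: the page at rank i gets weight 1 / i**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_pages + 1)]


def simulate_trace(n_pages, alpha, n_requests, seed=0):
    """Draw independent requests from a Zipf-like popularity (synthetic data only)."""
    rng = random.Random(seed)
    pages = list(range(n_pages))
    weights = zipf_like_weights(n_pages, alpha)
    return rng.choices(pages, weights=weights, k=n_requests)


def estimate_alpha(counts):
    """Estimate alpha as minus the least-squares slope of log(count) vs. log(rank)."""
    # Rank pages by observed popularity; pages never requested are ignored.
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration only.
    trace = simulate_trace(n_pages=200, alpha=0.8, n_requests=200_000)
    per_page_counts = Counter(trace).values()
    print("estimated alpha:", round(estimate_alpha(per_page_counts), 3))
```

A simple log-log fit like this is sensitive to the noisy low-count tail of a real trace, so it should be read as a rough diagnostic rather than a careful estimator.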

One noteworthy point about the authors' criticism of past work is that the authors have not considered the evolution of Internet technology and the role of web-page designers and users in their analysis. To clarify, the authors do not see a significant correlation between access frequency and document size in the traces, whereas previous work says the opposite. This difference might be due to the fact that, in the past, large web pages loaded more slowly than small ones, which pushed web designers to make hot documents as small as possible to overcome bandwidth limitations. The same argument holds for the correlation between access frequency and change frequency of web pages: older web servers were not fast enough to serve many hot documents with high change rates. We may be able to derive a correlation between access frequency and the product of change rate and document size.

Another thing missing from this research is a discussion of the generality of the simplified model. In other words, there may be many models, different from the authors', that also satisfy the three asymptotic behaviours of caches. In the last part of the paper, the authors say that they are going to examine other traces to justify their simplified model, whereas I think they should look into other *properties* of their current model to see where its boundaries lie.
Received on Thu Oct 13 2005 - 10:59:11 EDT