CSC2231 Review Zipf

From: Jin Chen <jinchen_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Wed, 12 Oct 2005 21:21:09 -0400

By analyzing six web traces, this paper shows that web requests do follow
a Zipf-like distribution, and demonstrates that the "10/90" rule does not
hold. Building on the Zipf observation, it further proposes some models to
estimate cache hit rate.

However, it does not thoroughly explain what leads to the Zipf-like
distribution. It is easy to understand that hot documents get many more
requests than unpopular ones, but why does the second most popular
document get only about half as many references as the most popular one,
or fewer? Will the web traffic distribution change as the web gains new
functions? Will search engines change this distribution? The document
rankings produced by search engines may not be consistent with
popularity, so they may direct people to unpopular pages. And if the
popularity of documents changes quickly, will the distribution be
affected? Without a deep understanding of the cause, it is hard to
predict how web traffic will change as new applications appear.
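
To make the rank-1 versus rank-2 question concrete, here is a minimal
sketch of what a Zipf-like model predicts, assuming the request
probability of the document at rank i is proportional to 1/i^alpha. The
function names and the alpha values are my own illustrative choices, not
taken from the paper.

```python
def zipf_probabilities(n, alpha):
    """Request probability for ranks 1..n when p(i) is proportional to 1/i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def rank_ratio(alpha, i=1, j=2):
    """Ratio p(j) / p(i) under the model; it does not depend on n."""
    return (i / j) ** alpha


if __name__ == "__main__":
    # Hypothetical exponents chosen only for illustration.
    for alpha in (0.6, 0.8, 1.0):
        print(alpha, rank_ratio(alpha))
```

Under this model the second document receives exactly half of the first
document's references only when alpha = 1; a smaller alpha gives a ratio
above one half. The model describes the shape of the curve but does not
explain why it arises.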

From this paper, the composition of the web traces is unclear. How much
of the traffic is short TCP connections, and how much is long TCP
connections, which possibly imply file downloads from the web? In
addition, Figure 1 shows that the short beginning and the end of the
trace curves do not match a Zipf-like distribution very well. But the
paper does not explain the deviation for the top 10 ranked documents and
for the least popular documents.

Moreover, their cache hit rate models seem too simple. First, they ignore
the cache replacement strategy. Second, they assume that requests are
independent of each other. This may not be true if we consider that users
usually visit related web pages by following links. Third, it is not
reasonable to assume that the cache can hold C web pages regardless of
the size of each page. Usually, a cache has a fixed size.
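
As a rough illustration of the first point, the sketch below compares two
hit rates under independent Zipf-like requests: an idealized cache that
always holds the C most popular pages, and a simulated LRU cache of the
same capacity. This is my own simplified reading of such a model, not the
paper's exact formulation; the catalogue size, alpha, C, request count and
seed are hypothetical, and page sizes are still ignored (every page
counts as one slot).

```python
import random
from collections import OrderedDict


def zipf_probabilities(n, alpha):
    """Request probability for ranks 1..n when p(i) is proportional to 1/i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_rate(probs, cache_pages):
    """Hit rate if the cache always holds the cache_pages most popular pages."""
    return sum(probs[:cache_pages])


def lru_hit_rate(probs, cache_pages, num_requests, seed=0):
    """Simulated hit rate of an LRU cache under independent requests."""
    rng = random.Random(seed)
    pages = range(len(probs))
    requests = rng.choices(pages, weights=probs, k=num_requests)
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used
    return hits / num_requests


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    probs = zipf_probabilities(1000, 0.8)
    print("ideal:", ideal_hit_rate(probs, 100))
    print("LRU:  ", lru_hit_rate(probs, 100, 20000))
```

Even with independent requests, the replacement policy changes the
achieved hit rate, so a model that ignores it can only give an upper
estimate. Correlated requests and variable page sizes would need a
trace-driven simulation rather than this synthetic one.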

Recently, some work has shown that some multimedia traffic does not
follow Zipf-like curves. It would be interesting to explore the
underlying reasons behind the observed traffic distributions.
