Review - On the scale and performance of cooperative Web proxy caching from Ian Sin on 2005-10-19

From: Ian Sin <ian.sinkwokwong_REMOVE_THIS_FROM_EMAIL_FIRST_at_utoronto.ca>
Date: Wed, 19 Oct 2005 17:51:34 -0400

This paper analyzes cooperative caching both between large populations and
among small populations, using real web traces. Because their trace data is
limited, the authors also present a model of cooperative caching to reason
about very large populations. The model lets them vary document and
population characteristics and observe the effect on cooperative caching.
The paper presents the results of the analysis clearly and outlines some
essential conditions under which cooperative caching will work: high
bandwidth, low latency and a relatively small population. These are
basically the characteristics of a LAN, not a WAN. It also presents some
cooperative caching strategies that I found interesting, particularly
hash-based caching. This is similar to content-based caching, which is
likely to perform well on a high-bandwidth, low-latency network.
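As a rough illustration of the hash-based idea (my own sketch, not code from
the paper): each URL is hashed to exactly one proxy in the group, so a
document is stored once and every client knows where to ask for it. The
proxy names, the choice of SHA-256 and the simple modulo mapping are all
assumptions made for the example.

```python
import hashlib


def owner_proxy(url: str, proxies: list[str]) -> str:
    """Return the proxy responsible for caching `url`.

    Hypothetical scheme: hash the URL and take it modulo the number
    of proxies, so every client maps a given URL to the same proxy.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]


# Hypothetical proxy group; the names are placeholders.
proxies = ["proxy-a", "proxy-b", "proxy-c"]
chosen = owner_proxy("http://example.com/index.html", proxies)
```

Every request for a given URL goes to the proxy that owns it, which is
why this only pays off when the links between proxies are fast.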
Although it is insightful to see how cooperative caching of web workloads
performs as the model parameters are varied, some of that analysis seems
pointless. For example, the authors cite the study by Breslau, which
concluded that there is no correlation between a document's rate of change
and its popularity, yet they spend a lot of time varying popularity, rate
of change and hit rate in Section 4.4.2.
As the authors mention in their conclusion, the results of this study might
not apply to today's Internet workloads. Today, multimedia content is the
most popular and has desirable caching characteristics: the files are big
(so caching them yields a high byte hit rate) and do not change often. Aside
from the legal challenges, it might well be worth caching these popular
files in a content-based cooperative caching system. Advertising is also
very popular, but these items are not cacheable. It is unclear how
advertising content behaves and whether there is any means of caching it.
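To make the byte hit rate point concrete, here is a small sketch with
made-up numbers (the request sizes and hit flags are invented for
illustration, not taken from the paper). One large cached file dominates
the byte hit rate even when most requests miss.

```python
# Each entry is (size_in_bytes, served_from_cache); values are hypothetical.
requests = [
    (5_000_000, True),   # one large multimedia file, cache hit
    (10_000, False),     # small pages, cache misses
    (10_000, False),
    (10_000, False),
]

hits = sum(1 for _, hit in requests if hit)
hit_rate = hits / len(requests)

bytes_hit = sum(size for size, hit in requests if hit)
bytes_total = sum(size for size, _ in requests)
byte_hit_rate = bytes_hit / bytes_total
```

Here only a quarter of the requests hit the cache, yet almost all of the
bytes are served from it.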

Received on Wed Oct 19 2005 - 17:52:18 EDT
