On the scale and performance of cooperative Web proxy caching
Review By: Troy Ronda
Web proxy caches reduce response time and bandwidth consumption. A cache
stores local copies of remote data and requests an object from its origin
only when the object is not in the cache or has expired. Alternatively, a
proxy cache can service a miss by first asking other proxies for the
object and going to the source server only when no proxy has it. This is
beneficial only if the other caches are closer than the source server.
There is a point in client population beyond which returns diminish
(about 2500 clients in this paper), and many organizations have fewer
clients than that. The authors took simultaneous traces of traffic from
the various organizations in a university and showed that co-operative
caching between them noticeably increases performance. However, the hit
rate is only about 4% better than with random populations of the same
sizes, so basing co-operative caching on mutual interest has no obvious
advantage. Co-operative caching also does not increase the hit rate much
for caches that are already large on their own. This is not surprising,
because unpopular documents are universally unpopular. It suggests that
many people share common interests, such as news sites, but that
work-related documents are not well shared. There is also a correlation,
unsurprisingly, between the document change rate and the hit rate, and
between the web's growth rate relative to the request rate and the hit
rate. There is also a trade-off in how distant the caches are from each
other: over a broad area, such as the west coast, the inter-proxy latency
overhead eclipses the latency savings from co-operative caching. Document
cacheability remains the main challenge for improving web cache
behaviour, and there is little reason to continue expending effort on the
design of highly scalable caches.
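To make that lookup path concrete, here is a minimal sketch of a
cooperative proxy miss handler in Python. The class, the peer-query
interface, and the fixed one-hour TTL are my own assumptions for
illustration, not the paper's design:

    import time

    class CooperativeProxy:
        """Illustrative proxy cache that asks peer proxies before the origin."""

        def __init__(self, peers, origin_fetch):
            self.store = {}                   # url -> (object, expiry time)
            self.peers = peers                # other CooperativeProxy instances
            self.origin_fetch = origin_fetch  # callable fetching from the source server

        def lookup_local(self, url):
            """Return a fresh local copy, or None on a miss or expired entry."""
            entry = self.store.get(url)
            if entry is None:
                return None
            obj, expiry = entry
            if time.time() > expiry:
                del self.store[url]
                return None
            return obj

        def get(self, url):
            # 1. Local hit: the cheapest case.
            obj = self.lookup_local(url)
            if obj is not None:
                return obj
            # 2. Ask peer proxies; only worthwhile if they are closer than the origin.
            for peer in self.peers:
                obj = peer.lookup_local(url)
                if obj is not None:
                    self.store[url] = (obj, time.time() + 3600)  # assumed 1-hour TTL
                    return obj
            # 3. Miss everywhere: go to the source server.
            obj = self.origin_fetch(url)
            self.store[url] = (obj, time.time() + 3600)
            return obj

The paper's point is that step 2 only pays off when the peer round trip
is much cheaper than going to the origin, which is why wide-area
co-operation loses its latency advantage.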
The paper was convincing, both in the amount of data collected and in the
model. The graphs are a strength of this paper; I found myself referring
to them more than to the actual text. I found the methodology of using
different organizations clever (and agreeable). The comparison to another
large cache is also an important point. The conclusions matter because
they give us a direction for future research: the scale of current web
proxy caches is good enough for the current web, and we need to focus on
making documents more cacheable. It is a refreshing look at how negative
results can be spun into a good paper.
One of the major points of this paper is the knee in the graph at 2500
clients. However, the co-operative caching example across organizations
has 5357 clients. It would have been good to get a feeling for the
performance increases at 1000, 2000, and 2500 clients as well as at 5357.
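As a rough illustration of the kind of experiment I have in mind, the
sketch below replays a trace against random client subsets of each size
and reports an idealized, infinite-cache hit rate; the trace format here
is my assumption, not the paper's:

    import random

    def hit_rate_for_clients(trace, clients):
        """Replay (client_id, url) requests from the chosen clients through one
        shared, infinite, never-expiring cache and return the hit rate.
        Purely illustrative; it ignores cacheability, expiry, and cache size."""
        seen = set()
        hits = total = 0
        for client_id, url in trace:
            if client_id not in clients:
                continue
            total += 1
            if url in seen:
                hits += 1
            else:
                seen.add(url)
        return hits / total if total else 0.0

    def knee_sweep(trace, sizes=(1000, 2000, 2500, 5357)):
        """Print the hit rate for random client populations of the given sizes."""
        all_clients = list({client_id for client_id, _ in trace})
        for n in sizes:
            sample = set(random.sample(all_clients, min(n, len(all_clients))))
            print(n, hit_rate_for_clients(trace, sample))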
A separate question: should all caches just maintain the same base set of
popular objects, plus a local store for unpopular documents? It seems
that this would work as well as co-operative caching. Another point is
that we could potentially reduce latency by going a step further. In the
directory model of caching, instead of just requesting a directory of
objects from another cache, it is conceptually possible to request the
missing objects at the same time (sketched below); the trade-off is
bandwidth against latency savings. Was the model worth it? I found the
empirical data much more convincing. In the end, it was good to have some
result on steady-state caching, despite the results being unsurprising.
What about active caching: is it possible to simply cache all the links
from unpopular documents while browsing, and is it worth it? Should we
increase bandwidth usage to save latency? Which is better?
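Returning to the directory idea above, here is a rough sketch of
combining the directory exchange and the object fetch into a single round
trip. The message shape and helper names are my own assumptions; the only
point is that the extra bandwidth buys one fewer round trip of latency:

    def handle_exchange(peer_store, request):
        """Peer side: reply with our directory plus any requested objects we hold."""
        found = {url: peer_store[url] for url in request["fetch"] if url in peer_store}
        return sorted(peer_store), found

    def exchange(local_store, peer_store, wanted_urls):
        """Requester side: one combined round trip instead of directory-then-fetch."""
        request = {
            "directory": sorted(local_store),  # what we hold, so the peer can update its view
            "fetch": list(wanted_urls),        # objects we want piggybacked onto the reply
        }
        peer_directory, objects = handle_exchange(peer_store, request)
        local_store.update(objects)            # cache whatever the peer sent back
        return peer_directory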