review of cooperative proxy cache from Guoli Li on 2005-10-18

From: Guoli Li <gli_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Tue, 18 Oct 2005 18:32:05 -0400

This paper studies the performance benefits and limitations of cooperative
caching from a systemic viewpoint. It first explores the traces collected
from UW and Microsoft, and develops an analytic model based on the
parameters extracted from the traces. It compares the performance of three
cooperative caching schemes in this model.

The authors use two approaches. First, they use trace-based analysis to
identify the characteristics of cooperative caching performance. According
to their study, the request hit rate is related to how many cacheable
documents are shared among clients. There is locality among clients within
an organization and affinity in client access patterns. However, in both
cases cooperative caching has little impact on hit rate. Cooperative caching
benefits only limited populations: the simulation analysis shows that a
population of 20K clients gains the largest benefit, and scaling beyond
this population provides little improvement. The
trace-based analysis approach identifies the upper bound of cooperative
caching benefit. Second, the authors develop an analytic model of Web
behavior. The model aims to explore the steady-state performance of
cooperative caching schemes. Hit rate, latency and bandwidth are studied
as a function of client population based on the model. The analysis shows
that hit rate is sensitive to the rate at which documents change, especially
for unpopular documents. The authors compare the performance of three
cooperative caching schemes based on the analytic model, including
hierarchical caching, hash-based caching and directory-based caching.
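
To make the dependence of hit rate on client population concrete, here is a
minimal sketch. It is not the paper's analytic model. It assumes a Zipf-like
popularity distribution with an illustrative exponent (alpha = 0.8), a single
ideal shared cache of unlimited size, and documents that are all cacheable
and never change, so it gives only an optimistic bound. All function names
and parameter values are hypothetical.

```python
import random
from itertools import accumulate


def zipf_cum_weights(num_docs, alpha=0.8):
    """Cumulative weights for a Zipf-like popularity law.

    The document at rank r gets weight 1 / r**alpha. The exponent is an
    assumed, illustrative value, not one taken from the paper.
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    return list(accumulate(weights))


def shared_cache_hit_rate(num_clients, requests_per_client, cum_weights, seed=0):
    """Hit rate of one ideal shared cache serving all clients.

    A request is a hit if any client has requested the same document
    earlier. The cache is unbounded and documents never change.
    """
    rng = random.Random(seed)
    total = num_clients * requests_per_client
    docs = rng.choices(range(len(cum_weights)), cum_weights=cum_weights, k=total)
    seen = set()
    hits = 0
    for doc in docs:
        if doc in seen:
            hits += 1
        else:
            seen.add(doc)  # first request for this document is a miss
    return hits / total


if __name__ == "__main__":
    cum = zipf_cum_weights(num_docs=10_000)
    for clients in (10, 100, 1_000):
        rate = shared_cache_hit_rate(clients, requests_per_client=20, cum_weights=cum)
        print(f"{clients:>5} clients: hit rate {rate:.2f}")
```

Under these assumptions the hit rate rises as the population grows, with
smaller gains at each step. Modeling document change, which the review
notes matters most for unpopular documents, would lower these numbers.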

However, this paper assumes “perfect cooperating proxies” in the ideal
cases. It is not clear what “perfect” means here. Each scheme proposed has
its own drawbacks, and different caching schemes may have different hit-rate
benefits. Hierarchical caching stores multiple copies of a document
along the request path in the hierarchy and the top-level cache server
forms a bottleneck by processing all the requests. Hash-based caching does
not scale with the client population in terms of latency. Directory-based
caching is not stable during directory maintenance: some requests may miss
the cache because of inconsistent directory information, and exchanging
update messages adds extra traffic to the network. Among the three schemes,
hash-based caching performs better than the other two, since only one copy
of each document is cached and request load is balanced across all caches.
In terms of failures, however, hierarchical caching may have better fault
tolerance.
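
The "one copy per document, balanced load" property of hash-based caching
can be sketched as follows. This is an illustration under stated
assumptions, not the scheme evaluated in the paper: it uses plain modulo
hashing over SHA-256, and the names owner_cache and load_per_cache are
hypothetical.

```python
import hashlib
from collections import Counter


def owner_cache(url, num_caches):
    """Return the index of the single cache responsible for a URL.

    A stable hash means every client maps the same URL to the same
    cache, so at most one copy of each document is stored.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_caches


def load_per_cache(urls, num_caches):
    """Count how many of the given URLs each cache is responsible for."""
    counts = Counter(owner_cache(url, num_caches) for url in urls)
    return [counts.get(index, 0) for index in range(num_caches)]


if __name__ == "__main__":
    urls = [f"http://example.com/doc/{i}" for i in range(4000)]
    print(load_per_cache(urls, num_caches=4))
```

With plain modulo hashing, changing the number of caches remaps most URLs
to a different owner, which is one reason such a scheme can handle
failures less gracefully.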

This paper claims that although there is some affinity in client access
patterns, the impact on hit rate is not significant and can be ignored, so
the model assumes that accesses to objects are independent. I’m not
convinced by this claim. I think access patterns depend on the application
scenario: clients with different application or organizational backgrounds
may see different hit-rate benefits. Another limitation of this paper is
that the workload is based on static documents only. Web accesses to
multimedia objects have different properties.
Received on Tue Oct 18 2005 - 18:32:18 EDT
