Review: cooperative web caching from Jing Su on 2005-10-20

From: Jing Su <jingsu_REMOVE_THIS_FROM_EMAIL_FIRST_at_cs.toronto.edu>
Date: Thu, 20 Oct 2005 01:09:53 -0400

This paper provides an empirical study of the efficacy of
cooperative caching among large groups of users. Two seven-day-long
traces were used: one from the University of Washington and the other
from Microsoft (the two are in close proximity to each other in
Seattle).

For context, in 1999 most Internet users were still on dial-up. Some
early adopters had cable modems (though bandwidth back then was
extremely variable), but DSL was still an expensive and rare option.
Those who could afford it had ISDN, which provided guaranteed up/down
link bandwidth. The lucky few undergrads who lived in dormitories got
to enjoy full 1+Mbit broadband from their universities.

The paper found that the greatest benefits of caching arise from
increasing the effective population pool of small organizations.
Beyond a population of 20k, the benefits level off significantly.
As expected, cooperative caching can never perform better than a
unified cache. Thus, if it weren't for organizational barriers, it
would be better to just clump people together arbitrarily and give
them a single proxy cache.
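
To make the unified-versus-split comparison concrete, here is a rough
sketch of how one might measure it on a trace. It is not the paper's
methodology: it assumes infinite caches with no expiry, no cooperation
between groups, and a made-up Zipf-like workload. The names hit_rate
and synthetic_trace and all parameter values are my own.

```python
import random
from itertools import accumulate


def hit_rate(requests, num_groups=1):
    """Hit rate when clients are split into num_groups isolated caches.

    requests is a time-ordered list of (client_id, url) pairs.
    Assumption: caches are infinite and never expire entries, so a
    request hits if its group has already seen that URL.
    num_groups=1 models a single unified cache.
    """
    if not requests:
        return 0.0
    seen = [set() for _ in range(num_groups)]
    hits = 0
    for client, url in requests:
        cache = seen[client % num_groups]
        if url in cache:
            hits += 1
        else:
            cache.add(url)
    return hits / len(requests)


def synthetic_trace(num_clients, requests_per_client, num_docs,
                    alpha=0.8, seed=0):
    """Hypothetical Zipf-like workload; no value here is from the paper."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** alpha for rank in range(1, num_docs + 1)]
    cum_weights = list(accumulate(weights))
    total = num_clients * requests_per_client
    urls = rng.choices(range(num_docs), cum_weights=cum_weights, k=total)
    # Interleave clients round-robin so every group sees traffic over time.
    return [(i % num_clients, url) for i, url in enumerate(urls)]


if __name__ == "__main__":
    trace = synthetic_trace(num_clients=1000, requests_per_client=20,
                            num_docs=2000)
    for groups in (1, 10, 100):
        print(groups, "group(s):", round(hit_rate(trace, groups), 3))
```

Under these assumptions a split configuration can never beat the
unified one: any hit in a group's cache means the URL was already
requested by someone, so the unified cache would have hit as well.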

There are several interesting items to note in the dataset of this
paper.

First, it's interesting that while Microsoft has 40k employees, the
trace lists over 60k clients. That's more than 1.5 computers per
employee!

Second, it is also very unfortunate that they removed modem-pool data
from their traces. UW's C&C maintains full IP traces of all their
dorm and modem-pool accesses as well as modem-pool usage logs. I'm
sure they could've found a way to distinguish clients out of the
modem-pool IP numbers. In 1999, most undergraduates used the
modem-pools for their Internet access. In fact, many of the on-campus
dormitories (including Terry, Lander, and Haggett Halls) did not yet
have broadband in 1999, and students in them also used the modem-pools,
as did undergrads who lived off-campus.

Finally, while I can understand restricting the UW trace to 7 days
because of limited data from Microsoft, it doesn't make sense that they
didn't use a longer trace to test steady-state behaviour in section 4.
For an empirical study where the data is available (they had been
collecting since October 1998), why not first see if the trace data
contains steady-state behaviour?

Perhaps it's because I have the benefit of hindsight, but it seems
unsurprising to me that Microsoft benefited more from cooperating with
UW than the other way around. Because of the diverse usage in
the UW trace, UW's users can aid fringe document requesters at
Microsoft. Conversely, due to Microsoft's homogeneous
usage, its cache content is unlikely to aid UW users, who likely
already made those documents hot, especially since the CSE
department likely dominated the data-trace.
Received on Thu Oct 20 2005 - 01:09:56 EDT