Review - Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

From: Ian Sin <ian.sinkwokwong_REMOVE_THIS_FROM_EMAIL_FIRST_at_utoronto.ca>
Date: Mon, 28 Nov 2005 02:26:36 -0500

This paper studies the poorly understood forces behind multimedia
workloads on the Internet. The authors show that multimedia workloads deviate
substantially from Zipf curves and exhibit fetch-at-most-once behavior,
in contrast to Zipf-like Web workloads, which follow a fetch-frequently model.
The authors attribute this difference to the immutable nature of multimedia
objects, as opposed to the frequently updated nature of Web objects.
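
The contrast between the two request models can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' model or code: the function names, the Zipf exponent of 1.0, and the client and object counts are all hypothetical choices made for illustration. Each client draws objects from a Zipf distribution; in the at-most-once variant, a client redraws whenever it picks an object it has already fetched.

```python
import itertools
import random
from collections import Counter


def zipf_cum_weights(num_objects, alpha=1.0):
    """Cumulative unnormalized Zipf weights; rank r has weight 1 / r**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    return list(itertools.accumulate(weights))


def simulate(num_clients, requests_per_client, num_objects,
             at_most_once, alpha=1.0, seed=0):
    """Return a Counter mapping object id (0 = most popular) to request count.

    Assumption for illustration: every client issues the same number of
    requests and all objects exist from the start.
    """
    if at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    objects = list(range(num_objects))
    cum_weights = zipf_cum_weights(num_objects, alpha)
    counts = Counter()
    for _ in range(num_clients):
        fetched = set()  # objects this client already holds
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights)[0]
            # Fetch-at-most-once: redraw until the object is new to this client.
            while at_most_once and obj in fetched:
                obj = rng.choices(objects, cum_weights=cum_weights)[0]
            fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    for mode in (False, True):
        counts = simulate(200, 20, 1000, at_most_once=mode)
        top = [counts[i] for i in range(5)]
        print("at_most_once" if mode else "fetch_frequently", top)
```

Under the at-most-once rule, no object can be requested more often than there are clients, so the head of the popularity curve is flattened relative to a pure Zipf draw.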

The strength of this paper is its clear and structured presentation of the
subject matter. The authors describe how they collected and analyzed the
data, and then present a parameterized model whose parameters can be varied
for a more in-depth study. Using the model, they show how new clients
rejuvenate the system, providing better hit rates on popular objects, and
observe that newly born objects are usually the popular ones. They then
conclude that a huge amount of bandwidth could be saved by exploiting the
locality in the workload.

The main weakness of this paper is that it uses only one trace, collected at
the University of Washington, whose users are mostly students. Although the
trace is fairly large, it does not represent the Internet population. An
additional 'non-student' trace would have been helpful before drawing
statistical conclusions. On a minor point, I believe that the analysis of the
trace with regard to newly born clients is not accurate, mostly due to IP
aliasing, but the high-level conclusions drawn seem reasonable. It is also
not very clear how the parameters of the model should be chosen, other than
by observing real traces and trying to fit the model to them.
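
As a rough illustration of what fitting one such parameter to a trace could look like, the sketch below estimates a Zipf exponent from per-object request counts by least squares on the log-log rank/frequency data. This is a common textbook approach and an assumption here, not the authors' procedure; the function name `fit_zipf_exponent` is hypothetical, and the other parameters of the paper's model are not covered.

```python
import math


def fit_zipf_exponent(counts):
    """Estimate alpha in count ~ C / rank**alpha from request counts.

    Fits a straight line to (log rank, log count) by ordinary least squares
    and returns the negated slope. Objects with zero requests are ignored.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two objects with nonzero request counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic counts that follow an exact Zipf law with exponent 1.
    synthetic = [1000.0 / rank for rank in range(1, 101)]
    print(fit_zipf_exponent(synthetic))
```

A straight-line fit of this kind is only a crude estimate. On a workload with a flattened head, a poor fit at the top ranks would itself indicate the deviation from Zipf discussed above.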

This paper clearly shows that the Wired article 'The Long Tail' is flawed
from the beginning. Its author bases the article on a Zipf popularity
distribution of multimedia content, which I now believe is wrong. Books, CDs
and DVDs would presumably exhibit buy-at-most-once behavior and therefore
deviate from Zipf curves.
Received on Mon Nov 28 2005 - 02:26:49 EST
