REVIEW: Measurement, Modelling and Analysis of a Peer-to-Peer File Sharing
Workload
The paper provides a qualitative analysis of the workloads in the Kazaa
peer-to-peer network. This analysis is based on traces gathered for a
period of over 200 days at the University of Washington's network border.
Among the goals of the paper were to understand the fundamental properties
of multimedia file-sharing systems, explore the forces driving these
systems and demonstrate the existing opportunities for optimization. The
work is important because such systems dominate bandwidth consumption
(43% vs. 14% for web traffic).
The analysis presented includes interesting findings, some of which are
disputable. Among the findings are that multimedia content exhibits
fetch-at-most-once behaviour, that p2p file-sharing workloads are driven by
the creation of new objects or the arrival of new users, and that users of
file-sharing systems tolerate delays far better than web users do. The
authors also found that most requests are made by newer clients and that
more than half of the requests are for older and less popular objects,
confirming the Long Tail property discussed in [1]. Interestingly, the paper
also finds that Kazaa workloads do not follow a Zipf distribution.
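To make the link between fetch-at-most-once behaviour and the non-Zipf
curve concrete, here is a minimal sketch, not the authors' model. The
function name, the parameters, the Zipf exponent of 1 and the choice to
simply drop repeat requests are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate(n_users, n_objects, requests_per_user, at_most_once, seed=0):
    """Draw requests from a Zipf(1) popularity law and count them per object.

    Assumed toy model: with at_most_once=True a user never re-fetches an
    object already fetched, so no object is counted more than n_users times.
    """
    rng = random.Random(seed)
    objects = list(range(n_objects))
    weights = [1.0 / (rank + 1) for rank in objects]  # Zipf, exponent 1
    counts = Counter()
    for _ in range(n_users):
        owned = set()
        for obj in rng.choices(objects, weights=weights, k=requests_per_user):
            if at_most_once and obj in owned:
                continue  # repeat request by the same user is suppressed
            owned.add(obj)
            counts[obj] += 1
    return counts

# Same seed, so both runs see the same underlying draws.
zipf = simulate(50, 200, 40, at_most_once=False)
once = simulate(50, 200, 40, at_most_once=True)
print("top object, repeat fetches allowed:", zipf[0])
print("top object, fetch-at-most-once:    ", once[0])
```

In this sketch the count of any object under fetch-at-most-once is capped
by the number of users, which flattens the head of the popularity curve
while leaving the tail largely unchanged.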
Since traces were collected at the border of the university network,
internal Kazaa traffic was not accounted for. This underestimates the
popularity of the most popular objects and thus affects the shape of the
distribution curve: after the first few accesses to a popular object,
subsequent clients are able to download it from within the network, and
those transfers are not traced.
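The size of this effect can be sketched with a toy model. Everything here
is an assumption for illustration: the function name, the rule that the
first request for an object always crosses the border, and the single
probability p_internal that a later request is served by an internal peer.

```python
import random
from collections import Counter

def border_observed_counts(requests, p_internal=0.5, seed=0):
    """Count only the requests that a border trace would see.

    Assumed model: the first request for an object must leave the network;
    each later request is served internally with probability p_internal
    and is therefore invisible at the border.
    """
    rng = random.Random(seed)
    inside = set()          # objects already present within the network
    observed = Counter()
    for obj in requests:
        if obj in inside and rng.random() < p_internal:
            continue        # served by an internal peer, not traced
        observed[obj] += 1
        inside.add(obj)
    return observed

# Hypothetical workload: "a" is popular, "c" is requested only once.
requests = ["a"] * 100 + ["b"] * 10 + ["c"]
true_counts = Counter(requests)
observed = border_observed_counts(requests, p_internal=0.8)
for obj in true_counts:
    print(obj, "true:", true_counts[obj], "seen at border:", observed[obj])
```

Under this model an object requested once is counted exactly, while a
heavily requested object loses most of its count, so the bias falls on the
head of the distribution.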
A second source of concern is that in Kazaa objects are identified by
content (hashed key) and not by URL as on the web, and thus a direct
comparison of the two is incorrect. For example, with this methodology, an
updated web page is still counted as the same object, but an edited video
or audio track is counted as a new one (objects being claimed immutable),
which understates its popularity. To compare the two object types fairly,
one should identify web pages by content as well. It may well turn out that,
under this methodology, web pages do not follow a Zipf distribution either
and that they too exhibit fetch-at-most-once behaviour.
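The difference between the two identification schemes can be shown with a
small sketch. The URL, the page bodies and the helper names are
hypothetical, and SHA-1 is only a stand-in for whatever content hash a
real system uses.

```python
import hashlib
from collections import Counter

def id_by_url(url, body):
    """Web-style identity: the object is whatever lives at the URL."""
    return url

def id_by_content(url, body):
    """Kazaa-style identity: the object is the hash of its bytes."""
    return hashlib.sha1(body).hexdigest()

def popularity(requests, identify):
    """Count requests per object under the given identification scheme."""
    return Counter(identify(url, body) for url, body in requests)

# Hypothetical trace: one page fetched three times, updated between fetches.
requests = [
    ("http://example.org/news", b"headline v1"),
    ("http://example.org/news", b"headline v2"),
    ("http://example.org/news", b"headline v3"),
]
url_counts = popularity(requests, id_by_url)
content_counts = popularity(requests, id_by_content)
print("objects by URL:    ", len(url_counts), "max count:", max(url_counts.values()))
print("objects by content:", len(content_counts), "max count:", max(content_counts.values()))
```

Identified by URL, the trace contains one object fetched three times;
identified by content, it contains three objects fetched once each.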
A third weakness is that partial requests were ignored in studying how
patient file-sharing users are. Such requests may indicate that users are
impatient and choose to abort requests that take too long. To determine
this accurately, upon the termination of a partial request, a probe should
be sent to all hosts that responded to the requesting client with the given
object, to confirm that the abort was not due to unavailability of the
object.
[1] Chris Anderson. The Long Tail. In Wired Magazine. October 2004.
Received on Mon Nov 28 2005 - 10:49:35 EST