Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload
Review
Review By: Troy Ronda
Bandwidth consumed by peer-to-peer workloads has been increasing quickly. At the time of this article, peer-to-peer file sharing consumed 43% of UofW bandwidth, compared with 14% for WWW traffic; it is therefore a dominant factor in today's Internet. File sharing is used mainly for multimedia workloads, and there are several differences between multimedia and normal WWW workloads. Multimedia objects are typically large, on the order of several megabytes to several gigabytes, and a distributed piece of multimedia is immutable. WWW objects, in comparison, are small (on the order of several kilobytes), and an individual page is subject to change.

This paper examines the properties of the Kazaa multimedia file-sharing system, explores the driving forces behind P2P workloads, and demonstrates untapped locality. It does so through an extensive trace of Kazaa traffic at the University of Washington, covering downloads from peers outside UofW to internal UofW users. Kazaa users are patient, whereas web users are not; this implies that the web is an interactive system while peer-to-peer is a batch system. Clients consume fewer bytes as they age, and new clients generate most of the load, but older clients continue to interact with the system at a constant rate.

Kazaa is a blend of workloads: 91% of requests are for objects smaller than 10MB, but 65% of bytes transferred are due to the larger files. 94% of the time, a Kazaa client requests a given object at most once. The most popular objects in Kazaa are replaced by newly born objects; however, 72% of large-object requests and 52% of small-object requests are for old objects. There is substantial non-uniformity in Kazaa popularity, implying that caching could save bandwidth. The primary dynamic of web objects is updates, while that of P2P is new object arrivals.
Kazaa does not have a Zipf popularity distribution, and fetch-at-most-once behavior is a key contributor to this non-Zipf behavior. New objects counter-balance fetch-at-most-once and help give the workload locality. Web workloads, in comparison, do display Zipf behavior, but when fed through a shared proxy cache they appear non-Zipf, similar to the Kazaa behavior. The authors give a model that simulates this behavior.
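To make the fetch-at-most-once effect concrete, here is a minimal simulation sketch in the spirit of that model. This is not the authors' code, and the object count, client count, request count, and Zipf exponent are all invented for illustration. Each client draws objects from an underlying Zipf popularity but never re-fetches one it already has:

    import random
    from bisect import bisect
    from collections import Counter
    from itertools import accumulate

    # All parameters below are invented for illustration only.
    NUM_OBJECTS = 10_000     # object population
    NUM_CLIENTS = 1_000      # client population
    REQS_PER_CLIENT = 100    # requests issued by each client
    ALPHA = 1.0              # Zipf exponent of the underlying popularity

    # Underlying Zipf(ALPHA) popularity over objects ranked 1..N.
    weights = [1.0 / (rank ** ALPHA) for rank in range(1, NUM_OBJECTS + 1)]
    cum = list(accumulate(weights))
    total_weight = cum[-1]

    def zipf_draw() -> int:
        """Draw one object index according to the underlying popularity."""
        return bisect(cum, random.random() * total_weight)

    observed = Counter()
    for _ in range(NUM_CLIENTS):
        fetched = set()  # fetch-at-most-once memory for this client
        for _ in range(REQS_PER_CLIENT):
            obj = zipf_draw()
            while obj in fetched:  # redraw until an unfetched object appears
                obj = zipf_draw()
            fetched.add(obj)
            observed[obj] += 1

    # Compare the observed head against what pure Zipf would predict.
    total_reqs = NUM_CLIENTS * REQS_PER_CLIENT
    print("rank  observed  pure-Zipf expectation")
    for rank, (obj, count) in enumerate(observed.most_common(10), start=1):
        expected = total_reqs * weights[obj] / total_weight
        print(f"{rank:4d}  {count:8d}  {expected:21.0f}")

Under pure Zipf with these parameters, the most popular object would receive roughly 10,000 of the 100,000 requests; with fetch-at-most-once it is capped at 1,000 (one per client), which is exactly the flattened head that distinguishes the Kazaa workload from a Zipf one.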
An ideal, centralized proxy cache would save UofW 86% of its external bandwidth, and a transparent request redirector could be implemented to help curb P2P bandwidth usage.
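A figure like that 86% can be computed by replaying a trace through a simulated infinite shared cache: the first request for each object costs external bytes, and every repeat request is a free hit. Here is a hedged sketch of that accounting, which simplifies by assuming every transfer completes in full; the object names and sizes in the toy trace are made up:

    from collections import namedtuple

    Request = namedtuple("Request", ["object_id", "size_bytes"])

    def ideal_cache_savings(trace):
        """Fraction of bytes an infinite shared cache keeps internal:
        only the first request per object is fetched externally."""
        seen, total, external = set(), 0, 0
        for req in trace:
            total += req.size_bytes
            if req.object_id not in seen:
                seen.add(req.object_id)
                external += req.size_bytes
        return 1 - external / total

    # Hypothetical toy trace: three users fetch the same large file,
    # and one user fetches a unique small file.
    MB = 1_000_000
    trace = [
        Request("movie-1", 700 * MB),
        Request("movie-1", 700 * MB),
        Request("movie-1", 700 * MB),
        Request("song-1", 5 * MB),
    ]
    print(f"external bandwidth saved: {ideal_cache_savings(trace):.0%}")
    # Prints 67% for this toy trace; the paper measures 86% on the real one.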
This paper is a great relief to me. The difference in writing, graph readability, and general interest compared with several of the previous papers is large. Yes, I enjoyed it. The strength of this paper is the empirical study of the Kazaa workload at UofW. I learned several interesting properties of a typical P2P workload that I had not previously known. The Zipf comparison to the WWW workload is particularly interesting: the authors demonstrated that a fetch-at-most-once web workload can look very similar to the Kazaa workload. I also found the discussion of immutable multimedia objects interesting. In general, the comparison between web workloads and Kazaa workloads is another strength. Building a model that can generate your traces is also a strength, as it shows that you understand the properties of the trace. It is likewise good to close the paper with a potential use of the data, namely the P2P request redirector. I found each graph to be useful in some way, and they were all immediately accessible to me.
I cannot think of many bones to pick right now. I found the paper to be a bit longer than necessary, and I am not convinced that including the model was as important a contribution as the presentation and analysis of the empirical data; this is likely my own personal preference, however. Another issue that bugged me throughout the paper is the immutability description. It does not sit right to call web-site changes mutation while calling video files immutable. In a way, these seem like the same thing: we are adding content to a web site, or we are adding content to the P2P system. I know the difference, but the two still seem very similar. Perhaps it is the added content that really matters in both cases; that would be an interesting comparison. I believe that the bandwidth savings would be worthwhile for organizations, so it would have been good to have more discussion of building the P2P redirector, along with some real results from it. The problem we could potentially face with bandwidth curbing is sneaky P2P traffic that attempts to evade detection; with a redirector, P2P would have no need to evade detection, and in fact evasion would be detrimental.
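To make the suggestion concrete, here is a purely hypothetical sketch of such a redirector. The paper proposes the idea but does not build one; the class, the method names, and the peer-tracking scheme below are all my own invention. The idea is to track which internal peers hold each object and point new downloads at an internal holder before the request leaves the network:

    from collections import defaultdict

    class Redirector:
        """Hypothetical transparent P2P request redirector: keeps a
        download inside the organization whenever an internal copy
        of the requested object already exists."""

        def __init__(self):
            # object hash -> internal peers known to hold a complete copy
            self.holders = defaultdict(set)

        def record_download(self, peer: str, object_hash: str) -> None:
            # Called when an internal peer finishes downloading an object.
            self.holders[object_hash].add(peer)

        def redirect(self, object_hash: str, external_peer: str) -> str:
            # Prefer any internal holder; otherwise fall through to the
            # external peer the client originally selected.
            internal = self.holders.get(object_hash)
            return next(iter(internal)) if internal else external_peer

    # Toy usage: the second downloader is sent to the internal copy.
    r = Redirector()
    r.record_download("10.0.0.5", "hash-of-movie")
    print(r.redirect("hash-of-movie", "external.example.net"))  # 10.0.0.5

Because requests are redirected rather than blocked, clients would gain nothing by disguising their traffic, which is the point above.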
I am not sure how results from one university translate to the general case. My intuition says that P2P workloads from a large campus probably generalize, but I have no hard evidence.