Summary: An Analysis of Internet Content Delivery Systems

From: Andrew Miklas <agmiklas_at_cs.toronto.edu>
Date: Tue, 21 Nov 2006 01:36:05 -0500

This paper describes a measurement study that sought to understand how
Internet traffic was changing around 2002. In particular, the authors
wanted to compare Content Delivery Networks and Peer-to-Peer systems
(relatively new network designs at the time) against conventional WWW
traffic.

The key conclusions centred around P2P. The paper claimed that P2P
traffic dominates WWW traffic by a factor of three. This is especially
interesting because the trace contained many more WWW users than P2P
users, which means that a small number of users are consuming a large
portion of the network bandwidth. In fact, the paper makes this
conclusion concrete: the top 200 Kazaa users consume 20% of the total
HTTP bandwidth.

Also interesting is how much of the upload channel is spent fulfilling
P2P requests. Figure 10b shows that 400 Kazaa nodes use 70% of the
total upload bandwidth. Since upstream traffic is usually billed at a
higher rate than downstream, the university could be losing a lot of
money transferring content (e.g. MP3s, movies) that external clients
could just as easily get elsewhere. From a strictly economic point of
view, this seems like a very high cost with practically no gain for
the university.
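
To put a rough number on that intuition, here is a back-of-the-envelope
sketch in Python. Every figure in it is a hypothetical placeholder
except the 70% Kazaa share, which comes from Figure 10b; the paper does
not quote transit prices or traffic volumes in these units:

    # All numbers are placeholders except the 70% share (Figure 10b).
    total_upload_gb_per_day = 500.0  # hypothetical campus upstream volume
    kazaa_share = 0.70               # ~70% of upload bandwidth is Kazaa
    price_per_gb_up = 0.10           # hypothetical transit price, $/GB

    kazaa_gb_per_day = total_upload_gb_per_day * kazaa_share
    monthly_cost = kazaa_gb_per_day * price_per_gb_up * 30

    print(f"Kazaa upstream: {kazaa_gb_per_day:.0f} GB/day, "
          f"roughly ${monthly_cost:,.0f}/month at ${price_per_gb_up:.2f}/GB")

Whatever the actual prices, the cost scales linearly with the P2P
share of the uplink, which is what makes the 70% figure sting.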

The discussion of P2P caches was interesting, although there was only
enough data to discuss a "reverse cache". I'm not sure how well an
idea like this would work in practice, since it would involve the
university paying to co-locate a machine at its ISP for the express
purpose of caching and serving popular P2P content to the outside
world. I'd imagine that the RIAA would have a few things to say about
this...
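
Setting the legal question aside, the mechanism itself is simple to
picture. Here is a toy LRU version of such a reverse cache; this is my
own sketch of the idea, not the simulator the paper used:

    from collections import OrderedDict

    class ReverseP2PCache:
        """Toy LRU cache placed between campus Kazaa peers and the
        outside world. External requests served from the cache avoid
        consuming campus upstream bandwidth."""

        def __init__(self, capacity_bytes):
            self.capacity = capacity_bytes
            self.used = 0
            self.store = OrderedDict()  # object_id -> size in bytes

        def request(self, object_id, size_bytes):
            """Return True on a hit (upstream transfer avoided)."""
            if object_id in self.store:
                self.store.move_to_end(object_id)  # most recently used
                return True
            # Miss: fetch once from the campus peer, then cache it,
            # evicting least-recently-used objects to make room.
            while self.used + size_bytes > self.capacity and self.store:
                _, evicted_size = self.store.popitem(last=False)
                self.used -= evicted_size
            if size_bytes <= self.capacity:
                self.store[object_id] = size_bytes
                self.used += size_bytes
            return False

    cache = ReverseP2PCache(capacity_bytes=10 * 2**30)  # 10 GB
    cache.request("song.mp3", 5 * 2**20)  # miss: fetched from campus
    cache.request("song.mp3", 5 * 2**20)  # hit: campus uplink saved

Of course, a cache like this only pays off if a small set of objects
dominates the outbound byte count, which is exactly the property the
paper's data is probing.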

However, the data does suggest that there is something "unfair" about
how P2P systems select their peers: servers on well-provisioned
networks end up overburdened by clients. Unfortunately for
universities, their networks are probably better provisioned per node
than cable/DSL networks. Instead of deploying a reverse cache, simply
throttling the P2P traffic to each on-campus client would probably
work just as well; external clients would be encouraged to get their
content from somewhere else rather than relying on the university's
bandwidth generosity. Another idea might be to design "fair" P2P
algorithms in which each server favours serving clients from many
different networks over serving many clients from the same foreign
network (since those clients could just as easily exchange the
content with each other).
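
A minimal sketch of what that "fair" selection rule might look like,
assuming each candidate peer can be tagged with its network (say, an
address prefix or AS number). The grouping-and-interleaving policy is
my own invention for illustration, not anything Kazaa actually does:

    import itertools
    from collections import defaultdict

    def diverse_peer_order(candidates):
        """Order upload slots so every network is served once before
        any single network gets a second slot. `candidates` is a list
        of (peer_id, network) pairs."""
        by_network = defaultdict(list)
        for peer, network in candidates:
            by_network[network].append(peer)
        # Interleave: take one peer from each network per round.
        rounds = itertools.zip_longest(*by_network.values())
        return [p for rnd in rounds for p in rnd if p is not None]

    peers = [("a", "isp1"), ("b", "isp1"), ("c", "isp1"),
             ("d", "isp2"), ("e", "isp3")]
    print(diverse_peer_order(peers))  # ['a', 'd', 'e', 'b', 'c']

Under a policy like this, three clients from the same cable ISP can no
longer monopolize a university peer's first three upload slots, which
nudges them toward exchanging the content among themselves.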

The amount of data used to generate the graphs is very impressive. In
particular, the bandwidth vs. time graphs show a lot of interesting
daily and weekly patterns, and it is interesting to imagine how these
relate back to people's daily routines: the fast ramp-up and slow
decay in most bandwidth graphs, the comparatively diffuse weekend
traffic, the way that P2P traffic runs a few hours out of phase with
WWW traffic, and the way WWW and Akamai requests seem to drop off at
6:00, then pick up a bit before falling off for the night at around
10:00.

I found the Gnutella graphs somewhat confusing. They had very
different shapes from the Kazaa ones, but this seems to be mostly
because the Gnutella network saw far less use. Leaving the graphs in
suggests, at first glance, that some big design difference between
Kazaa and Gnutella was having a large effect on the metrics, when the
real difference appears to be simply popularity.