Review: An Analysis of Internet Content Delivery Systems

From: Fareha Shafique <fareha_at_eecg.toronto.edu>
Date: Tue, 21 Nov 2006 00:07:29 -0500

This paper examines content delivery by focusing on 4 content-delivery
systems: HTTP web traffic, Akamai content delivery network, and the
Kazaa and Gnutella p2p file sharing networks. All incoming and outgoing
traffic at the University of Washington was passively monitored for 9
days to collect traces of traffic flowing between the university and the
rest of the Internet. Each flow was classified into one of the following
categories, according to port numbers:
1. HTTP web traffic
2. HTTP Akamai traffic
3. HTTP Gnutella and Kazaa traffic (p2p), which consists of the file
transfers
4. non-HTTP TCP traffic including protocols such as NNTP, SMTP and HTTP
traffic to ports not used by the above (maybe other p2p systems)
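
A minimal sketch of what such a port-based classification might look
like is given here. The port numbers (80 and 8080 for web, 1214 for
Kazaa, 6346 and 6347 for Gnutella) are the well-known defaults and are my
assumption, not values quoted from the paper. The `akamai_servers` set
and the `classify` function are hypothetical, since this review does not
say how Akamai flows were told apart from ordinary web flows.

```python
# Hypothetical port-based classifier. Ports are common defaults, assumed here.
WEB_PORTS = {80, 8080}
KAZAA_PORTS = {1214}
GNUTELLA_PORTS = {6346, 6347}


def classify(port, is_http, server_ip=None, akamai_servers=frozenset()):
    """Map one TCP flow to one of the four categories used in the review."""
    if is_http and port in KAZAA_PORTS:
        return "Kazaa"
    if is_http and port in GNUTELLA_PORTS:
        return "Gnutella"
    if is_http and port in WEB_PORTS:
        # Assumption: Akamai flows are recognised by a known server address.
        return "Akamai" if server_ip in akamai_servers else "WWW"
    # Category 4: non-HTTP TCP plus HTTP on ports not listed above.
    return "other TCP"


print(classify(1214, True))   # Kazaa
print(classify(25, False))    # other TCP
```

Checking the p2p ports first keeps the categories disjoint, and anything
unrecognised falls through to the fourth, catch-all category.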

The authors begin their data analysis by providing high-level summary
statistics of the gathered traces. This summary shows:
1. The smallest bandwidth consumer of TCP traffic is Akamai, followed by
Gnutella, then WWW at about 14.3% of TCP traffic, and finally Kazaa, the
largest contributor at about 36.9% of TCP bytes.
2. WWW and Kazaa have diurnal cycles, but WWW peaks in the middle of the
day whereas Kazaa peaks late at night.
3. The incoming WWW and Kazaa traffic (in response to UW clients) is on
the same order of magnitude, but the Kazaa response to external clients
dominates WWW requests by a factor of three.
4. GIF and JPEG images account for 42% of requests but only 16.3% of
transferred bytes, while AVI and MPG videos account for only 0.41% of
requests but make up 29.3% of bytes transferred. When comparing these
figures to a previous 1999 study conducted by the authors, they see that
HTML traffic has decreased 43%, GIF/JPG increased 59%, AVI/MPG increased
400% and MP3s increased 300%!

The paper then provides a detailed study of the content delivery
characteristics to show:
1. The properties of OBJECTS being delivered:
    - Akamai and WWW have a median object size of about 2KB, while this
is 4MB for p2p systems, roughly a thousandfold increase.
    - The highest-bandwidth-consuming objects for Akamai and WWW are
small objects with a large number of requests (or large objects with
few requests). For Kazaa, on the other hand, these objects were about
7000MB in size and accessed only a few times.
    - UW observes more outbound Kazaa traffic than inbound and hence can
benefit from reverse caching.
    - The largest component of WWW traffic is text (then images), and
that of Akamai is images. Kazaa is almost completely dominated by video
(80%), followed by some audio, while Gnutella is more evenly split
between audio and video.

2. How CLIENTS use the new content delivery mechanisms:
    - In both WWW and Akamai, a small number of clients account for a
large portion of traffic. Kazaa is similar: not only do a few users
account for most of Kazaa's traffic, but the top 200 users alone account
for about 20% of the total HTTP bytes downloaded.
    - Kazaa has a request rate about 2 orders of magnitude lower than
WWW, but its median object size is about 3 orders of magnitude larger
(4MB versus 2KB, as noted above), so it consumes more bandwidth overall.
    - WWW and Akamai requests are short (~120ms) while Kazaa's are
longer (~130 seconds). Therefore, even with fewer requests, there are
more concurrent Kazaa transactions than concurrent WWW+Akamai
transactions.
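
The concurrency claim can be checked with a rough back-of-the-envelope
calculation: in steady state, the average number of in-flight
transactions is the arrival rate times the average duration (Little's
law). The sketch below plugs in only the approximate figures quoted in
this review (a 100x lower request rate, ~130 seconds versus ~120ms). The
function name is hypothetical, and this is my own illustration, not the
authors' analysis.

```python
def concurrency_ratio(rate_ratio, duration_a_s, duration_b_s):
    """Concurrent transactions of A relative to B.

    By Little's law, concurrency = rate * duration, so the ratio is
    (rate_A / rate_B) * (duration_A / duration_B).
    """
    return rate_ratio * (duration_a_s / duration_b_s)


# Kazaa (A) versus web (B), using the review's approximate figures.
kazaa_vs_web = concurrency_ratio(rate_ratio=1 / 100,
                                 duration_a_s=130.0,
                                 duration_b_s=0.120)
print(round(kazaa_vs_web, 1))  # 10.8
```

Under these assumptions a request rate 100 times lower still yields
roughly ten times as many open transactions, because each one lasts about
a thousand times longer.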

3. How SERVERS for new delivery services differ from those for the web:
    - The majority of WWW requests are served by a few servers, and 80%
of Kazaa bytes are served by the top 334 peers (of 3888). The authors
expected p2p systems like Kazaa to distribute work better for
scalability and availability, and hence expected the server load to be
much more widely distributed among peers than for WWW. The results were
similar for internal UW and external servers.
    - Once again, Kazaa traffic is about 50% of total HTTP traffic.
    - P2P systems often fail on requests (>80%), whereas WWW and Akamai
fail much less (<30%).
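
A figure such as "80% of bytes served by the top 334 peers" can be
derived from a per-server byte count by sorting servers from heaviest to
lightest and accumulating until the target fraction is reached. The
sketch below illustrates this under my own assumptions. The function name
and the toy data are hypothetical and are not taken from the paper.

```python
def servers_needed(bytes_by_server, fraction):
    """Smallest number of top servers whose bytes cover `fraction` of the total."""
    target = fraction * sum(bytes_by_server.values())
    covered = 0
    ranked = sorted(bytes_by_server.values(), reverse=True)
    for count, served in enumerate(ranked, start=1):
        covered += served
        if covered >= target:
            return count
    return len(bytes_by_server)


# Toy data (made up): one heavy server and three light ones.
toy = {"a": 700, "b": 150, "c": 100, "d": 50}
print(servers_needed(toy, 0.8))  # 2
```

The smaller the result relative to the number of servers, the more
skewed the load.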

The authors feel that p2p systems are not scalable since every peer is
both a server and a client, consuming bandwidth in both directions. This
leads to the bandwidth cost of a Kazaa user being about 90x that of a web
client.
The authors conclude that p2p traffic accounts for 75% of HTTP traffic
not because of widespread use but because of large object sizes
transferred, which also means a few users consume a disproportionately
large fraction of bandwidth.

Finally, the paper describes the potential role of caching in CDNs and
p2p systems, showing:
1. Akamai requests are more skewed towards popular documents than WWW,
and hence a cache can achieve an 88% ideal hit rate and 50% practical
hit rate (compared to the 77% and 36% of WWW). Therefore, a local web
proxy cache can achieve nearly the same hit rate as an Akamai replica.
2. A small number of large objects are the largest contributors to p2p
traffic, and hence caching can have a large effect on wide-scale p2p
systems, potentially reducing wide-area bandwidth demands dramatically.
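
To make the notion of an "ideal" hit rate concrete, here is a minimal
sketch that assumes an infinitely large cache in which every object is
cacheable, so only the first request for each object misses. This
definition is my assumption about what "ideal" means. The paper's exact
methodology (and its "practical" variant, which presumably respects
cacheability constraints) may differ. The function name and the toy trace
are hypothetical.

```python
def ideal_hit_rate(requests):
    """Fraction of requests that hit an infinite cache (first access misses)."""
    seen = set()
    hits = 0
    for obj in requests:
        if obj in seen:
            hits += 1
        else:
            seen.add(obj)  # compulsory miss: object enters the cache
    return hits / len(requests) if requests else 0.0


# Toy trace (made up): six requests over three distinct objects.
trace = ["a", "b", "a", "a", "c", "b"]
print(ideal_hit_rate(trace))  # 0.5
```

The more skewed the requests are towards popular objects, the fewer
distinct objects appear in the trace and the higher this rate becomes.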

The paper is well written, starting with a clear, brief description of
each content delivery system studied and then presenting the results
clearly. The summary at the end of each section is very helpful. However,
the study seems too narrowly focused on the University's traffic. Also,
for most results the authors do not discuss Gnutella, although they do
provide its results in the figures, and those results are very different
from Kazaa's even though both are p2p systems. The authors do not
discuss this difference anywhere in the paper.
Received on Tue Nov 21 2006 - 00:07:37 EST
