Summary: Analysis of Internet Content Delivery Systems

From: Kiran Kumar Gollu <kkgollu_at_cs.toronto.edu>
Date: Tue, 21 Nov 2006 02:20:06 -0500

The paper presents a thorough analysis of current Internet content delivery
systems, focusing on four systems: HTTP web traffic, the Akamai content
delivery network, and the Kazaa and Gnutella peer-to-peer file-sharing
systems. The authors' goal is to characterize and compare different
Internet content delivery systems, with a focus on the workloads of the
newer peer-to-peer systems.

The authors use network traffic traces collected over nine days in the
University of Washington (UW) environment. The measurement methodology uses
passive network monitoring to collect traces of inbound and outbound
traffic crossing UW's border. The monitoring software classifies both HTTP
traffic (web, Akamai, and HTTP-based Kazaa and Gnutella transfers) and
non-HTTP traffic (Kazaa and Gnutella control traffic) into the four
traffic types mentioned above.
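A minimal sketch of this style of flow classification, assuming the
well-known default ports (Kazaa on 1214, Gnutella on 6346/6347) and a
simplistic Host-header check for Akamai; the authors' actual classifier is
more involved than this:

```python
# Hedged sketch, not the paper's software: map one observed TCP flow to a
# traffic type. Port numbers are the well-known P2P defaults; the Akamai
# Host-header heuristic is a simplifying assumption for illustration.
from typing import Optional

def classify_flow(dst_port: int, http_host: Optional[str]) -> str:
    """Return one of 'www', 'akamai', 'kazaa', 'gnutella', or 'other'."""
    if dst_port == 1214:
        return "kazaa"                 # Kazaa's default peer port
    if dst_port in (6346, 6347):
        return "gnutella"              # Gnutella's default ports
    if dst_port in (80, 8080) and http_host is not None:
        if "akamai" in http_host:      # CDN-served object (assumed heuristic)
            return "akamai"
        return "www"
    return "other"
```

In practice Kazaa and Gnutella downloads also ride over HTTP, which is why
the paper inspects HTTP headers rather than relying on ports alone.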

Key Observations:
1) High-level characterization reveals that P2P traffic has overtaken
HTTP web traffic and accounts for roughly three quarters of the overall traffic.
2) Not surprisingly, TCP traffic represents 97% of the overall
network traffic. Kazaa is currently the largest single contributor of
Internet traffic, consuming ~37% of TCP bytes, whereas web traffic accounts
for only 14.3% of TCP bytes. Akamai consumed 0.2% of TCP bytes, Gnutella
consumed 6%, and the remaining 43% was consumed by other TCP-based protocols.
3) The median object size for P2P traffic is about 1,000 times larger
than the average web document size. A relatively small number of objects
accounts for a large share of the traffic in peer-to-peer networks: for
example, the top 1,000 Kazaa objects are responsible for 50% of the bytes
transferred, whereas the top 1,000 web objects account for only 16% of the
bytes transferred.
4) P2P transfers are long-lived, though request rates tend to be
quite low compared to web traffic. However, the number of P2P requests open
simultaneously at any instant is about twice the number of simultaneously
open web requests.
5) A small number of P2P clients consumes a disproportionately large
amount of bandwidth: 200 Kazaa clients are responsible for 50% of the
Kazaa bytes downloaded, and for nearly 27% of all HTTP bytes downloaded.
6) Only 600 of the 281,026 external (non-UW) Kazaa peers provided 26% of
the bytes received by UW clients. This raises serious concerns about the
scalability of P2P systems such as Kazaa and Gnutella.
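The popularity skew in observations 3, 5, and 6 is the same computation
each time: rank items by bytes transferred and take the cumulative share of
the top N. A toy sketch with synthetic byte counts (not the paper's trace
data):

```python
# Sketch: fraction of total bytes attributable to the N most popular items.
# The byte counts below are made up for illustration; the paper's figures
# (e.g. top 1,000 Kazaa objects = 50% of bytes) come from its UW trace.

def top_n_byte_share(bytes_per_item, n):
    """Cumulative byte share of the n items with the most bytes transferred."""
    ranked = sorted(bytes_per_item, reverse=True)
    return sum(ranked[:n]) / sum(ranked)

# Example: a heavily skewed, Zipf-like distribution of per-object bytes
counts = [1000, 500, 250, 125, 60, 30, 15, 10, 5, 5]
share = top_n_byte_share(counts, 2)   # top 2 of 10 objects -> 0.75
```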

The authors go on to analyze the role of caching in CDN and P2P systems.
Ideal and practical cache hit rates suggest that Akamai requests are more
heavily skewed towards the most popular content than general web traffic.
Hence, widely deployed local proxy caches would significantly reduce the
need for a separate CDN.

Surprisingly, the measurement results show that outbound traffic for P2P
systems is much higher than inbound traffic. Additionally, a small number
of large objects are the largest contributors to the outbound P2P cache
byte hit rate, so placing a reverse cache at the border would save a lot of
bandwidth. The results indicate that such a cache would be effective for
small populations, and even more effective for large populations.
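The "ideal cache" byte hit rate used in this analysis can be approximated
with a simple simulation: an infinite, never-expiring cache where the first
request for an object is a byte miss and every later request for the same
object is a byte hit. A hedged sketch of that idea, not the paper's exact
methodology:

```python
# Sketch: ideal (infinite-capacity) cache byte hit rate over a request
# trace. Each request is (object_id, size_in_bytes); the first request for
# an object misses, and subsequent full downloads of it count as byte hits.

def ideal_byte_hit_rate(requests):
    seen = set()
    hit_bytes = total_bytes = 0
    for obj, size in requests:
        total_bytes += size
        if obj in seen:
            hit_bytes += size       # object already cached: bytes served locally
        else:
            seen.add(obj)           # cold miss: fetch and cache the object
    return hit_bytes / total_bytes

# Example trace: object "a" is popular, so most of its bytes are cacheable
trace = [("a", 100), ("b", 50), ("a", 100), ("a", 100), ("c", 10)]
rate = ideal_byte_hit_rate(trace)   # 200 hit bytes out of 360 total
```

This also shows why the reviewer's last point matters: the hit rate only
becomes meaningful once the trace is long enough for the cache to warm up.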

The paper mentions that a large amount of P2P traffic is downloaded as
fragments, but it doesn't quantify the portion of downloaded traffic that
is fragmented in P2P networks. As the authors acknowledge, the paper also
fails to characterize the role of caching for inbound P2P traffic, because
the inbound cache is not fully warmed even at the end of the nine-day
trace. This could be interesting future work.