CSC 2209 - Computer Networks - Course Project Web Page
Cooperative Browser Download Streams
Contents (reverse chronological order)
Project Final Write-up (December 13,
2006)
Project Final Presentation
(December 8, 2006)
CBDS Firefox Extension (December 6,
2006)
Simulator (December 6, 2006)
5 Minute Project Presentation
(November 14, 2006)
Midterm Progress Report
(November 10, 2006)
Source and Binaries for
Download (November 10, 2006)
Project Proposal (October
10, 2006)
NOTES:
(Nov 10, 2006) This project originally started out being called
"Cooperative Browser
Caches". This was a misnomer. The goal is to only share the content
that the user is downloading while it is being downloaded, and not
share the user's cache in general. The project has accordingly been
renamed to "Cooperative Browser Download Streams". The project proposal
still refers to the original title, but all other content reflects the
new name.
Project Final Write-up
(December 13, 2006)
The final write-up for the project is here: final_writeup.pdf
Project Final Presentation
(December 8, 2006)
The final presentation for the project is here: 2209_final_pres.pdf
CBDS Firefox Extension
(December 6, 2006)
The Firefox extension containing the Cooperative Browser Download
Streams (CBDS) implementation is here: cbds.xpi
The extension is just a wrapper that bootstraps a piece of java code
that acts as a proxy for Firefox. The extension part of the
implementation is based on the Java Firefox
Extension.
IMPORTANT:
- The extension was tested using Firefox 1.5.0.8. YMMV with any
other versions of Firefox.
- The default version of java on your system must be 1.5. If it
isn't the default, then you must have configured things so that Firefox
is using the 1.5 version. The extension was tested using Sun Java
1.5.0_09.
- The extension modifies the proxy configuration in Firefox when
it's installed. However, it does not revert those changes when it's
uninstalled. When you uninstall, you have to change the proxy
configuration information back to its original state manually. (Do this
by going to Edit->Preferences->General->Connection Settings
within Firefox.)
Simulator (December 6, 2006)
In order to demonstrate that the Cooperative Browser Download Streams
(CBDS) solution truly saves server bandwidth, a number of simulations
were run. These were done using a trace containing a list of 1640179
Kazaa requests made by over 24K clients during a period of 200 days.
Prior to
running the simulation on the trace, the trace was pruned so that it
only contained requests for files larger than 100M bytes. The rationale
for this is that the CBDS solution is intended for sharing large files.
The data in the trace file consists of a number of entries, where each
line represents the start of a download event. Each line in the file
has the following format
<timestamp> <client_id>
<null> <file_id> <file_size>
where
<timestamp> is the time in seconds
representing the start time of the file download.
<client_id> uniquely identifies the
client
<null> is a dummy entry
<file_id> along with a file_size
uniquely identifies a file
<file_size> is the size of the file
being downloaded
The Simulator works as follows. It maintains a priority queue
containing events to be processed. The priority queue is ordered based
on the time of the event. In order to populate events into the queue,
the simulation has to start by reading entries from the trace file. The
main loop of the simulation basically reads an entry from the trace
file, notes the time associated with that entry, and processes all
events in the event queue up until the time of the current entry read
from the file. It then adds the entry read from the file into the event
queue (as a START_DOWNLOAD event -- see below for details), reads the
next entry from the file, and continues processing events in the queue
until the time associated with the latest entry read.
Besides the event queue, the Simulator also maintains a mapping of the
clients actively downloading a given file.
There are two types of events that the event queue stores:
START_DOWNLOAD events, and END_DOWNLOAD events. Whenever a
START_DOWNLOAD event occurs, the simulator checks to see if there's
anyone downloading the file. If there is, then one of the clients
downloading the file is picked at random to be the peer from which the
file is downloaded. If there is no such peer, then the
START_DOWNLOAD event results in the current client going to the origin
server to download the file. In either case, an entry is added to the
mapping of clients actively downloading the file.
END_DOWNLOAD events are inserted into the queue when START_DOWNLOAD
events are processed. The time associated with the END_DOWNLOAD event
depends on from where the download is being served. If the client is
downloading the file from the origin server, then the event is added
with a timestamp associated with the time when the download will be
finished. This time is calculated based on a transfer rate that the
user specifies when running the simulation, and the size of the file,
which is known from the trace data and is also stored as part of the
event in the event queue. If the file is being
downloaded from a peer client, then the time stamp of the END_DOWNLOAD
event is the time at which the peer client will be finished
downloading. If this time isn't long enough for the full file to be
downloaded, a new START_EVENT is added to the queue (with timestamp
equal to the END_EVENT just added), and a new file size representing
the remainder of the file that has to be downloaded. When this new
START_EVENT is processed, a new peer client can be found from which the
rest of the file can be downloaded, or, if no such client can be found,
the remainder of bytes can be obtained from the origin server. In this
sense, the Simulator does "aggressive sharing", since it always tries
to use a peer client for downloading the file, and only obtains the
file from the origin server as a last resort.
The Simulator maintains a running total of the number of bytes
transferred between all nodes (including the origin server), and a
running total of the bytes transferred between peer nodes (i.e., the
number of bytes shared). Periodically it prints out the fraction of
bytes shared over the total bytes processed. The
Simulator also maintains for each client the maximum number of peer
clients it is uploading to concurrently. At the end of the simulation,
the cumulative fraction of peers having to concurrently upload to x other peers is calculated, where
x ranges from 0 to the maximum
number of concurrent uploads that any client has to perform.
The simulation was run using transfer rates of: 1 KB/sec, 5KB/sec, 10
KB/sec, 50KB/sec, 100KB/sec, 500KB/sec.
Here is a copy of the pruned trace file used: trace.zip
Here is a copy of the source code for the Simulator: Simulator.java
See the final presentation, and write-up for the resulting data.
5 Minute Project Presentation
(November 14, 2006)
The 5 minute project presentation is available in PDF format here.
Midterm Progress
Report (November 10, 2006)
The midterm progress report is available in PDF format here.
Source and Binaries
for Download (November 10, 2006)
The CBDS implementation as described in the Midterm Progress Report
(above) is contained in cbds.jar.
The jar file comes with the source code. Please note that the source
code is presently in a slightly "hackish" state, but this will be
cleaned
up prior to the final submission at the end of the term.
In order to use CBDS, you need a copy of Apache XML-RPC version 3. This
can be obtained from the Apache
XML-RPC homepage.
CBDS is a Java proxy that currently runs as a separate process. It will
be packaged in an XPI (Firefox extension) file, as described in the
midterm report, for the final submission. I haven't packaged it as an
extension yet since it is easier to debug when run as a separate
process. To run the proxy, you can edit the following shell script to
suit your needs: cbds.sh.
Basically, point the XMLRPC variable to where your copy of
xmlrpc-3.0/lib is located, and change the system properties if needed.
The system property CBProxyPort is the port number that the proxy will
be listening on for connections from the browser, and CBServerPort is
the port that the the proxy will be listening on for connections from
remote browsers.
After you've started the CBDS Java proxy, launch Firefox and go to
Edit->Preferences and click the Connections button. Change your
settings to Manual HTTP Proxy with hostname "localhost" and port number
"10006" (or whatever you changed CBProxyPort to).
CBDS was developed using: Firefox 1.5.0.7 and Sun Java 1.5.0_09.
Project Proposal
(October 10, 2006)
What is the problem I am solving?
When a user is downloading a large amount of content
from a server, he would like to increase the speed of his download. At
the same time, servers that provide content that is in high demand
would like to reduce the load that they experience. The goal is to
solve these two problems with one solution: cooperative browser caches.
By having users share their caches with other users, they can download
content from machines that are closer in proximity to them, potentially
increasing the speed of their download, and at the same time reduce the
load on the server by obtaining the same content from somewhere else.
Why is the problem interesting?
It is becoming more and more difficult for servers
that provide content that ranks as the most popular on the Internet to
serve that content. Many servers can't keep up with the load, and need
to use content distribution networks. Companies who want to serve this
content are forced to invest more and more money to keep up with the
popularity of their websites.
Companies would be interested in solutions to this
problem if that solution did not require any capital investment on
their part. And this is precisely one of the main selling features of
cooperative browser caches: it does not require companies to buy more
servers or pay for participating in a content distribution network. The
onus is completely on the user for helping share the content that the
server is providing.
Why is the problem hard?
In order for a cooperative caching scheme to work, a
user needs incentive to participate in it. Without incentive, the user
would have no reason for donating his bandwidth. How is such incentive
provided? One possible suggestion is to have a server refuse to provide
a client content unless the client is willing to participate in the
cooperative caching scheme. In the case of my implementation, this
would require the server being aware that the client has the firefox
extension installed.
Even if there is incentive, there also needs to be
ease of use. We need to be able to make participating in the
cooperative caching scheme a transparent experience. If the user has to
install several different pieces of software, run processes separate
from his browser, or spend time changing his configuration, he may not
want to bother. In that case he will go elsewhere for the content he
was seeking, or abandon trying to download it all together.
If a user chooses to participate in a cooperative
caching scheme, he will be essentially advertising to other users the
content that he downloaded. This could be seen as an invasion of
privacy; he may want to download the content, which requires him to
participate in the cooperative caching scheme, but at the same time he
may not want others to know that he has downloaded this content. How
can a user's privacy be maintained while at the same time allowing him
to participate in the scheme?
How am I planning to solve the problem?
My goal is to implement a Firefox
extension that allows a user to share his cache. This will require
intercepting incoming and outgoing HTTP from within the extension, and
also serving content from the extension. Content will be "shared" only
for the time during which the user is downloading content; the user's
cache will not become open indefinitely. Once a user has finished
downloading the content, any server sockets opened will be closed, and
anyone downloading from the user will be required to finish his
download elsewhere (possibly from the origin server).
OpenDHT will
be used to maintain a mapping between content (e.g. URIs) and nodes
that are participating the cooperative caching scheme. When a client
makes a request for a web page, it will first look up the URI in
OpenDHT to see if there are any other nodes currently serving the
content. If there is, the request will be redirected to one of those
nodes. Otherwise, the request will go to the origin server. In either
case, the current client will add a mapping to OpenDHT between the URI
its requesting and itself so that other clients are aware that the
current client is willing to share the data.
What is the related work?
Extensive work has been done previously on
cooperative web proxy caching. However, Wolman et al in [1]
demonstrated that cooperative proxy caching has only a minor benefit
within limited population bounds. In [2] Iyer et al examined Squirrel,
a decentralized peer-to-peer web cache. This work is very similar to
the work that we're proposing to do here. There is also similar work
being done in CoopNet [3]. CoopNet examines the benefits that using
peer-to-peer communication can have on
improving the performance of client - server applications in a network.
References
[1] Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell, Anna
Karlin, and Henry M. Levy. "On the scale and performance of cooperative
Web proxy caching", In Proceedings of the 17th ACM Symposium on
Operating Systems
Principles (SOSP '99), pages 16-31, Kiawah Island Resort, SC,
USA, December, 1999.
[2] S. Iyer, A. Rowstron and P.
Druschel,
"SQUIRREL: A decentralized, peer-to-peer web cache",
appeared in Principles of
Distributed Computing (PODC 2002), Monterey, CA.
[3] CoopNet: Cooperative Networking. http://research.microsoft.com/projects/coopnet/
(Link to Course Homepage: CSC2209)