When designing computer systems, it is essential to be able to quantify the
performance and reliability impacts of different design choices. For
example, should we invest in more buffer space or a faster processor? How
many data replicas are necessary to meet reliability requirements? How
should we schedule requests across servers?
The goal of this course is to provide students in computer systems with
the necessary tools and techniques for experimental design, measurement
and simulation of computer systems.
Topics include:
- How to do back-of-the-envelope calculations for system evaluation using
operational laws: Little's Law, the response-time law, asymptotic bounds,
modification analysis, and performance metrics;
- How to design and perform meaningful simulations and experiments: open
versus closed systems, confidence intervals, generating workloads for
simulation, and the inspection paradox;
- Empirical workload and failure measurements: the heavy-tailed property,
Pareto and other heavy-tailed distributions, and self-similarity;
- Impact of workload on system performance;
- Properties of failure processes and their effect on system reliability;
- Highlights from recent and classical papers in the area of computer
system performance and reliability.
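As a minimal illustration of the back-of-the-envelope style of analysis listed above, the sketch below applies Little's Law (N = X * R), the interactive response-time law (R = N / X - Z), and the asymptotic bounds for a closed interactive system with N users, think time Z, and per-device service demands D_i. The function names and all numbers (device demands, user count, think time) are hypothetical and chosen only for illustration; this is not taken from the course materials.

```python
"""Back-of-the-envelope operational-law sketch (hypothetical numbers)."""


def littles_law_n(throughput, response_time):
    """Little's Law: mean number of jobs in the system, N = X * R."""
    return throughput * response_time


def response_time_law(n_users, throughput, think_time):
    """Interactive response-time law for a closed system: R = N / X - Z."""
    return n_users / throughput - think_time


def asymptotic_bounds(demands, n_users, think_time):
    """Asymptotic bounds for a closed interactive system.

    demands: per-device service demands D_i in seconds per job.
    Returns (upper bound on throughput X, lower bound on response time R).
    """
    d_total = sum(demands)   # D = sum of all demands
    d_max = max(demands)     # D_max = demand at the bottleneck device
    # X <= min(N / (D + Z), 1 / D_max)
    x_upper = min(n_users / (d_total + think_time), 1.0 / d_max)
    # R >= max(D, N * D_max - Z)
    r_lower = max(d_total, n_users * d_max - think_time)
    return x_upper, r_lower


if __name__ == "__main__":
    # Hypothetical CPU and disk demands (s/job), 20 users, 1 s think time.
    demands = [0.05, 0.08]
    x_max, r_min = asymptotic_bounds(demands, n_users=20, think_time=1.0)
    print(f"X <= {x_max:.2f} jobs/s, R >= {r_min:.2f} s")
    # If the system runs at its throughput bound, the response-time law
    # gives the corresponding mean response time, and Little's Law gives
    # the mean number of jobs in the system (excluding thinking users).
    r = response_time_law(20, x_max, 1.0)
    print(f"R = {r:.2f} s, N in system = {littles_law_n(x_max, r):.2f}")
```

With these numbers the disk is the bottleneck (D_max = 0.08 s), so throughput cannot exceed 1 / 0.08 = 12.5 jobs/s no matter how many users are added; this kind of quick bound is often enough to decide whether, say, a faster processor would help at all.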
GENERAL INFORMATION
Meeting Time/Place: Thursday 1-3pm
Contacts
PROJECT
The basic goal of the project component is for class members to gain research experience by designing and exploring an interesting system problem. The project should explore issues, solve problems, or exploit techniques from class discussions or papers. By the end of the term, each team is expected to produce a workshop-quality paper. Students are expected to work in pairs on the project.
You are encouraged to propose your own project idea; we will also provide various project topic ideas to help you brainstorm. It is fine for your project to span areas, combining systems issues (this class) with others such as machine learning, HCI, and theory. However, there must be a significant CSC-2232-related component, and all project plans must be explicitly approved by the instructor. It is also fine for your project to serve an external purpose (e.g., contributing to your research agenda), but it must have concrete planning and completion steps.
For a list of project ideas check here.
There will be five project milestones (due dates are subject to change):
- Sept 30 -- Project proposal. This is an informal write-up (two to three paragraphs) describing your main idea for the course project.
- Oct 7 -- Related work. This is a summary of previous work related to your project. The goal is *not* to give a laundry list of all prior work; rather, the point is to summarize previous work and how it relates to your proposed work (in particular, how your work differs).
- Oct 28 -- Status update I. A document summarizing your accomplishments so far and any roadblocks you have encountered. The document should take the form of a draft of the final paper, i.e., it should include an introduction, related work, and preliminary results/progress.
- Nov 20 -- Status update II. Same format as status update I, with more results.
- Dec 20 -- Final deadline. Time to hand in the complete write-up of your project's results.
BOOKS
Some textbooks that might be useful:
- S. M. Ross, Introduction to Probability Models, Academic Press, 1997 (any edition is fine).
- R. Jain, The Art of Computer Systems Performance Analysis: Techniques for
Experimental Design, Measurement, Simulation, and Modeling, John Wiley & Sons, 1991.
READING LIST
Warmup: Open and closed systems
- Prasad 07:
Ravi Prasad, Constantinos Dovrolis, "Measuring the Congestion Responsiveness of Internet Traffic", PAM 2007
Presenter: Nosayba
- Not required reading, but the following two papers may also be of interest:
Prasad 08:
Ravi Prasad, Constantinos Dovrolis, "Beyond the Model of Persistent TCP Flows: Open-Loop vs Closed-Loop Arrivals of Non-Persistent Flows", 41st Annual Simulation Symposium 2008
Schroeder 06: Bianca Schroeder, Adam Wierman and Mor Harchol-Balter. "Open vs closed: a cautionary tale", NSDI 2006.
TOPIC 1: Operational laws in Action
- Thereska 06:
Eno Thereska, Michael Abd-el-malek, Jay J. Wylie, Dushyanth Narayanan, Gregory R. Ganger, "Informed data distribution selection in a self-predicting storage system", ICAC 2006
Presenter: Jack
- Thereska 08:
Eno Thereska, Gregory R. Ganger, "Ironmodel: robust performance models in the wild", SIGMETRICS 2008
Presenter: Mohamed
- Urgaonkar 05:
Bhuvan Urgaonkar, Giovanni Pacifici, Prashant Shenoy, Mike Spreitzer, and Asser Tantawi, "An Analytical Model for Multi-tier Internet Services and its Applications", SIGMETRICS 2005
Presenter: Tamer
TOPIC 2: Exploiting distribution knowledge
- Harchol 96:
"Exploiting Process Lifetime Distributions for Dynamic Load Balancing", SIGMETRICS 96 / TOCS 97. The
extended journal version can be found here.
Presenter: Ioan
- Shaikh 99:
Anees Shaikh, Jennifer Rexford, Kang G. Shin, "Load-sensitive routing of long-lived IP flows", SIGCOMM 1999
Presenter: Mohamed
- Heath 02:
Taliver Heath, Richard P. Martin, Thu D. Nguyen,
"Improving cluster availability using workstation validation", SIGMETRICS 02.
Presenter: James
TOPIC 3: Power laws and the internet
- Faloutsos 99:
Faloutsos, Faloutsos, Faloutsos, "On power-law relationships of the Internet topology", SIGCOMM 99
Presenter: Nosayba
- Li 04:
Li, Alderson, Willinger, Doyle, "A First-Principles Approach to Understanding the Internet's Router-level Topology", SIGCOMM 04
Presenter: James
TOPIC 4: Markov Chains to the Rescue
- Schwarz 2004:
T. J. E. Schwarz, Q. Xin, E. L. Miller, D. D. E. Long, A. Hospodor, S. W. Ng, "Disk Scrubbing in Large Archival Storage Systems", MASCOTS 2004
Presenter: TBD
- Rao 06: K. K. Rao, J. L. Hafner, R. A. Golding, "Reliability for Networked Storage Nodes", DSN 06
Presenter: Ioan
- Ford 10:
Daniel Ford, Francois Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan,
"Availability in Globally Distributed Storage Systems", OSDI 2010.
Presenter: Jack
CAVEAT
* Everything here is subject to change.