When designing computer systems, it is essential to be able to quantify the
performance and reliability impacts of different design choices. For
example, should we invest in more buffer space or a faster processor? How
many data replicas are necessary to meet reliability requirements? How
should we schedule requests across servers?
The goal of this course is to provide students in computer systems with
the necessary tools and techniques for experimental design, measurement
and simulation of computer systems.
Topics include, for example:
- How to do back-of-the-envelope calculations for system evaluation using
operational laws: Little's Law, response-time law, asymptotic bounds,
modification analysis, performance metrics;
- How to design and perform meaningful simulations and experiments: open
versus closed systems, confidence intervals, generating workloads for
simulation, inspection paradox;
- Empirical workload and failure measurements: heavy-tailed property,
Pareto distributions, self-similarity, heavy-tailed distributions;
- Impact of workload on system performance;
- Properties of failure processes and their effect on system reliability;
- Highlights from recent and classical papers in the area of computer
system performance and reliability.
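As a taste of the back-of-the-envelope calculations listed in the topics above, here is a minimal sketch of three operational laws. It assumes a closed, interactive system with N users, think time Z, throughput X, response time R, and per-device service demands D_i. The function names and all numbers are illustrative assumptions, not course material.

```python
# Back-of-the-envelope sketch of three operational laws for a closed,
# interactive system. All names and numbers are illustrative assumptions.

def littles_law(throughput, response_time):
    """Little's Law: mean number of jobs in the system, N = X * R."""
    return throughput * response_time


def response_time_law(n_users, throughput, think_time):
    """Interactive response-time law: R = N / X - Z."""
    return n_users / throughput - think_time


def asymptotic_bounds(n_users, demands, think_time):
    """Asymptotic bounds given per-device service demands D_i (seconds).

    Throughput upper bound:    X(N) <= min(1 / D_max, N / (D + Z))
    Response-time lower bound: R(N) >= max(D, N * D_max - Z)
    where D = sum(D_i) and D_max = max(D_i), the bottleneck demand.
    """
    d_total = sum(demands)
    d_max = max(demands)
    x_upper = min(1.0 / d_max, n_users / (d_total + think_time))
    r_lower = max(d_total, n_users * d_max - think_time)
    return x_upper, r_lower


if __name__ == "__main__":
    # Hypothetical measurements: 50 requests/s, 0.2 s mean response time.
    print("mean jobs in system:", littles_law(50, 0.2))
    # Hypothetical closed system: 100 users, 8 requests/s, 10 s think time.
    print("mean response time (s):", response_time_law(100, 8, 10))
    # Hypothetical demands for a CPU and two disks, 200 users, 10 s think time.
    x_upper, r_lower = asymptotic_bounds(200, [0.05, 0.1, 0.02], 10)
    print("throughput bound (req/s):", x_upper)
    print("response-time bound (s):", r_lower)
```

In the last example the device with a 0.1 s demand is the bottleneck, so throughput cannot exceed 1 / 0.1 = 10 requests per second no matter how many users are added.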
GENERAL INFORMATION
Meeting Time/Place: Fridays 11am-1pm, starting Sept 18, 2009
Contacts
PROJECT
The basic goal of the project component is for class members to gain research experience by designing and exploring an interesting system problem. The system problem should explore issues, solve problems or exploit techniques from classroom discussions or papers.
You are encouraged to propose your own project idea, and we will provide various project topic ideas to help you brainstorm. It is more than fine for your project to span areas, combining system issues (this class) with others such as machine learning, HCI and theory. However, there must be a significant CSC-2232-related component, and all project plans must be explicitly approved by the instructor. It is also fine for your project to serve some external purpose (e.g., contributing to your research agenda), but there must be concrete planning and completion steps.
For a list of project ideas check here.
Here are the project milestones:
- Oct 13 -- Project proposal. This is just an informal write-up (one or two paragraphs) describing your main idea for the course project.
- Nov 20 -- Mid-project milestone. This will be a 10-20 minute one-on-one meeting with me where you describe your progress so far and any roadblocks you have encountered.
- Dec 20 -- Final deadline. Time to either hand in a write-up of your project's results or arrange for a meeting with me and give me a presentation of the results.
BOOKS
Some textbooks that might be useful:
- S. M. Ross, Introduction to Probability Models, Any Edition, Academic Press, 1997.
- R. Jain, The Art of Computer Systems Performance Analysis: Techniques for
Experimental Design, Measurement, Simulation, and Modeling, John Wiley & Sons, 1991.
READING LIST
Warmup: Open and closed systems
- Prasad 07:
Ravi Prasad, Constantinos Dovrolis, "Measuring the Congestion Responsiveness of Internet Traffic", PAM 2007
Presenter: Eric
- Not required reading, but the following two papers may also be of interest:
Prasad 08:
Ravi Prasad, Constantinos Dovrolis, "Beyond the Model of Persistent TCP Flows: Open-Loop vs Closed-Loop Arrivals of Non-Persistent Flows", 41st Annual Simulation Symposium 2008
Schroeder 06: Bianca Schroeder, Adam Wierman and Mor Harchol-Balter. "Open vs closed: a cautionary tale", NSDI 2006.
TOPIC 1: Operational laws in Action
- Thereska 06:
Eno Thereska, Michael Abd-el-malek, Jay J. Wylie, Dushyanth Narayanan, Gregory R. Ganger, "Informed data distribution selection in a self-predicting storage system", ICAC 2006
Presenter: Philip
- Thereska 08:
Eno Thereska, Gregory R. Ganger, "Ironmodel: robust performance models in the wild", SIGMETRICS 2008
Presenter: Svitlana
- Urgaonkar 05:
Bhuvan Urgaonkar, Giovanni Pacifici, Prashant Shenoy, Mike Spreitzer, and Asser Tantawi, "An Analytical Model for Multi-tier Internet Services and its Applications", SIGMETRICS 2005
Presenter: Michael
TOPIC 2: Exploiting distribution knowledge
- Harchol 96:
Mor Harchol-Balter, Allen B. Downey, "Exploiting Process Lifetime Distributions for Dynamic Load Balancing", SIGMETRICS 96 / TOCS 97. The extended journal version can be found here.
Presenter: Bogdan
- Shaikh 99:
Anees Shaikh, Jennifer Rexford, Kang G. Shin, "Load-sensitive routing of long-lived IP flows", SIGCOMM 1999
Presenter: Andy
TOPIC 3: Workloads in the wild
- Arlitt 96:
Arlitt, Williamson, "Web server workload characterization: the search for invariants", SIGMETRICS 96
Presenter: Nilton
TOPIC 4: Power laws and the internet
- Faloutsos 99:
Faloutsos, Faloutsos, Faloutsos, "On power-law relationships of the Internet topology", SIGCOMM 99
Presenter: Pouya
- Li 04:
Li, Alderson, Willinger, Doyle, "A First-Principles Approach to Understanding the Internet's Router-level Topology", SIGCOMM 04
Presenter: Afshar
TOPIC 5: Markov Chains to the Rescue
- Schwarz 2004:
T. J. E. Schwarz, Q. Xin, E. L. Miller, D. D. E. Long, A. Hospodor, S. W. Ng, "Disk Scrubbing in Large Archival Storage Systems", MASCOTS 2004
Presenter: Isaac
- Rao 06:
K. K. Rao, J. L. Hafner, R. A. Golding, "Reliability for Networked Storage Nodes", DSN 06
Presenter: Daniel
TOPIC 6: System failures in the real world
- Pinheiro 07:
E. Pinheiro, W.-D. Weber, and L. A. Barroso, "Failure Trends in a Large Disk Drive Population", FAST 07
Presenter: Ryan
- Bairavasundaram 07:
L. N. Bairavasundaram, G. R. Goodson, S. Pasupathy, J. Schindler, "An Analysis of Latent Sector Errors in Disk Drives", SIGMETRICS 07
Presenter: George
CAVEAT
* Everything here is subject to change.