News
Jan 14, 2014 - Welcome!
Overview
This course examines the design, implementation, and analysis of selected aspects of operating systems, with a focus on networked systems. It covers topics such as resource naming and discovery; scheduling and load balancing; fault tolerance, availability, and persistence; distributed communication models; and storage. We will explore these topics in the context of a variety of distributed system designs, including grid, cloud, peer-to-peer, and cluster systems. This is a seminar-style course based on occasional lectures, paper presentations by students, and discussions of readings. The focus is on the principles used in the design of networked systems, and on the algorithms and data structures used in their implementation. Readings include case studies, seminal papers, and recent conference and journal articles.
General Information
| Meeting Time/Place: | Wednesdays 1-3pm, BA 5205 |
|---|---|
| Instructor: | Angela Demke Brown |
| Office: | BA 5228 |
| Email: | middle-name -at- cs.toronto.edu |
| Phone: | 416-946-8080 |
Grading
- 50% project
- 20% paper summaries
- 20% paper presentations
- 10% class discussion
Prereqs
Members of this class are expected to have taken an operating systems course equivalent to UofT's CSC 369 and achieved a grade of A or better. This includes familiarity as a user with an interactive operating system (e.g., Unix) and a solid understanding of basic concepts in the design and implementation of operating systems. Students without this prerequisite knowledge are likely to struggle.
Some knowledge of advanced OS topics, such as those covered in UofT's CSC 469/CSC 2208, is also assumed. In particular, students should be familiar with common OS structuring techniques (e.g., monolithic kernels vs. microkernels, and virtual machine basics), performance evaluation strategies, synchronization, and basic aspects of distributed systems (e.g., timing and failure models, consensus, vector clocks, and replicated state machines).
Components
- Critical study and discussion of recent literature in each of the core topic areas. This will include brief (25-30 minute) presentations of research papers by students and participation in discussions.
- Summaries of papers.
- A term project that involves designing, constructing, and evaluating an interesting software system related to the problems and techniques discussed in class.
Topics planned
This course will be a broad survey of Networked Systems research, rather than an in-depth study of a particular sub-area. The exact list of topics is still evolving, but will be drawn from the following list.
- Communication Models
- Consensus
- Resource naming and discovery
- Scheduling and Load Balancing
- Fault Tolerance and Reliability
- Distributed systems monitoring and analysis
- Mobility and Energy Management
- Key-value stores
- In-memory computing (e.g. RAMCloud, FaRM)
- Distributed Storage
- Frameworks for data processing (e.g. MapReduce, Naiad, Spark)
- Case studies of specific systems
Books
There is no assigned textbook. However, a variety of readings will be made available on the web page. The following books are available at the Engineering and Computer Science library, or through the library's e-resources. They may be useful for background reading and deeper study:
- Tanenbaum, Modern Operating Systems, 2nd ed. (Background), QA76.76 .O63 T359 2001
- Saltzer & Kaashoek, Principles of Computer System Design, 2009
- Coulouris, Dollimore & Kindberg, Distributed Systems: Concepts and Design, 4th ed., QA76.9 .D5 C68 2005X
- Lynch, Distributed Algorithms, QA 76.9.D5L96 1996
- Lynch et al., Atomic Transactions, QA 76.545.A86 1994
- Bernstein, Hadzilacos & Goodman, Concurrency Control and Recovery in Database Systems, QA 76.9.D3 B48 1987
- Silberschatz, Korth & Sudarshan, Database System Concepts, 4th ed., QA76.9 .D3 K67 2002
- Casavant & Singhal, Readings in Distributed Computing Systems, QA 76.9.D5C35 1994
- Ananda & Srinivasan, Distributed Computing Systems: Concepts and Structures, QA 76.9.D5D526 1991
- Mullender, Distributed Systems, QA 76.9.D5D5937 1989
- Filman & Friedman, Coordinated Computing: Tools and Techniques for Distributed Software, QA 76.9 D5F55 1984
- Ceri & Pelagatti, Distributed Databases: Principles and Systems, QA 76.9.D3C386 1984X
- Andrews, Concurrent Programming: Principles and Practice, QA 76.642 A53 1991
- Jain, The Art of Computer Systems Performance Analysis, QA 76.9 E94J32 1991
- Schneier, Secrets and Lies: Digital Security in a Networked World, QA 76.9.A25S352 2000X
- Gray & Reuter, Transaction Processing: Concepts and Techniques, QA 76.545.G73 1993
Conferences
These conferences are the major arenas for the publication of new ideas in computer systems research:
- SOSP - Symposium on Operating Systems Principles
- OSDI - Symposium on Operating Systems Design and Implementation
- NSDI - Symposium on Networked Systems Design and Implementation
- FAST - Conference on File and Storage Technologies
- IEEE S&P - Security & Privacy
- Usenix Security
- ACM CCS - Computer and Communications Security
- ASPLOS - Architectural Support for Programming Languages and Operating Systems
- SIGCOMM - Computer Communication
- SIGMETRICS - Computer/communication system performance
- USENIX Annual Technical Conference
- EuroSys
- ISCA - International Symposium on Computer Architecture
- HotOS - Hot Topics in Operating Systems Workshop
Final Note
* Everything here is subject to change.