CSC 2227S | Spring 2015 | Readings

Papers

For each meeting, readings are assigned. Usually, the readings will consist of two or three computer systems papers. The papers selected for this course are either classic papers or papers from recent top conferences. You are expected to read these papers thoroughly and submit a review BEFORE arriving at class on Wednesdays.

Each paper will be briefly presented by a student in the class, who will also lead the discussion of that paper. For each class meeting, we identify the topic and papers below; for each, we also try to identify good sources for background reading and for further investigation.

To enter your paper reviews, go here.

Electronic versions are available from the course review site.

(NOTE: This schedule is not set in stone. Some changes may be made during the term.)

Week 1 - January 14: Welcome to CSC 2227

This first meeting will be largely organizational. We will discuss how the class is going to work and what will be covered. We will also quickly recap material you should already know: what defines operating systems and distributed systems, and what makes them continue to be interesting after all these years. Finally, we will give an overview of how the various topics in the course fit together.

There are no readings for this week. The following items are intended to help you refresh your memory of operating systems, prepare for the course, and assess your own knowledge of the prerequisite material. Sample solutions to the OS exam questions will be posted next week.

First Lecture notes
OS Self Assessment (pdf)
Concurrency Self Assessment (pdf) (Skip 8-25, 32, 36, 40, 43-45, 52)

Week 2 - January 21: Historical Distributed Systems

presented by TBD

Read and review the following papers:

Grapevine: an exercise in distributed computing
Andrew D. Birrell, Roy Levin, Michael D. Schroeder, and Roger M. Needham. In Communications of the ACM, Vol. 25, No. 4, pp. 260-274, April 1982.
http://doi.acm.org/10.1145/358468.358487
A Comparison of Two Distributed Systems: Amoeba and Sprite
Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, and Andrew S. Tanenbaum. In Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
https://www.usenix.org/legacy/publications/compsystems/1991/fall_douglis.pdf

Additional suggested reading

The LOCUS distributed operating system
Bruce Walker, Gerald Popek, Robert English, Charles Kline, and Greg Thiel. In Proceedings of the ninth ACM Symposium on Operating Systems Principles (SOSP '83), pp. 49-70, October 1983.
Experience with Grapevine: the growth of a distributed system.
Michael D. Schroeder, Andrew D. Birrell, and Roger M. Needham. In ACM Transactions on Computer Systems, Vol. 2, No. 1, pp. 3-23, February 1984.
http://doi.acm.org/10.1145/2080.2081
Distributed Operating Systems
Andrew S. Tanenbaum and Robbert Van Renesse. In ACM Computing Surveys, Vol. 17, No. 4, pp. 419-470, December 1985.
http://doi.acm.org/10.1145/6041.6074
The V distributed system
David Cheriton. In Communications of the ACM, Vol. 31, No. 3, pp. 314-333, March 1988.
http://doi.acm.org/10.1145/42392.42400

Week 3 - January 28: Consensus

Read and review the following papers:

Paxos Made Simple
Leslie Lamport. November 2001.
http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
In Search of an Understandable Consensus Algorithm
Diego Ongaro and John Ousterhout. In Proceedings of the 2014 USENIX Annual Technical Conference, pp. 305-320, June 2014.
https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro

Additional suggested reading

There is more consensus in Egalitarian parliaments
Iulian Moraru, David G. Andersen, and Michael Kaminsky. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13), pp. 358-372, November 2013.
http://doi.acm.org/10.1145/2517349.2517350
Paxos Made Practical
David Mazieres. 2007.
http://www.scs.stanford.edu/~dm/home/papers/paxos.pdf
Viewstamped Replication Revisited
Barbara Liskov and James Cowling. MIT technical report MIT-CSAIL-TR-2012-021, July 2012.
http://pmg.csail.mit.edu/papers/vr-revisited.pdf

Week 4 - Feb. 4: Coordination Services

Read and review the following papers:

The Chubby Lock Service for Loosely-Coupled Distributed Systems
Mike Burrows. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 335--350, November 2006.
https://www.usenix.org/legacy/event/osdi06/tech/burrows.html
ZooKeeper: Wait-free Coordination for Internet-scale Systems
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira and Benjamin Reed. In Proceedings of the 2010 USENIX Annual Technical Conference (ATC'10), pp. 145--158, June 2010.
https://www.usenix.org/legacy/events/atc10/tech/full_papers/Hunt.pdf

Additional suggested reading

Week 5 - Feb. 11: Distributed Hash Tables

Read and review the following papers:

Chord: A scalable peer-to-peer lookup service for internet applications.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. In Proceedings of the 2001 Conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01), pp. 149-160, August 2001.
http://doi.acm.org/10.1145/383059.383071
Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems.
Antony I. T. Rowstron and Peter Druschel. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware '01), pp. 329-350, November 2001.
http://research.microsoft.com/en-us/um/people/antr/PAST/pastry.pdf

Additional suggested reading

READING WEEK - Feb. 18: NO CLASS

Week 6 - Feb. 25: Key-Value Stores

Read and review the following papers:

Dynamo: Amazon’s Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220, October 2007.
http://doi.acm.org/10.1145/1294261.1294281
HyperDex: a distributed, searchable key-value store
Robert Escriva, Bernard Wong, and Emin Gün Sirer. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, technologies, architectures, and protocols for computer communication (SIGCOMM '12), pp. 25--36, August 2012.
http://doi.acm.org/10.1145/2342356.2342360

Additional suggested reading

Week 7 - Mar. 4: In-Memory Distributed Computing and Storage

Read and review the following papers:

Fast Crash Recovery in RAMCloud
Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP'11), pp. 29--41, October 2011.
http://doi.acm.org/10.1145/2043556.2043560
FaRM: Fast Remote Memory
Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14), pp. 401--414, April 2014.
https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf

Additional suggested reading

MICA: A Holistic Approach to Fast In-Memory Key-Value Storage
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14), pp. 429--444, April 2014.
https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-lim.pdf

Week 8 - Mar. 11: Distributed File Systems

Read and review the following papers:

Ceph: A Scalable, High-Performance Distributed File System
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, and Carlos Maltzahn. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 307--320, November 2006.
https://www.usenix.org/legacy/event/osdi06/tech/weil.html
GPFS: A Shared-Disk File System for Large Computing Clusters
Frank Schmuck and Roger Haskin. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST'02), pp. 231--244, January 2002.
https://www.usenix.org/legacy/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf

Additional suggested reading

Scale and performance in a distributed file system
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. In ACM Transactions on Computer Systems, Vol. 6, No. 1, pp. 51--81, February 1988.
http://doi.acm.org/10.1145/35037.35059
Frangipani: a scalable distributed file system
Chandramohan A. Thekkath, Timothy Mann, and Edward K. Lee. In Proceedings of the 16th ACM symposium on Operating Systems Principles (SOSP '97), pp. 224-237, October 1997.
http://doi.acm.org/10.1145/268998.266694
The Google file system
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), pp. 29--43, October 2003.
http://doi.acm.org/10.1145/945445.945450

Week 9 - Mar. 20 (proposed rescheduled meeting): Programming Frameworks/Models

Read and review the following papers:

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07), pp. 59-72, March 2007.
http://doi.acm.org/10.1145/1272996.1273005
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI'12), pp. 15--28, April 2012.
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf

Additional suggested reading

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI'04), pp. 137--150, San Francisco, CA, December 2004.
https://www.usenix.org/legacy/events/osdi04/tech/dean.html
Naiad: A Timely Dataflow System
Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martin Abadi. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP'13), pp. 439--455, November 2013.
http://doi.acm.org/10.1145/2517349.2522738

Week 10 - Mar. 25: Scheduling and Load Balancing

Read and review the following papers:

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. In Proceedings of the 5th European conference on Computer systems (EuroSys '10), pp. 265--278, 2010.
http://doi.acm.org/10.1145/1755913.1755940
Omega: flexible, scalable schedulers for large compute clusters
Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13), pp. 351--364, 2013.
http://doi.acm.org/10.1145/2465351.2465386

Additional suggested reading

Quincy: fair scheduling for distributed computing clusters
Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09), pp. 261--276, October 2009.
http://doi.acm.org/10.1145/1629575.1629601
Sparrow: distributed, low latency scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), pp. 69--84, November 2013.
http://doi.acm.org/10.1145/2517349.2522716
Apollo: scalable and coordinated scheduling for cloud-scale computing
Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14), pp. 285--300, October 2014.
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-boutin_0.pdf

Week 11 - Apr. 1: Distributed Performance Analysis & Debugging

Read and review the following papers:

DieCast: Testing Distributed Systems with an Accurate Scale Model
Diwaker Gupta, Kashi V. Vishwanath, and Amin Vahdat. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI'08), pp. 407--421, April 2008.
https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/gupta/gupta.pdf
The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services
Michael Chow, David Meisner, Jason Flinn, Daniel Peek, and Thomas F. Wenisch. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14), pp. 217--231, October 2014.
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chow.pdf

Additional suggested reading

Using Magpie for request extraction and workload modelling
Paul Barham, Austin Donnelly, Rebecca Isaacs and Richard Mortier. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI'04), pp. 259--272, December 2004.
https://www.usenix.org/legacy/event/osdi04/tech/full_papers/barham/barham.pdf
Replay Debugging for Distributed Applications
Dennis Geels, Gautam Altekar, Scott Shenker, and Ion Stoica. In Proceedings of the 2006 USENIX Annual Technical Conference (ATC'06), pp. 289--300, June 2006.
https://www.usenix.org/legacy/event/usenix06/tech/geels/geels.pdf

Week 12 - Apr. 8: Very Large Systems (Experiences)

Read and review the following papers:

Operating Systems Support for Planetary-Scale Network Services
Andy Bavier, Mic Bowman, Brent Chun, David Culler, Scott Karlin, Steve Muir, Larry Peterson, Timothy Roscoe, Tammo Spalink and Mike Wawrzoniak. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI'04), pp. 253--266, March 2004.
https://www.usenix.org/legacy/events/nsdi04/tech/full_papers/bavier/bavier.pdf
Distributed Computing in Practice: The Condor Experience
Douglas Thain, Todd Tannenbaum, and Miron Livny. In Concurrency and Computation: Practice and Experience, Vol. 17, No. 2-4, pp. 323--356, February 2005.
http://research.cs.wisc.edu/htcondor/doc/condor-practice.pdf

Additional suggested reading