CSC 2227S | Spring 2015 | Readings

Papers

For each meeting, readings are assigned. Usually, the readings will consist of two or three computer systems papers. The papers selected for this course are either classic papers or papers from recent top conferences. You are expected to read these papers thoroughly and submit a review BEFORE arriving at class on Wednesdays.

Each paper will be briefly presented by a student in the class, who will also lead the discussion of that paper. For each class meeting, we identify the topic and papers below; for each, we also try to identify good sources for background reading and for further investigation.

To enter your paper reviews, go here.

Electronic versions are available from the course review site.

(NOTE: This schedule is not set in stone. Some changes may be made during the term.)

Week 1 - January 14: Welcome to CSC 2227

This first meeting will be largely organizational. We will discuss how the class is going to work and what will be covered. We will also quickly recap material you should already know: what defines operating systems and distributed systems, what makes them continue to be interesting after all these years, and how the various topics in the course fit together.

There are no readings for this week. The following items are intended to help you refresh your memory of operating systems, prepare for the course, and assess your own knowledge of the prerequisite material. Sample solutions to the OS exam questions will be posted next week.

Week 2 - January 21: Historical Distributed Systems

presented by TBD

Read and review the following papers:

  1. Grapevine: an exercise in distributed computing
    Andrew D. Birrell, Roy Levin, Michael D. Schroeder, and Roger M. Needham. In Communications of the ACM, Vol. 25, No. 4, pp. 260-274, April 1982.
    http://doi.acm.org/10.1145/358468.358487
  2. A Comparison of Two Distributed Systems: Amoeba and Sprite
    Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, and Andrew S. Tanenbaum. In Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
    https://www.usenix.org/legacy/publications/compsystems/1991/fall_douglis.pdf

Additional suggested reading

Week 3 - January 28: Consensus

Read and review the following papers:

  1. Paxos Made Simple
    Leslie Lamport. November 2001.
    http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
  2. In Search of an Understandable Consensus Algorithm
    Diego Ongaro and John Ousterhout. In Proceedings of the 2014 USENIX Annual Technical Conference, pp. 305-320, June 2014.
    https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro

Additional suggested reading

Week 4 - Feb. 4: Coordination Services

Read and review the following papers:

  1. The Chubby Lock Service for Loosely-Coupled Distributed Systems
    Mike Burrows. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 335--350, November 2006.
    https://www.usenix.org/legacy/event/osdi06/tech/burrows.html
  2. ZooKeeper: Wait-free Coordination for Internet-scale Systems
    Patrick Hunt, Mahadev Konar, Flavio P. Junqueira and Benjamin Reed. In Proceedings of the 2010 USENIX Annual Technical Conference (ATC'10), pp. 145--158, June 2010.
    https://www.usenix.org/legacy/events/atc10/tech/full_papers/Hunt.pdf

Additional suggested reading

Week 5 - Feb. 11: Distributed Hash Tables

Read and review the following papers:

  1. Chord: A scalable peer-to-peer lookup service for internet applications
    Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. In Proceedings of the 2001 Conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01), pp. 149-160, August 2001.
    http://doi.acm.org/10.1145/383059.383071
  2. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems
    Antony I. T. Rowstron and Peter Druschel. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware '01), pp. 329-350, November 2001.
    http://research.microsoft.com/en-us/um/people/antr/PAST/pastry.pdf

Additional suggested reading

READING WEEK - Feb. 18: NO CLASS

Week 6 - Feb. 25: Key-Value Stores

Read and review the following papers:

  1. Dynamo: Amazon’s Highly Available Key-value Store
    Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220, October 2007.
    http://doi.acm.org/10.1145/1294261.1294281
  2. HyperDex: a distributed, searchable key-value store
    Robert Escriva, Bernard Wong, and Emin Gün Sirer. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, technologies, architectures, and protocols for computer communication (SIGCOMM '12), pp. 25--36, August 2012.
    http://doi.acm.org/10.1145/2342356.2342360

Additional suggested reading

Week 7 - Mar. 4: In-Memory Distributed Computing and Storage

Read and review the following papers:

  1. Fast Crash Recovery in RAMCloud
    Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP'11), pp. 29--41, October 2011.
    http://doi.acm.org/10.1145/2043556.2043560
  2. FaRM: Fast Remote Memory
    Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14), pp. 401--414, April 2014.
    https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-dragojevic.pdf

Additional suggested reading

Week 8 - Mar. 11: Distributed File Systems

Read and review the following papers:

  1. Ceph: A Scalable, High-Performance Distributed File System
    Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, and Carlos Maltzahn. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 307--320, November 2006.
    https://www.usenix.org/legacy/event/osdi06/tech/weil.html
  2. GPFS: A Shared-Disk File System for Large Computing Clusters
    Frank Schmuck and Roger Haskin. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST'02), pp. 231--244, January 2002.
    https://www.usenix.org/legacy/publications/library/proceedings/fast02/full_papers/schmuck/schmuck.pdf

Additional suggested reading

Week 9 - Mar. 20 (proposed rescheduled meeting): Programming Frameworks/Models

Read and review the following papers:

  1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
    Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys'07), pp. 59-72, March 2007.
    http://doi.acm.org/10.1145/1272996.1273005
  2. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
    Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, pp. 15--28, April 2012.
    https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf

Additional suggested reading

Week 10 - Mar. 25: Scheduling and Load Balancing

Read and review the following papers:

  1. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
    Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. In Proceedings of the 5th European conference on Computer systems (EuroSys '10), pp. 265--278, 2010.
    http://doi.acm.org/10.1145/1755913.1755940
  2. Omega: flexible, scalable schedulers for large compute clusters
    Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13), pp. 351--364, 2013.
    http://doi.acm.org/10.1145/2465351.2465386

Additional suggested reading

Week 11 - Apr. 1: Distributed Performance Analysis & Debugging

Read and review the following papers:

  1. DieCast: Testing Distributed Systems with an Accurate Scale Model
    Diwaker Gupta, Kashi V. Vishwanath, and Amin Vahdat. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI'08), pp. 407--421, April 2008.
    https://www.usenix.org/legacy/event/nsdi08/tech/full_papers/gupta/gupta.pdf
  2. The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services
    Michael Chow, David Meisner, Jason Flinn, Daniel Peek, and Thomas F. Wenisch. In Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation (OSDI'14), pp. 217--231, October 2014.
    https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chow.pdf

Additional suggested reading

Week 12 - Apr. 8: Very Large Systems (Experiences)

Read and review the following papers:

  1. Operating System Support for Planetary-Scale Network Services
    Andy Bavier, Mic Bowman, Brent Chun, David Culler, Scott Karlin, Steve Muir, Larry Peterson, Timothy Roscoe, Tammo Spalink and Mike Wawrzoniak. In Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI'04), pp. 253--266, March 2004.
    https://www.usenix.org/legacy/events/nsdi04/tech/full_papers/bavier/bavier.pdf
  2. Distributed Computing in Practice: The Condor Experience
    Douglas Thain, Todd Tannenbaum, and Miron Livny. In Concurrency and Computation: Practice and Experience, Vol. 17, No. 2-4, pp. 323--356, February 2005.
    http://research.cs.wisc.edu/htcondor/doc/condor-practice.pdf (authors' version)

Additional suggested reading