Papers
For each meeting, readings are assigned. Usually, the readings will consist of two computer systems papers. The papers selected for this course are either classic papers or papers from recent top conferences. You are expected to read these papers thoroughly and submit a review BEFORE arriving at class on Tuesdays.
Each paper will be briefly presented by a student in the class, who will also lead the discussion of that paper. For each class meeting, we identify the topic and papers below; for each, we also try to identify good sources for background reading and for further investigation.
To enter your paper reviews, go here.
Electronic versions are available from the course review site.
(NOTE: This schedule is not set in stone. Some changes may be made during the term.)
Week 0 - May 4: Welcome to CSC 2227
There will be no class meeting this week.
The following items are intended to provide an overview of the course (how it will operate and what is expected of you) and to help you refresh your memory of operating systems. These are to help you prepare for the course and assess your own knowledge of the pre-requisite material.
- Course background notes
- OS Self Assessment (pdf)
- Concurrency Self Assessment (pdf) (Skip 8-25, 32, 36, 40, 43-45, 52)
Week 1 - May 11: Historical Distributed Systems
presented by Angela Demke Brown
Read and review the following papers:
- Grapevine: an exercise in distributed computing
Andrew D. Birrell, Roy Levin, Michael D. Schroeder, and Roger M. Needham. In Communications of the ACM, Vol. 25, No. 4, pp. 260-274, April 1982. (ACM SIGOPS HoF paper 2008)
http://doi.acm.org/10.1145/358468.358487
- A Comparison of Two Distributed Systems: Amoeba and Sprite
Fred Douglis, M. Frans Kaashoek, John K. Ousterhout, and Andrew S. Tanenbaum. In Computing Systems, Vol. 4, No. 3, pp. 353-384, December 1991.
https://www.usenix.org/legacy/publications/compsystems/1991/fall_douglis.pdf
Additional suggested reading
- The LOCUS distributed operating system
Bruce Walker, Gerald Popek, Robert English, Charles Kline, and Greg Thiel. In Proceedings of the ninth ACM Symposium on Operating Systems Principles (SOSP '83), pp. 49-70, October 1983.
- Experience with Grapevine: the growth of a distributed system.
Michael D. Schroeder, Andrew D. Birrell, and Roger M. Needham. In ACM Transactions on Computer Systems, Vol. 2, No. 1, pp. 3-23, February 1984.
http://doi.acm.org/10.1145/2080.2081
- Distributed Operating Systems
Andrew S. Tanenbaum and Robbert Van Renesse. In ACM Computing Surveys, Vol. 17, No. 4, pp. 419-470, December 1985.
http://doi.acm.org/10.1145/6041.6074
- The V distributed system
David Cheriton. In Communications of the ACM, Vol. 31, No. 3, pp. 314-333, March 1988.
http://doi.acm.org/10.1145/42392.42400
Week 2 - May 18: Classic Distributed File Systems
Read and review the following papers:
- Scale and Performance in a Distributed File System
John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols, M. Satyanarayanan, Robert N. Sidebotham, and Michael J. West. In ACM Transactions on Computer Systems (TOCS), vol. 6, no. 1, pp. 51--81, February 1988. (ACM SIGOPS HoF paper 2008, originally published in Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (SOSP’87), November 1987.)
https://doi.org/10.1145/35037.35059
- Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency
Cary G. Gray and David R. Cheriton. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles (SOSP’89), pp. 202--210, December 1989. (ACM SIGOPS HoF paper 2009)
https://doi.org/10.1145/74850.74870
Additional suggested reading
- Design and Implementation of the Sun Network Filesystem
Russel Sandberg, David Goldberg, Steve Kleiman, Dan Walsh, Bob Lyon. In Proceedings of the 1985 USENIX Summer Conference, pp. 119--130, June 1985.
nfs-sandberg85.pdf
- Petal: Distributed Virtual Disks
Edward K. Lee and Chandramohan A. Thekkath. In Proceedings of the seventh international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII), pp. 84--92, September 1996.
https://doi.org/10.1145/237090.237157
- Frangipani: a scalable distributed file system
Chandramohan A. Thekkath, Timothy Mann and Edward K. Lee. In Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles (SOSP '97), pp. 224--237, October 1997.
https://doi.org/10.1145/268998.266694
Week 3 - May 25: Placement and Lookup Services
Read and review the following papers:
- Chord: A scalable peer-to-peer lookup service for internet applications.
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. In Proceedings of the 2001 Conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01), pp. 149-160, August 2001. (ACM SIGOPS HoF paper 2015)
http://doi.acm.org/10.1145/383059.383071
- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
Sage A. Weil, Scott A. Brandt, Ethan L. Miller and Carlos Maltzahn. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC'06), November 2006.
https://doi.org/10.1109/SC.2006.19
Additional suggested reading
Several closely related papers were published in 2001, including Chord, Pastry, CAN, and the Tapestry technical report.
- Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems.
Antony I. T. Rowstron and Peter Druschel. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware '01), pp. 329-350, November 2001.
http://research.microsoft.com/en-us/um/people/antr/PAST/pastry.pdf
- A Scalable Content-Addressable Network
Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp and Scott Shenker. In Proceedings of the 2001 Conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '01), pp. 161--172, August 2001.
https://doi.org/10.1145/383059.383072
- Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing
Ben Zhao, John Kubiatowicz, and Anthony D. Joseph. UCB Technical Report UCB/CSD-01-1141, University of California Berkeley, Electrical Engineering and Computer Science Department, April 2001.
https://people.eecs.berkeley.edu/~adj/publications/paper-files/CSD-01-1141.pdf
Week 4 - June 1: Coordination Services
Read and review the following papers:
- The Chubby Lock Service for Loosely-Coupled Distributed Systems
Mike Burrows. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 335--350, November 2006. (ACM SIGOPS HoF paper 2017)
https://www.usenix.org/legacy/event/osdi06/tech/burrows.html
- NetChain: Scale-Free Sub-RTT Coordination
Xin Jin, Xiaozhou Li, Haoyu Zhang, Nate Foster, Jeongkeun Lee, Robert Soulé, Changhoon Kim and Ion Stoica. In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI’18), pp. 35--49, April 2018. (Best paper award)
https://www.usenix.org/node/211262
Additional suggested reading
Some understanding of distributed consensus protocols is needed to follow these coordination services papers. The first entry below (Raft) presents a consensus algorithm that was designed to be easy (or at least easier than Paxos) to understand.
- In Search of an Understandable Consensus Algorithm
Diego Ongaro and John Ousterhout. In Proceedings of the 2014 USENIX Annual Technical Conference (ATC'14), pp. 305--319, June 2014. (Best paper award)
https://www.usenix.org/node/184041
- ZooKeeper: Wait-free Coordination for Internet-scale Systems
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira and Benjamin Reed. In Proceedings of the 2010 USENIX Annual Technical Conference (ATC'10), pp. 145--158, June 2010.
https://www.usenix.org/legacy/events/atc10/tech/full_papers/Hunt.pdf
- Extensible Distributed Coordination
Tobias Distler, Christopher Bahn, Alysson Neves Bessani, Frank Fischer and Flavio P. Junqueira. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys '15), pp. 1--16, April 2015.
https://doi.org/10.1145/2741948.2741954
Week 5 - June 8: Distributed Shared Logs
Read and review the following papers:
- Corfu: A Distributed Shared Log
Mahesh Balakrishnan, Dahlia Malkhi, John D. Davis, Vijayan Prabhakaran, Michael Wei and Ted Wobber. In ACM Transactions on Computer Systems (TOCS), vol. 31, no. 4, pp. 10:1--10:24, December 2013.
https://doi.org/10.1145/2535930
Additional suggested reading
Shared logs are a key component of many transactional systems. One early example is QuickSilver. Many others followed.
- Recovery Management in QuickSilver
Roger Haskin, Yoni Malachi, Wayne Sawdon and Gregory Chan. In ACM Transactions on Computer Systems, Vol. 6, No. 1, pp. 82--108, February 1988.
https://doi.org/10.1145/35037.35060
- Tango: Distributed Data Structures over a Shared Log
Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran, Michael Wei, John D. Davis, Sriram Rao, Tao Zou and Aviad Zuck. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP'13), pp. 325--340, November 2013.
https://doi.org/10.1145/2517349.2522732
- Building a Replicated Logging System with Apache Kafka.
Guozhang Wang, Joel Koshy, Sriram Subramanian, Kartik Paramasivam, Mammad Zadeh, Neha Narkhede, Jun Rao, Jay Kreps, Joe Stein. In Proceedings of the VLDB Endowment, Volume 8, Number 12, pp. 1654--1655, August 2015.
http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
- vCorfu: A Cloud-Scale Object Store on a Shared Log
Michael Wei, Amy Tai, Christopher J. Rossbach, Ittai Abraham, Maithem Munshed, Medhavi Dhawan, Jim Stabile, Udi Wieder, Scott Fritchie, Steve Swanson, Michael J. Freedman and Dahlia Malkhi. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI'17), pp. 35--49, March 2017.
https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/wei-michael
- Scalog: Seamless Reconfiguration and Total Order in a Scalable Shared Log
Cong Ding, David Chu, Evan Zhao, Xiang Li, Lorenzo Alvisi and Robbert van Renesse. In Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI'20), pp. 325--338, February 2020.
https://www.usenix.org/conference/nsdi20/presentation/ding
Week 6 - June 15: Key-Value Stores
Read and review the following papers:
- Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes and Robert E. Gruber. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), pp. 205--218, November 2006. (Best paper award)
https://www.usenix.org/conference/osdi-06/bigtable-distributed-storage-system-structured-data
- FlashStore: High Throughput Persistent Key-Value Store
Biplob Debnath, Sudipta Sengupta and Jin Li. In Proceedings of the VLDB Endowment, Volume 3, Number 1-2, pp. 1414–1425, September 2010.
https://doi.org/10.14778/1920841.1921015
Additional suggested reading
- Distributed Caching with Memcached
Brad Fitzpatrick. In Linux Journal 2004(124), 5 pages, August 2004.
https://www.linuxjournal.com/article/7451
- Dynamo: Amazon’s Highly Available Key-value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. In Proceedings of 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07), pp. 205-220, October 2007.
http://doi.acm.org/10.1145/1294261.1294281
- HyperDex: a distributed, searchable key-value store
Robert Escriva, Bernard Wong, and Emin Gün Sirer. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, technologies, architectures, and protocols for computer communication (SIGCOMM '12), pp. 25--36, August 2012.
http://doi.acm.org/10.1145/2342356.2342360
Week 7 - June 22: Designing Key-Value Stores for SSDs
Read and review the following papers:
- WiscKey: Separating Keys from Values in SSD-conscious Storage
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST'16), pp. 133--148, Feb. 2016.
https://www.usenix.org/node/194425
- KVell: the design and implementation of a fast persistent key-value store
Baptiste Lepers, Oana Balmau, Karan Gupta and Willy Zwaenepoel. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19), pp. 447--461, Oct. 2019.
https://doi.org/10.1145/3341301.3359628
Additional suggested reading
- NVMKV: A Scalable, Lightweight, FTL-aware Key-Value Store
Leonardo Marmol, Swaminathan Sundararaman, Nisha Talagala and Raju Rangaswami. In Proceedings of the 2015 USENIX Annual Technical Conference (ATC'15), pp. 207--219, July 2015.
https://www.usenix.org/conference/atc15/technical-session/presentation/marmol
- PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram and Ittai Abraham. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP'17), pp. 497--514, Oct. 2017.
https://doi.org/10.1145/3132747.3132765
- An Efficient Memory-Mapped Key-Value Store for Flash Storage
Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas. In Proceedings of the ACM Symposium on Cloud Computing (SoCC'18), pp. 490--502, Oct. 2018.
https://doi.org/10.1145/3267809.3267824
Week 8 - June 29: In-Memory Distributed Computing and Storage
Read and review the following papers:
- Fast Crash Recovery in RAMCloud
Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP'11), pp. 29--41, October 2011.
http://doi.acm.org/10.1145/2043556.2043560
- FaRM: Fast Remote Memory
Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, and Miguel Castro. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14), pp. 401--414, April 2014.
https://www.usenix.org/conference/nsdi14/technical-sessions/dragojevic
Additional suggested reading
- MICA: A Holistic Approach to Fast In-Memory Key-Value Storage
Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI'14), pp. 429--444, April 2014.
https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-lim.pdf
Week 9 - July 6: Experiences with real-world large scale systems
Read and review the following papers:
- A large scale analysis of hundreds of in-memory cache clusters at Twitter
Juncheng Yang, Yao Yue and K. V. Rashmi. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pp. 191--208, Nov. 2020.
https://www.usenix.org/conference/osdi20/presentation/yang
- Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience
Siying Dong, Andrew Kryczka, Yanqin Jin and Michael Stumm. In Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST 21), pp. 33--49, Feb. 2021.
https://www.usenix.org/conference/fast21/presentation/dong
Additional suggested reading
- Scaling Memcache at Facebook
Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung and Venkateshwaran Venkataramani. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI'13), pp. 385--398, April 2013.
https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/nishtala
Week 10 - July 13: Non-Volatile Main Memory
Read and review the following papers:
- NOVA-Fortis: A Fault-Tolerant Non-Volatile Main Memory File System
Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Steven Swanson, and Andy Rudoff. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP'17), pp. 478--496, October 2017.
https://doi.org/10.1145/3132747.3132761
- Twizzler: a Data-Centric OS for Non-Volatile Memory
Daniel Bittman, Peter Alvaro, Pankaj Mehra, Darrell D. E. Long and Ethan L. Miller. In Proceedings of the 2020 USENIX Annual Technical Conference (USENIX ATC 20), pp. 65--80, July 2020.
https://www.usenix.org/conference/atc20/presentation/bittman
Additional suggested reading
Current systems research on non-volatile memory systems covers a wide range of topics, including designing persistent data structures and programming frameworks, persistent memory file systems, hybrid (or tiered) paging systems, and databases on persistent memory. The selections here are just a few interesting points in that space.
- Data tiering in heterogeneous memory systems
Subramanya R. Dulloor, Amitabha Roy, Zheguang Zhao, Narayanan Sundaram, Nadathur Satish, Rajesh Sankaran, Jeff Jackson, and Karsten Schwan. In Proceedings of the Eleventh European Conference on Computer Systems (EuroSys'16), pp. 15:1-15:16, April 2016.
https://doi.org/10.1145/2901318.2901344
- Thermostat: Application-transparent Page Management for Two-tiered Main Memory
Neha Agarwal and Thomas F. Wenisch. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'17), pp. 631--644, April 2017.
https://doi.org/10.1145/3037697.3037706
- Log-Structured Non-Volatile Main Memory
Qingda Hu, Jinglei Ren, Anirudh Badam, Jiwu Shu and Thomas Moscibroda. In Proceedings of the 2017 USENIX Annual Technical Conference (ATC'17), pp. 703--717, July 2017.
https://www.usenix.org/conference/atc17/technical-sessions/presentation/hu
- Pisces: A Scalable and Efficient Persistent Transactional Memory
Jinyu Gu, Qianqian Yu, Xiayang Wang, Zhaoguo Wang, Binyu Zang, Haibing Guan, and Haibo Chen. In Proceedings of the 2019 USENIX Annual Technical Conference (ATC'19), pp. 913--928, July 2019.
https://www.usenix.org/conference/atc19/presentation/gu
- SplitFS: reducing software overhead in file systems for persistent memory
Rohan Kadekodi, Se Kwon Lee, Sanidhya Kashyap, Taesoo Kim, Aasheesh Kolli, and Vijay Chidambaram. In Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP'19), pp. 494–508, October 2019.
https://doi.org/10.1145/3341301.3359631
Week 11 - July 20:
Read and review the following papers:
- Arrakis: The Operating System is the Control Plane
Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas Anderson and Timothy Roscoe. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14), pp. 1--16, October 2014. (Best Paper award)
https://www.usenix.org/node/186141
- IX: A Protected Dataplane Operating System for High Throughput and Low Latency
Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis and Edouard Bugnion. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14), pp. 49--65, October 2014. (Best Paper award)
https://www.usenix.org/node/186147
Additional suggested reading
Week 12 - July 27:
Read and review the following papers:
- Rethinking the Library OS from the Top Down
Donald E. Porter, Silas Boyd-Wickizer, Jon Howell, Reuben Olinsky and Galen C. Hunt. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'11), pp. 291--304, March 2011.
https://doi.org/10.1145/1950365.1950399
- Unikernels: library operating systems for the cloud
Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand and Jon Crowcroft. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'13), pp. 461--472, March 2013.
https://doi.org/10.1145/2451116.2451167
Additional suggested reading