This page lists a few potential ideas for projects. Feel free to use one, propose something completely different, or refine one of these into your own idea. Note that these are very rough ideas and have not been evaluated for feasibility. Considerable refinement is necessary to turn one of these into a project proposal.
Each project should have an implementation component and an evaluation component.
1. Activity tracing for laptops and desktops
Motivation: When designing a new system (or part of a system), a key consideration is the type of workload that system needs to support. Benchmark design tries to reproduce typical workloads in a controlled fashion, however, to do a good job we first need to understand what a typical workload looks like. In some cases, this characterization is relatively simple -- for file or web server workloads we can instrument a single server and collect data (such as the file server traces used in the AFS paper) with varying clients. For single-user desktop and laptop systems, however, the problem is more challenging, and as a result, little hard data exists on how people actually use their personal computer systems. Given that most computer users do most of their computing on single-user systems, an understanding of realistic workloads is significant for file system design, disk and CPU scheduling, memory management, and energy management.
Project: Design and implement a framework for activity tracing that could reasonably be deployed on a large number of personal computers. The framework would need to be deployable without requiring installation of a modified kernel (e.g. as a module or driver). It would also need to be lightweight in terms of time and space overhead, and it would need to be careful to anonymize any user-identifiable data before logging the traces (to address concerns of user privacy), while still preserving enough detail to be useful. The goal would be to design a framework that could be extended to collect various sorts of activity, with at least one concrete example implemented. Possibilities include logging all file system requests (for file system or disk I/O scheduling studies) or logging process start and end activities (e.g. for scheduling and energy management, we would like to be able to answer questions like "How many applications execute concurrently?", "What sets of applications tend to execute together?" and "How frequently does the user switch between active applications?"). You should also consider how activity traces would be reported back to some central location, and how to separate the activity (disk, network, etc.) caused by the tracing framework itself from that of the normal user activity.
This project could be carried out for Linux or Windows (though given the number of desktops and laptops running Windows, that would be a bit more compelling). Using ETW (Event Tracing for Windows) with a new file system layer might be a reasonable way to approach the implementation.
2. Swap layout visualizer
Motivation: Studies on I/O prefetching for virtual memory pages have shown that the layout of pages on disk (i.e. in the swap file) can have a significant impact on the effectiveness of prefetching. If pages are organized on disk according to the order in which they will be accessed (and prefetched), then less time is lost due to disks positioning delays and the prefetches are more likely to be able to keep up with the application's rate of data consumption. There are, however, a large number of reasons why prefetching might be ineffective, so it is difficult to say whether the swap layout is a factor, or whether an improved layout could be found. Currently there are no tools to help understand how pages of virtual memory are placed on disk when they are paged out.
Project: Design and implement a visualization tool to dynamically display the status of an application's virtual memory pages, including the placement on disk for those pages that have been paged out. The tool should be able to handle single and multiple disk configurations. You would need to figure out a reasonable way to (a) collect, and (b) display, the necessary information. As always, the data collection mechanisms should be lightweight in terms of time and space to avoid perturbing the system being measured.
3. Swap layout optimization
Motivation: Same as #2.
Project: Investigate how the Linux page out mechanism operates and design and implement a strategy that an application could use to influence (and improve) the layout of its pages on swap. For example, an application might issue a system call (similar in spirit to madvise) to indicate that a particular range of virtual addresses will be used sequentially, or an application might identify pages that could be swapped out and expect the OS to place them in the swap file in the order that they are specified.
4. Performance of spinlock alternatives on current multiprocessors
Motivation: Spinlocks remain at the core of operating system synchronization on multiprocessors, as other alternatives are not universally applicable at the lowest levels. In 2001, John Mellor-Crummey and Michael L. Scott published an ACM TOCS paper titled "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". This paper compared the performance of various spinlock algorithms (and other synchronization techniques like barriers) on a BBN Butterfly and a Sequent Symmetry; in 2006 the paper received the 2006 Dijkstra Prize in Distributed Computing. Since then, there have been substantial changes in multiprocessor systems, and ongoing research into spinlock alternatives with various desirable theoretical guarantees. What is missing is a thorough performance comparison of the latest synchronization proposals on the multiprocessor hardware that is actually available today.
Project: Design a set of microbenchmarks to test the performance of different spinlock implementations on a current multiprocessor. Use these to evaluate the "classic" techniques introduced or evaluated by Mellor-Crummey and Scott as well as several new techniques from the literature. There are several DCS graduate students working in this area from the theory side who might be able to suggest highly-regarded algorithms that should be included in the evaluation, or who might help you to use their own algorithms in your evaluation.
Last modified: Fri Jan 11 12:22:30 EST 2008