Please submit a printout of your code and documentation (comments in code explaining your strategy is fine, as is a separate document), using the dropbox in BA 2220.
Please also submit softcopy using the submit command on CDF (e.g. create a gzipped tarfile of all your code and any README or other documentation files, and then "submit -c csc469h -a A1 a1.tgz").
The goal is to look at synchronization and concurrency, and different ways of implementing it, with an eye on performance and correctness. To this end, you are asked to implement a concurrent hash table, intended to support a particular use described below.
For programs with data sets that exceed the capacity of physical memory, time stalled waiting for I/O (due to page faults) is a significant performance factor. One strategy to reduce the stall time is prefetching - predict the pages that will be needed and initiate the I/O ahead of time, such that no stall occurs when the page is actually used.
There are many ways to predict pages for prefetching; one approach relies on compiler analysis to automatically insert hints (in the form of prefetch system calls) into the application source code. Based on estimates of how long it takes to fetch a page from disk, the compiler analysis will attempt to insert prefetch instructions in the code such that they are expected to complete shortly before the original access to the page occurs. When this works successfully, the prefetches are effective, the stall time is hidden, and performance improves. When compiler-based prefetching is not successful, however, it can be very difficult to figure out what went wrong.
The main difficultly is that the prefetch requests are inserted into the program code by a static compiler analysis, but their behavior depends on dynamic interactions with the entire memory management system. Unexpected behavior could be the result of a flaw in the compiler or in the OS handling of the requests inserted by the compiler. Some possible problems include:
To gain insight into the causes of ineffective prefetching (and also to understand why something worked), we want to be able to track a page prefetch request from the point where the user-level application issues the prefetch, to the point where the application uses the prefetched page, or the prefetched page is evicted from memory (whichever comes first). During the "lifetime" of a prefetched page, it will go through several state changes in response to asynchronous events. The plan is to associate a small structure, called a prefetch record (or pf_rec), with each prefetched page. Every event that affects the page state will look up the page in a hash table (the page number is the hash key), and record some information in its prefetch record. The states, and the events that cause transitions are shown below:
Labels on arcs are events. In italics are responses that should be generated in handling an event, in addition to causing a state change. Not every possible transition is shown in this diagram, to reduce clutter and improve clarity. The states are:
To help with the tracking, the compiler includes an identifier, or tag with every prefetch instruction that it inserts in the code. One prefetch instruction can be responsible for a large number of prefetch requests (for example, if it is added to the body of a loop). While it is useful to know that (for example) 50% of the prefetches issued were too late, it is even more useful to know that 100% of prefetches issued by prefetch 1 were effective, and 100% of the prefetches issued by prefetch 2 were too late. Thus, we will maintain counters on a per-tag basis.
For each tag, we have a structure that counts the number of times each state was entered, the number of times CLEANUP was entered from each prior state (i.e., what was the state immediately before CLEANUP), and the number of times a prefetched page was reclaimed without being used (transitions from PF_READY to CLEANUP due to RECLAIM events).
We are willing to tolerate some stale data because in the real implementation many things can effect the exact order of events. (Also, in the real implementation, we would collect timestamps on all the transitions and maintain histograms of the durations of events to help understand events that happen close together. For example, if prefetches complete at nearly the same time as the page is touched, some will be counted as "PF_STALLED" and some will be "PF_READY" depending on the exact order of events. The time data would show that the "PF_READY" prefetches were not ready long before they were used, and the "PF_STALLED" ones were not late by much -- postprocessing of the data would probably group both into a "Nearly Correct" category. You do not have to worry about these details for the assignment)
We are not willing to tolerate the loss of events, or gross mis-categorization, however. For example, if a prefetch request has started (a prefetch record has been initialized), a page fault event should not be able to conclude that the page was not prefetched (unless the prefetch completed and the page has already been reclaimed). Similar, under no circumstances should two handlers initiate an I/O for the same page at the same time.
We will simulate a prefetching system using a "driver" process and a "handlers" process. The "handlers" represent the operating system responses to various events. The "driver" controls the simulation, and represents all the possible event sources (including system calls from the "application", I/O completion notices from the disk drives, and page replacements caused by the paging daemon). The "driver" and the "handlers" communicate with each other using System V IPC (msgsnd, msgrcv, etc.) to send messages for each event.
The "handlers" process consists of one or more threads to handle each type of event. Specifically, there is one handler thread for events associated with the main user-level application (SETUP system calls and page faults), one or more handler threads for prefetch requests, and a handler thread representing the paging daemon that deals with page reclamation events.
The "driver" process sends messages encoding an event to the appropriate handler thread. Messages will include a page number (if relevant, additional information like a prefetch tag and an action to be taken by the handler will be included). Using the page number, each handler will lookup the prefetch_record in the shared hash table. If one exists, the appropriate fields in the record will be updated (mostly this means updating the state field), and counters associated with the prefetch tag may also be updated. If there is no prefetch_record, most handlers will update counters of non-prefetched events (only the SETUP event will initialize a record if one does not already exist).
Some events are logically dependent on others (an "IODONE" cannot happen before the page is prefetched or faulted; a page cannot be reclaimed before it is either faulted into memory or the prefetch completes). To enforce this, your event handlers should send messages back to the simulation driver to let the driver know that dependent events can proceed.
Traces will end with a special "STOP" event, sent to the handler threads. When all the handler threads have exited, the parent thread in the handlers process will print overall statistics gathered during the run.
Start by getting the simulation working with the simplest possible locking discipline (the big giant lock on the entire hash table), to make sure you understand the states, events, and the actions to take in handling each event. Then try to improve the synchronization scheme.
On CDF at ~csc469h/fall/pub/A1/a1_starter.tar we provide a framework that implements the driver, the thread creation and messaging for the handlers, and defines the prefetch record structure (pf_rec), the states, events, and message actions, as well as a possible hash table structure (which you are free to change), and the summary counters. Everything else is up to you.
System V IPC message queues are kernel structures. There are system limits on the number you can create, and the maximum number of bytes in the queue, which you cannot change. Currently, the driver/handlers framework uses a single queue for all messages, with different handlers receiving different types of messages. Both use blocking sends and receives. If the message queue fills up, deadlock could result (driver blocks waiting to receive a specific kind of message but handler cannot send it because the queue is full, or vice versa). The driver currently yields (sched_yield) after sending out each event, to give the handlers a chance to run. If, however, you run into deadlocks due to full message queues, you may need to take other actions (for example, use more than 1 message queue, or have the handlers pull multiple messages off the queue at one time, and queue up pending work in a local data structure).