Due: Electronically, by 11:59pm, Friday, February 16
In this assignment you will implement synchronization primitives for OS/161 and learn how to use them to solve a synchronization problem. You will then extend the thread system and implement several system calls to allow OS/161 to support more interesting user programs. Once you have completed the programming exercises you should have a fairly solid grasp of the pitfalls of concurrent programming and, more importantly, how to avoid those pitfalls in the code you will write later this semester.
To complete this assignment you will need to be familiar with the OS/161 thread code. The thread system provides interrupts, control functions, and semaphores. You will be extending the thread system to provide additional functions that are found in most thread libraries.
This assignment consists of the following parts:
The main web page is https://stanley.cs.toronto.edu/csc369-2007-01/drproject - you must login with you CDF id and password. After you log in, you can select your project (groupXX) from the box on the upper right.
The "Wiki" has a link to the user guide (DrProjectGuide) - you should read about the code browser "DrProjectBrowser" and how to view changes to the source code, "DrProjectChangeset". Read about other topics as you need them.
For the remainder of the assignments, you will be using the same
repository. We will take advantage of SVN's branches to keep the
repository organized. The repository for your group has been created with
a branch for Assignment 1 called a1-branch. You should check
out this branch, do all your work in the branch, and commit the branch to
submit your assignment.
svn co https://stanley.cs.toronto.edu/svn/csc369-2007-01/<groupXX>/branches/a1-branch
You can also find the checkout command on the starting page of the DrProject Wiki for your project.
For the next assignment, I will merge the starter code into the trunk directory of your repository, and then you will create a new branch for the next assignment. This will allow you to merge solution code from A1 into your starter code for A2.
Important: Remember to do all your work in the branch, not in the trunk.
Since you have a fresh copy of the source code from your checkout, you will need to re-run "./configure --werror" in the src directory.
You will also have to reconfigure your kernel before you begin work on this assignment. The procedure for configuring a kernel is the same as in ASST0, except you will use the ASST1a and/or ASST1b configuration files. The only difference between the ASSTO and ASST1 configuration files is that for ASST1a we uncomment the "options synchprobs" line. Note that it is not possible to run user programs using the "p" command with "synchprobs" on. Therefore, for the system calls part of this assignment, you should configure using ASST1b, which is identical to the ASST0 configuration (that is, "options synchprobs" is commented out).
You should now see an ASST1a directory in the compile directory.
When you built OS/161 for ASST0, you ran make from kern/compile/ASST0. In ASST1, for the synchronization problems you run make from (you guessed it) kern/compile/ASST1a and for the system calls, you run make from kern/compile/ASST1b .
% cd ../compile/ASST1a % make depend % make % make install
If you are told that the compile/ASST1a directory does not exist, make sure you ran config for ASST1a.
For the system calls part of the assignment, you will need to configure a kernel with "options synchprobs" turned off again, by using the ASST1b configuration.
To give you some practice with Subversion, your first task is to update your repository with (at least parts of) the solution to A0. We encourage you to use your own solution - you and your partner should share your solutions at this point and discuss them - but you are also welcome to use the sample solutions.
You will want at least:
Your solutions to ASST1 will be tested by running OS/161 with command line arguments that correspond to the menu options in the OS/161 boot menu. We will be providing instructions on how to add the necessary menu options.
If your code is properly synchronized, the timing of context switches and the order in which threads run should not change the behaviour of your solution. Of course, your threads may print messages in different orders, but you should be able to easily verify that they follow all of the constraints applied to them and that they do not deadlock.
When you booted OS/161 in ASST0, you may have seen the options to run the thread tests. The thread test code uses the semaphore synchronization primitive. You should trace the execution of one of these thread tests in GDB to see how the scheduler acts, how threads are created, and what exactly happens in a context switch. You should be able to step through a call to mi_switch() and see exactly where the current thread changes.
Thread test 1 ( " tt1 " at the prompt or tt1 on the kernel command line) prints the numbers 0 through 7 each time each thread loops. Thread test 2 (" tt2 ") prints only when each thread starts and exits. The latter is intended to show that the scheduler doesn't cause starvation--the threads should all start together, spin for a while, and then end together.
thread_yield() is automatically called for you at intervals that vary randomly. While this randomness is fairly close to reality, it complicates the process of debugging your concurrent programs.
The random number generator used to vary the time between these thread_yield() calls uses the same seed as the random device in System/161. This means that you can reproduce a specific execution sequence by using a fixed seed for the random number generator. You can pass an explicit seed into random device by editing the "random" line in your sys161.conf file. For example, to set the seed to 1, you would edit the line to look like:
28 random seed=1
We recommend that while you are writing and debugging your solutions you pick a seed and use it consistently. Once you are confident that your threads do what they are supposed to do, set the random device to autoseed. This should allow you to test your solutions under varying conditions and may expose scenarios that you had not anticipated.
These questions will be addressed during tutorial. In order to ensure that you really understand the code, it is strongly recommended that you try to answer the questions before tutorial.
To implement synchronization primitives, you will have to understand the operation of the threading system in OS/161. It may also help you to look at the provided implementation of semaphores. When you are writing solution code for the synchronization problems it will help if you also understand exactly what the OS/161 scheduler does when it dispatches among threads.
- What happens to a thread when it exits (i.e., calls thread_exit() )? What about when it sleeps?
- What function(s) handle(s) a context switch?
- How many thread states are there? What are they?
- What does it mean to turn interrupts off? How is this accomplished? Why is it important to turn off interrupts in the thread subsystem code?
- What happens when a thread wakes up another thread? How does a sleeping thread get to run again?
- Describe how thread_sleep() and thread_wakeup() are used to implement semaphores. What is the purpose of the argument passed to thread_sleep() ?
- Are OS/161 semaphores "strong" or "weak"?
- Why does the lock API in OS/161 provide lock_do_i_hold(), but not lock_get_holder() ?
- What are the ELF magic numbers?
- What is the difference between UIO_USERISPACE and UIO_USERSPACE? When should one use UIO_SYSSPACE instead?
- Why can the struct uio that is used to read in a segment be allocated on the stack in load_segment() (i.e., where does the memory read actually go)?
- In runprogram(), why is it important to call vfs_close() before going to usermode?
- What function forces the processor to switch into usermode? Is this function machine dependent?
- In what file are copyin and copyout defined? memmove? Why can't copyin and copyout be implemented as simply as memmove?
The real work starts here!
Implement locks for OS/161. The interface for the lock structure is defined in kern/include/synch.h. Stub code is provided in kern/threads/synch.c. These should be "sleep locks" not "spin locks", in the sense that a waiting thread should sleep, rather than spin, until the lock is available.
Implement condition variables for OS/161. The interface for the cv structure is also defined in synch.h and stub code is provided in synch.c.
Additional tips and more info for this part of the assignment are here.
You will create a new directory called cache in the
kern directory and all the source code for this problem will
go in this directory. Any new header files should be added in the
kern/include directory. Remember to update
kern/conf/conf.kern so that the new files will be added to the
Makefiles. Remember also to add the new files to your repository.
More details to come...
A lock-up free cache is one that can continue to accept requests even while it is waiting for a response from the disk. This is useful, because some threads can access data in the cache, even while slow disk operations are loading data into some other part of the cache.
Your task is to implement synchronization for a lockup-free cache using locks and condition variables. Do not disable interrupts in your cache solution!
Multiple threads may access the cache concurrently, asking to read or write data at some physical disk location. For a read, if the data is cached, the data can be returned immediately. If the data is not cached, the cache must (i) kick something out of the cache to clear space (possibly having to write it back to disk if it is dirty, i.e., if the copy in the cache has been modified since it was read in from disk), (ii) ask the disk to fetch the block, and (iii) when the disk request completes, update the cache and return the data to the original caller. For a write, if the data is cached, the cached block is overwritten and marked dirty. If the data is not cached, the block can be written directly to its location on the disk, since all blocks are either read or written completely (it is not necessary to first read the block into the cache before overwriting it).
Although some requests may have to wait for the disk, it should be possible for other threads to access the cache and start other disk requests if needed. However, it should not be possible for the same disk block to be stored in more than one cache location at the same time. For example, if thread P reads block X, and the data is not in cache, it must start a disk read to obtain the block. If thread Q also reads block X at about the same time, it should wait for the read started by P to complete, rather than reading X into the cache again.
You can find some initial code to illustrate how the cache would work for a single thread on CDF at /u/csc369h/winter/pub/asst1/. Start by reading README.cache - it briefly explains the code provided. Make sure you look at the sample test function to get an idea of how the cache is used by threads. You need to provide synchronization to complete the implementation of cache_read() and cache_write(). (You will also need to update cache_init() to make sure anything new that you add gets initialized properly!).
For this part of the assignment, you will be extending the kernel thread subsystem to allow one thread to wait for another to exit. You must use your implementation of locks and condition variables to solve this problem.
We are providing a simple thread id management system (see pid.[ch] in the /u/csc369h/winter/pub/asst1/ directory on CDF). You should use these functions to allocate an identifier when a new thread is created, and to locate a thread (or its status information) given a thread id. You should think of pid.c as implementing a monitor that controls access to the shared pid data structures. The externally-callable functions in pid.c are the monitor's entry points. All access to the pid data structures (including querying or modifying any fields) must be done through a function provided by pid.c. You will need to add additional external functions to respect this requirement.
You must implement the following functions (since you are extending the thread subsystem, place your code in kern/thread/thread.c - remember to add prototypes to kern/include/thread.h):
You will also need to change the implementation of thread_exit so that it synchronizes correctly with the joining thread (if any).
You will probably find it convenient to implement most of these functions by calling a suitable function of the pid monitor (which you will also write).
You should also change the prototype and implementation of thread_fork so that it returns a thread id (pid) rather than a pointer to the thread structure itself. If the parent does not want to know the thread id (i.e., the last argument to thread_fork is NULL), you should place the new thread in the detached state immediately. (If the parent does not know the thread id, then it will not be able to call thread_join for the new thread).
In your design document, describe your solution to the join/detach/exit synchronization problem. Describe what will happen if (a) the join happens before the exit, (b) the detach happens before exit, (c) exit happens before join, and (d) exit happens before detach. Are there other possibilities?
The OS/161 menu thread forks new kernel threads to handle the commands given at the prompt. Use your implementation of thread_join to make the menu thread wait until the child thread has finished before printing the prompt and accepting another command.
To test this part of the assignment, reconfigure your kernel with "options synchprobs" commented out again.
Implement the following system calls:
It's crucial that your syscalls handle all error conditions gracefully (i.e., without crashing OS/161.) You should consult the OS/161 man pages included in the distribution (the .html files under src/man - load the file "src/man/syscall/index.html" and you should be able to browse through the man pages for the individual system calls) and understand fully the system calls that you must implement.
Your syscalls must return the correct value (in case of success) or error code (in case of failure) as specified in the man pages. Some of the grading scripts rely on the return of appropriate error codes; adherence to the guidelines is as important as the correctness of the implementation.
Recall that the file include/unistd.h contains the user-level interface definition of the system calls that you will be writing for OS/161 (including ones you will implement in later assignments). This interface is different from that of the kernel functions that you will define to implement these calls. Note that the user-level interface is already defined for the system calls you will write in this assignment, but you might want to read this file to see the type and number of arguments that will be passed to the OS. You need to think about a variety of issues associated with implementing system calls. Perhaps the most obvious one is: can two different user-level processes find themselves running a system call at the same time? Be sure to argue for or against this, and explain your final decision in the design document.
getpid(), waitpid()
A pid, or process ID, is a unique number that identifies a process. The implementation of getpid() should be straightforward, assuming you have already taken care of allocating thread id's when a new thread is created (since every user-level process executes in the context of a single kernel thread).
Your waitpid() system call should map nicely to the thread_join function in the kernel thread system.
fork()
This system call is the most difficult part of the assignment, but also the most rewarding. It enables multiprogramming (although without execv(), you can only run multiple copies of the same program) and make OS/161 a much more useful entity.
fork() is the mechanism for creating new processes. It should make a copy of the invoking process and make sure that the parent and child processes each observe the correct return value (that is, 0 for the child and the newly created pid for the parent). You will want to think carefully through the design of fork(). The core of the fork() system call will use the thread_fork() function to create a new kernel thread. The difficult part is starting the child process with the same state the parent had at the time that it made the system call.
You will want to look at how a fresh process is initially started by runprogram(), and how a process returns to userspace following a system call. The child process has some characteristics of each when it first goes to userlevel.
A note on errors and error handling of system calls:
The man pages in the OS/161 distribution contain a description of the error return values that you must return. If there are conditions that can happen that are not listed in the man page, return the most appropriate error code from kern/errno.h. If none seem particularly appropriate, consider adding a new one. If you're adding an error code for a condition for which Unix has a standard error code symbol, use the same symbol if possible. If not, feel free to make up your own, but note that error codes should always begin with E, should not be EOF, etc. Consult Unix man pages to learn about Unix error codes; on Linux systems "man errno" will do the trick.
Note that if you add an error code to src/kern/include/kern/errno.h, you need to add a corresponding error message to the file src/lib/libc/strerror.c.
Once again, discuss how you handle each of these system calls in your design document. Make sure you identify where to find your code modifications in the OS/161 source.
You must provide a well-organized design document discussing your solution to each part of this assignment. Include diagrams / drawings if it helps to clarify your design. Either plain text, or PDF formatted documents are acceptable. Name your document "design.txt" or "design.pdf" and place it in the src/asst1 directory. Additional guidelines for each part of the assignment are included above.
In addition, you should clearly describe the remaining problems for any part of your system that is still not working.