CSC 369S | Assignment 3 | Spring 2007

Assignment 3: File Systems

Due: 11:59PM, Friday, April 13, 2007

Introduction

Currently, user programs in OS/161 are extremely limited. In this assignment, you will add support for additional system calls that allow user programs to perform file and file system operations. The OS/161 file system, emufs, is just a layer on top of the host Unix file system. In this assignment you will also augment sfs, the native OS/161 file system, by adding support for a buffer cache to improve performance.

Code Reading (25 points)

The OS/161 sfs file system we provide is very simple. The implementation resides in the fs/sfs directory. The fs/vfs directory provides the infrastructure to support multiple file systems. Your first task is to understand the VFS (virtual file system) layer, and how it is used in OS/161. You may wish to refer to other resources to help increase your understanding of VFS. If you use any, include a bibliography of these references in your design document.

kern/include: You should examine the files fs.h, vfs.h, vnode.h, and sfs.h.

vfslookup.c contains name translation operations. vfspath.c supports operations on path names and implements the vfs operations. vnode.c has initialization and reference count management.

Question 1.
What is the standard interface to a file system (i.e., what functions must you implement to implement a new file system)?

Question 2.
What operations can you do on a vnode?

Question 3.
What lock protects the vnode reference count?

kern/fs/vfs: The file device.c implements raw device support.

Question 4.
What does a device pathname in OS/161 look like?

Question 5.
What does a raw device name in OS/161 look like?

Question 6.
What device types are currently supported?

Question 7.
What vnode operations are permitted on devices?

devnull.c implements the OS/161 equivalent of /dev/null, called "null:". vfscwd.c implements current working directory support.

Question 8.
Why is VOP_INCREF called in vfs_getcurdir()?

vfslist.c implements operations on the entire set of file systems.

Question 9.
How do items get added to the vfslist?

Question 10. What is the difference between VOP_ routines and FSOP_ routines?

Details of SFS (the Simple File System)

kern/fs/sfs/sfs_fs.c has file system routines for sfs.

Question 11.
How many characters are allowed in the name of an SFS volume?

Question 12.
What is the maximum length of a file name in SFS?

Question 13.
How many direct blocks does an SFS file have?

Question 14. (2pts)
What is the maximum size (in bytes) of an SFS file? How much disk space is needed to store such a maximum-sized file, including space for the file metadata (but not the directory entry)?

Question 15. (2pts)
Suppose you were told that the SFS had to be modified to support a maximum file size of 125K (1K == 1024 bytes). What would be the simplest and most efficient change you could make to meet this requirement?

Question 16.
What is the structure of an SFS directory entry?

Question 17. There is (currently) no buffer cache in sfs. However, the bitmaps and superblock are not written to disk on every modification. How is this possible?

Question 18. What do the statements in Question 17 mean about the integrity of your file system after a crash?

Question 19. Can you unmount a file system on which you have open files?

Question 20. List 3 reasons why a mount might fail.

sfs_io.c has block I/O routines, and sfs_vnode.c has file routines.

Question 21. Why is a routine like sfs_partialio() necessary? Why is this (currently) a performance problem?

sbin/mksfs implements the mksfs utility, which creates an sfs file system on a device. disk.h and disk.c define what the disk looks like.

Question 22. What is the inode number of the root?

Question 23. How do files get removed from the system?

Setting up

Code Setup

For this assignment, we are providing the setup for the system calls you will be adding, as well as basic setup for the file system buffer cache.

The code we provide can be obtained from the a3-branch of your DrProject repository. The starting point is the base OS/161 distribution with the A0 and A1 solutions. The ASST3 config file reverts back to using dumbvm, which has been modified slightly to return the memory size for use when initializing the buffer cache.

File System Setup

The next step is to set up an SFS file system on one of the disks. Use the hostbin/host-mksfs program in your ~/csc369/root directory (the hostbin programs should be executed on the host machine, not on OS/161). The command is:

host-mksfs DISK2.img VOLNAME

The second argument, VOLNAME, is a string that you provide to name the disk volume that contains the file system. You can choose anything you like, but you will be providing the VOLNAME to commands that operate on the file system, so you might want to keep it simple.

After you have the file system set up, you can boot OS/161 and mount the file system using the mount menu command. The "fstype" will be "sfs"; you will need to figure out the name of the hard disk device to provide as the "device:" argument. You can then run the file system performance test from the kernel menu by specifying fs1.

You will find a utility in sbin called dumpsfs, and a corresponding version to use on the host system in hostbin called host-dumpsfs. Having a tool that can dump an entire file system is an invaluable debugging aid. As you modify your file system, be sure to keep this utility up to date, so that it can continue to be useful to you.

New System Calls (50 points)

You will need to add support for the system calls listed below. Some of these deal only with managing file system state, while others operate on the file system itself. Make sure you read the OS/161 man pages for these system calls for details on their arguments and operation.

Managing per-process file system state

open(), read(), write(), lseek(), close(), dup2(), chdir(), and __getcwd()

Although these system calls may seem to be tied to the file system, they are really about manipulating file descriptors, or process-specific file system state. A large part of this assignment is designing and implementing a system to track this state. We have designed part of this system for you, in userprog/file.c. Some of this information (such as the current working directory) is specific only to the process, but other information (such as the file offset) is specific to both the process and the file descriptor. Think carefully about the state you need to maintain, how to organize it, and when and how it has to change.

For any given process, the first file descriptors (0, 1, and 2) are considered to be standard input (stdin), standard output (stdout), and standard error (stderr). These file descriptors should start out attached to the console device ("con:"), but your implementation must allow programs to use dup2() to change them to point elsewhere.
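One possible way to picture the descriptor state described above is sketched below. This is not the design provided in userprog/file.c; the names struct openfile, struct filetable, MAX_FILES, filetable_init_console and filetable_dup2 are all hypothetical, and the sketch assumes that each descriptor points to a reference-counted open-file object that holds the offset.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 16   /* hypothetical per-process descriptor limit */

/* Hypothetical open-file object, shared by every descriptor that refers to it. */
struct openfile {
    const char *name;  /* e.g. "con:" */
    long offset;       /* current seek position */
    int refcount;      /* number of descriptors that point here */
};

/* Hypothetical per-process descriptor table: fd -> open file (NULL if unused). */
struct filetable {
    struct openfile *fds[MAX_FILES];
};

/* Start with descriptors 0, 1 and 2 all attached to the console. */
void filetable_init_console(struct filetable *ft, struct openfile *con)
{
    for (int fd = 0; fd < MAX_FILES; fd++) {
        ft->fds[fd] = NULL;
    }
    for (int fd = 0; fd < 3; fd++) {
        ft->fds[fd] = con;
        con->refcount++;
    }
}

/* dup2-style operation: returns newfd on success, -1 on a bad descriptor. */
int filetable_dup2(struct filetable *ft, int oldfd, int newfd)
{
    if (oldfd < 0 || oldfd >= MAX_FILES || newfd < 0 || newfd >= MAX_FILES) {
        return -1;
    }
    if (ft->fds[oldfd] == NULL) {
        return -1;
    }
    if (oldfd == newfd) {
        return newfd;              /* nothing to do */
    }
    if (ft->fds[newfd] != NULL) {
        /* Implicitly close newfd; a real kernel would release the vnode
           once the count reaches zero. */
        assert(ft->fds[newfd]->refcount > 0);
        ft->fds[newfd]->refcount--;
    }
    ft->fds[newfd] = ft->fds[oldfd];
    ft->fds[newfd]->refcount++;
    return newfd;
}
```

In this sketch, redirecting standard output amounts to calling filetable_dup2(ft, fd, 1) for some already-open fd: descriptor 1 stops referring to the console object and starts sharing the other file's offset. A real implementation would return the error codes listed in the man pages rather than -1.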

For fork, you need to add support for copying the file descriptors from the parent to the newly-created child process.

Managing file systems

chdir(), getdirentry(), fstat(), remove(), rename(), sync(), fsync()

The remaining system calls affect the state of the file system itself. You will find that many of these system calls can be easily handled by calling the appropriate VFS functions (after checking for errors in the arguments provided by users!).

For this part of the assignment, the system calls should be considered complete when they handle checks for user errors, call the correct lower-level function, and return suitable results or error codes to the user level. It is acceptable if the lower file system layers do not implement the functionality that the system call requires to perform correctly. The system calls should work whether they are applied to files in the emufs file system or the sfs file system.

The general requirements for error codes are detailed in the OS/161 man pages. Specific requirements:

Buffer Cache (50 points)

At this point, the file system should be working well (you should be able to create and remove files for example); however, it would be much nicer if it had a buffer cache to improve access time for frequently-used files.

In principle, the file system buffer cache is very similar to the cache from the synchronization problem in A1. Several changes can be made to make it more suitable for use by the file system, however.

First, the buffer cache will logically be part of the VFS layer, so that any file system implementation can take advantage of the caching features. This has two main effects. Cache buffers must include not only the data block number that they store, but also a pointer to the file system structure and the inode number that the block belongs to. Also, instead of reading/writing the disk directly, data blocks will be read/written by invoking a file-system specific read or write function.
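The two effects above can be illustrated with a minimal sketch. It is not the layout declared in buffer.h; struct cache_buf, struct fs_blockops, cache_lookup, cache_fill, the demo_ functions and the 512-byte BLOCK_SIZE are assumptions made for illustration. A buffer is identified by the triple (file system, inode number, block number), and the cache reaches the disk only through hooks supplied by the file system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512   /* assumed block size for this sketch */

struct fs;

/* Hypothetical per-file-system block I/O hooks called by the cache. */
struct fs_blockops {
    int (*readblock)(struct fs *fs, unsigned block, void *data);
    int (*writeblock)(struct fs *fs, unsigned block, const void *data);
};

struct fs {
    const struct fs_blockops *ops;
};

/* A buffer is identified by (fs, inode, block), not by block number alone. */
struct cache_buf {
    struct fs *fs;
    unsigned inode;
    unsigned block;
    int valid;
    int dirty;
    unsigned char data[BLOCK_SIZE];
};

/* Return the buffer caching the requested block, or NULL on a miss. */
struct cache_buf *cache_lookup(struct cache_buf *bufs, size_t n,
                               struct fs *fs, unsigned inode, unsigned block)
{
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].valid && bufs[i].fs == fs &&
            bufs[i].inode == inode && bufs[i].block == block) {
            return &bufs[i];
        }
    }
    return NULL;
}

/* Load a block into a free buffer by calling the file system's own reader. */
int cache_fill(struct cache_buf *b, struct fs *fs,
               unsigned inode, unsigned block)
{
    assert(fs->ops != NULL && fs->ops->readblock != NULL);
    int err = fs->ops->readblock(fs, block, b->data);
    if (err) {
        return err;
    }
    b->fs = fs;
    b->inode = inode;
    b->block = block;
    b->valid = 1;
    b->dirty = 0;
    return 0;
}

/* Stand-in file system hooks: every byte of a block holds its block number. */
int demo_readblock(struct fs *fs, unsigned block, void *data)
{
    (void)fs;
    memset(data, (int)(block & 0xff), BLOCK_SIZE);
    return 0;
}

int demo_writeblock(struct fs *fs, unsigned block, const void *data)
{
    (void)fs; (void)block; (void)data;
    return 0;
}

const struct fs_blockops demo_ops = { demo_readblock, demo_writeblock };
```

Because the file system pointer is part of the key, block 3 of inode 7 on one mounted volume never matches block 3 of inode 7 on another.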

Second, it is often undesirable to have to memcpy entire blocks of data when operating on cached blocks. Sometimes we only need to update a single value in the block. Thus, we provide a buffer_get function which returns a buffer containing the desired block and allows the caller to operate on the data as it wishes. The buffer_get function should ensure that the buffer returned cannot be evicted from the cache while the caller is using it, but must not hold any locks on the buffer cache when it returns. To release a buffer previously obtained with buffer_get, we also provide a buffer_put function. We also provide other functions to perform more specialized operations on cached blocks (for example, zeroing a block in the cache). The comments in buffer.h and buffer.c give more details on what these and other buffer functions should do.
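One common way to keep a buffer from being evicted without holding a cache lock is a pin count, sketched below. The real buffer_get and buffer_put signatures are defined in buffer.h and are not reproduced here; struct pbuf, struct pcache, pcache_get and pcache_put are hypothetical names, and the locking and disk I/O are reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer header; the data block itself is omitted. */
struct pbuf {
    unsigned block;
    int valid;
    int dirty;
    int pincount;   /* > 0 while some caller is between get and put */
};

struct pcache {
    struct pbuf *bufs;
    size_t n;
};

/* Get-style operation: returns a pinned buffer, or NULL if every buffer
   is pinned (a real cache would probably wait instead). */
struct pbuf *pcache_get(struct pcache *c, unsigned block)
{
    struct pbuf *b = NULL;

    /* The cache lock would be acquired here. */
    for (size_t i = 0; i < c->n; i++) {
        if (c->bufs[i].valid && c->bufs[i].block == block) {
            b = &c->bufs[i];
            break;
        }
    }
    if (b == NULL) {
        /* Miss: evict the first buffer that nobody has pinned. */
        for (size_t i = 0; i < c->n; i++) {
            if (c->bufs[i].pincount == 0) {
                b = &c->bufs[i];
                break;
            }
        }
        if (b == NULL) {
            return NULL;
        }
        /* Write-back of a dirty victim and the read of the new block
           are omitted from this sketch. */
        b->block = block;
        b->valid = 1;
        b->dirty = 0;
    }
    b->pincount++;
    /* The cache lock would be released here, before returning. */
    return b;
}

/* Put-style operation: unpin the buffer, marking it dirty if it was modified. */
void pcache_put(struct pbuf *b, int modified)
{
    assert(b->pincount > 0);
    if (modified) {
        b->dirty = 1;
    }
    b->pincount--;
}
```

The pin count outlives the lock: eviction skips any buffer whose count is nonzero, so the caller can update the data in place after the get call has returned.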

Third, the buffer cache needs to support file-wide and file-system-wide sync operations, which find all dirty buffers associated with a given file (inode number) or file system, and flush them to disk.
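A sync pass of this kind can be sketched as a scan over the buffers with a filter. The names below (struct dbuf, flush_fn, cache_sync_file, cache_sync_fs, demo_write) are hypothetical, the write-back routine is passed in as a stand-in for the file-system specific write function, and synchronization is left out.

```c
#include <assert.h>
#include <stddef.h>

struct fs { int id; };   /* placeholder for the real file system structure */

struct dbuf {
    struct fs *fs;
    unsigned inode;
    unsigned block;
    int valid;
    int dirty;
};

/* Hypothetical write-back hook; returns 0 on success. */
typedef int (*flush_fn)(struct dbuf *b);

/* Flush dirty buffers of fs; if one_file is set, only those of inode.
   Returns the number of buffers written, or -1 on the first error. */
static int sync_matching(struct dbuf *bufs, size_t n, struct fs *fs,
                         int one_file, unsigned inode, flush_fn wr)
{
    int flushed = 0;

    assert(wr != NULL);
    for (size_t i = 0; i < n; i++) {
        struct dbuf *b = &bufs[i];
        if (!b->valid || !b->dirty || b->fs != fs) {
            continue;
        }
        if (one_file && b->inode != inode) {
            continue;
        }
        if (wr(b) != 0) {
            return -1;      /* buffer stays dirty so a later sync retries */
        }
        b->dirty = 0;
        flushed++;
    }
    return flushed;
}

/* File-wide sync: only the dirty blocks of one inode. */
int cache_sync_file(struct dbuf *bufs, size_t n, struct fs *fs,
                    unsigned inode, flush_fn wr)
{
    return sync_matching(bufs, n, fs, 1, inode, wr);
}

/* File-system-wide sync: every dirty block of the file system. */
int cache_sync_fs(struct dbuf *bufs, size_t n, struct fs *fs, flush_fn wr)
{
    return sync_matching(bufs, n, fs, 0, 0, wr);
}

/* Stand-in writer that only counts how often it is called. */
int demo_writes;

int demo_write(struct dbuf *b)
{
    (void)b;
    demo_writes++;
    return 0;
}
```

Clearing the dirty flag only after a successful write means a failed flush leaves the buffer eligible for the next sync.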

Finally, we can simplify some of our synchronization requirements from A1. Since blocks can only be accessed through the file that they belong to, and since the file system locks the file's vnode before doing any operations on it, it is not possible for two threads to simultaneously request the same block - the second should be blocked on the vnode lock before it can attempt to access the cache. It is still possible for threads to evict blocks belonging to other files, for which they do not hold the vnode lock, however.

In addition to implementing the buffer cache functions defined in buffer.h, you will need to modify SFS to use the buffer cache. Broadly, this means that any time there is a call to sfs_rblock, sfs_wblock, or sfs_io, you need to think about whether that operation should go through the buffer cache instead of accessing the disk directly. You will also need to change the sfs_sync and sfs_fsync functions to force any dirty cached blocks to be written back to disk.

We have provided initial setup of the buffer cache functions, including some minor changes to the SFS initialization code. All code associated with the buffer cache should be conditionally-compiled only if OPT_BUFFER is set. You can select this by un-commenting "options buffer" in the ASST3 configuration file. You should continue to isolate changes that support the buffer cache in this way, so that your system call implementations can be compiled and tested independently from your work on the buffer cache. In some cases, it may be more convenient to write new versions of entire functions, rather than trying to isolate frequent small code changes with the "#if OPT_BUFFER" markers.
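The whole-function style of isolation mentioned above might look like the following sketch. The function demo_read_block, its caller demo_load, and the two counters are hypothetical stand-ins, not SFS code; the fallback definition of OPT_BUFFER is there only so the fragment compiles outside the kernel build, where the configuration is expected to supply the value.

```c
#include <assert.h>

/* In the kernel build the configuration supplies OPT_BUFFER; this fallback
   exists only so the sketch compiles on its own. */
#ifndef OPT_BUFFER
#define OPT_BUFFER 0
#endif

/* Hypothetical counters that record which path a read took. */
int demo_cache_reads;
int demo_disk_reads;

#if OPT_BUFFER
/* Buffer cache version: a complete replacement for the function below. */
int demo_read_block(unsigned block)
{
    (void)block;
    demo_cache_reads++;     /* would go through the buffer cache */
    return 0;
}
#else
/* Original version: goes straight to the disk. */
int demo_read_block(unsigned block)
{
    (void)block;
    demo_disk_reads++;      /* would call the block I/O routine directly */
    return 0;
}
#endif

/* Callers are identical in both configurations. */
void demo_load(unsigned block)
{
    int err = demo_read_block(block);
    assert(err == 0);
}
```

Keeping two complete versions of a function costs some duplication, but the non-cache build stays byte-for-byte what it was, which makes it easier to test the system calls independently.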

You can start with the cache implementation from A1 if you like, although you will need to modify it substantially for use with SFS. Things you should think about include:

For this part of the assignment, you should write a short design document explaining how you implemented the buffer cache functions, what changes you made to SFS, and what choices you made for the two design points listed above (and any others that you encountered).

What to Submit