CSC148H - Assignment 3


Preamble

This assignment is due 30 November 2009 at 10:00am. No late assignments will be accepted.
Solutions should be clearly written, well documented, and conform to the PEP 8 style guide.
To submit the assignment, log in to the Mark Us webpage. You should be able to submit the required files. Your user name and password is the same as the CDF user name and password. Make sure you can log in as soon as possible, especially if you were late to enroll in the course. You may submit files as often as you like; the most recent version (submitted before Nov. 30, 10:00am) will be the one graded. Frequent submissions are encouraged.
For all questions, you are allowed to write helper methods to help your solution. Remember that any new methods you write should be appropriately named and documented.

Part 1 (50%)

Recall from our discussion in lecture that in the worst case, insertion and searching in a binary search tree are O(n) operations, but in the average case they are O(log n) (as this is the height of a randomly constructed binary search tree).
There exist many different ways of balancing binary search trees, although many of them are fairly complicated. For this part of the assignment you are going to implement one which, while it does not force the height of the tree to be O(log n), any sequence of m insert and search operations, n of which are insert, has worst case runtime O(m log n). We state this without proof, as the proof is very complicated.
To begin, download splaytree.py and node.py. Currently, this is just an iterative implementation of a binary search tree. Start by modifying the insert method so that every node, in addition to a left and right child, also has a reference to its parent. So for a node x, x.parent should refer to the parent of x. If x is the root, x.parent should be None.

Splay Tree

The idea behind these modifications is very similar to the Move-To-Front heuristic explored in one of the labs. Every time a node is searched for or inserted into the tree, we will perform a series of modifications to the tree (called splay steps) to make that node the new root of the tree. There are 7 cases which need to be dealt with. Which case you use in which situation is important; in every situation exactly one case applies. These will be implemented in the Node class, in the method splay.
For a node x, if x is the root of the tree, then x.splay() should do nothing.
If x is the child of the root, then we will perform what a single rotation to make x the root. There are two such cases. If x is the left child, then x.splay() should perform a right rotation:

If x is the right child, then x.splay() should perform a left rotation:

Note that A, B and C are (potentially empty) subtrees, whose parent references need to be updated accordingly. In this case, it is assumed that p was the root, so p.parent is None, and p is x.parent. Note that after a single rotation, the depth of x has decreased by 1; that is to say that it is one step closer to being the root. Additionally, the binary search tree property is maintained; it is probably worth your time to think about why this is the case. Keep in mind that these two cases are only for when the parent of x is the root.
If the parent of x is not the root, then x must have a parent p and a grandparent g. In these cases, we will perform two rotations to move x two spots closer to the root. There are four cases to consider here, which depend on which child p is of g, and which child x is of p.




Again, remember that the parent pointers need to get updated for all nodes involved; in particular, remember that g may or may not have a parent. It is also useful to notice that all four of these cases can be performed by making two rotations on the appropriate nodes. For example, in the fourth case, first a left rotation is performed on x, followed immediately by a right rotation on x. To make sure you understand this, you should draw the intermediate step, and think of which rotations need to be performed on which nodes in each of these four cases. As a suggestion, you may want to (but are not required to) add left_rotate and right_rotate methods to the Node class to simplify your code.

Insertion

To insert a value into a Splay Tree, you simply insert the value into the tree as you would in a regular BST, and then repeatedly perform a splay operation until it is the root. You should always use one of the latter four cases if possible (if x is not the child of the root). If value was already in the tree, then you should find the node which contains value and perform splay operations on it until it is the root. Modify the insert method provided to meet these specifications.

Search

After successfully finding the node with the value we are searching for (say x), repeatedly perform x.splay() until x is the root. If value is not in the tree, then let y be the node that would be its parent if it were inserted; in this case, you should repeatedly perform y.splay() until y is the root of the tree. Modify the search method provided to meet these specifications.

Testing

You will not be required to submit test code, but it is highly recommended you thoroughly test your splay operation. It is recommended you create each of those seven cases by hand (in trees with only 1, 2 or 3 nodes) and make sure your splay method does the right thing for each of those before proceeding. You should then make sure your code works when A, B, C, D are non-empty.

Remarks

Your task may seem complicated at first glance, but the code you need to write is fairly simple; you should make sure you understand how the splay operations work before attempting to write any code. Additionally, since all the cases are disjoint, you can implement them one at a time. It is recommended you implement the first three cases and make sure your code works for those before continuing.

Part 2 (30%)

In lecture, we talked about inversions. Given a list A, an inversion is a pair (i, j), with 0 <= i < j < len(A) such that A[i] > A[j].
For example, consider the list [1,3,2,4,7,6,5]. The inversions are as follows: {(1,2),(4,5),(4,6),(5,6)}.
Fill in code for the method count_inversions in inversions.py which takes as input a list items of unique integers and returns the number of inversions in that list. The method you write must have at least average case running time of O(n log n); a worst case running time of O(n log n) is preferred. A O(n2) will receive no marks.
There are multiple ways of solving this. One is to consider, for a given index j, how many values of i there are such that (i,j) is an inversion; this is exactly how many values in A[0:j] that are greater than A[j]. Consider taking a binary search tree (using the one written in class, or the one provided for part 1) and adding the size attribute to each node (as defined in A2) and try and figure out how to answer the query "given a value, how many elements in the tree are greater than value?" in time O(height of tree). Using this idea, you can get an average case runtime of O(n log n) for counting inversions (which is perfectly sufficient for this assignment). If you are interested in getting a worst case runtime of O(n log n), consider performing the same modification to a Splay Tree (adding the size attribute), and remember that after inserting a value it is always at the root. If you choose to modify code for part 1 to solve part 2, make sure you submit them with different names and in different files so your solution to part 1 does not get overwritten.

Part 3 (20%)

All the sorting algorithms discussed in class are comparison based sorts. It is a fact that any comparison based sorting algorithm must have running time O(n log n). If, however, you have additional information about what you are sorting, you can perhaps do better in certain conditions.
For the purpose of this question, it is assumed that the numbers to be sorted are non-negative integers (with no upper bound on their range). You will implement a different sorting algorithm called radix sort. Radix sort works as follows:
For iteration k of the sort:
1) Start with 10 buckets (lists), labeled 0 through 9.
2) For every number, we put it in the bucket labeled with its kth rightmost digit.
3) We then go through the buckets, starting at 0 and going to 9, reconstructing out list.
4) Repeat for each digit position.
As an example, consider sorting the following list: [99, 00, 32, 14, 51, 19, 24, 04]
For the first iteration, k = 0, and the numbers get inserted into the buckets (according to their rightmost digit) in the following order:
0: 00
1: 51
2: 32 
3:
4: 14 24 04
5:
6:
7:
8:
9: 99 19
To produce this, we went through our list in order, and inserted the numbers to the end of the bucket/list labeled with the rightmost digit. Now we append all these lists together, starting with bucket 0 to get
[00, 51, 32, 14, 24, 04, 99, 19]
We then repeat this process using the second digit from the right (k = 1). Going through our new list, inserting numbers to the back of the appropriate bucket, we get:
0: 00 04
1: 14 19
2: 24
3: 32
4:
5: 51
6:
7:
8:
9: 99
Appending these lists together, we get
[00, 04, 14, 19, 24, 32, 51, 99]
which is now sorted.
Now as it happened all the numbers in are list had at most 2 digits, so we only needed to iterate twice. For this to work for any list, you need to know how many digits are in the longest number and iterate that many times.
Each bucket has numbers inserted to the back, and when we merge all the buckets together we start with those at the front. This is the perfect opportunity to use a queue.
Download radixsort.py and implement the radix sort algorithm. This file comes with a definition of the radix_sort method, as well as a linked list implementation of a queue which supports constant time enqueue and dequeue. You should use this class to implement the 10 buckets.
Some questions to think about before you start programming:
How do you know how many times you should iterate?
Given a number x, how can we find the kth rightmost digit? Remember that x**y computes x to the power of y, and x % y computes the remainder when x is divided by y.

What to Submit

For Part 1: node.py splaytree.py
For Part 2: inversions.py + any other files necessary to run your code
For Part 3: radixsort.py
Note that although you are not required to submit tests for your code, you should test them; much of the marking will be done by automatically running your code on a variety of test cases. You are free to write and submit additional files with your solution. Make sure you submit all files required to run your solution.
Log in to Mark Us with your CDF username and password. Everything must be submitted by Nov. 30, 10am. No late assignments will be accepted.