Assignment #4 - Sorting and Linked Lists

Due Tuesday August 4, 2009 1pm

(You must submit electronically as well as submit a printed copy to the drop box in BA2220.)

Starter Code

Download linked_list.py and node.py. These modules contain the definitions for the LinkedList and Node classes used in Part 2 and Part 4. You may not make any changes to these modules.

Part 1 - Counting Out-of-Sequence Course Pairs

Jill is planning to enroll as a part-time student at university. She wants to take 1 course per semester until she completes all her course requirements. Before she can begin the program, however, she has to submit to her student advisor the sequence of courses that she proposes on taking. Since she is overly eager about beginning university, she forgets to check for course prerequisites. Checking the prerequisites is especially important at the university at which she is enrollng, since for each pair (X,Y) of courses offered by the university, either X is a prerequisite for Y, or Y is a prerequisite for X. Your job, as her student advisor, is to count the total number of out-of-sequence course pairs that is in her proposed sequence of courses.

More precisely, assume that the proposed sequence of courses that Jill submits to you is a list of strings, where each string in the list is a course name. We say that a pair of courses (X,Y) is out-of-sequence in this list if X has Y as a prerequisite, but X appears before Y in the list. You must count the total number of such pairs in the list.

For example, suppose Jill's list of proposed courses is ["A", "B", "C", "D"], and that the course prerequisites are as follows: "A" has "B", "C", and "D" as prerequisites, "B" has "C" and "D" as prerequisites, and "C" has "D" as a prerequisite. Then the total number of out-of-sequence course pairs in her proposed list is 6: ("A","B"), ("A","C"), ("A","D"), ("B","C"), ("B","D"), and ("C","D"). Assuming the same prerequisites, the list ["B","A","D","C"] has 4 out-of-sequence course pairs.

Write a function count_pairs(prereqs, courses) that returns the total number of out-of-sequence course pairs in Jill's proposed course sequence. The prereqs argument is a nested python dictionary, whose keys are course names. For any pair of courses with names course1 and course2, the entry preqreqs[course1][course2] will map to True if course1 is a prerequisite for course2, and False otherwise. The courses argument is a python list containing course names (i.e., strings) and represents Jill's proposed sequence of courses.

The naive way to implement count_pairs is to use a nested loop in which you count for each course X how many courses appearing after X are prerequisites for X. This approach has time complexity O(n2), where n is the total number of courses in the proposed sequence. It turns out that it is actually possible to count the total number of out-of-sequence course pairs much more efficiently, with worst-case time complexity O(n log(n)). Your solution must implement this more efficient approach.

There are a number of assumptions you may make:

Hint: Consider sorting the course list by prerequisite using a modified version of merge sort. Think about what information you can determine about the number of out-of-sequence course pairs during the merging of left and right sublists.

Part 2 - In-Place Merge Sort

Recall that one of the drawbacks of the recursive and iterative merge sort algorithms that we studied in class is that they need auxiliary space proportional to the size of the list being sorted. This is in contrast to sorting algorithms like bubble sort, selection sort, and insertion sort, which are "in-place" sorting algorithms. These algorithms use only a constant amount of temporary space in addition to the space occupied by the original list being sorted.

Your job in this part is to implement an in-place iterative version of merge sort that sorts a linked list. In particular, write a function mergesort_ll(alist) that takes an instance of LinkedList as an argument, and sorts the list using only a constant amount of memory in addition to the memory occupied by the list. Your function should have the same time complexity as the merge sort algorithms that we studied in class -- i.e., O(n log(n)). Your function should not return anything. (It isn't necessary for your function to return a newly sorted list, since your function is performing an in-place sort. The original linked list that was passed as an argument should be sorted in non-descending order when your function completes.)

Hint: You are likely using more than a constant amount of additional memory if you are using python lists or tuples anywhere in your code and their size grows to the same size as the list being sorted, or you are repeatedly creating new instances of the Node or LinkedList classes.

If you are stuck, try drawing some linked list diagrams and use them to reason about how your code should work.

Part 3 - Pancake Sorting

(Note: in the discussion below when we refer to a "stack", we always mean a "pancake stack"; we are not referring to the Stack ADT.)

The "Pancake Problem" was originally posed in an article of American Mathematical Monthly1, as follows:

The chef in our place is sloppy, and when he prepares a stack of pancakes they come out all different sizes. Therefore, when I deliver them to a customer, on the way to the table I rearrange them (so that the smallest winds up on top, and so on, down to the largest at the bottom) by grabbing several from the top and flipping them over, repeating this (varying the number I flip) as many times as necessary. If there are n pancakes, what is the maximum number of flips (as a function f(n) of n) that I will ever have to use to rearrange them?

We won't concern ourselves with determining an expression for f(n). Instead, we're interested in implementing the sorting approach described in the above quote. We can think of a list as a stack of pancakes, where the left end of the list represents the top of the stack, and the right end represents the bottom of the stack. The only way we can move items around in the list is by reversing some prefix of the list -- that is, by flipping some top part of the pancake stack. For example, if the list to be sorted is [10, 5, 17, 3], we would sort the list as follows: (i) reverse the prefix [10,5,17] so that the list becomes [17, 5, 10, 3]; (ii) reverse the whole list, so the list becomes [3,10,5,17]; (iii) reverse the prefix [3,10], so the list becomes [10,3,5,17]; (iv) reverse the prefix [10,3,5], so the list becomes [5,3,10,17]; (v) reverse the prefix [5,3] so the list becomes [3,5,10,17]. After the last step, the list is sorted.

The most straight-forward pancake sorting algorithm is a variant of selection sort. Recall that selection sort operates by executing a number of passes on the list being sorted. In each pass, selection sort finds the next largest item in the list and places it in its correct position. The pancake sorting algorithm imitates this behaviour also by executing a number of passes. In each pass, the next largest "pancake" in the list is found and the stack is flipped to bring that pancake to the top of the stack. This is followed by another flip to move the top item to its correct position.

First, implement a non-recursive function reverse(alist, k) that performs a prefix reversal on the first k elements of alist. The argument alist is a python list, and the argument k is a non-negative integer. For example, a call to reverse([10,3,5,17],3) will change the list [10,3,5,17] to [5,3,10,17]. This function does not return anything and should operate on the list in-place. Your function should only use a constant amount of memory in addition to the memory allocated to alist, and should have time complexity O(k).

Using the reverse function, write a non-recursive function pancake_sort(alist) that runs an in-place pancake sort on the alist argument, which is a python list. Your function should not move any items around in the list other than via calls to the reverse function, should only use a constant amount of memory in addition to the memory allocated to the original list, and should have time complexity O(n2).

Part 4 - Strawberry and Blueberry Pancake Sorting with Linked Lists

A variant of the "Pancake Problem" is the "Strawberry and Blueberry Pancake Problem". In this variant you are given an equal number of strawberry and blueberry pancakes in a stack, where each pancake in the stack is a different size from every other pancake. The goal is to rearrange the pancakes so that the largest pancake (strawberry or blueberry) is on the bottom, and that the rest of the stack is an alternating sequence of strawberry and blueberry pancakes, with each "flavour subsequence" appearing in order of pancake size. That is, if you were to remove all the strawberry pancakes from the stack, then the remaining blueberry pancakes would be in ascending order of size from top to bottom; likewise, if you were to remove all the blueberry pancakes from the stack, then the remaining strawberry pancakes would be in ascending order of size from top to bottom.

Examples
(We use regular python list notation to represent the lists in these examples, but you should actually think of them as linked lists.)

Example 1. Suppose you are given the pancake stack [10,3,17,15], where the first two elements are strawberry pancakes, and the next two are blueberry pancakes. The goal is to perform some number of pancake flips (i.e., prefix reversals) to get the stack to be [3,15,10,17]. The largest pancake is blueberry and is 17, so it's on the bottom. Next comes the largest strawberry pancake, 10. After this the next largest blueberry pancake, 15, and then at the very top the smallest strawberry pancake, 3.

Example 2. Suppose you are given the pancake stack [20, 14, 25, 10, 12, 11, 13, 15], where the first four elements are strawberry pancakes and the next four are blueberry pancakes. You must perform some number of pancake flips (i.e., prefix reversals) to get the stack to be [11,10,12,14,13,20,15,25].

Details

First, implement a non-recursive function reverse_ll(alist, k) that performs a prefix reversal on the first k nodes of alist. The argument alist is an instance of LinkedList, and k is a non-negative integer. This function does not return anything and should operate on the list in-place. Your function should only use a constant amount of memory in addition to the memory allocated to the nodes in alist, and should have time complexity O(k).

Using the reverse_ll function, write a non-recursive function pancake_sort_ll(alist) that takes an instance of LinkedList as an argument and performs an in-place strawberry and blueberry pancake sort on the list. The alist argument is guaranteed to contain an even number of nodes, where the first n/2 nodes represent strawberry pancakes and the next n/2 elements represent blueberry pancakes. You may assume for this part of the assignment that the data items in the nodes will be positive integers. (Do not make this assumption about the items being sorted in parts 2 and 3, though.) Your function should not move nodes around in the list other than via calls to the reverse_ll function, should only use a constant amount of memory in addition to the memory allocated to the original list, and should have time complexity O(n2).

Hint: The assumption that data items are positive integers may help you achieve the required space and time complexity. You may find it helpful to modify the data items during the course of execution, as long as the data items are changed back to their original values before your function terminates. Keep in mind that you cannot move items around other than by calling reverse_ll, so any changes that you make to a data item should not depend on the value of any other data item.

What to Submit

  1. Part 1 - submit your solution in the module part1.py.
  2. Part 2 - submit your solution in the module part2.py.
  3. Part 3 - submit your solution in the module part3.py.
  4. Part 4 - submit your solution in the module part4.py.

NOTE:

You need to submit electronically, and also submit a printed copy of your code. Be sure that you include your cdf id and student number in the hard copy that you hand in. The drop box for CSC148 is in BA2220.

References

1 Amer. Math. Monthly 82 (1975) 1010.