=========================================================================== CSC 236 Homework Assignment 3 -- Sample Solutions Winter 2008 =========================================================================== 1. First, here is the complete algorithm including all loop invariants and additional comments for correctness (and a little extra whitespace to separate the parts of the algorithm). # Precond: 'node' is a reference to the root of a non-empty BST where # each node stores a numerical value (in field 'datum'), in # addition to references to its 'parent', 'left' child, and 'right' # child. Also, every value in the tree is unique (does not appear # more than once). count = 0 average = 0.0 # LI^1: The subtree rooted at 'node' contains the smallest value in # the entire tree. while node.left: node = node.left # At this point, 'node.datum' is the smallest value in the tree. # LI^2: 'average' stores the average of every value in the tree that # is strictly less than 'node.datum' (or the average of every value # in the tree if 'node' is None), and 'count' stores the number of # values in the tree that are strictly less than 'node.datum' (or # the total number of values in the tree if 'node' is None). while node: count += 1 average += (node.datum - average) / count # Let 'current' denote the value just added to the average # (i.e., 'current' = 'node.datum' for the current node). if node.right: node = node.right # LI^3: The subtree rooted at 'node' contains the next largest # value in the tree after 'current'. while node.left: node = node.left else: # LI^4: 'current' is the largest value in the subtree rooted # at 'node'. while node.parent and node is node.parent.right: node = node.parent node = node.parent # At this point, 'node' is None if 'current' was the largest value # in the tree; otherwise, 'node.datum' is the next largest value in # the tree after 'current'. # Postcond: 'average' stores the average of every value in the tree. Now, we prove each loop invariant, and then the correctness of the entire algorithm. Loop 1: - Proof of LI^1, by induction on the number of iterations: BC: node_0 is the root of the entire tree, which is non-empty by the precondition and therefore contains a smallest value. IH: Suppose k >= 0 and LI^1_k holds. IS: To prove LI^1_{k+1}, suppose that there is an iteration k+1 (otherwise, LI^1_{k+1} = LI^1_k is trivially true by the IH). By the IH, the smallest value in the entire tree is contained in the subtree rooted at node_k. By the loop condition, node_k.left exists. By the BST ordering, every value in the subtree rooted at node_k.left is smaller than the value at node_k. Hence, the smallest value in the subtree rooted at node_k resides in the subtree rooted at node_k.left, and the algorithm correctly sets node_{k+1} = node_k.left. Conclusion: By induction, LI^1 holds at the end of each iteration. - Loop 1 terminates because the height of the tree rooted at node_k is a natural number that strictly decreases with each iteration. (Formally, let E_k = height of subtree rooted at node_k. Then, E_k is a natural number for all k and if there is an iteration k+1, E_{k+1} < E_k.) - At the end of loop 1, 'node' is the root of a subtree that contains the smallest value in the tree (by LI^1), and that does not have a left subtree (since the loop condition was false when the loop stopped). Since node.datum is less than every value stored in the subtree rooted at node.right (if any), this means that node.datum is the smallest value in the entire tree. Before proving the correctness of loop 2, we must prove the correctness of the inside loops (3 and 4). Loop 3: - Observation: every value outside the subtree rooted at 'current' is either strictly less or strictly greater than every value inside the subtree rooted at 'current', i.e., values in the subtree rooted at 'current' form a contiguous segment (with no gaps) of all the values in the tree. - Proof of LI^3, by induction on the number of iterations: BC: node_0 is the root of the right subtree of 'current', all of whose values are greater than 'current' (by the BST ordering), and less than other values greater than 'current', outside of the subtree rooted at 'current' (by the observation above). Hence, LI^3_0 holds. IH: Suppose k >= 0 and LI^3_k holds. IS: To prove LI^3_{k+1}, suppose that there is an iteration k+1 (otherwise, LI^3_{k+1} = LI^3_k is trivially true by the IH). By the IH, the next largest value in the tree after 'current' is contained in the subtree rooted at node_k. By the loop condition, node_k.left exists. By the BST ordering, every value in the subtree rooted at node_k.left is smaller than the value at node_k. Hence, the next largest value in the subtree rooted at node_k resides in the subtree rooted at node_k.left, and the algorithm correctly sets node_{k+1} = node_k.left. Conclusion: By induction, LI^3 holds at the end of each iteration. - Loop 3 terminates because the height of the tree rooted at node_k is a natural number that strictly decreases with each iteration. - At the end of loop 3, 'node' is the root of a subtree that contains the next largest value in the tree after 'current' (by LI^3), and that does not have a left subtree (since the loop condition was false when the loop stopped). Since node.datum is less than every value stored in the subtree rooted at node.right (if any), this means that node.datum is the next largest value after 'current' in the entire tree. Loop 4: - Proof of LI^4, by induction on the number of iterations: BC: node_0 does not have a right child (by the if condition) so its value is the largest in the subtree rooted at node_0 (by the BST ordering), i.e., LI^4_0 holds. IH: Suppose k >= 0 and LI^4_k holds. IS: To prove LI^4_{k+1}, suppose that there is an iteration k+1 (otherwise, LI^4_{k+1} = LI^4_k is trivially true by the IH). By the IH, 'current' is the largest value in the subtree rooted at node_k. By the loop condition, node_k.parent exists and node_k is its parent's right child. By the BST ordering, the values at node_k.parent and in the left child of node_k.parent (if any) are all less than the values in the subtree rooted at node_k, i.e., 'current' is the largest value in the subtree rooted at node_k.parent, and the algorithm correctly sets node_{k+1} = node_k.parent. Conclusion: By induction, LI^4 holds at the end of each iteration. - Loop 4 terminates because the distance (number of edges) from node_k to the root is a natural number that strictly decreases with each iteration. - At the end of loop 4, 'current' is the largest value in the subtree rooted at 'node' (by LI^4) and by the loop condition, either node.parent does not exist (i.e., 'node' is the root of the tree) or 'node' is its parent's left child. If 'node' is the root of the tree, then the next line sets 'node' to None because 'current' is the largest value in the entire tree. Otherwise, the next line sets 'node' to node.parent, which is the next largest value in the tree after 'current' (by the BST ordering and by the observation above for loop 3). Loop 2: - Before proving LI^2, we need to add one statement to the invariant, about the values of 'count' (see the code above). - Proof of LI^2, by induction on the number of iterations: BC: From loop 1, we know that node_0.datum is the smallest value in the tree, so count_0 = 0 is equal to the number of values less than node_0.datum and average_0 = 0.0 is the average of every value in the tree less than node_0.datum, i.e., LI^2_0 holds. IH: Suppose k >= 0 and LI^2_k holds. IS: To prove LI^2_{k+1}, suppose that there is an iteration k+1 (otherwise, LI^2_{k+1} = LI^2_k is trivially true by the IH). By the IH, count_k is the number of values less than node_k.datum and average_k is the average of every value less than node_k.datum, i.e., average_k is equal to S_k / count_k, where S_k is the sum of every value less than node_k.datum. The algorithm now sets count_{k+1} = count_k + 1 and average_{k+1} = average_k + (node_k.datum - average_k) / count_{k+1} = node_k.datum / count_{k+1} + (count_{k+1} average_k - average_k) / count_{k+1} = node_k.datum / count_{k+1} + (count_k average_k) / count_{k+1} = node_k.datum / count_{k+1} + (count_k S_k / count_k) / count_{k+1} = node_k.datum / count_{k+1} + S_k / count_{k+1} = (S_k + node_k.datum) / count_{k+1} = S_{k+1} / count_{k+1}, i.e., average_{k+1} is equal to the average of every value less than *or equal to* node_k.datum. From loops 3 and 4, we know that node_{k+1} is set to None (if node_k.datum was the largest value in the tree) or to the next largest value after node_k.datum. Hence, at the end of the iteration, count_{k+1} is the number of values less than node_{k+1}.datum (or the total number of values in the tree) and average_{k+1} is the average of every value less than node_{k+1}.datum (or the average of every value in the tree), and LI^2_{k+1} holds. Conclusion: By induction, LI^2 holds at the end of each iteration. - Loop 2 terminates because the number of values larger than node_k.datum is a natural number that decreases with each iteration. - At the end of loop 2, 'node' is None (by the loop condition), so 'average' stores the average of every value in the tree (by LI^2). Partial Correctness: For every input that meets the preconditions, if the algorithm terminates, then the postcondition is satisfied, from the point at the end of the proof of LI^2. Termination: Each loop has been shown to terminate, so the entire algorithm terminates. Hence, the algorithm is correct: For every input that meets the preconditions, the algorithm terminates and the postconditions hold. 2. Call this language L_2. The following NFA accepts L_2: -,0,1 -,1 -,0,1 -,0,1 --> q_0 -------> q_1 -------> q_2 ------->(q_3) Every string long enough and whose third-last character is not 0 is accepted because it is possible to take the transition from q_0 to q_1 on the third-last character and end in the accepting state q_3. At the same time, every string that is too short or whose third-last character *is* 0 cannot be accepted because it is not possible to take the transition from q_0 to q_1 on the third-last character to end processing in state q_3. A DFA can be obtained mechanically through the subset construction described in class. Using the same conventions as the example in class (e.g., writing q_023 to represent the set of states {q_0,q_2,q_3}) and writing down the DFA in a table (ASCII picture too messy), with input symbols down the left and states across the top -- initial state q_0, accepting states {q_03, q_013, q_023, q_0123}: q_0 q_01 q_02 q_012 q_03 q_013 q_023 q_0123 - q_01 q_012 q_013 q_0123 q_01 q_012 q_013 q_0123 0 q_0 q_02 q_03 q_023 q_0 q_02 q_03 q_023 1 q_01 q_012 q_013 q_0123 q_01 q_012 q_013 q_0123 It's possible to come up with the same DFA directly, by using states to keep track of the possible combinations of last three characters read. The meaning of each state of the DFA is as follows, using '.' to represent '- or 1' and treating missing characters as 0 (for strings whose length is less than 3): q_0: strings with last three characters 000 q_01: strings with last three characters 00. q_02: strings with last three characters 0.0 q_012: strings with last three characters 0.. q_03: strings with last three characters .00 q_013: strings with last three characters .0. q_023: strings with last three characters ..0 q_0123: strings with last three characters ... 3. Call this language L_3. First, we work out the relationship from #(w) to each of #(w-), #(w0), #(w1), i.e., how the numerical value changes with the addition of one more digit. Say w = d_k ... d_0 (where d_0,...,d_k (- {-,0,1}) and d (- {-,0,1}. #(wd) = d_k 3^{k+1} + ... + d_0 3^1 + d 3^0 = 3 #(w) + d In order to figure out whether or not #(w) is a multiple of 4, we only need to keep track of one piece of information: the value #(w) mod 4. Our FSA A will have 4 states: q_0, q_1, q_2, q_3, where our intention is to have every string w take the FSA to state q_{#(w) mod 4}. Accordingly, the initial state will be q_0 and the accepting states will be {q_0}. The FSA is given by the following diagram (reproduced on the right as a table, in case the diagram is too hard to read). 0 1 -->(q_0)<---> q_1 | q_0 | q_1 | q_2 | q_3 |\ _/| |\ ---|-----|-----|-----|----- -| 0/ |- - | q_3 | q_2 | q_1 | q_0 \| |/ \| 0 | q_0 | q_3 | q_2 | q_1 q_3 <---> q_2 1 | q_1 | q_0 | q_3 | q_2 1 0 To prove the correctness of A (i.e., that L(A) = L_3), first, we prove the following claim by induction on the length of the string w: \-/ i (- {0,1,2,3}, \delta*(q_0,w) = q_i iff #(w) mod 4 = i. BC: \delta*(q_0,\epsilon) = q_0 and #(\epsilon) mod 4 = 0 mod 4 = 0. IH: Suppose k (- N and \-/ w (- {-,0,1}^k, \-/ i (- {0,1,2,3}, \delta*(q_0,w) = q_i iff #(w) mod 4 = i. IS: Suppose w (- {-,0,1}^{k+1}. Since k >= 0, k+1 >= 1 so w = w'd for some w' (- {-,0,1}^k, d (- {-,0,1}. By definition, \delta*(q_0,w) = \delta(\delta*(q_0,w'),d). Let q_i = \delta*(q_0,w') -- by the IH, this implies #(w') mod 4 = i, so #(w) mod 4 = (3 #(w') + d) mod 4 = (3 i + d) mod 4 and it is possible to verify, by inspection, that this equation holds for all transitions in the table above, e.g., \delta(q_1,-) = q_2 because (3 * 1 - 1) mod 4 = 2, \delta(q_3,0) = q_1 because (3 * 3 + 0) mod 4 = 1, etc. By the statement above, we know that for all w (- {-,0,1}*, w (- L(A) iff \delta*(q_0,w) (- {q_0} iff #(w) mod 4 = 0 iff w (- L_3. Hence, L(A) = L_3. Coming up with a RE for L_3 is not easy to do directly. Rather, we apply the state-elimination construction from class to convert A above to a RE. Because q_0 is the only accepting state, we must eliminate every other state. The order of elimination does not matter -- it may yield different final REs, but if it is done correctly these will all be equivalent. - First, eliminate state q_3 (because it has no self-loop -- q_1 would have worked as well): 1 + -0 00 --------> q_1 | 0- + 1 | /|| 0 + -- <-------- || --> (q_0) - + 10||01 + - <-------- || | 1- | ||/ --------> q_2 -1 0+11 - Next, eliminate q_1 (because it has the simplest self-loop) -- too hard to draw in ASCII. Remaining transitions: q_0 -> q_0: 0 + -- + (1 + -0) (00)* (0- + 1) = R_1 q_0 -> q_2: -1 + (1 + -0) (00)* (01 + -) = R_2 q_2 -> q_0: 1- + (- + 10) (00)* (0- + 1) = R_3 q_2 -> q_2: 0 + 11 + (- + 10) (00)* (01 + -) = R_4 - Finally, eliminate q_2, leaving only a self-loop on q_0 with label: R_1 + R_2 (R_4)* R_3 The final RE is: ( R_1 + R_2 (R_4)* R_3 )* = ( 0 + -- + (1 + -0) (00)* (0- + 1) + (-1 + (1 + -0) (00)* (01 + -)) (0 + 11 + (- + 10) (00)* (01 + -))* (1- + (- + 10) (00)* (0- + 1)) )* Which is *obviously* correct... :) Note that -- + (1 + -0) (00)* (0- + 1) == -- + 1(00)*1 + 1(00)*0- + -0(00)*1 + -0(00)*0- == -(00)*- + 1(00)*1 + 1(00)*0- + -0(00)*1 == 11 + 10(00)*01 + 10(00)*- + -(00)*01 + -(00)*- == 11 + (- + 10) (00)* (01 + -) so R_1 == R_4, and similarly, R_2 == R_3. Unfortunately, this would not have allowed us to simplify the final RE.