===========================================================================
CSC 373H / L0101          Lecture Summary for Week 6           Winter 2005
===========================================================================

RNA secondary structure.

  - Input: A sequence of bases b_1,b_2,...,b_n, each b_i in {A,C,G,U}.
    Output: A sequence of pairs (i_1,j_1),(i_2,j_2),...,(i_k,j_k) where k
    is as large as possible and:
      . for each pair (i,j), 1 <= i < j-4 <= n-4;
      . for each pair (i,j), {b_i,b_j} = {A,U} or {C,G};
      . no index is repeated (i.e., all i's and j's are distinct);
      . no two pairs "cross", i.e., for all pairs (i,j) and (i',j'), it is
        NOT the case that i < i' < j < j'.

  - Step 1: OPT[i,j] = max number of pairs on b_i,...,b_j

  - Step 2: OPT[i,j] = 0 for all i >= j-4
            OPT[i,j] = max of:
                OPT[i,j-1],
                1 + OPT[i+1,j-1]   if b_i matches b_j,
                max( 1 + OPT[i,t-1] + OPT[t+1,j-1] )
                    for all t in [i+1,j-5] such that b_t matches b_j

    The first term is the best possible if b_j is unmatched. The second
    term is the best way to match b_i with b_j (if they match). The last
    term covers all other possible ways that b_j could be matched, and the
    best possible answer in each case.

  - Step 3: Observations: OPT[i,j] depends only on values "below and to the
    left" (entries OPT[i',j'] with i' >= i and j' < j), so we can fill the
    table with i decreasing and, within each row, j increasing. Also, we
    don't need to store OPT[i,j] for values of i >= n-4 or values of j <= 5:
    all such entries are 0 by the base case, and they are treated as 0
    whenever the loop below refers to them.

      for i := n-5 downto 1:
          for j := i to i+4:
              OPT[i,j] := 0
          for j := i+5 to n:
              OPT[i,j] := OPT[i,j-1]
              if b_i matches b_j and OPT[i,j] < 1 + OPT[i+1,j-1]:
                  OPT[i,j] := 1 + OPT[i+1,j-1]
              for t := i+1 to j-5:
                  if b_t matches b_j and
                          OPT[i,j] < 1 + OPT[i,t-1] + OPT[t+1,j-1]:
                      OPT[i,j] := 1 + OPT[i,t-1] + OPT[t+1,j-1]

    Runtime: Theta(n^2) table entries, each computed in O(n) time (the loop
    over t), for O(n^3) in total.

    Example: see page 190 of the textbook.

  - Step 4: (See Assignment 2.)

------------------
Divide and Conquer
------------------

Integer multiplication.

  - Problem: Multiply two integers x, y, given as sequences of bits
    x_0,x_1,...,x_{n-1}; y_0,y_1,...,y_{n-1} (low-order bit first, i.e.,
    x = x_{n-1} ... x_1 x_0 in binary, and similarly for y).

  - Iterative algorithm: Multiply x by each bit of y, shifted appropriately,
    then add the n results together. Runtime = Theta(n^2) (n additions of
    up to 2n bits each).

  - Idea: Let X_0 = x_{n/2-1} ... x_1 x_0 and X_1 = x_{n-1} ... x_{n/2}, in
    binary (n can always be made even by adding extra 0's to the left of x;
    padding n to a power of 2 keeps it even at every level of recursion);
    define Y_0 and Y_1 similarly. Then x = 2^{n/2} X_1 + X_0 and
    y = 2^{n/2} Y_1 + Y_0, and we can write

        x y = 2^n X_1 Y_1 + 2^{n/2} X_1 Y_0 + 2^{n/2} X_0 Y_1 + X_0 Y_0

    How does this help? The original problem (compute x y) is reduced to
    four subproblems of half the size (compute X_1 Y_1, X_1 Y_0, X_0 Y_1,
    X_0 Y_0), together with some "shift" operations (multiplication by a
    power of 2) and binary additions. This yields a recursive algorithm
    directly.

        Multiply(x, y):  // x, y are arrays of size n
            if n = 1:  return x * y  // multiplication of 1-bit numbers
            else:
                set arrays X_1, X_0, Y_1, Y_0
                p1 := Multiply(X_1, Y_1)
                p2 := Multiply(X_1, Y_0)
                p3 := Multiply(X_0, Y_1)
                p4 := Multiply(X_0, Y_0)
                return 2^n p1 + 2^{n/2} p2 + 2^{n/2} p3 + p4

    Runtime? The recursive algorithm yields a recurrence relation for its
    worst-case runtime T(n):

        T(1) = Theta(1)
        T(n) = 4 T(n/2) + Theta(n)

    where 4 T(n/2) comes from the time spent executing the four recursive
    calls, and Theta(n) comes from the time spent performing shifts and
    binary additions. Closed form?

Master Theorem.

  - Let f be any nondecreasing function that satisfies the following
    recurrence, with constants integer a > 0, rational b > 1, real d >= 0,
    and integer k >= 1:

        f(n) = Theta(1)                 if n <= k,
        f(n) = a f(n/b) + Theta(n^d)    if n > k.

    For example, f(n) could represent the runtime of a recursive algorithm
    that makes "a" recursive calls, each one to an input of size roughly n/b
    (ignoring floors and ceilings), in addition to taking time Theta(n^d) to
    perform work outside of the recursive calls.
    Then, f(n) has the following closed-form asymptotic solution:

        f(n) = Theta(n^{log_b a})    if a > b^d,
        f(n) = Theta(n^d log n)      if a = b^d,
        f(n) = Theta(n^d)            if a < b^d.

Integer multiplication (continued).

  - The Master Theorem applies to T(n), with a = 4, b = 2, d = 1. Since
    a = 4 > 2 = b^d, we have T(n) = Theta(n^{log_2 4}) = Theta(n^2). This is
    no better than the simple iterative algorithm!

  - Trick: using the same notation as before, notice that

        (X_1 + X_0) (Y_1 + Y_0) = X_1 Y_1 + X_1 Y_0 + X_0 Y_1 + X_0 Y_0.

    This contains all four products we need (without their shifts), yet it
    takes only 1 multiplication instead of 4. Because the terms X_1 Y_0 and
    X_0 Y_1 are shifted by the same amount (2^{n/2}), we can use this to
    save one recursive call:

        x y = 2^n X_1 Y_1 + X_0 Y_0
              + 2^{n/2} ( (X_1 + X_0) (Y_1 + Y_0) - X_1 Y_1 - X_0 Y_0 )

  - This yields the following recursive algorithm:

        Multiply2(x, y):
            if n = 1:  return x * y  // multiplication of 1-bit numbers
            else:
                set arrays X_1, X_0, Y_1, Y_0
                p1 := Multiply2(X_1, Y_1)
                p2 := Multiply2(X_1 + X_0, Y_1 + Y_0)
                p3 := Multiply2(X_0, Y_0)
                return 2^n p1 + 2^{n/2} (p2 - p1 - p3) + p3

    with runtime T'(n) that satisfies:

        T'(1) = Theta(1)
        T'(n) = 3 T'(n/2) + Theta(n)

    (The sums X_1 + X_0 and Y_1 + Y_0 may have n/2 + 1 bits; this extra bit
    does not affect the asymptotic analysis.) The constant hidden by the
    term Theta(n) is larger than for the first recursive algorithm (we
    perform more binary additions and subtractions), but the Master Theorem
    still applies with a = 3, b = 2, d = 1, which yields
    T'(n) = Theta(n^{log_2 3}) = Theta(n^{1.58...}). This is strictly better
    than the previous Theta(n^2)!

  - For very large inputs, the best algorithm known at the time of these
    notes is based on the "Fast Fourier Transform" (FFT), a more complicated
    divide-and-conquer algorithm, with runtime Theta(n log n log log n).
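
The two recursive multiplication algorithms above, Multiply and Multiply2, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: nonnegative Python ints stand in for the bit arrays (halves are taken with >> and a bit mask, shifts with <<), and the names split, multiply, and multiply2 are hypothetical. Instead of padding n to an even length, the sketch puts m = ceil(n/2) bits in the low half, so x = 2^m X_1 + X_0 and the factor 2^n from the notes becomes 2^{2m}.

```python
import random


def split(x: int, m: int) -> tuple[int, int]:
    """Return (X_1, X_0) with x = 2^m * X_1 + X_0 and 0 <= X_0 < 2^m."""
    return x >> m, x & ((1 << m) - 1)


def multiply(x: int, y: int) -> int:
    """Four recursive calls, as in Multiply: T(n) = 4 T(n/2) + Theta(n)."""
    n = max(x.bit_length(), y.bit_length())
    if n <= 1:
        return x * y  # both operands are 0 or 1: a 1-bit multiplication
    m = (n + 1) // 2  # low half gets ceil(n/2) bits, so no padding needed
    x1, x0 = split(x, m)
    y1, y0 = split(y, m)
    p1 = multiply(x1, y1)
    p2 = multiply(x1, y0)
    p3 = multiply(x0, y1)
    p4 = multiply(x0, y0)
    return (p1 << (2 * m)) + ((p2 + p3) << m) + p4


def multiply2(x: int, y: int) -> int:
    """Three recursive calls, as in Multiply2: T'(n) = 3 T'(n/2) + Theta(n)."""
    n = max(x.bit_length(), y.bit_length())
    if n <= 1:
        return x * y  # both operands are 0 or 1: a 1-bit multiplication
    m = (n + 1) // 2
    x1, x0 = split(x, m)
    y1, y0 = split(y, m)
    p1 = multiply2(x1, y1)
    p2 = multiply2(x1 + x0, y1 + y0)  # operands may have m + 1 bits
    p3 = multiply2(x0, y0)
    # p2 - p1 - p3 = X_1 Y_0 + X_0 Y_1, the two terms shifted by 2^m
    return (p1 << (2 * m)) + ((p2 - p1 - p3) << m) + p3


if __name__ == "__main__":
    rng = random.Random(373)
    for _ in range(100):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        assert multiply(a, b) == a * b
        assert multiply2(a, b) == a * b
    print(multiply2(1234, 5678))  # 7006652
```

Python's built-in * is only applied to 1-bit operands here; the sketch is meant to show the recursion structure (four calls versus three), not to outperform the built-in multiplication.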
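
Returning to the RNA secondary structure problem at the start of these notes, the Step 3 table-filling loop can be sketched in Python as below. This is a hedged sketch, not the textbook's code: the names matches and rna_opt are hypothetical, the input is assumed to be a string over {A,C,G,U}, and the table is 1-indexed and zero-initialized so that every entry the notes leave unstored (i >= n-4 or j <= 5) reads as 0. It computes only the OPT values; recovering the pairs themselves (Step 4) is left out.

```python
def matches(a: str, b: str) -> bool:
    """True if {a, b} is {A, U} or {C, G}, the allowed base pairs."""
    return {a, b} in ({"A", "U"}, {"C", "G"})


def rna_opt(bases: str) -> list[list[int]]:
    """Return table opt with opt[i][j] = OPT[i,j] (1-indexed), as in Step 3."""
    n = len(bases)
    b = " " + bases  # pad so that b[1..n] follows the notes' 1-indexing
    # Every entry starts at 0, which covers the base case i >= j-4 and the
    # rows i >= n-4 that the loop refers to but never fills in.
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n - 5, 0, -1):  # i := n-5 downto 1
        for j in range(i + 5, n + 1):  # j := i+5 to n
            best = opt[i][j - 1]  # b_j unmatched
            if matches(b[i], b[j]):
                best = max(best, 1 + opt[i + 1][j - 1])  # pair (i, j)
            for t in range(i + 1, j - 4):  # t := i+1 to j-5
                if matches(b[t], b[j]):
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt


if __name__ == "__main__":
    seq = "ACCGGUAGU"
    print(rna_opt(seq)[1][len(seq)])  # 2, e.g. pairs (1,9) and (2,8)
```

The three nested loops make the O(n^3) runtime from Step 3 visible directly.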