===========================================================================
CSC 236             Lecture Summary for Week 6                  Winter 2008
===========================================================================

General divide-and-conquer recurrences:

  - Many algorithms written using "divide-and-conquer" technique: split up
    problem, solve subproblems recursively, combine solutions. Worst-case
    running times of such algorithms satisfy recurrences of the form:

               { K                                            if n < b/(b-1)
        T(n) = {
               { a_1 T(ceil(n/b)) + a_2 T(floor(n/b)) + f(n)  if n >= b/(b-1)

    (for constants K > 0, a_1 >= 0, a_2 >= 0, b > 1 from the algorithm).

    Repeated substitution, where a = a_1 + a_2:

        T(n) = a T(n/b) + f(n)
             = a (a T(n/b^2) + f(n/b)) + f(n)
             = a^2 T(n/b^2) + a f(n/b) + f(n)
             = a^2 (a T(n/b^3) + f(n/b^2)) + a f(n/b) + f(n)
             = a^3 T(n/b^3) + a^2 f(n/b^2) + a f(n/b) + f(n)

    After k substitutions:

        T(n) = a^k T(n/b^k) + \sum_{i=0}^{k-1} a^i f(n/b^i)

    Base case reached roughly when k = log_b n:

        T(n) = a^{log_b n} K + \sum_{i=0}^{log_b n - 1} a^i f(n/b^i)
             = K n^{log_b a} + \sum_{i=0}^{log_b n - 1} a^i f(n/b^i)

    Cannot be simplified further without more information about f(n).
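    The unrolled formula above can be checked numerically. The following is
    a minimal Python sketch, not part of the lecture: the function names
    t_recursive and t_unrolled are hypothetical, and it assumes a and b are
    integers with b >= 2 and n an exact power of b, so that floors and
    ceilings disappear and the base case is n = 1.

```python
def t_recursive(n, a, b, K, f):
    """Evaluate T(n) = a*T(n/b) + f(n) directly (assumes n is a power of b)."""
    if n == 1:
        return K  # base case: T(1) = K
    return a * t_recursive(n // b, a, b, K, f) + f(n)


def t_unrolled(n, a, b, K, f):
    """Evaluate a^k * K + sum_{i=0}^{k-1} a^i * f(n / b^i), where n = b^k."""
    # Compute k = log_b n exactly with integer division (assumes n = b^k).
    k = 0
    m = n
    while m > 1:
        m //= b
        k += 1
    return a**k * K + sum(a**i * f(n // b**i) for i in range(k))
```

    For instance, with a = b = 2, K = 3 and f(n) = 21 n + 15, both functions
    should agree on every power of 2.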
    If f(n) in Theta(n^d) (for constant d >= 0), then f(n) = c n^d
    (approximately) and

        T(n) = K n^{log_b a} + \sum_{i=0}^{log_b n - 1} a^i c (n/b^i)^d
             = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i

    Case 1: a = b^d, i.e., log_b a = d

        T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
             = K n^d + c n^d log_b n
             = Theta(n^d log n)

    Case 2: a < b^d, i.e., log_b a < d

        \sum_{i=0}^{log_b n - 1} (a/b^d)^i
             = (1 - (a/b^d)^{log_b n}) / (1 - a/b^d)
             = (1 - a^{log_b n}/b^{d log_b n}) / (1 - a/b^d)
             = ((n^d - n^{log_b a}) / n^d) / ((b^d - a) / b^d)
             = (b^d / (b^d - a)) ((n^d - n^{log_b a}) / n^d)

        T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
             = K n^{log_b a}
               + c n^d (b^d / (b^d - a)) ((n^d - n^{log_b a}) / n^d)
             = (K - c b^d / (b^d - a)) n^{log_b a} + (c b^d / (b^d - a)) n^d
             = Theta(n^d)

    Case 3: a > b^d, i.e., log_b a > d

        \sum_{i=0}^{log_b n - 1} (a/b^d)^i
             = ((a/b^d)^{log_b n} - 1) / (a/b^d - 1)
             = (a^{log_b n}/b^{d log_b n} - 1) / (a/b^d - 1)
             = ((n^{log_b a} - n^d) / n^d) / ((a - b^d) / b^d)
             = (b^d / (a - b^d)) ((n^{log_b a} - n^d) / n^d)

        T(n) = K n^{log_b a} + c n^d \sum_{i=0}^{log_b n - 1} (a/b^d)^i
             = K n^{log_b a}
               + c n^d (b^d / (a - b^d)) ((n^{log_b a} - n^d) / n^d)
             = (K + c b^d / (a - b^d)) n^{log_b a} - (c b^d / (a - b^d)) n^d
             = Theta(n^{log_b a})

  - Master Theorem: If f(n) = Theta(n^d) for constant d >= 0, and
    a = a_1 + a_2, then the recurrence T(n) above has closed-form solution:

               { Theta(n^d)           if a < b^d,
        T(n) = { Theta(n^d log n)     if a = b^d,
               { Theta(n^{log_b a})   if a > b^d.

    Formal proof by induction (using exact recurrence for T(n)) requires
    proving statement for all powers of b, then extending to other values
    using fact that T(n) is non-decreasing (i.e., T(n) <= T(m) for all
    n <= m). See textbook for details.
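    The three cases of the Master Theorem can be written as a small lookup.
    This is only an illustrative Python sketch under the theorem's
    assumptions (a > 0, b > 1, d >= 0); the function name master_theorem and
    its string output format are hypothetical, not from the lecture.

```python
import math


def master_theorem(a, b, d):
    """Return the Theta-class of T(n) = a T(n/b) + Theta(n^d) as a string."""
    if a <= 0 or b <= 1 or d < 0:
        raise ValueError("need a > 0, b > 1, d >= 0")
    critical = b ** d  # compare a against b^d
    if math.isclose(a, critical):
        return f"Theta(n^{d} log n)"   # case a = b^d
    if a < critical:
        return f"Theta(n^{d})"         # case a < b^d
    # case a > b^d: exponent is log_b a, shown to 3 decimal places
    return f"Theta(n^{math.log(a, b):.3f})"
```

    For example, master_theorem(2, 2, 1) falls in the a = b^d case, and
    master_theorem(1, 2, 0) does as well.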

  - Same technique can be used to prove upper/lower bounds: f(n) in O(n^d)
    or Omega(n^d) implies T(n) in O(...) or Omega(...) (same expressions as
    in Master Theorem), respectively.

  - Example: MergeSort worst-case time

               { 3                                            if n = 1,
        T(n) = {
               { T(ceil(n/2)) + T(floor(n/2)) + 21 n + 15     if n > 1,

    has the right form, with K = 3, a = 2, b = 2, d = 1, so a = b^d and
    Master Theorem states T(n) in Theta(n^d log n) = Theta(n log n).

  - Master Theorem also applies to RecBinSearch runtime, with K = 7, a = 1,
    b = 2, d = 0; so a = b^d and T(n) in Theta(n^d log n) = Theta(log n).

  - Master Theorem does NOT apply to Factorial or recursive Fibonacci,
    because their runtimes do not satisfy the appropriate type of
    recurrence.

---------------------
Algorithm Correctness
---------------------

Preconditions/postconditions:

  - "Precondition": statement specifying what conditions must hold _before_
    an algorithm is executed (i.e., describes valid inputs).

  - "Postcondition": statement specifying what conditions hold _after_ an
    algorithm executes (i.e., describes expected output).

  - In general, we want weakest reasonable precondition (i.e., put as few
    constraints as possible, only specify what is strictly necessary) and
    strongest reasonable postcondition (i.e., specify as much as possible).

Termination and partial correctness:

  - Algorithm correctness with respect to specific pre- and post-conditions
    usually broken down into two components:
      . if preconditions hold before execution, then algorithm eventually
        finishes executing ("termination");
      . if preconditions hold before execution, then postconditions hold
        after execution ("partial correctness").

  - Recursive algorithms: prove termination and partial correctness
    together, by induction on size of input (matches recursive structure of
    algorithm).

  - Iterative code: later...

Recursive correctness

  - Recursive binary search:

        RecBinSearch(x,A,b,e):
        1.  if b == e:
        2.      if x <= A[b]:
        3.          return b
                else:
        4.          return e+1
            else:
        5.      m = (b + e) / 2  # integer division
        6.      if x <= A[m]:
        7.          return RecBinSearch(x,A,b,m)
                else:
        8.          return RecBinSearch(x,A,m+1,e)

    Precondition? Elements of A comparable with each other and x,
    0 <= b <= e < length(A) (assuming array indices start at 0), A[b..e]
    sorted in nondecreasing order (A[b] <= ... <= A[e]).

    Postcondition? RecBinSearch(x,A,b,e) terminates and returns index p such
    that:
      . b <= p <= e+1;
      . if b < p, then A[p-1] < x;
      . if p <= e, then x <= A[p].

    Proof of correctness: By induction on size n = e+1-b, prove
    (precondition and execution) implies postcondition. Inductive structure
    of proof will follow recursive structure of algorithm.

    Base case: n = 1, i.e., e = b. Then, algo terminates (lines 1-4 contain
        no loop or call), and returns b if x <= A[b], e+1 if x > A[e], which
        satisfies postcondition.

    Ind. Hyp.: Let n > 1 and suppose postcondition holds after execution for
        all inputs of size k that satisfy precondition, for 1 <= k < n.

    Ind. Step: Consider call RecBinSearch(x,A,b,e) when e+1-b = n >= 2.

---------------------------------------------------------------------------
[The following was not covered in lecture -- I will go over it *quickly*
during the next lecture, so please read it and make sure to bring your
questions, particularly if there are any steps that you are unsure about.]

        Test on line 1 fails, so b < e (since b <= e by precondition and
        b != e by negation of test) and algo executes line 5, which sets
        m = floor((b+e)/2).
        [Exercise: prove b <= floor((b+e)/2) < e for all b < e.]
        Next, test on line 6 executes.

        Case 1: x <= A[m].
            m < e -> m+1-b < e+1-b so by IH, RecBinSearch(x,A,b,m) returns
            index p such that:
              (1) b <= p <= m+1;
              (2) if b < p, then A[p-1] < x;
              (3) if p <= m, then x <= A[p].
            Hence,
              . b <= p <= e+1 (from (1) since m < e);
              . if b < p, then A[p-1] < x (from (2));
              . m < e -> m+1 <= e -> p <= e, so we must show x <= A[p]:
                  o if p <= m, then x <= A[p] (from (3));
                  o if m < p, then A[m] <= A[p] (A is sorted) so
                    x <= A[m] <= A[p];
                in all cases, x <= A[p].
            Therefore, current call satisfies postcondition.

        Case 2: A[m] < x.
            b <= m -> b < m+1 -> e+1-(m+1) < e+1-b so by IH,
            RecBinSearch(x,A,m+1,e) returns index p such that:
              (1) m+1 <= p <= e+1;
              (2) if m+1 < p, then A[p-1] < x;
              (3) if p <= e, then x <= A[p].
            Hence,
              . b <= p <= e+1 (from (1) since b < m+1);
              . b < m+1 <= p so we must show A[p-1] < x:
                  o if m+1 < p, then A[p-1] < x (from (2));
                  o if p <= m+1, then p-1 <= m so A[p-1] <= A[m] < x
                    (A is sorted);
                in all cases, A[p-1] < x;
              . if p <= e, then x <= A[p] (from (3)).
            Therefore, current call satisfies postcondition.

        In all cases, current call satisfies postcondition.

    Therefore, by induction, RecBinSearch is correct.

  - NOTES: This may seem complicated, but only because we thought through
    borderline cases carefully -- in a sense, we ensured code works in all
    cases from the start, rather than write code carelessly and waste time
    fixing it up afterwards.
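
    The postcondition proved above can also be spot-checked by running the
    algorithm. Below is a minimal Python transcription of RecBinSearch,
    offered as a sketch rather than as the course's reference code: the
    names rec_bin_search and satisfies_postcondition are hypothetical, and
    the caller is assumed to supply inputs that meet the precondition
    (0 <= b <= e < len(A), A[b..e] sorted in nondecreasing order).

```python
def rec_bin_search(x, A, b, e):
    """Return p with b <= p <= e+1, A[p-1] < x if b < p, x <= A[p] if p <= e."""
    if b == e:
        if x <= A[b]:
            return b
        return e + 1
    m = (b + e) // 2  # integer division, so b <= m < e
    if x <= A[m]:
        return rec_bin_search(x, A, b, m)
    return rec_bin_search(x, A, m + 1, e)


def satisfies_postcondition(x, A, b, e, p):
    """Check the three postcondition clauses for a returned index p."""
    in_range = b <= p <= e + 1
    left_ok = (not b < p) or A[p - 1] < x    # if b < p, then A[p-1] < x
    right_ok = (not p <= e) or x <= A[p]     # if p <= e, then x <= A[p]
    return in_range and left_ok and right_ok
```

    A test run of this kind does not replace the induction proof; it only
    checks the postcondition on the particular inputs tried.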