===========================================================================
CSC 373                Lecture Summary for Week 13                Fall 2008
===========================================================================

-------------
Randomization
-------------

- Randomized algorithms make use of random numbers. A very important tool.

- We've already seen randomized QuickSort and randomized Selection, that
  achieve expected runtime O(n log n) and O(n), respectively, independently
  of the input. The expectation is taken over all possible random choices
  made during the algorithm's execution.

- These are examples of "Las Vegas" algorithms: solution is guaranteed to
  be correct, but runtime is random (depends on random choices).

- Another important class of randomized algorithms is "Monte Carlo":
  runtime is deterministic, but answer is random (usually, one answer is
  certain and the other is correct with high probability).

- Algorithms where both runtime and output are random are not used in
  practice...

Example: equality testing.

- A has bit-string x and B has bit-string y, and they need to determine
  whether x = y. Measure of interest: how many bits need to be exchanged
  between A and B.

- Isn't this trivial? Just have A send x to B and B do comparison (or the
  other way around). Communication complexity is \Theta(n). Now suppose
  x,y represent contents of two 500GB hard drives...

- Idea: Prime p is "witness" of x != y if x mod p != y mod p (treating x,y
  as numbers in binary).

- Algorithm:
    . A chooses prime p in {2,...,n^2} uniformly at random, where
      n = |x| = |y| (so |p| <= log(n^2) = 2 log n)
    . A sends x mod p and p to B (at most 4 log n bits)
    . B computes y mod p and compares it with x mod p: output "same" if
      equal, "different" if not equal
  Note: Communication down to 4 log n bits, e.g., if n = 500GB <= 2^45
  bits, then 4 log_2 n = 180 bits!

- If B outputs "different", then x != y; if B outputs "same", then x may
  or may not be equal to y.
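
The protocol above can be sketched in Python as follows. This is only a
minimal illustration, assuming n >= 2 and n small enough that trial division
over {2,...,n^2} is acceptable. The function names (is_prime, random_prime,
alice_message, bob_answer, equality_test) are made up for this sketch and do
not come from the lecture.

```python
import random
from typing import Optional, Tuple


def is_prime(q: int) -> bool:
    """Trial division; adequate only for the small range used in this sketch."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True


def random_prime(limit: int, rng: random.Random) -> int:
    """Uniform random prime in {2,...,limit}, by rejection sampling."""
    while True:
        q = rng.randint(2, limit)
        if is_prime(q):
            return q


def alice_message(x: str, rng: random.Random) -> Tuple[int, int]:
    """A's side: pick a random prime p <= n^2 and send (x mod p, p)."""
    n = len(x)  # assumption: n >= 2, so that {2,...,n^2} is non-empty
    p = random_prime(n * n, rng)
    return int(x, 2) % p, p


def bob_answer(y: str, fingerprint: int, p: int) -> str:
    """B's side: compare y mod p with the value received from A."""
    return "same" if int(y, 2) % p == fingerprint else "different"


def equality_test(x: str, y: str, rng: Optional[random.Random] = None) -> str:
    """Run one round of the protocol on equal-length bit-strings x and y."""
    rng = rng or random.Random()
    fingerprint, p = alice_message(x, rng)
    return bob_answer(y, fingerprint, p)
```

For example, equality_test("1011" * 16, "1011" * 16) always returns "same",
while a "different" answer is only ever produced when the two strings differ.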
- Equivalently, if x = y, then B will always output "same"; if x != y,
  then B may output "same" or "different".

- If x != y, what is the probability that B answers "same"?

- Prime number theorem: number of primes in range [1,m] is approximately
  m/ln m. So number of primes in {2,...,n^2} is approximately
  n^2/ln(n^2) = n^2/(2 ln n).

- How many primes satisfy x mod p = y mod p even though x != y? At most
  n-1 (x mod p = y mod p <=> |x-y| is a multiple of p, and since
  0 < |x-y| < 2^n, it has no more than n-1 distinct prime divisors: a
  product of k distinct primes is at least 2^k).

- So probability that random p is "bad" is at most

          n-1             2 ln n
    ----------------  <=  ------
     n^2 / (2 ln n)          n

  (e.g., for n = 2^45, probability of error <= 62.383.../2^45
  = 1.773...x10^-12 = 0.000000000001773...)

- If this is "too high", rerun the algorithm K times independently.
  Communication goes up by a factor of K but probability of error goes
  down to (2 ln n / n)^K -- very small, very quickly! In fact, probability
  of error quickly becomes smaller than probability that computer used to
  run algorithm will crash...

Miller-Rabin primality testing: Given m, is m prime?

- Recent research result: deterministic polynomial-time algorithm (AKS),
  with proven runtime roughly O(n^6) up to log factors (where
  n = log_2 m). Too slow in practice for large n. Miller-Rabin is a
  Monte Carlo algorithm that uses O(n) modular multiplications per run,
  with error probability < 1/2.

- If MR returns "composite", then m is composite. If MR returns
  "pseudoprime", then m is probably prime.

- If m is composite, probability MR returns "pseudoprime" < 1/2. Run MR k
  times (increases runtime to O(k log m) modular multiplications but
  decreases probability of error to below 1/2^k).

- For most applications where prime numbers are needed (e.g., RSA
  cryptography), pseudoprime numbers work just as well as prime numbers
  (even if the pseudoprime number is actually composite).

------------------------------
Backtracking, branch-and-bound
------------------------------

Idea: brute-force (try all possibilities) with cutoff: while constructing
possible solutions, rule out any partial solution that cannot be completed.
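
As one illustration of this cutoff idea, here is a small Python sketch for
the N-Queens problem, chosen here as an example (it is not taken from the
lecture). Queens are placed one row at a time, and a partial placement is
abandoned as soon as a new queen would attack an earlier one. The names
solve_n_queens, safe and extend are hypothetical.

```python
from typing import List, Tuple


def solve_n_queens(n: int) -> List[Tuple[int, ...]]:
    """Return all placements of n non-attacking queens on an n x n board.

    A placement is a tuple cols where cols[r] is the column of the queen
    in row r.
    """
    solutions: List[Tuple[int, ...]] = []
    cols: List[int] = []  # current partial solution, one entry per filled row

    def safe(c: int) -> bool:
        """Can a queen go in column c of the next row without a conflict?"""
        r = len(cols)
        for prev_r, prev_c in enumerate(cols):
            same_column = prev_c == c
            same_diagonal = abs(prev_c - c) == r - prev_r
            if same_column or same_diagonal:
                return False
        return True

    def extend() -> None:
        """Try to complete the current partial solution row by row."""
        if len(cols) == n:
            solutions.append(tuple(cols))
            return
        for c in range(n):
            if safe(c):  # cutoff: skip partial solutions that cannot work
                cols.append(c)
                extend()
                cols.pop()  # backtrack and try the next column

    extend()
    return solutions
```

Plain brute force would examine all n^n ways of putting one queen in each
row; the safe() check discards most of them after only a few rows are filled.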
For optimization problems, use an easy-to-compute approximation to the
optimal value to bound the best value of the current partial solution and
rule out bad possibilities early (called "branch-and-bound").

Uses: SAT solvers for constraint satisfaction problems.

------------
Local search
------------

Idea: define notion of "local change" for problem (e.g., replace disjoint
edges (u_1,v_1), (u_2,v_2) with (u_1,v_2), (u_2,v_1) in TSP circuit), then
starting from some initial candidate, repeatedly make local change as long
as it improves value of candidate.

Issues:

- Runtime may not be polynomial. Stop process after a certain time --
  solution will be at least as good as the initial one, even if not as
  good as possible.

- Locally optimal solutions that are not globally optimal. Handled by
  running again from multiple starting points, or using "simulated
  annealing" technique (allowing non-improving changes with some
  probability that decreases with runtime).

Evolutionary Algorithms (genetic programming) are types of local search
algorithms.

------
REVIEW
------

Main course topics:

- Greedy algorithms (including proofs of correctness)
- Dynamic programming
- Divide-and-conquer (including runtime analysis and Master Theorem)
- Network flow (properties and problem representation)
- Linear programming (problem representation)
- Approximation algorithms (computing approximation ratio)