===========================================================================
CSC 263H           Updated Lecture Outline for Week 2           Winter 2004
===========================================================================

[[Q: denotes a question that you should think about and that will be
answered during lecture.]]

---------
QuickSort  [section 4.3 in textbook]
---------

The following algorithm sorts an input sequence S in non-decreasing order.

    QuickSort(S):
    1.  if |S| <= 1 then return S else:
    2.      select pivot p in S
    3.      partition the elements of S into:
                L = the elements of S less than p
                E = the elements of S equal to p
                G = the elements of S greater than p
    4.      return [QuickSort(L), E, QuickSort(G)]

To fully specify the algorithm, select the _first_ element of S as "pivot"
in line 2 (other choices are possible).

Worst-case analysis of QuickSort:

  - Count only comparisons between elements of S.
    [[Q: where are these comparisons performed?]]
    Let T(n) denote the worst-case number of comparisons over all inputs
    of size n.

  - Upper-bound:
      . each element of S is pivot at most once, [[Q: why?]]
      . at most all other elements of S are compared to pivot,
      . so every pair of elements of S is compared at most once,
      . there are (n choose 2) pairs of elements if |S| = n,
      . so T(n) <= (n choose 2).
    Hence, T(n) is in O(n^2).

  - Lower-bound:
      . Want to find specific input S of size n for which QuickSort(S)
        performs at least c n^2 comparisons (for some constant c).
      . Let C(n) denote number of comparisons performed on input
        [n, n-1, n-2, ..., 2, 1].
      . During partition (line 3), pivot n is compared to all other
        elements (n-1 comparisons) yielding
            G = [], E = [n], L = [n-1, n-2, ..., 1].
      . All other comparisons happen in recursive call QuickSort(L) on
        input [n-1, n-2, ..., 2, 1], which performs C(n-1) comparisons
        by definition of C.
      . Hence, C(n) satisfies recurrence:
            C(n) = n-1 + C(n-1)  for n > 1,
            C(1) = 0.
      . Solution of recurrence:
            C(n) = n-1 + n-2 + ... + 1 + 0 = n(n-1)/2.
      . By definition, T(n) >= C(n) = (n choose 2).
    Hence, T(n) is in Omega(n^2).

  - This means T(n) is in Theta(n^2).
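
The algorithm and the comparison count above can be checked with a short
Python sketch. This is only an illustration, not part of the textbook's
presentation: the function name `quicksort` and the convention of returning
the comparison count next to the sorted list are assumptions made here. Each
non-pivot element is charged one comparison against the pivot, matching the
n-1 comparisons counted for the partition step.

```python
def quicksort(seq):
    """Return (sorted copy of seq, number of element-vs-pivot comparisons).

    Sketch of QuickSort with the first element as pivot. Each non-pivot
    element counts as one comparison against the pivot, as in the analysis.
    """
    if len(seq) <= 1:
        return list(seq), 0

    pivot = seq[0]
    less, equal, greater = [], [pivot], []
    for x in seq[1:]:
        if x < pivot:
            less.append(x)
        elif x > pivot:
            greater.append(x)
        else:
            equal.append(x)

    comparisons = len(seq) - 1  # pivot compared to every other element
    sorted_less, c_less = quicksort(less)
    sorted_greater, c_greater = quicksort(greater)
    return (sorted_less + equal + sorted_greater,
            comparisons + c_less + c_greater)


# Reversed input [n, n-1, ..., 1] with n = 10: C(10) = 10*9/2 = 45.
result, count = quicksort(list(range(10, 0, -1)))
print(result, count)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 45
```

On the reversed input the recursion is n levels deep, so this sketch is only
suitable for small n; a large reversed input would exceed Python's recursion
limit.
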

Average-case analysis of QuickSort:

  - Sample space S_n = { all permutations of [1, 2, ..., n] }, with
    uniform probability distribution.
    Note implicit assumption in our choice of S_n: no repeated element.
    [[Q: why is this reasonable?]]

  - Random variable: t_n(S) = number of comparisons on input S in S_n.
    Want to figure out the expected value of t_n(S) where S is a random
    element of S_n.  Do this by breaking up S_n into subsets:
        S_{n,i} = { all permutations of [1,2,...,n] whose first element is i }
    Then, Pr[S in S_{n,i}] = (n-1)!/n! = 1/n, and for random S in S_{n,i},
    after the partition step,
        L will be a random permutation of [1,2,...,i-1], i.e.,
          L is a random element of S_{i-1}
        G will be a random permutation of [i+1,...,n], i.e.,
          G is equivalent to a random element of S_{n-i}

  - If S is a random permutation whose first element is i, then
        t_n(S) = n-1 + t_{i-1}(L) + t_{n-i}(G)
    where L will be a random permutation of [1,...,i-1] and G will be a
    random permutation of [i+1,...,n] (equivalent to a random permutation
    of [1,...,n-i]) after the partition step.

    Then, using partition of S_n into sets S_{n,i}, we can write:

        T'(n) = E[t_n(S) | S in S_n]

                 n
              = sum E[t_n(S) | S in S_{n,i}] * Pr[S in S_{n,i}]
                i=1

                 n
              = sum 1/n ( n-1 + E[t_{i-1}(L) | L in S_{i-1}]
                i=1           + E[t_{n-i}(G) | G in S_{n-i}] )
                    (from argument above about value of t_n(S))

                           n
              = n-1 + 1/n sum ( T'(i-1) + T'(n-i) )    (from def'n of T')
                          i=1

  - This gives us the recurrence for T'(n) = E[t_n]:

        T'(0) = 0,  T'(1) = 0,

                      2   n-1
        T'(n) = n-1 + - * sum T'(j)
                      n   j=1

    [[See if you can solve this recurrence (it's not easy).
    Hint: write expressions for T'(n) and T'(n-1).]]

(St.George lectures finished about here after solving the above
recurrence.  UTM and UTSc lecturers covered the following material.)

Randomized QuickSort:

  - Average-case analysis above works only if each permutation is equally
    likely.  In practice, this may not be true.

  - Instead of relying on unknown distribution of inputs, randomize
    algorithm by picking random element as pivot.
    This way, random behaviour of algorithm on any fixed input is
    equivalent to fixed behaviour of algorithm on a uniformly random
    input, i.e., expected worst-case time of randomized algorithm is
    Theta(n log n).

  - In general, randomized algorithms are good when there are many good
    choices but it is difficult to find one choice that is guaranteed to
    be good.

------------
Dictionaries  [sections 2.5.1 and 3.1]
------------

Dictionary ADT (slightly simpler version than textbook):

    Objects:
        Sets S, each of whose elements x has a "key" field key[x] that
        must come from a totally ordered universe (e.g., key[x] could be
        integer, character, etc.).  Not all keys need to be distinct.

    Operations:
        ISEMPTY(S):   check whether S is empty or not
        SEARCH(S,k):  return x in S s.t. key[x] = k, or NIL if no such x
        INSERT(S,x):  insert x in S
        DELETE(S,x):  remove x from S (given _element_ x, not just its key)
                      [[Q: why not DELETE(S,k)?]]

    Many variations possible:
      - require unique keys / allow duplicate keys,
      - add operations to change information associated with a specific key,
      - etc.

Data structures?  [[Think of possibilities, from simple to complex.]]
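
As one of the simplest possibilities, the four operations of the ADT above
can be sketched over an unsorted Python list. The class names `Element` and
`ListDictionary` and the method names are hypothetical, chosen only for this
illustration; the outline does not prescribe any particular data structure.
Duplicate keys are allowed, and `delete` takes the element itself, as in
DELETE(S,x).

```python
class Element:
    """An element x with a key field and optional associated data.

    Hypothetical helper type for this sketch.
    """
    def __init__(self, key, data=None):
        self.key = key
        self.data = data


class ListDictionary:
    """Dictionary ADT kept as an unsorted list; duplicate keys allowed."""

    def __init__(self):
        self._items = []

    def is_empty(self):
        """ISEMPTY(S): True when S holds no elements."""
        return not self._items

    def search(self, k):
        """SEARCH(S,k): some x with x.key == k, or None (NIL) if absent."""
        for x in self._items:
            if x.key == k:
                return x
        return None

    def insert(self, x):
        """INSERT(S,x): add element x (no uniqueness check on keys)."""
        self._items.append(x)

    def delete(self, x):
        """DELETE(S,x): remove this exact element, matched by identity."""
        for i, y in enumerate(self._items):
            if y is x:
                del self._items[i]
                return


d = ListDictionary()
a, b = Element(5, "first"), Element(5, "second")
d.insert(a)
d.insert(b)
d.delete(a)
print(d.search(5).data)  # second
```

In this sketch `search` and `delete` scan the list, so each takes time
linear in the number of stored elements, while `insert` appends at the end.
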