===========================================================================
CSC 373H / L0101       Lecture Summary for Week 12             Winter 2005
===========================================================================

------------------------
Approximation algorithms
------------------------

Load Balancing:  See section 11.1 in text.

  - Given jobs with durations t_1, ..., t_n and machines M_1, ..., M_m,
    assign jobs to machines so that overall completion time is minimized.

  - More precisely, if A(i) = jobs assigned to machine M_i, then let
        T_i = SUM_{j in A(i)} t_j      ("load" of machine M_i).
    We want to minimize
        MAX_{i=1..m} T_i               ("makespan" = maximum load).

  - Greedy algorithm:
        A(1) := A(2) := ... := A(m) := {}
        T_1 := T_2 := ... := T_m := 0
        for j = 1,2,...,n:
            find i s.t. T_i is minimum
            A(i) := A(i) U {j}
            T_i := T_i + t_j

  - Let T = MAX_{i=1..m} T_i be the makespan of the greedy solution.
    Let T* be the minimum makespan. We prove lower bounds on T*.

    (11.1) T* >= 1/m SUM_{j=1..n} t_j
           because otherwise, every machine in an optimal assignment
           would have T_i < 1/m SUM_j t_j, giving
           SUM_i T_i < m/m SUM_j t_j, i.e., total work scheduled is
           less than total work, a contradiction.

    (11.2) T* >= MAX_{j=1..n} t_j
           because some machine must be assigned the longest job.

  - Now, consider the greedy solution and let M_i be a machine with max
    load (i.e., T = T_i). Let j be the last job scheduled on M_i.
    By the greedy property, T_i-t_j (load of M_i just before job j) was
    the smallest of all loads when job j was scheduled, i.e., for every
    machine M_k,
        T_k (final value) >= T_k (up to job j-1) >= T_i-t_j
    so T_1+...+T_m >= m(T_i-t_j) or equivalently,
        T_i-t_j <= 1/m SUM_{k=1..m} T_k = 1/m SUM_{k=1..n} t_k
    (since every job is scheduled on exactly one machine).
    Also, t_j <= MAX_{k=1..n} t_k (by definition of max).
    Hence,
        T = T_i = (T_i-t_j) + t_j
               <= 1/m SUM_{k=1..n} t_k + MAX_{k=1..n} t_k
               <= T* + T* = 2 T*.

  - Improved algorithm: First, sort jobs by duration, so
    t_1 >= t_2 >= ... >= t_n, then run the greedy algorithm above.
    Now, consider the first m+1 jobs (if n <= m, every job gets its own
    machine and greedy is optimal): for any assignment, some machine
    must get at least two of them, and each one takes time >= t_{m+1},
    so that machine's load must be at least 2 t_{m+1}.
    Hence, T* >= 2 t_{m+1}.

    In the greedy solution, let M_i have max load.
    If M_i contains only one job, then T is that job's duration, which
    is at most t_1 <= T* by (11.2), so greedy is optimal.
    If M_i contains at least two jobs, then let j = index of the last
    job on M_i. j >= m+1 (because jobs 1..m each get assigned to a
    different machine), so t_j <= t_{m+1}.
    As before,
        T = T_i = (T_i-t_j) + t_j
               <= 1/m SUM_{k=1..n} t_k + t_{m+1}
               <= T* + T*/2 = 3/2 T*.

Weighted Set Cover:  See section 11.3 in text.

  - ASCII notation: "\/" for set union, "/\" for set intersection

  - Input:  U (universe of elements), subsets S_1,...,S_m of U with a
            nonnegative integer weight w_i for each subset S_i.
    Output: Cover C subset of {1,2,...,m} such that
            \/_{i in C} S_i = U and SUM_{i in C} w_i is minimum.

  - Example: U = {a,b,c,d,e},
        S_1 = {a,b},    w_1 = 2,
        S_2 = {a,c,d},  w_2 = 5,
        S_3 = {b,e},    w_3 = 1,
        S_4 = {c,d},    w_4 = 2.
    C = {1,2} is NOT a cover because S_1 \/ S_2 != U (e is missing).
    C = {2,3} is a cover of weight w_2 + w_3 = 5+1 = 6.
    C = {1,3,4} is a cover of weight 2+1+2 = 5, which is minimum.

  - Greedy algorithm:
        // Select sets one by one to try to minimize weight and
        // maximize number of new elements covered, at the same time.
        C := {}  // cover
        R := U   // remaining elements (i.e., not yet covered)
        while R != {}:
            pick i with S_i /\ R != {} such that
                w_i / |S_i /\ R| is minimal
            // this minimizes "weight per new element covered"
            C := C \/ {i}
            R := R - S_i
        return C

  - Analysis:
      . After picking i in the main loop, for each s in S_i /\ R (with R
        as it was before the update), let c_s = w_i / |S_i /\ R|
        (c_s is "cost paid to cover s", used only in analysis).
      . By definition, each element covered during the algorithm is
        accounted for by exactly one c_s, so
        (11.9) SUM_{i in C} w_i = SUM_{s in U} c_s.
      . But the greedy algorithm might "overpay" for some sets, i.e.,
        SUM_{s in S_k} c_s > w_k; can we bound how much greater?
      . (11.10) For all S_k, SUM_{s in S_k} c_s <= H(|S_k|) w_k
        (where H(n) = 1 + 1/2 + ... + 1/n = Theta(log n)).
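
Illustration (not from the lecture): the greedy rule and the costs c_s
from the analysis can be sketched in Python as below. This is a minimal
sketch, not a definitive implementation. The names greedy_set_cover and
harmonic are made up for this example, and the instance is the one from
the Example bullet, taking U = {a,b,c,d,e} (an assumption needed for
S_3 = {b,e} to be a subset of U). Exact fractions are used so that the
checks of (11.9) and (11.10) do not depend on rounding.

```python
from fractions import Fraction


def greedy_set_cover(universe, sets, weights):
    """Greedy weighted set cover (hypothetical helper for illustration).

    sets and weights are dicts keyed by set index. Returns the chosen
    indices in order, plus the cost c_s charged to each element s.
    """
    cover = []
    remaining = set(universe)   # R in the notes
    cost = {}                   # c_s, used only in the analysis
    while remaining:
        # Only sets that cover something new are candidates; this also
        # avoids dividing by zero in the ratio below.
        candidates = [i for i in sets if sets[i] & remaining]
        if not candidates:
            raise ValueError("the given sets do not cover the universe")
        best = min(
            candidates,
            key=lambda i: Fraction(weights[i], len(sets[i] & remaining)),
        )
        newly = sets[best] & remaining
        for s in newly:
            cost[s] = Fraction(weights[best], len(newly))
        cover.append(best)
        remaining -= newly
    return cover, cost


def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


universe = {"a", "b", "c", "d", "e"}
sets = {1: {"a", "b"}, 2: {"a", "c", "d"}, 3: {"b", "e"}, 4: {"c", "d"}}
weights = {1: 2, 2: 5, 3: 1, 4: 2}

cover, cost = greedy_set_cover(universe, sets, weights)
print(cover)                                # [3, 4, 1]
print(sum(weights[i] for i in cover))       # 5
print(sum(cost.values()))                   # 5, as (11.9) predicts
for k in sets:
    paid = sum(cost[s] for s in sets[k])
    print(k, paid, harmonic(len(sets[k])) * weights[k])
```

On this instance greedy picks S_3, then S_4, then S_1, for total weight
5, which happens to match the minimum cover from the example. The last
loop prints, for each S_k, the amount SUM_{s in S_k} c_s next to the
bound H(|S_k|) w_k; for S_1 the amount paid is 5/2 > w_1 = 2, a small
case of the "overpaying" that (11.10) bounds.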