Question: ========== Given n jobs with durations d_i, the goal is to schedule them on one machine so as to minimize total completion time. Where total completion time is the sum of all the waiting times (from time 0) of all the jobs. e.g. with durations 2,4,3 the optimal schedule is 2,3,4 and sum of waiting times is 2+5+9=16. Solution: ========== Algorithm ---------- Sort jobs based on completion time: A_1, A_2, ..., A_n with d_1 <= d_2 <=...<= d_n. Schedule in order of shortest to longest completion time. The algorithm is greedy, because at each step we schedule the job with the shortest duration (this is our criterion). Correctness: ------------ We say that a partial scheduling is *promising* if the remaining jobs can be scheduled so as to obtain an optimal schedule. We prove the following loop invariant by induction: "At each step, the algorithm holds a promising partial schedule" In other words, the partial schedule S_i the algorithm holds on stage i, can be extended to an optimal solution S_i^. Base: This is obviously true for S_0, as S_0 contains no selected jobs, and thus can be extended to an optimal solution S_0^. Step: Suppose the statement is true for S_i, and it can be extended to an optimal schedule S_i^. Our goal is to show that S_{i+1} can be extended to an optimal schedule S_{i+1}^. Note that the partial solution S_{i+1} is obtained from S_i by appending the job A_{i+1}. There are two cases to consider: Case 1: Job A_{i+1} is on the i+1-th position in S_i^. In this case S_i^ already extends S_{i+1}, and we can choose S_{i+1}^=S_i^. Case 2: Job A_{i+1} is on some other j-th position in S_i^. We must have j>i+1, because the first i jobs in S_i^ must agree with S_i. We create S_{i+1}^ from S_i^ by moving A_{i+1} from the j-th position to the i+1-th position. [draw a diagram] We claim that this operation does not increase the total completion time, and hence S_{i+1}^ is an optimal solution extending S_{i+1}. There are k=j-(i+1) jobs scheduled before A_{i+1} in S_i. They are moved forward by the operation (and their completion time increases). A_{i+1} is moved backwards by the operation, and its completion time decreases. No other completion times are affected. Denote the durations of the jobs that are moved forward by d_{i_1}, d_{i_2},..., d_{i_k}. Each of these durations is bigger or equal to d_{i+1}. What is the total effect of the change on the total completion time? Each of the k jobs is moved forward by d_{i+1}, giving a contribution of + k d_{i+1}. The job A_{i+1} is moved backwards by the combined length of the k jobs, giving a contribution of -sum_{t=1..k} d_{i_t}. Thus the change in the total completion time is k d_{i+1}-sum_{t=1..k} d_{i_t} <= k d_{i+1}-sum_{t=1..k} d_{i+1} = 0. This shows that the total completion time of S_{i+1}^ is smaller or equal to that of S_i^ (by the optimality of S_i^ it must be equal). Hence S_{i+1}^ is an optimal schedule extending S_{i+1}, which completes the proof. Conclusion: S_n is a complete schedule that can be "extended" to an optimal schedule S_n^. Since there are no activities remaining to schedule, we must have S_n = S_n^ -- optimal. Runtime Analysis: ------------------ Sort : O(n logn) Scheduling: O(n) Therefore, O(n logn). An alternative correctness proof: --------------------------------- Consider an Optimal Scheduling, O_1, .., O_n, of the n jobs. Now consider the scheduling outputted by the algorithm, A_1, .., A_n. Let C() denote the total completion time. Let X() denote the time it takes for the given job. Induction: We want to prove that the order chosen by the algorithm is as good as any optimal scheduling. Base Case: C(O_1) >= C(A_1), since C(A_1) is the shortest completion time of any job. Inductive Hypothesis: Assume C(A_i) <= C(O_i) for all 1 <= i < k Inductive Step: Consider scheduling of the first k jobs by the algorithm. Note: C(A_k) = C(A_(k-1)) + Sum(X(A_i) | i = 1..k) C(O_k) = C(O_(k-1)) + Sum(X(O_i) | i = 1..k) Need C(A_k) <= C(O_k) Since we know from IH that C(A_(k-1)) <= C(O_(k-1)) Therefore, we need Sum(X(A_i) | i = 1..k) <= Sum(X(O_i) | i = 1..k) But from the algorithm A_1, .., A_k are chosen such that they have the smallest completion time. Therefore, Sum(X(A_i) | i = 1..k) must be less than or equal to Sum(X(O_i) | i= 1..k) Therefore, C(A_n) <= C(O_n) by induction. And since C(O_n) is optimal, therefore, C(A_n) = C(O_n). Thus the algorithm produces an optimal scheduling. QED.