================================ WEEK 12 ===================================

============================================================================
                          Dynamic Programming
============================================================================

* Dynamic programming is a general technique that can be used to solve
  problems with the following property: the general problem can be broken
  up into subproblems in such a way that solutions to the subproblems can
  be combined to give a solution to the general problem.

* For example, recall Floyd's algorithm for the all-pairs shortest paths
  problem: to compute a solution to the final problem, we first computed
  solutions to subproblems (where the paths were restricted to use only
  certain subsets of the vertices as intermediaries), and combined those
  solutions to get an overall answer.

  In general, writing a dynamic programming algorithm involves the
  following steps:

    1. Characterize the subproblem structure.
    2. Define an array that contains the value you want to optimize for
       all relevant subproblems.
    3. Write down a recursive definition for the array.
    4. Compute the array bottom-up.
    5. Compute a solution from the values you have.

  In the case of Floyd's algorithm, the array was D[i,j,k], the length of
  a shortest path from vertex i to vertex j that uses only vertices from
  {0,...,k} as intermediaries.

* Now, consider the following "Knapsack" problem: you're given a knapsack
  with capacity C (a nonnegative integer) and a collection of items, each
  of which has a weight w_i and a value v_i. Formally, your input is a list
  of positive integer weights w_1,...,w_n, a list of nonnegative integer
  values v_1,...,v_n, and a nonnegative integer capacity C. (The weights
  must be positive: since items can be repeated, an item of weight 0 and
  positive value would make the total value unbounded.)

  You want to fill up your knapsack with items so that you get the maximum
  possible value; putting in more than one copy of the same item is
  allowed. Formally, your output should be a list of indices i_1,...,i_k
  (for some k) such that

      w_{i_1} + ... + w_{i_k} <= C  and  v_{i_1} + ... + v_{i_k} is maximal.

  For example, if C = 11, w_1 = 3, w_2 = 4, w_3 = 5, w_4 = 7, v_1 = 2,
  v_2 = 3, v_3 = 4, v_4 = 5, then one best solution is i_1 = 3, i_2 = 3,
  for a total value of 8. (It is not the only one: items 2 and 4 together
  also weigh 11 and are worth 8.)

* So, how do we solve a problem like this using dynamic programming? The
  first step is to try to understand the structure of the problem. In
  other words, *if* we had a solution to the problem, what could we say
  about this solution in terms of solutions to subproblems?

  So, imagine that we have a solution i_1,...,i_k. Now, if we look at the
  first k-1 items in this solution, i_1,...,i_{k-1}, do we know that they
  form a solution to some subproblem of the original problem? In this
  case, yes: if i_1,...,i_k gives us the maximum possible value for
  capacity C, then i_1,...,i_{k-1} must give us the maximum possible value
  for capacity C - w_{i_k}. (Otherwise, we could replace them with a
  better list for capacity C - w_{i_k}, add item i_k back, and get a
  better solution for capacity C.)

* Now that we have this idea, we can define an array as follows. What are
  we trying to maximize? The total value of the items selected, so we'll
  keep track of the maximum value possible for certain subproblems. How do
  we index this array? We've already thought of how solutions break up
  into subsolutions, so it makes sense to use the following definition:

      V[j] = maximum value attainable for capacity j.

  Obviously, the answer to the problem is the value V[C]. Moreover, we can
  write the following recurrence for V[j], based on the structure of the
  problem described above:

      V[0] = 0
      V[j] = max { V[j-1], max_{i : w_i <= j} { v_i + V[j-w_i] } }   for j > 0

  The V[j-1] term covers packings that leave some capacity unused (and if
  no item fits, the inner max is empty and V[j] = V[j-1]).

Knapsack problem
----------------

* [Recall problem, array, and recurrence.]

* Now, to compute the values according to the recurrence, the most
  natural thing to do would be to write a recursive function that computes
  V[C] by calling itself to get the values for smaller indices.
  Unfortunately, this will be very inefficient: the same values V[j] get
  recomputed many times (once for every chain of recursive calls that
  reaches them), and the running time can grow exponentially with C.
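The direct recursive approach can be sketched as follows. This is a minimal illustration, not code from the notes: it uses 0-based std::vector inputs, a hypothetical `calls` counter to make the repeated work visible, and assumes every weight is positive (a zero weight would make the function call itself forever).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Direct recursive transcription of
//   V[0] = 0
//   V[j] = max { V[j-1], max_{i : w_i <= j} { v_i + V[j-w_i] } }
// `calls` is a hypothetical instrumentation parameter: it is incremented on
// every call, so the amount of repeated work can be observed.
int knapsack_rec(int j, const std::vector<int>& w, const std::vector<int>& v,
                 long long& calls) {
    ++calls;
    if (j == 0) return 0;                            // base case: V[0] = 0
    int best = knapsack_rec(j - 1, w, v, calls);     // V[j-1]: leave a unit unused
    for (std::size_t i = 0; i < w.size(); ++i) {
        assert(w[i] > 0);                            // zero weight: no termination
        if (w[i] <= j) {                             // item i fits: try it last
            best = std::max(best, v[i] + knapsack_rec(j - w[i], w, v, calls));
        }
    }
    return best;
}
```

On the example data (w = {3, 4, 5, 7}, v = {2, 3, 4, 5}, capacity 11), this returns 8, but only after 392 calls, although there are just 12 distinct subproblems V[0], ..., V[11].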
* Instead, since we know that we will need potentially all values of V for
  smaller indices to compute V[C], we will simply compute the values
  bottom-up (starting at j = 0 and letting j increase). From the
  recurrence, we can directly write an algorithm that computes the values
  of V[j] according to this idea. I'll write the algorithm "in C++",
  assuming that `C' is an int containing the maximum capacity, that arrays
  w[n] and v[n] have already been defined (storing values from index 0 to
  n-1 in the C++ convention), and that an array V[C+1] has already been
  declared.

      V[0] = 0;
      for (int j = 1; j <= C; j++) {
          V[j] = V[j-1];                  // option: leave capacity unused
          for (int i = 0; i < n; i++) {
              // Try item i as the last item, if it fits and improves V[j].
              if (w[i] <= j && v[i] + V[j-w[i]] > V[j]) {
                  V[j] = v[i] + V[j-w[i]];
              } //end if
          } //end for
      } //end for

* What about the actual solution? At the end of this algorithm, we only
  know the *value* of the solution, but not the actual list of items that
  make it up. There is an easy way to modify this so that we get the
  actual list, by keeping track of a second array L[j] whose value is the
  index of the *last* item put in to get the value of V[j] (or -1 if no
  item is used). More precisely, we can add the following few lines to the
  algorithm to compute the values of L[j]:

      V[0] = 0;
      L[0] = -1;                          // -1 means "no item"
      for (int j = 1; j <= C; j++) {
          V[j] = V[j-1];
          L[j] = L[j-1];
          for (int i = 0; i < n; i++) {
              if (w[i] <= j && v[i] + V[j-w[i]] > V[j]) {
                  V[j] = v[i] + V[j-w[i]];
                  L[j] = i;               // item i was put in last
              } //end if
          } //end for
      } //end for

* At the end, how do we get the actual answer from this? Simple: start at
  L[C], which is the index of the last item to put in, in order to get the
  maximum value of V[C]. We know that if we put in item number L[C], we
  reduce the capacity by w[L[C]], so look next at L[C-w[L[C]]] for the
  next item, and so on...
  More precisely, we can output the list of item numbers to put in with
  this simple loop:

      for (int j = C; L[j] >= 0; j -= w[L[j]]) {
          cout << L[j] << " ";
      } //end for

* For the example with C = 11, w_1 = 3, w_2 = 4, w_3 = 5, w_4 = 7, and
  v_1 = 2, v_2 = 3, v_3 = 4, v_4 = 5, the algorithm will compute the
  following values. In this table, items are numbered 1 to 4 as in the
  problem statement and L = 0 means "no item"; the C++ code above uses
  indices 0 to 3 and -1 instead.

       j | V | L
      ---+---+---
       0 | 0 | 0
       1 | 0 | 0
       2 | 0 | 0
       3 | 2 | 1
       4 | 3 | 2
       5 | 4 | 3
       6 | 4 | 3
       7 | 5 | 1
       8 | 6 | 1
       9 | 7 | 2
      10 | 8 | 3
      11 | 8 | 3

  For example, the value V[5] is computed like this:

      V[5] = max { V[4], 2 + V[2], 3 + V[1], 4 + V[0] }
           = max { 3, 2, 3, 4 }
           = 4

  and L[5] = 3 (the index of the last item put in to get value V[5] = 4).

  The final answer is (3, 3), as we would expect, and as can easily be
  checked by tracing the output routine given above with the values in
  the table: L[11] = 3, then 11 - w_3 = 6 and L[6] = 3, then 6 - w_3 = 1
  and L[1] = 0, so we stop.

Matrix Chain Multiplication
---------------------------

- Consider the following problem: we want to multiply a sequence of
  matrices A_1 A_2 ... A_n, i.e., to compute the matrix product
  A_1 x A_2 x ... x A_n. Because matrix multiplication is associative, the
  answer will be the same no matter what order we do the products in, but
  the amount of computation might be quite different.

- How much computation does it take to multiply two matrices A x B, when
  A is [n x k] and B is [k x m]? The standard algorithm performs n*k*m
  scalar multiplications, so the work is proportional to n*k*m.

  For example, if we have three matrices A[10x100], B[100x5], and
  C[5x50], we can compute the product either as (A x B) x C or as
  A x (B x C). The final answer will be the same, but the total amount of
  work in the first case will be proportional to

      (10*100*5) + (10*5*50) = 5000 + 2500 = 7500

  operations, while in the second case we get

      (100*5*50) + (10*100*50) = 25000 + 50000 = 75000,

  or 10 times more work!

- So basically, the problem is that we need to figure out how to put
  parentheses in the product to indicate an order of performing the
  multiplications that needs the smallest amount of computation. Could we
  just look at all possibilities and pick the best one?
  We could, but unfortunately, it turns out that the number of possible
  ways to parenthesize a product A_1 x A_2 x ... x A_n is something called
  a "Catalan number", which grows exponentially (roughly like
  4^n / n^{3/2}).

- Let's think about the structure of the problem: suppose that we have one
  of the best ways of performing the product. Now look at the last
  multiplication performed: it will be of the form

      ( A_1 x ... x A_k ) x ( A_{k+1} x ... x A_n )

  for some 1 <= k < n. Furthermore, the order of multiplications will also
  be fully specified within each part. What do we know about this order?
  If the entire product is done in as efficient a way as possible, then
  each subproduct must be done as efficiently as possible also (otherwise,
  we could use a different way of performing the subproduct to get a
  better answer overall).

- So, when trying to find the best order for doing the product, we will
  keep track of the best ways of performing subproducts and combine them
  in the right way. This leads us to define the following array, for
  1 <= i <= j <= n:

      C[i][j] = minimum cost of multiplying A_i x A_{i+1} x ... x A_j

- Now, we need to find a way to compute the values of C[i][j], i.e., we
  have to come up with a recurrence relation for our array. The base cases
  are relatively easy to come up with: how many operations must be
  performed to find the value of A_i (multiplied with nothing else)? None,
  of course. So

      C[i][i] = 0, for 1 <= i <= n

  What about the other values? We know that however the product is
  computed, there will be one last multiplication performed, and by our
  reasoning above, the subproducts in that case must also be optimal.
  Because we don't know ahead of time at which point between A_i and A_j
  the last multiplication will take place, we can just look at all
  possibilities and pick the best one!
  So, if matrix A_i has dimensions [s_i x s_{i+1}] (for 1 <= i <= n), then

      C[i][j] = min_{i <= k < j} { C[i][k] + C[k+1][j] + s_i*s_{k+1}*s_{j+1} }

  for 1 <= i < j <= n, where the last term is the number of operations
  required to multiply the matrices (A_i x ... x A_k) and
  (A_{k+1} x ... x A_j), which have dimensions [s_i x s_{k+1}] and
  [s_{k+1} x s_{j+1}], respectively.

Matrix Chain Multiplication
---------------------------

* For the rest of this problem, we will follow C/C++ conventions (i.e., we
  shift all indices to start at 0 instead of 1), so the input to the
  problem will be matrices A_0, A_1, ..., A_{n-1}, where A_0 has
  dimensions [s_0 x s_1], A_1 has dimensions [s_1 x s_2], ..., A_{n-1} has
  dimensions [s_{n-1} x s_n].

* Recall that we have defined

      C[i][j] = minimum cost of multiplying A_i x A_{i+1} x ... x A_j

  and from the subproblem structure, we know that C[i][j] satisfies the
  following recurrence:

      C[i][i] = 0, for 0 <= i <= n-1
      C[i][j] = min_{i <= k < j} { C[i][k] + C[k+1][j] + s_i*s_{k+1}*s_{j+1} }
                for 0 <= i < j <= n-1.

* Now, we have to compute the values of C[i][j]. Once again, the most
  natural way to do this would be to write a recursive function that
  follows the recurrence relation, something like this (for this
  algorithm, assume that the values s_0, s_1, ..., s_n are given to us in
  an array s[]):

      int rec_matrix_product(int i, int j, int *s) {
          if (i == j) {
              return 0;                   // a single matrix costs nothing
          } else {
              // Start with the split k = i, then try every other k.
              int C = rec_matrix_product(i, i, s)
                    + rec_matrix_product(i+1, j, s)
                    + s[i]*s[i+1]*s[j+1];
              for (int k = i+1; k < j; k++) {
                  int c = rec_matrix_product(i, k, s)
                        + rec_matrix_product(k+1, j, s)
                        + s[i]*s[k+1]*s[j+1];
                  if (c < C) C = c;
              } // for
              return C;
          } // if
      } // rec_matrix_product()

  What is the running time of this function? It's recursive, so we need to
  write down a recurrence for its running time T(n), where n = j-i+1 is
  the length of the chain. For every split point k, the function calls
  itself on two shorter chains, so

      T(n) = 2*(T(1) + T(2) + ... + T(n-1)) + O(n).

  Subtracting the same equation for T(n-1) gives T(n) = 3*T(n-1) + O(1),
  so T(n) grows like 3^n. So from a brute-force solution that takes time
  roughly 4^n, we've found a recursive solution that takes time roughly
  3^n. It's an improvement, but it is still exponential, which is not
  good enough!
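The exponential growth is easy to observe. The sketch below is not from the notes: it is the same recursion as rec_matrix_product, rewritten with std::vector<long long> (to postpone overflow) and instrumented with a hypothetical `calls` counter; `count_calls` is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Same recursion as rec_matrix_product(): minimum cost of A_i x ... x A_j,
// where A_i has dimensions [s[i] x s[i+1]]. `calls` counts every invocation.
long long rec_cost(int i, int j, const std::vector<long long>& s,
                   long long& calls) {
    ++calls;
    if (i == j) return 0;                          // a single matrix: no work
    long long best = std::numeric_limits<long long>::max();
    for (int k = i; k < j; k++) {                  // last product splits after A_k
        long long c = rec_cost(i, k, s, calls) + rec_cost(k + 1, j, s, calls)
                    + s[i] * s[k + 1] * s[j + 1];
        if (c < best) best = c;
    }
    return best;
}

// Number of calls made for a chain of n matrices. The call pattern does not
// depend on the dimensions, so they are all set to 1 here.
long long count_calls(int n) {
    assert(n >= 1);
    std::vector<long long> s(n + 1, 1);
    long long calls = 0;
    rec_cost(0, n - 1, s, calls);
    return calls;
}
```

For the three matrices A[10x100], B[100x5], C[5x50] (s = {10, 100, 5, 50}), rec_cost(0, 2, s, calls) returns 7500 after 9 calls. More generally, the call count N(n) for a chain of n matrices satisfies N(n) = 1 + 2*(N(1) + ... + N(n-1)), i.e., N(n) = 3*N(n-1), so it is exactly 3^{n-1}; count_calls(10) is already 19683.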
* Instead, we can easily write a loop that computes the values in the
  array "bottom-up". We'll want to compute values of C[i][j] for all
  indices i and j such that j-i = 0, then all indices such that j-i = 1,
  then j-i = 2, and so on, so that at every step we know we have all the
  values we need to compute C[i][j] according to the recurrence (every
  C[i][k] and C[k+1][j] it uses belongs to a shorter chain). For this
  algorithm, assume that memory has already been allocated for the array
  C[i][j]:

      for (int i = 0; i < n; i++)
          C[i][i] = 0;
      for (int len = 1; len < n; len++) {         // len = j - i
          for (int i = 0; i < n - len; i++) {
              int j = i + len;
              // Start with the split k = i, then try the others.
              C[i][j] = C[i][i] + C[i+1][j] + s[i]*s[i+1]*s[j+1];
              for (int k = i+1; k < j; k++) {
                  if (C[i][k] + C[k+1][j] + s[i]*s[k+1]*s[j+1] < C[i][j])
                      C[i][j] = C[i][k] + C[k+1][j] + s[i]*s[k+1]*s[j+1];
              } // for k
          } // for i
      } // for len

* Running time? We have three nested loops, each of which iterates at
  most n times, and the amount of work done inside the innermost loop is
  constant, so O(n^3). Definitely better than before!

* What about the actual answer (i.e., the actual order of performing the
  multiplications)? Again, the code above only gives us the minimum cost
  of computing the product, but not the actual sequence of operations for
  this product. And as usual, the way to get the sequence of operations is
  quite simple: for each pair (i,j), keep track of the value of k that
  gave the best result when finding C[i][j], in a separate array B[i][j].
  This requires adding only a few lines to the code above:

      for (int i = 0; i < n; i++) {
          C[i][i] = 0;
          B[i][i] = i;
      } // for
      for (int len = 1; len < n; len++) {
          for (int i = 0; i < n - len; i++) {
              int j = i + len;
              C[i][j] = C[i][i] + C[i+1][j] + s[i]*s[i+1]*s[j+1];
              B[i][j] = i;                    // best split found so far
              for (int k = i+1; k < j; k++) {
                  if (C[i][k] + C[k+1][j] + s[i]*s[k+1]*s[j+1] < C[i][j]) {
                      C[i][j] = C[i][k] + C[k+1][j] + s[i]*s[k+1]*s[j+1];
                      B[i][j] = k;
                  } // if
              } // for k
          } // for i
      } // for len

  and writing a separate function that outputs the product.
  The easiest way to do this is with a recursive function, because of the
  way the breakpoints are stored (B[i][j] = k means that the product is
  split as (A_i x ... x A_k) x (A_{k+1} x ... x A_j)):

      void output_product(int i, int j, int **B) {
          if (i == j) {
              cout << "A_" << i;
          } else {
              cout << "(";
              output_product(i, B[i][j], B);
              cout << " x ";
              output_product(B[i][j] + 1, j, B);
              cout << ")";
          } // if
      } // output_product()

Conclusion
----------

* Conceptually, dynamic programming is a simple technique used to
  efficiently compute values that are defined recursively, when the direct
  recursive computation is inefficient because it solves the same
  subproblems over and over.

* For example, consider the following definition of the Fibonacci
  numbers:

      F_0 = 0,  F_1 = 1,  F_n = F_{n-1} + F_{n-2}  for n > 1.

  This recursive definition leads very naturally to a recursive program to
  compute the values:

      int fib(int n) {
          if (n == 0) return 0;
          if (n == 1) return 1;
          return fib(n-1) + fib(n-2);
      }

  How efficient is this? Let's trace the call fib(6):

      fib(6).
      :..fib(5).
      :  :..fib(4).
      :  :  :..fib(3).
      :  :  :  :..fib(2).
      :  :  :  :  :..fib(1)
      :  :  :  :  :..fib(0)
      :  :  :  :..fib(1)
      :  :  :..fib(2).
      :  :     :..fib(1)
      :  :     :..fib(0)
      :  :..fib(3).
      :     :..fib(2).
      :     :  :..fib(1)
      :     :  :..fib(0)
      :     :..fib(1)
      :..fib(4).
         :..fib(3).
         :  :..fib(2).
         :  :  :..fib(1)
         :  :  :..fib(0)
         :  :..fib(1)
         :..fib(2).
            :..fib(1)
            :..fib(0)

  As you can see, many small values of n end up being recomputed many
  times. For instance, in this example, we compute the value fib(3) three
  times, each time going through all of its recursive calls. The recursive
  algorithm certainly produces the correct answer (since it follows the
  recursive definition of the values exactly), but it does so in a
  wasteful manner.

* If we use the dynamic programming technique, we simply store values in
  an array, in the right order, so that values can be computed by looking
  up other values instead of computing them from scratch.
  In this simple example, we can just store the values in increasing
  order in the array:

      int fib(int n) {
          // Without this check, n == 0 would write array[1] out of bounds.
          if (n == 0) return 0;

          // Allocate memory dynamically for the array.
          int *array = new int[n + 1];

          // Compute the values in the array bottom-up.
          array[0] = 0;
          array[1] = 1;
          for (int i = 2; i <= n; i++)
              array[i] = array[i-1] + array[i-2];

          // Get the answer from the array, then free the memory that was
          // dynamically allocated before returning the answer.
          int answer = array[n];
          delete[] array;
          return answer;
      }

  Notice that this saves all of the extra work of the purely recursive
  version, since each value is computed exactly once and then reused
  (instead of being recomputed each time it is needed), so the running
  time is O(n) instead of exponential.

* There is an alternative formulation of the dynamic programming idea,
  called "memoization": instead of computing the values in the array
  "bottom-up", we write a function that returns the value stored in the
  array if it has already been computed, or computes the value recursively
  (and stores it) if it hasn't. In other words, we are writing a recursive
  function with the "twist" that we remember computed values and reuse
  them.

  In general, if you know ahead of time that you will need all smaller
  values (directly or indirectly), then it is slightly more efficient (and
  easier) to use dynamic programming and compute the values bottom-up. If
  you do not know what to expect, or if you know ahead of time that you
  will *not* need every smaller value, then memoization is a useful
  alternative.
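A memoized version of fib might look like the following minimal sketch. It is not from the notes: the names fib_memo and fib_memo_rec, the use of -1 as a "not computed yet" marker, and the `calls` counter are illustrative choices. It uses long long because int overflows after F_46, and it counts calls so the result can be compared with the 25 calls in the trace of fib(6) above.

```cpp
#include <cassert>
#include <vector>

// Recursive helper. memo[k] == -1 means F_k has not been computed yet.
// `calls` counts every invocation, including ones answered from the memo.
long long fib_memo_rec(int n, std::vector<long long>& memo, long long& calls) {
    ++calls;
    if (memo[n] != -1) return memo[n];          // already known: reuse it
    long long value;
    if (n < 2) {
        value = n;                              // F_0 = 0, F_1 = 1
    } else {
        long long a = fib_memo_rec(n - 1, memo, calls);
        long long b = fib_memo_rec(n - 2, memo, calls);
        value = a + b;
    }
    memo[n] = value;                            // remember it for later calls
    return value;
}

// Memoized Fibonacci: each F_k is computed at most once, so O(n) time.
long long fib_memo(int n, long long& calls) {
    assert(n >= 0);
    std::vector<long long> memo(n + 1, -1);
    calls = 0;
    return fib_memo_rec(n, memo, calls);
}
```

For n = 6 this makes 11 calls instead of 25, and in general 2n - 1 calls for n >= 1: each F_k is computed once, and only the computed values F_2, ..., F_n make two further calls. Here every smaller value ends up being needed, which is exactly the situation in which the bottom-up loop is the simpler choice.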