Tutorial Notes: November 7th, 2006 ================================== Outline: 1) Longest Increasing Subsequence Problem 2) Sequence Alignment Problem (textbook section 6.6) Problem #1: The longest increasing subsequence problem ====================================================== [Warm-up: go through this problem quickly as the second problem is much longer!] Formally: Given a sequence a_1, a_2, \ldots, a_n, find the longest subset such that for every i < j, a_i < a_j. Example: Given sequence S = 1, 2, 1, 3, 4, 2, 3, 4, 5, 8, 1, 6, 7, 4 What is the longest increasing subsequence? S = 1, 2, 1, 3, 4, 2, 3, 4, 5, 8, 1, 6, 7, 4 ^ ^ ^ ^ ^ ^ ^ Step 1: Describe an array of values you want to compute. ------------------------------------------------------- Consider the 1-dimensional array A of size n (in the example n=14), where A[i] = the length of the longest increasing subsequence ending at position i The longest increasing subsequence has length max{A[i], 1 \leq i \leq n}. Step 2: Give a recurrence for computing later values from earlier values. ------------------------------------------------------------------------- Initialization: A[1] = 1 Recurrence: max{A[j]: a_j < a_i and 1 \leq j < i} + 1 (if such an A[j] exists) A[i] = 1 (if a_j \geq a_i for all 1 \leq j < i) Step 3: Give a high-level program. ---------------------------------- Instead of giving the algorithm, we will use the recurrence to solve the example. Recall S = 1, 2, 1, 3, 4, 2, 3, 4, 5, 8, 1, 6, 7, 4 A[1] = 1 A[2] = max{A[j]: a_j < a_2 and 1 \leq j < 2} + 1 = 2 A[3] = 1 since a_j \geq a_3 for all 1 \leq j < 3 A[4] = 3 A[5] = 4 A[6] = 2 A[7] = 3 A[8] = 4 A[9] = 5 A[10] = 6 A[11] = 1 A[12] = 6 A[13] = 7 A[14] = 4 Step 4: Show how to use values in the array to compute an optimal solution. ---------------------------------- The optimal length is max{A[i], 1 \leq i \leq n} = 7. To find the optimal solution, start at A[13] and work backwards: Find an array position with A[i] = A[13] -1, i< 13 and a_i < a_13 -> A[12] Find an array position with A[i] = A[12] -i, i< 12 and a_i < a_12 -> A[9] repeat: A[8], a{7], A[6], A[3]. Thus, an optimal solution is a_3, a_6, a_7, a_8, a_9, a_12, a_13. Problem #2: Sequence alignment problem (textbook section 6.6) ============================================================= When we search an online dictionary for the word "ocurrance", it may come back with "Perhaps you mean occurrence?" The dictionary program has deemed that these two words are "similar": o-currAnce occurrEnce [Note: a '-' represents a gap.] In this example, 8 letters are lined up correctly. Intuitively, we are "lining up" two strings, such that the maximum number of letters are lined up. Formally, suppose we are given two strings X and Y, where X = x_1x_2...x_m and Y = y_1y_2...y_n. The symbol indexes are represented by the sets {1, 2,..., m} and {1, 2,..., n}, respectively. We want to find a matching of these sets; in other words: - a set of ordered pairs M = {(i,j),(i',j'), (i'',j''), ... } - i, i', i'', ... are positions in X - j, j', j'', ... are positions in Y - each position in X occurs in at most 1 pair, and the same holds for each position in Y - if (i,j), (i',j') \in M and i