Current genetics and biochemistry have given prominence to the problem of quickly identifying patterns in sequences of amino acids and other biological objects. Clearly you'd want algorithms for rapidly matching strings and substrings, but other subtler patterns are important. Consider the problem of sequences, subsequences, and supersequences (this example if from Algorithms by Cormen, Leiserson, and Rivest).
We can consider strings to be sequences, for example "ACCTGAA" can be thought of as a sequence of characters with the first character 'A', the second character 'C', the fifth character 'G', and so on (I'm using a 1-origin instead of 0-origin for a reason that will become clear in a moment).
Suppose string X = "ACCTGAA" and string Y = "ATA". Then we say that Y is a subsequence of X (or X is a supersequence of Y), since we can produce Y by deleting characters 2,3, 5, and 7 from X (there's another deletion that also works). Clearly, by this rule, the empty sequence "" is a subsequence of any other sequence.
Given two sequences, = , and = , what is a longest common subsequence of and ? Brute force would mean we would list every subsequence of -- each element is either in or out of each subsequence, leading to possibilities -- and then check whether each is a subsequence of . Exponential algorithms get old really fast.
However, the problem has an optimal-substructure property. Define , the th prefix of , to be the sequence (the special case is the empty sequence -- that's the reason for the 1-origin). Now we can make the following observation about any Longest Common Subsequence (LCS) = of = and = . If , then and is an LCS of and . If and , then is an LCS of and . Symmetrically, if and , then is an LCS of and .
Define a two-dimensional array == the length of an LCS of and . You have the following recursive formula:
You can also pose (and solve) the problem of a shortest common supersequence of and , in an analogous way.
A weirder (to me) result comes from asking: in how many distinct ways is a subsequence of . In other words, how many distinct deletions of letters from yield . In our previous example, X = "ACCTGAA", and Y = "ATA", we'd find that occurs as a subsequence of twice. The empty sequence occurs exactly once as a subsequence of any other sequence.
Counting these subsequence is clearly not possible. Suppose is a sequence of 64 'A's and is a sequence of 32 'A's. Then occurs as a subsequence of in as many ways as you can choose 32 'A's to delete: 64-choose-32 -- greater than .
To make this concrete, suppose you apply to legally change your name to a sequence of characters taken from the set . You quietly obtain DNA samples from friends and acquaintances. Now you count the number of times your new name (let's call it TACCAG) occurs in other people's DNA, and you bill them a dollar for each use of your intellectual property. You might be fabulously rich, but you're faced with the prospect of an exponential algorithm to count the number of licenses to charge for.
But, optimal substructure comes to the rescue. If we denote your new name as , your friend's DNA sequence as , and as the number of times occurs as a subsequence of , we get the following cases