Tutorial 5 for CSC 310 Today's tutorial should be about adaptive modeling, Markov models, and the combination of the two (adaptive Markov models). You should mention to them (in case they missed the last lecture) that there is a demonstration package (in C) for adaptive modeling using arithmetic coding on the course web page, and in /u/radford/310/ac1 on CDF. This package will be used for the next assignment, so they may want to look at it now (it is documented). They can also use it to play with stuff related to this tutorial. The adaptive modeling method gone over in lecture is to estimate the probability of a symbol by the number of times it has occurred before plus one, divided by the total number of previous symbols plus I, where I is the size of the alphabet. This is the "Laplace" model. As a first exercise, consider a binary source, with sybols 0 and 1, and suppose we compress a file that consists of ALL 0s. For simplicity, suppose the encoder and decoder both know that the file contains n symbols, so that no EOF symbol is required. If we use the Laplace model, how many bits will such a file on n 0s compress to, if we do the encoding optimally? One approach: Figure out how many bits are used to encode each 0, then do the sum. The i'th 0 will be assigned probability i/(i+1), and hence will result in log_2 (1 / (i/(i+1))) = log_2 ((i+1)/i) bits being added (sooner or later) to the output file. So the total number of bits is sum_{i=1}^n log_2 ((i+1)/i) Note that (i+1)/i = 1 + 1/i). Note also that log_2 (1+1/i) = ln (1+1/i) / ln(2) and that once i is fairly big, 1+1/i is only slightly greater than 1, and therefore ln (1+1/i) is approximately equal to 1/i. So the number of bits needed is approximately (1/1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/n) / ln(2) except that the first few terms aren't quite right. Does the series above converge as n goes to infinity? No, it diverges. But it diverges rather slowly, with the sum of n terms being approximately ln(n) Is there an easier way to figure this out? Yes. We could look at the probability of the entire sequences of 0s. Multiplying out the probabilities for each symbol in turn, according to the Laplace model, we get (1/2) (2/3) (3/4) ... (n/(n+1)) = 1/(n+1) So with an optimal encoder, a file consisting of n 0s should compress to log_2 (n+1) bits. For instance, a file of 1000000 0s should compress to 20 bits, a compression factor of 50000! (In practice, if the file size isn't know in advance, a few more bits would be needed for the EOF symbol.) You could ask how this result is affected by the alphabet size. Suppose that the data compression program works on bytes (an alphabet of 256 symbols) rather than bits. Does that greatly change how much a file of 1000000 0s is compressed? After this, you could talk about Markov sources. In particular, consider a binary Markov source in which P(X_{i+1}=1 | X_i=1) = P(X_{i+1}=0 | X_i=0) = 1023/1024 Ie, there is only a 1/1024 chance that the next bit will be different from the previous bit. What would a typical sequence generated from this source look like? (Answer: it would have strings of hundreds to thousands of consecutive 0s or 1s, but with occasional switches at intervals of hundreds or thousands.) If we know the probabilities above, how many bits will an optimal encoder using this Markov model compress a typical sequence of length n to? What if we instead use a model in which the symbols are independent? Finally, suppose we don't know the probabilities for the model above, but we do know that we should use a first-order Markov model. If we estimate the probabilities by the Laplace scheme, how many bits will be required to encode a typical sequence of length 1000000?