Tutorial 5 for CSC 310

Today's tutorial should be about adaptive modeling, Markov models, and
the combination of the two (adaptive Markov models).

You should mention to them (in case they missed the last lecture) that
there is a demonstration package (in C) for adaptive modeling using
arithmetic coding on the course web page, and in /u/radford/310/ac1 on
CDF.  This package will be used for the next assignment, so they may
want to look at it now (it is documented).  They can also use it to
play with stuff related to this tutorial.


The adaptive modeling method gone over in lecture is to estimate the
probability of a symbol by the number of times it has occurred before
plus one, divided by the total number of previous symbols plus I,
where I is the size of the alphabet.  This is the "Laplace" model.

As a first exercise, consider a binary source, with sybols 0 and 1,
and suppose we compress a file that consists of ALL 0s.  For
simplicity, suppose the encoder and decoder both know that the file
contains n symbols, so that no EOF symbol is required.

If we use the Laplace model, how many bits will such a file on n 0s
compress to, if we do the encoding optimally?

One approach: Figure out how many bits are used to encode each 0, then
do the sum.  The i'th 0 will be assigned probability i/(i+1), and
hence will result in log_2 (1 / (i/(i+1))) = log_2 ((i+1)/i) bits being 
added (sooner or later) to the output file.  So the total number of 
bits is

    sum_{i=1}^n  log_2 ((i+1)/i)

Note that (i+1)/i = 1 + 1/i).  Note also that 

    log_2 (1+1/i) = ln (1+1/i) / ln(2)

and that once i is fairly big, 1+1/i is only slightly greater than 1,
and therefore ln (1+1/i) is approximately equal to 1/i.  So the number
of bits needed is approximately

    (1/1 + 1/2 + 1/3 + 1/4 + 1/5 + ... + 1/n) / ln(2)

except that the first few terms aren't quite right.  Does the series
above converge as n goes to infinity?  No, it diverges.  But it diverges
rather slowly, with the sum of n terms being approximately ln(n)

Is there an easier way to figure this out?  Yes.  We could look at the
probability of the entire sequences of 0s.  Multiplying out the
probabilities for each symbol in turn, according to the Laplace model,
we get

   (1/2) (2/3) (3/4) ... (n/(n+1)) = 1/(n+1)

So with an optimal encoder, a file consisting of n 0s should compress
to log_2 (n+1) bits.  For instance, a file of 1000000 0s should
compress to 20 bits, a compression factor of 50000!  (In practice, if 
the file size isn't know in advance, a few more bits would be needed for 
the EOF symbol.)

You could ask how this result is affected by the alphabet size.
Suppose that the data compression program works on bytes (an alphabet
of 256 symbols) rather than bits.  Does that greatly change how much a
file of 1000000 0s is compressed?


After this, you could talk about Markov sources.  In particular, 
consider a binary Markov source in which 

   P(X_{i+1}=1 | X_i=1) = P(X_{i+1}=0 | X_i=0) = 1023/1024

Ie, there is only a 1/1024 chance that the next bit will be different
from the previous bit.  What would a typical sequence generated from
this source look like?  (Answer:  it would have strings of hundreds to 
thousands of consecutive 0s or 1s, but with occasional switches at
intervals of hundreds or thousands.)

If we know the probabilities above, how many bits will an optimal
encoder using this Markov model compress a typical sequence of length
n to?  What if we instead use a model in which the symbols are
independent?


Finally, suppose we don't know the probabilities for the model above,
but we do know that we should use a first-order Markov model.  If we
estimate the probabilities by the Laplace scheme, how many bits will
be required to encode a typical sequence of length 1000000?