===========================================================================
CSC 310H                  Lecture 11 Summary                      Fall 2007
===========================================================================

------
Topics
------

- Assumptions about probabilities.
- Guess probabilities.
- Compute probabilities.
- Adaptively estimate probabilities using counts, Laplace model.
- Markov sources.
- Markov source + Laplace model + Arithmetic coding.
- Reference: sections 2.6, 6.2.

-------------------------------
Assumptions about probabilities
-------------------------------

- So far, we assumed that the source generates symbols:
  1. with known probabilities, and
  2. independently of one another.
- These assumptions are not practical.
- For coding to be efficient, we need a good probabilistic model of the
  source.
- For coding to be correct, the encoder and decoder need to use the same
  probabilities.

-------------------
Guess probabilities
-------------------

Ideas:
- What if we simply guess the symbol probabilities to be q_1, .., q_I, when
  in fact they are p_1, .., p_I?  Still assume independence for now.
- The entropy of the source is really sum_{j=1}^I p_j*log(1/p_j).
- If we were able to obtain the perfect symbol code for this source (the
  q_i's would have to be powers of 1/2), then a_i would get a codeword of
  length log(1/q_i).
- Thus, the average codeword length would be sum_{j=1}^I p_j*log(1/q_j).
- The difference between these two quantities, sum_{j=1}^I p_j*log(p_j/q_j),
  is called the relative entropy between distributions p and q.
- This quantity is 0 if and only if p=q, and positive otherwise.  Thus, when
  we guess the wrong distribution, the average encoding length exceeds the
  entropy by at least the relative entropy.

Reference: section 2.6.

---------------------
Compute probabilities
---------------------

Ideas:
- The encoder could go through the entire string, compute the probabilities
  with which each symbol occurs, and then encode using these.
- The decoder must be aware of these probabilities, so they must be
  communicated as well, probably as a header of the file containing the
  actual encoding.
- Theoretically, this code cannot be optimal because it is not complete.
  Say we have a bit string of length n with k 1's in it.  If the encoder
  sends k, the count of 1 bits in the string, and then encodes the string
  using symbol probabilities k/n for 1's and (n-k)/n for 0's, the code is
  not complete because, after seeing k, we will never receive the encoding
  of a bit string with k-1 1's.
- However, this scheme might work well, especially if the number of bits
  required to send the symbol counts is much less than the size of the
  message itself.

-------------------------------------------------------------
Adaptively estimate probabilities using counts, Laplace model
-------------------------------------------------------------

Ideas:
- A way to avoid sending over the probabilities is to estimate them using
  counts of occurrences up to the point where we are encoding.  The decoder
  will have access to this information, so it can use the same probabilities
  as the encoder.
- Given a sequence of coin tosses of length n, in which heads came up k
  times, we could encode the (n+1)-st toss using Pr[heads] = k/n and
  Pr[tails] = (n-k)/n.
- A slight problem occurs when some of the symbol counts are 0.  To avoid
  it, we increase all counts by 1.
- If, for all i, f_i is the count of occurrences of symbol a_i so far, we
  give symbol a_k probability

      p_k = (f_k + 1) / sum_{j=1}^I (f_j + 1).

  A small sketch of this estimator appears below.
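The following is a minimal Python sketch of this add-one (Laplace)
estimator, assuming a fixed alphabet known to both ends; the class and
method names are illustrative, not from the text.

    # Laplace (add-one) adaptive estimator -- illustrative sketch only.
    class LaplaceModel:
        def __init__(self, alphabet):
            self.alphabet = list(alphabet)
            self.counts = {a: 0 for a in self.alphabet}  # the f_i, all 0 at first

        def prob(self, symbol):
            # p_k = (f_k + 1) / sum_j (f_j + 1)
            total = sum(self.counts.values()) + len(self.alphabet)
            return (self.counts[symbol] + 1) / total

        def update(self, symbol):
            # Encoder and decoder both call this after each symbol,
            # so they always agree on the probabilities.
            self.counts[symbol] += 1

    if __name__ == "__main__":
        model = LaplaceModel("01")
        for s in "0010":
            print(s, model.prob(s))  # probability used to code this symbol
            model.update(s)

Note that no probabilities are ever transmitted: the decoder recomputes the
same counts from the symbols it has already decoded.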
- Arithmetic coding can easily be adapted to change probabilities in this
  way.
- It is not clear how to implement this using Huffman coding.
- If using an end-of-string marker #, we can give it a constant probability
  p_#, and perform the Laplace estimation on the remaining (1-p_#) of the
  probability space.
- One can check that the probability of a certain N-symbol string X is

      ( prod_{j=1}^I n_j! ) / ( (N+I-1)!/(I-1)! )

  where n_j is the number of occurrences of symbol a_j in X.

Reference: end of section 6.2.

--------------
Markov sources
--------------

Idea:
- So far, we assumed symbols are generated independently of one another.
  This might work for strings of coin tosses, but in general it is rarely
  the case.

Definition: A K-th order Markov source is one in which the probabilities
for generating the next symbol depend on the preceding K symbols.

- The probability that a specific N-symbol string X is generated by a
  Markov source of order 2 is

      Pr[X] = Pr[X_1 = a_{i_1}]
            * Pr[X_2 = a_{i_2} | X_1 = a_{i_1}]
            * Pr[X_3 = a_{i_3} | X_2 = a_{i_2} and X_1 = a_{i_1}]
            * ..
            * Pr[X_N = a_{i_N} | X_{N-1} = a_{i_{N-1}} and X_{N-2} = a_{i_{N-2}}]

- Using M(i,j,k) = Pr[X_l = a_k | X_{l-1} = a_j and X_{l-2} = a_i],

      Pr[X] = Pr[X_1 = a_{i_1}]
            * Pr[X_2 = a_{i_2} | X_1 = a_{i_1}]
            * M(i_1, i_2, i_3) * .. * M(i_{N-2}, i_{N-1}, i_N)

- We call M(i,j,k) the probability of symbol a_k in context (a_i a_j).
- We need to handle the beginning of the string differently.  For example,
  imagine the context is a special character meaning space-before-string.
- If these probabilities are not known, and in general they won't be, they
  can be estimated with the Laplace model as before.  Let F(i,j,k) be the
  count of occurrences of a_k in context (a_i a_j).  Then use

      M(i,j,k) = (F(i,j,k) + 1) / sum_{k'=1}^I (F(i,j,k') + 1)

-------------------------------------------------
Markov source + Laplace model + Arithmetic coding
-------------------------------------------------

Idea:
- A complete (and quite potent) coding solution: for some fixed K, use a
  source model of order K, estimate symbol probabilities using the Laplace
  model, and encode symbols using arithmetic coding.  A small end-to-end
  sketch follows this list.
- Performance depends on the choice of K.
  - large K:
    - more space needed for keeping track of the possible contexts
    - counts (and thus probabilities) are more accurate in the long run
  - small K:
    - less space
    - counts for the various contexts get populated faster, since the
      number of contexts is small
    - the model doesn't learn as much in the long run
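As an illustration of how these pieces fit together, here is a small Python
sketch of an order-K context model with Laplace-smoothed counts.  The names
are illustrative, and the arithmetic coder itself is omitted: the sketch
only prints, for each symbol, the probability interval that the coder would
narrow its range to.  The decoder maintains identical counts and therefore
derives the same intervals.

    # Order-K Markov model with Laplace-smoothed counts -- sketch only.
    from collections import defaultdict

    class MarkovLaplaceModel:
        def __init__(self, alphabet, K):
            self.alphabet = list(alphabet)
            self.K = K
            # F[context][symbol] = count of symbol seen in that context
            self.F = defaultdict(lambda: defaultdict(int))

        def interval(self, context, symbol):
            # Return symbol's slice (low, high) of [0,1) in this context,
            # using M = (F + 1) / sum_{k'} (F + 1).
            counts = self.F[context]
            total = sum(counts.values()) + len(self.alphabet)
            low = 0.0
            for a in self.alphabet:
                p = (counts[a] + 1) / total
                if a == symbol:
                    return low, low + p
                low += p
            raise ValueError("symbol not in alphabet")

        def update(self, context, symbol):
            self.F[context][symbol] += 1

    if __name__ == "__main__":
        model = MarkovLaplaceModel("ab", K=2)
        context = "  "                    # special "start of string" context
        for s in "ababab":
            print(repr(context), s, model.interval(context, s))
            model.update(context, s)
            context = (context + s)[-model.K:]  # keep only the last K symbols

An arithmetic coder would consume these (low, high) intervals directly; only
the probability model feeding it changes with the choice of K.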