=========================================================================== CSC 310H Lecture 12 Summary Fall 2007 =========================================================================== ------ Topics ------ - PPM - Dictionary idea - LZ77 --- PPM --- - Use Markov source + Laplace model + Arithmetic coding, but vary the order of the source. - Let K be some maximum order. - When a symbol appears, keep track of it in all contexts of up to K symbols long. - To encode, if symbol never occured in current context of order K, send an "escape" symbol and switch to a lower order context. - Give these "escape" symbols a constant count of 1 in all contexts. - Result: one of the leading methods for text compression. --------------- Dictionary idea --------------- - PPM does so well because, over time, it "learns" the vocabulary of the words appearing in the input string. - Many operations per every symbol make it slow compared to other methods. - Lempel-Ziv is a family of compression schemes that use the past as a dictionary, thus avoiding the need to transmit the dictionary. ---- LZ77 ---- - Use a buffer of size W that contains S previously encoded symbols and W-S new symbols. - Find the longest possible match between the new W-S symbols and a string from the window of size S. Send a pointer to previous occurence, (a number in {0, .., S}), the lenght of the match, and the symbol immediately following the matched substring.