=========================================================================== CSC 310H Lecture 3 Summary Fall 2007 =========================================================================== ------ Topics ------ - Mathematical setup - Symbol codes - Uniquely decodable, instantaneously decodable - Kraft-McMillan inequality - reference: sections 5.1, 5.2. ------------------ Mathematical setup ------------------ The source produces symbols from an alphabet A_X = {a_1, .., a_I}. We denote the N-symbol string to be encoded by X = (X_1, .., X_N). We want to encode X into an M-symbol string Z = (Z_1, .., Z_M) over alphabet A_Z. In general, we consider A_Z = {0, 1}. Assumptions for now: - The channel between the encoder and the decoder is noiseless. - We want lossless decoding. - X is generated by a fixed, known, probabilistic model P(X) where each symbol a_i is produced with some probability p(a_i) independently of other symbols produced. ------------ Symbol Codes ------------ Idea: encode one symbol of X at a time, concatenate encodings with no punctuation. Example: - source alphabet: A_X = {C, G, T, A} - one possible code: C->0, G->10, T->110, A->1110. - X = CGGAT -> Z = 010101110110 Reference: read section 5.1. Important: - definition of a symbol code - definition of a codeword - good symbol code has the following properties: 1. unique decoding 2. easy to decode 3. good compression (we didn't worry about this in this lecture) - definition of uniquely decodable (UD) - definition of instantaneously decadable (ID) ------------------------- Kraft-McMillan Inequality ------------------------- Theorem [Kraft]: IF Sum_{i=1}^I {2^{-l_i}} <= 1, THEN there exists an ID code with codeword lengths l_i. Theorem [McMillan]: IF a code is UD, THEN Sum_{i=1}^I {2^{-l_i}} <= 1. Reference: read section 5.2