===========================================================================
CSC 310H                     Lecture 6 Summary                   Fall 2007
===========================================================================

------
Topics
------

- Optimal codes
- Operations that can only improve a code
- Huffman codes
- Optimality of Huffman codes
- Reference: section 5.5.

-------------
Optimal codes
-------------

Definition: A symbol code is "optimal" if its average codeword length L
is no larger than that of any other symbol code for the same source.

IMPORTANT: The optimal symbol code for a given source NEED NOT have L=H!
It might very well happen that L=H is unachievable. For example, for any
2-symbol source, no matter what the probabilities are, the optimal symbol
code for that source is always {0, 1}, which has L=1. However, the
entropy of that source depends on the symbol probabilities, and it can be
very close to 0 (e.g., with probabilities {0.99, 0.01}, H is about 0.08
bits). That a symbol code is optimal only means that no other symbol code
achieves a smaller L.

Definition: A symbol code is "complete" if it satisfies the
Kraft-McMillan (KM) inequality with equality, i.e.,

    Sum_{i=1}^I 2^{-l_i} = 1.

For example, {0, 10, 11} is complete (1/2 + 1/4 + 1/4 = 1), while
{0, 10, 110} is not (its Kraft sum is 7/8).

Theorem: An ID symbol code is complete iff, in the tree corresponding to
that code, every leaf at the last level either is a codeword or has a
codeword among its prefixes.

Idea:
- An optimal code must be complete (except for the degenerate case where
  some symbols have probability 0): if the Kraft sum were < 1, some
  codeword could be shortened while keeping the code ID.

---------------------------------------
Operations that can only improve a code
---------------------------------------

1. If xby is a codeword for various y and some bit b, and xb'y' is not a
   codeword for b' != b and any y', we can shorten every such xby into
   xy.

2. If p_i < p_j and l_i < l_j, swap the codewords for a_i and a_j (this
   gives the more probable symbol the shorter codeword, so L cannot
   increase).

3. If (2) above can no longer be applied, we can move the codeword for
   the second least probable symbol to be the sibling of the codeword
   for the least probable symbol (without increasing L).

Idea:
- Let C be an optimal ID code, so that (1) and (2) can no longer be
  applied. Then there exists another optimal code C' in which the
  codewords for the two least probable symbols are siblings.

-------------
Huffman codes
-------------

Ideas:
- Take the 2 least probable symbols, a_{I-1} and a_I.
- Combine them into a meta-symbol a' whose probability is the sum of the
  two symbols you start with, p' = p_{I-1} + p_I.
- Run the Huffman coding algorithm recursively on the reduced set of
  symbols {a_1, .., a_{I-2}, a'}.
- Take the codeword z generated for a' by this recursive call and make
  z0 and z1 the codewords for a_{I-1} and a_I.

Reference: read section 5.5.
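To make the recursion concrete, here is a minimal Python sketch of the
procedure just described (the function name, the dict-based interface,
and the example probabilities are illustrative choices, not from the
text; ties between equal probabilities are broken arbitrarily):

    def huffman(probs):
        """Recursive Huffman coding, as described above.
        probs: dict mapping symbol -> probability (needs >= 2 symbols).
        Returns a dict mapping symbol -> binary codeword (a string)."""
        if len(probs) == 2:
            a, b = probs                # basis case: the code is {0, 1}
            return {a: "0", b: "1"}
        # Take the 2 least probable symbols a_{I-1}, a_I ...
        ordered = sorted(probs, key=probs.get)
        a_I, a_Im1 = ordered[0], ordered[1]
        # ... and combine them into a meta-symbol a' with
        # p' = p_{I-1} + p_I.
        reduced = {s: p for s, p in probs.items()
                   if s not in (a_I, a_Im1)}
        meta = (a_Im1, a_I)             # a fresh symbol standing for a'
        reduced[meta] = probs[a_Im1] + probs[a_I]
        # Recurse on the reduced source, then split the codeword z
        # generated for a' into z0 and z1.
        code = huffman(reduced)
        z = code.pop(meta)
        code[a_Im1] = z + "0"
        code[a_I] = z + "1"
        return code

    source = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}
    print(huffman(source))
    # {'a': '0', 'b': '11', 'c': '101', 'd': '1000', 'e': '1001'}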
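Using this sketch, the earlier claims can be checked numerically: for a
2-symbol source L = 1 even when H is close to 0, and the resulting code
is complete (KM sum exactly 1). This continues the script above
(entropy, avg_length, and kraft_sum are my own helper names):

    import math

    def entropy(probs):
        """Shannon entropy H, in bits per symbol."""
        return -sum(p * math.log2(p) for p in probs.values() if p > 0)

    def avg_length(probs, code):
        """Average codeword length L, in bits per symbol."""
        return sum(probs[s] * len(code[s]) for s in probs)

    def kraft_sum(code):
        """Sum_i 2^{-l_i}; it equals 1 iff the code is complete."""
        return sum(2.0 ** -len(w) for w in code.values())

    # A very skewed 2-symbol source: L = 1 although H is near 0.
    skewed = {"a": 0.99, "b": 0.01}
    print(avg_length(skewed, huffman(skewed)))   # 1.0
    print(entropy(skewed))                       # ~0.0808

    # The Huffman code for the 5-symbol source above is complete.
    print(kraft_sum(huffman(source)))            # 1.0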
---------------------------
Optimality of Huffman codes
---------------------------

Ideas:
- Induction on I: "for *any* source probabilities, the Huffman procedure
  applied to a source with I symbols produces an optimal code."
- Basis case: I=2, where the optimal code is always {0, 1}.
- Inductive case: assume the statement is true for I-1, and consider the
  procedure being applied to a source with symbols {a_1, .., a_I} and
  probabilities p_1 >= .. >= p_I.
- Inside the Huffman procedure, a code is generated for the source
  {a_1, .., a_{I-2}, a'} with probabilities {p_1, .., p_{I-2}, p'},
  where p' = p_{I-1} + p_I.
- Let C_H be the code output for the I-symbol source by the Huffman
  procedure.
- Let C'_H be the code output for the (I-1)-symbol source by the
  recursive call inside the Huffman procedure.
- Assume C_H is not optimal for the I-symbol source. We will derive a
  contradiction.
- Let C_O be an optimal code for the I-symbol source.
- By the fact above, we can assume that in C_O the codewords for a_{I-1}
  and a_I are siblings; let z be their parent.
- Let C'_O be the code for the (I-1)-symbol source in which z is the
  codeword for a'.
- Let L_H, L'_H, L_O, L'_O be the average codeword lengths of C_H, C'_H,
  C_O, C'_O, respectively. Then:
  - L_H = L'_H + p_{I-1} + p_I (the two codes agree, except that the
    codeword for a', of probability p' = p_{I-1} + p_I, is replaced by
    two codewords that are one bit longer).
  - L_O = L'_O + p_{I-1} + p_I (by the same reasoning).
  - L_O < L_H (by the assumption that C_H is not optimal).
  - L'_H <= L'_O (by the induction hypothesis, C'_H is optimal).
  - These four relations yield a contradiction:
    L_H = L'_H + p' <= L'_O + p' = L_O < L_H.

Reference: solution to solved exercise 5.16 on pp. 105-106.
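As a numeric sanity check of the two length relations used above, the
sketches from the Huffman section can be run on a made-up 4-symbol
source and its reduced version (with the tie-breaking in huffman(), the
standalone call on the reduced source builds the same tree as the
recursive call, so the recurrence is visible directly):

    # Check L_H = L'_H + p_{I-1} + p_I on a made-up source: the two
    # least probable symbols are c and d.
    probs = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.10}
    L_H = avg_length(probs, huffman(probs))

    # Reduced source: c and d merged into a meta-symbol a', p' = 0.30.
    reduced = {"a": 0.40, "b": 0.30, "a'": 0.30}
    L_H_reduced = avg_length(reduced, huffman(reduced))

    print(L_H, L_H_reduced + probs["c"] + probs["d"])   # both ~1.9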