===========================================================================
CSC 310H                     Lecture 7 Summary                    Fall 2007
===========================================================================

------
Topics
------

 -  Extensions of the source
 -  Lossy coding
 -  Smallest sufficient subset and essential bit content
 -  Statement of Shannon's Noiseless Coding Theorem
 -  Reference: sections 5.6, 4.2, 4.3, 4.6

------------------------
Extensions of the source
------------------------

Reference:

 -  Formally defined in section 4.3, "Extended ensembles".

 -  Huffman coding on extended sources explained in 5.6, "The extra bit".

Ideas:

 -  When encoding the N-th extension, the overhead of at most one extra
    bit per block becomes a per-symbol overhead of at most 1/N.

 -  As N increases, the number of bits per symbol needed to encode
    approaches the entropy H (see the sketch below).
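
 -  To make the last point concrete, here is a small sketch (my own
    Python, not from the text): it Huffman codes the N-th extension of an
    assumed Bernoulli(0.9) source and prints the expected codeword length
    per original source symbol, which should creep down towards H as N
    grows. It uses the fact that the expected Huffman codeword length
    equals the sum of the probabilities of the merged nodes.

        import heapq
        from itertools import product
        from math import log2

        def huffman_expected_length(probs):
            # Each merge of two subtrees adds one bit to every codeword
            # below them, so summing the merged probabilities gives the
            # expected codeword length of the Huffman code.
            heap = list(probs)
            heapq.heapify(heap)
            total = 0.0
            while len(heap) > 1:
                a = heapq.heappop(heap)
                b = heapq.heappop(heap)
                total += a + b
                heapq.heappush(heap, a + b)
            return total

        p = 0.9                               # assumed example source
        H = -(p * log2(p) + (1 - p) * log2(1 - p))
        for N in range(1, 6):
            block_probs = []
            for block in product((p, 1 - p), repeat=N):
                q = 1.0
                for factor in block:
                    q *= factor
                block_probs.append(q)
            bits = huffman_expected_length(block_probs) / N
            print(f"N={N}: {bits:.4f} bits/symbol (H = {H:.4f})")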

------------
Lossy coding
------------

Reference: sections 4.2, 4.3.

Ideas:

 -  For lossless coding, the number of bits per symbol is essentially H.
    Can we do better if we are prepared to allow for errors, i.e., lossy
    coding?

 -  For all lossy coding discussion below, we will only consider codes
    that assign codewords of the same length. Even though we might do a
    bit better by using other schemes (such as Huffman coding), it turns
    out that the improvement will be minimal.

 -  So, to encode a set of r symbols, we will use a code in which all
    codewords have length ceil(log(r)) bits, with logs taken base 2; a
    tiny example follows.
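
 -  As a quick illustration (my own sketch, not part of the notes), an
    equal-length code simply numbers the r symbols and writes each index
    using that many bits:

        from math import ceil, log2

        def equal_length_code(symbols):
            # Give each of the r symbols a fixed-width binary codeword
            # of length ceil(log2(r)) bits (1 bit if there is only one
            # symbol).
            r = len(symbols)
            width = ceil(log2(r)) if r > 1 else 1
            return {s: format(i, f"0{width}b")
                    for i, s in enumerate(symbols)}

        print(equal_length_code(["a", "b", "c", "d", "e"]))
        # 5 symbols -> ceil(log2(5)) = 3 bits per codeword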

----------------------------------------------------
Smallest sufficient subset and essential bit content
----------------------------------------------------

Reference: sections 4.2, 4.3.

Ideas:

 -  H_0(X) = log(|A_X|) is the "raw bit content" of X. To encode X with
    an equal-length code, we need precisely ceil(H_0(X)) bits per symbol.

 -  For 0 <= delta < 1, S_delta(X) is the smallest subset of A_X such that
    Pr[ x not in S_delta(X) ] <= delta.

 -  To achieve error probability <= delta, we simply ignore symbols
    outside S_delta and use an equal-length code on S_delta.

 -  H_delta(X) = log(|S_delta|) is the "essential bit content" of X. The
    number of bits per symbol for an equal-length code on S_delta is
    exactly ceil(H_delta(X)).
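
 -  The following sketch (my own code, with a made-up example
    distribution) computes |S_delta| and H_delta directly from the
    definitions above: take outcomes in order of decreasing probability
    until the probability left outside is at most delta.

        from math import log2

        def essential_bit_content(probs, delta):
            # Greedily build S_delta: keep the most probable outcomes
            # until Pr[x not in S_delta] <= delta, then report
            # |S_delta| and H_delta = log2(|S_delta|).
            order = sorted(probs, reverse=True)
            kept, mass = 0, 0.0
            while 1.0 - mass > delta:
                mass += order[kept]
                kept += 1
            return kept, log2(kept)

        probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
        # delta = 0 recovers the raw bit content H_0 = log2(6);
        # a small delta > 0 shrinks S_delta and hence H_delta.
        print(essential_bit_content(probs, 0.0))
        print(essential_bit_content(probs, 0.1))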

-----------------------------------------------
Statement of Shannon's Noiseless Coding Theorem
-----------------------------------------------

Reference: sections 4.3, 4.6.

 -  As with lossless coding, suppose we perform delta-error lossy coding on
    X^N, the N-th extension of the source X.

 -  The number of bits per symbol we achieve is exactly
    (1/N)*ceil(H_delta(X^N)). We drop the ceiling, since it adds at most
    1/N bits per symbol, which becomes insignificant as N increases.

 -  How good is this compared to lossless coding, where we achieve
    essentially H bits per symbol?

Theorem [Shannon]:

    For all epsilon > 0, for all 0 < delta < 1, there exists N_0 such that
    for all N > N_0,

        H - epsilon < (1/N) * H_delta(X^N) < H + epsilon

Reference: The proof is in sections 4.4 and 4.5, and it involves using the
           Law of Large Numbers.

Notes:

 -  I will not ask you for details about the proof, but I expect you to
    completely understand the statement of the Theorem. This includes the
    definition of H_delta and the roles of the parameters epsilon, delta,
    and N.

 -  The second < says:

    Even if delta is very small, i.e., we allow only a tiny error
    probability, then as N grows, the per-symbol length of this
    equal-length code gets within epsilon of H:

        (1/N) * H_delta(X^N) < H + epsilon

    Since no lossless code can use fewer than H bits per symbol on
    average, this supports the claim at the beginning of this lecture
    that using codes other than equal-length codes does not improve
    things by much. The best we can hope for is H anyway.

 -  The first < says:

    Even if delta is close to 1, i.e., we allow a huge error probability,
    then as N grows, the per-symbol length we need is still

        (1/N) * H_delta(X^N) > H - epsilon

    so even tolerating a huge error probability does not let us compress
    below (essentially) H bits per symbol.
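
 -  To close, a numerical sketch (my own Python, with an assumed
    Bernoulli(0.1) source) of both inequalities: it computes
    (1/N) * H_delta(X^N) exactly by grouping the sequences of X^N by
    their number of ones (all sequences in a group are equally
    probable), and the value approaches H as N grows, whether delta is
    tiny or large.

        from math import comb, log2, ceil

        def per_symbol_H_delta(p, N, delta):
            # (1/N) * H_delta(X^N) for a Bernoulli(p) source.  Sequences
            # with k ones all have probability p^k (1-p)^(N-k), so
            # S_delta can be built group by group, most probable first,
            # taking only the needed part of the last group.
            groups = sorted(range(N + 1),
                            key=lambda k: p**k * (1 - p)**(N - k),
                            reverse=True)
            kept, mass = 0, 0.0
            for k in groups:
                q = p**k * (1 - p)**(N - k)   # prob. of one sequence
                n = comb(N, k)                # number of such sequences
                if 1.0 - (mass + n * q) > delta:
                    kept += n                 # whole group is needed
                    mass += n * q
                else:                         # group completes S_delta
                    kept += ceil((1.0 - delta - mass) / q)
                    break
            return log2(kept) / N

        p = 0.1
        H = -(p * log2(p) + (1 - p) * log2(1 - p))
        print(f"H = {H:.4f}")
        for N in (10, 100, 1000):
            tiny = per_symbol_H_delta(p, N, 0.01)  # tiny allowed error
            huge = per_symbol_H_delta(p, N, 0.90)  # huge allowed error
            print(f"N={N}: delta=0.01 -> {tiny:.4f}, "
                  f"delta=0.90 -> {huge:.4f}")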