|
|
 |
Extra exercises
Information theory
- Solve the weighing problem for N coins of
which one is odd, and known to be lighter.
You may find it interesting to minimize the
average number of weighings required,
(rather than the maximum number) for the
cases N=6 and N=7.
[D. G. Mead, The average number of weighings to locate
a counterfeit coin,
IEEE Trans IT Sept 1979, vol 25, no 5, 616-617]
Mutual information.
Consider a channel with input $x$ and output $y$.
Let |Y| = some integer greater than 5.
Let |X| = |Y| choose 2.
Let the marginal distribution of X be uniform.
Let the transition matrix from X to Y be a weight-two matrix like this:
(with every 1 entry replaced by 0.5)
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0
0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
[rows are labelled by y, cols by x.]
Now, assume that one input $x$ is selected randomly and
transmitted { twice\/} over the channel.
Let $y_1$ and $y_2$ be the two outputs.
Find the mutual informations $I(X;Y_1)$
and $I(Y_1;Y_2)$.
Solution:
I(Y1:Y2) = log|Y| - ( 1 + 0.5 log(|Y|-1) )
I(X:Y1) - H(Y1|X) = log|Y| - 2
-
A correlation puzzle [Hard]
-
Investigate the two codes used on CDs, and quantify how far each one is
from the capacity of an appropriate channel model.
[CDs use an error-correcting code and a run-length limiting
code. The ECC is composed of three interleaved reed-solomon codes;
typical parameters in bytes are (N,K)=(28,24) and (32,28).
The inner run-length-limiting code is known as "EFM", which stands for
eight-to-fourteen modulation; this code converts every eight bits into
a string of seventeen bits (yes, seventeen, not fourteen!)
in such a way that the smallest run length is 3 bits, the average run length
is 7 bits, and the maximum runlength is 11 bits.]
- Entropy inequalities
The order-$r$ {R\'enyi} entropy is
H^{(r)}() = FRACTION {1}{1-r} log ( p_i^r ) , {\ for $r 1$}.
Using Jensen's inequality, prove that, for any $r$ this function of $$ is a bound on the
Shannon entropy of $$.
[Hint: I found the case $r=2$ easiest to start with.]
- Compression
One simple way to encode a binary file of length N=10,000 in which a
fraction f=0.01 of the bits are 1s is to send the 100 or so
positions of the 1s using 14-bit integers.
This code has redundancy because there are many ways (roughly 100! = 100x99x98x...x1)
of encoding one file, since the positions can be conveyed in
any order.
In the bits back coding method, this redundancy is
exploited by piggy-backing other data onto the transmission.
In this case, another source of bits can be
used to choose from the 100! possible orders of the bits.
The receiver can then recover the extra bits once the
message has been received.
Implement this bits-back method.
Can you find a way to implement bits-back coding
such that the `extra bits' are actually
the original coded positions? (A bit like a snake swallowing itself!)
-
Coding for a redundant language.
[ABC] -> 2
[DEF] -> 3
[GHI] -> 4
[JKL] -> 5
[MNO] -> 6
[PQRS] -> 7
[TUV] -> 8
[WXYZ] -> 9
| | The standard mobile phone encoder
|
When sending text on mobile phones, some people use the iTap/T9/predictive text
system, in which, for example, HELLO is written as 43556,
and, since no other word in the dictionary is encoded as 43556,
the phone deduces that you wanted HELLO.
However, some words collide under this iTap encoding, for example
KISS and LIPS are both 5477.
Investigate how much better the iTap system would work if
a different encoding of the alphabet A-Z onto the numbers 2..9
were used.
-
Capacity of a simple deletion channel.
A channel has 8 input bits and the output is always exactly 7 bits.
One of the 8 bits is deleted, at random. The other 7 are
transmitted without error. (Note the difference from
an erasure channel, which replaces the erased bits by "?".)
Estimate the capacity of this channel.
Is the optimal input distribution uniform?
Find the highest rate code that you can, that communicates reliably over this channel.
Inference
- Message passing
Imagine you want to help out a dictation school, where students are
given dictation, then the teacher has to count how many errors each
student made in what they wrote.
Write an inference algorithm that aligns
one piece of text with another, assuming that
the second one differs from the first by random
insertions, deletions and substitutions.
Assume a small probability of insertion per character, p_i,
a similar probability of deletion, p_d, and a substitution probability p_s.
Find both the most probable single alignment (using the min-sum algorithm, aka Viterbi),
and the probability distribution of the alignment at any
particular character in the noisy sequence (using the sum-product algorithm).
Can you make a program that, as the second user writes his text,
computes on the fly where the second user has probably got up to?
Remember, the aim is for efficient computation, so try to make a method whose cost is
at most proportional to N per iteration, where N is the length of the first piece of text.
- Message passing II
Imagine that N coins, with biases pn, are tossed, once each;
they are independent but have different biases. The outcome is a binary vector x.
The outcomes (1/0) are summed, and the sum is c.
What is the probability, given this sum, that coin n had outcome 1?
Use a message-passing method to answer this question, and to generate
efficiently a sample from the posterior distribution of x.
[Hint: one option is to make a trellis in which the vertical coordinate
is the partial sum after the nth outcome is added to the total;
the horizontal coordinate runs over the n coins. Run a forward pass
to find the probability of each outcome conceivable outcome c;
run a backward pass from the particular value of c
that occurred; then (just like the path-counting example)
use the product of the forward and backward messages
to pick a path through the trellis from the posterior distribution.]
- Metropolis method
Assume the data set {x} = { 1.1, 0.5, 1.8, 2.1, 0.3 }
come independently from an exponential distribution
P(x|lamba) = exp(-x/lamba)/Z(lambda).
Use a Metropolis method to draw samples from the posterior
distribution of lambda.
(Pick an appropriate prior on lambda.)
- Reversibility
Show, by constructing a simple example, that
Gibbs sampling (in which the K variables are updated
in a fixed sequence)
is not reversible,
even though it is composed of a sequence of
K transition operators that are reversible.
- Change point inference.
Make some data that are a noisy version of
a simple function with a discontinuity.
eg, y = (t < t0) ? y0 : y1
Add Gaussian noise to y to get N data points {d}.
Now imagine that you do not know the time t0 of the
discontinuity. Infer it.
-
Hidden Markov Model.
Implement an algorithm that, given parameters pi, A, B of a hidden Markov model,
and given a data sequence x, will
-
find the posterior probability of each state at each time [using sum-product algorithm]
-
find the most probable state-sequence [max-product algorithm, aka min-sum, aka Viterbi]
-
draw a sample from the posterior distribution
of the hidden sequence [using information from the sum-product algorithm].
Construct a simple example that illustrates why the sequence found by the
min-sum (Viterbi) algorithm may be a poor way of representing the posterior distribution.
-
Life in high dimensions.
Consider a hypersphere of radius 1 and a hypercube of radius r (i.e.,
width of cube is 2r, and diameter of sphere is 2), both in K dimensions.
Set K to 256.
Use simple Monte Carlo methods to find
- The fraction of the hypersphere that is enclosed inside the hypercube as
a function of r, for r from 0 to 1. How small can r go, and still give 95%
of the hypersphere inside the cube?
-
The fraction of the hypercube that is enclosed in the hypersphere as
a function of r, for r from 0 to 1. When the cube contains 95% of the hypersphere,
what fraction of the cube is inside the hypersphere?
(Exercise by John Platt; for further reading see his 2004 paper on bit-indexing.)
Decision theory
- In financial mathematics, it is often assumed that
a stock price will vary in accordance with
a random walk with a drift parameter mu
and a volatility parameter sigma.
d log S = mu dt + sigma dz
where dz is a Wiener process with mean zero and variance dt.
An option is a financial instrument giving the owner
the right (but not the obligation) to buy or sell something.
For example, a gambler interested in owning
a stock in case its value goes up a lot might buy an option to
buy the stock at some future time $T$ for a fixed price $K$
(known as the strike price).
If the price indeed is above $K$ at the expiration time,
the gambler can
exercise the option and turn an instant profit of
$S_T - K$ by immediately selling the stock.
The gambler has to pay an appropriate price for this option.
What should the price of an option be?
A `European' option can be exercised only at its expiration time $T$;
an `American' option may also be exercised at any time
prior to $T$. Chop time up into $N$ steps of duration
$T/N$ and use a discrete-time message-passing algorithm
to work out the correct price for an option of each type,
as a function of the strike price $K$ and the other parameters.
Assume the gamblers who price options wish to maximize their
expected return (not the log of their return) and for simplicity
assume that the interest rate for cash is zero.
(a) Model the variation in the log of the stock
price by a random walk with
discrete steps up or down by $log f$, with
probabilities $p_{+}$ and $p_{-}$ respectively.
[For thoroughness, if you want, show
that $log f$ and $p_{+} 1/2 + epsilon$ are given in terms
of the other parameters and $alpha = mu*mu*T/(N*sigma^2)$
by
epsilon = 0.5 / sqrt( 1.0 + 1.0/alpha )
log f = sigma^2/mu * alpha / (2.0 * epsilon) ]
Work out the expected value of the European option
by propagating backward from the leaves of the
tree of possible prices.
(b) For the American option, your message-passing algorithm
should keep track, for every node in the tree, of
1) the payoff if the option is exercised at that time;
2) the expected payoff if the option is not exercised
at that time, and at subsequent times the optimal decisions
are made.
3) the value of the option (which is the max of those two quantities).
-
Using information content to get insight into algorithm complexity.
Consider an algorithm that sorts a bunch of objects by calling
a subroutine that compares two of them for you, returning
either "+" or "-". For simplicity, assume there are no ties..
How many times must an algorithm call the binary comparison
operation to be able to guarantee that it will correctly
sort the list?
[Hint: Think of the initial state as being a permutation,
which we are conveying to someone. How much information is
there in a permutation?]
Neural networks
- Another derivation of the Hopfield network
Think of associative memory as an inference problem.
Imagine that you know that a set of N memories
{x} (with elements {+/-1}
have given rise to `data' in the form of the Hebbian weight matrix
W = sum x x^T.
Imagine furthermore that a noisy version y of one of the memories x0
is provided.
Your task now is to infer x0
given the available information, W, y, and N.
Some approximations are required in order to get a simple
answer out.
Write down the posterior of x0.
The contribution from y to the likelihood of x0 is a simple bias.
The central quantity to think about is the other likelihood factor,
P(W|x0).
Approximate this quantity by assuming that, conditional on x0,
W_ij is Gaussian-distributed with mean
mu=x0_i x0_j
and variance (N-1)
(treating the contributions from the (N-1) other memories
as independent).
Approximate the likelihood factor P(W|x0) by
prod_{ij} P(W_{ij}|x0).
[An approximation since in fact the other patterns {x} introduce
weak dependencies among the W_{ij}.]
Now write down dynamics for each variable x^0_i that greedily maximize
the posterior probability P(x0|W,y).
What have you got?
Thanks to Máté Lengyel for suggesting this exercise.
|
Site last modified Thu Sep 30 20:34:34 BST 2004
|
|