lec1a


Stochastic MDL using the wrong distribution over codes

• If we want to communicate the code for a datavector, the most efficient method requires us to pick a code randomly from the posterior distribution over codes.
  – This is easy if there is only a small number of possible codes. It is also easy if the posterior distribution has a nice form (like a Gaussian or a factored distribution).
  – But what should we do if the posterior is intractable?
    • An intractable posterior is typical for non-linear distributed representations.

• We do not have to use the most efficient coding scheme!
  – If we use a suboptimal scheme we will get a bigger description length.
    • The bigger description length is an upper bound on the minimal description length.
    • Minimizing this bound is a sensible thing to do.
  – So replace the true posterior distribution by a simpler distribution.
    • This is typically a factored distribution.
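
The bound can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the slides. It assumes each code i has an "energy" E_i, meaning the cost in nats of sending that code plus the data given that code, and that the expected description length when codes are picked from a distribution Q is the expected energy under Q minus the entropy of Q. The four energies, the function names and the choice of a factored Q built from the posterior marginals are all hypothetical.

```python
import math

def true_posterior(energies):
    """Posterior over codes, assumed proportional to exp(-energy)."""
    lowest = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def description_length(energies, q):
    """Expected energy under q minus the entropy of q, in nats."""
    expected_energy = sum(qi * e for qi, e in zip(q, energies))
    entropy = -sum(qi * math.log(qi) for qi in q if qi > 0)
    return expected_energy - entropy

# Hypothetical energies for two binary code units: joint codes 00, 01, 10, 11.
energies = [1.0, 2.5, 2.0, 0.5]

# Picking codes from the true posterior gives the minimal description length.
p = true_posterior(energies)
optimal = description_length(energies, p)

# A simpler, factored distribution: treat the two units as independent,
# using the posterior marginals (one possible choice, not the best factored fit).
a = p[2] + p[3]  # probability that the first unit is on
b = p[1] + p[3]  # probability that the second unit is on
q = [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]
bound = description_length(energies, q)

# The extra cost of using the wrong distribution is KL(q || p).
gap = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

print("minimal description length:", optimal)
print("length with factored q:    ", bound)
print("KL(q || p):                ", gap)
```

Under these assumptions the factored distribution always costs at least as much as the true posterior, and the excess equals the Kullback-Leibler divergence from Q to the posterior. Minimizing the bound over Q therefore pushes Q toward the true posterior.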