• If we want to communicate the code for a data vector, the most efficient method requires us to pick a code randomly from the posterior distribution over codes.
  – This is easy if there is only a small number of possible codes. It is also easy if the posterior distribution has a nice form (like a Gaussian or a factored distribution).
  – But what should we do if the posterior is intractable?
    • This is typical for non-linear distributed representations.
• We do not have to use the most efficient coding scheme!
  – If we use a suboptimal scheme, we will get a bigger description length.
    • The bigger description length is an upper bound on the minimal description length.
    • Minimizing this bound is a sensible thing to do.
  – So we replace the true posterior distribution with a simpler distribution.
    • This is typically a factored distribution.
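The bound argument can be checked numerically on a toy model. The sketch below makes several assumptions that are not in the slides. It uses a made-up joint probability table over the four codes of two binary hidden units. It uses bits-back accounting, in which the sender pays for the randomly chosen code and for the data given that code, and is then refunded the entropy of the random choice. A coarse grid search stands in for a real optimizer. The helper names `description_length`, `factored`, and `kl_bits` are hypothetical.

```python
import itertools
import math

# Hypothetical toy model (illustration only, not from the slides): a data vector x
# explained by two binary hidden units, giving 2**2 = 4 possible codes h.
# JOINT[h] = p(h) * p(x | h) for the single observed x; the numbers are made up.
JOINT = {
    (0, 0): 0.02,
    (0, 1): 0.10,
    (1, 0): 0.10,
    (1, 1): 0.01,
}
CODES = list(itertools.product([0, 1], repeat=2))


def description_length(q):
    """Expected bits to send h ~ q and then x given h, minus the bits
    recovered from the random choice of h (the entropy of q).
    Equals sum_h q(h) * (log2 q(h) - log2 p(h, x))."""
    return sum(q[h] * (math.log2(q[h]) - math.log2(JOINT[h]))
               for h in CODES if q[h] > 0)


def factored(a, b):
    """Factored distribution q(h1, h2) = q1(h1) * q2(h2),
    with q1(h1 = 1) = a and q2(h2 = 1) = b."""
    return {(h1, h2): (a if h1 else 1 - a) * (b if h2 else 1 - b)
            for h1, h2 in CODES}


def kl_bits(q, p):
    """KL(q || p) measured in bits."""
    return sum(q[h] * math.log2(q[h] / p[h]) for h in CODES if q[h] > 0)


p_x = sum(JOINT.values())
posterior = {h: JOINT[h] / p_x for h in CODES}

# Picking the code from the true posterior achieves the minimum, -log2 p(x).
exact = description_length(posterior)

# Restricting q to factored distributions gives a bigger description length.
# A coarse grid search minimizes this bound over the two parameters a and b.
grid = [i / 100 for i in range(1, 100)]
best_len, best_a, best_b = min(
    (description_length(factored(a, b)), a, b) for a in grid for b in grid)
best_q = factored(best_a, best_b)

print(f"-log2 p(x)               = {-math.log2(p_x):.4f} bits")
print(f"cost with true posterior = {exact:.4f} bits")
print(f"best factored bound      = {best_len:.4f} bits (a={best_a}, b={best_b})")
print(f"gap = KL(q || posterior) = {kl_bits(best_q, posterior):.4f} bits")
```

In this toy table the posterior puts most of its mass on (0, 1) and (1, 0), so no factored distribution can match it exactly. The extra cost of any q is exactly KL(q || p(h | x)). This is why minimizing the bound over a simpler family is a sensible substitute when the true posterior is intractable.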