Lecture 23

Mode averaging

• If we generate from the model, half the instances of a 1 at the data layer will be caused by a (1,0) at the hidden layer and half will be caused by a (0,1).

  – So the recognition weights will learn to produce (0.5,0.5).

  – This represents a distribution that puts half its mass on very improbable hidden configurations.

• It's much better to just pick one mode. Picking one mode gives the best recognition model you can get if you assume that the posterior over hidden states factorizes.


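[Figure: a small network with two hidden units and one data unit, labelled with the values -10, -10, +20, +20 and -20. Two panels compare "Mode averaging" with the "True posterior".]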
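The argument can be checked numerically. The sketch below is an illustration only, and it rests on an assumed reading of the figure's labels: -10 as the bias of each hidden unit, +20 as each hidden-to-data weight, and -20 as the bias of the data unit, with all units being logistic binary units. The function names are made up for this example. It computes the exact posterior over the two hidden units given a 1 at the data layer, then compares two factorial distributions against it using KL(Q||P); that choice of criterion is also an assumption about what the slide means by "best".

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed reading of the figure: hidden biases -10, weights +20, data-unit bias -20.
HIDDEN_BIAS = (-10.0, -10.0)
WEIGHTS = (20.0, 20.0)
VISIBLE_BIAS = -20.0


def true_posterior():
    """Exact p(h | v=1) over the four hidden configurations."""
    joint = {}
    for h in product((0, 1), repeat=2):
        prior = 1.0
        for h_i, b in zip(h, HIDDEN_BIAS):
            p_on = sigmoid(b)
            prior *= p_on if h_i else 1.0 - p_on
        total_input = VISIBLE_BIAS + sum(w * h_i for w, h_i in zip(WEIGHTS, h))
        joint[h] = prior * sigmoid(total_input)  # p(h) * p(v=1 | h)
    z = sum(joint.values())
    return {h: p / z for h, p in joint.items()}


def factorial(q):
    """Factorial distribution over h, where q[i] = Q(h_i = 1)."""
    dist = {}
    for h in product((0, 1), repeat=2):
        p = 1.0
        for h_i, q_i in zip(h, q):
            p *= q_i if h_i else 1.0 - q_i
        dist[h] = p
    return dist


def kl(q_dist, p_dist):
    """KL(Q || P) in nats, treating 0 * log 0 as 0."""
    return sum(q * math.log(q / p_dist[h]) for h, q in q_dist.items() if q > 0.0)


posterior = true_posterior()
kl_average = kl(factorial((0.5, 0.5)), posterior)   # mode averaging
kl_one_mode = kl(factorial((1.0, 0.0)), posterior)  # commit to the (1,0) mode

print("true posterior:", posterior)
print("KL, mode averaging:", kl_average)
print("KL, one mode:", kl_one_mode)
```

Under these assumptions the true posterior puts almost all of its mass on (1,0) and (0,1) in equal shares. The averaged distribution (0.5,0.5) spreads a quarter of its mass onto each of (0,0) and (1,1), which the posterior considers very improbable, so its divergence comes out at roughly 4 nats, against roughly 0.7 nats for committing to a single mode.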