lec2b

Mode averaging

• If we generate from the model, half the instances of a 1 at the data layer will be caused by a (1,0) at the hidden layer and half will be caused by a (0,1).
  – So the recognition weights will learn to produce (0.5,0.5).
  – This represents a distribution that puts half its mass on very improbable hidden configurations.

• It's much better to just pick one mode and pay one bit.


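[Figure: network with two hidden units (biases -10, -10), each connected by a weight of +20 to a single data unit (bias -20)]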
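The claims above can be checked numerically. The following is a minimal sketch, assuming the figure's numbers mean hidden biases of -10, weights of +20, and a data-unit bias of -20, with logistic units throughout; that reading of the figure and all function names are assumptions of this example, not taken from the slide. It computes the exact posterior over hidden configurations given a 1 at the data layer, then compares the cost in bits (KL from Q to the posterior) of the factorial distribution (0.5,0.5) with the cost of committing to one mode.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed reading of the figure: two hidden units, one data unit.
HIDDEN_BIAS = -10.0
WEIGHT = 20.0
VISIBLE_BIAS = -20.0


def posterior_given_v1():
    """Exact p(h | v=1) over the four hidden configurations."""
    p_on = sigmoid(HIDDEN_BIAS)
    joint = {}
    for h in itertools.product((0, 1), repeat=2):
        prior = 1.0
        for unit in h:
            prior *= p_on if unit else 1.0 - p_on
        likelihood = sigmoid(VISIBLE_BIAS + WEIGHT * sum(h))
        joint[h] = prior * likelihood
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


def factorial_q(q1, q2):
    """Factorial distribution with independent on-probabilities q1 and q2."""
    return {
        (h1, h2): (q1 if h1 else 1.0 - q1) * (q2 if h2 else 1.0 - q2)
        for h1, h2 in itertools.product((0, 1), repeat=2)
    }


def kl_bits(q, p):
    """KL(q || p) in bits, skipping configurations where q is zero."""
    return sum(qv * math.log2(qv / p[h]) for h, qv in q.items() if qv > 0.0)


post = posterior_given_v1()
kl_average = kl_bits(factorial_q(0.5, 0.5), post)   # mode averaging
kl_one_mode = kl_bits(factorial_q(1.0, 0.0), post)  # commit to (1,0)

print("posterior:", post)
print("KL for (0.5,0.5):", kl_average, "bits")
print("KL for one mode :", kl_one_mode, "bits")
```

Under these assumed parameters, committing to one mode costs about one bit, while (0.5,0.5) costs more than five bits, because a quarter of its mass falls on each of (0,0) and (1,1), which the posterior treats as very improbable.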


[Figure: distribution P, with the minimum of KL(Q||P) and the minimum of KL(P||Q) marked]
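To illustrate how the two directions of the KL divergence disagree, here is a small sketch that searches a coarse grid of factorial distributions Q over two hidden units and finds the minimiser of each direction. It takes P to be the posterior over hidden configurations given a 1 at the data layer, under assumed parameters (hidden biases -10, weights +20, data-unit bias -20, logistic units); those values, the grid resolution, and the helper names are assumptions of this example.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def posterior_given_v1(hidden_bias=-10.0, weight=20.0, visible_bias=-20.0):
    """Exact p(h | v=1) for two hidden units (assumed parameters)."""
    p_on = sigmoid(hidden_bias)
    joint = {}
    for h1, h2 in itertools.product((0, 1), repeat=2):
        prior = (p_on if h1 else 1.0 - p_on) * (p_on if h2 else 1.0 - p_on)
        joint[(h1, h2)] = prior * sigmoid(visible_bias + weight * (h1 + h2))
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


def factorial_q(q1, q2):
    """Factorial distribution with independent on-probabilities q1 and q2."""
    return {
        (h1, h2): (q1 if h1 else 1.0 - q1) * (q2 if h2 else 1.0 - q2)
        for h1, h2 in itertools.product((0, 1), repeat=2)
    }


def kl_bits(a, b):
    """KL(a || b) in bits; infinite if b gives zero mass where a does not."""
    total = 0.0
    for h, av in a.items():
        if av == 0.0:
            continue
        if b[h] == 0.0:
            return math.inf
        total += av * math.log2(av / b[h])
    return total


P = posterior_given_v1()
grid = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
candidates = list(itertools.product(grid, grid))

# KL(Q||P): Q is penalised for putting mass where P has almost none.
best_qp = min(candidates, key=lambda q: kl_bits(factorial_q(*q), P))
# KL(P||Q): Q is penalised for missing mass that P has.
best_pq = min(candidates, key=lambda q: kl_bits(P, factorial_q(*q)))

print("argmin KL(Q||P):", best_qp)
print("argmin KL(P||Q):", best_pq)
```

On this grid, minimising KL(Q||P) commits to a single mode, while minimising KL(P||Q) matches the marginals of P and lands on (0.5,0.5), the mode-averaged answer that the slide says the recognition weights learn.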