lec2b

Comparison with hidden Markov models

• Our inference procedure is incorrect because it ignores the future.

• Our learning procedure is slightly wrong because the inference is wrong and also because we use contrastive divergence.

• But the model is exponentially more powerful than an HMM because it uses distributed representations.

– Given N hidden units, it can use N bits of information to constrain the future.

– An HMM with N hidden states can only use log N bits of history.

– This is a huge difference if the data has any kind of componential structure. It means we need far fewer parameters than an HMM, so training is actually easier, even though we do not have an exact maximum likelihood algorithm.
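
As a rough illustration of the "N bits versus log N bits" comparison, the sketch below counts how many HMM states would be needed to carry the same amount of information as N binary hidden units, and how the number of transition parameters grows in each case. The function names are hypothetical, and the N x N weight count assumes dense connections between the hidden units at consecutive time steps, which the slide does not specify.

```python
import math


def hmm_bits(num_states: int) -> float:
    """Bits of history an HMM can carry: it is in exactly one of num_states states."""
    return math.log2(num_states)


def hmm_states_needed(num_bits: int) -> int:
    """HMM states required to carry num_bits bits of history."""
    return 2 ** num_bits


def hmm_transition_params(num_states: int) -> int:
    """Entries in a full state-to-state transition matrix."""
    return num_states * num_states


def distributed_params(num_units: int) -> int:
    """Assumption: one weight per pair of hidden units across consecutive time steps."""
    return num_units * num_units


if __name__ == "__main__":
    for n in (10, 20, 100):
        states = hmm_states_needed(n)
        print(
            f"N={n}: distributed weights={distributed_params(n)}, "
            f"equivalent HMM states={states}, "
            f"HMM transition entries={hmm_transition_params(states)}"
        )
```

Under these assumptions, matching N hidden units takes 2^N HMM states, so the HMM transition matrix grows as 2^(2N) while the distributed model grows as N^2.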