lecture 25

Comparison with hidden Markov models

• The inference procedure is incorrect because it ignores the future: the hidden states are inferred without using any later observations.
• The learning procedure is also wrong, both because the inference is wrong and because we use contrastive divergence, which only approximates the maximum likelihood gradient.
• But the model is exponentially more powerful than an HMM because it uses distributed representations.
  – Given N binary hidden units, it can use N bits of information to constrain the future. An HMM with N hidden states uses only log₂ N bits.
  – This is a huge difference if the data has any kind of componential structure. It means we need far fewer parameters than an HMM, so training is not much slower, even though we do not have an exact maximum likelihood algorithm.
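The capacity argument above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the HMM uses a full K × K transition matrix. N binary units have 2^N joint configurations, so an HMM needs 2^N states to carry the same N bits. Its transition matrix alone then has (2^N)² = 4^N entries, while any weight matrix attached to N units grows only polynomially in N.

```python
import math

def distributed_bits(n_units: int) -> float:
    """Bits carried by n binary hidden units (2**n joint configurations)."""
    return math.log2(2 ** n_units)

def hmm_bits(n_states: int) -> float:
    """Bits carried by the single discrete state of an HMM with n_states states."""
    return math.log2(n_states)

def hmm_states_needed(n_units: int) -> int:
    """HMM states required to match the capacity of n binary hidden units."""
    return 2 ** n_units

def hmm_transition_params(n_states: int) -> int:
    """Entries in a full n_states x n_states transition matrix (assumed HMM form)."""
    return n_states * n_states

if __name__ == "__main__":
    for n in (4, 10, 20):
        k = hmm_states_needed(n)
        # Same number of units/states gives very different information capacity,
        # and matching capacity makes the HMM transition matrix blow up as 4**n.
        print(f"N={n}: units carry {distributed_bits(n):.0f} bits, "
              f"an N-state HMM carries {hmm_bits(n):.2f} bits; "
              f"matching needs {k} states and {hmm_transition_params(k)} transition entries")
```

With N = 10 units, for example, a 10-state HMM carries only about 3.3 bits, and matching the units' 10 bits would require 1024 states and over a million transition probabilities.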