A three stage training procedure
(Taylor, Hinton and Roweis)
First learn a static model of pairs or triples of
time frames ignoring the directed temporal
connections between hidden units.
Then use the inferred hidden states to train a
“fully observed” sigmoid belief net that captures
the temporal structure of the hidden states.
Finally, use the conditional RBM model to fine
tune all of the weights.