Stacking temporal RBMs
Treat the hidden activities of the first-level TRBM as the data for the second-level TRBM.
So when we learn the second level, we get connections across time in the first hidden layer (a toy training sketch follows).
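A minimal sketch of this greedy stacking step in Python/NumPy, assuming a toy binary TRBM trained with one-step contrastive divergence. The class, layer sizes, learning rate, and random data below are illustrative assumptions, not the original implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TRBM:
    """Toy temporal RBM: biases at time t are modulated by the previous visible frame."""
    def __init__(self, n_vis, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))   # visible-hidden weights
        self.A = 0.01 * rng.standard_normal((n_vis, n_vis))   # autoregressive vis(t-1) -> vis(t)
        self.B = 0.01 * rng.standard_normal((n_vis, n_hid))   # vis(t-1) -> dynamic hidden bias
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def mean_hiddens(self, v_t, v_prev):
        # hidden probabilities, with the dynamic bias set by the previous frame
        return sigmoid(v_t @ self.W + v_prev @ self.B + self.bh)

    def fit(self, seq, epochs=10, lr=0.01):
        # one-step contrastive divergence over consecutive frame pairs
        # (mean-field reconstructions instead of samples, to keep the sketch short)
        for _ in range(epochs):
            for t in range(1, len(seq)):
                v_prev, v0 = seq[t - 1], seq[t]
                h0 = self.mean_hiddens(v0, v_prev)
                v1 = sigmoid(h0 @ self.W.T + v_prev @ self.A + self.bv)
                h1 = self.mean_hiddens(v1, v_prev)
                self.W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
                self.A += lr * np.outer(v_prev, v0 - v1)
                self.B += lr * np.outer(v_prev, h0 - h1)
                self.bv += lr * (v0 - v1)
                self.bh += lr * (h0 - h1)

# Greedy stacking: train level 1 on the data, then treat its hidden
# activities as the "data" sequence for level 2.
rng = np.random.default_rng(1)
data = (rng.random((100, 20)) > 0.5).astype(float)     # toy binary sequence
trbm1 = TRBM(n_vis=20, n_hid=50)
trbm1.fit(data)
hidden_seq = np.array([trbm1.mean_hiddens(data[t], data[t - 1])
                       for t in range(1, len(data))])
trbm2 = TRBM(n_vis=50, n_hid=50)   # level-2 visibles are level-1 hiddens
trbm2.fit(hidden_seq)
```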
After greedy learning, we can generate from the composite model:
First, generate from the top-level model by alternating Gibbs sampling between its current hiddens and visibles, using the dynamic biases created by the previous top-level visibles.
Then do a single top-down pass through the lower layers, using the autoregressive inputs coming from the earlier states of each layer (see the sketch below).
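A rough sketch of this generation procedure under the same toy assumptions (binary units, random weights standing in for learned parameters, zero-initialised "earlier states"). The layer sizes, variable names, and number of Gibbs steps are illustrative only, and the stack here has just one lower layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # sample binary units from their probabilities
    return (p > rng.random(p.shape)).astype(float)

n_vis1, n_hid1, n_hid2 = 20, 50, 50          # layer sizes (assumed)

# First-level TRBM: visible-hidden weights plus autoregressive input
# from the previous visible frame (random stand-ins for learned weights).
W1 = 0.01 * rng.standard_normal((n_vis1, n_hid1))
A1 = 0.01 * rng.standard_normal((n_vis1, n_vis1))
bv1 = np.zeros(n_vis1)

# Top-level TRBM: its "visibles" are the first-level hidden units.
W2 = 0.01 * rng.standard_normal((n_hid1, n_hid2))
A2 = 0.01 * rng.standard_normal((n_hid1, n_hid1))   # prev top visibles -> dynamic visible bias
B2 = 0.01 * rng.standard_normal((n_hid1, n_hid2))   # prev top visibles -> dynamic hidden bias
bv2, bh2 = np.zeros(n_hid1), np.zeros(n_hid2)

def generate(n_frames, n_gibbs=50):
    h1_prev = np.zeros(n_hid1)               # earlier state of the first hidden layer
    v_prev = np.zeros(n_vis1)                # earlier state of the visible layer
    frames = []
    for _ in range(n_frames):
        # Step 1: alternating Gibbs sampling in the top-level model, with the
        # dynamic biases set by the previous top-level visibles (h1_prev).
        v2 = (rng.random(n_hid1) > 0.5).astype(float)   # random start for this frame
        for _ in range(n_gibbs):
            h2 = sample(sigmoid(v2 @ W2 + h1_prev @ B2 + bh2))
            v2 = sample(sigmoid(h2 @ W2.T + h1_prev @ A2 + bv2))
        # Step 2: a single top-down pass through the lower layer, adding the
        # autoregressive input from the earlier visible frame.
        v1 = sample(sigmoid(v2 @ W1.T + v_prev @ A1 + bv1))
        frames.append(v1)
        h1_prev, v_prev = v2, v1             # these become next frame's "earlier states"
    return np.array(frames)

generated = generate(n_frames=30)
```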