We describe the ``wake-sleep'' algorithm that allows a multilayer, unsupervised, stochastic neural network to build a hierarchical, top-down generative model of an ensemble of data vectors. Because the generative model uses distributed representations that are a non-linear function of the input, it is intractable to compute the posterior probability distribution over hidden representations given the generative model and the current data vector. It is therefore intractable to fit the generative model to data using standard techniques such as gradient descent or EM. Instead of computing the posterior distribution exactly, a ``Helmholtz Machine'' uses a separate set of bottom-up ``recognition'' connections to produce a compact approximation to the posterior distribution. The wake-sleep algorithm uses the top-down generative connections to provide training data for the bottom-up recognition connections and vice versa. In this paper, we show that the wake-sleep algorithm can be generalized to model the temporal structure in sequences of data vectors. This gives a very simple online algorithm that fits generative models which have distributed hidden representations which can be exponentially more powerful than conventional Hidden Markov Models.
in F. Fogelman-Soulie and R. Gallinari (editors) ICANN-95, pp. 483-490.
Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995) ``The ``wake-sleep'' algorithm for unsupervised neural networks'', Science, vol. 268, pp. 1158-1161: abstract, associated references.