Dept. of Computer Science, University of Toronto

We describe the ``wake-sleep'' algorithm that allows a multilayer,
unsupervised, stochastic neural network to build a hierarchical, top-down
generative model of an ensemble of data vectors. Because the generative model
uses distributed representations that are a non-linear function of the input,
it is intractable to compute the posterior probability distribution over
hidden representations given the generative model and the current data vector.
It is therefore intractable to fit the generative model to data using standard
techniques such as gradient descent or EM. Instead of computing the posterior
distribution exactly, a ``Helmholtz Machine'' uses a separate set of bottom-up
``recognition'' connections to produce a compact approximation to the
posterior distribution. The wake-sleep algorithm uses the top-down generative
connections to provide training data for the bottom-up recognition connections
and *vice versa*. In this paper, we show that the wake-sleep algorithm
can be generalized to model the temporal structure in sequences of data
vectors. This gives a very simple online algorithm that fits generative
models whose distributed hidden representations can be exponentially more
powerful than those of conventional Hidden Markov Models.
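The two training phases described above can be sketched in a toy form: in the wake phase, the bottom-up recognition weights pick stochastic hidden states for a real data vector and the top-down generative weights are trained to reconstruct it; in the sleep phase, the generative model fantasizes a (hidden, visible) pair and the recognition weights are trained to recover the hidden cause. This is a minimal single-hidden-layer sketch, not code from the paper; the layer sizes, learning rate, and toy dataset are illustrative assumptions.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p):
    # stochastic binary unit: fire with probability p
    return 1.0 if random.random() < p else 0.0

n_vis, n_hid = 6, 3   # illustrative sizes
lr = 0.1              # illustrative learning rate
# R: bottom-up recognition weights, G: top-down generative weights
R = [[0.0] * n_vis for _ in range(n_hid)]
G = [[0.0] * n_hid for _ in range(n_vis)]
hid_bias = [0.0] * n_hid  # generative biases on the hidden units
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # toy ensemble

for step in range(2000):
    # --- wake phase: recognition weights choose hidden states for a
    #     real data vector; generative weights learn to reconstruct it
    v = random.choice(data)
    h = [sample(sigmoid(sum(R[j][i] * v[i] for i in range(n_vis))))
         for j in range(n_hid)]
    for i in range(n_vis):
        p = sigmoid(sum(G[i][j] * h[j] for j in range(n_hid)))
        for j in range(n_hid):
            G[i][j] += lr * (v[i] - p) * h[j]
    for j in range(n_hid):
        hid_bias[j] += lr * (h[j] - sigmoid(hid_bias[j]))

    # --- sleep phase: the generative model fantasizes a sample
    #     top-down; recognition weights learn to recover its cause
    h_f = [sample(sigmoid(hid_bias[j])) for j in range(n_hid)]
    v_f = [sample(sigmoid(sum(G[i][j] * h_f[j] for j in range(n_hid))))
           for i in range(n_vis)]
    for j in range(n_hid):
        q = sigmoid(sum(R[j][i] * v_f[i] for i in range(n_vis)))
        for i in range(n_vis):
            R[j][i] += lr * (h_f[j] - q) * v_f[i]
```

Note that each phase trains one set of connections using activities produced by the other set, which is the sense in which the top-down connections provide training data for the bottom-up ones and *vice versa*.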

in F. Fogelman-Soulie and R. Gallinari (editors) *ICANN-95*, pp. 483-490.

Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R. M. (1995) ``The wake-sleep algorithm for unsupervised neural networks'', *Science*, vol. 268, pp. 1158-1161.