Autoencoders, Minimum Description Length and Helmholtz Free
  Energy
  Geoffrey E. Hinton 
  Department of Computer Science
  University of Toronto 
  and
  Richard S. Zemel
  Computational Neuroscience Laboratory
  The Salk Institute
  Abstract
An autoencoder network uses a set of recognition
weights to convert an input vector into a code vector. It then uses a set of generative
  weights to convert the code vector into an approximate reconstruction of the input vector.
We derive an objective function for training autoencoders based on the Minimum
Description Length (MDL) principle. The aim is to minimize the information required
  to describe both the code vector and the reconstruction error.  We show that this
  information is minimized by choosing code vectors stochastically according to a Boltzmann
  distribution, where the generative weights define the energy of each possible code vector
  given the input vector.  Unfortunately, if the code vectors use distributed
  representations, it is exponentially expensive to compute this Boltzmann distribution
  because it involves all possible code vectors.  We show that the recognition weights
  of an autoencoder can be used to compute an approximation to the Boltzmann distribution
  and that this approximation gives an upper bound on the description length.  Even
  when this bound is poor, it can be used as a Lyapunov function for learning both the
  generative and the recognition weights.  We demonstrate that this approach can be
  used to learn factorial codes.
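  The bound referred to above is, in later terminology, a Helmholtz free energy. A
  short derivation, in notation of our own rather than the paper's: let E(c) be the
  combined cost in nats of describing code vector c and the resulting reconstruction
  error for a given input, and let Q be any distribution over code vectors (for
  instance, the factorial distribution computed by the recognition weights). The
  expected description length is then

    F(Q) = \sum_c Q(c)\,E(c) - H(Q)
         = -\log Z + \mathrm{KL}(Q \,\|\, P) \;\ge\; -\log Z,
    \quad\text{where}\quad P(c) = e^{-E(c)}/Z,\qquad Z = \sum_c e^{-E(c)}.

  The subtracted entropy H(Q) is the "bits back" recovered from the stochastic choice
  of code. Equality holds exactly when Q is the Boltzmann distribution P, which is why
  sampling codes from P minimizes the information, and why any tractable Q gives an
  upper bound on the description length that can be decreased with respect to both the
  generative and the recognition weights.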
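  The same inequality can be checked numerically. The sketch below, in Python with
  NumPy, uses a toy model small enough to enumerate all 2^K binary code vectors; the
  sizes, the untrained random weights, and the Gaussian output model are illustrative
  assumptions, not the paper's experimental setup.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    D, K = 8, 4                                  # input size, binary code size (toy values)

    # Untrained random weights, for illustration only.
    W_rec = rng.normal(scale=0.5, size=(K, D))   # recognition weights
    W_gen = rng.normal(scale=0.5, size=(D, K))   # generative weights
    p_prior = np.full(K, 0.5)                    # generative prior on each code bit
    sigma2 = 1.0                                 # assumed Gaussian output variance

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def energy(x, c):
        # E(c) = -log p(c) - log p(x | c), in nats.
        code_cost = -np.sum(c * np.log(p_prior) + (1 - c) * np.log(1 - p_prior))
        x_hat = W_gen @ c                        # linear generative reconstruction
        recon_cost = (0.5 * np.sum((x - x_hat) ** 2) / sigma2
                      + 0.5 * D * np.log(2 * np.pi * sigma2))
        return code_cost + recon_cost

    x = rng.normal(size=D)                       # one toy input vector
    q = sigmoid(W_rec @ x)                       # factorial recognition distribution

    codes = np.array(list(product([0.0, 1.0], repeat=K)))     # all 2^K code vectors
    E = np.array([energy(x, c) for c in codes])

    # Exact description length, exponential in K in general.
    true_cost = -np.log(np.sum(np.exp(-E)))                   # -log Z

    # Helmholtz free energy of the recognition distribution:
    # F(Q) = E_Q[E(c)] - H(Q) >= -log Z, equality iff Q is Boltzmann.
    Q = np.prod(codes * q + (1 - codes) * (1 - q), axis=1)    # Q(c) for each code
    expected_E = np.sum(Q * E)
    entropy = -np.sum(q * np.log(q) + (1 - q) * np.log(1 - q))
    F = expected_E - entropy

    print(f"true cost  -log Z : {true_cost:.3f} nats")
    print(f"free-energy bound : {F:.3f} nats (never smaller)")

  The gap F - (-log Z) is exactly KL(Q || P): the price of restricting Q to a
  factorial form, and the quantity that shrinks as the recognition weights improve.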
  In Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro
  and J. Alspector (eds.), Morgan Kaufmann, 1994.