Instead of trying to explicitly extract the coordinates of a
datapoint on the manifold, map the datapoint  to an
energy valley in a high-dimensional space.
The learned energy function in the high-dimensional
space restricts the available configurations to a low-
dimensional manifold.
We do not need to know the manifold dimensionality
in advance and it can vary along the manifold.
We do not need to know the number of manifolds.
Different manifolds can share common structure.
But we cannot create the right energy valleys by direct
interactions between pixels.
So learn a multilayer non-linear mapping between the
data and a high-dimensional latent space in which we
can construct the right valleys.