An apparently crazy idea
Its hard to learn stochastic generative models that use
non-linear distributed representations. This is because
its hard to infer (or sample from) the posterior distribution
over the hidden variables.
Crazy idea: do inference wrong.
Maybe learning will still work
Can we find an objective function that is:
Easy to optimize because it does not require correct
inference.
Easy to justify  because it makes a sensible trade-off.
Has deep connections to statistical physics and information
theory.