What can conditional Boltzmann machines do that backpropagation cannot do?
If we put connections between the output units, the BM can learn that the output patterns have structure, and it can use this structure to avoid giving silly answers.
To do this with backprop we need to consider all possible answers, i.e. one unit for each possible output vector, and the number of these can be exponential in the number of output units.
[Figure: two network diagrams, each with layers labeled "input units", "hidden units", and "output units"; one of them has an additional layer labeled "one unit for each possible output vector".]
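The contrast above can be made concrete with a small sketch. It is an illustration only, not the lecture's own code: the function names (`goodness`, `exact_conditional`, `gibbs_sample`) and the weight values are assumptions, and `bias` stands in for whatever total input the input and hidden units send to each output unit. The first function normalizes over every possible output vector, which costs 2^n terms for n binary output units; the second uses the connections between output units to sample one unit at a time.

```python
import itertools
import numpy as np

def goodness(y, bias, lateral):
    # Negative energy of an output vector y: per-unit terms plus pairwise
    # output-output terms. `lateral` is assumed symmetric with zero diagonal.
    return bias @ y + 0.5 * (y @ lateral @ y)

def exact_conditional(bias, lateral):
    # Enumerate ALL 2^n output vectors and normalize over them.
    # This is the "one unit for each possible output vector" approach.
    n = len(bias)
    configs = np.array(list(itertools.product([0, 1], repeat=n)))
    scores = np.array([goodness(y, bias, lateral) for y in configs])
    probs = np.exp(scores - scores.max())  # subtract max for numerical stability
    return configs, probs / probs.sum()

def gibbs_sample(bias, lateral, sweeps, rng):
    # Sample an output vector by updating one unit at a time given the others.
    # Each sweep costs O(n^2), with no enumeration of output vectors.
    n = len(bias)
    y = rng.integers(0, 2, size=n).astype(float)
    for _ in range(sweeps):
        for i in range(n):
            total_input = bias[i] + lateral[i] @ y  # zero diagonal excludes y[i]
            p_on = 1.0 / (1.0 + np.exp(-total_input))
            y[i] = float(rng.random() < p_on)
    return y

if __name__ == "__main__":
    # Hypothetical example: two outputs that should never both be on.
    bias = np.array([1.0, 1.0])
    lateral = np.array([[0.0, -10.0], [-10.0, 0.0]])
    configs, probs = exact_conditional(bias, lateral)
    for y, p in zip(configs, probs):
        print(y, round(float(p), 5))
    print(gibbs_sample(bias, lateral, sweeps=20, rng=np.random.default_rng(0)))
```

With these assumed weights, the "silly" answer in which both outputs are on gets probability well below 0.001, whereas with no lateral connections (independent outputs) the same answer gets probability above 0.5. The exact version is only feasible here because n = 2; the sampling version is the one that scales.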