Dealing with compositional structure
Consider a dataset in which each image contains N
different things:
A distributed representation requires a number of
neurons that is linear in N.
A localist representation (i.e. a mixture model)
requires a number of neurons that is exponential in N.
A mixture needs a separate component for each possible
combination of the N things.
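
The linear-versus-exponential contrast can be checked with a quick count. This is only an illustrative sketch under an assumption the slide does not state: each of the N things takes one of a fixed number of alternative states (2 by default, e.g. present or absent), and the distributed code uses one unit per state of each thing. The function names distributed_units and localist_units are hypothetical.

```python
def distributed_units(n_things: int, states_per_thing: int = 2) -> int:
    """Units needed when each thing gets its own small group of units.

    Assumption: one unit per state of each thing, so the total grows
    linearly in n_things.
    """
    return n_things * states_per_thing


def localist_units(n_things: int, states_per_thing: int = 2) -> int:
    """Units (mixture components) needed when every joint combination
    of states gets its own dedicated unit, so the total grows
    exponentially in n_things.
    """
    return states_per_thing ** n_things


if __name__ == "__main__":
    # Print the two counts side by side for a few values of N.
    for n in (1, 5, 10, 20):
        print(n, distributed_units(n), localist_units(n))
```

Under these assumptions, N = 20 needs 40 units in the distributed code but over a million mixture components in the localist one.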
Distributed representations are generally much harder to
fit to data, but they are the only reasonable solution
for data with this kind of compositional structure.
Boltzmann machines use distributed representations
to model binary data.