A spectrum of representations
PCA is powerful because it uses
distributed representations but limited
because its representations are linearly
related to the data.
Autoencoders with more hidden
layers are not limited this way.
Clustering is powerful because it uses
very non-linear representations but
limited because its representations are
local (not componential).
We need representations that are both
distributed and non-linear.
Unfortunately, these are typically
very hard to learn.
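To make the contrast concrete, here is a minimal NumPy sketch of the two kinds of code. It is an illustration, not part of the original slide: the helper names `pca_codes` and `kmeans_codes`, the synthetic Gaussian data, and the use of plain Lloyd iterations for clustering are all assumptions chosen for brevity.

```python
import numpy as np

def pca_codes(X, k):
    """Distributed, linear code: project centered data onto the top-k principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:k].T

def kmeans_codes(X, k, n_iter=20, seed=0):
    """Local, non-linear code: one-hot cluster membership from plain Lloyd iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    # One-hot code of the final assignment step: exactly one active unit per point.
    return np.eye(k)[labels]

# Synthetic data, used only for illustration (an assumption, not from the slide).
X = np.random.default_rng(1).normal(size=(200, 5))
P = pca_codes(X, 2)     # every component carries information about each point
C = kmeans_codes(X, 3)  # a single unit is on for each point
```

Each row of `P` uses all of its components at once, but only as a linear function of the input. Each row of `C` depends non-linearly on the input, but says nothing beyond which single cluster the point belongs to.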
              Local         Distributed
Linear                      PCA
Non-linear    clustering    What we need