We assume that the data was generated from a
number of different classes. The aim is to cluster
data from the same class together.
How do we decide the number of classes?
Why not put each datapoint into a separate class?
What is the payoff for clustering things together?
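
One way to see why these questions matter is the rough sketch below. It assumes a squared-error measure of fit (the distance from each point to its cluster mean), which the text above does not specify. The function name `within_cluster_sse` and the toy points are hypothetical. Under that assumption, giving every datapoint its own class drives the error to zero, so fit alone cannot choose the number of classes.

```python
def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Mean of the cluster, computed one coordinate at a time.
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return total


# Toy data (made up for illustration): two tight pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

one_cluster = within_cluster_sse(points, [0, 0, 0, 0])   # 51.0
two_clusters = within_cluster_sse(points, [0, 0, 1, 1])  # 1.0
singletons = within_cluster_sse(points, [0, 1, 2, 3])    # 0.0

print(one_cluster, two_clusters, singletons)
```

The error falls from one cluster to two and reaches zero with one class per point, yet the singleton solution summarizes nothing. Some extra cost for using more classes is needed before the count can be chosen sensibly.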
Clustering is not a very powerful way to model
data, especially if each data-vector can be
classified in many different ways. A one-out-of-N
classification is not nearly as informative
as a feature vector.
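
A small counting sketch of the comparison above. It assumes the feature vector is made of binary features, which is an assumption for illustration and not something the text states. The helper names are hypothetical. A one-out-of-N code can take only N distinct values, while N binary features can take 2^N.

```python
import math


def one_of_n_states(n):
    """Number of distinct values a one-out-of-N class label can take."""
    return n


def binary_feature_states(k):
    """Number of distinct values a vector of k binary features can take."""
    return 2 ** k


n = 10
# Bits needed to name one state, i.e. the most the code can convey.
bits_one_of_n = math.log2(one_of_n_states(n))        # about 3.32 bits
bits_features = math.log2(binary_feature_states(n))  # 10.0 bits

print(one_of_n_states(n), binary_feature_states(n))
print(round(bits_one_of_n, 2), bits_features)
```

With ten units, the class label distinguishes 10 cases and the binary feature vector distinguishes 1024, so the gap grows exponentially with the number of units.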
We will see how to learn feature vectors later.