Clustering
We assume that the data was generated from a
number of different classes. The aim is to cluster
data from the same class together.
How do we decide the number of classes?
Why not put each datapoint into a separate class?
What is the payoff for clustering things together?
What if the classes are hierarchical?
What if each data vector can be classified in
many different ways? A one-out-of-N
classification only says which single class a
vector belongs to, so it is not nearly as
informative as a feature vector.
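
A rough way to see the last point is to count how much a code can say. The sketch below is only an illustration, not part of the original slide. It assumes binary features that are independent and equally likely to be on or off, which is the best case for a feature vector, and the function names are hypothetical.

```python
import math


def one_of_n_bits(n_classes: int) -> float:
    """Upper bound on the information in a one-out-of-N label.

    A label picks exactly one of n_classes, so it can convey at most
    log2(n_classes) bits (reached when all classes are equally likely).
    """
    return math.log2(n_classes)


def binary_feature_bits(n_features: int) -> float:
    """Upper bound on the information in a binary feature vector.

    Assumption: features are independent and each is on or off with
    probability 0.5, so every feature contributes up to 1 bit.
    """
    return float(n_features)


n_units = 16  # the same number of units used in two different ways

label_bits = one_of_n_bits(n_units)          # 16 classes, one active
feature_bits = binary_feature_bits(n_units)  # 16 features, any subset active

print(f"one-out-of-{n_units} label: at most {label_bits} bits, {n_units} distinct codes")
print(f"{n_units} binary features: at most {feature_bits} bits, {2 ** n_units} distinct codes")
```

With the same 16 units, the one-out-of-N code distinguishes 16 cases while the binary feature vector distinguishes 65536. Real features are rarely independent, so the gap in practice is smaller than this upper bound suggests.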