A generative view of clustering
We need a sensible measure of what it means to cluster
the data well.
This makes it possible to judge different methods.
It may make it possible to decide on the number of
clusters.
An obvious approach is to imagine that the data was
produced by a generative model.
Then we can adjust the parameters of the model to
maximize the probability density that it would produce
exactly the data we observed.