Avoiding local optima
EM can easily get stuck in local optima.
It helps to start with very broad (high-variance)
Gaussians that are all very similar, and to reduce
the variance only gradually.
As the variance is reduced, the Gaussians
spread out along the first principal component
of the data.
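
A minimal sketch of this idea in Python, under several assumptions that are not on the slide: the Gaussians are spherical with equal mixing weights, they share one variance that follows a fixed geometric schedule instead of being re-estimated, and a tiny jitter is added at each stage so that identical components can separate. The function name `annealed_em_means` and all of its parameters are hypothetical.

```python
import numpy as np

def annealed_em_means(X, k, n_stages=30, iters_per_stage=5, final_var=1.0, seed=0):
    """EM for k equal-weight spherical Gaussians with an annealed shared variance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Start with broad, nearly identical Gaussians: every mean at the data mean.
    means = np.tile(X.mean(axis=0), (k, 1))
    # Assumed starting variance: a few times the total variance of the data.
    start_var = 4.0 * X.var(axis=0).sum()
    for var in np.geomspace(start_var, final_var, n_stages):
        # Assumption: tiny jitter breaks the symmetry between identical components.
        means = means + 1e-3 * np.sqrt(var) * rng.standard_normal((k, d))
        for _ in range(iters_per_stage):
            # E-step: responsibilities under equal weights and the current variance.
            sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
            log_r = -0.5 * sq_dist / var
            log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(log_r)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: each mean becomes the responsibility-weighted average.
            means = (resp.T @ X) / (resp.sum(axis=0)[:, None] + 1e-12)
    return means

if __name__ == "__main__":
    # Two well-separated clusters along the x-axis (synthetic illustration data).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([-5, 0], 1.0, (200, 2)),
                   rng.normal([5, 0], 1.0, (200, 2))])
    print(annealed_em_means(X, k=2))
```

While the shared variance is larger than the largest eigenvalue of the data covariance, the means stay together near the data mean. Once it drops below that value, small differences between the means grow fastest along the first principal component, which is why the Gaussians separate in that direction first.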