How do we know that the updates improve things?
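Updating each Gaussian definitely improves the
probability of generating the data, provided each
datapoint is still assigned to the Gaussians with the
same posterior probabilities after the parameter
updates.
But the posterior will change once the parameters
have been updated, so this argument alone does not
guarantee an overall improvement.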
A good way to show that this is OK is to show
that there is a single function that is improved by
both the E-step and the M-step.
The function we need is called Free Energy.
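
The claim that one function is improved by both steps can be checked numerically.
The sketch below is only an illustration under stated assumptions: 1-D synthetic
data, two Gaussians, and the usual definition of free energy as expected energy
minus entropy, F(q, theta) = sum over n,k of q_nk * (log q_nk - log p(x_n, k | theta)).
That definition, the data, and every function name are assumptions made for this
example, not taken from the text above.

```python
import numpy as np

def log_joint(x, pi, mu, var):
    # log p(x_n, z_n = k) for every datapoint n and Gaussian k, shape (N, K)
    return (np.log(pi)
            - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def free_energy(x, q, pi, mu, var):
    # Expected energy minus entropy: sum of q * (log q - log p(x, z))
    safe_q = np.clip(q, 1e-300, None)  # avoids log(0); q * log q -> 0 as q -> 0
    return np.sum(q * (np.log(safe_q) - log_joint(x, pi, mu, var)))

def e_step(x, pi, mu, var):
    # Set q to the posterior over Gaussians, holding the parameters fixed
    lj = log_joint(x, pi, mu, var)
    lj -= lj.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    q = np.exp(lj)
    return q / q.sum(axis=1, keepdims=True)

def m_step(x, q):
    # Refit mixing proportions, means and variances, holding q fixed
    nk = q.sum(axis=0)
    pi = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Synthetic data and starting parameters (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 100)])
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

q = np.full((len(x), 2), 0.5)  # arbitrary initial distribution over Gaussians
trace = [free_energy(x, q, pi, mu, var)]
for _ in range(20):
    q = e_step(x, pi, mu, var)
    trace.append(free_energy(x, q, pi, mu, var))  # value after the E-step
    pi, mu, var = m_step(x, q)
    trace.append(free_energy(x, q, pi, mu, var))  # value after the M-step
```

Under these assumptions the entries of `trace` should never increase, apart from
floating-point noise: the E-step lowers F by changing q with the parameters fixed,
and the M-step lowers F by changing the parameters with q fixed. Right after an
E-step, F equals the negative log probability of the data.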