The M-step chooses the parameters that
minimize the cost function
(with the assignment probabilities held fixed).
This is easy. We just fit each Gaussian to the data,
with each datapoint weighted by the probability with
which it is assigned to that Gaussian.
When you fit a Gaussian to data you are maximizing
the log probability of the data given the Gaussian.
This is the same as minimizing the energies of the
datapoints that the Gaussian is responsible for.
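The link between energies and the cost function can be written out as follows. This is a sketch that assumes the cost is the usual free energy (expected energy minus entropy); the symbols r_{ik} (assignment probability of datapoint i to Gaussian k), \pi_k (mixing proportion), \mu_k and \Sigma_k are notation introduced here for illustration.

```latex
\begin{align}
E_{ik} &= -\log \pi_k - \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \\
F &= \sum_i \sum_k r_{ik} E_{ik} + \sum_i \sum_k r_{ik} \log r_{ik}
\end{align}
```

With the r_{ik} held fixed, the second (negative entropy) term is a constant, so minimizing F over the parameters is the same as minimizing the responsibility-weighted energies in the first term.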
If a Gaussian is assigned a probability of 0.7 for a
datapoint, the fitting treats that datapoint as 0.7 of an observation.
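A minimal sketch of such a weighted fit, assuming NumPy and full covariance matrices. The function name `m_step` and the array names `X` (data) and `R` (assignment probabilities) are hypothetical, chosen only for this illustration.

```python
import numpy as np

def m_step(X, R):
    """Weighted Gaussian fit.

    X: (N, D) array of datapoints.
    R: (N, K) array of assignment probabilities; each row sums to 1.
    Returns mixing proportions (K,), means (K, D), covariances (K, D, D).
    """
    N, D = X.shape
    K = R.shape[1]
    Nk = R.sum(axis=0)               # effective number of observations per Gaussian
    pi = Nk / N                      # mixing proportions
    mu = (R.T @ X) / Nk[:, None]     # responsibility-weighted means
    cov = np.empty((K, D, D))
    for k in range(K):
        diff = X - mu[k]
        # each datapoint counts as R[i, k] of an observation
        cov[k] = (R[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mu, cov
```

For two datapoints at 0 and 10 with assignment probabilities (0.7, 0.3) and (0.3, 0.7), the first Gaussian's mean is 0.7 * 0 + 0.3 * 10 = 3, since its weights sum to 1.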
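Since neither the E-step nor the M-step can increase the
cost function they share, the cost falls (or stays the same)
on every iteration and EM converges, though possibly only
to a local minimum.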
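The monotonic behaviour can be checked numerically. The sketch below assumes a one-dimensional mixture and records the negative log-likelihood at each E-step, which equals the cost at that point if the cost is taken to be the free energy. The names `em_1d` and `log_gauss` are hypothetical, and nothing here guards against a variance collapsing to zero.

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of each x under each 1-D Gaussian; returns shape (N, K)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)

def em_1d(x, mu, var, pi, n_iters=20):
    """Run EM on 1-D data and record the cost after every E-step."""
    costs = []
    for _ in range(n_iters):
        # E-step: assignment probabilities with the parameters held fixed
        log_joint = np.log(pi) + log_gauss(x, mu, var)
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        costs.append(-log_norm.sum())          # negative log-likelihood
        r = np.exp(log_joint - log_norm[:, None])
        # M-step: weighted fit with the assignment probabilities held fixed
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / x.size
    return mu, var, pi, costs
```

On well-separated synthetic data the recorded costs should never go up from one iteration to the next, apart from floating-point noise.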