The M-step chooses the parameters that
minimize the cost function
(with the responsibilities held fixed).
This is easy. We just fit each Gaussian to the data
weighted by the responsibilities that the Gaussian has
for the data.
When you fit a Gaussian to data you are maximizing
the log probability of the data given the Gaussian.
This is the same as minimizing the energies of the
datapoints that the Gaussian is responsible for.
If a Gaussian has a responsibility of 0.7 for a
datapoint, the fitting treats that datapoint as
0.7 of an observation.
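
As a minimal sketch of what "0.7 of an observation" means in practice, the weighted fit of one Gaussian can be written as below. It assumes one-dimensional data and a single Gaussian. The function name `fit_weighted_gaussian` and the toy numbers are illustrative assumptions, not part of the original material.

```python
def fit_weighted_gaussian(xs, weights):
    """Fit a 1-D Gaussian to xs, counting xs[i] as weights[i] of an observation.

    Hypothetical helper for illustration: weights[i] plays the role of the
    responsibility this Gaussian has for datapoint xs[i].
    """
    total = sum(weights)  # effective number of observations
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return mean, var, total


# Toy data (assumed): the Gaussian has responsibility 0.7 for the point
# at 0.0 and responsibility 0.3 for the point at 10.0.
xs = [0.0, 10.0]
resp = [0.7, 0.3]
mean, var, total = fit_weighted_gaussian(xs, resp)
print(mean, var, total)
```

The fitted mean is pulled towards the point with the larger responsibility, exactly as if that point had been observed more often. In a full mixture, the same weighted fit would be repeated for each Gaussian using its own column of responsibilities.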
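Since the E-step and the M-step both work on the
same cost function and neither can increase it,
the cost goes down (or stays the same) at every
step and EM converges, though possibly only to
a local minimum.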
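
The convergence argument can be checked numerically with a small sketch. It assumes that the cost function is the free energy (expected energy minus the entropy of the responsibilities), which this section does not spell out, and that the energy of a datapoint under a Gaussian is the negative log of the mixing proportion times the density. The data, the starting parameters and all function names are illustrative assumptions. Degenerate cases such as a collapsing variance are not guarded against.

```python
import math


def energy(x, pi, mu, var):
    # Assumed definition: negative log of (mixing proportion * Gaussian density).
    return (-math.log(pi) + 0.5 * math.log(2 * math.pi * var)
            + (x - mu) ** 2 / (2 * var))


def cost(xs, R, params):
    # Free energy: expected energy minus entropy of the responsibilities.
    total = 0.0
    for x, r in zip(xs, R):
        for rk, (pi, mu, var) in zip(r, params):
            if rk > 0.0:  # a zero responsibility contributes nothing
                total += rk * (energy(x, pi, mu, var) + math.log(rk))
    return total


def e_step(xs, params):
    # Responsibilities proportional to exp(-energy), with parameters held fixed.
    R = []
    for x in xs:
        es = [energy(x, *p) for p in params]
        m = min(es)  # subtract the minimum for numerical stability
        ws = [math.exp(-(e - m)) for e in es]
        s = sum(ws)
        R.append([w / s for w in ws])
    return R


def m_step(xs, R):
    # Weighted fit of each Gaussian, with responsibilities held fixed.
    params = []
    for k in range(len(R[0])):
        nk = sum(r[k] for r in R)
        mu = sum(r[k] * x for r, x in zip(R, xs)) / nk
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(R, xs)) / nk
        params.append((nk / len(xs), mu, var))
    return params


# Toy 1-D data with two clusters, and a rough starting guess (both assumed).
xs = [-2.1, -1.9, -2.0, -1.5, 2.0, 2.2, 1.8, 2.5]
params = [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]

costs = []
R = e_step(xs, params)
for _ in range(10):
    costs.append(cost(xs, R, params))  # cost after the E-step
    params = m_step(xs, R)
    costs.append(cost(xs, R, params))  # cost after the M-step
    R = e_step(xs, params)

# Each half-step should leave the cost unchanged or lower.
print(all(b <= a + 1e-9 for a, b in zip(costs, costs[1:])))
```

Recording the cost after each half-step, rather than once per iteration, makes it visible that the E-step and the M-step each contribute to the decrease.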