lec14

Why EM converges


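•	There is a single cost function that neither the E-step nor the
	M-step can increase: the E-step minimizes it over the
	responsibilities with the parameters fixed, and the M-step
	minimizes it over the parameters with the responsibilities fixed.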

Cost = expected energy – entropy
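Writing the cost out for a mixture of Gaussians shows why both steps lower it. This is a sketch of the standard free-energy argument; the notation (mixing proportions π_k, means μ_k, covariances Σ_k, parameters θ, responsibilities r_nk, energies E_nk) is assumed here rather than taken from the slide.

```latex
E_{nk} = -\log\bigl(\pi_k\,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)\bigr),
\qquad r_{nk} \ge 0,\quad \sum_k r_{nk} = 1

F(r, \theta) = \underbrace{\sum_n \sum_k r_{nk} E_{nk}}_{\text{expected energy}}
  \;-\; \underbrace{\Bigl(-\sum_n \sum_k r_{nk} \log r_{nk}\Bigr)}_{\text{entropy}}
  = \sum_n \Bigl[\, -\log p(x_n \mid \theta)
  + \mathrm{KL}\bigl(r_n \,\Vert\, p(k \mid x_n, \theta)\bigr) \Bigr]
```

The E-step sets r_nk = p(k | x_n, θ), which makes every KL term zero and so minimizes F over r with θ fixed. The M-step changes θ = (π, μ, Σ) to minimize the expected energy; the entropy does not depend on θ, so F cannot go up. Because F equals the negative log-likelihood right after each E-step, the log-likelihood never decreases from one iteration to the next.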

•	The expected energy term measures how difficult it is to
	generate each datapoint from the Gaussians it is assigned
	to. It would be happiest giving all the responsibility for each
	datapoint to the most likely Gaussian (as in K-means).

•	The entropy term encourages “soft” assignments. It would
	be happiest spreading the responsibility for each datapoint
	equally between all the Gaussians.
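The trade-off between the two terms can be checked numerically. The sketch below is a minimal illustration, assuming a one-dimensional mixture of two Gaussians fitted with NumPy to synthetic data; the function names (`log_joint`, `cost`, `e_step`, `m_step`, `run_em`) are hypothetical. It records the cost after every half-step, then compares hard (K-means style), uniform, and posterior responsibilities for the final parameters.

```python
import numpy as np

def log_joint(x, pi, mu, var):
    # log(pi_k * N(x_n | mu_k, var_k)) for every datapoint n and Gaussian k, shape (N, K)
    d = x[:, None] - mu
    return np.log(pi) - 0.5 * np.log(2 * np.pi * var) - 0.5 * d**2 / var

def cost(x, r, pi, mu, var):
    # Cost = expected energy - entropy, with energy E_nk = -log(pi_k N(x_n | mu_k, var_k))
    expected_energy = np.sum(r * -log_joint(x, pi, mu, var))
    # Entropy of the responsibilities; entries with r = 0 contribute 0 (0 log 0 = 0)
    entropy = -np.sum(r * np.log(np.where(r > 0, r, 1.0)))
    return expected_energy - entropy

def e_step(x, pi, mu, var):
    # Responsibilities are the posteriors p(k | x_n), normalized stably in log space
    lj = log_joint(x, pi, mu, var)
    r = np.exp(lj - lj.max(axis=1, keepdims=True))
    return r / r.sum(axis=1, keepdims=True)

def m_step(x, r):
    # Closed-form minimizers of the expected energy for fixed responsibilities
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

def run_em(x, pi, mu, var, r, n_iters=20):
    # Record the cost after every half-step to check that it never increases
    costs = [cost(x, r, pi, mu, var)]
    for _ in range(n_iters):
        r = e_step(x, pi, mu, var)
        costs.append(cost(x, r, pi, mu, var))
        pi, mu, var = m_step(x, r)
        costs.append(cost(x, r, pi, mu, var))
    return costs, (pi, mu, var)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])
init = (np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
costs, params = run_em(x, *init, r=np.full((len(x), 2), 0.5))
print(all(b <= a + 1e-8 for a, b in zip(costs, costs[1:])))

# With the parameters fixed, compare three ways of sharing responsibility
r_post = e_step(x, *params)                  # posterior (what the E-step picks)
r_hard = np.eye(2)[r_post.argmax(axis=1)]    # all to the most likely Gaussian, as in K-means
r_flat = np.full_like(r_post, 0.5)           # spread equally between the Gaussians
for name, r in [("posterior", r_post), ("hard", r_hard), ("uniform", r_flat)]:
    print(name, cost(x, r, *params))
```

The hard assignment gives the lowest expected energy and the uniform one the highest entropy, but the posterior responsibilities give the lowest total cost, which is why the E-step chooses them.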