The advantage of using F to understand EM
There is clearly no need to use the optimal
distribution over hidden configurations (the exact
posterior given the current parameters).
We can use any distribution that is convenient
so long as:
We always update the distribution in a way that
improves F.
We change the parameters to improve F given the
current distribution.
This is very liberating. It allows us to justify all
sorts of weird algorithms.
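
One such algorithm can be sketched in code. The example below is only an illustration under stated assumptions, not the method from these slides: it assumes the convention in which F(q, theta) = E_q[log p(x, z | theta)] + H(q) is a lower bound on the log likelihood, so "improving" F means increasing it (with the opposite sign convention, read every increase as a decrease). The model (a two-component 1-D Gaussian mixture), the synthetic data, and the names `log_joint`, `free_energy`, `partial_e_step`, `m_step` and `run` are all hypothetical choices made for the illustration. The E-step is deliberately incomplete: it refreshes the distribution for only a small random subset of the data points, so the distribution is usually not the optimal one, yet F still never gets worse.

```python
import numpy as np


def log_joint(x, pi, mu, var):
    # log p(x_n, z_n = k | theta) for every data point n and component k.
    return (np.log(pi)[None, :]
            - 0.5 * np.log(2.0 * np.pi * var)[None, :]
            - 0.5 * (x[:, None] - mu[None, :]) ** 2 / var[None, :])


def free_energy(x, q, pi, mu, var):
    # Assumed convention: F(q, theta) = E_q[log p(x, z | theta)] + H(q).
    # Larger is better; F is a lower bound on log p(x | theta).
    safe_q = np.where(q > 0, q, 1.0)  # makes 0 * log(0) evaluate to 0
    entropy = -np.sum(q * np.log(safe_q))
    return float(np.sum(q * log_joint(x, pi, mu, var)) + entropy)


def partial_e_step(x, q, pi, mu, var, idx):
    # Set q to the exact posterior ONLY for the points in idx.
    # All other rows of q are left stale, so q is generally not optimal.
    lj = log_joint(x[idx], pi, mu, var)
    lj = lj - lj.max(axis=1, keepdims=True)  # stabilise the softmax
    post = np.exp(lj)
    post = post / post.sum(axis=1, keepdims=True)
    new_q = q.copy()
    new_q[idx] = post
    return new_q


def m_step(x, q):
    # Parameters that maximise F for the current (possibly stale) q.
    nk = q.sum(axis=0)
    pi = nk / nk.sum()
    mu = (q * x[:, None]).sum(axis=0) / nk
    var = (q * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
    return pi, mu, var


def run(num_sweeps=20, batch=10, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic data, invented for this sketch.
    x = np.concatenate([rng.normal(-2.0, 1.0, 100),
                        rng.normal(3.0, 1.0, 100)])
    q = rng.dirichlet(np.ones(2), size=x.size)  # arbitrary starting q
    pi, mu, var = m_step(x, q)
    history = [free_energy(x, q, pi, mu, var)]
    for _ in range(num_sweeps):
        idx = rng.choice(x.size, size=batch, replace=False)
        q = partial_e_step(x, q, pi, mu, var, idx)   # improves F w.r.t. q
        history.append(free_energy(x, q, pi, mu, var))
        pi, mu, var = m_step(x, q)                   # improves F w.r.t. theta
        history.append(free_energy(x, q, pi, mu, var))
    return history
```

Calling `run()` returns the value of F after every half-step. Each entry should be at least as large as the one before it, up to floating-point error, even though no step ever computes the full optimal distribution.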