How many bits must we send?
Model parameters:
It depends on the priors and how accurately they are
sent.
Let's ignore these details for now.
Codes:
If all n clusters are equiprobable, log n bits.
This is extremely plausible, but wrong!
We can do it in fewer bits.
This is extremely implausible but right.
Data misfits:
If sender & receiver assume a Gaussian distribution
within the cluster, the cost is -log p(d | cluster), which
depends on the squared distance of d from the cluster center.
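
As a rough illustration of the two costs above, here is a minimal sketch. It assumes an isotropic Gaussian with a known standard deviation sigma shared by sender and receiver; the function names and the sigma parameter are hypothetical, not from the slides. It computes the naive code cost (log n, in bits) and the negative log density of a datapoint under its cluster (in nats). Turning the density into a true code length would also need a quantization width, which only adds a constant and is left out here.

```python
import math

def naive_code_cost_bits(n_clusters):
    """Bits needed to name one of n equiprobable clusters (the 'plausible' cost)."""
    return math.log2(n_clusters)

def gaussian_misfit_nats(d, center, sigma):
    """Negative log density of d under an isotropic Gaussian around center.

    Assumption: one shared standard deviation sigma for every dimension.
    The first term grows with the squared distance of d from the center;
    the second term is a constant that does not depend on d.
    """
    dim = len(d)
    sq_dist = sum((x - c) ** 2 for x, c in zip(d, center))
    return sq_dist / (2 * sigma ** 2) + 0.5 * dim * math.log(2 * math.pi * sigma ** 2)

if __name__ == "__main__":
    print(naive_code_cost_bits(8))
    print(gaussian_misfit_nats([1.0, 2.0], [0.0, 0.0], sigma=1.0))
```

With sigma fixed, moving a datapoint farther from the center raises the misfit cost by exactly the increase in squared distance divided by 2 sigma^2, which is the dependence the slide refers to.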