Each time we replace the prior over the hidden units by a better prior, we win by the difference in the probability assigned.

Now we cancel out all of the partition functions except the top one and replace log probabilities by goodnesses using the fact that:
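The standard identity for an energy-based model has this form. As a hedged sketch, it assumes that goodness is defined as negative energy, G(x) = -E(x), and that Z is the partition function of the distribution in question. The symbols x, G, E and Z are assumptions introduced here for illustration.

```latex
\[
  p(x) = \frac{e^{G(x)}}{Z},
  \qquad
  Z = \sum_{x'} e^{G(x')}
  \qquad\Longrightarrow\qquad
  \log p(x) = G(x) - \log Z .
\]
```

Under this identity, whenever two log probabilities that share the same partition function are subtracted, the \log Z terms cancel and only a difference of goodnesses remains.
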
This has simple derivatives that give a more justifiable fine-tuning algorithm than contrastive wake-sleep.