Chapter 13, slide 24.

What would be the gradient of the log probability of the data with respect to the recognition (red) weights? A: Zero. The recognition weights are not involved in the data generation process, so the data log-likelihood does not depend on them.

What is the prior over the top hidden layer of an SBN like? A: Factorial: each unit is on with probability equal to the logistic of its bias, independently of the other units. (A sketch follows after these notes.)

What is the prior over h of an RBM like? A: To find the probability of a particular hidden configuration, we have to sum over all possible visible configurations. That marginalization couples the hidden units, so this prior is not simple (unlike in an SBN).

What is the posterior over h (given v) of an RBM? Simple or complex? A: Simple. Clamp the visible configuration; every hidden configuration then has its own energy (together with the visible configuration), which is easy to calculate, and the hidden units are independent of each other in this CONDITIONAL distribution.

What is the posterior over h (given v) of an SBN with just one hidden layer? Simple or complex? A: Complex. Explaining away makes the hidden units conditionally dependent, and that makes the distribution complex.

For an unbiased estimate of the gradient for an SBN, we need a sample from the posterior distribution over hidden units, given a visible configuration from the training data. Do we need that, too, for an unbiased estimate of the gradient for a BM? A: Yes: that sample provides the positive phase. A BM additionally needs a sample from the model's joint distribution, with nothing clamped, for the negative phase.
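
For the SBN-prior question above, a minimal sketch in Python (all parameter values are made up for illustration): the prior over the top hidden layer factorizes, with each unit on with probability logistic(bias).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    b = np.array([-1.0, 0.0, 2.0])      # biases of the top hidden units (made up)
    p_on = sigmoid(b)                   # independent on-probabilities, one per unit
    h = (np.random.rand(b.size) < p_on).astype(int)   # one sample from the prior
    # The prior factorizes: p(h) = prod_i p_on[i]**h[i] * (1 - p_on[i])**(1 - h[i])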
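
For the RBM-prior question, a brute-force sketch over a tiny, made-up RBM: computing p(h) requires summing exp(-E(v, h)) over every visible configuration, and the result does not factorize over the hidden units.

    import itertools
    import numpy as np

    W = np.array([[1.5, -1.0],
                  [0.5,  2.0]])         # visible-to-hidden weights (made up)
    a = np.array([0.0, -0.5])           # visible biases (made up)
    c = np.array([0.3,  0.1])           # hidden biases (made up)

    def unnorm_joint(v, h):
        # exp(-E(v, h)) with E(v, h) = -v @ W @ h - a @ v - c @ h
        return np.exp(v @ W @ h + a @ v + c @ h)

    configs = [np.array(bits) for bits in itertools.product([0, 1], repeat=2)]
    p_h = np.array([sum(unnorm_joint(v, h) for v in configs) for h in configs])
    p_h /= p_h.sum()                    # prior over the four hidden configurations
    # Unlike the SBN prior, p_h is not a product of per-unit probabilities:
    # marginalizing over v couples the hidden units.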
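
For the RBM-posterior question, the conditional is factorial and cheap (reusing sigmoid, W, and c from the sketches above):

    v = np.array([1, 0])                # clamp a visible configuration
    p_h_given_v = sigmoid(c + v @ W)    # p(h_j = 1 | v), independently per unit
    # With v clamped, each hidden unit's on-probability depends only on v,
    # so the posterior is a simple product over hidden units.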
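
For the SBN-posterior question, a toy demonstration of explaining away (two made-up hidden causes, one visible effect): the prior over the causes is factorial, but the posterior given v = 1 is not.

    import itertools
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    prior_on = np.array([0.1, 0.1])     # factorial prior: p(h_i = 1)
    w = np.array([5.0, 5.0])            # hidden-to-visible weights (made up)
    bias_v = -5.0                       # visible bias (made up)

    post = {}
    for bits in itertools.product([0, 1], repeat=2):
        h = np.array(bits)
        p_h = np.prod(np.where(h == 1, prior_on, 1 - prior_on))
        p_v1 = sigmoid(bias_v + w @ h)  # p(v = 1 | h)
        post[bits] = p_h * p_v1         # proportional to p(h | v = 1)
    Z = sum(post.values())
    post = {bits: p / Z for bits, p in post.items()}

    p1 = post[(1, 0)] + post[(1, 1)]    # p(h1 = 1 | v = 1)
    p2 = post[(0, 1)] + post[(1, 1)]    # p(h2 = 1 | v = 1)
    print(p1 * p2, post[(1, 1)])        # unequal: the two causes are anticorrelated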
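
For the final question, a schematic one-sample gradient estimate for a BM (hypothetical state vectors): the positive phase uses a posterior sample with the visible units clamped to data, and the negative phase uses a free-running sample from the model's joint distribution.

    import numpy as np

    def bm_gradient_estimate(s_data, s_model):
        # s_data:  full state (v, h) with v clamped to a training case and
        #          h sampled from the posterior p(h | v)
        # s_model: full state sampled from the model's joint p(v, h)
        positive = np.outer(s_data, s_data)    # data-phase statistics s_i s_j
        negative = np.outer(s_model, s_model)  # model-phase statistics s_i s_j
        return positive - negative             # one-sample estimate of the gradient

    grad = bm_gradient_estimate(np.array([1, 0, 1, 1]), np.array([0, 1, 1, 0]))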