An analogy
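In a mixture model, we define the probability of a data vector v to be
$$p(v) = \sum_h p(h)\, p(v|h)$$
where p(h) is the mixing proportion of component h and p(v|h) is the
probability of v under that component.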
The learning rule for the mixing proportions is to make them match
the posterior probability of using each Gaussian, averaged over the
training data.
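As a minimal sketch of this learning rule, assume a one-dimensional mixture of Gaussians whose means and standard deviations are held fixed. The function names, the synthetic data, and the parameter values below are illustrative assumptions and not part of the original slide.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    # Density of a 1-D Gaussian evaluated elementwise.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def update_mixing_proportions(x, pi, means, stds):
    # lik[n, k] = p(x_n | Gaussian k)
    lik = gaussian_pdf(x[:, None], means[None, :], stds[None, :])
    joint = pi[None, :] * lik
    # resp[n, k] = posterior probability that Gaussian k generated x_n
    resp = joint / joint.sum(axis=1, keepdims=True)
    # New mixing proportions: the posterior averaged over the data.
    return resp.mean(axis=0)

def log_likelihood(x, pi, means, stds):
    lik = gaussian_pdf(x[:, None], means[None, :], stds[None, :])
    return np.log((pi[None, :] * lik).sum(axis=1)).sum()

# Synthetic data (assumed): 300 points near -2 and 100 points near 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 100)])

means = np.array([-2.0, 3.0])
stds = np.array([1.0, 1.0])
pi_old = np.array([0.5, 0.5])   # deliberately wrong starting proportions
pi_new = update_mixing_proportions(x, pi_old, means, stds)
```

With the components held fixed, this update cannot decrease the log likelihood of the data, which is the property the analogy relies on.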
The weights of an RBM implicitly define a mixing proportion for each
possible hidden vector.
To fit the data better, we can leave p(v|h) the same and make
the mixing proportion of each hidden vector more like the
aggregated posterior over hidden vectors, that is, the posterior
p(h|v) averaged over the training data.
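The same idea can be checked by brute force on an RBM small enough to enumerate. The sketch below assumes a binary RBM with energy E(v,h) = -v'Wh - b'v - c'h, randomly chosen weights, and a tiny made-up training set. All names and values are hypothetical and serve only to illustrate the argument.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 3, 2
W = rng.normal(0.0, 1.0, (n_vis, n_hid))   # assumed random weights
b = rng.normal(0.0, 1.0, n_vis)            # visible biases
c = rng.normal(0.0, 1.0, n_hid)            # hidden biases

# Enumerate every visible and hidden binary vector.
V = np.array(list(itertools.product([0, 1], repeat=n_vis)), dtype=float)
H = np.array(list(itertools.product([0, 1], repeat=n_hid)), dtype=float)

# joint[i, j] = p(v_i, h_j), proportional to exp(-E(v_i, h_j))
neg_energy = V @ W @ H.T + (V @ b)[:, None] + (H @ c)[None, :]
joint = np.exp(neg_energy)
joint /= joint.sum()

p_h = joint.sum(axis=0)                    # implicit mixing proportions p(h)
p_v_given_h = joint / p_h                  # each column is p(v | h_j)
p_h_given_v = joint / joint.sum(axis=1, keepdims=True)

# Hypothetical training set, given as row indices into V.
data_idx = np.array([0, 0, 7, 7, 7, 5])

# Aggregated posterior: p(h | v) averaged over the training data.
q_h = p_h_given_v[data_idx].mean(axis=0)

def avg_log_lik(mix):
    # Average log p(v) over the data when p(v) = sum_h mix(h) p(v | h).
    p_v = p_v_given_h @ mix
    return np.log(p_v[data_idx]).mean()

ll_old = avg_log_lik(p_h)   # mixing proportions defined by the RBM weights
ll_new = avg_log_lik(q_h)   # mixing proportions set to the aggregated posterior
```

Here `ll_new` is never lower than `ll_old`, for the same reason as in the mixture-model update. Note that `q_h` is an arbitrary distribution over hidden vectors and need not be expressible by the original RBM weights.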