lecture 24


The non-linearity used for reconstructing

	bags of words

•

Divide the counts in a bag of words vector by N, where N

is the total number of non-stop words in the document.

–

The resulting probability vector gives the probability of

getting a particular word if we pick a non-stop word at

random from the document.

•

At the output of the autoencoder, we use a softmax.

–

The probability vector defines the desired outputs of

the softmax.

•

When we train the first RBM in the stack we use the

same trick.

–

We treat the word counts as probabilities, but we

make the visible to hidden weights N times bigger

than the hidden to visible because we have N

observations from the probability distribution.