• Divide the counts in a bag-of-words vector by N, where N is the total number of non-stop words in the document.
  – The resulting probability vector gives the probability of getting a particular word if we pick a non-stop word at random from the document.
• At the output of the autoencoder, we use a softmax.
  – The probability vector defines the desired outputs of the softmax.
• When we train the first RBM in the stack we use the same trick.
  – We treat the word counts as probabilities, but we make the visible-to-hidden weights N times bigger than the hidden-to-visible weights, because we have N observations from the probability distribution.
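The three points above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the lecture's implementation: all function names are hypothetical, the weight matrix is indexed as `weights[word][hidden_unit]`, and leaving the hidden biases unscaled is an assumption, since the text only mentions scaling the weights.

```python
import math


def counts_to_probs(counts):
    """Divide bag-of-words counts by N, the total number of non-stop words."""
    n = sum(counts)
    if n == 0:
        raise ValueError("document has no non-stop words")
    return [c / n for c in counts], n


def softmax(logits):
    """Numerically stable softmax over the vocabulary."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits):
    """Loss when the probability vector is the desired softmax output."""
    out = softmax(logits)
    return -sum(t * math.log(o) for t, o in zip(target_probs, out) if t > 0)


def hidden_input(probs, n, weights, hidden_bias):
    """Bottom-up input to the hidden units of the first RBM.

    The visible-to-hidden weights are made N times bigger, so feeding in
    probabilities gives the same input as feeding in the raw counts.
    """
    return [
        hidden_bias[j] + n * sum(p * weights[i][j] for i, p in enumerate(probs))
        for j in range(len(hidden_bias))
    ]


def visible_logits(hidden, weights, visible_bias):
    """Top-down input to the softmax over words, using the unscaled weights."""
    return [
        visible_bias[i] + sum(h * weights[i][j] for j, h in enumerate(hidden))
        for i in range(len(visible_bias))
    ]
```

Because N times the probability of a word is just its count, `hidden_input` on the probability vector matches an unscaled weighted sum over the raw counts, which is one way to read "we have N observations from the probability distribution".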