How to compress the count vector
Output vector: 2000 reconstructed counts
We train the neural network to reproduce its input vector as its output.
This forces it to compress as much information as possible into the 10 numbers in the central bottleneck.
These 10 numbers are then a good way to compare documents.
See Ruslan Salakhutdinov’s talk.
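
The training idea above can be sketched on a toy scale. This is a minimal illustration, not the network from the slide: it assumes a single tanh hidden layer, a squared-error loss, plain gradient descent, and random Poisson counts standing in for real documents. The sizes (20 words, 3 code numbers) and the names `encode` and `cosine` are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the slide's data: 50 "documents", 20 word counts each,
# compressed to a 3-number code (the slide uses 2000 counts and 10 numbers).
n_docs, n_words, n_code = 50, 20, 3
counts = rng.poisson(2.0, size=(n_docs, n_words)).astype(float)
x = counts / counts.max()  # assumed scaling so inputs lie in [0, 1]

# Encoder and decoder weights (no biases, to keep the sketch short).
w_enc = rng.normal(0.0, 0.1, size=(n_words, n_code))
w_dec = rng.normal(0.0, 0.1, size=(n_code, n_words))

def encode(v):
    """Map count vectors to their bottleneck codes."""
    return np.tanh(v @ w_enc)

losses = []
lr = 0.2
for step in range(1000):
    code = encode(x)        # compress into the bottleneck
    recon = code @ w_dec    # try to reproduce the input
    err = recon - x
    losses.append(float(np.mean(err ** 2)))

    # Backpropagate the mean squared reconstruction error.
    grad_recon = 2.0 * err / err.size
    grad_w_dec = code.T @ grad_recon
    grad_pre = (grad_recon @ w_dec.T) * (1.0 - code ** 2)  # tanh derivative
    grad_w_enc = x.T @ grad_pre

    w_dec -= lr * grad_w_dec
    w_enc -= lr * grad_w_enc

def cosine(a, b):
    """Cosine similarity between two code vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

codes = encode(x)
sim = cosine(codes[0], codes[1])  # compare two documents via their codes
print(losses[0], losses[-1], sim)
```

Comparing documents through `codes` rather than through the raw count vectors is the point of the bottleneck: the code is far shorter than the input, so similarity computations are cheaper. Cosine similarity is one reasonable choice here, not something the slide prescribes.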
Hidden layers, from output side to input side: 500 neurons, 250 neurons, 10 (central bottleneck), 250 neurons, 500 neurons
Input vector: 2000 word counts
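
To make the layer sizes concrete, here is a rough sketch of a forward pass through a network of this shape. The layer widths come from the slide; everything else is an assumption for illustration: the tanh nonlinearity, the linear output layer, the absence of biases, the random untrained weights, and the fake Poisson counts. The name `forward` is hypothetical.

```python
import numpy as np

# Layer widths from the slide: input, hidden layers, bottleneck, output.
layer_sizes = [2000, 500, 250, 10, 250, 500, 2000]

rng = np.random.default_rng(0)
# One weight matrix per pair of adjacent layers (random, i.e. untrained).
weights = [
    rng.normal(0.0, 0.01, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(v):
    """Return the activations of every layer for a batch of count vectors."""
    acts = [v]
    for i, w in enumerate(weights):
        pre = acts[-1] @ w
        is_last = i == len(weights) - 1
        # Assumed: tanh in hidden layers, linear output layer.
        acts.append(pre if is_last else np.tanh(pre))
    return acts

# Four fake documents, each a vector of 2000 word counts.
batch = rng.poisson(0.05, size=(4, 2000)).astype(float)
acts = forward(batch)

shapes = [a.shape[1] for a in acts]   # width of each layer's activations
code = acts[3]                        # the 10 numbers in the central bottleneck
n_weights = sum(w.size for w in weights)
print(shapes, code.shape, n_weights)
```

Under these assumptions the network has 2,255,000 weights, almost all of them in the two 2000-by-500 matrices at the ends, while each document is summarised by only 10 numbers in the middle.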