How to compress the count vector
We train the neural network to reproduce its input vector as its output. This forces it to compress as much information as possible into the 10 numbers in the central bottleneck. These 10 numbers are then a good way to compare documents.

[Figure: network layers, from input to output: input vector (2000 word counts) -> 500 neurons -> 250 neurons -> 10 -> 250 neurons -> 500 neurons -> output vector (2000 reconstructed counts)]
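
The idea on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the method used for the results in the lecture. Only the layer sizes come from the slide. Everything else is an assumption made for the example: tanh hidden units, a linear code layer and linear output, squared reconstruction error, plain gradient descent with a small step size, log-scaled counts, and random synthetic "documents" in place of real word counts. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes read off the slide, input to output.
SIZES = [2000, 500, 250, 10, 250, 500, 2000]
CODE = 3  # position of the 10-unit bottleneck in the list of activations


def init_params(sizes, rng):
    """Glorot-style random weights and zero biases (an assumed initialisation)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / (m + n)), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return the activations of every layer, starting with the input itself."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        # Assumption: the code layer and the output layer are linear, the rest tanh.
        linear = (i + 1 == CODE) or (i + 1 == len(params))
        acts.append(z if linear else np.tanh(z))
    return acts


def train_step(params, x, lr=1e-4):
    """One step of plain gradient descent on squared reconstruction error.

    Returns the loss measured before the update is applied.
    """
    acts = forward(params, x)
    err = acts[-1] - x
    loss = 0.5 * np.sum(err ** 2) / len(x)
    delta = err / len(x)  # gradient w.r.t. the (linear) output layer
    for i in range(len(params) - 1, -1, -1):
        W, b = params[i]
        grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            delta = delta @ W.T  # gradient w.r.t. acts[i]
            if i != CODE:        # tanh layers need the derivative 1 - a^2
                delta = delta * (1.0 - acts[i] ** 2)
        params[i] = (W - lr * grad_W, b - lr * grad_b)
    return loss


# Synthetic stand-in for real documents: 32 "documents" of 2000 word counts.
counts = rng.poisson(0.5, size=(32, 2000))
x = np.log1p(counts)  # assumed rescaling of the raw counts

params = init_params(SIZES, rng)
losses = [train_step(params, x) for _ in range(50)]

acts = forward(params, x)
codes, recon = acts[CODE], acts[-1]

# Compare documents by the Euclidean distance between their 10-number codes.
dists = np.linalg.norm(codes[:, None, :] - codes[None, :, :], axis=-1)
print(codes.shape, recon.shape, dists.shape)
print(f"reconstruction loss: {losses[0]:.2f} -> {losses[-1]:.2f}")
```

Here `codes[i]` holds the 10 numbers for document `i`, and `dists[i, j]` is one possible way to compare documents `i` and `j`. The step size is deliberately small to keep plain gradient descent stable, so the loss falls slowly. With random counts there is no real structure to compress, so the codes only become meaningful when the network is trained on actual documents, with a better optimiser and far more steps.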