lec2b

How to compress the count vector

Output vector: 2000 reconstructed counts

• We train the neural network to reproduce its input vector as its output.
• This forces it to compress as much information as possible into the 10 numbers in the central bottleneck.
• These 10 numbers are then a good way to compare documents.
  – See Ruslan Salakhutdinov’s talk.
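The slide does not say how the 10-number codes are compared. One common choice, assumed here purely for illustration, is cosine similarity between code vectors. The sketch below uses made-up codes for three hypothetical documents (`doc_a`, `doc_b`, `doc_c`); none of these values come from a trained network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_name, codes):
    """Return the name of the document whose code is closest to the query's code."""
    query = codes[query_name]
    others = [name for name in codes if name != query_name]
    return max(others, key=lambda name: cosine_similarity(query, codes[name]))


# Hypothetical 10-number bottleneck codes (illustrative values only).
codes = {
    "doc_a": [1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 1.0],
    "doc_b": [1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 2.0],
    "doc_c": [0.0, 3.0, 0.0, 2.0, 0.0, 1.0, 4.0, 0.0, 2.0, 0.0],
}

nearest = most_similar("doc_a", codes)
print(nearest)
```

Comparing 10 numbers per document is far cheaper than comparing 2000 word counts, which is the practical appeal of the bottleneck code.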

Network layers, listed from the output side down to the input:
500 neurons
250 neurons
10 (central bottleneck)
250 neurons
500 neurons
Input vector: 2000 word counts
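The layer sizes in the diagram can be written down as a forward pass. This is a minimal sketch in plain Python, not the network from the lecture: the weights are random and untrained, and the choice of logistic hidden units, linear code and output units, and squared reconstruction error are all assumptions, since the slide gives only the layer sizes. Training would adjust the weights to reduce the reconstruction error.

```python
import math
import random

# Layer sizes read off the slide: 2000 -> 500 -> 250 -> 10 -> 250 -> 500 -> 2000.
LAYER_SIZES = [2000, 500, 250, 10, 250, 500, 2000]
BOTTLENECK_INDEX = 3  # position of the 10-unit layer in LAYER_SIZES


def init_layers(sizes, rng):
    """Return one (weights, biases) pair per layer, with small random weights (assumption)."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x):
    """Return the activations of every layer, starting with the input itself."""
    activations = [x]
    for i, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations[-1])) + b
               for row, b in zip(weights, biases)]
        # Assumption: the code layer and the output layer are linear,
        # all other layers use the logistic function.
        is_linear = i in (BOTTLENECK_INDEX - 1, len(layers) - 1)
        if is_linear:
            activations.append(pre)
        else:
            activations.append([1.0 / (1.0 + math.exp(-z)) for z in pre])
    return activations


def squared_error(x, y):
    """Sum of squared differences between input and reconstruction (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


rng = random.Random(0)
layers = init_layers(LAYER_SIZES, rng)
counts = [float(rng.randint(0, 3)) for _ in range(LAYER_SIZES[0])]  # toy word counts

activations = forward(layers, counts)
code = activations[BOTTLENECK_INDEX]   # the 10 numbers in the bottleneck
reconstruction = activations[-1]       # the 2000 reconstructed counts
error = squared_error(counts, reconstruction)
print(len(code), len(reconstruction))
```

With untrained weights the reconstruction is poor; the point of the sketch is only the shape of the computation, in which every document must pass through 10 numbers on its way from input to output.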