• We can use an autoencoder to find low-dimensional codes for documents that allow fast and accurate retrieval of similar documents from a large set.
• We start by converting each document into a “bag of words”. This is a 2000-dimensional vector that contains the counts for each of the 2000 commonest words.
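The bag-of-words step can be sketched in a few lines of Python. This is only an illustration: the tokenizer (lowercase alphabetic runs) and the function names `tokenize`, `build_vocabulary`, and `bag_of_words` are assumptions made here, not something the slide specifies. The vocabulary size of 2000 comes from the slide.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumption: tokens are lowercase runs of letters; the slide does not
    # say how documents are split into words.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, vocab_size=2000):
    # Count every word across the whole collection and keep the
    # vocab_size most frequent ones (fewer if the collection is small).
    totals = Counter()
    for doc in documents:
        totals.update(tokenize(doc))
    return [word for word, _ in totals.most_common(vocab_size)]


def bag_of_words(document, vocabulary):
    # One count per vocabulary word, in vocabulary order.
    # Words outside the vocabulary are ignored; absent words count as 0.
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]
```

With a real collection, `build_vocabulary(documents)` would return up to 2000 words, and `bag_of_words(doc, vocabulary)` would give the count vector that is fed to the autoencoder. Word order in the document is discarded, which is what makes it a "bag".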