Retrieving documents that are similar
to a query document
We can use an autoencoder to find low-
dimensional codes for documents that allow
fast and accurate retrieval of similar
documents from a large set.
We start by converting each document into a
“bag of words”.  This is a 2000-dimensional
vector that contains the counts for each of the
2000 commonest words.
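
As a minimal sketch of the bag-of-words step, the Python below counts how often each vocabulary word occurs in a document. The tokenizer (lowercased runs of letters), the function names, and the two toy documents are assumptions made for illustration and do not come from the original slides. The vocabulary is taken as the most frequent words across the whole document set, capped at 2000.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase the text and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(documents, vocab_size=2000):
    # Count every word across the whole collection, then keep the most common ones.
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(vocab_size)]

def bag_of_words(document, vocabulary):
    # One count per vocabulary word; words outside the vocabulary are ignored.
    counts = Counter(tokenize(document))
    return [counts[word] for word in vocabulary]

# Toy collection (hypothetical data, far smaller than a real corpus).
docs = ["the cat sat on the mat", "the dog chased the cat"]
vocab = build_vocabulary(docs, vocab_size=2000)
vectors = [bag_of_words(d, vocab) for d in docs]
```

With a realistic corpus the vocabulary would reach the full 2000 words and every vector would be 2000-dimensional; in this toy collection there are only seven distinct words, so the vectors are correspondingly shorter.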