 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
| • |
Suppose we could
convert each document into a binary
|
|
|
feature vector
in such a way that similar documents have
|
|
similar feature
vectors.
|
|
|
|
– |
This
creates a “semantic” address space that allows
|
|
|
us
to use the memory bus for retrieval.
|
|
|
| • |
Given a query
document we first use the autoencoder to
|
|
|
compute its
binary address.
|
|
|
|
– |
Then
we fetch all the documents from addresses that
|
|
|
are
within a small radius in hamming space.
|
|
|
|
– |
This
takes constant time. No comparisons are
|
|
|
required
for getting the shortlist of semantically similar
|
|
|
documents.
|
|