 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
We have only
implemented it for a million
|
|
|
documents with
20-bit codes --- but what could
|
|
|
|
possibly go
wrong?
|
|
|
|
– |
A
20-D hypercube allows us to capture enough
|
|
of
the similarity structure of our document set.
|
|
|
• |
The shortlist
found using binary codes actually
|
|
|
improves the
precision-recall curves of TF-IDF.
|
|
|
|
– |
Locality
sensitive hashing (the fastest other
|
|
|
method)
is 50 times slower and has worse
|
|
|
precision-recall
curves.
|
|