the next generation of neural networks


How good is a shortlist found this way?

• We have only implemented it for a million documents with 20-bit codes, but what could possibly go wrong?

– A 20-D hypercube allows us to capture enough of the similarity structure of our document set.

• Using the shortlist found with binary codes as a prefilter for TF-IDF actually improves TF-IDF's precision-recall curves.

– Locality-sensitive hashing (the fastest other method) is 50 times slower and has worse precision-recall curves.
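The slide takes the shortlist lookup for granted. Below is a minimal Python sketch of how such a lookup can work. It assumes that every document already has a 20-bit binary code stored as an integer (how the codes are learned is not shown). Each code serves directly as a memory address. A query's shortlist is every document stored at an address within a small Hamming radius of the query's code. The names `build_table`, `hamming_ball` and `shortlist`, and the default radius of 2, are hypothetical choices for illustration and are not taken from the original work.

```python
from collections import defaultdict
from itertools import combinations

N_BITS = 20  # code length from the slide: addresses are corners of a 20-D hypercube


def build_table(codes):
    """Map each binary code (an int) to the ids of the documents stored at that address."""
    table = defaultdict(list)
    for doc_id, code in enumerate(codes):
        table[code].append(doc_id)
    return table


def hamming_ball(code, radius, n_bits=N_BITS):
    """Yield every address reachable from `code` by flipping at most `radius` bits."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped


def shortlist(table, query_code, radius=2):
    """Collect the documents whose codes lie within `radius` bit flips of the query code."""
    result = []
    for address in hamming_ball(query_code, radius):
        result.extend(table.get(address, ()))  # .get avoids adding empty entries
    return result


if __name__ == "__main__":
    # Hypothetical toy codes: 0 and 1 are within one bit flip of the query 0,
    # 3 is two flips away, and 0xF0000 (bits 16-19 set) is four flips away.
    table = build_table([0, 1, 3, 0xF0000])
    print(shortlist(table, 0, radius=1))  # [0, 1]
```

With 20-bit codes and radius 2, the lookup probes 1 + 20 + 190 = 211 addresses regardless of how many documents are stored. Only the length of the returned shortlist grows with the collection. A dense array with 2^20 slots would also work as the table; the dictionary just keeps the sketch short.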