Performance of the autoencoder at
document retrieval
Train on bags of 2000 words for 400,000 training cases
of business documents.
First train a stack of RBMs. Then fine-tune with
backprop.
Test on a separate 400,000 documents.
Pick one test document as a query. Rank order all the
other test documents by using the cosine of the angle
between codes.
Repeat this using each of the 400,000 test documents
as the query (requires 0.16 trillion comparisons).
Plot the number of retrieved documents against the
proportion of those retrieved that are in the same
hand-labeled class as the query document.
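
The retrieval test above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the code used for the experiment: the 2-dimensional codes, the labels, and the function names (cosine, rank_by_cosine, same_class_proportion) are all made up for the example, and averaging the proportion over all queries is an assumed way of producing one point per number of retrieved documents.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two code vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def rank_by_cosine(codes, query_idx):
    """Indices of all other documents, most similar to the query first."""
    query = codes[query_idx]
    others = [i for i in range(len(codes)) if i != query_idx]
    return sorted(others, key=lambda i: cosine(query, codes[i]), reverse=True)

def same_class_proportion(codes, labels, k):
    """Average over all queries of the fraction of the top-k
    retrieved documents that share the query's label."""
    total = 0.0
    for q in range(len(codes)):
        top_k = rank_by_cosine(codes, q)[:k]
        total += sum(labels[i] == labels[q] for i in top_k) / k
    return total / len(codes)

# Hypothetical toy data: four documents, two classes.
codes = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["a", "a", "b", "b"]

# One (number retrieved, proportion in same class) point per k.
curve = [(k, same_class_proportion(codes, labels, k)) for k in (1, 2, 3)]
```

Each document is compared with every other document once per query, so the cost grows with the square of the test set size: 400,000 × 400,000 = 1.6 × 10^11, the 0.16 trillion comparisons quoted above. At that scale a vectorized or batched implementation would be needed; the nested loops here are only meant to make the procedure explicit.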