Factoring the weight matrices
 t-1            t
Represent each word by a hundred-
dimensional real-valued feature
vector.
This only requires 1.7 million
parameters.
Inference is still very easy.
 Reconstruction is done by
computing the posterior over the
17,000 real-valued points in feature
space for the most recent word.
First use the hidden activities to
predict a point in the space.
Then use a Gaussian around this
point to determine the posterior
probability of each word.
t-2       t-1        t