• Represent each word by a hundred-dimensional real-valued feature vector.
    – This only requires 1.7 million parameters.
• Inference is still very easy.
• Reconstruction is done by computing the posterior over the 17,000 real-valued points in feature space for the most recent word.
    – First use the hidden activities to predict a point in the space.
    – Then use a Gaussian around this point to determine the posterior probability of each word.
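The parameter count and the two-step reconstruction above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation. It assumes the 1.7 million figure is the 17,000 words times 100 features each, and it assumes a spherical Gaussian with a shared width `sigma` and a uniform prior over words. The names `embedding_parameter_count`, `gaussian_word_posterior`, and `sigma` are hypothetical.

```python
import math
import random

VOCAB_SIZE = 17_000   # number of words, as stated on the slide
FEATURE_DIM = 100     # real-valued features per word, as stated on the slide


def embedding_parameter_count(vocab_size, feature_dim):
    """One real-valued parameter per (word, feature dimension) pair."""
    return vocab_size * feature_dim


def gaussian_word_posterior(predicted_point, word_vectors, sigma=1.0):
    """Posterior over words under a spherical Gaussian centred on predicted_point.

    Assumptions (not from the slide): uniform prior over words and a single
    shared standard deviation sigma for every dimension.
    """
    # Log of the unnormalized Gaussian density at each word's feature vector.
    log_scores = []
    for vec in word_vectors:
        sq_dist = sum((v - p) ** 2 for v, p in zip(vec, predicted_point))
        log_scores.append(-sq_dist / (2.0 * sigma ** 2))
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_scores)
    weights = [math.exp(s - max_log) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(embedding_parameter_count(VOCAB_SIZE, FEATURE_DIM))  # 1700000

    # Toy vocabulary: 8 words with 5 features each, instead of 17,000 x 100.
    rng = random.Random(0)
    toy_vectors = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(8)]
    # Stand-in for the point predicted from the hidden activities.
    predicted = [x + 0.01 for x in toy_vectors[3]]
    posterior = gaussian_word_posterior(predicted, toy_vectors, sigma=0.5)
    print(max(range(len(posterior)), key=posterior.__getitem__))
```

With the uniform prior assumed here, the posterior reduces to a softmax over negative squared distances, so words whose feature vectors lie nearest the predicted point receive the most probability.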