lecture 25

Factoring the weight matrices

t-1 t

•

Represent each word by a hundred-

dimensional real-valued feature

vector.

–

This only requires 1.7 million

parameters.

•

Inference is still very easy.

•

Reconstruction is done by

computing the posterior over the

17,000 real-valued points in feature

space for the most recent word.

–

First use the hidden activities to

predict a point in the space.

–

Then use a Gaussian around this

point to determine the posterior

probability of each word.

t-2 t-1 t