Use the previous hidden state to transmit hundreds of bits of long-range semantic information (don't try this with an HMM).
The hidden states are only trained to help model the current word, but this causes them to contain lots of useful semantic information.
Optimize the CRBM to predict the conditional probability distribution for the most recent word.
With 17,000 words and 1000 hidden units, this requires 52,000,000 parameters.
The corresponding autoregressive model requires 578,000,000 parameters.
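The slide does not say how these two counts break down. One decomposition that reproduces both figures exactly is sketched below. It assumes a context of two previous words, full word-to-hidden weight matrices for the current word and for each context word, one hidden-to-hidden matrix from the previous hidden state, and no bias terms. The context length and the function names are assumptions made for illustration, not details taken from the source.

```python
VOCAB = 17_000       # vocabulary size, from the slide
HIDDEN = 1_000       # number of hidden units, from the slide
CONTEXT_WORDS = 2    # ASSUMPTION: not stated in the source


def crbm_params(vocab, hidden, context_words):
    """Weight count for the assumed CRBM layout (biases ignored)."""
    current_word_to_hidden = vocab * hidden
    context_words_to_hidden = context_words * vocab * hidden
    prev_hidden_to_hidden = hidden * hidden
    return (current_word_to_hidden
            + context_words_to_hidden
            + prev_hidden_to_hidden)


def autoregressive_params(vocab, context_words):
    """Weight count when each context word connects directly to every
    possible current word (biases ignored)."""
    return context_words * vocab * vocab


print(crbm_params(VOCAB, HIDDEN, CONTEXT_WORDS))     # 52000000
print(autoregressive_params(VOCAB, CONTEXT_WORDS))   # 578000000
```

Under these assumptions the hidden layer acts as a bottleneck: the word-to-word cost grows with the square of the vocabulary in the autoregressive model, but only linearly in the vocabulary when routed through 1000 hidden units.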
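As a rough illustration of what "the conditional probability distribution for the most recent word" could look like, the sketch below scores every candidate word by its negative free energy and normalizes with a softmax. The source gives no energy function, so this assumes the textbook setup: binary hidden units, a one-of-N (softmax) visible unit for the word, and conditioning only through the previous hidden state, which shifts the hidden biases. All names (`W`, `b`, `c`, `U`, `word_distribution`) are hypothetical.

```python
import math
import random


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def word_distribution(W, b, c, U, prev_hidden):
    """P(word | previous hidden state) under the assumed CRBM.

    W[w][j]: weight between word w and hidden unit j
    b[w]:    bias of word w
    c[j]:    static bias of hidden unit j
    U[i][j]: weight from previous hidden unit i to hidden unit j
    """
    n_words, n_hidden = len(W), len(c)
    # Dynamic hidden biases: static bias plus input from the previous state.
    dyn = [c[j] + sum(U[i][j] * prev_hidden[i] for i in range(len(prev_hidden)))
           for j in range(n_hidden)]
    # Negative free energy of each candidate word (hidden units summed out).
    neg_f = [b[w] + sum(softplus(dyn[j] + W[w][j]) for j in range(n_hidden))
             for w in range(n_words)]
    # Softmax over words, shifted by the maximum for stability.
    m = max(neg_f)
    exps = [math.exp(v - m) for v in neg_f]
    z = sum(exps)
    return [e / z for e in exps]


# Tiny toy sizes; the slide's model would use 17,000 words and 1000 hiddens.
random.seed(0)
n_words, n_hidden = 5, 3
W = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_words)]
U = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_hidden)]
b = [0.0] * n_words
c = [0.0] * n_hidden
prev_hidden = [1.0, 0.0, 1.0]

probs = word_distribution(W, b, c, U, prev_hidden)
print(probs)
```

Because the word is a single one-of-N unit, the normalization runs over the vocabulary only, so the distribution can be computed exactly by evaluating each word once.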