• We eliminate the noise model for each data component, and we use the same number of factors as data components.
• Given the weight matrix, there is now a one-to-one mapping between data vectors and hidden activity vectors.
• To make the data probable we want two things:
  – The hidden activity vectors that correspond to data vectors should have high prior probabilities.
  – The mapping from hidden activities to data vectors should compress the hidden density to get high density in the data space, i.e. the matrix that maps hidden activities to data vectors should have a small determinant. Equivalently, its inverse should have a big determinant.
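
The two requirements above can be read off the change-of-variables formula for a square, invertible weight matrix: if x = W s, then log p(x) = log p(s) - log |det W|, with s = W^{-1} x. The following is a minimal sketch of that calculation, not the lecture's own code. The function names, the row-per-case layout, and the choice of an independent unit Laplace prior on the hidden units are all assumptions made for illustration.

```python
import numpy as np

def log_prior(s):
    # Assumed prior (not from the slides): independent unit Laplace
    # on each hidden unit, p(s_i) = 0.5 * exp(-|s_i|).
    return np.sum(-np.abs(s) - np.log(2.0), axis=-1)

def hidden_from_data(x, W):
    # With no noise model and a square invertible W, x = W s has the
    # unique solution s = W^{-1} x. Rows of x are data vectors.
    return np.linalg.solve(W, x.T).T

def log_density(x, W):
    # Change of variables: log p(x) = log p(s) - log|det W|.
    s = hidden_from_data(x, W)
    _, logabsdet = np.linalg.slogdet(W)
    return log_prior(s) - logabsdet

# Two candidate weight matrices evaluated on the same data vector.
x = np.zeros((1, 2))
W_identity = np.eye(2)
W_small = np.diag([0.5, 0.5])  # smaller determinant

print(log_density(x, W_identity))
print(log_density(x, W_small))
```

In this sketch the first term rewards hidden vectors with high prior probability and the second rewards a small |det W|. The two pull against each other: shrinking W raises the second term but pushes s = W^{-1} x further out, where the prior is lower, unless the data vector is close to the origin as in the example.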