
















• We eliminate the noise model for each data component, and we use the same number of factors as data components.



• Given the weight matrix, there is now a one-to-one mapping between data vectors and hidden activity vectors.



• To make the data probable we want two things:




– The hidden activity vectors that correspond to data vectors should have high prior probabilities.




– The mapping from hidden activities to data vectors should compress the hidden density to get high density in the data space, i.e. the matrix that maps hidden activities to data vectors should have a small determinant. Its inverse should have a large determinant.
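
The two requirements above can be checked numerically with the standard change-of-variables rule for an invertible linear map: if x = A h and W is the inverse of A, then log p(x) = log p(h) + log |det W|. The sketch below is only an illustration under stated assumptions: the names A, W, h, x and the function log_density_data are hypothetical, and the independent unit Laplace prior on the hidden activities is an assumed choice, since the text does not say which prior is used.

```python
import numpy as np


def log_density_data(x, A):
    """Log density of data vector x when x = A @ h with no noise model.

    Assumes (for illustration only) an independent unit Laplace prior
    on each hidden activity: p(h_i) = 0.5 * exp(-|h_i|).
    """
    W = np.linalg.inv(A)                       # maps data back to hidden activities
    h = W @ x                                  # the unique hidden vector for this x
    log_prior = np.sum(-np.abs(h) - np.log(2.0))
    _, logabsdet_W = np.linalg.slogdet(W)      # log |det W| = -log |det A|
    return log_prior + logabsdet_W


# Hypothetical 2-D example: same hidden vector, two mixing matrices.
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
A_small = 0.5 * A                              # same mapping, shrunk by a factor of 2
h = np.array([0.3, -1.2])

# Density at the image of h under each mapping. The prior term is identical,
# so the difference comes only from the determinant term.
gain = log_density_data(A_small @ h, A_small) - log_density_data(A @ h, A)
print(gain)                                    # equals n * log(2) with n = 2 dimensions
```

Shrinking the mapping raises the density at the image of a fixed hidden vector, which is the second requirement. For a fixed data vector, however, a smaller A means a larger hidden vector h = W x and so a lower prior probability, which is why the two requirements have to be balanced against each other.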

