The square, noise-free case

• We eliminate the noise model for each data component, and we use the same number of factors as data components.

• Given the weight matrix, there is now a one-to-one mapping between data vectors and hidden activity vectors (assuming the matrix is invertible).

• To make the data probable we want two things:

  – The hidden activity vectors that correspond to data vectors should have high prior probabilities.

  – The mapping from hidden activities to data vectors should compress the hidden density to get high density in the data space, i.e. the matrix that maps hidden activities to data vectors should have a determinant of small magnitude. Equivalently, its inverse should have a determinant of large magnitude.
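
The two requirements above can be seen in the change-of-variables formula for a square, noise-free model: if x = A h with A invertible, then log p(x) = log p(h) + log |det A^-1|. The following is a minimal sketch of that formula, not the lecture's own code. The names `A`, `log_density_data`, and `log_laplace_prior` are hypothetical, and the Laplace prior is an assumption chosen only for illustration, since the slide does not specify a prior.

```python
import numpy as np


def log_laplace_prior(h):
    """Log density of independent unit-scale Laplace factors (assumed prior)."""
    return float(np.sum(-np.abs(h) - np.log(2.0)))


def log_density_data(x, A, log_prior):
    """Log p(x) for x = A @ h, with A square and invertible and no noise.

    A maps hidden activities to data vectors (hypothetical name).
    """
    W = np.linalg.inv(A)                    # maps data back to hidden activities
    h = W @ x                               # the unique hidden vector for x
    _, logabsdet = np.linalg.slogdet(W)     # log |det A^-1|
    # Term 1: prior probability of the hidden vector.
    # Term 2: how much the mapping compresses (or expands) the density.
    return log_prior(h) + logabsdet


# Same hidden vector, two mappings: the one with the smaller |det A|
# squeezes the hidden density into less volume, so p(x) is higher.
h = np.array([0.2, 0.2])
A_small = 0.5 * np.eye(2)
A_big = 2.0 * np.eye(2)
lp_small = log_density_data(A_small @ h, A_small, log_laplace_prior)
lp_big = log_density_data(A_big @ h, A_big, log_laplace_prior)
```

Here `lp_small` exceeds `lp_big` even though the prior term is identical in both cases, because only the determinant term differs. `np.linalg.slogdet` is used rather than `np.log(np.linalg.det(...))` so that the absolute value is handled directly and very small or very large determinants do not underflow or overflow.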