A big difference in the behaviour of FA and PCA
Suppose we have data in which dimensions A and B
have very small variance but very high correlation and
dimension C has high variance but no correlation with
the other dimensions.
With only one factor, factor analysis will choose to
represent what is common to A and B.
It wouldn’t save anything by representing C with its
factor because it still has to communicate it under a
Gaussian.
With only one factor, PCA will represent C.
It can send the factor value for free.
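
The contrast can be checked numerically with a small sketch. It does not fit
anything by EM. It writes down a covariance matrix of the kind described
(the values 0.01, 0.9 and 100 are illustrative assumptions), takes the
leading eigenvector as the single principal component, and compares two
hand-built one-factor FA models by their expected Gaussian log-likelihood.
The names expected_loglik, w_ab and w_c are made up for this illustration.

```python
import numpy as np

# Assumed illustrative numbers: A and B have tiny variance but correlation 0.9,
# C has large variance and is uncorrelated with A and B.
var_ab, corr_ab, var_c = 0.01, 0.9, 100.0
cov_ab = corr_ab * var_ab
S = np.array([[var_ab, cov_ab, 0.0],
              [cov_ab, var_ab, 0.0],
              [0.0,    0.0,    var_c]])

# PCA with one component: the eigenvector of S with the largest eigenvalue.
# np.linalg.eigh returns eigenvalues in ascending order, so take the last column.
eigvals, eigvecs = np.linalg.eigh(S)
pca_direction = eigvecs[:, -1]

def expected_loglik(model_cov, data_cov):
    """Expected per-case log-likelihood of zero-mean data with covariance
    data_cov under a zero-mean Gaussian with covariance model_cov."""
    d = data_cov.shape[0]
    _, logdet = np.linalg.slogdet(model_cov)
    trace = np.trace(np.linalg.solve(model_cov, data_cov))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + trace)

# One-factor FA model: covariance = w w^T + diag(psi).
# Candidate 1: the factor loads on A and B and explains their shared covariance.
w_ab = np.array([np.sqrt(cov_ab), np.sqrt(cov_ab), 0.0])
psi_ab = np.diag(S) - w_ab ** 2
model_ab = np.outer(w_ab, w_ab) + np.diag(psi_ab)

# Candidate 2: the factor loads only on C. The noise variances then have to
# absorb everything else, so the A-B covariance is left unmodelled.
w_c = np.array([0.0, 0.0, np.sqrt(0.5 * var_c)])
psi_c = np.diag(S) - w_c ** 2
model_c = np.outer(w_c, w_c) + np.diag(psi_c)

loglik_ab = expected_loglik(model_ab, S)
loglik_c = expected_loglik(model_c, S)

print("PCA direction (|loadings| on A, B, C):", np.abs(pca_direction))
print("FA, factor on A and B, expected log-lik:", loglik_ab)
print("FA, factor on C,       expected log-lik:", loglik_c)
```

Under these assumptions the principal component points along C, while the
FA model whose factor loads on A and B reproduces the covariance exactly
and scores higher than the one whose factor loads on C. Spending the factor
on C gains nothing, because the per-dimension noise variance can already
account for all of C's variance.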