Self-supervised backprop in a linear network
If the hidden and output layers are linear, the
network will learn hidden units that are a linear
function of the data and minimize the squared
reconstruction error.
This is exactly what Principal Components
Analysis does.
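The objective can be written out as a short sketch. The notation is an assumption, not taken from the slide: x_n is a zero-mean data vector of dimension D, W_1 (M x D) holds the input-to-hidden weights, W_2 (D x M) holds the hidden-to-output weights, and U_M holds the top M eigenvectors of the data covariance as columns.

```latex
E(W_1, W_2) = \sum_{n} \left\lVert x_n - W_2 W_1 x_n \right\rVert^2 ,
\qquad W_1 \in \mathbb{R}^{M \times D},\; W_2 \in \mathbb{R}^{D \times M}

% Assuming the M-th and (M+1)-th eigenvalues of the covariance differ,
% the minimum over all such rank-M maps is the PCA projection:
W_2 W_1 = U_M U_M^{\top}

% The factorization is not unique: for any invertible M x M matrix A,
W_1 = A\, U_M^{\top}, \qquad W_2 = U_M A^{-1}
% gives the same product and therefore the same error.
```

The freedom in A is why the network recovers the principal subspace rather than the individual principal components.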
The M hidden units will span the same space as
the first M principal components found by PCA,
but need not coincide with the components themselves:
  Their weight vectors may not be orthogonal.
  They will tend to have equal variances.
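The subspace claim can be checked numerically. The following is a minimal sketch, not the lecture's own code: it assumes synthetic zero-mean Gaussian data, plain full-batch gradient descent, no biases, and hypothetical names (W_enc, W_dec, cosines) chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): N points in D dimensions whose variance
# differs along D randomly rotated axes. M is the number of hidden units.
N, D, M = 500, 5, 2
scales = np.array([3.0, 2.0, 1.0, 0.8, 0.6])
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))      # random rotation
X = (rng.normal(size=(N, D)) * scales) @ Q.T
X -= X.mean(axis=0)                               # PCA assumes centered data

# Linear autoencoder: hidden = X @ W_enc, reconstruction = hidden @ W_dec.
W_enc = rng.normal(scale=0.1, size=(D, M))
W_dec = rng.normal(scale=0.1, size=(M, D))

lr = 0.01
for _ in range(5000):
    H = X @ W_enc                                 # hidden activities
    R = H @ W_dec - X                             # reconstruction residual
    # Gradients of (1 / 2N) * sum of squared residuals.
    grad_dec = H.T @ R / N
    grad_enc = X.T @ (R @ W_dec.T) / N
    W_enc -= lr * grad_enc
    W_dec -= lr * grad_dec

# PCA via the eigendecomposition of the sample covariance.
evals, evecs = np.linalg.eigh(X.T @ X / N)        # eigenvalues ascending
U = evecs[:, -M:]                                 # top-M principal directions

# Cosines of the principal angles between the two M-dimensional subspaces;
# values near 1 mean the subspaces coincide.
Q_enc, _ = np.linalg.qr(W_enc)
cosines = np.linalg.svd(Q_enc.T @ U, compute_uv=False)

# Mean squared reconstruction error per data point for both methods.
err_ae = np.sum((X @ W_enc @ W_dec - X) ** 2) / N
err_pca = evals[:-M].sum()                        # discarded variance

print("principal-angle cosines:", cosines)
print("autoencoder error:", err_ae, "PCA error:", err_pca)
print("Gram matrix of hidden weight vectors:\n", W_enc.T @ W_enc)
print("hidden unit variances:", (X @ W_enc).var(axis=0))
```

The last two printed quantities show how far the learned weight vectors are from orthogonal and how the variance is shared between the hidden units. Both depend on the initialization and the training procedure, so this sketch reports them without asserting particular values.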