• If the hidden and output layers are linear, it will learn hidden units that are a linear function of the data and minimize the squared reconstruction error.
• The m hidden units will span the same space as the first m principal components
  – Their weight vectors may not be orthogonal
  – They will tend to have equal variances
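The claim above can be checked numerically. Below is a minimal sketch (an illustration assumed here, not from the original slides) using plain NumPy: a linear autoencoder is trained by gradient descent on the squared reconstruction error, and its m hidden units are compared against the first m principal components by comparing the orthogonal projections onto the two subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data lying mostly in a 2-D subspace of R^5.
n, d, m = 500, 5, 2
basis = rng.normal(size=(d, m))
X = rng.normal(size=(n, m)) @ basis.T + 0.01 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# First m principal components via SVD.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pcs = Vt[:m].T                                  # (d, m)

# Linear autoencoder: hidden code = X @ W_enc, reconstruction = code @ W_dec.
W_enc = rng.normal(scale=0.1, size=(d, m))
W_dec = rng.normal(scale=0.1, size=(m, d))
lr = 0.01
for _ in range(2000):
    code = X @ W_enc
    err = code @ W_dec - X                      # gradient of 0.5 * ||err||^2
    W_dec -= lr * (code.T @ err) / n
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / n

# Compare subspaces through their orthogonal projection matrices:
# equal spans give identical projections, regardless of the basis chosen.
def proj(A):
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

gap = np.linalg.norm(proj(pcs) - proj(W_enc))
print(f"subspace gap: {gap:.4f}")               # near 0: same span as PCA

# The learned encoder columns are generally NOT orthogonal, unlike the PCs:
print(W_enc.T @ W_enc)
```

Note that the comparison is between subspaces, not individual weight vectors: any invertible mixing of the hidden units leaves the reconstruction unchanged, which is exactly why the learned vectors need not be orthogonal even though their span matches the principal subspace.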