Why do we whiten data?
Images typically have strong pair-wise correlations
between the intensities of nearby pixels.
Learning higher order statistics is difficult when there are
strong pair-wise correlations.
Small changes in parameter values that improve the
modeling of higher order statistics may be rejected
because they form a slightly worse model of the much
stronger pair-wise statistics.
So we often remove the second-order statistics (the
pair-wise correlations) before trying to learn the
higher-order statistics. This removal is called whitening.
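
As an illustration of removing second-order statistics, here is a minimal sketch of one common whitening transform (ZCA whitening) in NumPy. The slide does not say which transform is intended, so the choice of ZCA, the function name `whiten`, the `eps` regularizer, and the toy data are all assumptions made for this example.

```python
import numpy as np

def whiten(X, eps=1e-5):
    """ZCA-whiten the rows of X, shaped (n_samples, n_features).

    Hypothetical helper for illustration; eps is an assumed small
    constant that avoids dividing by near-zero eigenvalues.
    """
    X = X - X.mean(axis=0)                    # remove the mean of each feature
    cov = X.T @ X / X.shape[0]                # pair-wise (second-order) statistics
    eigvals, eigvecs = np.linalg.eigh(cov)    # cov is symmetric, so eigh applies
    # Rotate to the eigenbasis, rescale each direction to unit variance,
    # then rotate back so the result stays close to the original axes.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X @ W

# Toy data with strong pair-wise correlations (assumed mixing matrix A).
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [1.5, 1.0, 0.0],
              [0.5, 0.3, 0.2]])
X = rng.standard_normal((5000, 3)) @ A.T

Xw = whiten(X)
cov_w = Xw.T @ Xw / Xw.shape[0]   # approximately the identity matrix
```

After whitening, the covariance of the data is approximately the identity, so the pair-wise correlations no longer dominate and whatever structure remains is in the higher-order statistics.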