Combining networks reduces variance
We want to compare two expected squared errors
Method 1: Pick one of the predictors at random
Method 2: Use the average of the predictors, ȳ
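The comparison can be written out explicitly. As a sketch, assume d is the target, y_i is the output of predictor i, N is the number of predictors, and ⟨·⟩_i denotes an average over predictors. These symbols are assumptions introduced here for illustration. Expanding the expected squared error of Method 1 around the average prediction gives:

```latex
\bar{y} = \langle y_i \rangle_i = \frac{1}{N} \sum_{i=1}^{N} y_i

\begin{aligned}
\langle (d - y_i)^2 \rangle_i
  &= \langle \bigl( (d - \bar{y}) - (y_i - \bar{y}) \bigr)^2 \rangle_i \\
  &= (d - \bar{y})^2
     + \langle (y_i - \bar{y})^2 \rangle_i
     - 2 (d - \bar{y}) \langle y_i - \bar{y} \rangle_i
\end{aligned}
```

The first term is the squared error of Method 2, the second is the variance of the predictors around their average, and the last is the cross term.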
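The cross term in the expanded squared error vanishes, since the predictors' deviations from their average sum to zero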
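With the cross term gone, the expected squared error of Method 1 equals the squared error of Method 2 plus the variance of the predictors, so averaging can never do worse in expectation. A minimal numerical check, assuming scalar predictions and a single target (the function name `compare_methods` and the sample data are hypothetical, chosen only for illustration):

```python
import numpy as np


def compare_methods(predictions, target):
    """Compare picking a random predictor with using the average prediction.

    Returns the expected squared error of Method 1, the squared error of
    Method 2, the variance of the predictors, and the cross term.
    """
    y = np.asarray(predictions, dtype=float)
    y_bar = y.mean()

    # Method 1: expected squared error when one predictor is picked uniformly at random.
    err_random = np.mean((target - y) ** 2)
    # Method 2: squared error of the averaged prediction.
    err_average = (target - y_bar) ** 2
    # Spread of the individual predictors around their average.
    variance = np.mean((y - y_bar) ** 2)
    # Cross term of the expansion; zero up to floating-point rounding.
    cross_term = -2.0 * (target - y_bar) * np.mean(y - y_bar)

    return err_random, err_average, variance, cross_term


# Illustrative data only: ten noisy predictions of a target value of 1.2.
rng = np.random.default_rng(0)
preds = rng.normal(loc=1.0, scale=0.5, size=10)
err_random, err_average, variance, cross_term = compare_methods(preds, target=1.2)

print("Method 1 (random predictor):", err_random)
print("Method 2 (average):         ", err_average)
print("Variance of predictors:     ", variance)
print("Cross term:                 ", cross_term)
```

The gap between the two errors is exactly the variance of the predictors, so the benefit of averaging grows as the predictors disagree more.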