Review: Sum of Squared Errors

Suppose we are trying to predict y using x, with the predictions being of the form \(y^{(i)} \approx a_0 + a_1 x^{(i)}\). We can measure how good a job we are doing at predicting the y’s using the x’s by computing the Sum of Squared Errors:

\[SSE = \sum_{i = 1}^{m} (y^{(i)} - \hat{y}^{(i)})^2,\] where \(\hat{y}^{(i)} = a_0 + a_1 x^{(i)}\) is the prediction for row \(i\).
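To make the formula concrete, here is a minimal Python sketch; the data and the coefficients \(a_0, a_1\) are made up purely for illustration:

```python
# Made-up dataset and made-up coefficients a0, a1, just to illustrate the formula.
x = [1.0, 2.0, 3.0, 4.0]   # the x^(i) values
y = [2.1, 3.9, 6.2, 7.8]   # the true y^(i) values
a0, a1 = 0.0, 2.0          # assumed intercept and slope

y_hat = [a0 + a1 * xi for xi in x]                       # prediction for each row
sse = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # Sum of Squared Errors
print(sse)
```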

A smaller SSE is better – ideally, the predictions match the true data perfectly. However, the SSE grows as we add more rows to predict. That is not ideal – we want a measure that tells us how well we’re predicting, not how large our dataset is.

A solution is to compute the Mean Squared Error:

\[MSE = SSE/m.\]

The MSE won’t grow if we just add more data, since it’s the average squared discrepancy between the prediction and the correct y.
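Continuing the hypothetical snippet above, the MSE is simply that SSE divided by the number of rows:

```python
# Continuing the sketch above (uses sse and y from that example).
m = len(y)      # number of rows
mse = sse / m   # average squared discrepancy per row
print(mse)
```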

We can go one step further and compute the Root-Mean-Squared Error. This gives us a number on the same scale as the y’s. With some fancy math, it can be shown that in some situations, if the RMSE is \(\sigma\), our prediction won’t be off by more than \(2\sigma\) 95% of the time. (But more on that later.)

The RMSE is

\[RMSE = \sqrt{MSE} = \sqrt{\frac{1}{m}\sum_{i = 1}^{m} (y^{(i)} - \hat{y}^{(i)})^2},\] where \(m\) is the number of rows.
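And the RMSE is the square root of that MSE; continuing the same sketch:

```python
import math

# Continuing the sketch above (uses mse from it); the result is on the same scale as y.
rmse = math.sqrt(mse)
print(rmse)
```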