Lecture 2

The loss function

• Fitting a model to data is typically done by finding the parameter values that minimize some loss function.

• There are many possible loss functions. What criterion should we use for choosing one?

  – Choose one that makes the math easy (squared error)

  – Choose one that makes the fitting correspond to maximizing the likelihood of the training data given some noise model for the observed outputs.

  – Choose one that makes it easy to interpret the learned coefficients (easy if mostly zeros)

  – Choose one that corresponds to the real loss on a practical application (losses are often asymmetric)
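
To make the effect of the choice concrete, here is a minimal sketch in Python that fits the simplest possible model, a single constant prediction, to the same data under three different losses. Everything in it is an illustrative assumption rather than part of the lecture: the function names, the toy data, the 3:1 cost ratio for the asymmetric loss, and the brute-force grid search used in place of a real optimizer.

```python
# Sketch: fit one constant prediction to the same data under three losses.
# All names, the toy data, and the cost ratio are hypothetical illustrations.

def squared_loss(residual):
    # Easy math: the minimizer is the mean of the targets.
    return residual ** 2

def absolute_loss(residual):
    # Less sensitive to outliers: the minimizer is the median.
    return abs(residual)

def asymmetric_loss(residual, under_cost=3.0, over_cost=1.0):
    # residual = target - prediction, so residual > 0 means under-prediction.
    # Assumed here: under-predicting costs three times as much as over-predicting.
    if residual > 0:
        return under_cost * residual
    return -over_cost * residual

def total_loss(loss, prediction, targets):
    # Sum of per-example losses for one candidate parameter value.
    return sum(loss(t - prediction) for t in targets)

def fit_constant(loss, targets, steps=1400):
    # Brute-force search over evenly spaced candidates between min and max.
    lo, hi = min(targets), max(targets)
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda c: total_loss(loss, c, targets))

targets = [1.0, 2.0, 3.0, 4.0, 15.0]  # toy data with one large value

fit_sq = fit_constant(squared_loss, targets)
fit_abs = fit_constant(absolute_loss, targets)
fit_asym = fit_constant(asymmetric_loss, targets)

print("squared error fit:  ", round(fit_sq, 2))
print("absolute error fit: ", round(fit_abs, 2))
print("asymmetric loss fit:", round(fit_asym, 2))
```

On this toy data the squared-error fit lands at the mean (5.0), the absolute-error fit at the median (3.0), and the asymmetric fit at 4.0, above the median because under-prediction is penalized more heavily. The same data gives three different "best" parameters, which is why the criterion for choosing the loss matters. In terms of the likelihood criterion, minimizing squared error is equivalent to maximum likelihood under Gaussian noise on the outputs, and minimizing absolute error to maximum likelihood under Laplace noise.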