CSC2515 Fall 2007
Introduction to Machine Learning

Lecture 2: Linear regression

Linear models

Some types of basis function in 1-D
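A minimal sketch of three common 1-D basis functions (polynomial, Gaussian, sigmoid), assuming the hypothetical parameter names `mu` for a centre and `s` for a width. The model stays linear in the weights whichever basis is used; only the feature vector changes.

```python
import math

def polynomial_basis(x, j):
    """phi_j(x) = x**j: a global basis, since changing x affects every phi_j."""
    return x ** j

def gaussian_basis(x, mu, s):
    """phi(x) = exp(-(x - mu)^2 / (2 s^2)): a local bump centred at mu (hypothetical names)."""
    return math.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoid_basis(x, mu, s):
    """phi(x) = 1 / (1 + exp(-(x - mu) / s)): a soft step centred at mu."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def design_row(x, centres, s):
    """Feature vector for one input: a constant bias feature plus one Gaussian per centre."""
    return [1.0] + [gaussian_basis(x, mu, s) for mu in centres]

# Usage: features for x = 0.5 with Gaussians centred at 0, 0.5 and 1.
row = design_row(0.5, [0.0, 0.5, 1.0], 0.25)
```

Stacking one such row per training case gives the design matrix used by the fitting methods in this lecture.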

Two types of linear model that are equivalent with respect to learning

The loss function

Minimizing squared error
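A minimal sketch of minimizing squared error through the normal equations, (Φ^T Φ) w = Φ^T t, written with plain Python lists so it is self-contained. The helper names `solve` and `least_squares` are hypothetical; in practice a numerical library's QR- or SVD-based least-squares solver is preferred, because forming Φ^T Φ explicitly squares the condition number.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A square, non-singular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def least_squares(Phi, t):
    """Minimise sum_n (t_n - w . phi_n)^2 by solving (Phi^T Phi) w = Phi^T t."""
    d = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * tn for row, tn in zip(Phi, t)) for i in range(d)]
    return solve(A, b)

# Usage: noise-free targets t = 1 + 2x with features phi(x) = [1, x].
Phi = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0)]
t = [1.0, 3.0, 5.0, 7.0]
w = least_squares(Phi, t)
```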

A geometrical view of the solution

When is minimizing the squared error equivalent to Maximum Likelihood Learning?
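One standard answer, sketched under the assumption that each target is the linear model's output plus independent Gaussian noise with precision β (variance 1/β; the symbol β is introduced here). The log-likelihood of N targets is then:

```latex
\ln p(\mathbf{t} \mid \mathbf{w}, \beta)
  = \sum_{n=1}^{N} \ln \mathcal{N}\!\left(t_n \,\middle|\, \mathbf{w}^{\top}\boldsymbol{\phi}(x_n),\, \beta^{-1}\right)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \left(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(x_n)\right)^2
    + \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi)
```

Only the first term depends on w, so maximizing the likelihood over w is the same as minimizing the sum of squared errors. The equivalence relies on the Gaussian noise assumption.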

Multiple outputs

Least mean squares: An alternative approach for really big datasets
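A minimal sketch of the least mean squares (LMS) rule, which updates the weights after each training case with w ← w + η (t − w·φ) φ, so it never forms or inverts a matrix and needs memory only for the weights. The learning rate `eta` and the number of passes `epochs` are illustrative assumptions, not values from the lecture.

```python
def lms(data, n_features, eta=0.05, epochs=500):
    """Least mean squares: online gradient descent on the squared error.

    data: list of (phi, t) pairs, where phi is a list of n_features basis-function values.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for phi, t in data:
            err = t - sum(wi * pi for wi, pi in zip(w, phi))     # error on this case
            w = [wi + eta * err * pi for wi, pi in zip(w, phi)]  # step down the gradient
    return w

# Usage: noise-free targets t = 1 + 2x with features phi(x) = [1, x].
data = [([1.0, x], 1.0 + 2.0 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w = lms(data, n_features=2)
```

If `eta` is too large relative to the squared length of the feature vectors, the updates diverge; if it is too small, convergence is slow.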

Regularized least squares
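A minimal sketch of regularized (ridge) least squares, assuming the objective ½ Σ_n (t_n − w·φ_n)² + ½ λ ‖w‖², whose minimizer solves (λI + Φ^T Φ) w = Φ^T t. The function names are hypothetical, and for simplicity this sketch penalizes every weight, including the bias weight.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A square, non-singular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def ridge(Phi, t, lam):
    """Minimise 0.5*sum_n (t_n - w . phi_n)^2 + 0.5*lam*|w|^2 via (lam*I + Phi^T Phi) w = Phi^T t."""
    d = len(Phi[0])
    A = [[sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(row[i] * tn for row, tn in zip(Phi, t)) for i in range(d)]
    return solve(A, b)

def sq_norm(w):
    return sum(wi * wi for wi in w)

# Usage: larger lam shrinks the weights towards zero.
Phi = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0)]
t = [1.0, 3.0, 5.0, 7.0]
w0, w1, w1000 = ridge(Phi, t, 0.0), ridge(Phi, t, 1.0), ridge(Phi, t, 1000.0)
```

With λ = 0 this reduces to ordinary least squares; adding λ to the diagonal also keeps the system well conditioned when Φ^T Φ is nearly singular.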

A picture of the effect of the regularizer

A problem with the regularizer

Why does shrinkage help?

Why shrinkage helps

Other regularizers

The lasso: penalizing the absolute values of the weights
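A one-weight sketch of why the lasso tends to set weights exactly to zero, assuming a single weight whose unregularized least-squares value is `a` (a hypothetical simplification). Penalizing |w| gives soft thresholding, while penalizing w² only rescales the weight.

```python
def lasso_1d(a, lam):
    """argmin_w 0.5*(w - a)**2 + lam*|w|: soft thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0  # small weights are set exactly to zero

def ridge_1d(a, lam):
    """argmin_w 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage, never exactly zero."""
    return a / (1.0 + lam)

# Usage: compare the two penalties on a small and a large weight.
small_lasso, small_ridge = lasso_1d(0.3, 0.5), ridge_1d(0.3, 0.5)
large_lasso, large_ridge = lasso_1d(2.0, 0.5), ridge_1d(2.0, 0.5)
```

The absolute-value penalty has a constant-magnitude gradient, so it pushes small weights all the way to zero; the squared penalty's gradient vanishes near zero, so it only shrinks them.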

A geometrical view of the lasso compared with a penalty on the squared weights

An example where minimizing the squared error gives terrible estimates

One-dimensional cross-sections of loss functions with different powers

Minimizing the absolute error
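A minimal sketch of the difference between the two losses for the simplest model, a single constant prediction c. The constant that minimizes squared error is the mean of the targets; the one that minimizes absolute error is the median. The target values below are hypothetical, chosen to include one gross outlier.

```python
import statistics

def squared_error(c, ts):
    return sum((t - c) ** 2 for t in ts)

def absolute_error(c, ts):
    return sum(abs(t - c) for t in ts)

targets = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]  # hypothetical data, one outlier at 50
mean_fit = statistics.mean(targets)      # minimises squared error; dragged towards 50
median_fit = statistics.median(targets)  # minimises absolute error; stays near 1
```

The single outlier moves the squared-error fit above 9, while the absolute-error fit stays near 1, because the absolute loss grows only linearly with the size of a residual.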

The bias-variance trade-off
(a figment of the frequentists' lack of imagination?)

The bias-variance decomposition
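A sketch of the decomposition at a single input x, assuming h(x) = E[t | x] is the best possible prediction and y(x; D) is the prediction of a model trained on data set D; both symbols are introduced here. Averaging over data sets D of a fixed size:

```latex
\mathbb{E}_D\!\left[\left(y(x;D) - h(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}_D[y(x;D)] - h(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(y(x;D) - \mathbb{E}_D[y(x;D)]\right)^2\right]}_{\text{variance}}
```

The cross term vanishes because E_D[y(x; D) − E_D[y(x; D)]] = 0. Adding the irreducible noise E[(t − h(x))²] gives expected squared loss = bias² + variance + noise.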

How the regularization parameter affects the bias and variance terms

An example of the bias-variance trade-off

Beating the bias-variance trade-off

The Bayesian approach


"With no data we sample..."

Using the posterior distribution
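A sketch of the posterior and predictive distribution for Bayesian linear regression, assuming a zero-mean isotropic Gaussian prior p(w) = N(w | 0, α⁻¹I) and Gaussian output noise with precision β; α, β, m_N and S_N are symbols introduced here. With design matrix Φ and targets t:

```latex
S_N^{-1} = \alpha I + \beta\,\Phi^{\top}\Phi, \qquad
\mathbf{m}_N = \beta\, S_N \Phi^{\top}\mathbf{t}

p(t \mid x, \mathbf{t}) = \mathcal{N}\!\left(t \,\middle|\, \mathbf{m}_N^{\top}\boldsymbol{\phi}(x),\; \sigma_N^2(x)\right),
\qquad
\sigma_N^2(x) = \frac{1}{\beta} + \boldsymbol{\phi}(x)^{\top} S_N\, \boldsymbol{\phi}(x)
```

The posterior mean m_N equals the regularized least-squares solution with regularization coefficient λ = α/β. The predictive variance adds the noise variance 1/β to the uncertainty about w, and the latter term shrinks as more data arrive.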

The predictive distribution for noisy sinusoidal data modeled by a linear combination of nine radial basis functions.

A way to see the covariance of the predictions for different values of x

Bayesian model comparison

Definition of the evidence
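A sketch of the standard definition, assuming a model M_i with parameters w and data D. The evidence is the probability of the data under the model, with the parameters integrated out against their prior:

```latex
p(D \mid \mathcal{M}_i) = \int p(D \mid \mathbf{w}, \mathcal{M}_i)\, p(\mathbf{w} \mid \mathcal{M}_i)\, d\mathbf{w},
\qquad
p(\mathcal{M}_i \mid D) \propto p(\mathcal{M}_i)\, p(D \mid \mathcal{M}_i)
```

With equal prior probabilities on the models, comparing models reduces to comparing their evidences; the ratio p(D | M_i) / p(D | M_j) is the Bayes factor.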

Using the evidence

How the model complexity affects the evidence

Determining the hyperparameters that specify the variance of the prior and the variance of the output noise.

Empirical Bayes