lec8

Supervised Maximum Likelihood Learning

•

Finding a set of weights, W, that minimizes the

squared errors is exactly the same as finding a W

that maximizes the log probability that the model

would produce the desired outputs on all the

training cases.

–

We implicitly assume that zero-mean Gaussian

noise is added to the model’s actual output.

–

We do not need to know the variance of the

noise because we are assuming it’s the same

in all cases. So it just scales the squared error.