CSC321: Neural Networks
Lecture 8: The Bayesian way to fit models
Some problems with picking the parameters that are most likely to generate the data
Using a distribution over parameter values
Let's do it again: Suppose we get a tail
A cheap trick to avoid computing the posterior probabilities of all weight vectors
Why we maximize sums of log probs
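
A minimal sketch of the last point, assuming the coin-toss setting the earlier slide titles suggest and assuming the tosses are independent. Because log is monotonic, the parameter value that maximizes the product of the probabilities also maximizes the sum of their logs, and the sum stays representable where the raw product underflows. The toss sequence, the candidate grid, and the function names are hypothetical and chosen only for illustration.

```python
import math

def likelihood(p_heads, tosses):
    """Product of the probabilities of independent tosses ('H' or 'T')."""
    return math.prod(p_heads if t == "H" else 1.0 - p_heads for t in tosses)

def log_likelihood(p_heads, tosses):
    """Sum of the log probabilities of the same tosses."""
    return sum(math.log(p_heads if t == "H" else 1.0 - p_heads) for t in tosses)

# Hypothetical data: 1000 heads and 1000 tails.
tosses = "HT" * 1000

# Candidate values for the probability of heads (illustrative grid).
candidates = [i / 100 for i in range(1, 100)]

# The raw product underflows in double precision for this many tosses.
raw = likelihood(0.5, tosses)

# The sum of logs is still well behaved and can be maximized directly.
best_by_log = max(candidates, key=lambda p: log_likelihood(p, tosses))

print(raw, log_likelihood(0.5, tosses), best_by_log)
```

With data this long the raw product is useless for comparing parameter values, while the log-domain sum still ranks them.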