We had that slide full of math about why averaging models is better than using a single model; it mathematically analyzed the bias and variance of a group of models. Let's try to understand it informally, using an analogy.

Suppose the Ukrainian government is worried that the Russian army, after massing its troops in Crimea, might push further into Ukraine. The Ukrainian government doesn't know how many troops Russia has in Crimea, but it needs an estimate. Being firm believers in the merits of democracy, they ask every Ukrainian citizen how many Russian troops they think are in Crimea. Some citizens know what they're talking about and some don't, but everybody thinks they know, so everybody gives an answer. Some say that Russia has brought a million soldiers into Crimea; others think it brought in no more than a thousand; most mention something in between. The government collects all of these estimates, and now needs to formulate its own estimate. It has two possible methods in mind for deciding on that estimate:

Method A: Believe that the average of all citizens' estimates is the correct number.
Method B: Pick a citizen at random, and believe that that citizen's estimate is correct.

Question 1. Which method is better, or are they perhaps equally (un)desirable? Answer notes: A is better.

Question 2. Is this a fair analogue to combining models for machine learning? Answer notes: basically, yes, except that here we have only one training case.

Question 3. What is the analogue, in this political story, of the bias and variance of the collection of models? Answer notes: bias = how wrong the average is; variance = the variance of the citizens' answers.

Question 4. What would be the analogue, in this political story, of a mixture of experts (video 10b)? Answer notes: add a mixture manager who picks a citizen for every question.

Question 5.
Remember that, for mixtures of experts, DURING TRAINING it's better to pick one expert (sampled from the manager's distribution) than to average their predictions, lest the experts start to "correct for each other's mistakes" during training. What is the analogue of that argument in the political story? Answer notes: during training, it's best not to tell people what their fellow citizens are saying, i.e. to pretend we're using Method B. At test time, switch to Method A.
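The claims behind Questions 1 and 3 can be checked with a quick simulation. The troop numbers and the noise level below are made up purely for illustration; the point is the identity expected squared error of a random citizen = bias² + variance, so averaging (Method A) discards the variance term and can only do better:

```python
import random
import statistics

random.seed(0)
TRUE_TROOPS = 25_000  # made-up ground truth for the simulation

# Citizens' estimates: noisy, and collectively biased (they overestimate).
citizens = [random.gauss(30_000, 20_000) for _ in range(100_000)]

mean_estimate = statistics.fmean(citizens)

# Method A: trust the average of all estimates.
method_a_err = (mean_estimate - TRUE_TROOPS) ** 2

# Method B: pick one citizen at random; its *expected* squared error is
# the average squared error over all citizens.
method_b_err = statistics.fmean((c - TRUE_TROOPS) ** 2 for c in citizens)

# Bias/variance decomposition: E[(x - t)^2] = bias^2 + variance.
bias = mean_estimate - TRUE_TROOPS
variance = statistics.pvariance(citizens)

print(f"Method A squared error:          {method_a_err:.3e}")  # equals bias^2
print(f"Method B expected squared error: {method_b_err:.3e}")  # bias^2 + variance
```

Method A's error is exactly the (squared) bias, i.e. how wrong the average is; Method B pays that same bias plus the full variance of the citizens' answers.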
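The argument in Question 5 can also be sketched numerically with two toy scalar "experts" fitting a single target. The target value, learning rate, and starting guesses below are arbitrary; one pair of experts is trained on the error of their averaged prediction, the other by picking one expert at random per step:

```python
import random

random.seed(1)

TARGET = 1.0  # single training case, as in the analogy
LR = 0.1      # arbitrary learning rate

# Two scalar "experts", same starting guesses under both training schemes.
avg_w = [5.0, -2.0]   # trained on the error of the averaged prediction
pick_w = [5.0, -2.0]  # trained by picking one expert at random per step

for _ in range(2000):
    # Averaging during training: the gradient of ((w1 + w2)/2 - t)^2 with
    # respect to each w_i is (pred - t), identical for both experts, so they
    # move in lockstep and only their *average* becomes correct.
    pred = sum(avg_w) / 2
    for i in range(2):
        avg_w[i] -= LR * (pred - TARGET)

    # Picking during training: the chosen expert sees only its own error,
    # so each expert individually learns to predict the target.
    i = random.randrange(2)
    pick_w[i] -= LR * (pick_w[i] - TARGET)

print("trained on the average:", avg_w)   # average is right; individuals far off
print("trained by picking one:", pick_w)  # each expert individually near target
```

The averaged-training pair ends up "correcting for each other's mistakes": their mean prediction is right while each individual expert stays badly wrong, whereas the pick-one scheme produces experts that are each correct on their own, which is what you want before switching to averaging (Method A) at test time.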