You will need to get

http://www.cs.toronto.edu/~hinton/csc321/matlab/classbp2.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/experiment.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/assign2data09.mat

You will also need the files which you should already have from assignment 1.

Type

    load assign2data09.mat

and then type

    restart = 1;
    maxepoch = 2000;
    numhid = 60;
    epsilon = .01;
    finalmomentum = 0.8;
    weightcost = 0;
    classbp2;

You can set the variable errorprintfreq in classbp2 to make it measure the error as frequently as you want.

classbp2 differs from classbp1 in several ways. It uses momentum to speed the learning and weightcost to keep the weights small. It expects you to set more global variables by hand (see the code file). This makes it easier to set up experiments in which you try many different settings.

PART 1 (2 points)

Using numhid=60 and maxepoch=2000 and weightcost=0, play around with epsilon and finalmomentum to find settings that make tE low after 2000 epochs. Briefly report what you discover. Include the values of epsilon and finalmomentum that work best and say what values they produce for the test errors and the cross-entropy error.

If you were instead asked to find the epsilon that produced the best minimum value (not the best final value) for test set cross entropy, would you expect to find a larger or smaller epsilon? In a sentence, justify your answer.

PART 2 (2 points)

Using numhid=60 and maxepoch=2000 and finalmomentum=0.8, set epsilon to a sensible value based on your experiments in part 1 and then try various values for weightcost to see how it affects the final value of tE. You may find the file experiment.m useful, but you will have to edit it. Briefly report what you discovered and include a plot of the final value of tE against the weightcost.

Your report on Parts 1 and 2 combined should be NOT MORE THAN ONE PAGE long, but graphs and printouts of runs can be attached.
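To make the roles of momentum and weightcost in Parts 1 and 2 concrete, here is a minimal standalone sketch in Python. It is not the code in classbp2.m: the update rule shown is one common form (a velocity that accumulates gradients, plus an L2 penalty that adds weightcost times the weight to the gradient), the momentum is held fixed, and the toy error E(w) = 0.5 * (w - 3)^2 and the function names momentum_step and train are assumptions made for illustration only.

```python
def momentum_step(w, v, grad, epsilon, momentum, weightcost):
    """One update of a single weight using momentum and an L2 weight cost.

    Assumed form, not taken from classbp2.m.
    """
    # The weight cost adds weightcost * w to the gradient, pulling w toward zero.
    v = momentum * v - epsilon * (grad + weightcost * w)
    return w + v, v


def train(epsilon=0.01, momentum=0.8, weightcost=0.0, maxepoch=2000):
    """Minimise the toy error E(w) = 0.5 * (w - 3)**2 starting from w = 0."""
    w, v = 0.0, 0.0
    for _ in range(maxepoch):
        grad = w - 3.0  # dE/dw for the toy error
        w, v = momentum_step(w, v, grad, epsilon, momentum, weightcost)
    return w


# Sweep weightcost with the other settings held fixed, as Part 2 asks you to do.
for weightcost in (0.0, 0.5, 1.0):
    print(weightcost, round(train(weightcost=weightcost), 4))
```

On this toy error the weight settles where the error gradient and the weight-cost pull cancel, at w = 3 / (1 + weightcost), so a larger weightcost gives a smaller final weight. How that shrinkage changes tE on the real data is what Part 2 asks you to measure.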
PART 3 (4 points)

Copy the files:

http://www.cs.toronto.edu/~hinton/csc321/matlab/bayeswithbest.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/makeallvecs.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/maketeacher.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/applyweights.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/hinton.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/blob.m

First type

    makeallvecs;

This makes a matrix in which each row is a possible weight vector. Then type

    maketeacher;

This makes a teacher network. Then type

    numcases=10;
    bayeswithbest;

Figure 2 will show you how well the outputs of the teacher on the TEST data can be predicted by Bayes-averaging the outputs of all possible nets. "Bayes-averaging" means weighting the prediction of each net by the posterior probability of that net given the training data and the prior (which is flat in this example). The third column shows the predictions of the best single net.

Figure 1 will show you how well the outputs of the teacher on the training data are predicted by the Bayes-average and by the best single net. The best net is defined to be the network with the highest probability given the training data.

The code will print out the Bayes-averaged error on the training and test sets as well as the error of the best network.

Figure 3 shows a histogram of the posterior probability distribution across all 9^4 weight vectors. Notice that the posterior can be very spread out, so that even the best net gets a very small posterior probability.

Your report should be at most half a page and should describe the effects of changing the number of training cases. You should also try running maketeacher again to see how the results depend on the particular teacher net.
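The Bayes-averaging described above can be sketched in a few lines of standalone Python. This is an illustration under stated assumptions, not a port of bayeswithbest.m: only the count of 9^4 weight vectors and the flat prior come from the text. The grid values (-2 to 2 in steps of 0.5), the single logistic unit with four inputs, the binary targets sampled from the teacher, and all function names are assumptions.

```python
import itertools
import math
import random

# Assumption: each of the 4 weights takes one of 9 evenly spaced values.
GRID = [-2.0 + 0.5 * i for i in range(9)]
ALL_VECS = list(itertools.product(GRID, repeat=4))  # 9**4 = 6561 candidate nets


def predict(w, x):
    """Output of an assumed single logistic unit with weights w on input x."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-a))


def posterior(train_x, train_t):
    """Posterior over ALL_VECS given binary targets, with a flat prior."""
    log_lik = []
    for w in ALL_VECS:
        ll = 0.0
        for x, t in zip(train_x, train_t):
            p = predict(w, x)
            ll += t * math.log(p) + (1 - t) * math.log(1.0 - p)
        log_lik.append(ll)
    top = max(log_lik)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(ll - top) for ll in log_lik]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bayes_average(post, x):
    """Prediction of every net on x, weighted by that net's posterior."""
    return sum(p * predict(w, x) for p, w in zip(post, ALL_VECS))


def best_net(post):
    """The single weight vector with the highest posterior probability."""
    return ALL_VECS[max(range(len(post)), key=post.__getitem__)]


rng = random.Random(0)
teacher = rng.choice(ALL_VECS)  # hypothetical stand-in for maketeacher
numcases = 10
train_x = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(numcases)]
train_t = [1 if rng.random() < predict(teacher, x) else 0 for x in train_x]

post = posterior(train_x, train_t)
test_x = [rng.uniform(-1, 1) for _ in range(4)]
print("largest posterior:", max(post))
print("teacher output:   ", predict(teacher, test_x))
print("Bayes average:    ", bayes_average(post, test_x))
print("best single net:  ", predict(best_net(post), test_x))
```

Changing numcases in this sketch and watching the largest posterior value gives a rough feel for how concentrated or spread out the posterior is, which is the effect Figure 3 displays for the actual assignment code.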