You will need to get the following files:

http://www.cs.toronto.edu/~hinton/csc321/matlab/classbp2.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/experiment.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/assign2data09.mat

You will also need the files that you should already
have from Assignment 1.

Load the data by typing "load assign2data09.mat", then type:

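restart       = 1;      % 1 = start a new run
maxepoch      = 2000;   % number of training epochs
numhid        = 60;     % number of hidden units
epsilon       = .01;    % learning rate
finalmomentum = 0.8;    % momentum coefficient (see classbp2.m)
weightcost    = 0;      % weight-cost coefficient (0 = no weight cost)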

classbp2;

You can set the variable errorprintfreq in classbp2 to control how
often it measures and prints the error.

classbp2 differs from classbp1 in several ways. It uses momentum to
speed up learning and a weight cost (weightcost) to keep the weights
small. It also expects you to set more global variables by hand (see
the code file), which makes it easier to set up experiments that try
many different settings.
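For readers new to momentum and weight cost, here is a minimal sketch of one common form of the update, written in plain Python rather than MATLAB so that it is self-contained. It assumes the weight cost is an L2 penalty of (weightcost/2) times the sum of squared weights, so it adds weightcost*w to each gradient. classbp2.m may differ in details such as sign conventions, when the momentum switches to finalmomentum, or whether biases are penalized, so check the code file.

```python
def momentum_step(weights, grads, velocity, epsilon, momentum, weightcost):
    """One gradient-descent step with momentum and an L2 weight cost.

    weights, grads, velocity: equal-length lists of floats.
    Returns (new_weights, new_velocity).
    """
    new_velocity = [
        # decay the previous update, then step downhill on error + weight cost
        momentum * v - epsilon * (g + weightcost * w)
        for w, g, v in zip(weights, grads, velocity)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Usage on a toy error E(w) = 0.5 * w**2, whose gradient is simply w.
w, vel = [1.0], [0.0]
for _ in range(3):
    w, vel = momentum_step(w, [w[0]], vel, epsilon=0.1, momentum=0.8, weightcost=0.0)
print(w)
```

With momentum, each step keeps a fraction of the previous update, so consecutive steps in a consistent direction grow larger; the weight cost term pulls every weight towards zero in proportion to its size.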

PART 1 (2 points)
   Using numhid=60, maxepoch=2000 and weightcost=0, experiment with
   epsilon and finalmomentum to find settings that make tE low after
   2000 epochs. Briefly report what you discover. Include the values
   of epsilon and finalmomentum that work best, together with the test
   errors and the cross-entropy error they produce. If you were instead
   asked to find the epsilon that gives the best minimum value (rather
   than the best final value) of the test-set cross-entropy, would you
   expect it to be larger or smaller? Justify your answer in one
   sentence.
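When comparing runs it helps to record both the minimum and the final value of the test cross-entropy, since the two can favour different settings. A minimal Python sketch, assuming you have saved the values measured during a run (every errorprintfreq epochs) into a list; the numbers below are illustrative placeholders, not real results:

```python
def min_and_final(curve):
    """Summarise a learning curve (one value per measurement, in order).

    Returns (best_value, index_of_best, final_value); the index is 0-based.
    """
    best_index = min(range(len(curve)), key=lambda i: curve[i])
    return curve[best_index], best_index, curve[-1]


# Placeholder test-set cross-entropy values from one hypothetical run.
test_ce = [0.90, 0.55, 0.42, 0.47, 0.53]
best, at, final = min_and_final(test_ce)
print(f"minimum {best} at measurement {at}, final {final}")
```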

PART 2 (2 points) 
   Using numhid=60, maxepoch=2000 and finalmomentum=0.8, set epsilon
   to a sensible value based on your experiments in Part 1, then try
   various values of weightcost to see how they affect the final value
   of tE. You may find the file experiment.m useful, but you will have
   to edit it. Briefly report what you discover and include a plot of
   the final value of tE against weightcost.
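experiment.m is the intended tool here; the Python sketch below only illustrates the shape of such a sweep. run_once is a hypothetical stand-in for whatever trains one net and returns its final tE. Because useful weightcost values can span several orders of magnitude, spacing them by factors of 10 (and plotting on a log axis, with the zero point shown separately) is a common choice; the particular range below is a guess.

```python
def sweep_weightcost(weightcosts, run_once):
    """Return a list of (weightcost, final_tE) pairs.

    run_once is a hypothetical callable: it should train one net with the
    given weightcost (all other settings held fixed) and return the final tE.
    """
    return [(wc, run_once(wc)) for wc in weightcosts]


# Zero plus values spaced by factors of 10; the range is only a guess.
weightcosts = [0.0] + [10.0 ** k for k in range(-5, 0)]

# Placeholder that returns 0.0, just so the sketch runs; replace it with a
# call that actually trains a net.
for wc, final_te in sweep_weightcost(weightcosts, run_once=lambda wc: 0.0):
    print(f"weightcost={wc:g}\tfinal tE={final_te}")
```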

Your report on Parts 1 and 2 combined should be NOT MORE THAN ONE PAGE
long, but graphs and printouts of runs can be attached.


PART 3 (4 points)

Copy the files:

http://www.cs.toronto.edu/~hinton/csc321/matlab/bayeswithbest.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/makeallvecs.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/maketeacher.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/applyweights.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/hinton.m
http://www.cs.toronto.edu/~hinton/csc321/matlab/blob.m


First type
makeallvecs;
This makes a matrix in which each row is a possible weight vector.
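The histogram described under Figure 3 covers 9^4 weight vectors, i.e. four weights that each take one of nine values. A minimal Python sketch of building such a matrix, assuming (hypothetically) that the nine allowed values are -4, -3, ..., 4; check makeallvecs.m for the values it actually uses.

```python
from itertools import product


def make_all_vecs(values, num_weights):
    """Every combination of weight values; each row is one possible net."""
    return [list(row) for row in product(values, repeat=num_weights)]


allowed = [float(v) for v in range(-4, 5)]  # hypothetical grid of 9 values
allvecs = make_all_vecs(allowed, num_weights=4)
print(len(allvecs))  # 9**4 = 6561 rows
```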

Then type
maketeacher;
This makes a teacher network whose outputs the other nets will try
to predict.

Then type:
numcases=10;
bayeswithbest;

Figure 2 shows how well the teacher's outputs on the TEST data can be
predicted by Bayes-averaging the outputs of all possible nets.
"Bayes-averaging" means weighting the prediction of each net by the
posterior probability of that net given the training data and the
prior (which is flat in this example). The third column shows the
predictions of the best single net. Figure 1 shows how well the
teacher's outputs on the training data are predicted by the Bayes
average and by the best single net. The best net is defined as the
network with the highest posterior probability given the training
data. The code prints the Bayes-averaged error on the training and
test sets, as well as the error of the best network.
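The Bayes-averaged prediction can be written down in a few lines. This Python sketch assumes you already have, for each candidate net, its log-likelihood on the training data and its outputs on the test cases; with a flat prior the posterior is just the normalised likelihood. Subtracting the largest log-likelihood before exponentiating avoids underflow when the likelihoods are tiny. How bayeswithbest.m computes the likelihood (for example, which error function it uses) is not assumed here, and the toy numbers are made up.

```python
import math


def posterior_from_loglik(logliks):
    """Posterior over nets under a flat prior: proportional to the likelihood."""
    m = max(logliks)
    unnorm = [math.exp(ll - m) for ll in logliks]  # shift by max to avoid underflow
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bayes_average(posterior, predictions):
    """Posterior-weighted average of each net's outputs.

    predictions[k][i] is net k's output on test case i.
    """
    num_cases = len(predictions[0])
    return [sum(p * preds[i] for p, preds in zip(posterior, predictions))
            for i in range(num_cases)]


def best_single_net(posterior):
    """Index of the net with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda k: posterior[k])


# Toy example with three candidate nets and two test cases (made-up numbers).
logliks = [math.log(0.5), math.log(0.3), math.log(0.2)]
preds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
post = posterior_from_loglik(logliks)
print(bayes_average(post, preds), best_single_net(post))
```

The best single net uses only one row of predictions, whereas the Bayes average blends all of them, which is why the two can differ noticeably when the posterior is spread out.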

Figure 3 shows a histogram of the posterior probability distribution
across all 9^4 weight vectors. Notice that the posterior can be very
spread out so that even the best net gets a very small posterior
probability.
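One simple way to put a number on how spread out the posterior is (an illustrative addition, not something the assignment requires) is to report the largest posterior probability together with how many of the most probable nets are needed to cover, say, half of the probability mass. A Python sketch, taking a list of posterior probabilities that sum to 1:

```python
def spread_summary(posterior, mass=0.5):
    """Return (max probability, number of top nets covering `mass` of the posterior)."""
    ranked = sorted(posterior, reverse=True)
    total = 0.0
    for count, p in enumerate(ranked, start=1):
        total += p
        if total >= mass:
            return ranked[0], count
    return ranked[0], len(ranked)  # only reached if rounding leaves total < mass


print(spread_summary([0.125] * 8))  # (0.125, 4)
```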

Your report should be at most half a page and should describe the
effects of changing the number of training cases. You should also
rerun maketeacher to see how the results depend on the particular
teacher net.