Teaching Assistant


Fall 2004

CSC411 Machine Learning and Data Mining. University of Toronto

Lecture slides on linear SVM (Nov.24): Slides (pdf)

Winter 2004

CSC321 Introduction to Neural Networks and Machine Learning. University of Toronto

Tutorials

Week 1.(Jan.9) Introduction to Matlab I (m file)

Week 2.(Jan.16) Introduction to Matlab II (m files in zip)

Week 3.(Jan.23) Assignment 1.

Week 4.(Jan.30) Marking Scheme for Assignment 1.

1. (2 points) A large number of hidden units is not good (redundancy and overfitting); you may give a number that works well.

2. (2 points) Correctly describe and compare the performance of the misclassification error and the cross entropy on both the training set and the validation set.

3. (1 point) State that the cross entropy is a better error measure than the misclassification count for this classification problem (see the sketch after this list).

4. (2 points) Explain the two figures reasonably and explicitly.

Bonus (1 point):

5. Vary the learning rate and monitor the performance.

6. Vary the data set size and monitor the performance.

7. Plot the whole data set.
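To make item 3 concrete, here is a minimal sketch (not part of the assignment code) comparing the two error measures on a toy binary problem; the names t and y are assumptions, standing for the 0/1 targets and the network's predicted probabilities.

t = [1 0 1 1 0];                          % toy 0/1 targets (assumed, not assignment data)
y = [0.9 0.2 0.6 0.4 0.1];                % predicted probabilities of class 1
misclass = sum((y > 0.5) ~= t);           % misclassification count: changes only when a prediction crosses 0.5
crossent = -sum(t .* log(y) + (1-t) .* log(1-y));   % cross entropy: smooth in y
fprintf('misclassifications = %d, cross entropy = %.3f\n', misclass, crossent);

Because the cross entropy changes smoothly with the outputs, it keeps giving a training signal even while the misclassification count stays flat, which is why it is the better measure here.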

Week 5. (Feb 6) Assignment 2.

Weeks 6 & 7. (Feb.20) Marking Scheme for Assignment 2.

PART A

1. (2 points) Describe a setting for which the final vE is low, i.e., less than 750.

2. (1 point) Note that the learning rate cannot be too large.

3. (1 point) Note that the momentum rate cannot be too large.

PART B

4. (2 points) Describe a setting for which the final vE is reliably lower than 700.

5. (1 point) Note that the net will overfit if the weightcost is too small.

6. (1 point) (a) Describe how the generalization changes with the weightcost, either by plotting a graph or by giving the numerical values of vE vs. Wc.

You can also get this point if you (b) note that when the learning rate and alpha are small, weight decay is unnecessary even if numhid is large; in other words, early stopping is also a way to improve generalization. (A sketch of the weight update in which these parameters appear follows this list.)
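For reference, the sketch below (an assumption, not the assignment's actual code) shows where the learning rate, the momentum rate alpha, and the weightcost enter a generic gradient-descent weight update; the toy quadratic error is only there so the snippet runs on its own.

epsilon    = 0.1;                       % learning rate (items 2 and 4: too large diverges)
alpha      = 0.9;                       % momentum rate (item 3: too large oscillates)
weightcost = 0.01;                      % weight-decay coefficient (items 5 and 6)

w  = [2; -3];                           % toy weight vector (assumed, not assignment data)
dw = zeros(size(w));                    % momentum buffer
for epoch = 1:200
    grad = w - [1; 1];                  % gradient of the toy error 0.5*||w - [1;1]||^2
    dw   = alpha * dw - epsilon * (grad + weightcost * w);
    w    = w + dw;                      % weightcost shrinks the weights toward zero
end
disp(w')

With weightcost > 0 the minimum moves from [1; 1] to [1; 1]/(1 + weightcost), which is the shrinking effect behind items 5 and 6; making epsilon or alpha too large makes the same loop diverge, which is the point of items 2 and 3.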

Assignment 4 NEW

Interpretation of the Hinton diagram:

a. The size of each block is proportional to the corresponding probability (or frequency).

b. The KxK diagram displays the transition probabilities: T(i,j) = P(next state = j | current state = i)

c. The LAxK diagram displays the output (emission) probabilities: B(i,j) = P(output = i-th alphabet symbol | current state = j)

Some code for displaying the alphabet symbols and state indices in the LAxK diagram:

figure(2); clf; hinton(E);               % E is the LA-by-K output (emission) matrix
for i = LA:-1:1                          % label each row with its alphabet symbol
    text(-1, 0.5+LA-i, alphabet(i));     % left edge of the diagram
    text(K+0.5, 0.5+LA-i, alphabet(i));  % right edge of the diagram
end
for i = 1:K                              % label each column with its state index
    text(i-0.5, 30, num2str(i));         % column labels near the top of the diagram
end
drawnow;
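A similar snippet (a sketch by analogy with the handout code above, not part of it) could label the KxK transition diagram with state indices; it assumes the same hinton() function and the K-by-K matrix T defined in point b.

figure(1); clf; hinton(T);               % T is the K-by-K transition matrix (assumed name)
for i = 1:K
    text(-1, 0.5+K-i, num2str(i));       % current state i on the left edge
    text(i-0.5, K+1, num2str(i));        % next state j along the top
end
drawnow;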


Fall 2003

CSC411 Machine Learning and Data Mining. University of Toronto

Lecture slides on Neural Networks (Oct.8): Radial Basis Functions and Generalization. (pdf)

Related Links:

Association for Uncertainty in Artificial Intelligence (AUAI)

NIPS online

Introduction to Bayesian Nets

Kernel Machines/SVMs

Bayesian Net Software

Netlab Software