CSC 411F: Machine Learning & Data Mining
Fall, 2004
Tutorials

Tutorial 1, Sept. 17 - Introduction to Matlab

The Matlab demo that David was going to present is contained in this file. Copy it, along with these files (data, poly.m), to your directory, start Matlab, then run the demo by typing matlab_intro at the prompt.


Tutorial 2, Sept. 24 - Naive Bayes and K Nearest Neighbours

An example of Naive Bayes and KNN for classifying TV Guide listings as "record" or "don't record". For details on these algorithms, please see the handout. A Matlab demo of KNN on 2-dimensional data. Note how the decision boundary changes as K is increased.
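
The effect of K can be reproduced without the Matlab demo. The following is a minimal Python sketch of KNN on 2-dimensional points, not the tutorial's own code; the data set, the labels 'a' and 'b', and the function name knn_predict are made up for illustration. A single 'b' point sits inside the 'a' cluster, so the prediction near it flips as K grows.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    neighbours = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical data: an 'a' cluster, a 'b' cluster, and one stray 'b' at (1, 1).
train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b'),
         ((1, 1), 'b')]

query = (0.9, 0.9)
print(knn_predict(train, query, k=1))  # follows the stray point
print(knn_predict(train, query, k=3))  # the surrounding 'a' points outvote it
```

With K = 1 the decision boundary wraps tightly around every training point, including noisy ones; larger K smooths it.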


Tutorial 3, Oct. 1 - Pruning decision trees

We covered the last 5 slides of "Lecture 6: Decision Trees II". This included a discussion of cross validation (e.g. to determine the best tree height), and this example of how pruning can help.
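
As a rough illustration of using cross-validation to choose the tree height, here is a Python sketch under simplifying assumptions: the data are 1-dimensional, the tree is a toy greedy threshold tree written for this example (not the algorithm from the lecture slides), and all function names are hypothetical.

```python
def majority(labels):
    """Most common label (ties broken arbitrarily)."""
    return max(set(labels), key=labels.count)

def build_tree(points, height):
    """Greedy 1-D tree on (x, label) pairs with at most `height` levels of splits."""
    labels = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if height == 0 or len(set(labels)) == 1 or len(xs) == 1:
        return majority(labels)  # leaf
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        errors = (len(left) - left.count(majority(left))
                  + len(right) - right.count(majority(right)))
        if best is None or errors < best[0]:
            best = (errors, t)
    t = best[1]
    return (t,
            build_tree([p for p in points if p[0] < t], height - 1),
            build_tree([p for p in points if p[0] >= t], height - 1))

def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are (threshold, left, right)."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def cv_error(points, height, k=5):
    """k-fold cross-validation error rate for trees of the given height."""
    folds = [points[i::k] for i in range(k)]
    mistakes = 0
    for i, val in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        tree = build_tree(train, height)
        mistakes += sum(predict(tree, x) != y for x, y in val)
    return mistakes / len(points)

# Toy data: label 1 on the interval [5, 15), label 0 elsewhere.
data = [(x, 1 if 5 <= x < 15 else 0) for x in range(20)]
errors = {h: cv_error(data, h) for h in range(4)}
best_height = min(errors, key=errors.get)
print(errors, best_height)
```

Each height is scored only on points held out from training, so a taller tree is preferred only when it improves held-out accuracy.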


Tutorial 4, Oct. 8 - Perceptrons

Decision boundary for perceptrons (a line for 2-dimensional inputs, a hyperplane in general). Example of a 2-layer perceptron that computes XOR. Gradient descent: decrease error on the training set by moving weights in the direction of the negative gradient. Convex surfaces.
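
One way to build a 2-layer perceptron for XOR is sketched below in Python. The weights are a hand-picked choice for illustration and may differ from the example shown in the tutorial: one hidden unit computes OR, the other computes AND, and the output unit fires when OR is on but AND is off.

```python
def step(z):
    """Threshold activation: 1 if the weighted input is positive, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)     # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)    # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

No single perceptron can do this, because the points labelled 1 and the points labelled 0 cannot be separated by one line.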


Tutorial 5, Oct. 15 - Training Neural Networks

We covered the following lecture notes. We also talked about weight decay, and how small weights correspond to simple hypotheses. For an example demonstrating that small weights cause sigmoid units to behave like linear units (the sigmoid is nearly linear when its input is close to zero), run this Matlab script.
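
The same point can be checked numerically. The sketch below is written in Python and is not the linked script; it compares sigmoid(w*x) with its first-order approximation around zero, 0.5 + w*x/4, for inputs x in [-1, 1]. The function name max_gap_from_linear is made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def max_gap_from_linear(w, n=201):
    """Largest |sigmoid(w*x) - (0.5 + w*x/4)| over n evenly spaced x in [-1, 1]."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    return max(abs(sigmoid(w * x) - (0.5 + w * x / 4)) for x in xs)

for w in (0.1, 1.0, 5.0):
    print(w, max_gap_from_linear(w))
```

The gap is tiny for w = 0.1 and large for w = 5, which is why weight decay pushes a network towards nearly linear, simpler functions.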


Tutorial #6, Oct. 22 - Review for the midterm


Tutorial #7, Nov. 5 - Mixture of Gaussians and PCA

A review of the MOG code to be used for A3. Note that it's important to set minVary to something > 0. Otherwise you can get an infinite likelihood by setting one mean equal to a data point and the corresponding variance to 0. Try different settings of minVary using mog_on_points.m. Why are principal components the eigenvectors of the data covariance? (The covariance matrix defines an ellipse that best fits the data; the eigenvectors are its axes, with lengths proportional to the square roots of the corresponding eigenvalues.) PCA is used for dimensionality reduction, which is similar to lossy compression. When PCA is applied to images of faces, the resulting principal components are 'eigenfaces'. Try changing the number of eigenvectors used in pca_on_faces.m and see how that affects the quality of the image reconstructions. Here's a link to tutorial7.zip.
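
To see why minVary matters, consider a 1-dimensional Gaussian whose mean sits exactly on a data point. The Python sketch below is not the A3 code; floor_variance and its min_vary argument are hypothetical stand-ins for whatever the MOG code does with minVary. It shows the density at that point growing without bound as the variance shrinks.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def floor_variance(var, min_vary):
    """Keep a variance estimate from collapsing below min_vary (assumed behaviour)."""
    return max(var, min_vary)

point = 1.5  # the component mean is placed exactly on this data point
for var in (1.0, 1e-2, 1e-4, 1e-8):
    print(var, gaussian_pdf(point, point, var))

# With a floor in place, the density at the point stays bounded.
print(gaussian_pdf(point, point, floor_variance(0.0, 0.01)))
```

The density at the mean is 1/sqrt(2*pi*var), so it tends to infinity as var tends to 0; flooring the variance caps it.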


Tutorial #8, Nov. 12 - Isomap

Examples showing how Isomap works on various data sets. To see some results, look at the figures in the Isomap paper. You can also download the code from here and try it out.


Tutorial #9, Nov. 19 - Reinforcement Learning

Formulate Tic-Tac-Toe as a Q-learning task (text exercise 13.3). Given a reward function for a grid, fill in the V* and Q function values (text exercise 13.2). Learning the Q function is done in an online fashion, e.g. while playing games of Tic-Tac-Toe against an opponent. When learning, an exploration vs. exploitation tradeoff must be made (text section 13.3.5, page 379).
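
The online update and the exploration tradeoff can be sketched on a much smaller problem than Tic-Tac-Toe. The Python example below is an illustration only: the 4-state corridor, the reward of 100 for reaching the goal, the discount of 0.9, and the epsilon-greedy rule are all assumptions chosen for this sketch, and the update is the deterministic form Q(s,a) = r + gamma * max Q(s',a').

```python
import random

GOAL = 3                      # states 0..3 in a row; state 3 is absorbing
ACTIONS = ("left", "right")
GAMMA = 0.9

def step(state, action):
    """Deterministic move; reward 100 only when the goal is entered."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    return nxt, (100 if nxt == GOAL else 0)

def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit (random ties)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def q_learning(episodes=50, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = choose_action(Q, state, epsilon, rng)
            nxt, reward = step(state, action)
            # Online update from the single transition just experienced.
            Q[(state, action)] = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
            state = nxt
    return Q

Q = q_learning()
for s in range(GOAL):
    print(s, {a: Q[(s, a)] for a in ACTIONS})
```

In this corridor the learned values for moving right are 100, 90 and 81 at distances of one, two and three steps from the goal, each a factor of gamma smaller than the next.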


Tutorial #10 - SVMs and HW4

Linear SVM vs. Perceptron: how do they differ? What are the "support vectors"? To compare them, try the following perceptron code and the SVM code available here on the tutorial10a.mat data. The GUI I used can be invoked by running the 'uiclass' function in the SVM package. (Note: If you're having trouble with the SVM software under Linux, copy the qp.mexglx in tutorial10.zip to the svm directory.) What happens when you change (x . x_i) in the decision function to exp(-|x - x_i|^2)? Try it out by changing the kernel from 'linear' to 'Gaussian RBF' in the Matlab demo.
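
The kernel swap can be sketched in a few lines of Python. This is not the SVM package's code: the decision function is the usual sum over support vectors of alpha_i * y_i * K(x, x_i) plus a bias, and the support vectors, alphas and bias below are hand-picked for illustration rather than obtained by training.

```python
import math

def linear_kernel(x, xi):
    """(x . x_i)"""
    return sum(a * b for a, b in zip(x, xi))

def rbf_kernel(x, xi):
    """exp(-|x - x_i|^2)"""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)))

def decision(x, support, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x, x_i) + bias; classify by the sign of f(x)."""
    return sum(alpha * y * kernel(x, xi) for xi, y, alpha in support) + bias

# Hypothetical support vectors: (point, label, alpha).
support = [((1.0, 1.0), +1, 1.0), ((-1.0, -1.0), -1, 1.0)]

x = (2.0, 2.0)
print(decision(x, support, 0.0, linear_kernel))
print(decision(x, support, 0.0, rbf_kernel))
```

Only the kernel argument changes between the two calls. With the Gaussian RBF kernel each support vector's influence decays with distance, so the decision boundary no longer has to be a straight line.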

A4 Bakeoff tips: Use a validation set. The number of images of person p is length(traindata{p}). Image i of person p can be accessed using traindata{p}{i} - it's a 112x92 pixel array. If your classifiers take too long to run, try reducing the size of the data using a dimensionality reduction method, or the imresize function. If you use PCA, try using the modified version in tutorial7.zip.


Tutorial #11 - Dec. 3, 2004

Example of HMM filtering on this model with P(R_0) = 0.5, U_1 = yes, and U_2 = yes. Sample questions from the 2003 exam.
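
A filtering computation of this kind can be sketched in Python as follows. The linked model's parameters are not reproduced on this page, so the numbers below are assumptions: they are the commonly used umbrella-world values, P(R_t | R_t-1) = 0.7, P(R_t | not R_t-1) = 0.3, P(U_t | R_t) = 0.9 and P(U_t | not R_t) = 0.2. If the linked model uses different values, substitute them.

```python
def filter_step(prior_rain, umbrella,
                p_rain_given_rain=0.7, p_rain_given_dry=0.3,
                p_umb_given_rain=0.9, p_umb_given_dry=0.2):
    """One forward (filtering) step: predict with the transition model, then
    weight by the observation likelihood and normalize.
    All default probabilities are assumed values, not taken from the page."""
    predicted = (p_rain_given_rain * prior_rain
                 + p_rain_given_dry * (1 - prior_rain))
    like_rain = p_umb_given_rain if umbrella else 1 - p_umb_given_rain
    like_dry = p_umb_given_dry if umbrella else 1 - p_umb_given_dry
    numerator = like_rain * predicted
    return numerator / (numerator + like_dry * (1 - predicted))

belief = 0.5                      # P(R_0)
beliefs = []
for umbrella in (True, True):     # U_1 = yes, U_2 = yes
    belief = filter_step(belief, umbrella)
    beliefs.append(belief)
print(beliefs)
```

Under these assumed parameters the belief in rain rises to about 0.818 after the first umbrella observation and about 0.883 after the second.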