CSC321: Introduction to Machine Learning and Neural Networks (Winter 2016)

About CSC321

This course serves as an introduction to machine learning, with an emphasis on neural networks. We introduce the foundations of machine learning and cover mathematical and computational methods used in machine learning. We cover several advanced topics in neural networks in depth. We use the Python NumPy/SciPy stack. Students should be comfortable with calculus, probability, and linear algebra.

Required math background

Here's (roughly) the math that you need for this course. Linear Algebra: vectors (the dot product, the vector norm, vector addition); matrices (matrix multiplication). Calculus: derivatives, the derivative as the slope of a function; integrals. Probability: random variables, expectation, independence. Other topics will be needed, but they are not part of the prerequisites, so I will devote an appropriate amount of lecture time to them.
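As a quick self-check, the prerequisite linear-algebra operations look like this in NumPy (the values here are arbitrary toy numbers):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

dot = np.dot(a, b)            # dot product: 1*4 + 2*5 + 3*6 = 32
norm = np.sqrt(np.dot(a, a))  # vector norm: sqrt(1 + 4 + 9)
s = a + b                     # vector addition, elementwise

M = np.array([[1.0, 0.0],
              [0.0, 2.0]])
N = np.array([[3.0],
              [4.0]])
P = np.dot(M, N)              # matrix multiplication: (2x2)(2x1) -> (2x1)
```

If any of these feel unfamiliar, review the corresponding topics before the course starts.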

[Image: Deep Dream]


All announcements will be on Piazza.

Teaching team

Instructor: Michael Guerzhoy. Office: BA5244, Email: guerzhoy at cs.toronto.edu (please include CSC321 in the subject, and please ask questions on Piazza if they are relevant to everyone.)

CSC321 TAs

Study Guide

The CSC321 study guide (continuously updated)

Getting help

Michael's office hours: Wednesday 2:30-3:30, Thursday 6-7, Friday 2-3. Or email for an appointment. Or drop by to see if I'm in. Feel free to chat with me after lecture.

Course forum on Piazza

Piazza is a third-party discussion forum with many features that are designed specifically for use with university courses. We encourage you to post questions (and answers!) on Piazza, and to read the questions your classmates have posted. However, since Piazza is run by a company separate from the university, we also encourage you to read the privacy policy carefully and only sign up if you are comfortable with it. If you are not comfortable with signing up for Piazza, please contact me by email to discuss alternative arrangements.

Lectures and Tutorials

L0101/L2001: TR1-2 in BA1200 (sometimes also R2 in BA1200). Tutorials R2 SS1073 with Alvin/TBA (odd-numbered student#) and SS1084 (even-numbered student#) with Aditya/TBA
L5101/L2501: T6-8 in BA1220 (sometimes also T8 in BA1220). Tutorials T8, BA2159 with Matt/Aditya/TBA (odd-numbered student#) and BA2185 with Sara/TBA (even-numbered student#)


Software

We will be using the Python 2 NumPy/SciPy stack in this course. It is installed on CDF.

For the first two projects, the most convenient Python distribution to use is Anaconda. If you are using an IDE and download Anaconda, be sure to have your IDE use the Anaconda Python.

I recommend the IEP IDE, which comes with Pyzo. To run IEP on CDF, simply type iep on the command line. You can download my modification of IEP, which includes a parentheses matcher, here.

We will be using Google's TensorFlow in the second half of the course. Note that TensorFlow is difficult to set up on Windows (but fairly straightforward to install on Linux or OS X). Instructions for installing TensorFlow and/or running it on CDF are here.


Recommended resources

Geoffrey Hinton's Coursera course contains great explanations of the intuition behind neural networks.
Deep Learning by Yoshua Bengio, Ian Goodfellow, and Aaron Courville is an advanced textbook with good coverage of deep learning and a brief introduction to machine learning.
Learning Deep Architectures for AI by Yoshua Bengio contains an in-depth tutorial on learning RBMs.
Pattern Recognition and Machine Learning by Christopher M. Bishop is a very detailed and thorough book on the foundations of machine learning. A good textbook to buy to have as a reference for this and future machine learning courses, though it's not required.
The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman is also an excellent reference book, available on the web for free at the link.
The CS229 Lecture Notes by Andrew Ng are a concise introduction to machine learning.
Andrew Ng's Coursera course contains excellent explanations.
Pedro Domingos's Coursera course is a more advanced course.
CS231n: Convolutional Neural Networks for Visual Recognition at Stanford (archived 2015 version) is an amazing advanced course, taught by Fei-Fei Li and Andrej Karpathy (a UofT alum). The course website contains a wealth of materials.
CS224d: Deep Learning for Natural Language Processing at Stanford, taught by Richard Socher. Like CS231n, but for NLP rather than vision. More details on RNNs are given here.

Python Scientific Lecture Notes by Valentin Haenel, Emmanuelle Gouillart, and Gaël Varoquaux (eds) contains material on NumPy and working with image data in SciPy. (Free on the web.)


Marking scheme

Your course mark will be the better of:

Scheme 1: Projects 35%, Midterm 30%, Final Exam 35%
Scheme 2: Projects 35%, Midterm 10%, Final Exam 55%

You must receive at least 40% on the final exam to pass the course.


Projects

A sample report/LaTeX template containing advice on how to write project reports for AI courses is here (see the pdf and tex files). (The example is based on Programming Computer Vision, pp. 27-30.) Key points: your code should generate all the figures used in the report; describe and analyze the inputs and the outputs; add your interpretation where feasible.

Project 1 (7%): Face Recognition and Gender Classification using k-NN (due: Feb. 3 at 10PM)
Project 2 (10%): Handwritten Digit Recognition with Neural Networks (due: Feb. 28 at 10PM, extended from Feb. 22; no late submissions after Feb. 29 at 10PM)
Project 3 (10%): Convolutional Neural Networks and Transfer Learning in TensorFlow (due: Mar. 21 at 10PM, Mar. 24 at 10PM for the bonus)
Project 4 (8%): Fun with RNNs (due: Apr. 4 at 10PM, Apr. 7 at 10PM for the bonus)
Lateness penalty: 10% of the possible marks per day late, with partial days rounded up (4% per day for Projects 3 and 4). Projects are only accepted up to 72 hours (3 days) after the deadline.


Final exam

2016 exam paper


Midterm

March 4, 4pm-6pm, BA1180 (even student numbers) and BA1190 (odd student numbers). Make-up midterm, for those who have a documented conflict with the main timeslot (a screenshot and/or explanatory email is sufficient): 6pm-8pm on the same day, location TBA.

Coverage: the lectures and the projects, focusing on the lectures.

Midterm paper + marking scheme

Lecture notes

Coming up: introduction to NumPy/SciPy, k-Nearest Neighbours, linear regression, and gradient descent. See Weeks 1 and 2 of Andrew Ng's Coursera course, Notes, part 1 from CS229, and Friedman et al. 2.1-2.3. Gradients: Understanding the Gradient, Understanding Pythagorean Distance and the Gradient

Week 1: Welcome to CSC321, numpy_demo.py (bieber.jpg), K Nearest Neighbours, Linear Regression.
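A minimal sketch of k-Nearest Neighbours classification in NumPy (toy data and a hypothetical helper, not the lecture code):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of k closest
    votes = y_train[nearest]
    return np.bincount(votes).argmax()                   # most common label

# Toy training set: two clusters, labelled 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
label = knn_predict(X_train, y_train, np.array([0.05, 0.1]))  # near cluster 0
```

The only "learning" in k-NN is storing the training set; all the work happens at prediction time.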

Why can you only barely see the maple leaf on the red channel of the Bieber photo? Because the red channel is close to 1 for the red maple leaf, it shows as white when you view just the red channel of the photo.
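The same effect in miniature, on a toy array rather than bieber.jpg:

```python
import numpy as np

# Toy 2x2 "photo" with intensities in [0, 1]; real images load as such arrays.
img = np.zeros((2, 2, 3))
img[0, 0] = [1.0, 0.0, 0.0]   # a pure-red pixel (like the maple leaf)
img[1, 1] = [1.0, 1.0, 1.0]   # a white pixel

red = img[:, :, 0]            # the red channel alone, a 2D grayscale array
# Both the red pixel and the white pixel have value 1.0 in this channel,
# so both display as white when the red channel is shown on its own.
```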

Q in lecture: are there a lot of examples of functions with deep and narrow local minima that are easy to miss? A: Yes, all over the place. For example, we can convert any NP-hard problem into such a function (otherwise it wouldn't be NP-hard!). See here for an example of a construction of such a function. In some scenarios, neural networks can be another example. (Although much of the time, they are not.) More on this later.

Week 2: More numpy: Intro to vectorization. Visualizing functions of two variables: surface plots, contour plots, and heatmaps. Understanding gradient descent: on the board. Implementing gradient descent: in one variable, in multiple variables.
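The one-variable version of gradient descent can be sketched in a few lines (an illustrative toy, not the lecture code):

```python
def grad_descent(df, x0, alpha=0.1, n_iter=100):
    """Minimize f by repeatedly stepping against its derivative df."""
    x = x0
    for _ in range(n_iter):
        x = x - alpha * df(x)   # move downhill, step size alpha
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is 2(x - 3); the minimum is at x = 3.
x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The multivariable version is the same idea with the derivative replaced by the gradient vector.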

Thinko in lecture: see here for the real story of how to find a vector that points in the direction of steepest ascent in 3D.

Multiple linear regression, linear classifiers

Maximum likelihood (on the board).

The Home Depot Challenge on Kaggle.

Coming up: Logistic Regression, intro to Bayesian inference, multilayer Neural Networks. See lectures VI and VII-IX from Andrew Ng's course and the Neural Networks lecture from Pedro Domingos's course.

Recommended lectures from Prof. Hinton's Coursera course: Lectures 1-3.

Week 3: Learning Linear Regression and Logistic Regression models using Maximum Likelihood.
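A sketch of logistic regression fit by gradient ascent on the log-likelihood (toy separable data; the function names are mine, not the lecture handout's):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_fit(X, y, alpha=0.5, n_iter=2000):
    """Maximize sum of y*log(p) + (1-y)*log(1-p) by gradient ascent on w."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(np.dot(X, w))
        w += alpha * np.dot(X.T, y - p) / len(y)  # gradient of mean log-likelihood
    return w

# Toy 1D data with a bias column: x < 2 -> class 0, x > 2 -> class 1.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0, 0, 1, 1])
w = logistic_fit(X, y)
preds = (sigmoid(np.dot(X, w)) > 0.5).astype(int)
```

Note how the gradient (y - p) times the inputs has the same form as the linear-regression gradient; this is no accident, as we discuss in lecture.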

Intro to Bayesian Inference. bayes.py.

Classification Using Multilayer Neural Networks.

Tutorial: Learning linear regression models with Gradient Descent; vectorizing code. Tutorial plan, code, galaxy data (info)

Week 4: Vectorizing neural networks; Backpropagation; One-Hot Encoding; activation functions, intro to optimizing neural networks. (UPDATED Feb. 4)
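One-hot encoding can be done without loops using NumPy fancy indexing (a small illustrative sketch):

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer labels as rows with a single 1 in the label's column."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0   # one write per row, vectorized
    return out

Y = one_hot(np.array([2, 0, 1]), n_classes=3)
# Y is [[0,0,1], [1,0,0], [0,1,0]]
```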

Coming up: learning features with multilayer neural networks (reading/viewing: Lecture 5 from Hinton's course); generalization and overfitting (reading/viewing: Lecture 9 from Hinton's course; Ch. 7 of Deep Learning); momentum and learning rates (Lecture 6 from Hinton's course; Ch. 8 of Deep Learning); Convolutional Neural Networks (Lecture 5 from Hinton's course; Ch. 9 of Deep Learning); Recurrent Neural Networks (Lecture 7 from Hinton's course; Ch. 10 of Deep Learning.)

Week 5: How Neural Networks See. Overfitting; preventing overfitting.

Some more weight matrix images (obtained by training on 64x64 images).

Just for fun: The Dead Salmon Study; steep learning curves on Yann LeCun's website.

Tutorial: Backpropagation and gradient flow. Slides, code. Please do not print out the slides before the tutorial: the idea is for you to work most of those things out in tutorial, and printing out the slides defeats the point.

Vectorization extra review: Recap of gradients for linear regression, computing the gradients using loops, Tutorial 1 recap: first try at vectorization, a nicer vectorization scheme that doesn't use sum(). Slides, code.
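The vectorization idea from the review, in miniature: the looped and matrix-product forms of the linear-regression gradient compute the same thing (random toy data, my own function names):

```python
import numpy as np

def grad_loop(X, y, w):
    """Gradient of the mean squared error, one training case at a time."""
    g = np.zeros_like(w)
    for i in range(X.shape[0]):
        g += 2 * (np.dot(X[i], w) - y[i]) * X[i]
    return g / X.shape[0]

def grad_vectorized(X, y, w):
    """The same gradient as one matrix product: (2/m) X^T (Xw - y)."""
    return 2 * np.dot(X.T, np.dot(X, w) - y) / X.shape[0]

rng = np.random.RandomState(0)
X, y, w = rng.randn(50, 3), rng.randn(50), rng.randn(3)
```

The vectorized form is both shorter and much faster, since the loop runs in compiled code inside NumPy.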

Week 6: wrap-up of the regularization lecture; a bit about neuroscience; introduction to convolutional neural networks; ConvNet architectures.

Just for fun: Hubel and Wiesel's research, Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

Week 7: Understanding How ConvNets See, Deep Dream, Neural Style

Just for fun: Deep Dream Grocery Trip, Neural Style demo.

Just for fun: The Matthew Effect

Week 8: midterm take-up; intro to the handout TensorFlow code; Recurrent Neural Networks

Just for fun: Donald Trump RNN; The Unreasonable Effectiveness of RNN

Week 9: RNNs and Vanishing Gradients: Part 2 (one_layer.py); Solving the Vanishing Gradients Problem with GRUs; LSTM; Machine Translation with LSTM
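The vanishing-gradient problem in one toy calculation (not the course handout): in a linear "RNN" h_t = w * h_{t-1}, the gradient of h_T with respect to h_0 is w**T, so for |w| < 1 it shrinks exponentially with the sequence length.

```python
# Gradient of h_T with respect to h_0 in h_t = w * h_{t-1} is w**T.
w = 0.5
grads = [w ** T for T in (1, 10, 50)]
# grads[0] = 0.5; by T = 50 the gradient is below 1e-15: it has vanished.
```

GRUs and LSTMs address this by introducing gating, so gradients can flow through near-identity paths instead of being repeatedly squashed.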

Tutorial (Week 9/10 for afternoon section, Week 10 for evening section): the gradient in min-char-rnn.py, line-by-line. Solutions (SPOILERS: don't look before attempting at least the first few lines) here. Handout here.

Coming up: More on Bayesian learning (see Lectures 9 and 10 in the Hinton Coursera course and Radford Neal's tutorial); Markov Chain Monte Carlo (see the beginning of this review); Restricted Boltzmann Machines (see Section 5 of Learning Deep Architectures for AI). For RBMs, see also Lectures 11-14 from Hinton's Coursera course for a somewhat different approach than what we're doing. Note that all those readings are more complete, but also more advanced, than what we do in class. You are of course only responsible for what's done in class.

Week 10: Improving Learning in Neural Networks; Intro to Bayesian Inference and Markov Chain Monte Carlo (MCMC).

Week 11: Markov Chain Monte Carlo (MCMC)
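A minimal Metropolis sampler for a standard normal target (an illustrative sketch, not the in-lecture handout code):

```python
import numpy as np

def metropolis(log_p, x0, n_samples=5000, step=1.0, seed=0):
    """Sample from p(x) with the Metropolis algorithm and a Gaussian proposal."""
    rng = np.random.RandomState(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        prop = x + step * rng.randn()
        # Accept with probability min(1, p(prop)/p(x)), computed in log space.
        if np.log(rng.rand()) < log_p(prop) - log_p(x):
            x = prop
        samples.append(x)
    return np.array(samples)

# Target: standard normal, whose log-density is -x^2/2 up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0)
```

The sampler only ever needs the target density up to a normalizing constant, which is exactly why MCMC is useful for Bayesian posteriors.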

Tutorial (in the usual lecture room): bayes.py (handout)

Restricted Boltzmann Machines (RBMs): Definition and Sampling

Coming up: Wrapping up RBMs; a little bit of autoencoders (see the beginning of Lecture 15 in the Coursera course); Dropout

Week 12: Training RBMs

Intro to autoencoders.

In-lecture mini-tutorial: Metropolis Algorithm (code, handout, visualization)

The big picture: "success is guaranteed"


Submitting projects

All project submissions will be done electronically, using the MarkUs system. You can log in to MarkUs using your CDF login and password.

To submit as a group, one of you needs to "invite" the other to be partners, and then the other student needs to accept the invitation. To invite a partner, navigate to the appropriate Assignment page, find "Group Information", and click on "Invite". You will be prompted for the other student's CDF user name; enter it. To accept an invitation, find "Group Information" on the Assignment page, find the invitation listed there, and click on "Join". Only one of you should send an invitation: if both students send one, then neither of you will be able to accept the other's. So make sure to agree beforehand on who will send the invitation! Also, remember that when working in a group, only one person should submit the solutions.

To submit your work, again navigate to the appropriate Exercise or Assignment page, then click on the "Submissions" tab near the top. Click "Add a New File" and either type a file name or use the "Browse" button to choose one. Then click "Submit". You can submit a new version of any file at any time (though the lateness penalty applies if you submit after the deadline) — look in the "Replace" column. For the purposes of determining the lateness penalty, the submission time is considered to be the time of your latest submission.

Once you have submitted, click on the file's name to check that you have submitted the correct version.


LaTeX resources

Web-based LaTeX interfaces: WriteLaTeX, ShareLaTeX

TeXworks, a cross-platform LaTeX front-end. To use it, install MikTeX on Windows and MacTeX on Mac

Detexify2 - LaTeX symbol classifier

The LaTeX Wikibook.

Additional LaTeX Documentation, from the home page of the LaTeX Project.

LaTeX on Wikipedia.

Policy on special consideration

Special consideration will be given in cases of documented and serious medical and personal circumstances.
