Machine learning is a powerful set of techniques that allow computers to learn from data rather than having a human expert program a behavior by hand. Neural networks are a class of machine learning algorithms originally inspired by the brain, but which have recently seen a lot of success at practical applications. They're at the heart of production systems at companies like Google and Facebook for face recognition, speech-to-text, and language understanding.
This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms. Roughly the first 2/3 of the course focuses on supervised learning -- training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning -- the algorithm isn't given any examples of the correct behavior, and the goal is instead to discover interesting regularities in the data.
All course-related announcements will be sent to the class mailing list, csc321h1s [at] teach.cs.toronto.edu.
We will use Discourse as the discussion forum. If you have a question you think would be relevant to the whole class, please post it there so that everyone gets the benefit of the answer. Please include the part of the course (e.g. "Homework 1", "Lecture 2") in the subject.
If you want to contact the course staff privately, please e-mail csc321ta [at] cs.toronto.edu (for the TAs and the instructor) or csc321prof [at] cs.toronto.edu (for only the instructor).
All assignment deadlines are at 11:59pm of the date listed. Please see the course information handout for detailed policies (marking, lateness, etc.).
Afternoon: 1/5, 1-2pm; Night: 1/10, 6-7pm
What are machine learning and neural networks, and what would you use them for? Supervised, unsupervised, and reinforcement learning. How this course is organized.
Afternoon: 1/10, 1-2pm; Night: 1/10, 7-8pm
Linear regression, a supervised learning task where you want to predict a scalar valued target. Formulating it as an optimization problem, and solving either directly or with gradient descent. Vectorization. Feature maps and polynomial regression. Generalization: overfitting, underfitting, and validation.
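To make the gradient descent formulation concrete, here is a minimal NumPy sketch of linear regression trained by full-batch gradient descent. The toy data, learning rate, and variable names are made up for illustration and are not taken from the lecture.

```python
import numpy as np

# Toy 1-D data: targets are a noisy linear function of the input.
np.random.seed(0)
X = np.random.randn(100, 1)
t = 3.0 * X[:, 0] + 1.0 + 0.1 * np.random.randn(100)

# Append a column of ones so the bias is just another weight (vectorization).
X_b = np.hstack([X, np.ones((100, 1))])
w = np.zeros(2)

alpha = 0.1  # learning rate
for _ in range(500):
    y = X_b.dot(w)                    # predictions for every training case
    grad = X_b.T.dot(y - t) / len(t)  # gradient of the mean squared error
    w -= alpha * grad

print(w)  # should end up close to [3.0, 1.0]
```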
Afternoon: 1/12, 1-2pm; Night: 1/17, 6-7pm
Binary linear classification. Visualizing linear classifiers. The perceptron algorithm. Limits of linear classifiers.
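Below is a short sketch of the perceptron learning rule, assuming targets in {-1, +1} and a bias folded in as an extra input; the function name and epoch count are illustrative, not prescribed by the lecture.

```python
import numpy as np

def perceptron(X, t, num_epochs=100):
    """Perceptron learning rule for targets t in {-1, +1}.
    X is an N x D array; a bias column is appended internally."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        for x_i, t_i in zip(X, t):
            if t_i * np.dot(w, x_i) <= 0:  # misclassified (or on the boundary)
                w += t_i * x_i             # nudge w toward classifying it correctly
    return w
```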
Afternoon: 1/17, 1-2pm; Night: 1/17, 7-8pm
Comparison of loss functions for binary classification. Cross-entropy loss, logistic activation function, and logistic regression. Hinge loss. Multiway classification. Convex loss functions. Gradient checking. (Note: this is really a lecture-and-a-half, and will run into what's scheduled as Lecture 5.)
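As a rough sketch of the cross-entropy loss for logistic regression, together with a finite-difference gradient check, here is some illustrative NumPy code; the toy data and the small constants are arbitrary choices for the example, not course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss_and_grad(w, X, t):
    """Average cross-entropy loss and its gradient, for targets t in {0, 1}."""
    y = sigmoid(X.dot(w))
    eps = 1e-12  # keeps log() away from zero
    loss = -np.mean(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))
    grad = X.T.dot(y - t) / len(t)
    return loss, grad

# Finite-difference gradient check on a small random problem.
np.random.seed(0)
X = np.random.randn(5, 3)
t = (np.random.rand(5) > 0.5).astype(float)
w = np.random.randn(3)
_, grad = logistic_loss_and_grad(w, X, t)
h = 1e-6
for i in range(3):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += h
    w_minus[i] -= h
    numeric = (logistic_loss_and_grad(w_plus, X, t)[0]
               - logistic_loss_and_grad(w_minus, X, t)[0]) / (2 * h)
    print(numeric, grad[i])  # the two values should agree to several decimal places
```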
Lecture 5: Multilayer Perceptrons [Slides] [Sorry, no notes.]
Afternoon: 1/19, 1-2pm; Night: 1/24, 6-7pm
Multilayer perceptrons. Comparison of activation functions. Viewing deep neural nets as function composition and as feature learning. Limitations of linear networks and universality of nonlinear networks.
Suggested reading: Deep Learning Book, Sections 6.1-6.4
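For a sense of what a multilayer perceptron computes, here is a hypothetical forward pass for a small two-layer network with a ReLU hidden layer; the layer sizes and parameter initialization are made up for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, params):
    """Forward pass of a two-layer perceptron: ReLU hidden layer, linear output."""
    h = relu(np.dot(params['W1'], x) + params['b1'])  # learned features
    y = np.dot(params['W2'], h) + params['b2']        # output (e.g. a regression target)
    return y

# Hypothetical sizes: 4 inputs, 8 hidden units, 1 output.
rng = np.random.RandomState(0)
params = {'W1': 0.1 * rng.randn(8, 4), 'b1': np.zeros(8),
          'W2': 0.1 * rng.randn(1, 8), 'b2': np.zeros(1)}
print(mlp_forward(rng.randn(4), params))
```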
Afternoon: 1/24, 1-2pm; Night: 1/24, 7-8pm
The backpropagation algorithm, a method for computing gradients which we use throughout the course.
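The sketch below illustrates backpropagation on a tiny two-layer network with a logistic hidden layer and squared-error loss; the "bar" variables denote error signals (derivatives of the loss), and the shapes and names are illustrative only.

```python
import numpy as np

def two_layer_backprop(x, t, W1, W2):
    """Forward and backward pass for a tiny network: logistic hidden layer,
    linear output, squared-error loss. Returns the loss and weight gradients."""
    # Forward pass
    z = W1.dot(x)
    h = 1.0 / (1.0 + np.exp(-z))   # logistic hidden activations
    y = W2.dot(h)
    loss = 0.5 * np.sum((y - t) ** 2)

    # Backward pass: send error signals from the loss back to each weight.
    y_bar = y - t                  # dL/dy
    W2_bar = np.outer(y_bar, h)    # dL/dW2
    h_bar = W2.T.dot(y_bar)        # dL/dh
    z_bar = h_bar * h * (1 - h)    # dL/dz, using the logistic derivative
    W1_bar = np.outer(z_bar, x)    # dL/dW1
    return loss, W1_bar, W2_bar
```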
Lecture 7: Optimization [Slides] [Sorry, no notes.]
Afternoon: 1/26, 1-2pm; Night: 1/31, 6-7pm
How to use the gradients computed by backprop. Features of optimization landscapes: local optima, saddle points, plateaux, ravines. Stochastic gradient descent and momentum.
Suggested reading: Deep Learning Book, Chapter 8
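As an illustration of stochastic gradient descent with momentum, here is a hedged sketch; grad_fn is an assumed user-supplied function returning the gradient on a single example, and the default hyperparameters are arbitrary.

```python
import numpy as np

def sgd_with_momentum(grad_fn, w0, data, alpha=0.01, mu=0.9, num_epochs=10):
    """Stochastic gradient descent with classical momentum.
    grad_fn(w, example) must return the gradient on a single training example."""
    w = w0.copy()
    v = np.zeros_like(w)            # velocity: a decaying running sum of gradients
    for _ in range(num_epochs):
        np.random.shuffle(data)     # visit the training examples in a random order
        for example in data:
            v = mu * v - alpha * grad_fn(w, example)
            w = w + v
    return w
```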
Afternoon: 1/31, 1-2pm; Night: 1/31, 7-8pm
Guest lecture by David Duvenaud
Lecture 9: Generalization [Slides] [Sorry, no notes.]
Afternoon: 2/2, 1-2pm; Night: 2/7, 6-7pm
Bias/variance decomposition, data augmentation, limiting capacity, early stopping, weight decay, ensembles, stochastic regularization, hyperparameter tuning.
Suggested reading: Deep Learning Book, Chapter 7
Lecture 10: Distributed Representations [Slides] [Sorry, no notes.]
Afternoon: 2/7, 1-2pm; Night: 2/7, 7-8pm
Language modeling, n-gram models (a localist representation), neural language models (a distributed representation), and skip-grams (another distributed representation).
Afternoon: 2/9, 1-2pm; Night: 2/14, 6-7pm
Convolution operation. Convolution layers and pooling layers. Equivariance and invariance. Backprop rules for conv nets.
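A naive NumPy implementation of the 2-D convolution operation ("valid" padding, single channel) might look like the sketch below; real conv net code uses much faster routines, and the kernel flip is what distinguishes true convolution from cross-correlation.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution of a single-channel image with a small kernel."""
    kernel = kernel[::-1, ::-1]                # flip: convolution vs. cross-correlation
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```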
Lecture 12: Image Classification [Slides] [Sorry, no notes.]
Afternoon: 2/14, 1-2pm; Night: 2/14, 7-8pm
Conv net architectures applied to handwritten digit and object classification. Measuring the size of a conv net.
Afternoon: 2/16, 1-2pm; Night: 2/28, 7:30-8:30pm
Conv net visualizations: guided backprop, gradient descent on inputs. Deep Dream. Neural style transfer.
Afternoon: 3/2, 1-2pm; Night: 3/7, 6-7pm
Recurrent neural nets. Backprop through time. Applying RNNs to language modeling and machine translation.
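For intuition, here is a minimal sketch of a vanilla RNN forward pass unrolled over a sequence; the weight-matrix names (Wxh, Whh, Why) are conventions chosen for this example.

```python
import numpy as np

def rnn_forward(inputs, h0, Wxh, Whh, Why):
    """Forward pass of a vanilla RNN over a list of input vectors.
    Returns the hidden states and the per-step outputs."""
    h = h0
    hiddens, outputs = [], []
    for x in inputs:
        h = np.tanh(Wxh.dot(x) + Whh.dot(h))   # the same weights are reused at every time step
        hiddens.append(h)
        outputs.append(Why.dot(h))
    return hiddens, outputs
```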
Afternoon: 3/7, 1-2pm; Night: 3/7, 7-8pm
Why RNN gradients explode and vanish, both in terms of the mechanics of backprop, and conceptually in terms of the function the RNN computes. Ways to deal with it: gradient clipping, input reversal, LSTM.
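Gradient clipping can be sketched in a few lines: if the overall gradient norm exceeds a threshold, rescale it. The threshold and the representation of the gradient as a list of arrays are assumptions made for this example.

```python
import numpy as np

def clip_gradient(grads, max_norm=5.0):
    """Rescale the gradient (a list of arrays) if its overall norm is too large."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```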
Lecture 16: ResNets and Attention [Slides] [Notes coming soon]
Afternoon: 3/9, 1-2pm; Night: 3/14, 6-7pm
Deep Residual Networks. Attention-based models for machine translation and caption generation. Neural Turing Machines.
Afternoon: 3/14, 1-2pm; Night: 3/14, 7-8pm
Maximum likelihood estimation. Basics of Bayesian parameter estimation and maximum a-posteriori estimation.
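As a tiny worked example of maximum likelihood estimation: for a univariate Gaussian, the ML estimates are the sample mean and the (biased) sample variance. The data below are synthetic and chosen only to illustrate the point.

```python
import numpy as np

# Synthetic data drawn from a Gaussian with mean 5 and standard deviation 2.
np.random.seed(0)
data = 5.0 + 2.0 * np.random.randn(1000)

# Maximum likelihood estimates for a univariate Gaussian.
mu_ml = np.mean(data)
sigma2_ml = np.mean((data - mu_ml) ** 2)   # biased: divides by N, not N - 1
print(mu_ml, sigma2_ml)                    # roughly 5.0 and 4.0
```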
Afternoon: 3/16, 1-2pm; Night: 3/21, 6-7pm
K-means. Mixture modeling: posterior inference and parameter learning.
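Here is a bare-bones K-means sketch, alternating assignment and update steps; it assumes X is a float array of shape (N, D), and the initialization (random data points as centres) is one simple choice among many.

```python
import numpy as np

def kmeans(X, K, num_iters=100):
    """Plain K-means on a float array X of shape (N, D)."""
    centres = X[np.random.choice(len(X), K, replace=False)]
    for _ in range(num_iters):
        # Assignment step: index of the closest centre for every point.
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assignments = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its assigned points.
        for k in range(K):
            if np.any(assignments == k):
                centres[k] = X[assignments == k].mean(axis=0)
    return centres, assignments
```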
Lecture 19: Boltzmann Machines [Slides] [Notes coming soon]
Afternoon: 3/21, 1-2pm; Night: 3/21, 7-8pm
Boltzmann machines: definition; marginal and conditional probabilities; parameter learning. Restricted Boltzmann machines.
Afternoon: 3/23, 1-2pm; Night: 3/28, 6-7pm
Principal component analysis; autoencoders; layerwise training; applying autoencoders to document and image retrieval.
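A minimal PCA sketch using the SVD is shown below; the function name and the choice to return both codes and reconstructions are just for illustration.

```python
import numpy as np

def pca(X, num_components):
    """Project the data onto its top principal components using the SVD."""
    mean = X.mean(axis=0)
    X_centred = X - mean
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:num_components]        # principal directions (rows)
    codes = X_centred.dot(components.T)     # low-dimensional codes
    reconstruction = codes.dot(components) + mean
    return codes, reconstruction
```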
Lecture 21: Bayesian Hyperparameter Optimization [Slides]
Afternoon: 3/28, 1-2pm; Night: 3/28, 7-8pm
Bayesian linear regression; Bayesian optimization.
Lecture 22: Adversarial Learning [Slides]
Afternoon: 3/30, 1-2pm; Night: 4/4, 6-7pm
Adversarial examples; generative adversarial networks (GANs).
Lecture 23: Go [Slides]
Afternoon: 4/4, 1-2pm; Night: 4/4, 7-8pm
Afternoon: 1/12, 2-3pm; Night: 1/10, 8-9pm
Afternoon: 1/19, 2-3pm; Night: 1/17, 8-9pm
Tutorial 3: Backpropagation [PDF]
Afternoon: 1/26, 2-3pm; Night: 1/24, 8-9pm
Afternoon: 2/2, 2-3pm; Night: 1/31, 8-9pm
Tutorial 5: Optimization and Generalization [Slides]
Afternoon: 2/9, 2-3pm; Night: 2/7, 8-9pm
Tutorial 6: Convolutional Networks [Slides]
Afternoon: 2/16, 2-3pm; Night: 2/14, 8-9pm
Tutorial 7: Recurrent Neural Networks
Afternoon: 3/9, 2-3pm; Night: 3/7, 8-9pm
Tutorial 8: Maximum Likelihood
Afternoon: 3/16, 2-3pm; Night: 3/14, 8-9pm
No tutorial this week!
Afternoon: 3/23, 2-3pm; Night: 3/21, 8-9pm
Tutorial 10: Bayesian Learning
Afternoon: 3/30, 2-3pm; Night: 3/28, 8-9pm
No tutorial this week!
The programming assignments will all be done in Python using the NumPy scientific computing library, but prior knowledge of Python is not required. Basic Python will be taught in a tutorial. We will be using Python 2, not Python 3, since this is the version more commonly used in machine learning.
You have several options for how to use Python:
Once Python is installed, there are two ways you can edit and run Python code:
Here are some recommended background readings on Python and NumPy.