Machine learning is a powerful set of techniques that allows computers to learn from data rather than having a human expert program a behavior by hand. Neural networks are a class of machine learning algorithms originally inspired by the brain, which have recently seen a lot of success at practical applications. They're at the heart of production systems at companies like Google and Facebook for face recognition, speech-to-text, and language understanding.
This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms. Roughly the first two-thirds of the course focuses on supervised learning: training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last third focuses on unsupervised learning and reinforcement learning.
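As a toy illustration of the supervised setup above (a generic sketch, not part of the course materials): given labeled examples, we pick a model, measure its error on the labels, and adjust its parameters by gradient descent. Here is that recipe in miniature, fitting a linear model with NumPy; scaled up, the same loop trains the neural networks in this course.

```python
import numpy as np

# Labeled examples: inputs x with targets generated by y = 2x + 1 (plus noise).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + 0.01 * rng.normal(size=100)

w, b = 0.0, 0.0   # parameters of the model y_hat = w*x + b
lr = 0.1          # learning rate

for _ in range(500):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(w, b)  # should end up close to 2 and 1
```

The only "learning" here is the repeated gradient step on a loss computed from labeled data; neural nets replace the linear model with a deeper function and compute the gradients by backpropagation.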
See the course information handout.
There are two sections of the course. Since both sections are fully subscribed, please attend the one you are registered for.
| Instructor | Lecture Time | Lecture Room | Tutorial Time | Tutorial Room
Section 1 | Jimmy Ba | Tuesday 1–2, Thursday 1–2 | MS 2172 | Thursday 2–3 | MS 2172
Section 2 | Roger Grosse | Tuesday 6–8 | BA 1170 | Tuesday 8–9 | BA 1170
Most written homeworks and programming assignments will be due on Thursdays at 11:59pm. Please see the course information handout for detailed policies (marking, lateness, etc.).
The following schedule is subject to change.
| Out | Due | Materials
Homework 1 | 1/18 | 1/24 | [Handout]
Programming Assignment 1 | 1/18 | 1/31 | [Handout] [Starter Code]
Homework 2 | 2/1 | 2/11 | [Handout] [maml.py]
Programming Assignment 2 | 2/16 | 2/28 | [Handout] [Starter Code]
Homework 3 | 3/1 | 3/7 | [Handout]
Homework 4 | 3/8 | 3/14 | [Handout]
Programming Assignment 3 | 3/15 | 3/22 | [Handout] [Starter Code]
Programming Assignment 4 | 3/22 | 3/31 | [Handout] [Starter Code]
Homework 5 | 3/29 | 4/4 | [Handout]
Grad students will do a final project in place of the final exam. Students must form teams of 2–3. The deadline for proposals is March 1, but you are encouraged to submit a proposal earlier so that you can receive feedback sooner. The deadline for final reports is April 26 (extended from April 18). You can find the full project requirements here.
All students (undergrads and grad students) must take the midterm test. It will be held from 6:10–7:40pm on Friday, Feb. 15, in EX 200 (Exam Centre). It is a 90-minute exam.
It will cover material up through Lecture 9 (conv nets). Only material covered in lecture will be tested; we won't test material that appears only in the tutorials, readings, etc. However, we will place more emphasis on topics you've had an opportunity to practice in homeworks, tutorials, etc. There will be some conceptual questions and some mathematical questions (similar to individual steps in the homeworks).
The format will be similar to CSC321 midterms from past years, so you might like to use these to practice. Note that the topics covered in different years might not correspond exactly.
Here are the midterm questions and solutions.
Only undergrads will take the final exam. Grad students do a final project instead.
The exam will take place from 9am to noon on Thursday, April 25. The rooms are as follows:
Practice exams. These are from CSC321, a third-year version of this course. All but the 2017 and 2018 exams were with different instructors, and topics varied from year to year.
Here is a tentative schedule, which will likely change as the course goes on. Each "Lecture" corresponds to 50 minutes, so each 2-hour lecture session will cover two of them.
Suggested readings are just that: resources we recommend to help you understand the course material. They are not required, i.e. you are only responsible for the material covered in lecture. Most of the papers listed are more advanced than the corresponding lecture, and are of interest if you want to know where our knowledge comes from or follow current frontiers of research.
Goodfellow = I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
Topic | Dates | Slides | Suggested Readings
Lecture 1: Introduction | 1/8 | [Slides] | Course notes: Introduction
Lecture 2: Linear Models | 1/8, 1/10 | [Slides] | Course notes (hopefully all review)
Lecture 3: Multilayer Perceptrons | 1/15 | [Slides] [Colab] | Course notes: Multilayer Perceptrons
Lecture 4: Backpropagation | 1/15, 1/17 | [Slides] | Course notes: Backpropagation
Lecture 5: Distributed Representations | 1/22 | [Slides] | Course notes: Distributed Representations; Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. JMLR 2003; J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. EMNLP 2014.
Lecture 6: Automatic Differentiation | 1/22, 1/24 | [Slides] | Course notes: Automatic Differentiation; D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based hyperparameter optimization through reversible learning. ICML 2015.
Lecture 7: Optimization I | 1/29 | [Slides] | Course notes: Optimization; Goodfellow, Chapter 8; G. Goh. Why momentum really works. Distill, 2017; C. Shallue, J. Lee, J. Antognini, J. Sohl-Dickstein, R. Frostig, and G. E. Dahl. Measuring the effects of data parallelism on neural network training. arXiv, 2018.
Lecture 8: Optimization II | 1/29, 1/31 | | See Lecture 7.
Lecture 9: Convolutional Networks | 2/5 | [Slides] | Course notes: Convolutional Networks; Goodfellow, Sections 9.1–9.5
Lecture 10: Image Classification (not tested) | 2/5, 2/7 | [Slides] | Course notes: Image Classification; A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS 2012; C. Szegedy et al. Going deeper with convolutions. CVPR 2015; O. Russakovsky et al. ImageNet large scale visual recognition challenge. IJCV 2015.
Lecture 11: Optimizing the Input (not tested) | 2/12 | [Slides] | C. Olah, A. Mordvintsev, and L. Schubert. Feature visualization: how neural networks build up their understanding of images. Distill, 2017.
Catch-up | 2/12, 2/14 | |
Midterm Test | 2/15 | |
Reading Week | 2/18–2/22 | |
Lecture 12: Generalization | 2/26 | [Slides] | Course notes: Generalization; Goodfellow, Chapter 7; N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR 2014.
Lecture 13: Recurrent Neural Nets | 2/26, 2/28 | [Slides] | Course notes: Recurrent Neural Nets; Goodfellow, Sections 10.1–10.4
Lecture 14: Exploding and Vanishing Gradients | 3/5 | [Slides] | Course notes: Exploding and Vanishing Gradients
Lecture 15: Autoregressive and Reversible Models | 3/5, 3/7 | [Slides] | Course notes: Autoregressive and Reversible Models; A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. ICML 2016; A. van den Oord et al. WaveNet: a generative model for raw audio. 2016; L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. ICLR 2017.
Lecture 16: Attention | 3/12 | [Slides] | D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR 2015; A. Vaswani et al. Attention is all you need. NIPS 2017; A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
Lecture 17: Variational Autoencoders | 3/12, 3/14 | [Slides] | Background: C. Olah. Visual Information Theory; D. Kingma and M. Welling. Auto-encoding variational Bayes. ICLR 2014.
Lecture 18: Generative Adversarial Nets | 3/19 | [Slides] | I. Goodfellow et al. Generative adversarial nets. NIPS 2014; J.-Y. Zhu et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.
Catch-up | 3/19, 3/21 | |
Lecture 19: Bayesian Neural Nets (not on exam) | 3/26 | [Slides] | A. Graves. Practical variational inference for neural networks. NIPS 2011; C. Blundell et al. Weight uncertainty in neural networks. ICML 2015.
Lecture 20: Policy Gradient | 3/26, 3/28 | [Slides] | J. Peters and S. Schaal. Policy gradient methods for robotics. IROS 2006; J. Schulman et al. Proximal policy optimization algorithms. 2017.
Lecture 21: Q-Learning | 4/2 | [Slides] | V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.
Lecture 22: Go | 4/2, 4/4 | [Slides] | D. Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016; D. Silver et al. Mastering the game of Go without human knowledge. Nature, 2017; T. Anthony, Z. Tian, and D. Barber. Thinking fast and slow with deep learning and tree search. NIPS 2017.
| Dates | Topic | Materials
Tutorial 1 | 1/15, 1/17 | Multivariable Calculus Review | [ipynb] [PDF]
Tutorial 2 | 1/22, 1/24 | Autograd | [ipynb]
Tutorial 3 | 1/29, 1/31 | PyTorch | [ipynb]
Tutorial 4 | 2/5, 2/7 | Conv Nets | [ipynb]
Tutorial 5 | 2/12, 2/14 | Midterm Review |
| 2/15 | Midterm Test |
| 2/18–2/22 | Reading Week |
Tutorial 6 | 2/26, 2/28 | Neural Net Best Practices | [Colab]
Tutorial 7 | 3/5, 3/7 | RNNs | [ipynb]
Tutorial 8 | 3/12, 3/14 | Information Theory | [ipynb]
Tutorial 9 | 3/19, 3/21 | Pyro | [ipynb]
Tutorial 10 | 3/26, 3/28 | Reinforcement Learning | [Slides] [ipynb]
Tutorial 11 | 4/2, 4/4 | Final Exam Review |