CSC 421/2516 Winter 2019

Neural Networks and Deep Learning


Machine learning is a powerful set of techniques that allow computers to learn from data rather than having a human expert program a behavior by hand. Neural networks are a class of machine learning algorithms originally inspired by the brain, which have recently seen a lot of success in practical applications. They're at the heart of production systems at companies like Google and Facebook for face recognition, speech-to-text, and language understanding.

This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms. Roughly the first 2/3 of the course focuses on supervised learning: training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning and reinforcement learning.

Policies (marking, prerequisites, etc.)

See the course information handout.

Where and When

There are two sections of the course. Since both sections are fully subscribed, please attend the one you are registered for.

| Section | Instructor | Lecture Time | Lecture Room | Tutorial Time | Tutorial Room |
| Section 1 | Jimmy Ba | Tuesday 1-2, Thursday 1-2 | MS 2172 | Thursday 2-3 | MS 2172 |
| Section 2 | Roger Grosse | Tuesday 6-8 | BA 1170 | Tuesday 8-9 | BA 1170 |

Teaching Staff


Most written homeworks and programming assignments will be due on Thursdays at 11:59pm. Please see the course information handout for detailed policies (marking, lateness, etc.).

The following schedule is subject to change.

| Assignment | Out | Due | Materials | Notes |
| Homework 1 | 1/18 | 1/24 | [Handout] | |
| Programming Assignment 1 | 1/18 | 1/31 | [Handout] [Starter Code] | |
| Homework 2 | 2/1 | 2/7 | | |
| Programming Assignment 2 | 2/16 | 2/28 | [Handout] [Starter Code] | |
| Homework 3 | 3/1 | 3/7 | [Handout] | |
| Homework 4 | 3/8 | 3/14 | [Handout] | |
| Programming Assignment 3 | 3/15 | 3/21 | [Starter Code] | |
| Programming Assignment 4 | 3/22 | 3/28 | [Starter Code] | |
| Homework 5 | 3/29 | 4/4 | [Handout] | |

Grad Student Projects

Grad students will do a final project in place of the final exam. Students must form teams of 2-3. The deadline for proposals is March 1, but you are encouraged to submit a proposal earlier so that you can receive feedback earlier. The deadline for final reports is April 26. You can find the full project requirements here.



Midterm

All students (undergrads and grad students) must take the midterm test. It will be held from 6:10-7:40pm on Friday, Feb. 15, in EX 200 (Exam Centre). It will be a 90-minute exam.

It will cover up through Lecture 9 (conv nets). Only material covered in lecture will be tested, so we won't test material that is only in the tutorials, readings, etc. However, we will place more emphasis on topics you've had an opportunity to practice in homeworks, tutorials, etc. There will be some conceptual questions, and some mathematical questions (similar to individual steps in the homeworks).

The format will be similar to CSC321 midterms from past years, so you might like to use these to practice. Note that the topics covered in different years might not correspond exactly.

Here are the midterm questions and solutions.

Final Exam

Only undergrads will take the final exam. Grad students do a final project instead.

The exam will take place from 9am-noon on Thursday, April 25. The rooms are as follows:

Practice exams. These are from CSC321, a third-year version of this course. All but 2017 and 2018 were with different instructors, and topics varied from year to year.


Here is a tentative schedule, which will likely change as the course goes on. Each "Lecture" corresponds to 50 minutes, so each 2-hour lecture session will cover 2 of them.

Suggested readings are just that: resources we recommend to help you understand the course material. They are not required, i.e. you are only responsible for the material covered in lecture. Most of the papers listed are more advanced than the corresponding lecture, and are of interest if you want to know where our knowledge comes from or follow current frontiers of research.

Goodfellow = I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning.

Topic Dates Slides Suggested Readings
Lecture 1 Introduction 1/8 [Slides] Course notes: Introduction
Lecture 2 Linear Models 1/8, 1/10 [Slides] Course notes (hopefully all review):
Lecture 3 Multilayer Perceptrons 1/15 [Slides]
Course notes: Multilayer Perceptrons
Lecture 4 Backpropagation 1/15, 1/17 [Slides] Course notes: Backpropagation
Lecture 5 Distributed Representations 1/22 [Slides]

Course notes: Distributed Representations

Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. JMLR 2003.

J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. EMNLP 2014.

Lecture 6 Automatic Differentiation 1/22, 1/24 [Slides]

Course notes: Automatic Differentiation

Autograd tutorial


D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based hyperparameter optimization through reversible learning. ICML 2015.

Lecture 7 Optimization I 1/29 [Slides]

Course notes: Optimization

Goodfellow, Chapter 8

G. Goh. Why momentum really works. Distill, 2017.

C. Shallue, J. Lee, J. Antognini, J. Sohl-Dickstein, R. Frostig, and G. E. Dahl. Measuring the effects of data parallelism on neural network training. arXiv, 2018.

Lecture 8 Optimization II 1/29, 1/31 See Lecture 7.
Lecture 9 Convolutional Networks 2/5 [Slides]

Course notes: Convolutional Networks

Goodfellow, Sections 9.1-9.5

Lecture 10 Image Classification (not tested) 2/5, 2/7 [Slides]

Course notes: Image Classification

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.

C. Szegedy et al. Going deeper with convolutions. CVPR, 2015.

O. Russakovsky et al. ImageNet large scale visual recognition challenge. IJCV, 2015.

Lecture 11 Optimizing the Input (not tested) 2/12 [Slides]

C. Olah, A. Mordvintsev, and L. Schubert. Feature visualization: how neural networks build up their understanding of images. Distill, 2017.

catch-up 2/12, 2/14
Midterm Test 2/15
Reading Week 2/18-2/22
Lecture 12 Generalization 2/26 [Slides]

Course notes: Generalization

Goodfellow, Chapter 7

N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 2014.

Lecture 13 Recurrent Neural Nets 2/26, 2/28 [Slides]

Course notes: Recurrent Neural Nets

Goodfellow, 10.1-10.4

Lecture 14 Exploding and Vanishing Gradients 3/5 [Slides]

Course notes: Exploding and Vanishing Gradients

Lecture 15 Autoregressive and Reversible Models 3/5, 3/7 [Slides]

Course notes: Autoregressive and Reversible Models

A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. ICML 2016.

A. van den Oord et al. WaveNet: a generative model for raw audio. arXiv, 2016.

L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. ICLR 2017.

Lecture 16 Attention 3/12 [Slides]

D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR 2015.

A. Vaswani et al. Attention is all you need. NIPS 2017.

A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.

Lecture 17 Variational Autoencoders 3/12, 3/14 [Slides]

Background: C. Olah. Visual Information Theory

D. Kingma and M. Welling. Auto-encoding variational Bayes. ICLR 2014.

Lecture 18 Generative Adversarial Nets 3/19 [Slides]

I. Goodfellow et al. Generative adversarial nets. NIPS 2014.

J.-Y. Zhu et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.

catch-up 3/19, 3/21
Lecture 19 Bayesian Neural Nets (not on exam) 3/26 [Slides]

A. Graves. Practical variational inference for neural networks. NIPS 2011.

C. Blundell et al. Weight uncertainty in neural networks. ICML 2015.

Lecture 20 Policy Gradient 3/26, 3/28 [Slides]

J. Peters and S. Schaal. Policy gradient methods for robotics. IROS 2006.

J. Schulman et al. Proximal policy optimization algorithms. arXiv, 2017.

Lecture 21 Q-Learning 4/2 [Slides]

V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.

Lecture 22 Go 4/2, 4/4 [Slides]

D. Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

D. Silver et al. Mastering the game of Go without human knowledge. Nature, 2017.

T. Anthony, Z. Tian, and D. Barber. Thinking fast and slow with deep learning and tree search. NIPS 2017.


Here is a tentative tutorial schedule. Details may change as the course goes on.
| Tutorial | Dates | Topic | Materials |
| Tutorial 1 | 1/15, 1/17 | Multivariable Calculus Review | [ipynb] |
| Tutorial 2 | 1/22, 1/24 | Autograd | [ipynb] |
| Tutorial 3 | 1/29, 1/31 | PyTorch | [ipynb] |
| Tutorial 4 | 2/5, 2/7 | Conv Nets | [ipynb] |
| Tutorial 5 | 2/12, 2/14 | Midterm Review | |
| | 2/15 | Midterm Test | |
| | 2/18-2/22 | Reading Week | |
| Tutorial 6 | 2/26, 2/28 | Neural Net Best Practices | [Colab] |
| Tutorial 7 | 3/5, 3/7 | RNNs | [ipynb] |
| Tutorial 8 | 3/12, 3/14 | Information Theory | [ipynb] |
| Tutorial 9 | 3/19, 3/21 | Pyro | [ipynb] |
| Tutorial 10 | 3/26, 3/28 | Reinforcement Learning | [Slides] |
| Tutorial 11 | 4/2, 4/4 | Final Exam Review | |