CSC 421/2516 Winter 2019

Neural Networks and Deep Learning

Overview

Machine learning is a powerful set of techniques that allow computers to learn from data rather than having a human expert program a behavior by hand. Neural networks are a class of machine learning algorithms originally inspired by the brain, which have recently seen a lot of success in practical applications. They're at the heart of production systems at companies like Google and Facebook for face recognition, speech-to-text, and language understanding.

This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms. Roughly the first 2/3 of the course focuses on supervised learning: training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning and reinforcement learning.

Policies (marking, prerequisites, etc.)

See the course information handout.

Where and When

There are two sections of the course. Since both sections are fully subscribed, please attend the one you are registered for.

	Instructor	Lecture Time	Lecture Room	Tutorial Time	Tutorial Room
Section 1	Jimmy Ba	Tuesday 1-2, Thursday 1-2	MS 2172	Thursday 2-3	MS 2172
Section 2	Roger Grosse	Tuesday 6-8	BA 1170	Tuesday 8-9	BA 1170

Teaching Staff

Instructors: Roger Grosse, Jimmy Ba
Office Hours:
- Roger: Monday 5-6pm in PT290F
- Jimmy: Friday 1-2pm in PT290D
Head TA: Alex Adam
TAs: TBA
Staff emails:
- TAs and instructors: csc421staff [at] cs.toronto.edu
- Instructors and head TA only: csc421instructors [at] cs.toronto.edu
- Please do not contact us at our personal emails.
We will use Piazza for the course forum. If your question is about the course material and doesn't give away any hints for the homework, please post to Piazza so that the entire class can benefit from the answer.

Assignments

Most written homeworks and programming assignments will be due on Thursdays at 11:59pm. Please see the course information handout for detailed policies (marking, lateness, etc.).

The following schedule is subject to change.

	Out	Due	Materials	Notes
Homework 1	1/18	1/24	[Handout]
Programming Assignment 1	1/18	1/31	[Handout] [Starter Code]
Homework 2	2/1	~~2/7~~ 2/11	[Handout] [maml.py]
Programming Assignment 2	2/16	2/28	[Handout] [Starter Code]
Homework 3	3/1	3/7	[Handout]
Homework 4	3/8	3/14	[Handout]
Programming Assignment 3	3/15	~~3/21~~ 3/22	[Handout] [Starter Code]
Programming Assignment 4	3/22	~~3/28~~ 3/31	[Handout] [Starter Code]
Homework 5	3/29	4/4	[Handout]

Grad Student Projects

Grad students will do a final project in place of the final exam. Students must form teams of 2-3. The deadline for proposals is March 1, but you are encouraged to submit a proposal earlier so that you can receive feedback earlier. The deadline for final reports is April ~~18~~ 26. You can find the full project requirements here.

Tests

Midterm

All students (undergrads and grad students) must take the midterm test. It will be held from 6:10-7:40pm on Friday, Feb. 15, in EX 200 (Exam Centre). It will be a 90 minute exam.

It will cover up through Lecture 9 (conv nets). Only material covered in lecture will be tested, so we won't test material that is only in the tutorials, readings, etc. However, we will place more emphasis on topics you've had an opportunity to practice in homeworks, tutorials, etc. There will be some conceptual questions, and some mathematical questions (similar to individual steps in the homeworks).

The format will be similar to CSC321 midterms from past years, so you might like to use these to practice. Note that the topics covered in different years might not correspond exactly.

2015 midterm: version 1 and solutions; version 2 and solutions. Note that this exam was too difficult, and the marks were adjusted upwards.
2017 midterm: version 1, version 2, and solutions.
2018 midterm: version 1, version 2, and solutions.

Here are the midterm questions and solutions.

Final Exam

Only undergrads will take the final exam. Grad students do a final project instead.

The exam will take place from 9am-noon on Thursday, April 25. The rooms are as follows:

Surname A-G: Bahen (BA) 2159
Surname H-Z: Medical Sciences (MS) 2158
No, it's not a typo that it's two different buildings with adjacent room numbers.

Practice exams. These are from CSC321, a third-year version of this course. All but 2017 and 2018 were with different instructors, and topics varied from year to year.

Lectures

Here is a tentative schedule, which will likely change as the course goes on. Each "Lecture" corresponds to 50 minutes, so each 2-hour lecture session will cover 2 of them.

Suggested readings are just that: resources we recommend to help you understand the course material. They are not required, i.e. you are only responsible for the material covered in lecture. Most of the papers listed are more advanced than the corresponding lecture, and are of interest if you want to know where our knowledge comes from or follow current frontiers of research.

Goodfellow = I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning.

	Topic	Dates	Slides	Suggested Readings
Lecture 1	Introduction	1/8	[Slides]	Course notes: Introduction
Lecture 2	Linear Models	1/8, 1/10	[Slides]	Course notes (hopefully all review): Linear Regression Linear Classifiers Training a Classifier
Lecture 3	Multilayer Perceptrons	1/15	[Slides] [Colab]	Course notes: Multilayer Perceptrons
Lecture 4	Backpropagation	1/15, 1/17	[Slides]	Course notes: Backpropagation
Lecture 5	Distributed Representations	1/22	[Slides]	Course notes: Distributed Representations Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. JMLR 2003. J. Pennington, R. Socher, and C. Manning. GloVe: Global vectors for word representation. EMNLP 2014.
Lecture 6	Automatic Differentiation	1/22, 1/24	[Slides]	Course notes: Automatic Differentiation Autograd tutorial Autodidact D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based hyperparameter optimization through reversible learning. ICML 2015.
Lecture 7	Optimization I	1/29	[Slides]	Course notes: Optimization Goodfellow, Chapter 8 G. Goh. Why momentum really works. Distill, 2017. C. Shallue, J. Lee, J. Antognini, J. Sohl-Dickstein, R. Frostig, and G. E. Dahl. Measuring the effects of data parallelism on neural network training. arXiv, 2018.
Lecture 8	Optimization II	1/29, 1/31		See Lecture 7.
Lecture 9	Convolutional Networks	2/5	[Slides]	Course notes: Convolutional Networks Goodfellow, Sections 9.1-9.5
Lecture 10	Image Classification (not tested)	2/5, 2/7	[Slides]	Course notes: Image Classification A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012. C. Szegedy et al. Going deeper with convolutions. CVPR, 2015. O. Russakovsky et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
Lecture 11	Optimizing the Input (not tested)	2/12	[Slides]	C. Olah, A. Mordvintsev, and L. Schubert. Feature visualization: how neural networks build up their understanding of images. Distill, 2017.
	catch-up	2/12, 2/14
	Midterm Test	2/15
	Reading Week	2/18-2/22
Lecture 12	Generalization	2/26	[Slides]	Course notes: Generalization Goodfellow, Chapter 7 N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 2014.
Lecture 13	Recurrent Neural Nets	2/26, 2/28	[Slides]	Course notes: Recurrent Neural Nets Goodfellow, 10.1-10.4
Lecture 14	Exploding and Vanishing Gradients	3/5	[Slides]	Course notes: Exploding and Vanishing Gradients
Lecture 15	Autoregressive and Reversible Models	3/5, 3/7	[Slides]	Course notes: Autoregressive and Reversible Models A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu. Pixel recurrent neural networks. ICML 2016. A. van den Oord et al. WaveNet: a generative model for raw audio. 2016. L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using Real NVP. ICLR 2017.
Lecture 16	Attention	3/12	[Slides]	D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR 2015. A. Vaswani et al. Attention is all you need. NIPS 2017. A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
Lecture 17	Variational Autoencoders	3/12, 3/14	[Slides]	Background: C. Olah. Visual Information Theory D. Kingma and M. Welling. Auto-encoding variational Bayes. ICLR 2014.
Lecture 18	Generative Adversarial Nets	3/19	[Slides]	I. Goodfellow et al. Generative adversarial nets. NIPS 2014. J.-Y. Zhu et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.
	catch-up	3/19, 3/21
Lecture 19	Bayesian Neural Nets (not on exam)	3/26	[Slides]	A. Graves. Practical variational inference for neural networks. NIPS 2011. C. Blundell et al. Weight uncertainty in neural networks. ICML 2015.
Lecture 20	Policy Gradient	3/26, 3/28	[Slides]	J. Peters and S. Schaal. Policy gradient methods for robotics. IROS 2006. J. Schulman et al. Proximal policy optimization algorithms. 2017.
Lecture 21	Q-Learning	4/2	[Slides]	V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015. Nature article (closed-access) Preprint (open access)
Lecture 22	Go	4/2, 4/4	[Slides]	D. Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016. D. Silver et al. Mastering the game of Go without human knowledge. Nature, 2017. Nature article (closed-access) Preprint (open access) T. Anthony, Z. Tian, and D. Barber. Thinking fast and slow with deep learning and tree search. NIPS 2017.

Tutorials

Here is a tentative tutorial schedule. Details may change as the course goes on.

	Dates	Topic	Materials
Tutorial 1	1/15, 1/17	Multivariable Calculus Review	[ipynb] [PDF]
Tutorial 2	1/22, 1/24	Autograd	[ipynb]
Tutorial 3	1/29, 1/31	PyTorch	[ipynb]
Tutorial 4	2/5, 2/7	Conv Nets	[ipynb]
Tutorial 5	2/12, 2/14	Midterm Review
	2/15	Midterm Test
	2/18-2/22	Reading Week
Tutorial 6	2/26, 2/28	Neural Net Best Practices	[Colab]
Tutorial 7	3/5, 3/7	RNNs	[ipynb]
Tutorial 8	3/12, 3/14	Information Theory	[ipynb]
Tutorial 9	3/19, 3/21	Pyro	[ipynb]
Tutorial 10	3/26, 3/28	Reinforcement Learning	[Slides] [ipynb]
Tutorial 11	4/2, 4/4	Final Exam Review