CSC 321 Winter 2018

Intro to Neural Networks and Machine Learning

Source: CycleGAN. You will implement this model for Assignment 4.

Overview

Machine learning is a powerful set of techniques that allow computers to learn from data rather than having a human expert program a behavior by hand. Neural networks are a class of machine learning algorithms originally inspired by the brain, which have recently seen a lot of success in practical applications. They're at the heart of production systems at companies like Google and Facebook for face recognition, speech-to-text, and language understanding.

This course gives an overview of both the foundational ideas and the recent advances in neural net algorithms. Roughly the first 2/3 of the course focuses on supervised learning -- training the network to produce a specified behavior when one has lots of labeled examples of that behavior. The last 1/3 focuses on unsupervised learning and reinforcement learning.

Policies (marking, prerequisites, etc.)

See the course information handout.

Where and When

There are two sections of the course. Since both sections are fully subscribed, please attend the one you are assigned.

Winter term 2018
Instructor: Roger Grosse
Office hours: Mondays 10am-noon, Pratt 290F. (That's the D. L. Pratt Building, not the E. J. Pratt Library!)
Teaching assistants: George-Alexandru (Alex) Adam, Tristan Aumentado-Armstrong, Harris Chan, Jing Yao (Jason) Li, Matthew MacKay, Shengyang (Chase) Sun, Bowen Xu, and Guodong (Jelly) Zhang
Staff e-mails:

TAs and instructor: csc321staff [at] cs.toronto.edu
Instructor only: rgrosse [at] cs.toronto.edu
Please do not contact the TAs at their personal e-mails.

Afternoon section:

Lectures: Tuesdays and Thursdays, 1:10-2:00pm, in Bahen 1170.
Tutorials: Thursdays, 2:10-3:00pm. There is no tutorial on January 4 (the day of the first lecture). Rooms assigned by last name:
- A-J: Bahen 3008
- K-R: ~~Bahen 3012~~ Bahen 1170
- S-Z: Bahen 2145

Night section:

Lectures: Tuesdays, 6:10-8:00pm, in Bahen 1170.
Tutorials: Tuesdays, 8:20-9:00pm. Note the unusual start time. This is meant to give you a dinner break. Rooms assigned by last name:
- A-J: Lash Miller Chemical Labs (LM) 157
- K-R: Astronomy and Astrophysics (AB) 114
- S-Z: Bahen 1200

Communications

All course-related announcements will be sent to the class mailing list, csc321h1s [at] teach.cs.toronto.edu.

We will use Piazza for the course forum. Details to follow.

If you want to contact the course staff privately, please e-mail csc321staff [at] cs.toronto.edu (for the TAs and the instructor) or rgrosse [at] cs.toronto.edu (for only the instructor).

Homeworks, Programming Assignments, and Tests

All assignment deadlines are at 11:59pm on the date listed. Please see the course information handout for detailed policies (marking, lateness, etc.).

Marking scheme:

Midterm: 15%
Final exam: 35%. A minimum mark of 30% on the final exam is required in order to pass the course.
Written homeworks: 20% total. (There are 6 total, and the lowest mark will be dropped, so they're worth 4% each.)
Programming assignments: 30% total. (There are 4 total; your highest two marks will count for 10% each, and your lowest two will count for 5% each.)
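
As a rough illustration of how these weights combine, here is a minimal Python sketch. The function name `course_mark` and its arguments are hypothetical (they are not part of any course code), all marks are assumed to be percentages between 0 and 100, and the sketch assumes exactly 6 homework marks and 4 programming assignment marks as listed above. It does not account for any later adjustments such as a canceled homework; the course information handout is the authoritative policy.

```python
def course_mark(midterm, final, homeworks, assignments):
    """Combine component marks (each a percentage in [0, 100]).

    Hypothetical helper for illustration only.
    Returns (overall percentage, whether the final exam minimum is met).
    """
    if len(homeworks) != 6 or len(assignments) != 4:
        raise ValueError("this sketch assumes 6 homeworks and 4 assignments")

    # Drop the lowest homework; the remaining five are worth 4% each.
    kept = sorted(homeworks)[1:]
    homework_part = 0.04 * sum(kept)

    # Highest two assignments count 10% each, lowest two count 5% each.
    ranked = sorted(assignments, reverse=True)
    assignment_part = 0.10 * sum(ranked[:2]) + 0.05 * sum(ranked[2:])

    total = 0.15 * midterm + 0.35 * final + homework_part + assignment_part

    # A minimum of 30% on the final exam is required to pass.
    meets_exam_minimum = final >= 30
    return total, meets_exam_minimum


total, meets_exam_minimum = course_mark(
    midterm=80,
    final=70,
    homeworks=[100, 90, 80, 70, 60, 50],
    assignments=[90, 80, 70, 60],
)
print(total)
print(meets_exam_minimum)
```

Worked by hand, the example above is 12 (midterm) + 24.5 (final) + 16 (homeworks) + 23.5 (assignments), or 76% overall.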

Schedule:

Homework 1 (due ~~1/17~~ 1/19)
Homework 2 (due 1/24)
- Extra TA office hours:
  - Tuesday, 1/23, 2-3pm, in Pratt 290C
  - Wednesday, 1/24, 10-11am, in Pratt 290C.
Homework 3 (due 1/31)
- Extra TA office hours:
  - Tuesday, 1/30, 3-4pm, in Pratt 290C
  - Wednesday, 1/31, 2-3pm, in Pratt 290C
Programming Assignment 1: Learning Word Representations (due 2/7)
- Handout
- Code
- Extra TA office hours:
  - Friday, 2/2, 11am-noon, in Pratt 290C
  - Tuesday, 2/6, 3-4pm, in Pratt 290C
  - Wednesday, 2/7, 11am-12, in Pratt 290C
Homework 4 (due 2/14)
- Extra TA office hours:
  - Tuesday, 2/13, 3-4pm, in ~~Pratt 290C~~ Pratt 378
  - Wednesday, 2/14, 11am-12, in ~~Pratt 290C~~ Pratt 378
Programming Assignment 2: Image Colorization (due 2/28)
- Handout
- Code
- Extra TA office hours:
  - Friday, 2/23, 4-5pm, in Pratt 266
  - Tuesday, 2/27, 11am-12, in Pratt 266
  - Wednesday, 2/28, 11am-12, in Pratt 266
Midterm

Afternoon: The test will be held during lecture time, 3/6, 1:10-2:00pm. There will be lecture as usual on 3/8, but no tutorial.
Night: The test will be held during lecture time, 3/6, 6:10-7:00pm. There will be a half-hour break, followed by a lecture from 7:30-8:30, and no tutorial.
The test will be similar in format and difficulty to the 2017 midterm.
Practice tests:
- 2013 Midterm
- Questions from 2014 Midterm
- 2015 Midterm
  - Version 1, Solutions
  - Version 2, Solutions
  - Note: this test was too hard, and the marks were adjusted upwards.
- 2017 Midterm
Extra TA office hours:
- Thursday, 3/1, 4-5pm, Pratt 266
- Friday, 3/2, 2-3pm, Pratt 266
- Monday, 3/5, 3-4pm, Pratt 266
Versions: night, afternoon
Solutions

Homework 5 (due ~~3/14~~ 3/16)
- Extra TA office hours:
  - Tuesday, 3/13, 11am-noon, in PT 266
  - Thursday, 3/15, 3:30-4:30pm, in PT 266
  - Friday, 3/16, 10-11am, in PT 266
Programming Assignment 3: Translation (due ~~3/21~~ 3/23)
- Handout
- Code
- Extra TA office hours:
  - Friday, 3/16, 4-5pm, in PT 266
  - Monday, 3/19, 1-2pm, in PT 266
  - Wednesday, 3/21, 3-4pm, in PT 378
  - Friday, 3/23, 1-2pm, in PT 266
Homework 6 ~~(due 3/28)~~ Canceled.
Programming Assignment 4: Style Transfer (due 4/3, but you may hand it in on 4/4 with no penalty.)
- Handout
- Code
- Extra TA office hours:
  - Wednesday, 3/28, 11am-noon, in Pratt 266
  - Friday, 3/30, 4-5pm, in Pratt 266
  - Monday, 4/2, noon-1pm, in Pratt 266
  - Wednesday, 4/4, 11am-noon, in Pratt 266
Final Exam: Friday, 4/20, 9am-noon
- Last names A-Y: Clara Benson Building (BN) 2N
- Last names Z: Clara Benson Building (BN) 2S
- Past exams (all but 2017 were under different instructors):
  - 2011
  - 2012
  - 2013
  - 2016
  - 2017, and solutions
    - Questions 4, 9, 16, 17, 18, and 20 are on material we didn't cover.
- Exam, and Solutions

Lectures

Lecture 1: Introduction [Slides] [Lecture Notes]

Afternoon: 1/4, 1-2pm; Night: 1/9, 6-7pm

What are machine learning and neural networks, and what would you use them for? Supervised, unsupervised, and reinforcement learning. How this course is organized.
Lecture 2: Linear Regression [Slides] [Lecture Notes]

Afternoon: 1/9, 1-2pm; Night: 1/9, 7-8pm

Linear regression, a supervised learning task where you want to predict a scalar-valued target. Formulating it as an optimization problem, and solving either directly or with gradient descent. Vectorization. Feature maps and polynomial regression. Generalization: overfitting, underfitting, and validation.
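
To make "solving either directly or with gradient descent" concrete, here is a minimal sketch for the one-dimensional case, fitting y = w*x + b under the squared-error cost (1/2N) * sum((w*x + b - y)^2). It is not taken from the lecture materials: the function names, learning rate, and step count are illustrative assumptions, and it uses plain Python lists rather than the vectorized NumPy code used in the course.

```python
def fit_direct(xs, ys):
    """Closed-form least-squares solution for y = w*x + b (1-D inputs)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def fit_gradient_descent(xs, ys, learning_rate=0.05, num_steps=5000):
    """Minimize the same cost by gradient descent, starting from w = b = 0."""
    n = float(len(xs))
    w, b = 0.0, 0.0
    for _ in range(num_steps):
        # Residuals of the current predictions.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(fit_direct(xs, ys))
print(fit_gradient_descent(xs, ys))
```

On this noiseless toy data both approaches should recover a slope close to 2 and an intercept close to 1. The learning rate was picked by hand for this data and would generally need tuning for other inputs.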
Lecture 3: Linear Classification [Slides] [Lecture Notes]

Afternoon: 1/11, 1-2pm; Night: 1/16, 6-7pm

Binary linear classification. Visualizing linear classifiers. The perceptron algorithm. Limits of linear classifiers.
Lecture 4: Learning a Classifier [Slides] [Lecture Notes]

Afternoon: 1/16, 1-2pm; Night: 1/16, 7-8pm

Comparison of loss functions for binary classification. Cross-entropy loss, logistic activation function, and logistic regression. Hinge loss. Multiway classification. Convex loss functions. Gradient checking. (Note: this is really a lecture-and-a-half, and will run into what's scheduled as Lecture 5.)
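
As a small illustration of gradient checking, the sketch below compares the analytic gradient of the average cross-entropy loss for a one-dimensional logistic regression model against a central finite-difference estimate. It is an assumption-laden toy, not the lecture's code: the function names, the data, and the step size `eps` are all hypothetical choices.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(w, b, xs, ts):
    """Average cross-entropy loss with predictions y = logistic(w*x + b)."""
    total = 0.0
    for x, t in zip(xs, ts):
        y = logistic(w * x + b)
        total += -t * math.log(y) - (1 - t) * math.log(1 - y)
    return total / len(xs)


def analytic_gradient(w, b, xs, ts):
    """Gradient of the loss: the average of (y - t) * x and of (y - t)."""
    n = float(len(xs))
    grad_w = sum((logistic(w * x + b) - t) * x for x, t in zip(xs, ts)) / n
    grad_b = sum(logistic(w * x + b) - t for x, t in zip(xs, ts)) / n
    return grad_w, grad_b


def numeric_gradient(w, b, xs, ts, eps=1e-5):
    """Central finite-difference estimate of the same gradient."""
    grad_w = (cross_entropy(w + eps, b, xs, ts)
              - cross_entropy(w - eps, b, xs, ts)) / (2 * eps)
    grad_b = (cross_entropy(w, b + eps, xs, ts)
              - cross_entropy(w, b - eps, xs, ts)) / (2 * eps)
    return grad_w, grad_b


# Toy inputs and binary targets, with arbitrary parameter values.
xs = [-2.0, -1.0, 0.5, 1.5, 3.0]
ts = [0, 0, 1, 0, 1]
print(analytic_gradient(0.3, -0.1, xs, ts))
print(numeric_gradient(0.3, -0.1, xs, ts))
```

If the derivation is correct, the two printed pairs should agree to several decimal places; a large mismatch usually points to a bug in the analytic gradient.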
Lecture 5: Multilayer Perceptrons [Slides] [Lecture Notes]

Afternoon: 1/18, 1-2pm; Night: 1/23, 6-7pm

Multilayer perceptrons. Comparison of activation functions. Viewing deep neural nets as function composition and as feature learning. Limitations of linear networks and universality of nonlinear networks.

Suggested reading: Deep Learning Book, Sections 6.1-6.4
Lecture 6: Backpropagation [Slides] [Lecture Notes]

Afternoon: 1/23, 1-2pm; Night: 1/23, 7-8pm

The backpropagation algorithm, a method for computing gradients which we use throughout the course.
Lecture 7: Distributed Representations [Slides] [Lecture Notes]

Afternoon: 1/25, 1-2pm; Night: 1/30, 6-7pm

Language modeling, n-gram models (a localist representation), neural language models (a distributed representation), and skip-grams (another distributed representation).
Lecture 8: Optimization [Slides] [Lecture Notes]

Afternoon: 1/30, 1-2pm; Night: 1/30, 7-8pm

How to use the gradients computed by backprop. Features of optimization landscapes: local optima, saddle points, plateaux, ravines. Stochastic gradient descent and momentum.

Suggested reading: Deep Learning Book, Chapter 8
Lecture 9: Generalization [Slides] [Lecture Notes]

Afternoon: 2/1, 1-2pm; Night: 2/6, 6-7pm

Bias/variance decomposition, data augmentation, limiting capacity, early stopping, weight decay, ensembles, stochastic regularization, hyperparameter tuning.

Suggested reading: Deep Learning Book, Chapter 7
Lecture 10: Automatic Differentiation [Slides]

Afternoon: 2/6, 1-2pm; Night: 2/6, 7-8pm

How to implement an automatic differentiation system. Based on Autodidact, a pedagogical implementation of Autograd.
Lecture 11: Convolutional Networks [Slides] [Lecture Notes]

Afternoon: 2/8, 1-2pm; Night: 2/13, 6-7pm

Convolution operation. Convolution layers and pooling layers. Equivariance and invariance. Backprop rules for conv nets.
Lecture 12: Image Classification [Slides] [Lecture Notes]

Afternoon: 2/13, 1-2pm; Night: 2/13, 7-8pm

Conv net architectures applied to handwritten digit and object classification. Measuring the size of a conv net.
Lecture 13: Catch-Up

Afternoon: 2/15, 1-2pm; Night: 2/27, 6-7pm

There is no Lecture 13 because we're superstitious. Also, we've fallen roughly a full lecture behind schedule, so this will sync up the schedule with what's actually covered.
Lecture 14: Optimizing the Input [Slides: 1, 2]

Afternoon: 2/27, 1-2pm; Night: 2/27, 7-8pm

Interesting things you can do with gradient descent on the inputs: conv net visualizations, adversarial inputs, Deep Dream.
Lecture 15: Recurrent Neural Nets [Slides] [Lecture Notes]

Afternoon: 3/1, 1-2pm; Night: 3/6, 7:30-8:30pm

Recurrent neural nets. Backprop through time. Applying RNNs to language modeling and machine translation.
Lecture 16: Learning Long-Term Dependencies [Slides] [Lecture Notes]

Afternoon: 3/8, 1-2pm; Night: 3/13, 6-7pm

Why RNN gradients explode and vanish, both in terms of the mechanics of backprop, and conceptually in terms of the function the RNN computes. Ways to deal with it: gradient clipping, input reversal, LSTM.
Lecture 17: ResNets and Attention [Slides] [Lecture Notes]

Afternoon: 3/13, 1-2pm; Night: 3/13, 7-8pm

Deep Residual Networks. Attention-based models for machine translation and caption generation.
Lecture 18: Learning Probabilistic Models [Slides] [Lecture Notes]

Afternoon: 3/15, 1-2pm; Night: 3/20, 6-7pm

Maximum likelihood estimation. Optional: basics of Bayesian parameter estimation and maximum a posteriori estimation.
Lecture 19: Generative Adversarial Networks [Slides] [Lecture Notes]

Afternoon: 3/20, 1-2pm; Night: 3/20, 7-8pm

Topics TBA
Lecture 20: Autoregressive and Reversible Models [Slides] [Lecture Notes]

Afternoon: 3/22, 1-2pm; Night: 3/27, 6-7pm

Topics TBA
Lecture 21: Policy Gradient [Slides]

Afternoon: 3/27, 1-2pm; Night: 3/27, 7-8pm

Topics TBA
Lecture 22: Q-Learning [Slides]

Afternoon: 3/29, 1-2pm; Night: 4/3, 6-7pm

Topics TBA
Lecture 23: Go [Slides]

Afternoon: 4/3, 1-2pm; Night: 4/3, 7-8pm

Topics TBA

Tutorials

Note that there is no tutorial after the first lecture for the afternoon section, and no tutorial after the final lecture for the night section.

Tutorial 1: Linear Regression [PDF] [IPython Notebook]

Afternoon: 1/11, 2-3pm; Night: 1/9, 8-9pm
Tutorial 2: Classification [PDF] [IPython Notebook]

Afternoon: 1/18, 2-3pm; Night: 1/16, 8-9pm
Tutorial 3: Backpropagation [IPython Notebook] [Derivation (PDF)]

Afternoon: 1/25, 2-3pm; Night: 1/23, 8-9pm
Tutorial 4: Autograd [IPython Notebook]

Afternoon: 2/1, 2-3pm; Night: 1/30, 8-9pm
Tutorial 5: PyTorch [IPython Notebooks: 1, 2, 3, 4]

Afternoon: 2/8, 2-3pm; Night: 2/6, 8-9pm
Tutorial 6: Conv Nets [Slides] [IPython Notebooks: MNIST, CIFAR-10]

Afternoon: 2/15, 2-3pm; Night: 2/13, 8-9pm
Tutorial 7: Midterm Review

Afternoon: 3/1, 2-3pm; Night: 2/27, 8-9pm

This tutorial will effectively be extra office hours. But if there are recurring questions, or solutions to past exams you'd like us to go over, we can discuss those as a class. The tutorial will be held in the main lecture hall.
Tutorial 8: Attention and Maximum Likelihood [starter code and data, solution]

Afternoon: 3/15, 2-3pm; Night: 3/13, 8-9pm
Tutorial 9: GANs [GAN, DCGAN (incomplete), DCGAN]

Afternoon: 3/22, 2-3pm; Night: 3/20, 8-9pm
Tutorial 10: Policy Gradient [slides, IPython Notebook]

Afternoon: 3/29, 2-3pm; Night: 3/27, 8-9pm

Computing resources

The programming assignments will be done in Python using the NumPy scientific computing library. Some of the assignments will also use PyTorch. More details to follow.

Python and NumPy

Prior knowledge of Python is not required; basic Python will be taught in a tutorial. We will be using Python 2, not Python 3, since this is the version more commonly used in machine learning.

You have several options for how to use Python:

- You can install Python yourself on your own machine. For most of you, this will be the most convenient option. (Our assignments will not require especially heavy computation.)
  - Anaconda provides a single-click installer for most common platforms, and this is likely the easiest way to install Python and the required libraries.
  - You can install Python, NumPy, and Matplotlib manually. This takes a bit more work than using Anaconda.
- You can run it on the Teaching Labs machines. All required libraries are already installed. Accounts should have already been created for registered students by the start of the course. If you are having a problem with a CDF account, ask us.

Once Python is installed, there are two ways you can edit and run Python code:

- You can edit the code in a general-purpose text editor, such as Emacs, Vim, or GEdit, and run Python from the command line. (If you’re not already familiar with a text editor, GEdit is probably the easiest to start with.) We recommend IPython rather than the default Python console. If you’re already comfortable with one of these editors and with the command line, this may be the easiest way to go. For most of us in the machine learning research group at U of T, this is how we use Python on a day-to-day basis. If you’re new to this mode of programming, it may take 5-10 hours before you feel comfortable with it. But if you’re concentrating in computer science, you’ll need to learn this stuff eventually, so why not now?
- If you’re newer to programming, you may feel more comfortable with an IDE. We recommend Spyder because it’s included in Anaconda, and it’s intended for the sort of numerical computing we do in this class. Here are instructions for using it with Anaconda. There are lots of other IDEs for Python, though.

Here are some recommended background readings on Python and NumPy.

- If you have programming experience but not in Python, read Learn X in Y Minutes for a concise summary of the language. You can probably pick up Python quickly if you are familiar with another general-purpose language (C, Java, Matlab, etc.).
- Read this tutorial on NumPy, the library we’ll use for array manipulation and linear algebra.

AutoGrad

TBA

PyTorch

TBA