CSC 2515 Fall 2019

Machine Learning

Overview

Machine learning is a set of techniques that allow machines to learn from data and experience, rather than requiring humans to specify the desired behavior by hand. Over the past two decades, machine learning techniques have become increasingly central both in AI as an academic field, and in the technology industry. This course provides a broad introduction to some of the most commonly used ML algorithms.

The first half of the course focuses on supervised learning. We begin with nearest neighbours, decision trees, and ensembles. Then we introduce parametric models, including linear regression, logistic and softmax regression, and neural networks. In the second half, we move on to unsupervised learning, focusing in particular on probabilistic models, but also covering principal components analysis and K-means. Finally, we cover the basics of reinforcement learning.
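To give a flavour of the supervised-learning workflow covered in the first half, here is a minimal sketch using scikit-learn (one of the libraries used for the homeworks); the toy dataset and model settings are illustrative only, not part of any assignment:

    # A nearest-neighbours classifier on a toy dataset (illustrative only).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)                   # features and labels
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)            # hold out a test set

    clf = KNeighborsClassifier(n_neighbors=5)           # k = 5 nearest neighbours
    clf.fit(X_train, y_train)                           # "fitting" just stores the training data
    print("test accuracy:", clf.score(X_test, y_test))  # fraction classified correctly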

Where and When

There are two sections of the course. Tutorials will be held in the main lecture room.

Section 1: Lectures Wednesday 10am-noon; tutorial Wednesday noon-1pm; room Bahen 1190; runs Sept. 11 to Nov. 27.
Section 2: Lectures Thursday 2-4pm; tutorial Thursday 4-5pm; room Bahen 1180; runs Sept. 12 to Nov. 28.

Contact

Policies

Prerequisites (an undergrad course in each is sufficient):

Marking Scheme.

Collaboration policy. You are expected to work on the homeworks by yourself. You should not discuss them with anyone except the TAs or the instructor.

Academic Integrity. By this point in your studies, you've heard this lots of times, so we'll keep it brief: avoid academic offenses (i.e. cheating). All graded work in this course is individual work.

Lateness. Homeworks will be accepted up to 3 days late, but 10% will be deducted for each day late, rounded up to the nearest day. For example, a homework submitted 25 hours late counts as 2 days late and loses 20%.

Remarks. Remark requests for homeworks should be made through MarkUs, and will be considered by the same TA who marked the assignment. The deadline for requesting a remark is typically one week after the marked assignments are returned. Remark requests for exams will be handled by the instructor; details to be announced later.

Exceptions. Exceptions to the course policies such as late homeworks or missed tests require permission of the instructor. For medical excuses, you should obtain an official Student Medical Certificate.

Auditing. If you are a U of T student, then you may audit the course (i.e. sit in on the lectures) only if there are empty seats available after everyone enrolled in the course has been seated. Anyone else (i.e. non-students) is not permitted to audit; this is a University policy. No University resources will be committed to the auditor, i.e. we won't mark homeworks or exams.

Homeworks

Most homeworks will be due on Thursdays at 11:59pm. You will submit through MarkUs; directions are given in the assignment handouts.

Homework 1 (out 9/13, due 9/26)
Materials: [Handout], [clean_real.txt], [clean_fake.txt], [clean_script.py]
TA office hours:
  Fri 9/20, 12-1pm, in BA3201
  Mon 9/23, 11am-noon, in BA3201
  Wed 9/25, 2-4pm, in BA3201
  Thu 9/26, 11am-noon, in BA3201

Homework 2 (out 9/25, due 10/10)
Materials: [Handout], [q2.py]
TA office hours:
  Fri 10/4, 12-1pm, in BA3201
  Mon 10/7, 11am-noon, in BA3201
  Wed 10/9, 2-4pm, in BA3201
  Thu 10/10, 11am-noon, in BA3201

Homework 3 (out 10/11, due 10/26, extended from 10/24)
Materials: [Handout]
TA office hours:
  Fri 10/18, 12-1pm, in BA3201
  Mon 10/21, 11am-noon, in BA3201
  Wed 10/23, 2-4pm, in BA3289
  Thu 10/24, 11am-noon, in BA3289

Homework 4 (out 11/1, due 11/14)
Materials: [Handout], [Code and Data]
TA office hours:
  Fri 11/8, 12-1pm, in BA3201
  Fri 11/8, 6-7pm, in BA3201
  Mon 11/11, 11am-noon, in BA3201
  Mon 11/11, 2-4pm, in BA3201
  Thu 11/14, 11am-noon, in BA3201

Homework 5 (out 11/15, due 11/28)
Materials: [Handout]
TA office hours:
  Wed 11/20, 2-3pm, in BA3201
  Mon 11/25, 11am-noon, in BA3201
  Wed 11/27, 3-4pm, in BA3201
  Wed 11/27, 6-7pm, in BA3201
  Thu 11/28, 11am-noon, in BA3201

Tests

The course will have a midterm and a final exam.

The midterm will be held from 4:10pm to 5:40pm on Wednesday, Oct. 30, in the Health Sciences building, room 610. See Lecture 6 slides for more information. You might find the following practice exams helpful:

There will be office hours for the midterm according to the following schedule:
Fri 10/25, 12-1pm, in BA3201
Fri 10/25, 6-7pm, in BA3201
Mon 10/28, 11am-noon, in BA3201
Tue 10/29, 2-4pm, in BA3201
Wed 10/30, noon-1pm, in BA1190 (lecture room)

Here are the midterm questions and solutions.

The final exam will be held from 3pm to 6pm on Tuesday, Dec. 17, in the Banting Institute, room 131.

There will be office hours for the final exam according to the following schedule:
Mon 12/2, 12-1pm, in BA5287
Tue 12/3, 12-1pm, in BA3289
Wed 12/4, 11am-noon, in BA3201
Thu 12/5, 5-7pm, in BA3201
Fri 12/6, 12-1pm, in BA3201
There will be no office hours the week before the exam (Dec 9-13) as TAs are away attending the NeurIPS conference. Please ask your questions on Piazza.

Here is the Fall 2018 final, for practice. (This one was a bit on the hard side. Question 4 was not covered this year.)

Lectures

Here is a tentative schedule, which will likely change as the course goes on.

Suggested readings are just that: resources we recommend to help you understand the course material. They are not required, i.e. you are only responsible for the material covered in lecture.

ESL = The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman.
MacKay = Information Theory, Inference, and Learning Algorithms, by David MacKay.
Barber = Bayesian Reasoning and Machine Learning, by David Barber.
Bishop = Pattern Recognition and Machine Learning, by Chris Bishop.
Sutton and Barto = Reinforcement Learning: An Introduction, by Sutton and Barto.
Goodfellow = Deep Learning, by Goodfellow, Bengio, and Courville.

Lecture 1 (9/11, 9/12): Introduction; Nearest Neighbours. [Slides]
Suggested readings:
  ESL: Chapters 1, 2.1-2.3, and 2.5
  Metacademy: K nearest neighbors

Lecture 2 (9/18, 9/19): Decision Trees; Ensembles. [Slides]
Suggested readings:
  ESL: 9.2, 2.9, 8.7, 15
  Metacademy: decision trees, entropy, mutual information, bias/variance decomposition, bagging, random forests

Lecture 3 (9/25, 9/26): Linear Regression; Linear Classifiers. [Slides]
Suggested readings:
  Bishop: 3.1, 4.1, 4.3
  Course notes: linear regression, linear classifiers
  Metacademy: linear regression, closed-form solution, gradient descent, ridge regression

Lecture 4 (10/2, 10/3): Softmax Regression; SVMs; Boosting. [Slides]
Suggested readings:
  Bishop: 7.1, 14.3
  Course notes: logistic regression, optimization, SVMs and boosting

Lecture 5 (10/9, 10/10): Neural Networks. [Slides]
Suggested readings:
  Bishop: 5.1-5.3
  Course notes: multilayer perceptrons, backprop

Lecture 6 (10/16, 10/17): Convolutional Networks. [Slides]
Suggested readings:
  Course notes: conv nets, image classification
  Goodfellow: sections 9.1-9.5

Lecture 7 (10/23, 10/24): PCA; K-Means; Maximum Likelihood. [Slides]
Suggested readings:
  Bishop: 12.1, 9.1

Lecture 8 (10/30, 10/31): Probabilistic Models. [Slides]
Suggested readings:
  Bishop: 2.1-2.3, 4.2
  MacKay: chapters 21, 23, 24
  Course notes: probabilistic models

Lecture 9 (11/6, 11/7): Expectation-Maximization. [Slides]
Suggested readings:
  Bishop: 9.2-9.4
  Barber: 20.1-20.3
  Course notes: mixture models

Lecture 10 (11/13, 11/14): Reinforcement Learning. [Slides]
Suggested readings:
  Sutton and Barto: 3, 4.1, 4.4, 6.1-6.5

Lecture 11 (11/20, 11/21): Differential Privacy. [Slides]
Suggested readings:
  Dwork and Roth, 2014. The Algorithmic Foundations of Differential Privacy. Chapters 2, 3.1-3.5.

Lecture 12 (11/27, 11/28): Algorithmic Fairness. [Slides]
Suggested readings:
  Barocas, Hardt, and Narayanan. Fairness and Machine Learning. Chapters 1 and 2.
  Zemel et al., 2013. Learning fair representations.
  Louizos et al., 2015. The variational fair autoencoder.
  Hardt et al., 2016. Equality of opportunity in supervised learning.

Tutorials

The tutorial schedule and materials will be posted as the course goes on.
Tutorial 1 (9/11, 9/12): NumPy review, K-Nearest Neighbours. [ipynb]
  Reviews: [Linear algebra slides], [NumPy basics], [ipynb ex1], [SVD slides], [ipynb SVD], [ipynb ex2]
Tutorial 2 (9/18, 9/19): Eigendecompositions, SVD, basic information theory. [Slides]
Tutorial 3 (9/25, 9/26): Gradient Descent. [Slides] [Lecture ipynb] [Worksheet ipynb] [Convexity]
Tutorial 4 (10/2, 10/3): Random Forests and XGBoost. [Slides]
Tutorial 5 (10/9, 10/10): Autograd + PyTorch. [Autograd ipynb] [PyTorch ipynb]
Tutorial 6 (10/16, 10/17): Convnets. [CNNs ipynb]
Tutorial 7 (10/23, 10/24): Mid-term Review. [Slides]
(10/30, 10/31): Mid-term; no tutorials.
Tutorial 8 (11/6, 11/7): Probabilistic Models, Bayesian Inference, Pyro. [ipynb]
Tutorial 9 (11/13, 11/14): Reinforcement Learning. [Slides]
Tutorial 10 (11/20, 11/21): Reinforcement Learning 2. [Slides] [ipynb]
Tutorial 11 (11/27, 11/28): Final Exam Review. [Slides]

Paper Readings

5% of your total mark is allocated to reading a set of classic machine learning papers. We hope these papers are both interesting and understandable given what you learn in this course. The 5 points are allocated on an honor system; at the end of the term, you'll check a box to indicate that you've done the readings. You don't need to hand anything in, and the readings will not be tested on the exam.
  1. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. [pdf]
  2. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS 2012. [pdf]
  3. R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. NIPS 2007. [pdf]
  4. B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research, 1997. [pdf]
  5. V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015. [article]
  6. M. Hardt, E. Price, and N. Srebro. Equality of opportunity in supervised learning. NIPS 2016. [short version] [long version (optional)]

Computing Resources

For the homework assignments, we will use Python 3, along with libraries such as NumPy, SciPy, and scikit-learn. You have two options:
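Whichever option you choose, a quick sanity check (a minimal sketch; it assumes only that the packages above are installed) will confirm that your environment runs Python 3 and that the libraries import:

    # Environment sanity check for the course software stack.
    import sys
    assert sys.version_info[0] == 3, "this course uses Python 3"

    import numpy
    import scipy
    import sklearn

    print("Python:      ", sys.version.split()[0])
    print("NumPy:       ", numpy.__version__)
    print("SciPy:       ", scipy.__version__)
    print("scikit-learn:", sklearn.__version__)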