CSC 311 Fall 2021: Introduction to Machine Learning

Overview

Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in AI as an academic field, and in industry. This course provides a broad introduction to some of the most commonly used ML algorithms. It also serves to introduce key algorithmic principles which will serve as a foundation for more advanced courses, such as CSC412/2506 (Probabilistic Learning and Reasoning) and CSC413/2516 (Neural Networks and Deep Learning).

We start with nearest neighbours, the canonical nonparametric model. We then turn to parametric models: linear regression, logistic regression, softmax regression, and neural networks. We then move on to unsupervised learning, focusing in particular on probabilistic models, but also covering principal components analysis and K-means. Finally, we cover the basics of reinforcement learning.

Announcements

Where and When

Unfortunately, due to the evolving COVID-19 situation, the specific class format is subject to change. As of this writing (9/2), we are required to have an in-person component to this class; we've decided to have in-person lectures. Fortunately, we're not required to force you to show up (with the exception that the final exam may need to be in person - more on that below). We are designing the course so that you do not need to show up in person if you don't want to, and so that not showing up will not put you at a disadvantage. We'll do everything we can to maintain a safe environment while following the letter of University policy.

The first two weeks of class will be entirely virtual, to accommodate international arrivals. After that point, the current plan is for lectures to be held in person. As this would require in-person gatherings well beyond the size currently allowed by Ontario public health authorities for most purposes, we can make few assumptions about what will be permitted and safe by the time in-person instruction is set to begin. Please continue monitoring Quercus and this course web page for further updates.

Tutorials and office hours will be held virtually throughout the term. Students are encouraged to attend both lecture and tutorial each week, but attendance won't be taken.

We will also accommodate students who are unable to attend in person. Specifically, one of the four lecture sections will be held virtually. Additionally, students will have access to the lecture videos from Fall 2020, which follow roughly the same schedule as this year's class.

Most terms, we allow students to attend sections other than their assigned ones. However, this year, if you choose to attend in person, you must attend your assigned section, since overcrowding during a pandemic would create an unsafe situation. Note that different sections are held simultaneously in different rooms, so make sure you are in the correct room. It's likely that many of you will want to switch between virtual and in-person formats. Here are the policies on that:

Specifics about online delivery will be sent to enrolled students through Quercus.

Section           Lecture Time          Lecture Room  Instructor      Tutorial Time
LEC0101, LEC2001  Friday 11:00-13:00    KP 108        Roger Grosse    Friday 15:00-16:00
LEC0102           Friday 11:00-13:00    Virtual       Guodong Zhang   Friday 15:00-16:00
LEC0201           Thursday 16:00-18:00  ES 1050       Roger Grosse    Thursday 19:00-20:00
LEC0202           Thursday 16:00-18:00  SS 2102       Rahul Krishnan  Thursday 19:00-20:00

Modulo capacity constraints for the in-person sections, it will be up to you to decide whether the in-person lecture experience is worth the risk. We can't make the decision for you, but do consider that (as of this writing) most in-person gatherings of more than 25 people are banned, and that not only is the lecture a crowded indoor setting, but most of your fellow students will have been attending lots of other in-person lectures. In keeping with the theme of the course, where we use data to make decisions under uncertainty, you may find the MicroCovid Calculator helpful in reasoning about your own risk tolerance.

Course videos and materials belong to your instructor, the University, and/or other source depending on the specific facts of each situation, and are protected by copyright. In this course, you are permitted to download session videos and materials for your own academic use, but you should not copy, share, or use them for any other purpose without the explicit permission of the instructor.

For questions about the recording and use of videos in which you appear, please contact the instructors.

Teaching Staff

Communication

We will use Piazza for the course forum.

All office hours will take place via Gather Town. Details will be communicated via Quercus.

Instructors

Instructor      Office Hours (Virtual)
Roger Grosse    Monday 10am-12pm
Rahul Krishnan  Monday 6-7pm
Guodong Zhang   Monday 8-9pm

Email Instructors: csc311-f21-profs@cs.toronto.edu

Teaching Assistants

Office Hours (virtual)

  Homework: See homework schedule below.

  Test Prep:
    Midterm:
      Tuesday 10/19, 2-4pm
      Wednesday 10/20, 7-9pm
    Final:
      Tuesday 12/7, 3:30-4:30pm
      Wednesday 12/8, 3-5pm and 8-9pm
      Thursday 12/9, 9-11am

  Project:
    Thursday 11/18, 2-4pm
    Friday 11/19, 1-6pm
    Wednesday 11/24, 2-6pm
    Thursday 11/25, 2-4pm
    Friday 11/26, 8am-noon
    Tuesday 11/30, 3-5pm
    Wednesday 12/1, 1-5pm

Marking Scheme

We will use the following marking scheme:

Homeworks

Homeworks will generally be due at 11:59pm on Wednesdays, and submitted through MarkUs. Please see the course information handout for detailed policies (marking, lateness, etc.). The detailed schedule will be posted soon.

  1. Out: 9/16, Due: 9/29
      Materials: [handout] [clean_fake.txt] [clean_real.txt] [clean_script.py]
      TA Office Hours:
        Thursday 9/23, 8am-noon
        Monday 9/27, 3:30-5:30pm
        Tuesday 9/28, 4-6pm and 8:30-10:30pm

  2. Out: 9/30, Due: 10/13
      Materials: [handout] [starter code]
      TA Office Hours:
        Thursday 10/7, 2-4pm
        Friday 10/8, 4-6pm
        Monday 10/11, 1-3pm
        Tuesday 10/12, 9-11am and 4-6pm
        Wednesday 10/13, 3-5pm

  3. Out: 10/14, Due: 11/3
      Materials: [handout] [code and data]
      TA Office Hours:
        Thursday 10/28, 9:30-11am
        Thursday 10/28, 3-4pm
        Friday 10/29, 12:30-2pm
        Friday 10/29, 2-5pm
        Monday 11/1, 12-2pm
        Tuesday 11/2, 2-4pm
        Wednesday 11/3, 2-3pm

5% of your course grade comes from minor assignments associated with the ethics module. All of these assignments will be short, and we expect that most of you will receive full marks.

Assignment                        Due    % final grade  Marking
Initial Survey                    10/29  0.5%           Full credit for submitting.
Class Participation (Nov. 18/19)  N/A    2%             You get these 2 points automatically.
Reflections on In-Class Activity  11/27  2%             A good-faith effort receives full credit.
Final Survey                      12/15  0.5%           Full credit for submitting.

Final Project

For your final project, you will attempt to solve a Netflix-Competition-style matrix completion problem. The goal is to predict, in the context of a personalized education platform, whether a student will correctly answer a diagnostic question. In groups of 2-3, you will implement and evaluate several algorithms from the course, and then propose and evaluate an extension to one of these algorithms. This will hopefully be a fun exercise that gives you a feel for what you'd do on a daily basis as a data scientist or machine learning engineer. Here is the project handout and starter code. The final report is due December 3.
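
To give a feel for the problem setup, here is a minimal sketch of matrix completion on made-up data. It is not the project starter code: the array `responses` and the helper `complete_with_question_means` are hypothetical names introduced for illustration, and filling in each question's average is only a simple baseline chosen for brevity.

```python
import numpy as np

# Toy data (made up): rows are students, columns are diagnostic questions.
# 1 = correct, 0 = incorrect, NaN = the student has not answered the question.
responses = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [np.nan, 0.0, 1.0],
    [1.0, 1.0, np.nan],
])


def complete_with_question_means(matrix):
    """Fill each missing entry with that question's observed fraction correct."""
    # nanmean ignores the missing entries when averaging each column.
    question_means = np.nanmean(matrix, axis=0)
    # Keep observed entries; use the column mean wherever the entry is NaN.
    return np.where(np.isnan(matrix), question_means, matrix)


completed = complete_with_question_means(responses)

# Turn the filled-in probabilities into correct/incorrect predictions.
predictions = completed >= 0.5
```

A real solution would be evaluated on held-out entries rather than on the observed ones, which this sketch does not do.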

Schedule

This is a tentative schedule, which will likely change as the course goes on.

Suggested readings are optional; they are resources we recommend to help you understand the course material. All of the textbooks listed below are freely available online.

Bishop = Pattern Recognition and Machine Learning, by Chris Bishop.
ESL = The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman.
MacKay = Information Theory, Inference, and Learning Algorithms, by David MacKay.
Barber = Bayesian Reasoning and Machine Learning, by David Barber.
Sutton and Barto = Reinforcement Learning: An Introduction, by Sutton and Barto.

  1. Dates: 9/9, 9/10
      Topic:
        Lecture: Introduction, Nearest Neighbours
        Tutorial: Probability Review
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides]
      Suggested Readings:
        ESL: 1, 2.1-2.3, 2.5

  2. Dates: 9/16, 9/17
      Topic:
        Lecture: Decision Trees, Bias-Variance Decomposition
        Tutorial: Linear Algebra Review
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides] [Worksheet] [Solutions]
      Suggested Readings:
        Bishop: 3.2
        ESL: 2.9, 9.2
        Course notes: Generalization

  3. Dates: 9/23, 9/24
      Topic:
        Lecture: Linear Models I
        Tutorial: Bias-Variance Decomposition
      Materials:
        Lecture: [Slides]
        Tutorial: [Worksheet]
      Suggested Readings:
        Bishop: 3.1
        ESL: 3.1-3.2
        Course notes: Linear Regression, Calculus

  4. Dates: 9/30, 10/1
      Topic:
        Lecture: Linear Models II
        Tutorial: Optimization
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides] [Worksheet] [Solutions]
      Suggested Readings:
        Bishop: 4.1, 4.3
        ESL: 4.1-4.2, 4.4, 11
        Course notes: Linear Classifiers, Training a Classifier

  5. Dates: 10/7, 10/8
      Topic:
        Lecture: Linear Models III, Neural Nets I
        Tutorial: PyTorch
      Materials:
        Lecture: [Slides]
        Tutorial: [Colab]

  6. Dates: 10/14, 10/15
      Topic:
        Lecture: Neural Networks II
        Tutorial: Midterm Review
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides]
      Suggested Readings:
        Bishop: 5.1-5.3
        Course notes: Multilayer Perceptrons, Backpropagation

  7. Dates: 10/21, 10/22
      Topic:
        Lecture: Probabilistic Models
        Tutorial: midterm test
      Materials:
        Lecture: [Slides]
      Suggested Readings:
        ESL: 2.6.3, 6.6.3, 4.3.0
        MacKay: 21, 23, 24
        Course notes: Probabilistic Models

  8. Dates: 10/28, 10/29
      Topic:
        Lecture: Multivariate Gaussians, GDA
        Tutorial: Linear Algebra Review II: Eigenvalues, SVD
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides]
      Suggested Readings:
        Bishop: 12.1

  9. Dates: 11/4, 11/5
      Topic:
        Lecture: Principal Component Analysis, Matrix Completion
        Tutorial: Final Project Overview
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides] [Colab]
      Suggested Readings:
        ESL: 14.5.1

  10. Dates: 11/18, 11/19
      Topic:
        Lecture: Embedded Ethics Unit on Recommender Systems
        Tutorial: no tutorial this week
      Materials:
        Lecture: [Part 1] [Part 2]
        Recommended: Beyond Engagement

  11. Dates: 11/25, 11/26
      Topic:
        Lecture: k-Means, EM Algorithm
        Tutorial: EM Algorithm
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides]
      Suggested Readings:
        MacKay: 20
        Bishop: 9
        Barber: 20.1-20.3
        Course notes: Mixture Modeling

  12. Dates: 12/2, 12/3
      Topic:
        Lecture: Reinforcement learning
        Tutorial: Final Exam Review
      Materials:
        Lecture: [Slides]
        Tutorial: [Slides]
      Suggested Readings:
        Sutton and Barto: 3, 4.1, 4.4, 6.1-6.5

Computing Resources

For the homework assignments, we will use Python 3, and libraries such as NumPy, SciPy, and scikit-learn. You have two options:

  1. The easiest option is probably to install everything yourself on your own machine.

    1. If you don't already have Python 3, install it.

      We recommend some version of Anaconda (Miniconda, a nice lightweight conda, is probably your best bet). You can also install Python directly if you know how.

    2. Optionally, create a virtual environment for this class and activate it. If you have a conda distribution, run the following commands:

          conda create --name csc311 python=3
          conda activate csc311

    3. Use pip to install the required packages:

          pip install scipy numpy autograd matplotlib jupyter scikit-learn

  2. All the required packages are already installed on the Teaching Labs machines.
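
Whichever option you choose, a quick way to confirm that the environment is ready is to check that the libraries can be found by Python. The following is a minimal sketch rather than an official course script; the names `REQUIRED` and `check_packages` are made up for illustration. Jupyter is launched from the command line, so it is left out of the check.

```python
import importlib.util

# Import names for the libraries installed by the pip command above.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["scipy", "numpy", "autograd", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If any package is reported as missing, make sure the environment you created is active and rerun the pip command.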