STA314H1F: Statistical Methods for Machine Learning I

Overview

Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in statistics as an academic discipline, and in the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).

We will cover statistical methods for supervised and unsupervised learning from data: training error, test error and cross-validation; classification, regression, and logistic regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support the efficient application of these methods.
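As a rough illustration of how training error, test error, and cross-validation relate, here is a minimal sketch using a k-nearest-neighbour classifier written with NumPy only. The synthetic data, the helper names (`knn_predict`, `error_rate`, `cross_val_error`), and the choices of k and fold count are all assumptions made for this example; they are not taken from the course materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: two Gaussian blobs in 2-D (illustrative only).
n_per_class = 100
X = np.vstack([
    rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 2)),
    rng.normal(loc=+1.0, scale=1.0, size=(n_per_class, 2)),
])
y = np.concatenate([np.zeros(n_per_class, dtype=int),
                    np.ones(n_per_class, dtype=int)])


def knn_predict(X_train, y_train, X_query, k):
    """Predict 0/1 labels by majority vote among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_query, k)
    return (votes.mean(axis=1) > 0.5).astype(int)


def error_rate(y_true, y_pred):
    """Fraction of misclassified points."""
    return float(np.mean(y_true != y_pred))


def cross_val_error(X, y, k, n_folds=5):
    """Average validation error over n_folds contiguous folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        y_pred = knn_predict(X[train_mask], y[train_mask], X[val_idx], k)
        errors.append(error_rate(y[val_idx], y_pred))
    return float(np.mean(errors))


# Shuffle, then hold out 30% of the points as a test set.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
n_train = 140
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

k = 5  # odd, so binary votes cannot tie
train_error = error_rate(y_train, knn_predict(X_train, y_train, X_train, k))
test_error = error_rate(y_test, knn_predict(X_train, y_train, X_test, k))
cv_error = cross_val_error(X_train, y_train, k)

# With k = 1 every training point is its own nearest neighbour.
train_error_k1 = error_rate(y_train, knn_predict(X_train, y_train, X_train, 1))

print(f"train error (k={k}): {train_error:.3f}")
print(f"test error  (k={k}): {test_error:.3f}")
print(f"5-fold CV error (k={k}): {cv_error:.3f}")
print(f"train error (k=1): {train_error_k1:.3f}")
```

The k = 1 case shows why training error alone can mislead: each training point is its own nearest neighbour, so the training error is zero regardless of how well the classifier generalizes. Cross-validation estimates held-out error using only the training set, which is why it is commonly used to choose k.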

Announcements

Course Information

Unless otherwise specified, lectures and tutorials will be held synchronously, either online via Zoom or in-person. There will be two mandatory tests held during the scheduled class time. Please see the syllabus for detailed policies (marking, lateness, etc.) and attendance instructions.

Staff

Instructor Chris Maddison
Office Hours Monday 3-4PM and Friday 12-1PM via Zoom
Email sta314@utoronto.ca

Lectures

Section Time Location
LEC0201 Friday 10AM-12PM Online - Zoom
LEC0101 Monday 1-3PM Online - Zoom

Tutorials

Students are enrolled in tutorial groups based on their stated preferences for in-person vs. online. Tutorial groups can be checked on Quercus via the People tab. Do not attend an in-person tutorial section unless you are enrolled in that section on Quercus.

Time Type Location
Friday 1-2PM Online Online - Zoom
Friday 1-2PM In-person BA 1230
Monday 4-5PM Online Online - Zoom
Monday 4-5PM In-person AB 107

Homework

The course will have four homework assignments, each due at 11:59pm on the Thursday of its due week and submitted through Quercus. This schedule is tentative; any changes will be announced.

Homework 1
Out: Sep. 17
Due: Sep. 30
Materials: [handoutV3] [code] [notebook]
General Office Hours: 24 Sept. 3-5PM; 24 Sept. 7-9PM; 28 Sept. 9-11AM; 28 Sept. 2-4PM; 29 Sept. 3-5PM
Python Office Hours: 24 Sept. 8-9PM; 27 Sept. 5-6PM
Credit: 15%

Homework 2
Out: Oct. 1
Due: Oct. 14 Oct. 15
Materials: [handoutV2] [code] [notebook]
General Office Hours: 6 Oct. 2-6PM; 8 Oct. 2-3PM (Prof. Maddison); 8 Oct. 5-7PM; 11 Oct. 6-8PM; 12 Oct. 9AM-12PM; 13 Oct. 11AM-12PM (Prof. Maddison); 13 Oct. 1-5PM
Python Office Hours: 5 Oct. 1-3PM; 12 Oct. 5-8PM
Credit: 15%

Homework 3
Out: Oct. 29 Oct. 31
Due: Nov. 11 Nov. 15
Materials: [handout] [code] [q1 notebook] [q3 notebook]
General Office Hours: 4 Nov. 3-5PM, 7-8PM; 5 Nov. 3-5PM; 8 Nov. 6-8PM; 9 Nov. 12-1PM, 1-3PM, 6-8PM; 10 Nov. 2-4PM; 11 Nov. 2-4PM; 12 Nov. 7-8PM
Python Office Hours: 2 Nov. 5-6PM; 4 Nov. 5-6PM; 5 Nov. 5-6PM; 9 Nov. 5-6PM; 11 Nov. 6-8PM
Credit: 15%

Homework 4
Out: Nov. 12 Nov. 13
Due: Nov. 25 Nov. 29
Materials: [handout] [code] [notebook]
General Office Hours: 16 Nov. 3-6PM; 17 Nov. 7-9PM; 19 Nov. 3-4PM; 23 Nov. 3-5PM, 5-7PM; 24 Nov. 6-9PM; 26 Nov. 3-5PM, 5-7PM; 29 Nov. 9-11AM
Python Office Hours: 19 Nov. 7-9PM; 24 Nov. 9-12AM
Credit: 15%

Tests

The course will have two tests, each with a duration of 1 hour and held during the normal class time. You must take the test with your assigned section, unless you have prior permission from the instructor. Please note, the lecture schedule on both days will be somewhat unusual; see details below.

Test 1
Friday Section: Oct. 22; Test 10-11AM; Lecture 11-1PM
Monday Section: Oct. 25; Test 1-2PM; Lecture 2-4PM
Material Covered: Lec 01 - Lec 06
Review Office Hours: 19 Oct. 6-7PM; 20 Oct. 3-5PM; 21 Oct. 2-4PM
Credit: 20%

Test 2
Friday Section: Dec. 3; Test 10-11AM; Lecture 12-2PM
Monday Section: Dec. 6; Test 1-2PM; Lecture 2-4PM
Material Covered: Lec 01 - Lec 10, emphasis on Lec 7 - Lec 10
Review Office Hours: 1 Dec. 1-3PM, 4-5PM; 2 Dec. 4-6PM
Credit: 20%

Practice Midterms

Please be aware that these are from a different series of courses that may vary in difficulty and may have covered different material. Nevertheless, you should be able to get a sense for the style of questions that I may ask, and it will be very good practice to work through these.

Lectures

This is a preliminary schedule; it may change throughout the term. The suggested readings (see legend below) are completely optional, but recommended.

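Dates: Sept. 10, Sept. 13
Lecture Topic: Introduction, supervised learning, & k-NN
Lecture Slides: [slides] [notes]
Tutorial: Probability review [slides]
Suggested Readings: Preliminaries; ESL 1.; ESL 2.1-2.3, 2.5

Dates: Sept. 17, Sept. 20
Lecture Topic: Decision Trees
Lecture Slides: [slides] [notes]
Tutorial: Linear algebra I & NumPy basics [slides] [presentation notebook] [worksheet notebook]
Suggested Readings: ESL 9.2; LTFP 2.1-2.3

Dates: Sept. 24, Sept. 27
Lecture Topic: Bias-Variance Decomposition
Lecture Slides: [slides] [notebook]
Tutorial: Bias-Variance & Info. Theory [worksheet] [Q1 solution]
Suggested Readings: Generalization; ESL 2.9, 8.7; PRML 3.2

Dates: Oct. 1, Oct. 4
Lecture Topic: Ensembles & Linear Regression
Lecture Slides: [slides]
Tutorial: Optimization & gradient descent [slides] [presentation notebook] [worksheet notebook]
Suggested Readings: Linear Regression; Calculus; ESL 3.1-3.2; ESL 4.1-4.2, 4.4; PRML 4.1, 4.3

Dates: asynch. delivery
Lecture Topic: Linear classification
Lecture Slides: [slides]
Tutorial: None (Thanksgiving)
Suggested Readings: Optimization; PRML 4.1.2

Dates: Oct. 15, Oct. 18
Lecture Topic: Linear classification II
Lecture Slides: [slides]
Tutorial: Midterm Review [slides]
Suggested Readings: ESL 12.1-12.2; ESL 10.1-10.5

Dates: Oct. 22, Oct. 25
Lecture Topic: Unsupervised learning & k-Means
Lecture Slides: [slides]
Tutorial: None
Suggested Readings: PRML 9.1

Dates: Oct. 29, Nov. 1
Lecture Topic: Principal Component Analysis
Lecture Slides: [slides]
Tutorial: Linear algebra II [slides] [presentation notebook] [worksheet notebook]
Suggested Readings: PRML 12.1

Dates: Nov. 5, Nov. 15
Lecture Topic: Matrix factorization & probabilistic models
Lecture Slides: [slides]
Tutorial: PCA in practice [notes]
Suggested Readings: ESL 14.5.1

Dates: Nov. 19, Nov. 22
Lecture Topic: Probabilistic models
Lecture Slides: [slides]
Tutorial: Multivariate Gaussians [notes]
Suggested Readings: ESL 2.6.3, 6.6.3, 4.3.0

Dates: Nov. 26, Nov. 29
Lecture Topic: Bayesian linear regression & Probabilistic PCA
Lecture Slides: [slides] [ppca.py]
Tutorial: Final Review [notes]
Suggested Readings: PRML 3.3, 12.2

Dates: Dec. 3, Dec. 6
Lecture Topic: AlphaGo & the frontiers of ML
Lecture Slides: [slides]
Tutorial: None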

Suggested References

Computing

Python Programming Language

For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn. We do not expect you to know advanced Python programming; however, we do expect that you are able to do the following.

Python Tutorials

There are a number of great Python tutorials on the web.

Using Python

There are a few options for running Python yourself.