Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in statistics as an academic discipline, and in the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).
We will cover statistical methods for supervised and unsupervised learning from data: training error, test error and cross-validation; classification, regression, and logistic regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support the efficient application of these methods.
Unless otherwise specified, lectures and tutorials will be held synchronously, either online via Zoom or in-person. There will be two mandatory tests held during the scheduled class time. Please see the syllabus for detailed policies (marking, lateness, etc.) and attendance instructions.
Instructor | Chris Maddison |
Office Hours | Monday 3-4PM and Friday 12-1PM via Zoom |
Email | sta314@utoronto.ca |
Section | Time | Location |
---|---|---|
LEC0201 | Friday 10AM-12PM | Online - Zoom |
LEC0101 | Monday 1-3PM | Online - Zoom |
Students are enrolled in tutorial groups based on their stated preferences for in-person vs. online. Tutorial groups can be checked on Quercus via the People tab. Do not attend an in-person tutorial section unless you are enrolled in that section on Quercus.
Time | Type | Location |
---|---|---|
Friday 1-2PM | Online | Online - Zoom |
Friday 1-2PM | In-person | BA 1230 |
Monday 4-5PM | Online | Online - Zoom |
Monday 4-5PM | In-person | AB 107 |
The course will have four homework assignments, each due at 11:59pm on the Thursday of the week it is due and submitted through Quercus. This is a tentative schedule, and any changes will be announced.
# | Out | Due | Materials | General Office Hours | Python Office Hours | Credit |
---|---|---|---|---|---|---|
1 | Sep. 17 | Sep. 30 | [handoutV3] [code] [notebook] | 24 Sept. 3-5PM; 24 Sept. 7-9PM; 28 Sept. 9-11AM; 28 Sept. 2-4PM; 29 Sept. 3-5PM | 24 Sept. 8-9PM; 27 Sept. 5-6PM | 15% |
2 | Oct. 1 | | [handoutV2] [code] [notebook] | 6 Oct. 2-6PM; 8 Oct. 2-3PM (Prof. Maddison); 8 Oct. 5-7PM; 11 Oct. 6-8PM; 12 Oct. 9AM-12PM; 13 Oct. 11AM-12PM (Prof. Maddison); 13 Oct. 1-5PM | 5 Oct. 1-3PM; 12 Oct. 5-8PM | 15% |
3 | | | [handout] [code] [q1 notebook] [q3 notebook] | 4 Nov. 3-5PM, 7-8PM; 5 Nov. 3-5PM; 8 Nov. 6-8PM; 9 Nov. 12-1PM, 1-3PM, 6-8PM; 11 Nov. 2-4PM; 12 Nov. 7-8PM | 2 Nov. 5-6PM; 5 Nov. 5-6PM; 9 Nov. 5-6PM; 11 Nov. 6-8PM | 15% |
4 | | | [handout] [code] [notebook] | 16 Nov. 3-6PM; 17 Nov. 7-9PM; 19 Nov. 3-4PM; 23 Nov. 3-5PM, 5-7PM; 24 Nov. 6-9PM; 26 Nov. 3-5PM, 5-7PM; 29 Nov. 9-11AM | 19 Nov. 7-9PM; 24 Nov. 9-12AM | 15% |
The course will have two tests, each with a duration of 1 hour and held during the normal class time. You must take the test with your assigned section, unless you have prior permission from the instructor. Please note, the lecture schedule on both days will be somewhat unusual; see details below.
# | Friday Section | Monday Section | Material Covered | Review Office Hours | Credit |
---|---|---|---|---|---|
1 | Oct. 22: Test 10-11AM, Lecture 11AM-1PM | Oct. 25: Test 1-2PM, Lecture 2-4PM | Lec 01 - Lec 06 | 19 Oct. 6-7PM; 20 Oct. 3-5PM; 21 Oct. 2-4PM | 20% |
2 | Dec. 3: Test 10-11AM, Lecture 12-2PM | Dec. 6: Test 1-2PM, Lecture 2-4PM | Lec 01 - Lec 10, with emphasis on Lec 7 - Lec 10 | 1 Dec. 1-3PM, 4-5PM; 2 Dec. 4-6PM | 20% |
Please be aware that these practice materials are from a different series of courses, which may vary in difficulty and may have covered different material. Nevertheless, they should give you a sense of the style of questions that I may ask, and working through them will be very good practice.
This is a preliminary schedule; it may change throughout the term. The suggested readings (see legend below) are completely optional, but recommended.
Dates | Lecture Topic | Lecture Slides | Tutorial | Suggested Readings |
---|---|---|---|---|
Sept. 10, Sept. 13 | Introduction, supervised learning, & k-NN | [slides] [notes] | Probability review [slides] | Preliminaries ESL 1. ESL 2.1-2.3, 2.5 |
Sept. 17, Sept. 20 | Decision Trees | [slides] [notes] | Linear algebra I & NumPy basics [slides] [presentation notebook] [worksheet notebook] | ESL 9.2 LTFP 2.1-2.3 |
Sept. 24, Sept. 27 | Bias-Variance Decomposition | [slides] [notebook] | Bias-Variance & Info. Theory [worksheet] [Q1 solution] | Generalization ESL 2.9, 8.7 PRML 3.2 |
Oct. 1, Oct. 4 | Ensembles & Linear Regression | [slides] | Optimization & gradient descent [slides] [presentation notebook] [worksheet notebook] | Linear Regression Calculus ESL 3.1-3.2 ESL 4.1-4.2, 4.4 PRML 4.1, 4.3 |
asynch. delivery | Linear classification | [slides] | None (Thanksgiving) | Optimization PRML 4.1.2 |
Oct. 15, Oct. 18 | Linear classification II | [slides] | Midterm Review [slides] | ESL 12.1-12.2 ESL 10.1-10.5 |
Oct. 22, Oct. 25 | Unsupervised learning & k-Means | [slides] | None | PRML 9.1 |
Oct. 29, Nov. 1 | Principal Component Analysis | [slides] | Linear algebra II [slides] [presentation notebook] [worksheet notebook] | PRML 12.1 |
Nov. 5, Nov. 15 | Matrix factorization & probabilistic models | [slides] | PCA in practice [notes] | ESL 14.5.1 |
Nov. 19, Nov. 22 | Probabilistic models | [slides] | Multivariate Gaussians [notes] | ESL 2.6.3, 6.6.3, 4.3.0 |
Nov. 26, Nov. 29 | Bayesian linear regression & Probabilistic PCA | [slides] [ppca.py] | Final Review [notes] | PRML 3.3, 12.2 |
Dec. 3, Dec. 6 | AlphaGo & the frontiers of ML | [slides] | None | |
For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn. We do not expect you to know advanced Python programming; however, we do expect that you are able to do the following.
There are a number of great Python tutorials on the web.
There are a few options for running Python yourself.
The easiest option is probably to install everything yourself on your own machine.
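If you don’t already have Python, install it. We recommend using Anaconda. You can also install Python directly using the instructions here. Once Python is installed, create and activate an environment for the course, then install the required packages: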
conda create --name sta314 python
conda activate sta314
pip install scipy numpy autograd matplotlib jupyter scikit-learn
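One way to confirm that the environment is ready is to try importing each library from inside the activated environment. The following is a minimal sketch, not part of the course materials: the names `REQUIRED` and `check_packages` are hypothetical, and the list assumes the packages named in the install command above (scikit-learn is imported as `sklearn`).

```python
import importlib

# Import names for the libraries installed above (an assumed list for
# illustration); note that scikit-learn is imported as "sklearn".
REQUIRED = ["numpy", "scipy", "autograd", "matplotlib", "sklearn"]


def check_packages(names):
    """Map each import name to its version string, to "unknown" if the
    module defines no __version__, or to None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = "MISSING" if version is None else f"version {version}"
        print(f"{name:12s} {status}")
```

Any package reported as `MISSING` can be installed with `pip install` inside the activated environment and the script run again.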