Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in statistics as an academic discipline, and in the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).
We will cover statistical methods for supervised and unsupervised learning from data: training error, test error and cross-validation; classification, regression, and logistic regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support the efficient application of these methods.
The syllabus contains all of the course policies. Unless otherwise specified, lectures, tutorials, office hours, the midterm, and the final exam will be held in person. There will be one mandatory midterm test held during the scheduled class time and a final examination proctored by FAS.
Instructor | Chris Maddison |
Email the professor | sta314-f25-prof@cs.toronto.edu |
Email the entire course staff | sta314-f25-tas@cs.toronto.edu |
There are two sections of this course being offered this term. Office hours will be held during the third hour of the lecture time slot. You can attend any of the office hours, but please attend your assigned lecture and tutorial section. Room information should be available on ACORN.
Students enrolled in LEC0101 must enroll in one of TUT0101-0104. Students enrolled in LEC5101 must enroll in one of TUT0201-0204. Note that the Monday tutorial sections correspond to lectures given in the preceding week, as this term starts on a Tuesday.
Section | Lecture | Tutorial | Office Hours |
---|---|---|---|
LEC0101 | Wed 11AM-1PM in MP | TUT0101 Mon 11AM-12PM in CR; TUT0102 Mon 11AM-12PM in CR; TUT0103 Mon 11AM-12PM in CR | Mon 10-11AM in AH |
LEC5101 | Tue 5-7PM in AH | TUT0201 Wed 5-6PM in HS; TUT0202 Wed 5-6PM in HA; TUT0204 Wed 5-6PM in MS | Tue 7-8PM in AH |
The course will have four homework assignments, each due at 11:59pm on the Monday listed in the schedule below. They will be submitted through Crowdmark on Quercus. We will host TA office hours to help you with your assignments. This schedule is tentative, and any changes will be announced.
Office hours are held either at the Sid. Smith Stats. Aid Centre or on Zoom (see the Quercus home page for the Zoom link).
# | Out | Due | Materials | General Office Hours | Python Office Hours | Credit |
---|---|---|---|---|---|---|
1 | Sep 8 | Sep 22 | [handout] [code] | TBD | | 7.5% |
2 | Sep 22 | Oct 6 | TBD | TBD | TBD | 7.5% |
3 | Oct 20 | Nov 3 | TBD | TBD | TBD | 7.5% |
4 | Nov 3 | Nov 17 | TBD | TBD | TBD | 7.5% |
The course will have one midterm test held during the normal class time and a final exam proctored by FAS. The final exam schedules will be available on the A&S page. We will be hosting TA office hours to help you review for the tests and exams.
For both the midterm and final exam, you will be allowed to bring one double-sided aid-sheet (8.5" by 11").
You must take the tests with your assigned section unless you have prior permission from the instructor. Note that the lecture schedule on both midterm days will be somewhat unusual; see details below.
Office hours are held either at the Sid. Smith Stats. Aid Centre or on Zoom (see the Quercus home page for the Zoom link).
Test | LEC0101 Date | LEC5101 Date | Material Covered | Review Office Hours | Credit |
---|---|---|---|---|---|
Midterm | Oct. 15: Test 11AM-12PM, Lecture 12-1PM | Oct. 14: Test 5-6PM, Lecture 6-7PM | Lec 01 - Lec 06 | TBD | 30% |
Final | See the A&S page | See the A&S page | Lec 01 - Lec 11 | TBD | 40% |
Please be aware that these are from a different series of courses that may vary in difficulty and may have covered different material. Nevertheless, you should be able to get a sense for the style of questions that I may ask, and it will be very good practice to work through these.
This is a preliminary schedule; it may change throughout the term. The suggested readings (see legend below) are completely optional, but recommended.
Date Span | Lecture | Tutorial | Assessment Due | Suggested Readings |
---|---|---|---|---|
Sep 2–Sep 8 | Introduction & supervised learning [slides] | Probability review [slides] | | Preliminaries ESL 1. ESL 2.1-2.3, 2.5 |
Sep 9–Sep 15 | Decision trees [slides] | Linear algebra & NumPy basics [slides] [lecture notebook] [worksheet notebook] | | ESL 9.2 LTFP 2.1-2.3 |
Sep 16–Sep 22 | Bias-variance decomposition | Bias-variance & info. theory | HW1 | Generalization ESL 2.9, 8.7 PRML 3.2 |
Sep 23–Sep 29 | Ensembles & linear regression | Optimization & gradient descent | | Linear Regression Calculus ESL 3.1-3.2 ESL 4.1-4.2, 4.4 PRML 4.1, 4.3 |
Sep 30–Oct 6 | Linear classification | No tutorial | HW2 | Optimization PRML 4.1.2 |
Oct 7–Oct 13 | Linear classification II | No tutorial | | ESL 12.1-12.2 ESL 10.1-10.5 |
Oct 14–Oct 20 | Unsupervised learning | No tutorial | Midterm | PRML 9.1 |
Oct 21–Nov 3* | Principal Component Analysis | Linear algebra II | HW3 | PRML 12.1 |
Nov 4–Nov 10 | Matrix factorization & probabilistic models | PCA in practice | | ESL 14.5.1 |
Nov 11–Nov 17 | Probabilistic models | Multivariate Gaussians | HW4 | ESL 2.6.3, 6.6.3, 4.3.0 |
Nov 18–Nov 24 | Bayesian linear regression & probabilistic PCA | Final Review | | PRML 3.3, 12.2 |
Nov 25–Dec 1 | The frontiers of ML | No tutorial | | |
*Reading week is Oct 27–Oct 31, so the Nov 3 tutorial corresponds to lectures given on Oct 22.
For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn. We do not expect you to know advanced Python programming; however, we do expect you to be able to do the following.
There are a number of great Python tutorials on the web.
There are a few options for running Python yourself.
The easiest option is probably to install everything yourself on your own machine.
If you don’t already have Python, install it. We recommend using Anaconda. You can also install Python directly using the instructions here.
conda create --name sta314 python
conda activate sta314
pip install scipy numpy autograd matplotlib jupyter scikit-learn
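Once the install finishes, it can be worth checking that the packages are importable from inside the new environment. The script below is a minimal sketch of such a check, not an official course tool: the helper name `check_packages` and the `REQUIRED` list are illustrative assumptions. It uses import names rather than install names, so scikit-learn appears as `sklearn`, and Jupyter is checked through `jupyter_core`, which is assumed to be pulled in as a dependency of `jupyter`.

```python
import importlib

# Import names (not pip package names) for the libraries installed above.
# Assumption: scikit-learn is imported as `sklearn`, and `jupyter_core`
# is installed as a dependency of the `jupyter` package.
REQUIRED = ["numpy", "scipy", "autograd", "matplotlib", "jupyter_core", "sklearn"]


def check_packages(names):
    """Map each import name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in the active environment.
            report[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        status = version if version is not None else "MISSING"
        print(f"{name:14s} {status}")
```

Save this as a file (for example `check_env.py`, a hypothetical name) and run it with `python check_env.py` after activating the environment. Any line reporting `MISSING` suggests that the corresponding package did not install into the active environment.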