Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in statistics as an academic discipline, and in the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).
We will cover statistical methods for supervised and unsupervised learning from data: training error, test error and cross-validation; classification, regression, and logistic regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support the efficient application of these methods.
The syllabus contains all of the course policies. Unless otherwise specified, the lectures, tutorials, office hours, midterm and final exam will be delivered in person. There will be one mandatory midterm test held during the scheduled class time, and a FAS-proctored final examination.
Instructor | Chris Maddison |
Email the professor | sta314-f25-prof@cs.toronto.edu |
Email the entire course staff | sta314-f25-tas@cs.toronto.edu |
There are two sections of this course being offered this term. Instructor office hours will be held during the third hour of the lecture time slot. You can attend any of the office hours, but please attend your assigned lecture and tutorial section. Room information should be available on ACORN.
Students enrolled in LEC0101 must enroll in one of TUT0101-0104. Students enrolled in LEC5101 must enroll in one of TUT0201-0204. Note that the Monday tutorial sections correspond to lectures given in the preceding week, as this term starts on a Tuesday.
Section | Lecture | Tutorial | Instructor Office Hours |
---|---|---|---|
LEC0101 | Wed 11AM-1PM in MP | TUT0101 Mon 11AM-12PM in CR; TUT0102 Mon 11AM-12PM in CR; TUT0103 Mon 11AM-12PM in CR | Mon 10-11AM in AH |
LEC5101 | Tue 5-7PM in AH | TUT0201 Wed 5-6PM in HS; TUT0202 Wed 5-6PM in HA; TUT0204 Wed 5-6PM in MS | Tue 7-8PM in AH |
We will host TA office hours throughout the term to support you as you work on your homeworks and review for exams. Each week, the office hours will focus mainly on the current assessment, but you are welcome to ask any question about the course. The schedule below is preliminary and may change during the term, so check it regularly for the most up-to-date information. If attendance is low, we reserve the right to cancel some of the hours later in the term.
TA office hours are held either in person at the Stats Aid Centre in Sidney Smith or on Zoom (see the Quercus home page for the Zoom link).
Legend:
IP = In Person (Stats Aid Centre in Sid. Smith), V = Virtual (Zoom);
G = General Questions, P = Python Focused
Week and Focus | Mon | Tue | Wed | Thu | Fri |
---|---|---|---|---|---|
Sept 15 Focus: HW1 | 4-5PM (IP, P) | 2-3PM (V, G); 3-4PM (IP, G) | | | |
Sept 22 Focus: HW2 | 9-10AM (V, G) | 10-11AM (V, G) | | | |
Sept 29 Focus: HW2 | 10-11AM (IP, P) | 2-3PM (V, G) | 9-10AM (V, G); 4-5PM (IP, P) | 3-4PM (IP, G) | |
Oct 6 Focus: Midterm Review | 10-11AM (IP, G) | 2-3PM (V, G) | 9-10AM (V, G) | 10-11AM (V, G) | |
Oct 13 Focus: Midterm Review | 4-5PM (IP, G) | 3-4PM (V, G) | | | |
Oct 20 Focus: HW3 | 2-3PM (V, G) | 4-5PM (IP, G) | 3-4PM (IP, P) | | |
Oct 27 Focus: HW3 | 10-11AM (IP, G); 4-5PM (IP, G) | 9-10AM (V, G); 4-5PM (IP, P) | 10-11AM (V, P) | | |
Nov 3 Focus: HW4 | 3-4PM (IP, G) | | | | |
Nov 10 Focus: HW4 | 10-11AM (IP, P); 4-5PM (IP, G); 5-6PM (IP, G) | 4-5PM (IP, P) | 3-4PM (IP, P) | | |
Nov 17 | | | | | |
Nov 24 Focus: Exam Review | 4-5PM (IP, G) | 2-3PM (V, G) | 9-10AM (V, G) | 10-11AM (V, G) | |
Dec 1 Focus: Exam Review | 10-11AM (IP, G); 4-5PM (IP, G) | 4-5PM (IP, G) | 3-4PM (IP, G) | | |
The course will have four homeworks, each due at 11:59pm on the Monday listed in the table below. They will be submitted through Crowdmark on Quercus. We will be hosting TA office hours to help you prepare your assignments. See the Course Information section for timing details. This is a tentative schedule, and any changes will be announced.
# | Out | Due | Materials | Credit |
---|---|---|---|---|
1 | Sep 8 | Sep 22 | [handout] [code] | 7.5% |
2 | Sep 22 | Oct 6 | [handout] [code] | 7.5% |
3 | Oct 20 | Nov 3 | TBD | 7.5% |
4 | Nov 3 | Nov 17 | TBD | 7.5% |
The course will have one midterm test held during the normal class time and a final exam proctored by FAS. The final exam schedules will be available on the A&S page. We will be hosting TA office hours to help you review for the tests and exams.
For both the midterm and final exam, you will be allowed to bring one double-sided aid-sheet (8.5" by 11").
You must take the tests with your assigned section, unless you have prior permission from the instructor. Please note, the lecture schedule on both days will be somewhat unusual; see details below.
Test | LEC0101 Date | LEC5101 Date | Material Covered | Credit |
---|---|---|---|---|
Midterm | Oct. 15: Test 11AM-12PM, Lecture 12-1PM | Oct. 14: Test 5-6PM, Lecture 6-7PM | Lec 01 - Lec 05 | 30% |
Final | See the A&S page | See the A&S page | Lec 01 - Lec 11 | 40% |
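The credit weights listed for the homeworks and tests (four homeworks at 7.5% each, a 30% midterm, and a 40% final) add up to 100%. The following is a minimal sketch of how they would combine into a course mark, assuming every component is scored out of 100 and no other adjustments apply. The function name `course_mark` and the example scores are hypothetical, and the syllabus remains the authority on grading.

```python
# Credit weights (percent of the course mark) as listed in the
# homework and test tables.
WEIGHTS = {
    "hw1": 7.5,
    "hw2": 7.5,
    "hw3": 7.5,
    "hw4": 7.5,
    "midterm": 30.0,
    "final": 40.0,
}


def course_mark(scores):
    """Weighted average of component scores, each given out of 100.

    Assumption: a plain weighted sum with no late penalties or
    re-weighting; this is an illustration, not the official formula.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# Hypothetical scores, for illustration only.
example = {"hw1": 90, "hw2": 80, "hw3": 85, "hw4": 95,
           "midterm": 70, "final": 75}

print(sum(WEIGHTS.values()))  # 100.0
print(course_mark(example))   # 77.25
```

Because the midterm and final together carry 70% of the weight, a change of ten points on the final moves the course mark by four points, while the same change on a single homework moves it by 0.75.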
Please be aware that these are from a different series of courses that may vary in difficulty and may have covered different material. Nevertheless, you should be able to get a sense for the style of questions that I may ask, and it will be very good practice to work through these.
This is a preliminary schedule; it may change throughout the term. The suggested readings (see legend below) are completely optional, but recommended.
Week | Lecture | Tutorial | Assessment Due | Suggested Readings |
---|---|---|---|---|
Sep 2–Sep 8 | Introduction & supervised learning [slides] | Probability review [slides] | | Preliminaries ESL 1. ESL 2.1-2.3, 2.5 |
Sep 9–Sep 15 | Decision trees [slides] | Linear algebra & NumPy basics [slides] [lecture notebook] [worksheet notebook] | | ESL 9.2 LTFP 2.1-2.3 |
Sep 16–Sep 22 | Bias-variance decomposition [slides] [notebook] | Bias-variance & info. theory [worksheet] | HW1 | Generalization ESL 2.9, 8.7 PRML 3.2 |
Sep 23–Sep 29 | Ensembles & linear regression [slides] | Optimization & gradient descent [slides] [lecture notebook] [worksheet notebook] | | Linear Regression Calculus ESL 3.1-3.2 ESL 4.1-4.2, 4.4 PRML 4.1, 4.3 |
Sep 30–Oct 6 | Linear classification [slides] | No tutorial | HW2 | Optimization PRML 4.1.2 |
Oct 7–Oct 13 | Linear classification II | Midterm Review (async.) [slides] | | ESL 12.1-12.2 ESL 10.1-10.5 |
Oct 14–Oct 20 | Unsupervised learning | No tutorial | Midterm | PRML 9.1 |
Oct 21–Nov 3* | Principal Component Analysis | Linear algebra II | HW3 | PRML 12.1 |
Nov 4–Nov 10 | Matrix factorization & probabilistic models | PCA in practice | | ESL 14.5.1 |
Nov 11–Nov 17 | Probabilistic models | Multivariate Gaussians | HW4 | ESL 2.6.3, 6.6.3, 4.3.0 |
Nov 18–Nov 24 | Bayesian linear regression & probabilistic PCA | Final Review | | PRML 3.3, 12.2 |
Nov 25–Dec 1 | The frontiers of ML | No tutorial | | |
*Reading week is Oct 27–Oct 31; therefore, the Nov 3 tutorial corresponds to lectures given on Oct 22.
For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn. We do not expect you to know advanced Python programming; however, we do expect that you are able to do the following.
There are a number of great Python tutorials on the web.
There are a few options for running Python yourself.
The easiest option is probably to install everything yourself on your own machine.
If you don’t already have Python, install it. We recommend using Anaconda. You can also install Python directly using the instructions here.
conda create --name sta314 python
conda activate sta314
pip install scipy numpy autograd matplotlib jupyter scikit-learn
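Once the environment is set up, it can be useful to confirm that the packages are visible from inside it. The snippet below is a small sketch that is not part of the course materials: it looks up each package by its distribution name using the standard library's `importlib.metadata` (Python 3.8 or later) and reports the installed version, or marks the package as missing. Note that scikit-learn is distributed on PyPI under the name `scikit-learn`, even though it is imported as `sklearn`. The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they are published on PyPI.
# scikit-learn is the package that is imported as `sklearn`.
PACKAGES = ["scipy", "numpy", "autograd", "matplotlib", "jupyter", "scikit-learn"]


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Print one line per package; run this with the sta314 environment active.
for name, ver in installed_versions().items():
    print(f"{name:14s} {ver or 'MISSING'}")
```

Any package reported as `MISSING` can usually be installed by re-running the `pip install` command for that name while the environment is active.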