STA314 Fall 2025: Statistical Methods for Machine Learning I

Overview

Machine learning (ML) is a set of techniques that allow computers to learn from data and experience, rather than requiring humans to specify the desired behaviour by hand. ML has become increasingly central both in statistics as an academic discipline, and in the data science industry. This course provides a broad introduction to commonly used ML methods, as well as the key statistical concepts underlying ML. It serves as a foundation for more advanced courses, such as STA414 (Statistical Methods for Machine Learning II).

We will cover statistical methods for supervised and unsupervised learning from data: training error, test error and cross-validation; classification, regression, and logistic regression; principal components analysis; stochastic gradient descent; decision trees and random forests; k-means clustering and nearest neighbour methods. Computational tutorials will support the efficient application of these methods.

Announcements

Course Information

The syllabus contains all of the course policies. Unless otherwise specified, the lectures, tutorials, office hours, midterm, and final exam will be delivered in person. There will be one mandatory midterm test held during the scheduled class time, and a FAS-proctored final examination.

Staff

Instructor: Chris Maddison
Email the professor: sta314-f25-prof@cs.toronto.edu
Email the entire course staff: sta314-f25-tas@cs.toronto.edu

Lectures, Tutorials, Office Hours

There are two sections of this course being offered this term. Office hours will be held during the third hour of the lecture time slot. You can attend any of the office hours, but please attend your assigned lecture and tutorial section. Room information should be available on ACORN.

Students enrolled in LEC0101 must enroll in one of TUT0101-0104. Students enrolled in LEC5101 must enroll in one of TUT0201-0204. Note that the Monday tutorial sections correspond to lectures given in the preceding week, as this term starts on a Tuesday.

Section   Lecture               Tutorial                          Office Hours
LEC0101   Wed 11AM-1PM in MP    TUT0101  Mon 11AM-12PM in CR      Mon 10-11AM in AH
                                TUT0102  Mon 11AM-12PM in CR
                                TUT0103  Mon 11AM-12PM in CR
LEC5101   Tue 5-7PM in AH       TUT0201  Wed 5-6PM in HS          Tue 7-8PM in AH
                                TUT0202  Wed 5-6PM in HA
                                TUT0204  Wed 5-6PM in MS

Homework

The course will have four homework assignments, each due at 11:59pm on the Monday listed in the schedule below. They will be submitted through Crowdmark on Quercus. We will be hosting TA office hours to help you prepare your assignments. This is a tentative schedule, and any changes will be announced.

Office hours are held either at the Sid. Smith Stats. Aid Centre or on Zoom (see the Quercus home page for the Zoom link).

#   Out      Due      Materials          General Office Hours   Python Office Hours    Credit
1   Sep 8    Sep 22   [handout] [code]   TBD                    11AM-12PM Aid Centre   7.5%
                                                                12-1PM Zoom
2   Sep 22   Oct 6    TBD                TBD                    TBD                    7.5%
3   Oct 20   Nov 3    TBD                TBD                    TBD                    7.5%
4   Nov 3    Nov 17   TBD                TBD                    TBD                    7.5%

Midterm and Final Exam

The course will have one midterm test held during the normal class time and a final exam proctored by FAS. The final exam schedules will be available on the A&S page. We will be hosting TA office hours to help you review for the tests and exams.

For both the midterm and final exam, you will be allowed to bring one double-sided aid-sheet (8.5" by 11").

You must take the tests with your assigned section, unless you have prior permission from the instructor. Please note that the lecture schedule on both midterm days will be somewhat unusual; see details below.

Office hours are held either at the Sid. Smith Stats. Aid Centre or on Zoom (see the Quercus home page for the Zoom link).

Test      LEC0101 Date      LEC5101 Date      Material Covered   Review Office Hours   Credit
Midterm   Oct. 15           Oct. 14           Lec 01 - Lec 06    TBD                   30%
          Test 11AM-12PM    Test 5-6PM
          Lecture 12-1PM    Lecture 6-7PM
Final     See the A&S page                    Lec 01 - Lec 11    TBD                   40%

Practice Midterms

Please be aware that these are from a different series of courses that may vary in difficulty and may have covered different material. Nevertheless, you should be able to get a sense for the style of questions that I may ask, and it will be very good practice to work through these.

Schedule

This is a preliminary schedule; it may change throughout the term. The suggested readings (see legend below) are completely optional, but recommended.

Date Span       Lecture                                         Tutorial                                Assessment Due  Suggested Readings
Sep 2–Sep 8     Introduction & supervised learning [slides]     Probability review [slides]                             Preliminaries; ESL 1; ESL 2.1-2.3, 2.5
Sep 9–Sep 15    Decision trees [slides]                         Linear algebra & NumPy basics [slides]                  ESL 9.2; LTFP 2.1-2.3
                                                                [lecture notebook] [worksheet notebook]
Sep 16–Sep 22   Bias-variance decomposition                     Bias-variance & info. theory            HW1             Generalization; ESL 2.9, 8.7; PRML 3.2
Sep 23–Sep 29   Ensembles & linear regression                   Optimization & gradient descent                         Linear Regression; Calculus; ESL 3.1-3.2; ESL 4.1-4.2, 4.4; PRML 4.1, 4.3
Sep 30–Oct 6    Linear classification                           No tutorial                             HW2             Optimization; PRML 4.1.2
Oct 7–Oct 13    Linear classification II                        No tutorial                                             ESL 12.1-12.2; ESL 10.1-10.5
Oct 14–Oct 20   Unsupervised learning                           No tutorial                             Midterm         PRML 9.1
Oct 21–Nov 3*   Principal Component Analysis                    Linear algebra II                       HW3             PRML 12.1
Nov 4–Nov 10    Matrix factorization & probabilistic models     PCA in practice                                         ESL 14.5.1
Nov 11–Nov 17   Probabilistic models                            Multivariate Gaussians                  HW4             ESL 2.6.3, 6.6.3, 4.3.0
Nov 18–Nov 24   Bayesian linear regression & probabilistic PCA  Final Review                                            PRML 3.3, 12.2
Nov 25–Dec 1    The frontiers of ML                             No tutorial

*Reading week is Oct 27–Oct 31, so the Nov 3 tutorial corresponds to lectures given on Oct 22.

Suggested References

Computing

Python Programming Language

For the homework assignments, we will use Python and libraries such as NumPy, SciPy, and scikit-learn. We do not expect you to know advanced Python programming; however, we do expect you to be able to do the following.

Python Tutorials

There are a number of great Python tutorials on the web.

Using Python

There are a few options for running Python yourself.