CSC 2541 Winter 2025: Large Models

Overview and Motivation

Large Models

Large language models have revolutionized artificial intelligence and machine learning. These models, trained on massive datasets, can generate human-like text and code and (apparently) engage in complex reasoning tasks. Two empirical findings drive these breakthroughs: large models improve predictably with the amount of compute used to train them, and diverse capabilities emerge as the models see more data from the internet. These findings have motivated an immense industrial effort to build and deploy very large models.
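The claim that models improve predictably with compute can be made concrete with a scaling law. Below is a minimal sketch using the parametric loss fit from Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (the "Chinchilla" paper); the constants are the published fits and are shown purely for illustration, not as part of the course materials.

    # Minimal sketch of a compute scaling law, using the parametric loss
    # L(N, D) = E + A / N**alpha + B / D**beta from Hoffmann et al. (2022).
    # The constants below are the published fits; treat them as illustrative.

    E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted coefficients
    alpha, beta = 0.34, 0.28       # fitted exponents for parameters and tokens

    def loss(n_params: float, n_tokens: float) -> float:
        """Predicted pre-training loss for a model with n_params parameters
        trained on n_tokens tokens."""
        return E + A / n_params**alpha + B / n_tokens**beta

    # Scaling up parameters and data lowers the predicted loss smoothly:
    for n, d in [(1e9, 20e9), (10e9, 200e9), (70e9, 1.4e12)]:
        print(f"N={n:.0e}, D={d:.0e} -> predicted loss {loss(n, d):.3f}")

Fits of this form are what make large training runs plannable: the loss of a big model can be forecast from a handful of much cheaper small-scale runs.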

The course will focus on understanding the practical aspects of large model training through an in-depth study of the Llama 3 technical report. We will cover the whole pipeline, from pre-training through post-training to evaluation and deployment.

Students will be expected to present a paper, prepare code notebooks, and complete a final project on a topic of their choice. While the readings are largely applied or methodological, theoretically-minded students are welcome to focus their project on a theoretical topic related to large models.

The course is heavily inspired by similar courses, such as CS336: Language Modeling from Scratch taught by Tatsunori Hashimoto, Percy Liang, Nelson Liu, and Gabriel Poesia at Stanford and CS 886: Recent Advances on Foundation Models taught by Wenhu Chen and Cong Wei at Waterloo.

Course Information

Teaching Staff

Instructor: Chris Maddison
TAs: Ayoub El Hanchi and Frieda Rong
Email (instructor and TAs): csc2541-large-models@cs.toronto.edu

What, When, and Where

Syllabus and Policies

The full syllabus and policies are available here.

Assignments and Grading

Assignments for the course include paper presentations and a final project. The marking scheme is as follows:

Prerequisites

This is a graduate course designed to guide students in an exploration of the current state of the art. While there are no formal prerequisites, the course assumes familiarity with machine learning and deep learning concepts. A previous course in machine learning, such as CSC311, STA314, or ECE421, is needed to take full advantage of the course, and, ideally, students will have taken a course in deep learning such as CSC413. A solid background in linear algebra, multivariate calculus, probability, and computer programming is also strongly recommended.

Auditing

Non-enrolled persons may audit this course (sit in on the lectures) only if the auditor is a student at U of T, and no University resources are to be committed to the auditor. This means that students of other universities, employees of outside organizations, and any other non-students are not permitted to audit.

Schedule and Readings

The structure of the course closely follows the structure of Meta's technical report:

(Llama 3) Llama Team, AI @ Meta. "The Llama 3 Herd of Models." arXiv:2407.21783.

Except for the first two weeks, students will present a paper each week that elaborates on the section of the report assigned for that week.

This is a preliminary schedule, and it may change throughout the term. We won't check whether you've read the assigned readings, but you will get more out of the course if you do.

Day    Topic                         Core Readings                  Papers for Student Presentations
10/1   The Bitter Lesson             [slides]                       None
17/1   A Tiny Large Model            [github] [notebook] [slides]   None
24/1   Pre-training: Scaling
31/1   Pre-training: Parallelism
7/2    Prompting
         • Read any one of the papers from the Student Presentations column for this week.
14/2   Post-training: Alignment
28/2   Post-training: Capabilities
7/3    Evaluation
14/3   Safety                        TBD
21/3   Deployment
28/3   Beyond Language               TBD
4/4    Future Directions             TBD

The following reports cover the technical details of other important large models: