Machine Learning for Large-Scale Data Analysis and Decision Making
Instructor: Laurent Charlin
Class Schedule: I am teaching the course twice this term, once in French and once in English.
Office hours: Wednesday 9am--11am (Virtual Room)
Rooms are password-protected. Reach out to me by email if you need the password.
In this course, we will study machine learning models, a type of
statistical analysis that focuses on prediction, for analyzing very
large datasets ("big data").
The plan is to survey different machine learning
techniques (supervised, unsupervised, reinforcement learning) as well
as some applications (e.g., recommender systems). We will also study
large-scale machine learning and will discuss distributed
computational frameworks (Hadoop and Spark).
Due to the online nature of the semester, this course will be taught as a flipped classroom: an instructional strategy where students learn the material before coming to class. The material will be a mix of readings and video capsules. Class time is reserved for more active learning, such as problem solving, demonstrations, and question answering. In addition, each class will include a short summary of the week's material.
Mathematical maturity will be assumed.
Python knowledge will be assumed. If you do not know Python, I have listed a few ways to learn the basics below. I recommend option 1.
1. Complete Chapters 1, 2, and 3 (sign in using the link I sent you to access Chapters 2 and 3). I particularly recommend this option.
2. CAM offers introductory Python courses in September (currently only in French). Register here: CAM registration
3. Here is the tutorial we used in 2018: Fall 2018 tutorial. While I think the first two options are superior, it will give you an idea of the level I am expecting.
Further, a machine-learning tutorial using Python will be provided in week 4.
08/31. Class introduction and math review. [slides]
09/14. Machine learning fundamentals
Required readings: Chapter 5 of
Deep Learning (the book). You can skim 5.4 (except 5.4.4) to 5.10.
Learning Problem [14:40]
Types of Experiences [13:15]
A first Supervised Model [8:03]
Model Evaluation [15:26]
Model Validation [3:08]
Bias / Variance tradeoff [11:50]
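The tradeoff described in the last two capsules can be seen in a few lines of NumPy. This is my own toy illustration, not course material (the data, degrees, and split sizes are arbitrary choices): polynomials of increasing degree are fit to noisy samples of sin(3x), and training error is compared with held-out validation error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.2, 60)   # noisy samples of sin(3x)

x_train, y_train = x[:40], y[:40]   # training split
x_val, y_val = x[40:], y[40:]       # held-out validation split

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

errors = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))

for degree, (train_err, val_err) in errors.items():
    print(f"degree {degree:2d}: train {train_err:.3f}  val {val_err:.3f}")
```

Training error can only go down as model capacity grows; the validation error is what reveals overfitting.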
09/21. Supervised learning algorithms
Required readings: Sections 4.1-4.3 and 4.5 of The Elements of Statistical Learning (available online), and Sections 3.5 and 4.2 of Machine Learning (K. Murphy).
Nearest Neighbor [19:05]
Linear Classification [15:26]
Introduction to Probabilistic Models (for Classification) [11:55]
The Naive Bayes Model [24:28]
Naive Bayes Example [9:26]
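To make the capsules concrete, here is a minimal Bernoulli naive Bayes classifier with add-one (Laplace) smoothing. The spam/ham toy data and all names are my own; the course's examples may differ.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate log-priors and per-class word (non-)occurrence log-likelihoods."""
    vocab = {w for d in docs for w in d}
    classes = set(labels)
    counts = {c: Counter() for c in classes}
    n_class = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(set(d))  # Bernoulli model: presence/absence per document
    model = {}
    for c in classes:
        log_prior = math.log(n_class[c] / len(labels))
        # add-one smoothing: (count + 1) / (class size + 2)
        log_like = {w: math.log((counts[c][w] + 1) / (n_class[c] + 2)) for w in vocab}
        log_anti = {w: math.log((n_class[c] - counts[c][w] + 1) / (n_class[c] + 2)) for w in vocab}
        model[c] = (log_prior, log_like, log_anti)
    return vocab, model

def predict_nb(vocab, model, doc):
    """Return the class maximizing the log-posterior under the naive assumption."""
    words = set(doc)
    def score(c):
        log_prior, log_like, log_anti = model[c]
        return log_prior + sum(log_like[w] if w in words else log_anti[w] for w in vocab)
    return max(model, key=score)

docs = [["free", "money", "now"], ["meeting", "tomorrow"],
        ["free", "offer"], ["project", "meeting", "notes"]]
labels = ["spam", "ham", "spam", "ham"]
vocab, model = train_nb(docs, labels)
print(predict_nb(vocab, model, ["free", "money"]))  # prints "spam"
```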
09/28. Python for scientific computations and machine
learning [Practical Session]
The tutorial that you will follow is here (on colab).
I encourage you to start the tutorial ahead of time and to
finish it during our 90 minutes together.
10/05. Neural networks and deep learning
From linear classification to neural networks [19:28]
Training neural networks [20:14]
Learning representations [13:40]
Neural networks hyperparameters [25:20]
Neural networks takeaways [7:00]
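The training loop from these capsules can be sketched end to end in NumPy. This is a hedged illustration, not the course's code: a two-layer network (tanh hidden layer, sigmoid output) trained by gradient descent on XOR, a problem a linear classifier cannot solve. Layer sizes, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # hidden layer parameters
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # output layer parameters

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    h = np.tanh(X @ W1 + b1)            # forward pass
    p = sigmoid(h @ W2 + b2)
    dout = (p - y) / len(X)             # gradient of mean cross-entropy w.r.t. logits
    dW2, db2 = h.T @ dout, dout.sum(0)  # backward pass
    dh = (dout @ W2.T) * (1 - h ** 2)   # tanh derivative
    dW1, db1 = X.T @ dh, dh.sum(0)
    W1 -= lr * dW1; b1 -= lr * db1      # gradient descent step
    W2 -= lr * dW2; b2 -= lr * db2

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(preds.ravel())
```

In practice you would use a framework with automatic differentiation rather than hand-coded gradients; the point here is only to show what the framework computes.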
10/13. Recurrent neural networks and convolutional neural networks
Required readings: Sections 10, 10.1, 10.2 (skim 10.2.2,
skip 10.2.3), and 10.7. Sections 9, 9.1, 9.2, 9.3 (9.11 for fun).
Both from Deep Learning (the book).
Modelling Sequential Data [8:42]
Practical Overview of RNNs [29:32]
RNNs for language modelling [15:13]
Overview of CNNs [13:30]
Convolutions and Pooling [26:00]
Conclusions and Practical remarks [9:17]
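The two CNN building blocks from "Convolutions and Pooling" can be written directly in NumPy. A minimal sketch under my own assumptions (single channel, stride 1, "valid" cross-correlation, non-overlapping 2x2 pooling):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.empty((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3] = 1.0                      # a vertical edge
kernel = np.array([[1.0, -1.0]])       # horizontal difference filter
feat = conv2d(image, kernel)           # responds strongly at the edge
print(max_pool2x2(feat))
```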
10/19. Unsupervised learning
Required reading: Section 14.3 (skip 14.3.5 and
14.3.12) of the Elements of Statistical Learning.
Introduction to unsupervised learning [8:17]
K-means clustering [41:58] (there's a natural break at 22:28)
GMMs for clustering [17:52]
Beyond Clustering [14:42]
Exercises Unsupervised (colab), answers (colab)
If you wish to work outside of colab, here are the files to download: 1) Unsupervised_questions.ipynb and 2) utilities.py
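For reference, the k-means algorithm from the capsule fits in a few lines of NumPy. This is a deliberately naive sketch (first-k-points initialization, fixed iteration count, my own toy data); in practice you would use k-means++ initialization and a convergence check.

```python
import numpy as np

def kmeans(X, k, n_iters=50):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    centers = X[:k].copy()   # naive init: first k points (fine for this toy data)
    for _ in range(n_iters):
        # assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(3)
blob_a = rng.normal([0.0, 0.0], 0.3, (50, 2))   # two well-separated blobs
blob_b = rng.normal([5.0, 5.0], 0.3, (50, 2))
X = np.vstack([blob_a, blob_b])
centers, labels = kmeans(X, k=2)
print(np.round(centers, 1))
```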
10/26. Reading week (no class)
11/02. Project team meetings
11/09. Parallel computational paradigms for large-scale data processing
Intro. to Distributed Computing for ML [19:35]
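The map/shuffle/reduce pattern underlying Hadoop and Spark can be mimicked in pure Python with the classic word-count example. This is only a conceptual sketch; the real frameworks distribute the map and reduce phases across machines.

```python
from collections import defaultdict

def map_phase(doc):
    """Mapper: emit (word, 1) pairs for one document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: combine each key's values (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big models", "big compute"]
# in a real cluster, mappers run in parallel over partitions of the data
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'models': 1, 'compute': 1}
```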
11/16. Recommender systems
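As a preview of collaborative filtering, here is low-rank matrix factorization for rating prediction, trained with SGD. The handful of ratings and all hyperparameters are arbitrary choices of mine, not course material.

```python
import numpy as np

# (user, item, rating) observations for 3 users and 3 items
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 4.0), (2, 2, 2.0)]
n_users, n_items, rank = 3, 3, 2

rng = np.random.default_rng(0)
U = rng.normal(0, 0.1, (n_users, rank))   # user latent factors
V = rng.normal(0, 0.1, (n_items, rank))   # item latent factors

lr, reg = 0.05, 0.01
for _ in range(1000):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                 # error on this observed rating
        du = lr * (err * V[i] - reg * U[u])   # gradient step for the user factors
        dv = lr * (err * U[u] - reg * V[i])   # gradient step for the item factors
        U[u] += du
        V[i] += dv

rmse = np.sqrt(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings]))
print(f"training RMSE: {rmse:.3f}")
```

Unobserved entries of the rating matrix are then predicted as the dot product of the learned user and item factors.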
11/23. Sequential decision making I
Motivating RL [8:22]
Planning with MDPs [12:16]
MDP objective [14:16]
Algorithms for solving MDPs [17:51]
Optional: Demo of the policy iteration algorithm (from Andrej Karpathy)
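The MDP-solving algorithms from the capsules can be sketched with value iteration on a made-up two-state MDP (the states, actions, and rewards are my own example).

```python
# P[s][a] is a list of (probability, next_state, reward) triples
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in P}
for _ in range(200):  # Bellman optimality backups until (near) convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

# greedy policy with respect to the converged values
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```

Here state 1 is worth 2/(1-gamma) = 20 under "stay", so the optimal policy heads there from state 0.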
11/30. Sequential decision making II
Note: In this capsule, there is a mistake in the second equation of the policy iteration algorithm (the transition should be conditioned on the action a, not on π(s)); the slides have been corrected (see slides 47 and 48).
Introduction to RL [13:31]
A first RL algorithm [17:13]
RL Algorithms for Control [21:10]
Required reading: Sections 1 through 4 from this Survey
Other reading: Chapters 1,3,4, and 6 from Reinforcement Learning: An Introduction
Optional: Demo of the TD algorithm (from Andrej Karpathy)
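The TD update at the heart of these capsules can be sketched with tabular Q-learning on an invented corridor environment (all hyperparameters are arbitrary). The behavior policy is epsilon-greedy, but Q-learning is off-policy and still estimates the values of the greedy policy.

```python
import random

n_states = 5
actions = ["left", "right"]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.1, 0.9, 0.5
random.seed(0)

def step(s, a):
    """Deterministic corridor: reward 1 and reset to cell 0 at the right end."""
    s2 = min(s + 1, n_states - 1) if a == "right" else max(s - 1, 0)
    if s2 == n_states - 1:
        return 0, 1.0, True
    return s2, 0.0, False

s = 0
for _ in range(20000):
    # epsilon-greedy behavior policy
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    s2, r, done = step(s, a)
    target = r if done else r + gamma * max(Q[(s2, act)] for act in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])   # TD update toward the target
    s = s2

greedy = {state: max(actions, key=lambda act: Q[(state, act)]) for state in range(n_states)}
print(greedy)
```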
12/07. Class project presentations
Due date: final report December 11 (by the end of the day).
Project presentation (10%)
Final Exam (30%)
Date: December 18, Time: 9:00am-12:00pm (Montreal time).
Material covered: Everything covered in class + required lectures.
Past exam: Fall 2018
Capsule summaries (10%)
Provide a short summary (10 to 15 lines of text, submitted via the form) of 5 capsules throughout the semester.
Post your summaries using this form
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. [ESL]
Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. A Bradford Book. 2nd edition. [RL-Sutton-Barto]
Machine Learning. Kevin Murphy. MIT Press. 2012. [ML-Murphy]
Recommender Systems Handbook. Ricci, F., Rokach, L., Shapira, B., and Kantor, P.B. 2011. [RSH]
Data Algorithms: Recipes for Scaling Up with Hadoop and Spark, 1st Edition. Mahmoud Parsian. O'Reilly. 2015. [DA]
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012. [PDA]
Pattern Recognition and Machine Learning. Christopher Bishop. 2006. [PRML]
Advanced Analytics with Spark, Second Edition. O'Reilly. 2017.