Machine Learning for Large-Scale Data Analysis and Decision Making
MATH 60629A
Fall 2022
Instructor: Laurent Charlin
Class Schedule: I am teaching the course twice this term, once in French and once in English.
Office hours: Wednesday 1pm-2pm (Room 4.817)
Description:
In this course, we will study machine learning models, a type of
statistical analysis that focuses on prediction, for analyzing very
large datasets ("big data").
We will survey different machine learning
techniques (supervised, unsupervised, reinforcement learning) as well
as some applications (e.g., recommender systems) and ways to
scale up computations (e.g., distributed frameworks).
Course delivery:
This course will be given as a flipped classroom, an instructional strategy in which students learn the material before they come to class. The material will be a mix of readings and video capsules. Class time is reserved for active work such as problem solving, demonstrations, and question answering. In addition, each class will include a short summary of the week's material.
Mathematical Note:
Mathematical maturity will be assumed.
Programming Note:
Python knowledge will be assumed. If you do not know Python, I have
listed a few ways to learn the basics below. I recommend option 1
(DataCamp) or option 2:
 1. DataCamp. Complete Chapters 1, 2, and 3 of the Introduction to Python course. To get access to Chapters 2 and 3, use the link I sent you.
 2. HEC CAM offers introductory Python courses in September. Register here: CAM registration
 3. Here is the tutorial we used in 2018: Fall 2018 tutorial. While I think the first two options are superior, this will give you an idea of the level I am expecting.
In addition, a machine-learning tutorial using Python will be provided in week 4.
Weekly Schedule
 08/29. Class introduction and math review. [slides]
 09/12. Machine learning fundamentals
 09/19. Supervised learning algorithms
 09/26. Python for scientific computations and machine
learning [Practical Session]
 The tutorial that you will follow is here (on Colab). Solutions.
 I encourage you to start the tutorial ahead of time and to
finish it during our 180 minutes together.
 10/05. Neural networks and deep learning
 10/11. Recurrent neural networks and convolutional neural networks
 10/17. Unsupervised learning
 10/24. Reading week (no class)
 10/31. Project team meetings
 11/07. Parallel computational paradigms for large-scale data processing
 11/14. Recommender systems
 11/21. Sequential decision making I
 11/28. Sequential decision making II
 12/05. Class project presentations
 This class will be held in Room Manuvie (1st floor, blue section).
Evaluations
 Homework (20%)
 Project (30%)
 Due dates: study plan, October 28; final report, December 15 (by the end of the day).
 Instructions
 Project presentation (10%)
 Final Exam (30%)
 Date: December 12. Time: 9:00am-12:00pm.
 Documentation allowed: cheat sheet (standard size, 8.5 x 11 in., double-sided), calculator.
 Material covered: Everything covered in class + required lectures.
 Past exam: Fall 2018, Fall 2020 (Solutions)
 Capsule summaries (10%)
 Provide a short summary (10 to 15 lines of text, entered in the form) for each of 10 capsules over the semester.
 Post your summaries using this form.
 Deadline: December 17
References
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Second edition. 2009. [ESL]
 Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
 Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. A Bradford Book. Second edition. [RLSuttonBarto]
 Machine Learning. Kevin Murphy. MIT Press. 2012. [MLMurphy]
 Recommender Systems Handbook. F. Ricci, L. Rokach, B. Shapira, P. B. Kantor. 2011. [RSH]
 Data Algorithms: Recipes for Scaling Up with Hadoop and Spark. Mahmoud Parsian. O'Reilly. First edition. 2015. [DA]
 Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012. [PDA]
 Pattern Recognition and Machine Learning. Christopher Bishop. 2006. [PRML]
 Advanced Analytics with Spark. O'Reilly. Second edition. 2017.
