Machine Learning for Large-Scale Data Analysis and Decision Making
MATH 60629A

Fall 2024

[Schedule] [Evaluations] [References] [Fall 2019] [Français]

Instructor: Laurent Charlin

Class Schedule: I am teaching the course twice this term, once in French and once in English.

Day/Time Section Room

Tuesday 3:30pm--6:30pm J01 (English) Decelles, Natashquan

Wednesday 8:30am--11:30am J02 (French) Decelles, Natashquan

Office hours: Wednesday 1pm--2pm (Room 4.830)

Description:
In this course, we will study machine learning models, a type of statistical analysis that focuses on prediction, for analyzing very large datasets ("big data").
We will survey different machine learning techniques (supervised, unsupervised) as well as some applications (e.g., recommender systems) and ways to scale-up computations (e.g., distributed frameworks).

**Course delivery:** This course will be given as a flipped classroom. It is an instructional strategy where students learn the material before they come to class. The material will be a mix of readings and video capsules. Class time is reserved for more active activities such as problem solving, demonstrations, and questions-answering. In addition, class time will contain a short summary of the week's material.

Mathematical Note: Mathematical maturity will be assumed.

Programming Note: Python knowledge will be assumed. If you do not know Python I have listed a few ways to learn the basics below. I recommend option 1 (Data Camp) or option 2 below:

DataCamp. Complete Chapters 1, 2, 3 of the Introduction to Python course. To get access to Chapters 2 and 3 use the link I sent you.
HEC CAM offers introductory python courses in September. Register here: CAM registration
Here is the tutorial we used in 2018: Fall 2018 tutorial. While I think the first two options are superior, this will give you an idea of the level I am expecting. particularly recommend this

Further a machine-learning tutorial using python will be provided on week #4.

Weekly Schedule

08/27. Class introduction and math review. [slides]
- Required reading: Prologue to The Master Algorithm
- Suggested reading: Chapter 1 of ESL
- Math review (if needed): have a look at the resources page.
09/03. Machine learning fundamentals
- Required readings: Chapter 5 of Deep Learning (the book). You can skim 5.4 (except 5.4.4) to 5.10.
- Capsules: [slides]
  1. Learning Problem [14:40]
  2. Types of Experiences [13:15]
  3. A first Supervised Model [8:03]
  4. Model Evaluation [15:26]
  5. Regularization [4:09]
  6. Model Validation [3:08]
  7. Bias / Variance tradeoff [11:50]
- In-class material:
  - Summary
  - Exercises (colab), answers (colab)
  - If you do not want to use colab, here are the two files you need to download: 1a) Fundamentals_questions.ipynb AND 2) utilities.py
09/10. Supervised learning algorithms
- References:
  Sections 4.1-4.3, 4.5 of The Elements of Statistical Learning (available online),
  Sections 3.5 and 4.2 of Machine Learning (K. Murphy)
- Capsules: [slides]
  1. Nearest Neighbor [19:05]
  2. Linear Classification [15:26]
  3. Introduction to Probabilistic Models (for Classification) [11:55]
  4. The Naive Bayes Model [24:28]
  5. Naive Bayes Example [9:26]
- In-class material:
  - Summary
  - Exercises (colab) answers (colab)
  - If you do not want to use colab, here are the two files you need to download:1a) Fundamentals_questions.ipynb AND 2) utils.py
09/17. Python for scientific computations and machine learning [Practical Session]
- The tutorial that you will follow is here (on colab), Solutions.
- I encourage you to start the tutorial ahead of time and to finish it during our 180 minutes together.
09/24. Neural networks and deep learning
- Required readings: Sections 6.1--6.3 and 6.5 (stop at 6.5.4) of Deep Learning (the book).
- Other reference: Chapter 11 of the Elements of Statistical Learning (available online).
- Capsules: [slides]
  1. From linear classification to neural networks [19:28]
  2. Training neural networks [20:14]
  3. Learning representations [13:40]
  4. Neural networks hyperparameters [25:20]
  5. Neural networks takeaways [7:00]
- In-class exercises:
  - Summary
  - Exercises (colab), answers (colab)
10/01. No class
10/08. Recurrent Neural networks and Convolutional neural networks
- Required readings: Sections 10, 10.1, 10.2 (skim 10.2.2, skip 10.2.3), and 10.7. Sections 9, 9.1, 9.2, 9.3 (9.11 for fun). Both from Deep Learning (the book).
- Capsules: [slides]
  1. Modelling Sequential Data [8:42]
  2. Practical Overview of RNNs [29:32]
  3. RNNs for language modelling [15:13]
  4. Overview of CNNs [13:30]
  5. Convolutions and Pooling [26:00]
  6. Conclusions and Practical remarks [9:17]
- In-class material:
10/15. Unsupervised learning
- Required reading: Section 14.3 (skip 14.3.5 and 14.3.12) of the Elements of Statistical Learning.
- Capsules: [slides]
  1. Introduction to unsupervised learning [8:17]
  2. K-means clustering [41:58] (there's a natural break at 22:28)
  3. GMMs for clustering [17:52]
  4. Beyond Clustering [14:42]
- In-class material:
  - Summary
  - Exercises Unsupervised (colab)
10/22. Reading week (no class)
10/29. Project team meetings
11/05. Parallel computational paradigms for large-scale data processing
- This will be a 3-hour lecture (i.e. no review classroom). The capsule summaries will be accepted nontheless.
- Capsules: [diapos]
  1. Intro. to Distributed Computing for ML [19:35]
  2. MapReduce [17:41]
  3. Spark [17:37]
- Course slides
- Class summary
11/12 Recommender systems
- Required preparation for the case: Case Presentation and class execution (answer to Question 1 must be submitted by the 11 at the latest)
  - Class slides
11/19 Attention and Transformers
- Will be given in class.
- Complementary reading: An Introduction to Transformers
- Slides
11/26 Modern generative models
- Will be given in class.
- Slides
12/03 Class project presentations

Evaluations

Homework (20%)

Available here.
Due on October 18.
Project (30%)
- Due date: study plan October 25. Final report December 15 (by the end of the day).
- Instructions
Project presentation (10%)
- Instructions
Final Exam (30%)
- Date: December 8, Time: 1:30pm-4:30pm,
- Documentation allowed: cheat sheet (standard size 8.5 x 11, double sided), calculator.
- Material covered: Everything covered in class + required lectures.
- Past exam: Fall 2018, Fall 2020 (Solutions)
Capsule summaries (10%)
- Provide a short summary (10 to 15 lines of text in the form) of 10 capsules throughout the semester.
- The summary of a capsule must be provided before its class (e.g., a summary of capsule on "Learning Problems" must be submitted by 09/03).
- Post your summaries using this form

References

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition Hastie, Trevor, Tibshirani, Robert, Friedman, Jerome, 2009 [ESL]
Deep Learning. Ian Goodfellow, Yoshua Bengio and, Aaron Courville. [DL]
Reinforcement Learning : An Introduction Hardcover. Richard S. Sutton, Andrew G. Barto. A Bradford Book. 2nd edition [RL-Sutton-Barto]
Machine Learning. Kevin Murphy. MIT Press. 2012. [ML-Murphy]
Recommender Systems Handbook, Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. 2011. [RSH]
Data Algorithms : Recipes for Scaling Up with Hadoop and Spark 1st Edition. Mahmoud Parsian. O'Reilly. 2015 [DA]
Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012 [PDA]
Pattern Recognition and Machine Learning. Christopher Bishop. 2006 [PRML]
Advanced Analytics with Spark. O'Reilly. Second Edition. 2017

Machine Learning for Large-Scale Data Analysis and Decision Making MATH 60629A Fall 2024

Weekly Schedule

Evaluations

References

Machine Learning for Large-Scale Data Analysis and Decision Making
MATH 60629A

Fall 2024