Machine Learning for Large-Scale Data Analysis and Decision Making
80-629-17A

Fall 2019


Instructor: Laurent Charlin

Class Schedule: I am teaching the course twice this term, once in French and once in English.

Day/Time                     Section         Room
Wednesday 8:30am--11:30am    J01 (English)   CSC Raymond Chabot Grant Thornton
Thursday 8:30am--11:30am     J01 (French)    Quebecor

Office hours: Wednesday 11:30am--12:30pm. CSC 4.817.

Description:
In this course, we will study machine learning models, a type of statistical analysis that focuses on prediction, for analyzing very large datasets ("big data").
The plan is to survey different machine learning techniques (supervised, unsupervised, reinforcement learning) as well as some applications (e.g., recommender systems). We will also focus on large-scale machine learning and will discuss distributed computational frameworks (Hadoop and Spark).

**Mathematical Note:** Mathematical maturity will be assumed.

**Programming Note:** Python knowledge will be assumed. If you do not know Python, I have listed a few ways to learn the basics below. I recommend option 1 (HEC CAMS):

  1. Offline. HEC CAMS offers introductory Python courses in September (both in English and in French). Register here: CAMS registration
  2. Online. DataCamp. Do chapters 1, 2, 3 (only Chapter 1 is free).
  3. Here is the tutorial we used last year: Fall 2018 tutorial. While I think the first two options are superior, this will give you an idea of the level I am expecting.

In addition, a machine learning tutorial using Python will be given in week #4.


Weekly Schedule

  1. 08/28. Class introduction and math review. [slides]
  2. 09/04. Machine learning fundamentals [slides]
    • Required readings: Chapter 5 of Deep Learning (the book). You can skim 5.5 to 5.10.
  3. 09/11. Supervised learning algorithms [slides]
    • References:
      Sections 4.1-4.3, 4.5 of The Elements of Statistical Learning (available online),
      Sections 3.5 and 4.2 of Machine Learning (K. Murphy)
  4. 09/18. Python for scientific computations and machine learning [**In Lab -- Decelles, Laboratoire Lachute**]
    • The tutorial that you will follow is here. Note that it is on Colab, a Google environment for running Python programs (it is a lot like a Jupyter notebook). It is very useful but requires a Google account. You can work in pairs, so make sure that either you or your partner has one. [Solutions]
  5. 09/25. Neural networks and deep learning [slides]
  6. 10/02. Recurrent neural networks and convolutional neural networks [slides]
    • Required readings: Sections 10, 10.1, 10.2 (skim 10.2.2, skip 10.2.3), and 10.7. Sections 9, 9.1, 9.2, 9.3 (9.11 for fun). Both from Deep Learning (the book).
  7. 10/09. Unsupervised learning [slides]
    • Required reading: Section 14.3 (skip 14.3.5 and 14.3.12) of The Elements of Statistical Learning.
  8. 10/16. Reading week (no class)
  9. 10/23. Project team meetings
  10. 10/30. Parallel computational paradigms for large-scale data processing [slides]
  11. 11/06. Recommendation systems [slides]
  12. 11/13. Sequential decision making I [slides]
  13. 11/20. Sequential decision making II [slides]
  14. 11/27. Class project presentations


Evaluations

  1. Homework (20%)
  2. Project (30%)
  3. Project presentation (10%)
  4. Final Exam (30%)
    • Date: December 5, Time: 9am-12pm, Room: Check on HEC en ligne.
    • Documentation allowed: cheat sheet (standard size 8.5 x 11, double sided), calculator.
    • Material covered: Everything covered in class plus the required readings, except:
      • Week #3 Supervised learning: slides 38--42 (starting at "Beyond Naive")
      • Week #11 Recommender Systems: slides 25--53 (starting at "Probabilistic Matrix Factorization")
      • Week #13 Sequential decision making II: slides 27--30 (starting at "Q-Learning")
    • Past exam: Fall 2018
  5. Class participation (10%).
    • Four quizzes (5 minutes each), given at the beginning of class and spread out across the semester. The best 3 will count toward your final grade.
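
As an illustration of how the weights above combine, here is a minimal sketch in Python. It is not the official grading procedure. It assumes every component is scored out of 100 and that the participation mark is the plain average of the best 3 quizzes; the function names and the example scores are hypothetical.

```python
# Weights copied from the Evaluations list above (they sum to 100%).
WEIGHTS = {
    "homework": 0.20,
    "project": 0.30,
    "presentation": 0.10,
    "final_exam": 0.30,
    "participation": 0.10,
}


def participation_score(quiz_scores, keep=3):
    """Average of the best `keep` quiz scores.

    Assumption: each quiz is out of 100 and the kept quizzes
    are weighted equally.
    """
    best = sorted(quiz_scores, reverse=True)[:keep]
    return sum(best) / len(best)


def final_grade(homework, project, presentation, final_exam, quiz_scores):
    """Weighted sum of all components, on a 0-100 scale."""
    scores = {
        "homework": homework,
        "project": project,
        "presentation": presentation,
        "final_exam": final_exam,
        "participation": participation_score(quiz_scores),
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical student: the lowest quiz (50) is dropped.
    print(round(final_grade(80, 90, 85, 70, [100, 50, 80, 90]), 1))
```

In this hypothetical example the quiz scored 50 is dropped, so the participation component is the average of 100, 90, and 80.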


References

  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2009.
  • Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
  • Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. A Bradford Book. 2nd edition. [RL-Sutton-Barto]
  • Machine Learning. Kevin Murphy. MIT Press. 2012. [ML-Murphy]
  • Recommender Systems Handbook, Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. 2011. [RSH]
  • Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. 2014. [MMDS]
  • Decision Theory. Halsted. 1986. [DT]
  • Data Algorithms: Recipes for Scaling Up with Hadoop and Spark. 1st edition. Mahmoud Parsian. O'Reilly. 2015. [DA]
  • Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012. [PDA]
  • Data Science from Scratch: First Principles with Python. Joel Grus. 2015. [DSS]
  • Pattern Recognition and Machine Learning. Christopher Bishop. 2006 [PRML]
  • Advanced Analytics with Spark. O'Reilly. Second Edition. 2017