Machine Learning for Large-Scale
Data Analysis and
Decision Making
80-629-17A
This is the official website of the course, and I will keep it up to date. In case of disagreement with Zone
Cours, this website will prevail.
[Schedule]
[Evaluations]
[References]
Instructor: Laurent Charlin
Class Schedule: Wednesday 8:30am-11:30am. CSC Saine Marketing.
Office hours: Wednesday 11:30am-12:30pm. CSC 4.817.
Description:
In this course, we will study machine learning models, a type of
statistical analysis that focuses on prediction, for analyzing very
large datasets ("big data"). In addition to standard models, we will
study models for analyzing user behaviour and for decision making.
Massive datasets are now common and require scalable analysis tools.
Machine learning provides such tools and is widely used for modelling
problems across many fields including artificial intelligence,
bioinformatics, finance, marketing, education, transportation, and
health.
**Note:**
Mathematical maturity will be assumed. Programming will also be
required, but Python tutorial(s) will be provided in the first few
weeks of the class. The plan is to survey different machine learning
techniques (supervised, unsupervised, reinforcement learning) as well
as some applications (e.g., recommender systems). We will also focus
on large-scale machine learning and will discuss distributed
computational frameworks (Hadoop and Spark).
Weekly Schedule
 08/30 Class introduction and math review. [slides]
 09/06 Programming with Python I [**In Lab - Decelles, Laboratoire LACED**]
 09/13 Machine learning fundamentals. [slides] [**C-Ste-Cath, Quebecor**]
 Required reading: Chapter 5 of Deep Learning (the book).
 09/20 Python for scientific computations and machine learning [**In Lab - Decelles, Laboratoire LACED**]
 09/27 Supervised learning algorithms [slides]
 References: Sections 4.1-4.3 and 4.5 of The Elements of Statistical Learning (available online); Sections 3.5 and 4.2 of Machine Learning (K. Murphy).
 10/04 Neural networks and deep learning [slides]
 10/11 Unsupervised learning [slides]
 Required reading: Section 14.3 (skip 14.3.5 and 14.3.12) of The Elements of Statistical Learning (available online).
 10/25 Project team meetings
 11/01 Parallel computational paradigms for large-scale data processing [slides]
 11/08 Recommendation systems I [slides]
 11/15 Sequential decision making I [slides]
 11/22 Sequential decision making II [slides]
 11/29 Class project presentations [**Groupe Cholette (CSC - Yellow Section)**]
Evaluations
 Homework (20%).
 Project (30%).
 Due dates: study plan 23/10 (October 23), final report 19/12 (December 19)
 Instructions
 Project presentation (10%).
 Final Exam (30%).
 08/12 (December 8) 9am-12pm, room: CSC Deloitte.
 Documentation allowed: cheat sheet (standard size 8.5 x 11, double-sided), calculator.
 Material: all slides and required readings
 Except: 1) "Unsupervised learning" after slide 14 (probabilistic clustering) and 2) "Sequential decision making II" after slide 25 (Q-learning).
 Class participation (10%).
References
 The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2009.
 Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
 Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. A Bradford Book. 2nd edition. [RLSuttonBarto]
 Machine Learning. Kevin Murphy. MIT Press. 2012. [MLMurphy]
2016 [ML]
 Recommender Systems Handbook, Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. 2011. [RSH]
 Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman,
Jeff Ullman. Cambridge University Press. 2014. [MMDS]
 Decision Theory. Halsted. 1986. [DT]
 Data Algorithms: Recipes for Scaling Up with Hadoop and Spark. 1st Edition. Mahmoud Parsian. O'Reilly. 2015. [DA]
 Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012. [PDA]
 Data Science from Scratch: First Principles with Python. Joel Grus. 2015. [DSS]
 Pattern Recognition and Machine Learning. Christopher Bishop. 2006 [PRML]
 Advanced Analytics with Spark. O'Reilly. Second Edition. 2017
