Machine Learning for Large-Scale Data Analysis and Decision Making
80-629-17A
Fall 2019
Instructor: Laurent Charlin
Class Schedule: I am teaching the course twice this
term, once in French and once in English.
Office hours: Wednesday 11:30am--12:30pm. CSC 4.817.
Description:
In this course, we will study machine learning models, a type of
statistical analysis that focuses on prediction, for analyzing very
large datasets ("big data").
The plan is to survey different machine learning
techniques (supervised, unsupervised, reinforcement learning) as well
as some applications (e.g., recommender systems). We will also focus
on large-scale machine learning and will discuss distributed
computational frameworks (Hadoop and Spark).
**Mathematical Note:**
Mathematical maturity will be assumed.
**Programming Note:**
Python knowledge will be assumed. If you do not know Python, I have
listed a few ways to learn the basics below. I recommend option 1 (HEC
CAMS):
1. Offline. HEC CAMS offers introductory Python courses in September (both
in English and in French). Register here: CAMS registration
2. Online. DataCamp. Do chapters 1, 2, and 3 (only Chapter 1 is free).
3. Here is the tutorial we used last year: Fall 2018 tutorial. While I think the
first two options are superior, this will give you an idea of
the level I am expecting.
In addition, a machine-learning tutorial using Python will be provided in week #4.
Weekly Schedule
08/28. Class introduction and math review. [slides]
09/04. Machine learning fundamentals [slides]
Required readings: Chapter 5 of
Deep Learning (the book). You can skim 5.5 to 5.10.
09/11. Supervised learning algorithms [slides]
References: Sections 4.1-4.3 and 4.5 of The Elements of
Statistical Learning (available online), and Sections 3.5 and 4.2 of Machine Learning (K.
Murphy).
09/18. Python for scientific computations and machine
learning [**In Lab -- Decelles, Laboratoire Lachute**]
The tutorial that you will follow is here. Note that it is on Colab, a Google environment for
running Python programs (it is a lot like a Jupyter notebook). It is
very useful but requires a Google account (ensure that either
you or your partner has one; you can work in pairs).
Solutions
09/25. Neural networks and deep learning [slides]
10/02. Recurrent neural networks and convolutional neural networks [slides]
Required readings: Sections 10, 10.1, 10.2 (skim 10.2.2,
skip 10.2.3), and 10.7. Sections 9, 9.1, 9.2, 9.3 (9.11 for fun).
Both from Deep Learning (the book).
10/09. Unsupervised learning [slides]
Required reading: Section 14.3 (skip 14.3.5 and
14.3.12) of The Elements of Statistical Learning.
10/16. Reading week (no class)
10/23. Project team meetings
10/30. Parallel computational paradigms for large-scale data processing [slides]
11/06. Recommendation systems [slides]
11/13. Sequential decision making I [slides]
11/20. Sequential decision making II [slides]
11/27. Class project presentations
Evaluations
Homework (20%)
Project (30%)
Project presentation (10%)
Final Exam (30%)
Date: December 5, Time: 9am-12pm, Room: Check on HEC en ligne.
Documentation allowed: cheat sheet (standard size 8.5 x 11, double sided), calculator.
Material covered: Everything covered in class + required
readings. EXCEPT:
Week #3 Supervised learning: slides 38--42 (starting at "Beyond Naive")
Week #11 Recommender Systems: slides 25--53 (starting at "Probabilistic Matrix Factorization")
Week #13 Sequential decision making II: slides 27--30 (starting at "Q-Learning")
Past exam: Fall 2018
Class participation (10%).
4 quizzes (5 minutes each) at the beginning of class,
spread out across the semester. The best 3 will count toward your
final grade.
References
The Elements of Statistical Learning: Data Mining,
Inference, and Prediction. Second Edition.
Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2009.
Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [DL]
Reinforcement Learning: An Introduction. Richard S. Sutton, Andrew G. Barto. A Bradford Book. 2nd edition. [RL-Sutton-Barto]
Machine Learning. Kevin Murphy. MIT Press. 2012. [ML-Murphy]
2016 [ML]
Recommender Systems Handbook, Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. 2011. [RSH]
Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman,
Jeff Ullman. Cambridge University Press. 2014. [MMDS]
Decision Theory. Halsted. 1986. [DT]
Data Algorithms: Recipes for Scaling Up with Hadoop and Spark. 1st
Edition. Mahmoud Parsian. O'Reilly. 2015. [DA]
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Wes McKinney. O'Reilly. 2012. [PDA]
Data Science from Scratch: First Principles with Python. Joel Grus. 2015. [DSS]
Pattern Recognition and Machine Learning. Christopher Bishop. 2006 [PRML]
Advanced Analytics with Spark. O'Reilly. Second Edition. 2017