Readings / Notes: CSCC11 Introduction to Machine Learning and Data Mining

The links to chapters of the notes that have not yet been covered in lecture may be broken.

Front matter: Title page, table of contents, notation

Chapter Notes Contents Links and Other Readings
1. Introduction to Machine Learning Overview of Machine Learning topics Machine Learning (Wikipedia)
Linear Algebra Review (by Z. Kolter)
2. Linear Regression 1D regression,
multidimensional regression,
least-squares, pseudo-inverse
Linear Regression (Wikipedia)
3. Nonlinear Regression Basis function regression, Radial Basis Functions, Neural networks, K-nearest neighbours RBFs (Wikipedia)
ANNs (Wikipedia)
KNN (Wikipedia)
4. Quadratics (background) Matrix-vector quadratic forms, gradients, optimization
Linear Algebra Review (by Z. Kolter)
5. Basic Probability and Statistics (background) Probability, conditioning, marginalization, density, mathematical expectation Cox axioms (Wikipedia)
Binomial distribution (Wikipedia)
Multinomial distribution (Wikipedia)
6. Probability Density Functions (background) PDFs, Mean and covariance, Uniform distribution, (multi-dim.) Gaussian distribution PDFs (Wikipedia)
Probability Review (by S. Teong)
7. Estimation Bayes' rule, Maximum likelihood, Maximum a Posteriori Probabilistic LS (by A. Ng)
8. Introduction to Classification Class conditional models, Logistic regression, Neural Network Classifiers, Naïve Bayes Logistic Regression (Wikipedia)
Naïve Bayes (Wikipedia)
9. Gradient Descent (background) Gradient Descent, Line Search Gradient descent (Wikipedia)
Line Search (Wikipedia)
Optimization (Wikipedia)
10. Cross Validation Hold-out Validation, N-Fold Cross Validation Cross-validation (Wikipedia)
11. Bayesian Methods Bayesian Regression, Model Averaging, Model Selection Bayesian model selection demos (Tom Minka)
12. Monte Carlo Methods (optional) Sampling Gaussians, Importance Sampling, MCMC, Metropolis-Hastings MCMC (Wikipedia)
MCMC applet
13. Principal Component Analysis Dimensionality Reduction, PCA, Probabilistic PCA (optional), Whitening (optional) PCA (Wikipedia)
Introductory PCA Tutorial (by L. Smith)
14. Lagrange Multipliers (background) Equality constraints, Bounds constraints Lagrange Multipliers (Wikipedia)
15. Clustering K-means, Mixtures of Gaussians, Expectation-Maximization Algorithm K-means (Wikipedia)
Slides on Mixture Models and EM
Notes on BIC
16. Hidden Markov Models (optional) Markov chains, Viterbi, Forward-Backward, Baum-Welch (EM) HMMs (Wikipedia)
17. Support Vector Machines Maximum margin, Loss functions, Kernels SVMs (Wikipedia)
18. AdaBoost Boosting, Ensemble Methods AdaBoost (Wikipedia)