| Chapter | Notes | Contents | Links and Other Readings | 
| 1. | Introduction to Machine Learning | Overview of Machine Learning topics | Machine Learning (Wikipedia); Loss Functions (Wikipedia); Linear Algebra Review (by Z. Kolter) |
| 2. | Linear Regression | 1D regression, multidimensional regression, least-squares, pseudo-inverse | Linear Regression (Wikipedia); Linear Algebra Review (by Z. Kolter); Common matrix identities (S. Roweis) |
| 3. | Nonlinear Regression | Basis function regression, Radial Basis Functions, Neural networks, K-nearest neighbours | RBFs (Wikipedia); ANNs (Wikipedia); KNN (Wikipedia) |
| 4. | Quadratics (background) | Matrix-vector quadratic forms, gradients, optimization | |
| 5. | Basic Probability and Statistics (background) | Probability, conditioning, marginalization, density, mathematical expectation | Cox axioms (Wikipedia); Binomial distribution (Wikipedia); Multinomial distribution (Wikipedia) |
| 6. | Probability Density Functions (background) | PDFs, Mean and covariance, Uniform distribution, (multi-dim.) Gaussian distribution | PDFs (Wikipedia); Probability Review (by S. Teong) |
| 7. | Estimation | Bayes' rule, Maximum likelihood (ML), Maximum a Posteriori (MAP), Bayes' estimates | Probabilistic LS (by A. Ng) |
| 8. | Information Theory | Entropy, Mutual Information, KL Divergence, Cross-Entropy | Information Theory (Wikipedia); Entropy (Wikipedia) |
| 9. | Classification Methods | k-NN classifiers, Decision trees, Class conditional models, Naïve Bayes, Logistic regression | Decision Trees (Wikipedia); Logistic Regression (Wikipedia); Naïve Bayes (Wikipedia) |
| 10. | Gradient Descent (background) | Gradient Descent, Line Search | Gradient descent (Wikipedia); Line Search (Wikipedia); Optimization (Wikipedia); GD with Momentum |
| 11. | Cross Validation | N-Fold Cross Validation, LOOCV | Cross-validation (Wikipedia) |
 | 12. | Bayesian Methods | Bayesian Regression, Model Averaging, Model Selection | Bayesian model selection demos (Tom Minka) | 
| 13. | Monte Carlo Methods | Sampling Gaussians and Categorical Distributions, Importance Sampling, MCMC | MCMC (Wikipedia); MCMC applet |
| 14. | Principal Component Analysis | Dimensionality Reduction, PCA, Probabilistic PCA | PCA (Wikipedia); PCA Tutorial (by L. Smith) |
 | 15. | Lagrange Multipliers (background) | Equality constraints, Bounds constraints | Lagrange Multipliers (Wikipedia) | 
| 16. | Clustering | K-means, Gaussian Mixture Models, Expectation-Maximization Algorithm | K-means (Wikipedia); K-means++ (Wikipedia); Slides on Mixture Models and EM |
| 17. | Neural Networks I (optional) | Multi-Layer Perceptron, Activation Functions, Back-Propagation | MLPs (by R. Grosse); Back-Prop (by R. Grosse) |