The tutorials of this course will follow the flipped classroom model. Guides to readings and video content will be posted for each tutorial, and you are expected to study them offline; the online tutorials will mainly consist of Q&A, live demos of coding examples, and classic questions from past years' exams. The rationale behind the flipped classroom methodology is to increase student engagement with content, increase and improve TA contact time with students, and enhance learning (Rotellar et al., 2016).

For the first half of the semester (before the midterm), tutorials will aim to provide a solid foundation for machine learning. Note that calculus and the basics of probability and statistics are not covered here and are assumed as prerequisites: they are required by almost every undergrad program, and it's not feasible to include them in this already very packed schedule. Since a foundation in probability and statistics is so essential for machine learning, here is a list of lectures that cover the fundamentals. We won't cover them in this course, and it might be a bit challenging to go through all of them in a short period of time.

For the second half of the semester, tutorials will try to cover the breadth of machine learning, including current topics and frontiers.

Tutorials will be held on Tuesdays at 10pm and Fridays at 10am EST so that at least one slot works for every time zone. You only need to attend one of the two each week. Tutorials will be held on Bb Collaborate on Quercus.

This page will be updated as the course goes on. Throughout the course, meaningful Q&As will also be reflected here. We appreciate your feedback and gradient to help us continuously improve the course.

**NumPy Review.** This tutorial gives a conceptual and practical introduction to NumPy. The code can be found here.

**Dataset split.** Lecture 3 of Kilian Weinberger's Cornell class CS4780 discusses proper dataset splitting from 2:00 until 24:00.
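A minimal sketch of a shuffled train/validation/test split in NumPy, assuming i.i.d. examples and 60/20/20 fractions. `train_val_test_split` is our own hypothetical helper, not code from the lecture.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle once, then carve out disjoint test, validation and training sets."""
    rng = np.random.default_rng(seed)
    n = len(X)
    idx = rng.permutation(n)                  # random order, so the split is not biased by how data was stored
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.arange(100).reshape(50, 2)             # toy data: row i is [2i, 2i + 1]
y = np.arange(50)
train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 30 10 10
```

Hyperparameters are tuned on the validation set only, and the test set is used once at the end. For temporally ordered data, a random shuffle can leak future information into training; splitting by time is a common alternative.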

**No free lunch.** The same lecture touches on the no free lunch theorem and algorithm choice from 27:10 until 33:30.

**K-Nearest-Neighbors.** The same lecture covers KNN from 36:00 until the end. The lecture notes also have an easier-to-follow convergence proof for 1-NN in the middle of the page, followed by a nice demo of the curse of dimensionality.
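A small sketch in the same spirit (not the lecture notes' own demo): sample points uniformly in the unit hypercube and measure how much farther the farthest point is than the nearest one, as seen from a random query. `distance_contrast` is a hypothetical name.

```python
import numpy as np

def distance_contrast(d, n=1000, seed=0):
    """Hypothetical helper: (max_dist - min_dist) / min_dist from a random query to n uniform points in [0,1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for d in [2, 10, 100, 1000]:
    print(d, round(distance_contrast(d), 3))  # the contrast shrinks as d grows
```

As the dimension grows, all points end up at nearly the same distance from the query, so "nearest" neighbors carry less and less information. This is exactly the regime where KNN struggles.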

**K-Nearest-Neighbors with NumPy.** Prerecorded video going through the implementation of K-Nearest-Neighbors using NumPy. Code.
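For quick reference, here is a fully vectorized KNN classifier sketch. It is not the code from the video; `knn_predict` is a hypothetical name, and it assumes Euclidean distance and non-negative integer labels.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: majority vote among the k nearest training points (Euclidean)."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, shape (n_test, n_train)
    sq_dists = (
        np.sum(X_test**2, axis=1)[:, None]
        - 2 * X_test @ X_train.T
        + np.sum(X_train**2, axis=1)[None, :]
    )
    # Indices of the k smallest distances per row; their order does not matter for voting
    nn_idx = np.argpartition(sq_dists, kth=k - 1, axis=1)[:, :k]
    nn_labels = y_train[nn_idx]                          # shape (n_test, k)
    # Majority vote; np.bincount(...).argmax() breaks ties toward the smaller label
    return np.array([np.bincount(row).argmax() for row in nn_labels])

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.1], [1.0, 0.9]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

Expanding the squared distance avoids materializing an `(n_test, n_train, d)` array, which is the main trick that makes the NumPy version fast.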

A very nice Probability Cheatsheet shared by a student.

This week we'll go through the most essential parts of information theory by watching the classic lectures by David MacKay, and then learn more concepts in information theory, including conditional entropy, cross entropy, and KL divergence.

**Information content and entropy.** Video. Snapshots. We encourage you to watch the whole video. If you don't have enough time, you could consider watching it at 1.5x or 2x speed.

If you have trouble following this lecture, you can also watch their first lecture.

**Further understanding of entropy and source coding theorem.** Video. Snapshots. We encourage you to watch the whole video. If you don't have enough time, you could consider watching it at 1.5x or 2x speed.

**More info theory.** The concept of entropy is at the heart of information theory and also of many machine learning methods, so it is important to have a thorough understanding of it. We will see two more ways of explaining entropy. This tutorial first explains entropy from another angle, and then nicely builds on that to explain the concepts of **cross entropy** and **KL divergence**. This tutorial first explains entropy from yet another angle, and then talks about **information gain**, which, in the specific formulation used in that tutorial, can also be interpreted as mutual information.
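The definitions above fit in a few lines of NumPy. This is a sketch assuming discrete distributions given as probability vectors, with all quantities in bits; the function names are our own.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p_i log2 p_i; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log2 q_i: expected code length for p-distributed data using a code built for q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(q[nz]))   # infinite if q_i = 0 where p_i > 0

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p): the extra bits paid for using the wrong code; >= 0, zero iff p == q."""
    return cross_entropy(p, q) - entropy(p)

def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), from a joint probability table joint[x, y]."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

print(entropy([0.5, 0.5]))                            # 1.0 (a fair coin is 1 bit)
print(entropy([0.25] * 4))                            # 2.0
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))        # positive: q is the wrong model for p
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))   # 1.0: knowing X reveals Y completely
```

Information gain as used for decision tree splits is H(Y) - H(Y | X), which equals I(X; Y), the mutual information computed above.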

**Eigen decomposition.** Video. Transcript (the middle tab). We encourage you to watch the whole video.
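A quick NumPy check of the eigendecomposition of a symmetric matrix, A = Q Λ Qᵀ. The matrix is an arbitrary example; `np.linalg.eigh` is the routine intended for symmetric (Hermitian) input.

```python
import numpy as np

# Symmetric example matrix: by the spectral theorem, real eigenvalues and orthonormal eigenvectors
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)     # eigenvalues in ascending order, eigenvectors as columns of Q
Lambda = np.diag(eigvals)

print(eigvals)                               # eigenvalues 1 and 3 (up to floating-point rounding)
print(np.allclose(Q @ Lambda @ Q.T, A))      # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))       # True: eigenvectors are orthonormal
print(np.allclose(A @ Q, Q * eigvals))       # True: A q_i = lambda_i q_i, column by column
```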

**Positive Definite and Semidefinite Matrices.** Video. Transcript (the middle tab). He briefly discusses convexity and gradient descent here; you can just take it as a prelude, as we'll get into more detail about those topics later. We encourage you to watch the whole video.
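Two practical tests for (semi)definiteness, sketched for symmetric matrices: check the signs of the eigenvalues, or attempt a Cholesky factorization, which succeeds only for positive definite input. The helper names and the tolerance are our own choices.

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Symmetric A is positive definite iff all eigenvalues are > 0 (x^T A x > 0 for all x != 0)."""
    A = np.asarray(A, dtype=float)
    return bool(np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) > tol))

def is_positive_semidefinite(A, tol=1e-10):
    """Symmetric A is positive semidefinite iff all eigenvalues are >= 0 (up to rounding tolerance)."""
    A = np.asarray(A, dtype=float)
    return bool(np.allclose(A, A.T) and np.all(np.linalg.eigvalsh(A) >= -tol))

def cholesky_test(A):
    """Alternative PD test for symmetric A: the factorization A = L L^T fails otherwise."""
    try:
        np.linalg.cholesky(np.asarray(A, dtype=float))
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite([[2, 1], [1, 2]]))       # True: eigenvalues 1 and 3
print(is_positive_semidefinite([[1, 1], [1, 1]]))   # True: eigenvalues 0 and 2
print(is_positive_definite([[1, 1], [1, 1]]))       # False: the zero eigenvalue rules out PD
print(cholesky_test([[1, 2], [2, 1]]))              # False: eigenvalues -1 and 3 (indefinite)
```

The connection to convexity: a twice-differentiable function is convex exactly when its Hessian is positive semidefinite everywhere.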

**SVD.** Video. Transcript (the middle tab). I particularly like how he talks about the geometry of SVD starting from 28:50, even though there's a minor mistake there too. The mistake is in the second step, multiplying by the diagonal matrix of singular values: it should stretch along the standard basis, not along the rotated basis. In other words, the sigmas should be applied along the x-y directions, not the rotated directions. You can refer to a correct picture on Wikipedia **here**.
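A short NumPy check of the three-step picture, consistent with the correction described above: Vᵀ rotates (possibly with a reflection), Σ stretches along the standard x and y axes, and U rotates the resulting axis-aligned ellipse. The matrix `A` is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])                        # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)                       # A = U @ diag(s) @ Vt, s sorted in descending order

theta = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)])  # unit circle, shape (2, 200)

step1 = Vt @ circle          # rotation/reflection: still the unit circle
step2 = np.diag(s) @ step1   # stretch along the STANDARD axes: x by s[0], y by s[1]
step3 = U @ step2            # rotate the axis-aligned ellipse into its final orientation

print(np.allclose(step3, A @ circle))                                # True: same as applying A directly
print(np.allclose(np.linalg.norm(step1, axis=0), 1.0))               # True: V^T preserves lengths
print(np.allclose((step2[0] / s[0])**2 + (step2[1] / s[1])**2, 1.0)) # True: x^2/s0^2 + y^2/s1^2 = 1
```

The last check is the point of the correction: after step 2 the ellipse has its semi-axes on the x and y axes, not on the rotated directions.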

I particularly like how he shows that you can infer the degrees of freedom for rotation in higher-dimensional spaces that cannot be visualized. How cool is that! For example, how many degrees of freedom does a starship have for rotating in a 4-dimensional space? You can find the answer in this video. Another motivation to watch the whole video is that he has a nice joke at the end. :)

Hopefully after these videos you will be familiar enough with matrices for the rest of this course. Here is the great Matrix Cookbook, which has a huge list of mathematical facts about linear algebra; it's a great reference when you are searching for a particular formula or identity. A more compact reference from CSC311 can be found here.

I have put what I think is the minimum amount of calculus you need to know into this list. If you don't already have a background in multivariable calculus, or if you learned it so long ago that you have almost forgotten everything, you are encouraged to go through the videos in that list. This video is particularly useful as it has nice visualizations of contour maps, which are very common in machine learning and in this course. If you still have more time, here is the full unit on derivatives of multivariable functions on Khan Academy. You could consider watching all the videos at 1.5x speed.

Here is a list of short videos by Andrew Ng on gradient descent.

**Gradient Descent intro, for logistic regression.** Video.

**Gradient Descent on multiple samples.** Video.

**Vectorization.** Video.

**Gradient Descent for NN.** Video.

**Backprop intuition.** Video.

**Forward and backward propagation.** Video.

**Gradient checking.** Video.

**Mini batch gradient descent.** Video.

**Understanding mini batch gradient descent.** Video.
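The ideas in these videos (vectorized logistic regression, gradient checking, mini-batch gradient descent) combine into one short sketch. This is not Andrew Ng's code; the function names, learning rate, and batch size are our own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, b, X, y):
    """Mean binary cross-entropy of logistic regression and its gradient, vectorized over samples."""
    z = X @ w + b
    loss = np.mean(np.logaddexp(0.0, z) - y * z)   # numerically stable form of -[y log p + (1-y) log(1-p)]
    err = sigmoid(z) - y                           # dLoss/dz for each sample
    return loss, X.T @ err / len(y), np.mean(err)

def gradient_check(w, b, X, y, h=1e-6):
    """Relative error between the analytic gradient and centered finite differences (should be tiny)."""
    _, grad_w, _ = loss_and_grad(w, b, X, y)
    numeric = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        numeric[i] = (loss_and_grad(w + e, b, X, y)[0] - loss_and_grad(w - e, b, X, y)[0]) / (2 * h)
    return np.linalg.norm(numeric - grad_w) / (np.linalg.norm(numeric) + np.linalg.norm(grad_w))

def minibatch_gd(X, y, lr=0.5, batch_size=32, epochs=100, seed=0):
    """Plain mini-batch gradient descent; batch_size=len(y) gives batch GD, batch_size=1 gives SGD."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(y))               # reshuffle every epoch
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            _, grad_w, grad_b = loss_and_grad(w, b, X[batch], y[batch])
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)       # synthetic, linearly separable labels
print(gradient_check(rng.normal(size=2), 0.1, X, y))   # relative error, e.g. well below 1e-6
w, b = minibatch_gd(X, y)
print(np.mean((sigmoid(X @ w + b) > 0.5) == y))      # training accuracy
```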

**[Optional] More on optimization.** The no free lunch theorem states that any two optimization algorithms are equivalent when their performance is averaged across all possible problems. There are a lot of optimizers out there, and each could be the best in different situations. Here we introduce some of the most commonly used ones for your reference. Before that, you should have an intuitive understanding of exponentially weighted averages, as they are used a lot in these methods: videos 1, 2, 3. After that, watch the videos on gradient descent with momentum, RMSProp, and Adam. In this Adagrad video, there are also some nice visualizations of the behaviors of different optimizers on different loss landscapes. You can find the complete set of such visualizations here.
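The update rules from these videos, written side by side as a sketch. We assume the (1 - β) weighting convention for momentum used in Andrew Ng's videos and a hypothetical elongated quadratic f(w) = 0.5 (w₀² + 25 w₁²) as the test problem; `ewa` and `optimize` are our own names.

```python
import numpy as np

def ewa(xs, beta=0.9, bias_correction=True):
    """Exponentially weighted average v_t = beta * v_{t-1} + (1 - beta) * x_t with v_0 = 0."""
    v, out = 0.0, []
    for t, x in enumerate(xs, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta**t) if bias_correction else v)   # correction removes the bias toward 0
    return np.array(out)

def optimize(grad_fn, w0, method="adam", lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)   # EWA of gradients (momentum / first moment)
    s = np.zeros_like(w)   # EWA of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        if method == "sgd":
            w -= lr * g
        elif method == "momentum":
            v = beta1 * v + (1 - beta1) * g
            w -= lr * v
        elif method == "rmsprop":
            s = beta2 * s + (1 - beta2) * g**2
            w -= lr * g / (np.sqrt(s) + eps)                 # per-coordinate step size
        elif method == "adam":
            v = beta1 * v + (1 - beta1) * g
            s = beta2 * s + (1 - beta2) * g**2
            v_hat, s_hat = v / (1 - beta1**t), s / (1 - beta2**t)   # bias correction
            w -= lr * v_hat / (np.sqrt(s_hat) + eps)
        else:
            raise ValueError(f"unknown method: {method}")
    return w

# Gradient of the hypothetical test function f(w) = 0.5 * (w0^2 + 25 * w1^2), minimum at (0, 0)
grad = lambda w: np.array([1.0, 25.0]) * w
for m in ["sgd", "momentum", "rmsprop", "adam"]:
    print(m, optimize(grad, [1.0, 1.0], method=m))
```

Note how RMSProp and Adam divide by the running root-mean-square gradient, so the steep w₁ direction and the flat w₀ direction get comparable step sizes; plain SGD has to pick one learning rate for both.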

We'll go through this simple implementation of gradient descent during the tutorial: Code.

**Decision tree.** Please watch this lecture by Nando de Freitas to briefly review decision trees. Slides. If you don't have enough time, you could consider watching it at 1.25x or 1.5x speed.
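A sketch of how a tree chooses a split on one numeric feature, using information gain as the criterion (one common choice; Gini impurity is another). `best_split` is a hypothetical helper; a full tree applies it recursively over all features.

```python
import numpy as np

def entropy_of_labels(y):
    """Entropy (bits) of the empirical label distribution."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y):
    """Hypothetical helper: (threshold, information gain) of the best split x <= t on one feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parent = entropy_of_labels(y)
    best_t, best_gain = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                                   # no threshold fits between equal values
        t = (x[i] + x[i - 1]) / 2                      # midpoint between consecutive sorted values
        left, right = y[:i], y[i:]
        children = (len(left) * entropy_of_labels(left) + len(right) * entropy_of_labels(right)) / len(y)
        gain = parent - children                       # H(Y) - H(Y | split)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

print(best_split(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]), np.array([0, 0, 0, 1, 1, 1])))  # threshold 3.5, gain 1.0
```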

**Random Forests.** Please watch this lecture by Nando de Freitas. Slides. If you don't have enough time, you could consider watching it at 1.25x or 1.5x speed.

**Maximum Likelihood.** We will go through some derivations of maximum likelihood estimators during the tutorials.
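As an illustration of the kind of derivation meant here (not necessarily one done in the tutorial), the maximum likelihood estimators for i.i.d. samples $x_1, \dots, x_N \sim \mathcal{N}(\mu, \sigma^2)$ follow from setting the derivatives of the log-likelihood to zero:

```latex
\begin{aligned}
\ell(\mu, \sigma^2) &= \sum_{n=1}^{N} \log \mathcal{N}(x_n \mid \mu, \sigma^2)
  = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2 \\
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{n=1}^{N}(x_n - \mu) = 0
  \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n \\
\frac{\partial \ell}{\partial \sigma^2} &= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=1}^{N}(x_n - \mu)^2 = 0
  \quad\Longrightarrow\quad \hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu})^2
\end{aligned}
```

Note that the maximum likelihood variance divides by $N$ and is therefore biased; the unbiased sample variance divides by $N - 1$.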

**More PCA (Optional).** Here is a lecture that goes through the PCA derivation in great detail, if you are interested.

[Part1 slides][Part1 slides with solution] [Part2 slides] [Part2 slides with solution (Corrected)]
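For reference, a compact PCA sketch via the SVD of the centered data matrix. It relies on the standard fact that the right singular vectors of the centered data are the eigenvectors of the sample covariance; `pca` is a hypothetical helper name.

```python
import numpy as np

def pca(X, k):
    """Hypothetical helper: project data onto its top-k principal directions, computed by SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean                                    # PCA is defined on centered data
    # Rows of Vt are eigenvectors of the sample covariance Xc^T Xc / (n - 1), sorted by variance
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                              # shape (k, d), orthonormal rows
    explained_variance = s[:k] ** 2 / (len(X) - 1)   # corresponding covariance eigenvalues
    Z = Xc @ components.T                            # low-dimensional codes, shape (n, k)
    return Z, components, explained_variance, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])   # synthetic data with one near-flat direction
Z, C, ev, mu = pca(X, 2)
X_hat = Z @ C + mu                                  # reconstruction from 2 components
print(ev, np.mean((X - X_hat) ** 2))
```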

**PyTorch.** This tutorial gives an overview of PyTorch and the basics of how to train a neural net. **Code.**

**Transfer Learning with PyTorch (optional).** The last part of the same video briefly talks about transfer learning; it's optional.

**Autograd (optional).** This tutorial explains how autograd works in PyTorch, so that you can get a general idea of how gradients are handled by PyTorch.
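To make the idea concrete, here is a toy scalar reverse-mode autodiff in pure Python. It is not how PyTorch is implemented; it only shows the core mechanism: record the computation graph in the forward pass, then apply the chain rule backward in topological order. The `backward()` and `.grad` names are chosen to echo PyTorch, and the `Value` class itself is hypothetical.

```python
import math

class Value:
    """Toy scalar that records how it was computed so gradients can flow backward."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topological order guarantees a node's grad is complete before it is propagated further
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0                                    # d(self)/d(self)
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad           # chain rule, summed over all paths

x, y = Value(2.0), Value(-3.0)
z = x * y + x            # z = xy + x, so dz/dx = y + 1 and dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)   # -4.0 -2.0 2.0
```

Because `x` is used twice, its gradient is accumulated from both paths, which mirrors why PyTorch accumulates into `.grad` rather than overwriting it.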

This week we will cover two major types of deep generative models, VAEs and GANs, and watch some lecture excerpts from Pieter Abbeel et al.'s latest deep unsupervised learning course.

**VAEs.** Please watch this lecture from here to 1:46:00, where the foundational ideas of VAEs are covered. VAE might seem like a simple model, but there are a lot of conceptually elegant ideas behind it. Notice how the likelihood ratio gradient connects to the REINFORCE algorithm covered in this week's lecture, and especially why it has huge variance. Everything after 1:46:00 (Variations and related ideas) is optional. Slides. Code used in the lecture.
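A small experiment on why the likelihood ratio (REINFORCE) gradient has huge variance compared with the reparameterization (pathwise) gradient used in VAEs. We use a toy objective f(x) = x² with x ~ N(μ, 1), whose true gradient d/dμ E[f(x)] is 2μ. The second part checks the closed-form KL term, assuming the usual diagonal-Gaussian encoder and standard normal prior. All names here are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.5, 100_000
f = lambda x: x ** 2                      # toy objective; true d/dmu E[f(x)] = 2 * mu

eps = rng.standard_normal(n)
x = mu + eps                              # reparameterization: x = mu + sigma * eps with sigma = 1

score_grads = f(x) * (x - mu)             # likelihood ratio: f(x) * d/dmu log N(x | mu, 1)
reparam_grads = 2 * x                     # pathwise: d/dmu f(mu + eps) = f'(x)

print("true     :", 2 * mu)
print("score fn : mean %.3f, variance %.2f" % (score_grads.mean(), score_grads.var()))
print("reparam  : mean %.3f, variance %.2f" % (reparam_grads.mean(), reparam_grads.var()))

def kl_diag_gaussian_to_std_normal(m, log_var):
    """KL( N(m, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * np.sum(m ** 2 + np.exp(log_var) - 1.0 - log_var)

# Monte Carlo sanity check in 1-D: KL = E_q[log q(z) - log p(z)]
m, log_v = np.array([0.5]), np.array([np.log(0.25)])
z = m + np.exp(0.5 * log_v) * rng.standard_normal((n, 1))
log_q = -0.5 * (np.log(2 * np.pi) + log_v + (z - m) ** 2 / np.exp(log_v))
log_p = -0.5 * (np.log(2 * np.pi) + z ** 2)
print(kl_diag_gaussian_to_std_normal(m, log_v), np.mean(log_q - log_p))
```

Both estimators are unbiased, but for this toy problem the score-function estimator's variance is roughly ten times larger (about 51.6 vs. 4 in theory). That gap is the intuition behind preferring the reparameterization trick whenever the sampling step is differentiable.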

**GANs.** Please watch this lecture from here to 0:57:00, where the foundational ideas of GANs are covered. Everything after it (More GANs) is optional. However, if you ever use a GAN in your project, you are encouraged to learn about the Gradient Penalty (GP) covered around 1:55:00. It is helpful not only for WGAN but also for almost any GAN setting (as shown in papers 1 (image), 2 (waveform), and 3 (text)), and it's very simple to implement, so consider adding GP when you use a GAN for your project. Slides. Code used in the lecture.

**ML Project Workflow.** By popular demand, we will talk about a typical workflow for an ML project. This is usually never taught; one is just expected to know how to work on ML projects, and as a result there are many types of workflows out there. The one introduced in this tutorial is Sheldon's typical workflow. It was based on and inspired by many (past) collaborators and mentors; special thanks to for.ai and UTMIST. The codebase. GitKraken. Note that if you are a student, you have free access to GitHub Pro, which gives you full access to GitKraken Pro. VS Code. You are not required to follow this workflow for your final project.

**Self-Supervised Learning.**

**Final project.** Slides

Website template. Any feedback is much appreciated; please reach out to Sheldon (email: huang at cs dot toronto dot edu).