[Images synthesized from a GAN]
In the last few years, new inference methods have allowed big advances in probabilistic generative models. These models let us generate novel images and text, find meaningful latent representations of data, take advantage of large unlabeled datasets, and even do analogical reasoning automatically. This course will tour recent innovations in inference methods such as recognition networks, black-box stochastic variational inference, and adversarial autoencoders. It will also cover recent advances in generative model design, such as deconvolutional image models, thought vectors, and recurrent variational autoencoders. The class will have a major project component.
This course is designed to bring students to the current frontier of knowledge on these methods, so that ideally, their course projects can make a novel contribution. A previous background in machine learning such as CSC411 or ECE521 is strongly recommended. Linear algebra, basic multivariate calculus, basics of working with probability, and programming skills are required.
Generative modeling loosely refers to building a model of data, for instance p(image), that we can sample from. This is in contrast to discriminative modeling, such as regression or classification, which tries to estimate conditional distributions such as p(class | image).
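As a minimal sketch of this distinction, the toy example below fits a generative model of a single scalar feature and then recovers the discriminative quantity p(class | x) from it with Bayes' rule. The data, the class names, and the choice of one Gaussian per class are all assumptions made for illustration, not part of the course material.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and standard deviation of 1-D data."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, math.sqrt(var)

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Toy labelled data (made up for illustration): one scalar feature per example.
rng = random.Random(0)
data = {
    "cat": [rng.gauss(-2.0, 1.0) for _ in range(500)],
    "dog": [rng.gauss(3.0, 1.0) for _ in range(500)],
}

# Generative model: p(class) and p(x | class), which together define p(x, class).
total = sum(len(xs) for xs in data.values())
prior = {c: len(xs) / total for c, xs in data.items()}
params = {c: fit_gaussian(xs) for c, xs in data.items()}

def sample(rng):
    """Draw a new (class, x) pair from the fitted joint distribution."""
    c = rng.choices(list(prior), weights=list(prior.values()))[0]
    mean, std = params[c]
    return c, rng.gauss(mean, std)

def p_x(x):
    """Marginal density p(x), obtained by summing over classes."""
    return sum(prior[c] * gaussian_pdf(x, *params[c]) for c in prior)

def p_class_given_x(x):
    """The discriminative quantity p(class | x), recovered via Bayes' rule."""
    return {c: prior[c] * gaussian_pdf(x, *params[c]) / p_x(x) for c in prior}
```

A purely discriminative model would estimate only `p_class_given_x`; the generative model additionally supports `sample` and `p_x`.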
Even when we're only interested in making predictions, there are practical reasons to build generative models:
We already know how to specify some expressive and flexible generative models, including entire languages of models that can express arbitrarily complicated structure. However, until recently such models were hard to apply to real datasets, because inference methods (such as Markov chain Monte Carlo methods) were not usually fast or scalable enough to run on large models or even medium-sized datasets.
The past few years have seen major progress in methods to train and do inference in generative models, loosely following four strands:
The common thread among these approaches that lets them scale to high-dimensional models is that their loss functions are end-to-end differentiable. This is in contrast to previous inference strategies such as MCMC or early variational inference strategies, which required alternating inference and optimization steps and didn't allow gradient-based tuning of the inference procedure.
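One widely used way to make an expectation over samples differentiable is the reparameterization trick. The paragraph above does not name a specific technique, so treat the following as an illustrative sketch: the objective `f` is hypothetical, and its derivative is written by hand where an automatic differentiation library would normally supply it.

```python
import random

def f(z):
    # Hypothetical objective whose expectation under q we want to differentiate.
    return z ** 2

def df_dz(z):
    # Derivative of f, written by hand; an autodiff library would supply this.
    return 2.0 * z

def reparam_grad_mu(mu, sigma, num_samples=20000, seed=0):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)].

    Writing z = mu + sigma * eps with eps ~ N(0, 1) moves the randomness
    into eps, so the gradient passes through the sample: dz/dmu = 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps
        total += df_dz(z) * 1.0  # chain rule: df/dz * dz/dmu
    return total / num_samples
```

For this choice of `f`, the exact answer is 2 * mu, since E[z^2] = mu^2 + sigma^2, which gives a simple check on the estimator.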
These new inference schemes are allowing great progress in generative models of images and text.
After the first two lectures, each week a different student, or pair of students, will present on an aspect of these methods, using a couple of papers as reference. I'll provide guidance about the content of these presentations.
In-class discussion will center around:
The hope is that these discussions will lead to actual research papers, or resources that will help others understand these approaches.
Grades will be based on:
Students can work on projects individually, in pairs, or even in triplets. The grade will depend on the ideas, how well you present them in the report, how clearly you position your work relative to existing literature, how illuminating your experiments are, and how well-supported your conclusions are.
Each group of students will write a short (around 2 pages) research project proposal, which ideally will be structured similarly to a standard paper. It should include a description of a minimum viable project, some nice-to-haves if time allows, and a short review of related work. You don't have to do what your project proposal says; the point of the proposal is mainly to have a plan and to make it easy for me to give you feedback.
Towards the end of the course everyone will present their project in a short, roughly 5-minute presentation.
At the end of the class you'll hand in a project report (around 4 to 8 pages), ideally in the format of a machine learning conference paper such as NIPS.
Project report grading rubric
This lecture will outline the motivation for the course and give a rough picture of the state of the field.
Sept 23rd: Variational inference and recognition networks Lecture notes
This lecture will outline the main technical advance that has allowed latent-variable modeling to become practical: Variational autoencoders, in which the approximate inference procedure is specified by a neural network (or other differentiable procedure).
The difference between traditional variational methods and variational autoencoders is that in a variational autoencoder, the local approximate posterior q(z_i | x_i) is produced by a closed-form differentiable procedure (such as a neural network), as opposed to a local optimization. This allows the model and inference strategy to be jointly optimized.
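The contrast can be written out in terms of the evidence lower bound. The symbols below are notation assumed for this sketch: theta for model parameters, lambda_i for per-datapoint variational parameters, and phi for the parameters of a recognition network f.

```latex
% Evidence lower bound for one data point x_i with latent z_i:
\log p_\theta(x_i) \;\ge\; \mathcal{L}_i
  = \mathbb{E}_{q(z_i \mid x_i)}\big[\log p_\theta(x_i, z_i) - \log q(z_i \mid x_i)\big]

% Traditional variational inference: a separate optimization per data point
\lambda_i^\ast = \arg\max_{\lambda_i} \mathcal{L}_i(\theta, \lambda_i),
\qquad q(z_i \mid x_i) = q_{\lambda_i}(z_i)

% Variational autoencoder: one recognition network f_\phi shared across data points
q_\phi(z_i \mid x_i) = q_{f_\phi(x_i)}(z_i),
\qquad \max_{\theta, \phi} \sum_i \mathcal{L}_i(\theta, \phi)
```

In the second form, the variational parameters for each data point come from a single forward pass of f, so gradients with respect to both theta and phi can be taken through the same objective.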
Sept 30th: Autoregressive and invertible models Lecture notes
It's possible to directly specify fairly complex models without integrating over any latent variables, if the entire generative procedure is invertible, or if it directly specifies a normalized probability.
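For the invertible case, the exact density follows from the change-of-variables formula. The sketch below uses the simplest possible invertible map, a one-dimensional affine transformation of standard normal noise; the function names and the choice of map are assumptions made for illustration only.

```python
import math

def standard_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_forward(z, scale, shift):
    """Generative direction: map base noise z to data x."""
    return scale * z + shift

def flow_inverse(x, scale, shift):
    """Inverse direction: recover the noise that produced x."""
    return (x - shift) / scale

def flow_logpdf(x, scale, shift):
    """Exact log p(x) by the change-of-variables formula.

    log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1} / dx|,
    and for an affine map the second term is -log|scale|.
    """
    z = flow_inverse(x, scale, shift)
    return standard_normal_logpdf(z) - math.log(abs(scale))
```

No latent variable is integrated out here: evaluating `flow_logpdf` requires only the inverse map and the derivative term, and the resulting density is normalized by construction.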
Recurrent network-based generative models:
Invertible generative models:
Adversarial training proposes a completely different training procedure for generative models, which relies on a 'discriminator' to find ways in which data generated by the model is unrealistic.
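One common formulation of this idea is the minimax objective from the original generative adversarial networks paper, shown here for reference. The symbols are notation assumed for this sketch: G is the generator, D is the discriminator, and p(z) is the noise distribution fed to the generator.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is rewarded for telling real data from generated data, while the generator is rewarded for producing samples the discriminator cannot distinguish from real ones.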
Frontiers and related methods:
October 14th: Structured encoder/decoders Slides
We have complete freedom in how we compute the approximate posterior q(z | x). There is also currently a lot of exploration going on of different types of generative models, p(x, z).
At first, variational autoencoders had only vector-valued latent variables z, in which the different dimensions had no special meaning. People are starting to explore ways to put more meaningful structure on the latent description of data.
October 28th: Conditional generation
Generative models can be used to produce novel content such as images and text. This ability is especially useful when we can generate data conditioned on it having certain desired properties (for instance: generate an image that would be likely to have the caption "a horse on a beach").
November 4th: Model-based reinforcement learning
The main drawback of contemporary deep reinforcement learning methods is that they require a lot of interaction with the system to become effective. Using unsupervised learning, we can break the problem into two parts: 1) Modeling the dynamics of the system, and 2) Finding a good policy or plan given those dynamics.
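The two-part split can be sketched on a toy problem. Everything below is an illustrative assumption: the one-dimensional linear system standing in for the real environment, the least-squares dynamics model, and the random-shooting planner are chosen for brevity and are not methods prescribed by the course.

```python
import random

def true_step(state, action):
    # Stand-in for the real system (hypothetical); the agent only sees its outputs.
    return 0.9 * state + 0.5 * action

def collect_transitions(num_steps, rng):
    """Data collection: interact with the system using random actions."""
    transitions, state = [], 1.0
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)
        next_state = true_step(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def fit_linear_dynamics(transitions):
    """Part 1: least-squares fit of next_state ~ a * state + b * action."""
    sss = sum(s * s for s, _, _ in transitions)
    sua = sum(s * u for s, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    ssn = sum(s * n for s, _, n in transitions)
    sun = sum(u * n for _, u, n in transitions)
    det = sss * suu - sua * sua  # determinant of the 2x2 normal equations
    a = (ssn * suu - sun * sua) / det
    b = (sun * sss - ssn * sua) / det
    return a, b

def plan(state, a, b, target, horizon, num_candidates, rng):
    """Part 2: random-shooting planner that uses only the learned model."""
    best_cost, best_actions = float("inf"), None
    for _ in range(num_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for u in actions:
            s = a * s + b * u  # simulate with the model, not the real system
            cost += (s - target) ** 2
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions
```

Once the dynamics model is fitted, planning needs no further interaction with the real system, which is the source of the potential savings in data.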
November 11th: Latent-variable language models
We'll discuss other ways to produce continuous representations of discrete objects, such as text.
November 18th: Project presentations
November 25th: Project presentations
December 2nd: Guest lecture by Roger Grosse
December 10th: Projects due