# Differentiable Inference and Generative Models

[Images synthesized from a GAN]

## Overview

In the last few years, new inference methods have allowed big advances in probabilistic generative models. These models let us generate novel images and text, find meaningful latent representations of data, take advantage of large unlabeled datasets, and even let us do analogical reasoning automatically. This course will tour recent innovations in inference methods such as recognition networks, black-box stochastic variational inference, and adversarial autoencoders. It will also cover recent advances in generative model design, such as deconvolutional image models, thought vectors, and recurrent variational autoencoders. The class will have a major project component.

### Prerequisites

This course is designed to bring students to the current frontier of knowledge on these methods, so that ideally, their course projects can make a novel contribution. A previous background in machine learning such as CSC411 or ECE521 is strongly recommended. Linear algebra, basic multivariate calculus, basics of working with probability, and programming skills are required.

### Where and When

• Fall term 2016, Fridays 2:00-4:00pm
• Room: Galbraith room 220
• Instructor: David Duvenaud
• Email: duvenaud@cs.toronto.edu (put "CSC2541" in the subject)
• Office hours: Wednesdays 1:00-3:00pm, Room 384 Pratt
• Teaching assistants: Tony Wu and Kamal Rai

## What are generative models?

Generative modeling loosely refers to building a model of data, for instance p(image), that we can sample from. This is in contrast to discriminative modeling, such as regression or classification, which tries to estimate conditional distributions such as p(class | image).

### Why generative models?

Even when we're only interested in making predictions, there are practical reasons to build generative models:

• Data efficiency and semi-supervised learning - Generative models can reduce the amount of data required. As a simple example, building an image classifier p(class | image) requires estimating a very high-dimensional function, possibly requiring a lot of data, or clever assumptions. In contrast, we could model the data as being generated from some low-dimensional or sparse latent variables z, as in $$p(image) = \int p(image | z) p(z) dz$$. Then, to do classification, we only need to learn p(class | z), which will usually be a much simpler function. This approach also lets us take advantage of unlabeled data - also known as semi-supervised learning.
• Model checking by sampling - Understanding complex regression and classification models is hard - it's often not clear what these models have learned from the data and what they missed. There is a simple way to sanity-check and inspect generative models - simply sample from them, and compare the sampled data to the real data to see if anything is missing.
• Understanding - Generative models usually assume that each datapoint is generated from a (usually low-dimensional) latent variable. These latent variables are often interpretable, and sometimes can tell us about the hidden causes of a phenomenon. These latent variables can also sometimes let us do interesting things such as interpolating between examples.
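
To make the integral $$p(image) = \int p(image | z) p(z) dz$$ concrete, here is a minimal sketch of a Monte Carlo estimate for a toy one-dimensional model. The model itself is an assumption chosen for illustration (a standard normal prior on z and Gaussian observation noise), not something used in the course, and the names `normal_pdf`, `marginal_likelihood_mc`, `x_obs`, and `noise_std` are hypothetical. For this particular model the integral has a closed form, so the estimate can be checked against it.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    var = std * std
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_likelihood_mc(x, noise_std, num_samples, rng):
    """Estimate p(x) = integral of p(x | z) p(z) dz by averaging p(x | z) over prior samples."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)               # z ~ p(z), assumed standard normal
        total += normal_pdf(x, z, noise_std)  # p(x | z), assumed Gaussian around z
    return total / num_samples


rng = random.Random(0)
x_obs = 0.7      # hypothetical one-dimensional stand-in for an image
noise_std = 0.5  # assumed observation noise

estimate = marginal_likelihood_mc(x_obs, noise_std, 100_000, rng)

# In this toy model the marginal is itself Gaussian: x ~ N(0, 1 + noise_std^2).
exact = normal_pdf(x_obs, 0.0, math.sqrt(1.0 + noise_std ** 2))

print(f"Monte Carlo estimate: {estimate:.4f}  closed form: {exact:.4f}")
```

Sampling from the prior works here only because z is one-dimensional. With high-dimensional latents, most prior samples explain the data poorly and this naive estimator becomes very noisy, which is part of why better inference methods are needed.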

## Differentiable inference

We already know how to specify some expressive and flexible generative models, including entire languages of models that can express arbitrarily complicated structure. However, until recently such models were hard to apply to real datasets, because inference methods (such as Markov chain Monte Carlo methods) were not usually fast or scalable enough to run on large models or even medium-sized datasets.

The past few years have seen major progress in methods to train and do inference in generative models, loosely following four strands:

• Variational autoencoders - Latent-variable models that use a neural network to do approximate inference. The recognition network looks at each datapoint x and outputs an approximate posterior on the latents q(z | x) for that datapoint.
• Generative adversarial networks - A way to train generative models by optimizing them to fool a classifier, the discriminator network, that tries to distinguish between real data and data generated by the model.
• Invertible density estimation - A way to specify complex generative models by transforming a simple latent distribution with a series of invertible functions. These approaches are restricted to a more limited set of possible operations, but sidestep the difficult integrals required to train standard latent variable models.
• Autoregressive models - Another way to model p(x) is to break the model into a series of conditional distributions: $$p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_2, x_1) \dots$$ This is the approach used, for example, by recurrent neural networks. These models are also relatively easy to train, but the downside is that they don't support all of the same queries we can make of latent-variable models.
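
As an illustration of the autoregressive factorization in the last item, the following sketch defines a tiny model over three binary variables in which each conditional is a logistic function of the earlier variables. The parameters `BIAS` and `WEIGHTS` and all function names are hypothetical and chosen by hand, and a real model would use a neural network for each conditional. The sketch shows that the log-probability is a sum of conditional log-probabilities, that the resulting distribution is normalized, and that sampling proceeds one variable at a time.

```python
import itertools
import math
import random

# Hypothetical hand-picked parameters: BIAS[i] and WEIGHTS[i][j] for j < i.
BIAS = [0.2, -0.5, 0.1]
WEIGHTS = [[], [1.5], [-1.0, 0.8]]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def conditional_prob_one(i, prefix):
    """p(x_i = 1 | earlier variables), a logistic function of the prefix."""
    activation = BIAS[i] + sum(w * x for w, x in zip(WEIGHTS[i], prefix))
    return sigmoid(activation)


def log_prob(x):
    """log p(x) as the sum of conditional log-probabilities (chain rule)."""
    total = 0.0
    for i, x_i in enumerate(x):
        p_one = conditional_prob_one(i, x[:i])
        total += math.log(p_one if x_i == 1 else 1.0 - p_one)
    return total


def sample(rng):
    """Ancestral sampling: draw x_1, then x_2 given x_1, and so on."""
    x = []
    for i in range(len(BIAS)):
        x.append(1 if rng.random() < conditional_prob_one(i, x) else 0)
    return x


# Each conditional is normalized, so the joint sums to one over all 2^3 configurations.
total_prob = sum(
    math.exp(log_prob(list(x))) for x in itertools.product([0, 1], repeat=3)
)
draw = sample(random.Random(0))
print(f"sum of p(x) over all configurations: {total_prob:.6f}")
print("one sample:", draw)
```

Note that both the likelihood and the sampler need no integral over latent variables, which is why such models are comparatively easy to train. The cost is that there is no latent z to inspect or interpolate.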

The common thread among these approaches that lets them scale to high-dimensional models is that their loss functions are end-to-end differentiable. This is in contrast to previous inference strategies such as MCMC or early variational inference strategies, which required alternating inference and optimization steps and didn't allow gradient-based tuning of the inference procedure.
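
One common way to obtain such gradients, used in variational autoencoders, is the reparameterization trick: a sample is written as a differentiable function of the distribution's parameters and some parameter-free noise. The sketch below is a toy illustration under assumed settings (a Gaussian with mean `mu` and standard deviation `sigma`, and the objective E[z^2]), not the course's own code. The function name `reparam_grad_mu` is hypothetical. The objective was picked because its gradient with respect to `mu` is known exactly, so the estimate can be checked.

```python
import random


def reparam_grad_mu(mu, sigma, num_samples, rng):
    """Estimate d/dmu of E[z^2] for z ~ N(mu, sigma^2) using z = mu + sigma * eps."""
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on mu or sigma
        z = mu + sigma * eps       # the sample as a differentiable function of mu
        total += 2.0 * z           # d(z^2)/dmu = 2 * z * dz/dmu, with dz/dmu = 1
    return total / num_samples


rng = random.Random(0)
mu, sigma = 1.5, 0.8  # assumed parameter values for the illustration

grad_estimate = reparam_grad_mu(mu, sigma, 50_000, rng)

# E[z^2] = mu^2 + sigma^2, so the exact gradient with respect to mu is 2 * mu.
grad_exact = 2.0 * mu

print(f"estimated gradient: {grad_estimate:.3f}  exact gradient: {grad_exact:.3f}")
```

Because the randomness enters only through `eps`, the same idea extends to objectives computed by a neural network, where automatic differentiation replaces the hand-derived `2.0 * z`.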

These new inference schemes are allowing great progress in generative models of images and text.

## Course Structure

After the first two lectures, each week a different student, or pair of students, will present on an aspect of these methods, using a couple of papers as reference. I'll provide guidance about the content of these presentations.

In-class discussion will center around:

• Understanding the strengths and weaknesses of these methods.
• Understanding the relationships between these methods, and with previous approaches.
• Extensions or applications of these methods.
• Experiments that might better illuminate their properties.

The hope is that these discussions will lead to actual research papers, or resources that will help others understand these approaches.

### Marking scheme

• Class presentations - 20%
• Project proposal - 20% - Due Oct 14th
• Project presentation - 20% - Nov 18th and 25th
• Project report and code - 40% - Dec 10th

### Project

Students can work on projects individually, in pairs, or even in triplets. The grade will depend on the ideas, how well you present them in the report, how clearly you position your work relative to existing literature, how illuminating your experiments are, and how well-supported your conclusions are.

Each group of students will write a short (around 2 pages) research project proposal, which ideally will be structured similarly to a standard paper. It should include a description of a minimum viable project, some nice-to-haves if time allows, and a short review of related work. You don't have to do what your project proposal says - the point of the proposal is mainly to have a plan and to make it easy for me to give you feedback.

Towards the end of the course, everyone will present their project in a short presentation of roughly 5 minutes.

At the end of the class you'll hand in a project report (around 4 to 8 pages), ideally in the format of a machine learning conference paper such as NIPS.