CSC 2547, Winter 2022:

Machine Learning for Machine Vision as Inverse Graphics

Department of Computer Science

University of Toronto



Convolutional neural networks have achieved astounding breakthroughs on a number of machine vision tasks, especially object classification. However, unlike people, they can require vast amounts of data to train, and their (sometimes comical) mistakes show that they do not truly understand what they see. This limits their abilities and leaves them short of the full promise of Artificial Intelligence.

To fully understand a scene, a computer must have a rich, 3-dimensional representation of the world. It must be able to infer what objects are in a scene, their position, orientation, size, shape, color, texture, category, what parts they are composed of, and their relationship to other objects in the scene, as well as the illumination and the position and viewing angle of the camera. In other words, a scene understanding program must be able to represent the world in much the same way as a computer graphics program does. The main difference is that computer graphics generates a 2-dimensional image from a 3-dimensional representation, while scene understanding aims to do the reverse: to infer a 3-dimensional representation of a scene from a 2-dimensional image. Note that once a 3-dimensional representation has been inferred, it should be possible to answer many common-sense questions about an image. It should also be possible to use a graphics program to regenerate the image from the 3-dimensional representation, and moreover, to generate modified versions of the image in which objects have been moved or rotated and the illumination or camera position has changed.

This view of scene understanding is known as inverse graphics. Inverting the graphics process to generate a 3-dimensional representation of an image is a difficult, non-deterministic problem, since many different 3-dimensional scenes can produce the same 2-dimensional image. This course approaches the problem with machine learning. That is, we investigate techniques for learning programs that do inverse graphics, as well as related techniques for overcoming the limitations of convolutional neural networks for vision.

This is an advanced graduate course in machine learning. It is primarily a seminar course in which students will read and present papers from the literature. There will also be a major course project. The goal is to bring students to the state of the art in this exciting field. Tentative topics include discriminative and generative approaches, variational inference and autoencoders, capsule networks, group symmetries and equivariance, visual attention and transformers, point nets, inferring 3D structure and part-whole relationships, self-supervised and contrastive learning, and adversarial learning.


Prerequisites:

A solid introduction to Machine Learning (such as csc411 or a graduate course in ML), especially neural nets; a solid knowledge of linear algebra; the basics of multivariate calculus and probability; and programming skills, especially programming with vectors and matrices. Mathematical maturity will be assumed. This is primarily a machine-learning course, and a background in computer vision or computer graphics is not required.



Teaching Assistant:

Course Structure

The course is an updated version of csc2547, which I gave in Spring 2020. It is organized along the lines of csc2547: Learning to Search, given by David Duvenaud, though the course content is quite different.

Paper presentations:


Marking Scheme:

Tentative Schedule


Student Presentations:

Project Presentations: