Visual Recognition

Winter 2012

Overview

Developing autonomous systems that are able to assist us in everyday tasks is one of the grand challenges of modern computer science. While a variety of novel sensors have been developed in the past few years, in this class we will focus on extracting the necessary knowledge from visual information alone. One of the most remarkable examples of a successful recognition system is our own visual system, which is able to extract high-level information from very noisy and ambiguous data. Unfortunately, despite decades of research effort, machines still perform far below human level. In this class we will study why this is the case. The goal of this graduate class is to understand the different visual recognition tasks as well as the techniques employed to solve them. A strong component of the course will be statistical learning, as it plays a key role in almost every modern visual recognition system. We will cover all stages of the recognition pipeline: low-level (e.g., features), mid-level (e.g., segmentation) as well as high-level reasoning (e.g., scene understanding). Knowledge of machine learning and computer vision is not required, but is highly recommended. The theoretical aspects of visual recognition will be covered during the lectures. The class will also have a strong practical component, as the students will build the different recognition components during the homework sessions.

Summary

Summary of the class

General information

Lecture: Tuesday and Thursday 10:30 - 11:50
Room: TTIC 530 (6045 S. Kenwood, 5th floor)

Instructor: Raquel Urtasun
E-mail: rurtasun@ttic.edu

Grading: exam (35%) + project (65%)

Syllabus

  1. Classification: features, bag of words (BOW), similarity between images, learning features, as well as hashing schemes and retrieval (see the bag-of-words sketch after this list).
  2. Detection: sliding-window approaches, branch and bound, structured prediction, Hough voting and nearest-neighbor (NN) approaches, and hierarchical models.
  3. Segmentation: classical approaches as well as modern structured prediction approaches, including message passing and graph cuts for inference, and CRFs and structured SVMs for learning.
  4. Pose estimation: pictorial structures (2D) as well as 3D pose estimation including particle filter-based approaches.
  5. Modern 3D geometry and 3D scene understanding: stereo, scene layout (e.g., 3D box for indoor scenes, road layout for outdoor scenes).
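
To make item 1 concrete, below is a minimal bag-of-words classification sketch in Python. It is illustrative only and assumes scikit-learn and NumPy are available; raw pixel patches stand in for SIFT-like local descriptors, and the toy digits dataset stands in for a real image collection, so the printed accuracy is not meaningful for real recognition benchmarks.

    # Minimal bag-of-words image classification sketch (illustrative only).
    # Assumes scikit-learn and NumPy; raw pixel patches stand in for
    # SIFT-like local descriptors, and the tiny digits dataset stands in
    # for a real image collection.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.feature_extraction.image import extract_patches_2d
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    def patch_descriptors(image, patch_size=(4, 4), max_patches=20, seed=0):
        """Densely sample small patches and flatten them into descriptors."""
        patches = extract_patches_2d(image, patch_size,
                                     max_patches=max_patches, random_state=seed)
        return patches.reshape(len(patches), -1).astype(np.float64)

    def bow_histogram(descriptors, vocabulary):
        """Quantize descriptors against the visual vocabulary and return a
        normalized histogram of visual-word counts."""
        words = vocabulary.predict(descriptors)
        hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
        return hist / hist.sum()

    digits = load_digits()
    train_imgs, test_imgs, y_train, y_test = train_test_split(
        digits.images, digits.target, test_size=0.3, random_state=0)

    # 1. Build the visual vocabulary by clustering descriptors from training images.
    train_desc = np.vstack([patch_descriptors(im) for im in train_imgs])
    vocabulary = KMeans(n_clusters=64, n_init=5, random_state=0).fit(train_desc)

    # 2. Represent every image as a histogram of visual words.
    X_train = np.array([bow_histogram(patch_descriptors(im), vocabulary)
                        for im in train_imgs])
    X_test = np.array([bow_histogram(patch_descriptors(im), vocabulary)
                       for im in test_imgs])

    # 3. Train a linear SVM on the bag-of-words histograms and evaluate it.
    classifier = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
    print("BOW + linear SVM test accuracy:", classifier.score(X_test, y_test))

In a real system the same pipeline would use dense SIFT or learned descriptors, a much larger vocabulary, and typically a spatial pyramid on top of the histograms.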

Schedule

Date      Topic                                     Slides                         Reading
Jan 3     Introduction                              intro                          Chapter 1 of R. Szeliski book
Jan 5     Image Formation                           formation                      Chapter 2 of R. Szeliski book
Jan 10    Image Filtering                           filtering                      Chapters 2 and 3 of R. Szeliski book
Jan 12    Midwest Vision Workshop
Jan 17    Transformations + features                transformations
Jan 19    Interest points + descriptors             features
Jan 24    Instance + Category level recognition     instance
Jan 26    Sliding-Window approaches                 sliding window
Jan 31    Deformable part-based models              latent svm
Feb 2     Poselets                                  poselet
Feb 7     More on part-based models                 part-based models
Feb 9     NO CLASS
Feb 14    Combining Features                        combinations
Feb 16    Learning Representations I                learning representations
Feb 21    Learning Representations II               sparse coding + topic models
Feb 23    Graphical models: inference               learning
Feb 28    Graphical models: inference + learning    inference
March 1   Segmentation                              segmentation
March 6   Attributes + descriptions + Context       attributes
March 8   Scene understanding                       scene