Visual Recognition

Winter 2012

Overview

Developing autonomous systems that are able to assist us in everyday tasks is one of the grand challenges of modern computer science. While a variety of novel sensors have been developed in the past few years, in this class we will focus on extracting the necessary knowledge from visual information alone. One of the most remarkable examples of a successful recognition system is our own visual system, which is able to extract high-level information from very noisy and ambiguous data. Unfortunately, despite decades of research effort, machines still perform far below human level. In this class we will study why this is the case. The goal of this graduate class is to understand the different visual recognition tasks as well as the techniques employed to solve them. A strong component of the course will be statistical learning, as it plays a key role in almost every modern visual recognition system. We will cover all stages of the recognition pipeline: low-level (e.g., features), mid-level (e.g., segmentation) as well as high-level reasoning (e.g., scene understanding). Knowledge of machine learning and computer vision is not required, but is highly recommended. The theoretical aspects of visual recognition will be covered during the lectures. The class will also have a strong practical component, as the students will build the different recognition components during the homework sessions.

Summary

Summary of the class

General information

Lecture: Tuesday and Thursday 10:30 - 11:50
Room: TTIC 530 (6045 S. Kenwood, 5th floor)

Instructor: Raquel Urtasun
E-mail: rurtasun@ttic.edu

Grading: exam (35%) + project (65%)

Syllabus

  1. Classification: features, bag of words (BOW), similarity between images, learning features, as well as hashing schemes and retrieval (see the bag-of-words sketch after this list).
  2. Detection: sliding-window approaches, branch and bound, structured prediction, Hough voting and nearest-neighbor (NN) approaches, and hierarchical models.
  3. Segmentation: classical approaches as well as modern structured prediction approaches, including message passing and graph cuts for inference, and CRFs and structured SVMs for learning.
  4. Pose estimation: pictorial structures (2D) as well as 3D pose estimation including particle filter-based approaches.
  5. Modern 3D geometry and 3D scene understanding: stereo, scene layout (e.g., 3D box for indoor scenes, road layout for outdoor scenes).
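
To make item 1 concrete, below is a minimal bag-of-words classification sketch in Python. It is illustrative only and assumes scikit-learn and NumPy are available; raw pixel patches stand in for SIFT-like local descriptors, and the toy digits dataset stands in for a real image collection, so the printed accuracy is not meaningful for real recognition benchmarks.

    # Minimal bag-of-words image classification sketch (illustrative only).
    # Assumes scikit-learn and NumPy; raw pixel patches stand in for
    # SIFT-like local descriptors, and the tiny digits dataset stands in
    # for a real image collection.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.feature_extraction.image import extract_patches_2d
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    def patch_descriptors(image, patch_size=(4, 4), max_patches=20, seed=0):
        """Densely sample small patches and flatten them into descriptors."""
        patches = extract_patches_2d(image, patch_size,
                                     max_patches=max_patches, random_state=seed)
        return patches.reshape(len(patches), -1).astype(np.float64)

    def bow_histogram(descriptors, vocabulary):
        """Quantize descriptors against the visual vocabulary and return a
        normalized histogram of visual-word counts."""
        words = vocabulary.predict(descriptors)
        hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
        return hist / hist.sum()

    digits = load_digits()
    train_imgs, test_imgs, y_train, y_test = train_test_split(
        digits.images, digits.target, test_size=0.3, random_state=0)

    # 1. Build the visual vocabulary by clustering descriptors from training images.
    train_desc = np.vstack([patch_descriptors(im) for im in train_imgs])
    vocabulary = KMeans(n_clusters=64, n_init=5, random_state=0).fit(train_desc)

    # 2. Represent every image as a histogram of visual words.
    X_train = np.array([bow_histogram(patch_descriptors(im), vocabulary)
                        for im in train_imgs])
    X_test = np.array([bow_histogram(patch_descriptors(im), vocabulary)
                       for im in test_imgs])

    # 3. Train a linear SVM on the bag-of-words histograms and evaluate it.
    classifier = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)
    print("BOW + linear SVM test accuracy:", classifier.score(X_test, y_test))

In a real system the same pipeline would use dense SIFT or learned descriptors, a much larger vocabulary, and typically a spatial pyramid on top of the histograms.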

Schedule

Date      Topic                                     Slides                         Reading
Jan 3     Introduction                              intro                          Chapter 1 of R. Szeliski book
Jan 5     Image Formation                           formation                      Chapter 2 of R. Szeliski book
Jan 10    Image Filtering                           filtering                      Chapters 2 and 3 of R. Szeliski book
Jan 12    Midwest Vision Workshop
Jan 17    Transformations + features                transformations
Jan 19    Interest points + descriptors             features
Jan 24    Instance + Category level recognition     instance
Jan 26    Sliding-Window approaches                 sliding window
Jan 31    Deformable part-based models              latent svm
Feb 2     Poselets                                  poselet
Feb 7     More on part-based models                 part-based models
Feb 9     NO CLASS
Feb 14    Combining Features                        combinations
Feb 16    Learning Representations I                learning representations
Feb 21    Learning Representations II               sparse coding + topic models
Feb 23    Graphical models: inference               learning
Feb 28    Graphical models: inference + learning    inference
March 1   Segmentation                              segmentation
March 6   Attributes + descriptions + Context       attributes
March 8   Scene understanding                       scene