Learning, Representation and Context for Human Sensing in Video

 

Workshop in conjunction with IEEE CVPR, New York, June 22nd 2006

 

Chairs: Cristian Sminchisescu and Fernando De La Torre

 

 

Suggestions for Discussion and Panel from Participants

This list is open! You can contribute!

 

 

Bill Freeman

 

- 1984 and 2001: surveillance and civil liberties

 

David Forsyth

 

- What ambiguities exist under what circumstances for lifting from 2D to 3D?

- How does one build good dynamical models of motion, and are they useful?

- How should one represent activities?

 

Luc van Gool

 

- Holistic versus feature-based approaches for structure and representation of actions

- Scale transitions (e.g. from blobs at a distance to close-up analysis)

 

Stan Li

 

- Face recognition for cooperative users - How accurate can it be?

- Face recognition for non-cooperative users - How can it be made to work?

 

Pietro Perona

 

- What is our grand challenge?

- What data sets should we use to benchmark our systems?

- Any hope of understanding what `human behavior', `actions' and `activity' mean?

- Connections between human behavior and linguistic structure?

 

Deva Ramanan

 

- 2D versus 3D

- Role of priors (how much are they needed)

- Better data terms (beyond background subtraction, etc.)

- Evaluation (common datasets, common criteria)

 

Sami Rodhami

 

- Is dynamic programming the only method to perform efficient global optimization?

- Are there ways to obtain efficient global optimization other than assuming conditional independence?

 

Song Chun Zhu

 

- Creating a large-scale annotated data set

- Debate on learning: supervised vs. semi-supervised vs. unsupervised

- Representation for what? Do we know the specifications in video analysis?

- What would be a sufficient representation?

- Evaluation?