Fall 2006 Talk Descriptions

A Dynamic Prior for Human Pose Tracking

Marcus Brubaker

Abstract: Good motion models for human pose tracking have been elusive. Previous efforts have focused on learning priors from motion capture data. Such solutions generalize well close to the original data but tend to generalize poorly to new dynamical situations such as shorter or longer stride lengths or walking on inclines. In this talk I will present a physics-based prior which uses abstract physical dynamics as its basis. These abstract models express many salient features of human locomotion but remain simple and manageable.

Optimization of fMRI Processing Pipelines

Stephen Strother

Abstract: I will introduce the components of an fMRI processing pipeline (i.e., experimental design, data acquisition, preprocessing and data analysis modeling) as an example of a scientific workflow, and briefly describe the basic steps of fMRI imaging. Using several fMRI studies I will demonstrate that components of this pipeline strongly interact (particularly with different data-analysis model choices) and that fMRI researchers are far from optimizing, or perhaps even understanding what it means to optimize, such pipelines. I will then describe our experience with a framework (dubbed NPAIRS) that uses split-half resampling to obtain prediction and global-pattern reproducibility metrics. In addition to providing these metrics, this framework allows us to convert arbitrary spatial distributions of modeled image parameters to standard Gaussian parametric maps, and to identify heterogeneous data observations. Using NPAIRS I will illustrate non-standard uses of prediction (e.g., minimization) as a function of global pattern reproducibility to measure (1) the dependence of activation patterns on processing pipeline choices in humans (3.125 mm^3) at 1.5 Tesla for a task involving application of manual, parametric static force, and (2) the spatial scale of visual orientation columns in a cat with ultra-high resolution fMRI (0.15 x 0.15 x 1 mm^3) at 9 Tesla. I will end by briefly discussing the possible clinical benefits of optimizing fMRI processing pipelines.
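
The split-half idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the NPAIRS implementation: it assumes each subject or run has already been reduced to a flat list of voxel values, it averages maps instead of fitting an analysis model on each half, and it covers only the reproducibility metric, not prediction. The function names are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mean_map(maps):
    """Voxel-wise mean over a list of maps."""
    return [statistics.fmean(vals) for vals in zip(*maps)]


def split_half_reproducibility(maps, n_splits=50, seed=0):
    """Median correlation between mean maps of random disjoint halves.

    Assumption of this sketch: the "analysis" applied to each half is a
    plain average; a real pipeline would fit its model independently on
    each half before comparing the resulting maps.
    """
    rng = random.Random(seed)
    idx = list(range(len(maps)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        half = len(idx) // 2
        first = mean_map([maps[i] for i in idx[:half]])
        second = mean_map([maps[i] for i in idx[half:]])
        scores.append(pearson(first, second))
    return statistics.median(scores)
```

Taking the median over many random splits makes the score less sensitive to any single unlucky partition of the data.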

The Role of Manifold Learning in Human Motion Analysis

Ahmed Elgammal

Abstract: The human body is an articulated object with high degrees of freedom. Despite the high dimensionality of the configuration space, many human motion activities lie intrinsically on low dimensional manifolds. Although the intrinsic body configuration manifolds might be very low in dimensionality, the resulting appearance manifolds are challenging to model given the various aspects that affect appearance, such as the shape and appearance of the person performing the motion, or variation in the viewpoint or illumination. Our objective is to learn representations for the shape and the appearance of moving (dynamic) objects that support tasks such as synthesis, pose recovery, reconstruction, and tracking. We studied various approaches for representing global deformation manifolds that preserve their geometric structure. Given such representations, we can learn generative models for dynamic shape and appearance. We also address the fundamental question of separating style and content on nonlinear manifolds representing dynamic objects. We learn factorized generative models that explicitly decompose the intrinsic body configuration (content) as a function of time from the appearance/shape (style factors) of the person performing the action as time-invariant parameters. We show results on pose recovery, body tracking, gait recognition, as well as facial expression tracking and recognition.

Bio: Dr. Ahmed Elgammal has been an assistant professor at the Department of Computer Science, Rutgers, the State University of New Jersey, since Fall 2002. Dr. Elgammal is also a member of the Center for Computational Biomedicine Imaging and Modeling (CBIM) and the Center for Advanced Information Processing (CAIP) at Rutgers. His primary research interests are computer vision and machine learning. His research focus includes human activity recognition, human motion analysis, tracking, human identification, and statistical methods for computer vision. He develops robust real-time algorithms to solve computer vision problems in areas such as visual surveillance, visual human-computer interaction, virtual reality, and multimedia applications. Dr. Elgammal's interests also include research on document image analysis. Dr. Elgammal received the National Science Foundation early CAREER Award in 2006.

Dr. Elgammal received his B.Sc. and M.Sc. degrees in computer science and automatic control from the University of Alexandria, Egypt, in 1993 and 1996, respectively. He received another M.Sc. and his Ph.D. degree in computer science from the University of Maryland, College Park, in 2000 and 2002, respectively.

Is face recognition 'special'? An examination of psychological and neural mechanisms supporting face recognition

Marlene Behrmann

Abstract: Face recognition is often considered to be a special instance of visual recognition, demanding specialized, perhaps even dedicated psychological and neural mechanisms. To address this issue, behavioral data will be presented from three different populations of individuals all of whom are impaired at face recognition. Thereafter, functional and structural imaging data will be presented to explore the neural correlate of face processing in normal and impaired individuals. Taken together, the findings will support the view that face recognition is not 'special' and, instead, engages general visual processes which represent other classes of objects as well. Additionally, face recognition is supported by an underlying distributed network of cortical regions rather than being mediated by a particular, specialized cortical area.

Investigating Blur in the Framework of Reverse Projection

Scott McCloskey

Abstract: I present a reverse projection model for image formation, which is particularly useful for explaining blur. The model is used to develop methods for seeing "around" occluding objects and also recovering depth from defocus. The appearance of severely defocused occluding objects is modeled, giving rise to an image processing method to recover the radiance of the background in the partially-occluded region. The model also shows that, when out of focus, nearby pixels record light emitted from overlapping regions of the scene. This gives rise to a measurable increase in the correlation between such pixels, with the increase being proportional to scene depth. This principle is used to motivate a method of recovering depth from defocus.

Computer Vision for Panoramic Viewing and Augmented Reality

Mark Fiala

Abstract: The Computational Video Group (CVG) at the National Research Council of Canada is involved in several areas of image processing and computer vision research and applications. Two such areas are the application of panoramic cameras for robotics and "pano-presence", and fiducial marker systems used with non-panoramic cameras for "augmented reality" visualization of 3D content.

Panoramic image sensors, also known as omnidirectional cameras, can provide a 360 degree field of view, useful for providing imagery to both online tele-operation and offline multimedia systems. Some of the work undertaken in the CVG will be presented.

Augmented Reality (AR) is the convergence of the real world and virtual computer-generated imagery: the fusion of real and virtual reality through overlaying virtual objects on real images or video. A virtual object can be made to look like it belongs in a real scene if it is rendered from the right viewpoint, something done routinely in movie making but still a research topic for real-time systems where you can look at and walk around virtual objects using a head-mounted display, PDA, cellphone, or tablet PC. To do this, the graphics rendering system must know the pose of the camera; this pose determination can be done accurately and inexpensively using computer vision. One way is to use markers like the ARTag marker system that will be described in the talk. Designing markers to add to the environment for robust detection in camera and video imagery is a computer vision application useful in situations where a camera-object pose is desired, such as AR, industrial position tracking, photo-modeling and robot navigation. Examples of augmented reality and the ARTag system developed at the NRC will be shown.

Abstract: While vision systems that actively explore their environment have been an established research field for many years, the interaction of vision systems with humans is still much less understood. In recent years, two different trends have pushed forward computer vision techniques for human-machine interaction. On the one side, hand-held interactive devices are becoming smaller and smaller or disappearing completely into an ambient intelligence environment. On the other side, artificial communication partners are becoming embodied in a shared virtual or physical environment. In both cases, the interaction space is extended from the display that is controlled by the computer system to the external environment that has to be perceived through sensors. Intuitive and seamless communication needs to be established in a human-human like fashion using speech, gestures, and interpreting the actions of the user in his/her own natural environment.

In my talk, I will discuss how human-machine interaction affects the development of computer vision techniques and systems. Most of the work presented has been conducted in the European VAMPIRE project. In this project, an augmented reality device has been developed that is able to assist the user in his or her physical activities in the environment. It combines object recognition, object tracking, action and task recognition, as well as localization by using an active memory framework. As a prototypical example, an assistant for cocktail mixing has been realized and tested in user studies.

Joint work with Marc Hanheide, Sebastian Wrede, Ingo Lütkebohle, Gerhard Sagerer.

Visual Recognition and Tracking for Perceptive Interfaces

Trevor Darrell

Abstract: Devices should be perceptive, and respond directly to their human user and/or environment. In this talk I'll present new computer vision algorithms for fast recognition, indexing, and tracking that make this possible, enabling multimodal interfaces which respond to users' conversational gesture and body language, robots which recognize common object categories, and mobile devices which can search using visual cues of specific objects of interest. As time permits, I'll describe recent advances in real-time human pose tracking for multimodal interfaces, including new methods which exploit fast computation of approximate likelihood with a pose-sensitive image embedding. I'll also present our linear-time approximate correspondence kernel, the Pyramid Match, and its use for image indexing, object recognition, and discovery of object categories. Throughout the talk, I'll show interface examples including grounded multimodal conversation as well as mobile image-based information retrieval applications based on these techniques.
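
As a rough illustration of how an approximate correspondence kernel such as the Pyramid Match can score two feature sets without explicit matching, here is a minimal sketch. The specifics are assumptions of this sketch, not details taken from the abstract: features are tuples of non-negative numbers, bin side lengths double at each level, and matches first found at level i are weighted by 1/2^i. The function names are hypothetical.

```python
from collections import Counter


def histogram(points, side):
    """Count points per grid cell for cells of the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of points that share a cell."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(xs, ys, levels):
    """Unnormalized multi-resolution match score between two point sets.

    Each level bins every point once, so the cost is linear in the
    number of features per level.
    """
    score = 0.0
    previous = 0
    for level in range(levels + 1):
        side = 2 ** level  # coarser cells at each level
        current = intersection(histogram(xs, side), histogram(ys, side))
        # Only matches that are new at this level count, and they are
        # down-weighted because coarser cells imply looser matches.
        score += (current - previous) / side
        previous = current
    return score


# One exact match at the finest level plus one match found at cell side 4.
print(pyramid_match([(0,), (5,)], [(0,), (6,)], levels=3))  # 1.25
```

Counting implicit matches through histogram intersection is what avoids computing an explicit point-to-point assignment between the two sets.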

Photometric Invariants from Color Subspaces

Todd Zickler

Abstract: Complex reflectance phenomena such as specular reflections confound many vision problems because they produce image 'features' that do not correspond directly to intrinsic surface properties such as shape and spectral reflectance. One approach to mitigating these effects is to explore functions of an image that are invariant to these complex photometric events. In this talk, I describe a family of such invariants that result from exploiting color information in images of dichromatic surfaces. These invariants are derived from subspaces of RGB color space, and they enable the application of Lambertian-based vision techniques (for stereo, shape from shading, motion estimation, photometric stereo, etc.) to a broad class of specular, non-Lambertian scenes.
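
One simple way to see how a color subspace can yield a specular invariant is sketched below. It is an illustration under stated assumptions, not necessarily the construction presented in the talk: each pixel is assumed to follow the dichromatic model (a diffuse color plus a specular term along the illuminant color), and the illuminant color is assumed known. The function name is hypothetical.

```python
def project_out_source(rgb, source):
    """Project an RGB value onto the plane orthogonal to the source color.

    Under the assumed dichromatic model, pixel = d * diffuse + s * source,
    so removing the component along `source` discards the specular term
    s * source whatever the value of s.
    """
    scale = sum(c * s for c, s in zip(rgb, source)) / sum(s * s for s in source)
    return tuple(c - scale * s for c, s in zip(rgb, source))


# Hypothetical values: the same surface point without and with a highlight.
source = (1.0, 1.0, 1.0)
diffuse_only = (0.6, 0.3, 0.1)
with_highlight = tuple(c + 0.5 * s for c, s in zip(diffuse_only, source))

print(project_out_source(diffuse_only, source))
print(project_out_source(with_highlight, source))
```

Both calls give the same two-dimensional signal up to floating-point rounding, which is why diffuse-only techniques can be applied to the projected image.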

Towards Robots that Learn by Communicating

Gerhard Sagerer

Abstract: One common goal of an increasing number of projects is the development of autonomous personal robots that are able to acquire knowledge from the real world by interacting with humans and the environment. Starting from paradigms of situated communication, cooperative construction and optimization of behaviors, we are focusing on learning by communication based on the idea of robots as companions. In this approach, a robot is viewed as a communicating and learning agent in the real world.

In my talk I will give an overview of our development of, and studies with, different robot systems and will address, step by step, aspects of multimodal communication in a natural environment. Solutions for detecting the communication partner, creating shared attention, and learning names of objects and places, together with their systematic integration in interacting robot systems, will be shown. In order to achieve empirical foundations for learning symbolic as well as sub-symbolic representations of objects, actions, and spatial concepts, we analyse parent-infant interactions in experimental settings. The resulting robot systems are evaluated in a number of human-robot scenarios.

Page last modified on Thursday, January 11, 2007