Winter 2008 Talk Descriptions

Abstract: A novel model-based approach to 3D hand tracking from monocular video is presented. The 3D hand pose, the hand texture and the illuminant are dynamically estimated through minimization of an objective function. Derived from an inverse problem formulation, the objective function enables explicit use of texture temporal continuity and shading information, while handling important self-occlusions and time-varying illumination. The minimization is done efficiently using a quasi-Newton method, for which we propose a rigorous derivation of the objective function gradient. Particular attention is given to terms related to the change of visibility near self-occlusion boundaries that are neglected in existing formulations. In doing so we introduce new occlusion forces and show that using all gradient terms greatly improves the performance of the method.

Joint work with David Fleet and Nikos Paragios

Abstract: Human motion tracking is an important problem in computer vision. Most prior approaches have concentrated on efficient inference algorithms and prior motion models; however, few can explicitly account for the physical plausibility of the recovered motion. The primary purpose of this work is to enforce physical plausibility in the tracking of a single articulated human subject. Towards this end, we propose a full body 3D physical simulation-based prior that explicitly incorporates motion control and dynamics into the Bayesian filtering framework. We consider the human's motion to be generated by a "control loop". In this control loop, Newtonian physics approximates the rigid-body motion dynamics of the human and the environment through the application and integration of forces. Collisions generate interaction forces to prevent physically impossible hypotheses. This allows us to properly model human motion dynamics, ground contact and environment interactions. For efficient inference in the resulting high-dimensional state space, we introduce an exemplar-based control strategy to reduce the effective search space. As a result we are able to recover the physically plausible kinematic and dynamic state of the body from monocular and multi-view imagery. We show, both quantitatively and qualitatively, that our approach performs favorably with respect to standard Bayesian filtering methods.

This work was done in collaboration with:
Marek Vondrak, Brown University
Odest Chadwicke Jenkins, Brown University

Abstract: We propose an algorithm for learning the semantics of a (motion) verb from videos depicting the action expressed by the verb, paired with sentences describing the action participants and their roles. Acknowledging that commonalities among example videos may not exist at the level of the input features, our approximation algorithm efficiently searches the space of more abstract features for a common solution. We test our algorithm by using it to learn the semantics of a sample set of verbs; results demonstrate the usefulness of the proposed framework, while identifying directions for further improvement.

(Joint work with Afsaneh Fazly, Sven Dickinson, and Suzanne Stevenson)

Abstract: In order to determine the transformation that best aligns two volumetric images, traditional registration methods automatically search for salient features, such as points, lines and surfaces, evenly throughout the image. However, the performance of many of these algorithms, measured over a particular object of interest, degrades when changes in the appearance of such an object occur across the image pair, such as varying color distributions or shape deformation, or when the relative position of the object changes with respect to the background. *Intuitively, this can be explained because the (varying) features from the object of interest are outweighed by the (more consistent) background ones.* This talk will describe a registration method that focuses on an object of interest by building a model of it, which is then used to identify corresponding features. An initial registration guess is also provided, so that an iterative optimization technique can be used to refine the segmentation result.

Abstract: Image registration is the task of aligning two or more images so that their content corresponds on a pixel-to-pixel basis. In this two-part talk, I will discuss two of my recent projects in automatic image registration.

Part 1: Most image registration methods assume that the images being registered are nearly aligned. When that's not the case, things fall apart. I have developed a method that efficiently combats this problem by exhaustively considering all possible shifts of an image. It's made possible by some FFT trickery.
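The abstract does not say what the "FFT trickery" is. As a rough illustration only, the sketch below uses the standard FFT cross-correlation identity to score every circular shift of one image against another in a single pass. The function name `best_circular_shift`, the mean-subtracted correlation score, and the circular (wrap-around) shift model are assumptions made for this example, not details taken from the talk.

```python
import numpy as np

def best_circular_shift(fixed, moving):
    """Return (dy, dx) so that np.roll(moving, (dy, dx), axis=(0, 1)) best matches fixed.

    Hypothetical helper: scores all circular shifts at once via the
    correlation theorem, using mean-subtracted cross-correlation as the score.
    """
    F = np.fft.fft2(fixed - fixed.mean())
    M = np.fft.fft2(moving - moving.mean())
    # corr[s] = sum over n of fixed[n + s] * moving[n], for every shift s
    corr = np.fft.ifft2(F * np.conj(M)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)
```

Scoring all shifts this way costs on the order of N log N operations for an N-pixel image, rather than N operations per shift for each of the N candidate shifts.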

Part 2: When registering medical images of different types (e.g., an MRI and a CAT scan), the images cannot be registered using pixel intensities directly. For example, bones are bright in a CAT scan, but dark in an MRI. Instead, the problem is often formulated in information-theoretic terms, yielding the current state-of-the-art method of Mutual Information. However, I will demonstrate that there are advantages to posing the problem in terms of clustering.
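To make the information-theoretic formulation concrete, here is a minimal sketch of mutual information estimated from a joint intensity histogram. It illustrates the baseline measure mentioned above, not the clustering approach proposed in the talk. The function name `mutual_information` and the choice of 32 bins are assumptions made for this example.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate mutual information (in nats) between two equally sized images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)    # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of b, shape (1, bins)
    nz = pxy > 0                           # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

Because the measure depends only on how predictable one intensity is from the other, an image and its intensity-inverted copy (bright where the other is dark) still score highly, whereas a direct intensity difference would not.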

Bio: Jeff Orchard received his B.Math. degree in applied mathematics from the University of Waterloo, Canada, in 1994, and his M.Sc. degree in applied mathematics from the University of British Columbia, Canada, in 1996. He received his Ph.D. degree in computing science from Simon Fraser University, Canada, in 2003.

Since 2003, Prof. Orchard has been an Assistant Professor in the David R. Cheriton School of Computer Science at the University of Waterloo, Canada. His research interests revolve around applying mathematics and computation to visual data. He has worked on projects in image registration, motion compensation for medical imaging, functional MRI, medical image reconstruction, and image mosaicking. At the University of Waterloo, he is affiliated with the Scientific Computing Research Group, the Waterloo Institute for Health Informatics Research, and the Centre for Computational Mathematics in Industry and Commerce. In 2005, Prof. Orchard organized a workshop called the "Grand Mathematical Challenges in Medical Image Processing".

Abstract: We shall present efficient algorithms for 3D spatial pattern discovery. We shall focus on algorithms for the partial alignment of flexible structures (articulated objects) as well as simultaneous alignment of multiple rigid and flexible structures (3D pattern discovery). We shall demonstrate the algorithm performance for the tasks of multiple alignment of protein structures in the rigid case as well as the detection of pharmacophores of a set of drugs in the flexible case.

Bio: Haim J. Wolfson earned his PhD in Mathematics at Tel Aviv University in 1985. He was a Senior Research Scientist at the NYU Robotics Lab (1985-1989) specializing in Object Recognition in Computer Vision. In 1989 he joined the Computer Science School of Tel Aviv University and has been a full professor there since 2000. Since May 2004 he has been the incumbent of the George and Maritza Pionkowski chair in Computer Aided Drug Design. Haim served one term (2002-2004) as Head of the Computer Science School and is currently the Dean of the Raymond and Beverly Sackler Faculty of Exact Sciences (since Sep 1, 2006). He is a co-developer of the “Geometric Hashing” paradigm for Model-based Object Recognition in Computer Vision, which is one of the leading geometric pattern discovery paradigms to date. In the early 1990s he pioneered the introduction of Geometric Hashing and other Computer Vision based methodologies into Structural Bioinformatics. He is a co-founder of the TAU Bioinformatics study program. HJW has more than 150 publications in scientific journals, books and refereed conference proceedings. He was the chairman of the scientific program committee of the 5th European Conference on Computational Biology, which was held on January 21-24, 2007 in Eilat, Israel.

Abstract: Classical methods for measuring image motion by computer have concentrated on the cases of optical flow, in which the motion field is continuous, or layered motion, in which the motion field is made up of a small number of depth planes. Here we introduce a third natural category which we call optical snow. Optical snow arises in many natural situations such as camera motion in a highly cluttered 3-D scene, or a passive observer watching a snowfall. Optical snow yields dense motion parallax with depth discontinuities occurring near all image points. As such, constraints on smoothness or even smoothness in layers do not apply.

We present a method for measuring optical snow based on a Fourier analysis of motion. Next we show how local estimates of motion parallax are sufficient to estimate camera motion (egomotion) directly, without first computing optical flow. We demonstrate the effectiveness of the method for both synthetic and real image sequences.

This is joint work with Michael Langer (McGill University).

Abstract: In my presentation, I will discuss how a number of Computer Vision challenges can be cast as problems of energy minimization. In particular, I will present optimization methods that make it possible to segment moving objects in image sequences, to detect obstacles in traffic videos, to reconstruct 3D shapes from a collection of 2D images and to track familiar shapes (for example, walking people or 3D heart models) in videos. I will detail how the respective cost functionals can be minimized both by continuous (PDE and level set methods) and by discrete (graph theoretic) methods.

Abstract: This work describes a technique for dynamically reconstructing the shape of an object from motion under orthography. This technique represents the shape as a 3D triangular mesh, and the mesh grows when previously occluded parts of the object become visible. The shape of the existing structure is also continually updated as the mesh is tracked over time. The shape is reconstructed without using factorization techniques, and the reconstructed shape is in turn used to estimate the rigid motion parameters for a given frame. As a result, the shape estimation and pose estimation steps go hand-in-hand to formulate a unifying framework for tracking and structure from motion. Current results have shown potential with this technique, and more powerful representations of the structure, such as appearance, shading and deformation information, can be added onto this framework in future work.

Abstract: The objective of this work is classifying texture from a single image under unknown lighting conditions. The current, successful approach to this task is to treat it as a statistical learning problem and learn a classifier from a set of training images, but this requires a sufficient number and variety of training images. We show that the number of training images required can be drastically reduced (to as few as three or four) by synthesizing additional training data using photometric stereo. We demonstrate the method on two standard texture databases, PhoTex and ALOT. Despite the limitations of photometric stereo, the resulting classification performance surpasses the state-of-the-art results.

This is joint work with Andrew Zisserman and Jan-Mark Geusebroek.
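The abstract above does not spell out how additional training images are synthesized. As a rough sketch of the general idea, classical photometric stereo recovers per-pixel albedo and surface normals from three or more images, after which the surface can be re-rendered under new lighting. The code below assumes a Lambertian surface, known distant light directions and no shadows in the input images; these assumptions, and the function names `photometric_stereo` and `relight`, are illustrative and not taken from the talk.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover albedo (h, w) and unit normals (h, w, 3).

    images: array of shape (k, h, w), k >= 3, assumed shadow-free.
    lights: array of shape (k, 3), assumed known unit light directions.
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)
    # Lambertian model: intensities = lights @ g, where g = albedo * normal
    g, *_ = np.linalg.lstsq(lights, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = (g / np.maximum(albedo, 1e-12)).T.reshape(h, w, 3)
    return albedo.reshape(h, w), normals

def relight(albedo, normals, light):
    """Render a new image of the recovered surface under a new light direction."""
    shading = np.clip(normals @ light, 0.0, None)  # clamp back-facing pixels
    return albedo * shading
```

Calling `relight` with several new light directions turns a handful of captured images into a larger set of synthetic views of the same texture.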

Page last modified on Tuesday, May 20, 2008