Winter 2008 Talk Descriptions

Abstract: A novel model-based approach to 3D hand tracking from monocular video is presented. The 3D hand pose, the hand texture and the illuminant
are dynamically estimated through minimization of an objective function. Derived from an inverse problem formulation, the objective function
enables explicit use of texture temporal continuity and shading information, while handling important self-occlusions and time-varying illumination.
The minimization is done efficiently using a quasi-Newton method, for which we propose a rigorous derivation of the objective function gradient.
Particular attention is given to terms related to the change of visibility near self-occlusion boundaries that are neglected in existing formulations. In
doing so we introduce new occlusion forces and show that using all gradient terms greatly improves the performance of the method.
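
As an illustration of the kind of optimizer the abstract above mentions, here is a minimal sketch of a quasi-Newton (BFGS) minimizer driven by an analytically derived gradient. It is not the speaker's implementation: the toy quadratic objective, the names `bfgs`, `toy_objective` and `toy_gradient`, and the simple backtracking line search are all assumptions chosen to keep the example self-contained. The actual hand-tracking objective and its occlusion-related gradient terms are not reproduced here.

```python
def toy_objective(x):
    # Hypothetical stand-in for the tracking objective: a convex quadratic
    # with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def toy_gradient(x):
    # Analytic gradient of toy_objective, derived by hand.
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


def bfgs(f, grad, x0, iters=100, tol=1e-8):
    """Minimize f with BFGS, maintaining an inverse-Hessian estimate H."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iters):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Quasi-Newton search direction p = -H g.
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease test.
        t, fx = 1.0, f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        while (f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope
               and t > 1e-12):
            t *= 0.5
        s = [t * pi for pi in p]
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hi for yi, hi in zip(y, Hy))
            # Expanded form of H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (rho * rho * yHy + rho) * s[i] * s[j] \
                        - rho * (s[i] * Hy[j] + Hy[i] * s[j])
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    print(bfgs(toy_objective, toy_gradient, [0.0, 0.0]))
```

The optimizer only ever sees the objective through `f` and `grad`, so an incomplete gradient (for example, one that drops terms near visibility changes) would directly distort both the search direction and the curvature estimate.
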
Abstract: Human motion tracking is an important problem in computer vision. Most prior approaches have
concentrated on efficient inference algorithms and prior motion models; however, few can explicitly account for physical plausibility of recovered
motion. The primary purpose of this work is to enforce physical plausibility in the tracking of a single articulated human subject. Towards this end,
we propose a full body 3D physical simulation-based prior that explicitly incorporates motion control and dynamics into the Bayesian filtering
framework. We consider the human's motion to be generated by a "control loop". In this control loop, Newtonian physics approximates the rigid-body
motion dynamics of the human and the environment through the application and integration of forces. Collisions generate interaction forces to prevent
physically impossible hypotheses. This allows us to properly model human motion dynamics, ground contact and environment interactions. For efficient
inference in the resulting high-dimensional state space, we introduce an exemplar-based control strategy to reduce the effective search space. As a
result we are able to recover the physically plausible kinematic and dynamic state of the body from monocular and multi-view imagery. We show, both
quantitatively and qualitatively, that our approach performs favorably with respect to standard Bayesian filtering methods.
Abstract: We propose an algorithm for learning the semantics of a (motion) verb from videos depicting the action expressed by the verb, paired with sentences describing
the action participants and their roles. Acknowledging that commonalities among example videos may not exist at the level of the input features, our approximation algorithm efficiently searches the space of more abstract features
for a common solution. We test our algorithm by using it to learn the semantics of a sample set of verbs; results demonstrate the usefulness of the proposed framework, while identifying directions for further improvement.
Abstract: In order to determine the transformation that best aligns two volumetric images, traditional registration methods automatically search for salient features, such as points, lines and surfaces, evenly throughout the image. However, the performance of many of these algorithms, measured over a particular object of interest, degrades when the appearance of that object changes across the image pair, for example through varying color distributions or shape deformation, or when the position of the object changes relative to the background. *Intuitively, this can be explained because the (varying) features from the object of interest are outweighed by the (more consistent) background ones.* This talk will describe a registration method that focuses on an object of interest by building a model of it, which is then used to identify corresponding features. An initial registration guess is also provided, so that an iterative optimization technique can be used to refine the segmentation result.
Abstract: Image registration is the task of aligning two or more images so that their content corresponds on a pixel-to-pixel basis. In this two-part
talk, I will discuss two of my recent projects in automatic image registration.
Algorithms for 3D pattern discovery and their applications in Structural Molecular Biology and Computer Aided Drug Design
Haim J. Wolfson
Abstract: We shall present efficient algorithms for 3D spatial pattern discovery. We shall focus on algorithms for the partial alignment of flexible
structures (articulated objects) as well as simultaneous alignment of multiple rigid and flexible structures (3D pattern discovery). We shall demonstrate the algorithm performance for the tasks of multiple
alignment of protein structures in the rigid case as well as the detection of pharmacophores of a set of drugs in the flexible case.
Abstract: Classical methods for measuring image motion by computer have concentrated on
the cases of optical flow in which the motion field is continuous, or layered motion in which the motion field is made up of a small number of depth planes.
Here we introduce a third natural category which we call optical snow. Optical snow arises in many natural situations such as camera motion in a
highly cluttered 3-D scene, or a passive observer watching a snowfall. Optical snow yields dense motion parallax with depth discontinuities occurring
near all image points. As such, constraints on smoothness or even smoothness in layers do not apply.
Abstract: In my presentation, I will discuss how a number of Computer Vision challenges can be cast as problems of energy minimization. In particular, I will present optimization methods that make it possible to segment moving objects in image sequences, to detect obstacles in traffic videos, to reconstruct 3D shapes from a collection of 2D images and to track familiar shapes (for example walking people or 3D heart models) in videos. I will detail how the respective cost functionals can be minimized both by continuous (PDE and level set methods) and by discrete (graph theoretic) methods.
Abstract: This work describes a technique for dynamically reconstructing the shape of an object from motion under orthography. This technique represents the shape as a 3D triangular mesh, and the mesh grows when previously occluded parts of the object become visible. The shape of the existing structure is also continually updated as the mesh is tracked over time. The shape is reconstructed without using factorization techniques, and the reconstructed shape is in turn used to estimate the rigid motion parameters for a given frame. As a result, the shape estimation and pose estimation steps go hand-in-hand to formulate a unifying framework for tracking and structure from motion. Current results show the potential of this technique, and more powerful representations of the structure, such as appearance, shading and deformation information, can be added to this framework in future work.
Abstract: The objective of this work is to classify texture from a single image under unknown lighting conditions. The current successful approach to this
task is to treat it as a statistical learning problem and learn a classifier from a set of training images, but this requires a sufficient
number and variety of training images. We show that the number of training images required can be drastically reduced (to as few as three or four) by
synthesizing additional training data using photometric stereo. We demonstrate the method on two standard texture databases, PhoTex and ALOT.
Despite the limitations of photometric stereo, the resulting classification performance surpasses state-of-the-art results.
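
To make the idea of synthesizing training data with photometric stereo concrete, the following is a minimal per-pixel sketch, not the speaker's method. It assumes a Lambertian surface, exactly three known distant light directions and no shadows or highlights. The names `LIGHTS`, `recover_scaled_normal` and `relight` are hypothetical, as are all the numbers. Under these assumptions each observed intensity is the dot product of a light direction with the albedo-scaled normal, so three images give a 3x3 linear system per pixel, and the recovered vector can be re-rendered under a new light.

```python
# Three assumed unit light directions, one per training image.
LIGHTS = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.8],
    [0.0, 0.6, 0.8],
]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def recover_scaled_normal(lights, intensities):
    """Solve lights * g = intensities for g (albedo times normal) by Cramer's rule."""
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []
    for col in range(3):
        m = [row[:] for row in lights]
        for r in range(3):
            m[r][col] = intensities[r]  # replace one column with the observations
        g.append(det3(m) / d)
    return g


def relight(g, new_light):
    """Lambertian intensity of the pixel under a new light, clamped at zero."""
    return max(0.0, sum(a * b for a, b in zip(g, new_light)))


if __name__ == "__main__":
    # Intensities of one pixel in the three training images (made-up values).
    g = recover_scaled_normal(LIGHTS, [0.4, 0.38, 0.44])
    # Synthesize the same pixel under a light direction not seen in training.
    print(g, relight(g, [-0.6, 0.0, 0.8]))
```

Applying `relight` to every pixel for many light directions would yield additional synthetic images of the same texture. Real surfaces break the Lambertian and shadow-free assumptions, which is one of the limitations of photometric stereo the abstract refers to.
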