Fall 2004 Talk Descriptions

Nonparametric Colour Indexing: An Alternative to Histograms

Michael Greenspan

Abstract: A method for colour indexing is proposed that is based upon Nonparametric statistical techniques. Nonparametrics compare the ordinal rankings of sample populations, and maintain their significance when the underlying populations are not normally distributed. Lipschitz embedding is first used to generate sets of scalars that combine all colour channel information. The Moses test of dispersion followed by the Wilcoxon test of central tendency is then applied. The method has been implemented and compared to 8 different histogram similarity metrics under 4 different colour space mappings. The recognition accuracy of the Nonparametric method compares favourably with the histogram methods, and in some tests outperforms the best histogram methods on standard databases.
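
As a rough illustration of how a rank-based comparison works, the sketch below computes a Wilcoxon rank-sum z-score for two sets of scalars. It assumes the two-sample (rank-sum) form of the Wilcoxon test and uses the normal approximation without a tie correction; the function names are hypothetical, and the Lipschitz embedding and Moses test steps are not shown.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(sample_a, sample_b):
    """Normal-approximation z-score of the rank sum of sample_a.

    Assumption: no tie correction is applied to the variance.
    """
    n, m = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n])                 # rank sum of the first sample
    mean_w = n * (n + m + 1) / 2       # expected rank sum under H0
    var_w = n * m * (n + m + 1) / 12   # variance of the rank sum under H0
    return (w - mean_w) / math.sqrt(var_w)
```

A z-score near zero suggests the two sets of scalars share a central tendency; only the ordering of the values matters, not their distribution.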

Statistical Cue Integration in Human and Machine Vision

James Elder

Abstract: I will argue that reliable visual inference in natural scenes requires good statistical models and methods for integrating multiple weak but complementary visual cues. As examples, I will discuss problems in contour processing and human tracking.

Multimodal interaction in an augmented reality scenario

Ingo Bax

Abstract: We describe an augmented reality system designed for online acquisition of visual knowledge and retrieval of memorized objects. The system relies on a head mounted camera and display, which allow the user to view the environment together with overlaid augmentations by the system. In this setup, communication by hand gestures and speech is mandatory as common input devices like mouse and keyboard are not available. Using gesture and speech, basically three types of tasks must be handled: (i) Communication with the system about the environment, in particular, directing attention towards objects and commanding the memorization of sample views; (ii) control of system operation, e.g. switching between display modes; and (iii) re-adaptation of the interface itself in case communication becomes unreliable due to changes in external factors, such as illumination conditions. We present an architecture to manage these tasks and describe and evaluate several of its key elements, including modules for pointing gesture recognition, menu control based on gesture and speech, and control strategies to cope with situations when vision becomes unreliable and has to be re-adapted by speech.

Making Latin Manuscripts Searchable
Using Generalized Hidden Markov Models

Yee Whye Teh

Abstract: We describe a method that can make a scanned, handwritten mediaeval Latin manuscript accessible to full text search. A generalized hidden Markov model is fitted, using transcribed Latin to obtain a transition model and one example each of 22 letters to obtain an emission model. We show results for unigram, bigram and trigram models. Our method transcribes 25 pages of Terence with fair accuracy (75% of letters correctly transcribed). Search results are very strong: we use exemplars of variant spellings to demonstrate that the search respects the ink of the document. Furthermore, our model produces fair searches on a document from which we obtained no training data.
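
To make the roles of the transition and emission models concrete, here is a minimal Viterbi decoder for an ordinary (not generalized) hidden Markov model with a bigram transition model. It is a sketch under assumptions: the two-letter alphabet, all probabilities, and the names `viterbi` and `log_emit` are hypothetical and are not taken from the talk.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for the observations."""
    # best[t][s]: log-score of the best path that ends in state s at step t.
    best = [{s: log_start[s] + log_emit(s, observations[0]) for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit(s, obs)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy two-letter alphabet with made-up numbers (assumptions for illustration).
STATES = ["a", "b"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {  # bigram model: the letters strongly prefer to alternate
    "a": {"a": math.log(0.1), "b": math.log(0.9)},
    "b": {"a": math.log(0.9), "b": math.log(0.1)},
}


def log_emit(letter, glyph):
    """Hypothetical emission model: a glyph looks like its letter 80% of the time."""
    return math.log(0.8 if letter == glyph else 0.2)


decoded = viterbi(["a", "a", "a"], STATES, LOG_START, LOG_TRANS, log_emit)
```

In this toy setup `decoded` is `["a", "b", "a"]`: the transition model outweighs the ambiguous middle glyph, which is the kind of correction a language model contributes during transcription.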

Constrained Deterministic 3D body tracking
and Style-based Motion Synthesis

Raquel Urtasun

Abstract: There has been much effort invested in increasing the robustness of human body tracking, mainly by incorporating motion models. Most approaches are probabilistic in nature and seek to avoid becoming trapped in local minima by considering multiple hypotheses, which typically requires exponentially large amounts of computation as the number of degrees of freedom increases.

By contrast, we'll present two different approaches to constrain the search space. The first one uses temporal motion models based on PCA to formulate the tracking problem as one of minimizing differentiable objective functions. The differential structure of these functions is rich enough to yield good convergence properties using a deterministic optimization scheme at a much reduced computational cost. Such an approach could be applied to monocular and multiview sequences and to different activities (walking, running, etc.). Furthermore, the recovered coefficients can be used to identify subjects by gait analysis.

The second approach increases the reliability of existing motion tracking algorithms by imposing limits on the underlying hierarchical joint structures in a way that is true to life. Unlike most existing approaches, we explicitly represent dependencies between joints and derive these limits from actual experimental data. Each set of valid positions is bounded by an implicit surface and we handle hierarchical dependencies by representing the space of valid configurations for a child joint as a function of the position of its parent joint. This representation provides us with a metric in the space of rotations that readily lets us determine whether a posture is valid or not.

On the other hand, representing motions as linear sums of principal components has become a widely accepted animation technique. While powerful, the simplest version of this approach is not particularly well suited to modeling the specific style of an individual whose motion had not yet been recorded when building the database: it would take an expert to adjust the PCA weights to obtain a motion style that is indistinguishable from that individual's. Consequently, when realism is required, current practice is to perform a full motion capture session each time a new person must be considered. We extend the PCA approach so that this requirement can be drastically reduced: for whole classes of cyclic and non-cyclic motions such as walking, running, or jumping, it is enough to observe the newcomer moving only once at a particular speed or jumping a particular distance, using either an optical motion capture system or a simple pair of synchronized video cameras. This one observation is used to compute a set of principal component weights that best approximates the motion and to extrapolate, in real time, realistic animations of the same person walking or running at different speeds, or jumping a different distance.
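
The idea of describing a motion by principal component weights can be sketched as follows. This is only an illustration under assumptions: a motion is flattened into a plain vector, the components are assumed orthonormal (so the least-squares weights reduce to dot products), and the names `project` and `reconstruct` are hypothetical. The extrapolation to other speeds or distances described above is not shown.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(observation, mean, components):
    """Weights that best approximate the observation in the PCA basis.

    Assumption: the components are orthonormal, so the least-squares
    solution is the dot product of each component with the centered data.
    """
    centered = [x - m for x, m in zip(observation, mean)]
    return [dot(c, centered) for c in components]


def reconstruct(weights, mean, components):
    """Rebuild a motion vector as the mean plus a weighted sum of components."""
    motion = list(mean)
    for w, c in zip(weights, components):
        for i, ci in enumerate(c):
            motion[i] += w * ci
    return motion


# Tiny made-up example: 3-dimensional "motions" and two components.
mean_motion = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = project([3.0, 0.0, 5.0], mean_motion, basis)
approximation = reconstruct(weights, mean_motion, basis)
```

Here `weights` is `[2.0, -1.0]` and `approximation` is `[3.0, 0.0, 1.0]`; the third coordinate falls back to the mean because no component spans it.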

Image-based Water Surface Reconstruction
with Refractive Stereo

Nigel Morris

Abstract: We present a system for reconstructing water surfaces using an indirect refractive stereo reconstruction method. Our work builds on previous work on image-based water reconstruction that uses single view refractive reconstruction techniques. We combine this approach with a stereo matching algorithm. Depth determination relies upon the refractive disparity of points on a plane below the water. We describe how the location of points on the water surface can be determined by hypothesizing a depth from the refractive disparity of one camera view. Then the second camera view is used to verify the depth. We compare two potential metrics for this matching process. We then present results from our algorithm using both simulated and empirical input, analyzing the results to determine the primary factors that contribute toward accurate surface point determination. We also show how this process can be used to reconstruct sequences of dynamic water and present several result sets.
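
Refractive disparity arises because a viewing ray bends at the water surface according to Snell's law. The sketch below shows only that building block, not the reconstruction algorithm of the talk; the function name `refract` is hypothetical, and the refractive index of 1.33 for water is an assumed typical value.

```python
import math


def refract(direction, normal, eta):
    """Refract a unit ray direction at a surface, following Snell's law.

    direction: unit vector of the incoming ray.
    normal:    unit surface normal pointing toward the incoming side.
    eta:       ratio n1 / n2 of the refractive indices (incoming / outgoing).
    Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    k = eta * cos_i - math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + k * n for d, n in zip(direction, normal))


# A camera ray entering flat water from air at 45 degrees (assumed setup).
AIR_TO_WATER = 1.0 / 1.33
s = math.sqrt(0.5)
bent = refract((s, 0.0, -s), (0.0, 0.0, 1.0), AIR_TO_WATER)
```

The ray bends toward the normal on entering the water, so a point on the plane below appears displaced relative to where it would be seen without water; that displacement depends on both the surface height and its normal.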

Object Recognition in Cluttered Scenes
using Shock Graphs

Aurelie Bataille

Abstract: Shock graphs have emerged as a powerful generic 2-D shape representation, which decomposes a 2-D silhouette into a set of qualitatively defined parts. Although much progress has been made in both indexing and matching of shock graphs, most approaches typically assume that the silhouette has been correctly segmented. Although invariant to minor occlusion, the topology of a shock graph changes considerably in the presence of significant region under- or over-segmentation. In this talk, we present work in progress on a framework for shock graph-based object recognition in less contrived scenes, i.e., scenes in which object silhouettes cannot be readily extracted. The approach consists of two steps, beginning with a construction of a region scale-space, in which coarsely segmented regions point to component regions segmented at finer scales. Given an object hypothesis for a given region, we traverse the resulting segmentation tree, using the hypothesis to guide a region grouping process that improves the hypothesis. The result represents the best subset of regions, possibly spanning multiple scales, that matches a given object model. However, the possible region boundaries may not align with the actual object boundary. In the second step, the region-model correspondence is used to initialize an active skeleton that includes shock graph-based energy terms, allowing the contour to fine tune the region shape while deforming under a stricter set of model constraints. Very preliminary results will be presented.

Landmark Selection for Vision-Based Navigation

Pablo Sala

Abstract: Recent work in the object recognition community has yielded a class of interest point-based features that are stable under significant changes in scale, viewpoint, and illumination, making them ideally suited to landmark-based navigation. Although many such features may be visible in a given view of the robot's environment, only a few such features are necessary to estimate the robot's position and orientation. In this work, we address the problem of automatically selecting, from the entire set of features visible in the robot's environment, the minimum (optimal) set by which the robot can navigate its environment. Specifically, we decompose the world into a small number of maximally sized regions such that at each position in a given region, the same small set of features is visible. We introduce a novel graph theoretic formulation of the problem and prove that it is NP-complete. Next, we introduce a number of approximation algorithms and evaluate them on both synthetic and real data.

Joint work with Sven Dickinson, Robert Sim and Ali Shokoufandeh.
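
For intuition about why approximation algorithms are needed, the following sketch solves a simplified stand-in problem with a greedy heuristic: pick few features so that every position sees at least `k` of the chosen ones. This is not the graph-theoretic formulation or any of the algorithms from the talk; the function name, the `visible` map, and the data are all hypothetical.

```python
def greedy_landmarks(visible, k=1):
    """Greedily choose features until each position sees at least k of them.

    visible maps a position to the set of feature ids visible from it.
    Positions that see fewer than k features only require all they can see.
    """
    need = {p: min(k, len(fs)) for p, fs in visible.items()}
    features = set().union(*visible.values()) if visible else set()
    chosen = []

    def gain(feature):
        # Number of still-unsatisfied positions that this feature would help.
        return sum(1 for p, fs in visible.items() if feature in fs and need[p] > 0)

    while any(n > 0 for n in need.values()):
        # Sorting makes tie-breaking deterministic (first id in sorted order).
        best = max(sorted(features - set(chosen)), key=gain)
        chosen.append(best)
        for p, fs in visible.items():
            if best in fs and need[p] > 0:
                need[p] -= 1
    return chosen


# Made-up visibility data: three positions and three features.
visibility = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}
selected = greedy_landmarks(visibility, k=1)
```

With this data `selected` is `["b", "c"]`. A greedy choice like this is fast but carries no optimality guarantee for the exact problem.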

Page last modified on Wednesday, January 05, 2005