Fall 2003 Talk Descriptions
Indexing and Matching for View-Based 3-D Object Recognition Using Shock Graphs
Abstract: A shock graph is a shape abstraction that decomposes a shape into hierarchically organized primitive parts. In this talk, I will propose a novel shock graph computation algorithm that yields stable graphs under noise and shape deformation due to viewpoint changes. Next, I will show an extension to the indexing and matching framework for hierarchical structures introduced by Shokoufandeh et al., along with its application to the problem of matching shock graphs representing views of 3-D objects. The improved indexing performance is based on a vote accumulation algorithm that efficiently solves a multiple one-to-one assignment of votes. In turn, the matching algorithm is extended by ensuring the satisfaction of all the hierarchical constraints encoded in the graphs. I will show that the proposed integrated framework is effective in recognizing object shapes and in estimating the pose of their corresponding 3-D object models. I will show recognition results for a database of 2688 object views.
Learning Multiscale Random Fields for Image Classification
Abstract: We propose an approach to learning features for labelling images, in which each pixel is assigned to one of a predefined set of labels. The probabilistic framework combines the outputs of several components. The components differ in the information they encode, as some focus on the image-label mapping while others focus solely on patterns within the label field, and in their scale, as some focus on fine-resolution features while others focus on coarser, more global structure that includes contextual information. A supervised version of the contrastive divergence algorithm is used to train the model parameters. We demonstrate performance on both synthetic and real-world images.
Many-to-Many Matching of Scale-Space Feature Hierarchies using Metric Embedding
Abstract: Scale-space feature hierarchies can be conveniently represented as graphs, in which edges are directed from coarser features to finer features. Consequently, feature matching (or view-based object matching) can be formulated as graph matching. Most approaches to graph matching assume a one-to-one correspondence between nodes (features) which, due to noise, scale discretization, and feature extraction errors, is overly restrictive. We present a framework for the many-to-many matching of multi-scale feature hierarchies, in which features and their relations are captured in a vertex-labeled, edge-weighted graph. The matching algorithm is based on a metric-tree representation of labeled graphs and their low-distortion metric embedding into normed vector spaces. To compute the distance between two sets of embedded, weighted vectors, we use the Earth Mover's Distance under transformation. To demonstrate the approach, we target the domain of multi-scale, qualitative shape description, in which an image is decomposed into a set of blobs and ridges with automatic scale selection. We conduct an extensive set of view-based matching trials, and compare the results favorably to matching under a one-to-one assumption.
(This is joint work with M. Demirci, S. Dickinson, Y. Keselman, and L. Bretzner.)
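As a rough illustration of the distance mentioned in the abstract, the following minimal Python sketch computes the Earth Mover's Distance for a simplified special case: two point sets of equal size in which every point carries the same weight. In that case the optimal flow is a one-to-one pairing, so a brute-force search over pairings suffices for small sets. The function name `emd_uniform` is hypothetical, and the sketch omits both the general weighted case and the search over transformations described in the talk.

```python
from itertools import permutations
from math import dist


def emd_uniform(points_a, points_b):
    """Earth Mover's Distance between two equal-size point sets whose
    points all carry the same weight (1/n each).

    Assumption for this sketch: with uniform weights the optimal flow is
    a permutation, so we try every pairing. This is only practical for
    small n; the general weighted case needs a transportation solver.
    """
    n = len(points_a)
    if n == 0 or n != len(points_b):
        raise ValueError("point sets must be non-empty and of equal size")
    # Total ground distance of the cheapest one-to-one pairing.
    best_total = min(
        sum(dist(points_a[i], points_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    # Each point moves mass 1/n, so the EMD is the mean pairing distance.
    return best_total / n


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0)]
    b = [(0.0, 1.0), (1.0, 1.0)]
    print(emd_uniform(a, b))
```

With unequal weights, mass from one feature may be split across several features in the other set, which is what makes the distance suitable for many-to-many matching.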
On Visual Maps and their Automatic Construction
Abstract: Exploring and mapping an environment are important tasks for robotic autonomy. Current approaches to these problems focus on learning geometric models of the world using range sensors and exploration heuristics. In this work we present the concept of the visual map, a representation of the visual structure of the environment, as well as a framework for learning this structure. Such maps are useful for a robot equipped with a camera to navigate and localize, and can also serve as a useful tool for visualization and virtual environment construction.
In the first half of this work we develop the map-learning framework, including how visual scene features are initially selected, tracked and evaluated, and how they can subsequently be used for robot pose estimation, navigation and scene reconstruction. We take a probabilistic approach to these tasks and present experimental results demonstrating the utility of the framework.
An important consideration when mapping an environment is what priors, if any, are necessary for constructing an accurate map. The second half of our work poses two questions. First, how can visual maps be constructed using only limited prior information about the exploratory trajectory? Second, how does the exploratory trajectory influence the accuracy of the map? We approach these problems empirically and present experimental results illustrating how to solve them.
Robust Model-Free Tracking and Reconstruction of Non-Rigid 3D Shape
Abstract: I will present robust algorithms for estimating non-rigid motion in video sequences. I will first survey recent methods for tracking and reconstruction from video by enforcing global structure (such as rank constraints) on the tracking. These methods assume color constancy in the neighborhood of each tracked feature, an assumption that is violated by occlusions, deformations, lighting changes, and other effects. Our method identifies outliers while solving for flow. This allows us to obtain high-quality tracking from difficult sequences, even when there is no single "reference frame" in which all tracks are visible.
I will then describe a novel non-rigid structure-from-motion algorithm that learns a probability distribution over deformation, allowing it to robustly handle missing data. Combining this with the robust tracking method allows us to robustly reconstruct non-rigid models from video. Viewing the deformation basis as a latent variable yields a formulation closely related to factor analysis; adding dynamics yields a form of linear dynamical system.
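For readers unfamiliar with the two models named above, their generic textbook forms are sketched below. All symbols are assumptions chosen for illustration and are not taken from the talk: $x_t$ is an observation at frame $t$, $z_t$ the latent coefficients, $\mu$ a mean, $\Lambda$ a loading (basis) matrix, $\Psi$ a diagonal noise covariance, and $A$, $Q$ the transition matrix and process noise covariance.

```latex
% Factor analysis: each observation is a linear function of a
% Gaussian latent vector plus independent (diagonal) noise.
\begin{align}
  x_t &= \mu + \Lambda z_t + \epsilon_t,
      & z_t &\sim \mathcal{N}(0, I),
      & \epsilon_t &\sim \mathcal{N}(0, \Psi) \\
% Adding first-order dynamics on the latent vector gives a
% linear dynamical system.
  z_t &= A z_{t-1} + \eta_t,
      & \eta_t &\sim \mathcal{N}(0, Q)
\end{align}
```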
Time permitting, I will also describe a way to explicitly model lighting variations, thereby unifying multi-view stereo, structure-from-motion, and photometric stereo. When solved independently, these techniques have a number of ambiguities (e.g. tracking textureless regions) that are complementary; combining these methods resolves these ambiguities, and yields high-quality tracking and shape reconstruction from video under variable illumination.
Joint work with Chris Bregler, Brian Curless, Steve Seitz, Lorenzo Torresani, and Li Zhang.
3D Scanning of Glass, Mirrors and Liquids by Indirect Projection
Abstract: Research on 3D photography has focused almost exclusively on objects that mostly scatter incident light. As a result, little is known about how to create 3D scans of refractive or mirror-like objects such as mirrors, lenses, glass ornaments, and liquids. I will talk about some of the geometry behind this family of reconstruction problems and about the algorithms that one can use to solve them. These algorithms rely on viewing known 2D patterns "indirectly", via the objects we want to scan. Examples of indirect viewing include taking a picture of a known pattern by pointing a camera toward the pattern's reflection in a mirror instead of the pattern itself; or placing a glass object between a pattern and one or more cameras, and taking pictures. Possible graphics applications of this work include image-based rendering, environment matting, development of new scanning technologies, and physical simulation of water or other liquids.
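The mirror example of indirect viewing can be illustrated with a small forward model. The Python sketch below is not the reconstruction algorithm from the talk; it only traces one camera ray to a planar mirror, reflects it with the standard law r = d - 2(d.n)n, and finds where the reflected ray meets a planar pattern. The function names and the planar-mirror setup are assumptions made for illustration. A scanning method would work in the opposite direction, inferring the unknown surface from which pattern point each pixel observes.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def reflect(d, n):
    """Reflect direction d about a unit surface normal n: r = d - 2(d.n)n."""
    k = 2 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def intersect_plane(origin, direction, plane_point, plane_normal):
    """Point where a ray meets a plane, or None if it is parallel
    to the plane or the plane lies behind the ray origin."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-12:
        return None
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(offset, plane_normal) / denom
    if t <= 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


def pattern_point_seen(cam, ray_dir, mirror_pt, mirror_n, pattern_pt, pattern_n):
    """Pattern point observed along one camera ray via a planar mirror
    (hypothetical helper; mirror_n must be a unit vector)."""
    hit = intersect_plane(cam, ray_dir, mirror_pt, mirror_n)
    if hit is None:
        return None
    return intersect_plane(hit, reflect(ray_dir, mirror_n), pattern_pt, pattern_n)


if __name__ == "__main__":
    # Mirror is the plane z = 0; the pattern lies in the plane z = 2.
    seen = pattern_point_seen(
        cam=(0, 0, 2), ray_dir=(1, 0, -1),
        mirror_pt=(0, 0, 0), mirror_n=(0, 0, 1),
        pattern_pt=(0, 0, 2), pattern_n=(0, 0, 1),
    )
    print(seen)
```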
Evolution ER1: Inexpensive and easy-to-use robot for research
Abstract: In this demo, we will start by explaining the motivation behind obtaining this robot platform. Modularity, convenience, upgradeable flexibility, and accompanying software are some of the strengths of the Evolution ER1 robot.
We will then demonstrate some of the features of the robot like vision-based recognition and tracking, collision detection and avoidance, voice activation, and behavior-based activity.
Finally, we will discuss how this robot can be useful for both the vision and the cognitive robotics research groups. We will identify some limitations as well as how these might be resolved in the future.
As our experience with the robot is limited, we expect this demonstration to be informal.
Image Similarity by Relative Dynamic Programming
Abstract: One of the most fundamental problems in computer vision is to evaluate the similarity between images of objects. Although the problem has attracted much research effort from computer scientists and psychologists, it does not at present have a satisfactory solution comparable to human abilities. We developed a generic measure of image similarity based on the combination of overlapping patches. We present a method to combine several sources of similarity assessments into a single score, and derive an algorithm that is robust to small deformations of parts in various positions and scales. Our relative dynamic programming algorithm is a variant of dynamic programming that is not inherently one-dimensional, and its scores are on a relative scale.
Joint work with Shimon Ullman.

Page last modified on Saturday, November 20, 2004