Winter 2006 Talk Descriptions

Hierarchies relating Topology and Geometry

Walter Kropatsch

Abstract: Cognitive vision has to represent, reason about, and learn about the objects in its environment that it must manipulate and react to. Some of these objects, such as humans, are deformable and cannot be described in simple geometric terms. In many cases they are composed of several pieces forming a 'structured subset of ℝⁿ or ℤⁿ'. We introduce candidate topological representations for such structured objects: plane graphs, combinatorial maps, and generalized maps. They capture abstract spatial relations derived from geometry and enable reconstruction by attributing the relations with, e.g., coordinates. In addition, they make it possible to combine topology and geometry in a hierarchical framework: irregular (graph) pyramids. The basic operations used to construct these hierarchies are edge contraction and edge removal. We show results of using them to hold a whole set of segmentations of an image, which enables reasoning and action planning at various levels of detail, down to a single pixel, in a homogeneous way. We further speculate that the higher levels map the inherent structure of objects and can be used to integrate (and 'learn') specific object properties over time by up-projecting individual measurements. The hierarchies are constructed so that the amount of data shrinks by a factor greater than 1 at each higher level, while important properties such as connectivity and inclusion are preserved. We conclude with a few related research topics.

Vision-based robotics: Building a curious autonomous explorer

Robert Sim

Abstract: Autonomous mobile robot systems have an important role to play in a wide variety of application domains. A key component of autonomy is the ability to explore an unknown environment and construct a representation that a robotic agent can use to localize, navigate, and reason about the world.
In this talk I will present results on the automatic construction of visual representations. First, I will present a flexible architecture for real-time, vision-based mapping of an unknown environment, whose main goal is robust large-scale mapping. We employ a Rao-Blackwellised particle filter to model a posterior probability distribution over robot trajectories and possible maps, and I will present a variety of robust methods for preventing filter divergence and ensuring accuracy. Next, I will discuss recent progress on the problem of autonomous robotic exploration. I will consider issues in choosing optimality criteria for evaluating actions and in using extended planning horizons, so that an exploring robot can demonstrate an emergent sense of curiosity. The talk will conclude with results demonstrating how my work on these two problems has been combined into the world's first fully autonomous, fully vision-based mapping robot.

MRF's for MRI's: Bayesian Reconstruction of MR Images via Graph Cuts

Ramin Zabih

Abstract: Markov Random Fields (MRFs) are a very effective way to impose spatial smoothness in computer vision. I will describe an application of MRFs to a non-traditional but important problem in medical imaging: the reconstruction of MR images from raw Fourier data. This can be formulated as a linear inverse problem, where the goal is to find a spatially smooth solution while permitting discontinuities. Although it is easy to apply MRFs to MR reconstruction, the resulting energy minimization problem poses some interesting challenges: it lies outside the class of energy functions that can be straightforwardly minimized with graph cuts. I will show how graph cuts can nonetheless be adapted to solve this problem, and present some extremely promising preliminary results. Joint work with Ashish Raj and Gurmeet Singh.
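A minimal sketch of the kind of energy described above, assuming a 1D image `x`, a generic linear forward operator `A` standing in for the Fourier encoding (complex entries, e.g. a DFT matrix, also work), and a truncated-linear penalty between neighboring pixels. The function `mrf_energy` and its parameters `lam` and `trunc` are hypothetical, and the talk does not state its exact energy. The code only evaluates E(x) = ||Ax - y||^2 + lam * sum_p min(|x_p - x_{p+1}|, trunc). It does not minimize this energy with graph cuts. Because `A` mixes every pixel into every measurement, the data term couples distant pixels, not just neighbors.

```python
def mrf_energy(x, A, y, lam, trunc):
    """Evaluate E(x) = ||A x - y||^2 + lam * sum_p min(|x_p - x_{p+1}|, trunc).

    Hypothetical illustration: A is a list of rows (the forward model),
    y the measured data, x a candidate 1D image.
    """
    # Data term: squared residual of the linear model A x ~ y.
    # abs() keeps this valid for complex A and y (e.g. Fourier data).
    data = 0.0
    for row, y_i in zip(A, y):
        residual = sum(a * v for a, v in zip(row, x)) - y_i
        data += abs(residual) ** 2
    # Smoothness term: truncated-linear penalty between neighbouring pixels.
    # Capping each jump at `trunc` is what lets sharp edges survive.
    smooth = sum(min(abs(x[p] - x[p + 1]), trunc) for p in range(len(x) - 1))
    return data + lam * smooth


# Identity forward model: the measurements are the pixels themselves.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
y = [0, 0, 5, 5]
step = mrf_energy([0, 0, 5, 5], identity, y, lam=2.0, trunc=1.0)
ramp = mrf_energy([0, 5 / 3, 10 / 3, 5], identity, y, lam=2.0, trunc=1.0)
print(step, ramp)  # the sharp step has the lower energy
```

With a quadratic penalty in place of the truncated one, the step would cost lam * 25 = 50 in the smoothness term alone. This is why truncated penalties are a common way to permit discontinuities.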
From Photohulls to Photoflux Optimization

Yuri Boykov

Abstract: This work was inspired by recent advances in image segmentation, where flux-based functionals significantly improve the alignment of object boundaries. We propose a novel "photoflux" functional for multi-view 3D reconstruction that is closely related to the properties of photohulls. Our photohull prior can be combined with regularization. Thus, this work can be seen as a unification of two major groups of approaches to multi-view stereo: "space carving" and "deformable models". Our framework combines the benefits of both groups: it recovers fine shape details without oversmoothing while handling noise robustly. Photoflux provides an intelligent ballooning force that helps segment thin structures or holes. We propose several versions of photoflux based on global, local, or non-deterministic visibility models. Some forms of photoflux can easily be added to standard regularization techniques. For other forms we propose new optimization methods. We also show that photoflux-maximizing shapes can be seen as regularized Laplacian zero-crossings.

A Novel Solution to an Old Problem: Invariant Recognition by a Neural Network with Fractal-like Connectivity

Moran Furman

Abstract: The human visual system has a remarkable ability to recognize thousands of objects despite changes in their viewing conditions. During ego-motion, for example, static surrounding objects are perceived as unchanging, although their retinal images undergo various transformations, including scaling and translation. A traditional approach to size- and position-invariant recognition is based on pooling from a large number of replicated filters, each selective for a certain feature at a specific position and size. This approach suffers from a number of drawbacks and is limited in its ability to account for physiological findings. This talk will present a novel approach to invariant recognition.
The suggested model is a neural network whose connection-weight patterns resemble fractals. This special type of connectivity is shown to enable computationally efficient invariant feature detection. Computer simulations demonstrate the model's ability to account for a variety of physiological findings, including the response properties and receptive-field shapes of inferotemporal (IT) neurons. In addition, the model avoids the need for parallel adaptation of a large number of replicated connectivity patterns when learning novel features. More broadly, the connectivity patterns introduced in this model generate a unique type of distributed representation that might be relevant to other brain functions as well.

The Improved Fast Gauss Transform and Applications to Vision and Learning

Ramani Duraiswami

Abstract: Evaluating sums of multivariate Gaussian kernels is a key computational task in many problems in computer vision, statistics, and machine learning. The cost of evaluating such sums directly scales as the product of the number of kernel functions and the number of evaluation points. The fast Gauss transform (FGT) reduces the computational complexity of evaluating the sum of N Gaussians at M points in d dimensions from O(MN) to O(M+N). The FGT was first proposed by Greengard and Strain and applied successfully to a few low-dimensional applications in mathematics and physics. However, its performance degrades exponentially with increasing dimensionality, which makes it impractical for dimensions greater than three. We present an extension of the fast Gauss transform, the improved fast Gauss transform (IFGT), that is suitable for higher-dimensional problems. New data structures and an efficient factorization reduce the constant factor to asymptotically polynomial order in the dimension. There are many applications of this new algorithm.
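For reference, the direct evaluation that the FGT and IFGT accelerate takes only a few lines. This sketch assumes the common bandwidth convention G(y_j) = sum_i q_i * exp(-||y_j - x_i||^2 / h^2). The names `direct_gauss_transform`, `weights` (the q_i) and `h` are illustrative. The two nested loops make the O(MN) cost, times d for each distance, explicit. The FGT and IFGT series expansions themselves are not shown.

```python
import math


def direct_gauss_transform(sources, weights, targets, h):
    """Return [G(y) for y in targets], G(y) = sum_i q_i * exp(-||y - x_i||^2 / h^2).

    sources: N points x_i, weights: N coefficients q_i,
    targets: M points y_j, every point being a tuple of length d.
    """
    values = []
    for y in targets:                          # M evaluation points
        total = 0.0
        for x, q in zip(sources, weights):     # N Gaussians per point
            dist2 = sum((yk - xk) ** 2 for yk, xk in zip(y, x))  # O(d)
            total += q * math.exp(-dist2 / (h * h))
        values.append(total)
    return values


sources = [(0.0, 0.0), (1.0, 0.0)]
weights = [1.0, 2.0]
print(direct_gauss_transform(sources, weights, [(0.0, 0.0)], h=1.0))
# 1 + 2*exp(-1), about 1.7358
```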
We have applied the IFGT to problems in computer vision (tracking and segmentation via the mean-shift algorithm) and learning (regularized support vector classifiers and Gaussian process regression). In each case, using the IFGT dramatically improves the performance of the algorithm, reducing the asymptotic complexity from O(N^3) or O(N^2) to linear order.

sigmaSLAM: Stereo Vision SLAM Using the Rao-Blackwellised Particle Filter and a Novel Mixture Proposal Distribution

Pantelis Elinas

Abstract: We consider the problem of Simultaneous Localization and Mapping (SLAM) using the Rao-Blackwellised Particle Filter (RBPF) for the class of indoor mobile robots equipped only with stereo vision. We refer to our approach as sigmaSLAM (for stereo-vision SLAM). Our goal is to construct dense metric maps of natural 3D point landmarks for large cyclic environments in the absence of accurate landmark position measurements and motion estimates. Our work differs from other approaches in that landmark estimates are derived from stereo vision and motion estimates are based on visual odometry. We distinguish between landmarks using the Scale Invariant Feature Transform (SIFT). This contrasts with currently popular approaches that rely on motion models derived from odometric hardware and on accurate landmark measurements obtained with laser sensors. Since our approach depends on a particle filter, whose key component is the proposal distribution, we develop and evaluate a novel mixture proposal distribution that allows us to robustly close large loops up to 120 meters long. We validate our approach experimentally on long camera trajectories in challenging environments, processing thousands of images at reasonable frame rates. We demonstrate the robustness of sigmaSLAM in the presence of large changes in environmental illumination and of image blurring.
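A one-dimensional particle filter step can illustrate the role of a mixture proposal. This is a generic sketch, not sigmaSLAM's proposal. It assumes a scalar pose, a Gaussian motion model driven by an odometry estimate `u`, and a direct Gaussian measurement `z` of the pose. All function and parameter names are hypothetical. Each particle is drawn from the motion model with probability `phi` and from an observation-centered Gaussian otherwise. The importance weight divides the target density by the density of the whole mixture, which keeps the weights correct whichever component produced the sample.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a 1D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def mixture_proposal_step(particles, weights, u, z, sigma_motion, sigma_obs, phi, rng):
    """One sampling/weighting step of a 1D particle filter whose proposal is
    phi * motion model + (1 - phi) * observation-centred Gaussian."""
    new_particles, new_weights = [], []
    for x_prev, w_prev in zip(particles, weights):
        motion_mean = x_prev + u
        if rng.random() < phi:
            x = rng.gauss(motion_mean, sigma_motion)  # sample from the motion model
        else:
            x = rng.gauss(z, sigma_obs)               # sample around the measurement
        # Target: likelihood p(z | x) times motion prior p(x | x_prev).
        target = normal_pdf(z, x, sigma_obs) * normal_pdf(x, motion_mean, sigma_motion)
        # Proposal: density of the whole mixture, not just the branch used.
        proposal = (phi * normal_pdf(x, motion_mean, sigma_motion)
                    + (1 - phi) * normal_pdf(x, z, sigma_obs))
        new_particles.append(x)
        new_weights.append(w_prev * target / proposal)
    total = sum(new_weights)  # assumed > 0 for reasonable noise levels
    return new_particles, [w / total for w in new_weights]


rng = random.Random(0)
n = 200
particles, weights = mixture_proposal_step(
    [0.0] * n, [1.0 / n] * n, u=1.0, z=1.2,
    sigma_motion=0.5, sigma_obs=0.3, phi=0.5, rng=rng)
estimate = sum(p * w for p, w in zip(particles, weights))
print(f"weighted pose estimate: {estimate:.2f}")
```

Drawing some particles near the observation is one motivation for such mixtures when the motion estimate is poor. A full RBPF would also update each particle's landmark map and resample.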
In addition, we show that, given a well-localized camera, we can trivially construct 2D occupancy grids that are useful for path planning and exploration. This is joint work with Rob Sim and Jim Little that will be presented at ICRA 2006. The paper can be downloaded at www.cs.ubc.ca/~elinas/publications.html.

Calibrating Distributed Camera Networks

Rich Radke

Abstract: We discuss how to obtain accurate and globally consistent self-calibration of a distributed camera network. In such a network, cameras and processing nodes may be spread over a wide geographical area, with no centralized processor and a limited ability to communicate large amounts of information over long distances. First, we describe how to estimate the vision graph for the network, in which each camera is represented by a node and an edge appears between two nodes if the two cameras jointly image a sufficiently large part of the environment. We propose an algorithm in which each camera independently composes a fixed-length message that is a lossy representation of a subset of its detected features, and broadcasts this "feature digest" to the rest of the network. Each receiving camera decompresses the feature digest to recover approximate feature descriptors, robustly estimates the epipolar geometry to reject outliers and grow additional matches, and decides whether sufficient evidence exists to form a vision graph edge. Second, we present a distributed camera calibration algorithm based on belief propagation, in which each camera node communicates only with its neighbors in the vision graph. The natural geometry of the system and the formulation of the estimation problem give rise to statistical dependencies that can be efficiently exploited in a probabilistic framework. The camera calibration problem poses several challenges to information fusion, including missing data, overdetermined parameterizations, and non-aligned coordinate systems.
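The following is a much-simplified sketch of the vision-graph definition. It assumes that each camera's features have already been matched to shared identifiers, so the talk's feature digests and epipolar verification are replaced by plain set intersection. `build_vision_graph` and the threshold `min_shared`, which stands in for "sufficiently large", are hypothetical.

```python
from itertools import combinations


def build_vision_graph(features, min_shared):
    """Build an undirected vision graph as an adjacency dict.

    features: dict mapping a camera id to the set of (hypothetical)
    feature ids that camera observes. Two cameras are linked when they
    share at least `min_shared` features, a stand-in for "jointly image
    a sufficiently large part of the environment".
    """
    graph = {cam: set() for cam in features}
    for a, b in combinations(sorted(features), 2):
        if len(features[a] & features[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


cameras = {"c1": {1, 2, 3, 4}, "c2": {3, 4, 5}, "c3": {9}}
print(build_vision_graph(cameras, min_shared=2))
# {'c1': {'c2'}, 'c2': {'c1'}, 'c3': set()}
```

In the distributed calibration step, `graph[cam]` would be the set of neighbors that a camera exchanges belief-propagation messages with.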
We demonstrate the accurate and consistent performance of the vision graph generation and camera calibration algorithms on a simulated 60-node outdoor camera network.

Image Deblurring in the Presence of Salt-and-Pepper Noise

Leah Bar

Abstract: We consider the problem of image deblurring in the presence of salt-and-pepper noise. Standard image deconvolution algorithms, which are designed for Gaussian noise, do not perform well in this case. Median-type filtering is a common method for removing salt-and-pepper noise. However, deblurring an image that has been preprocessed by median-type filtering is difficult, because the deconvolution stage amplifies the median-induced distortion. We present a unified variational approach to salt-and-pepper noise removal and image deblurring. We formulate an objective functional that represents the goals of deblurring, robustness to noise, and compliance with the piecewise-smooth image model. A modified $L^1$ data-fidelity term integrates deblurring with robustness to outliers. Elements of the Mumford-Shah functional, which favor piecewise-smooth images with simple edge sets, are used for regularization. Promising experimental results are shown for several blur models.

Tracking & Privacy in Surveillance Applications

Shai Avidan

Abstract: This talk will cover two of the surveillance projects I've been involved in over the last year. The first project, termed "Ensemble Tracking", is a general tracking algorithm that models tracking as a binary classification problem, in which the aim is to separate the object from the background. An ensemble of weak classifiers is trained online, and constantly updated, to maintain a strong classifier that can correctly distinguish the object from the background. The second project, termed "blind vision", combines methods from Secure Multi-Party Computation and computer vision.
It deals with two parties, Alice and Bob, who want to cooperate to achieve a common goal without revealing their private information to each other. In our case, Alice owns surveillance images and Bob owns an object detection algorithm. I will show how Bob can detect objects in Alice's images without learning anything about the images, not even the result of his own object detection algorithm. Alice will learn nothing about Bob's algorithm other than a binary answer to her query.
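The Ensemble Tracking idea of combining weak classifiers into a strong one can be sketched with discrete AdaBoost. This weighting scheme is an assumption here, because the abstract does not specify one, and the sketch does not reproduce the online, per-frame updating the abstract mentions. Each weak classifier is a hypothetical threshold test on a scalar pixel feature that returns +1 (object) or -1 (background). Each boosting round picks the weak classifier with the lowest weighted error eps and gives it the weight alpha = 0.5 * ln((1 - eps) / eps). It then re-weights the training pixels so that the next round concentrates on the mistakes. The strong classifier is the sign of the weighted vote.

```python
import math


def stump(threshold, polarity):
    """Hypothetical weak classifier on a scalar pixel feature:
    returns +1 (object) or -1 (background)."""
    return lambda f: polarity if f >= threshold else -polarity


def adaboost(features, labels, pool, rounds):
    """Discrete AdaBoost: returns a list of (alpha, weak_classifier) pairs."""
    n = len(features)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Choose the weak classifier with the lowest weighted error.
        best, best_err = None, float("inf")
        for h in pool:
            err = sum(wi for wi, f, y in zip(w, features, labels) if h(f) != y)
            if err < best_err:
                best, best_err = h, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)  # keep log finite
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        ensemble.append((alpha, best))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * y * best(f)) for wi, f, y in zip(w, features, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def strong_classify(ensemble, f):
    """Sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * h(f) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


features = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # hypothetical per-pixel scores
labels = [-1, -1, -1, 1, 1, 1]               # -1 background, +1 object
pool = [stump(t, p) for t in (0.25, 0.5, 0.75) for p in (1, -1)]
ensemble = adaboost(features, labels, pool, rounds=3)
print([strong_classify(ensemble, f) for f in (0.05, 0.95)])  # [-1, 1]
```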