Fall 2007 Talk Descriptions

Abstract: In my presentation, I will discuss how a number of Computer Vision challenges can be cast as problems of energy minimization. In particular, I will present optimization methods that make it possible to segment moving objects in image sequences, to detect obstacles in traffic videos, to reconstruct 3D shapes from a collection of 2D images and to track familiar shapes (for example walking people or 3D heart models) in videos. I will detail how the respective cost functionals can be minimized both by continuous (PDE and level set methods) and by discrete (graph theoretic) methods.

Abstract: This work describes a technique for dynamically reconstructing the shape of an object from motion under orthography. This technique represents the shape as a 3D triangular mesh, and the mesh grows when previously occluded parts of the object become visible. The shape of the existing structure is also continually updated as the mesh is tracked over time. The shape is reconstructed without using factorization techniques, and the reconstructed shape is in turn used to estimate the rigid motion parameters for a given frame. As a result, the shape estimation and pose estimation steps go hand-in-hand to form a unified framework for tracking and structure from motion. Current results show the potential of this technique, and more powerful representations of the structure, such as appearance, shading and deformation information, can be added to this framework in future work.

Abstract: The objective of this work is to classify texture from a single image under unknown lighting conditions. The current, successful approach to this task is to treat it as a statistical learning problem and learn a classifier from a set of training images, but this requires a sufficient number and variety of training images. We show that the number of training images required can be drastically reduced (to as few as three or four) by synthesizing additional training data using photometric stereo. We demonstrate the method on two standard texture databases, PhoTex and ALOT. Despite the limitations of photometric stereo, the resulting classification performance surpasses state-of-the-art results.

This is joint work with Andrew Zisserman and Jan-Mark Geusebroek.
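
As a rough illustration of how photometric stereo can synthesize extra training data, the sketch below recovers a scaled surface normal for one pixel from three images and re-renders that pixel under a new light. It assumes a Lambertian surface, known light directions and no shadows; none of these assumptions comes from the abstract, and the function names (solve3, recover_scaled_normal, render) are hypothetical, not the authors' implementation.

```python
def solve3(A, b):
    """Solve the 3x3 linear system A x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    solution = []
    for k in range(3):
        # Replace column k of A with b.
        Ak = [[b[r] if c == k else A[r][c] for c in range(3)] for r in range(3)]
        solution.append(det(Ak) / d)
    return solution


def recover_scaled_normal(lights, intensities):
    """Return g = albedo * normal from three (light, intensity) pairs.

    Assumed Lambertian model: intensity = dot(light, g).
    """
    return solve3(lights, intensities)


def render(g, light):
    """Synthesize the pixel intensity under a new light direction."""
    return max(0.0, sum(gi * li for gi, li in zip(g, light)))


# Hypothetical pixel with albedo 0.8 and unit normal (0.6, 0.0, 0.8).
lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
true_g = [0.8 * 0.6, 0.0, 0.8 * 0.8]
intensities = [sum(l * g for l, g in zip(light, true_g)) for light in lights]

g = recover_scaled_normal(lights, intensities)
new_intensity = render(g, [-0.6, 0.0, 0.8])
```

Repeating this per pixel and sweeping the new light direction would produce additional images of the same texture under lighting that was never photographed.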

Abstract: In this talk I will describe three projects that harness the power of variable-aperture photography -- capturing multiple photos by manipulating basic lens controls such as aperture and focus. I will show that by combining such photos, the information encoded in defocus can be exploited to achieve a variety of goals.

First, I will describe a new method for computing highly detailed 3D shape by controlling both the aperture and focus of a lens. This method is particularly well-suited for scenes with high geometric complexity, for which standard reconstruction methods can break down.

Second, I will show that we can exploit "aperture bracketing" -- a one-button operation on most digital SLRs -- to allow refocusing and other effects in post-capture, all with increased dynamic range. To achieve this, we compute a layered scene model that simultaneously accounts for defocus, high dynamic range exposure, and noise in the input images.

Finally, I will talk about our current work on "light-efficient" photography, whose goal is to capture photos with the desired level of defocus in the shortest time possible.

Abstract: The digital photography revolution has greatly facilitated the way in which we take and share pictures. However, it has mostly relied on a rigid imaging model inherited from traditional photography. Computational photography and video go one step further and exploit digital technology to enable arbitrary computation between the light array and the final image or video. Such computation can overcome limitations of the imaging hardware and enable new applications. It can also enable new imaging setups and postprocessing tools that empower users to enhance and interact with their images and videos.

This talk describes new imaging architectures as well as software techniques that leverage computation to facilitate the extraction of information and enhance images. In particular, I will describe the use of a bilateral decomposition of images into a large-scale and a detail component using an edge-preserving approach. I will describe a variety of techniques that build on such decomposition for tone mapping, relighting, style transfer and flash photography. I will also describe a new simple modification of a lens as well as new inference techniques that enable the capture of both depth and a full-resolution image from a single picture.
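
The bilateral decomposition mentioned above can be sketched on a 1D signal. This is a minimal brute-force version under assumed parameters (radius, sigma_space and sigma_range are illustrative names and values), not the speaker's implementation: an edge-preserving bilateral filter yields the large-scale component, and the residual is the detail component.

```python
import math


def bilateral_filter_1d(signal, radius=3, sigma_space=2.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 1D signal.

    Each output sample is a weighted mean of its neighbours. The weight
    falls off with spatial distance and with intensity difference, so
    samples on the far side of a strong edge contribute almost nothing.
    """
    n = len(signal)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2)
                               / (2 * sigma_range ** 2))
            total += w_space * w_range * signal[j]
            weight_sum += w_space * w_range
        smoothed.append(total / weight_sum)
    return smoothed


def decompose(signal, **filter_params):
    """Split a signal into a large-scale (base) and a detail component."""
    base = bilateral_filter_1d(signal, **filter_params)
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


# A step edge with a small ripple: the step should stay in the base,
# the ripple should land in the detail component.
signal = [(0.0 if i < 8 else 1.0) + 0.01 * (-1) ** i for i in range(16)]
base, detail = decompose(signal)
```

Because the two components sum back to the input, an application such as tone mapping can rescale the base while leaving the detail untouched and then recombine them.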

Abstract: We present a new method for reconstructing the exterior surface of a complex transparent scene with inhomogeneous interior (e.g., multiple interfaces, reflective or painted interiors). Our approach involves capturing images of the scene from one or more viewpoints while moving a proximal light source to a 2D or 3D set of positions. This gives a 2D (or 3D) dataset per pixel, called the scatter trace. The key idea of our approach is that even though light transport within a transparent scene’s interior can be exceedingly complex, the scatter trace of each pixel has a highly constrained geometry that (1) reveals the contribution of direct surface reflection, and (2) leads to a simple “scatter-trace stereo” algorithm for computing the local geometry of the exterior surface (depth and surface normals). We present 3D reconstruction results for a variety of scenes that exhibit complex light transport phenomena.

3D shape: its unique place in visual perception

Zygmunt Pizlo

Abstract: The talk will begin with a brief review of the main issues related to 3D shape perception: (i) the lack of learning, (ii) the nature of perceptual representation of 3D shapes, and (iii) the role of surface reconstruction vs. priors in shape recovery. A new model, which recovers a 3D shape from a single 2D image by applying simplicity constraints, will be presented. The following constraints are used: symmetry, planarity, maximal compactness and minimum surface area. The role of these constraints in human vision will be illustrated by results of psychophysical experiments. The new model was tested in simulations involving 3D synthetic shapes. The model's recovery is at least as good as that of human subjects.

Towards Learning Human-Robot-Interaction - what vision can do for us

Joachim Schmidt

Abstract: Humans and robots working and even living together is no longer fiction but fact. A robot with limited interaction abilities can be tolerated in industrial applications, but household robots, robotic systems that support the elderly or disabled, and robot companions must be integrated into our everyday lives. Such systems are exposed to changing scenarios and to unknown persons and behaviors, which makes perception an even bigger challenge. To be accepted as working members of our society, they have to obey the rules of social interaction that are common among humans but still uncommon for most robots. This talk takes up the idea of socially interacting robots, addresses the visual perception of humans, and presents ideas on how to understand and learn human motion and gestures.

Page last modified on Monday, February 11, 2008