Winter 2009 Talk Descriptions

Abstract: We present a method for learning hierarchical class-specific features for recognition. Recently, a greedy layer-wise procedure was proposed to initialize the weights of deep belief networks by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class, and serves as the building block of our hierarchical model. This hierarchical feature detector is a four-layer structure of alternating filtering and maximum subsampling. We learn the feature parameters of the first and third layers by viewing them as separate C-RBMs. The output of our feature extraction hierarchy is fed as input to a discriminative classifier. We demonstrate experimentally that the extracted features are effective for object detection, obtaining performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.
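
The four-layer structure of alternating filtering and maximum subsampling can be sketched in a few lines of Python. This is an illustration only, not the speaker's implementation: the function names (`filter2d`, `max_subsample`, `extract_features`) are hypothetical, each filtering layer uses a single hand-set kernel rather than a bank of filters learned as a C-RBM, and the 2x2 pooling window is an assumption.

```python
def filter2d(image, kernel):
    """Valid-mode 2-D cross-correlation: one shared kernel slides over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_subsample(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def extract_features(image, kernel1, kernel2, pool=2):
    """Layers 1-4: filter, max-subsample, filter, max-subsample."""
    hidden = max_subsample(filter2d(image, kernel1), pool)
    return max_subsample(filter2d(hidden, kernel2), pool)
```

Because the same kernel is applied at every image position, the number of parameters in a filtering layer does not grow with image size, which is the sense in which weight sharing respects spatial structure. In the model described above, the kernels of layers one and three would be learned rather than fixed, and the final output would be passed to a discriminative classifier.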

Abstract: Most face recognition algorithms use "distance-based" methods: feature vectors are extracted from each face and distances in feature space are compared to determine matches. In this talk I will argue for a fundamentally different approach. We consider each image as having been generated from an underlying cause (a latent identity variable, or LIV). In recognition, we evaluate the probability that two faces have the same underlying cause. Since image generation is noisy, we can never be certain what this cause was, so we integrate (marginalize) over all possible causes. We present examples of identification and verification and show that our approach outperforms equivalent distance-based algorithms. We will also show how to build models that can cope with wide pose and illumination variation and how to leverage these models for creating realistic images of faces.
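
To make the marginalization step concrete, here is a toy one-dimensional sketch. It is an assumption-laden illustration rather than the model from the talk: each observation is taken to be a scalar `x = h + noise`, with the latent identity `h` drawn from a zero-mean Gaussian of variance `var_id` and the noise from a zero-mean Gaussian of variance `var_noise`. Under these assumptions the integral over `h` has a closed form, so the two hypotheses (same identity, different identities) can be compared directly. All names and default values are hypothetical.

```python
import math


def log_lik_same(x1, x2, var_id, var_noise):
    """log p(x1, x2 | same identity), with the shared latent h integrated out.

    Marginalizing h leaves a zero-mean bivariate Gaussian with variance
    var_id + var_noise on the diagonal and covariance var_id off it.
    """
    v = var_id + var_noise
    c = var_id
    det = v * v - c * c
    quad = (v * x1 * x1 - 2.0 * c * x1 * x2 + v * x2 * x2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_lik_diff(x1, x2, var_id, var_noise):
    """log p(x1, x2 | different identities): two independent marginals."""
    v = var_id + var_noise
    return -math.log(2.0 * math.pi) - math.log(v) - 0.5 * (x1 * x1 + x2 * x2) / v


def match_probability(x1, x2, var_id=1.0, var_noise=0.1, prior_same=0.5):
    """Posterior probability that x1 and x2 share one latent identity."""
    a = log_lik_same(x1, x2, var_id, var_noise) + math.log(prior_same)
    b = log_lik_diff(x1, x2, var_id, var_noise) + math.log(1.0 - prior_same)
    m = max(a, b)  # subtract the max before exponentiating for stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)
```

With these defaults, two nearby observations such as `match_probability(1.0, 1.05)` give a posterior above 0.5, while `match_probability(1.0, -1.0)` gives one below 0.5. The decision comes from comparing model likelihoods, not from thresholding a raw distance.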

Abstract: Light sources and cameras are optical duals: sources emit light rays while cameras capture them. This talk will argue that light sources can serve as better cameras in many applications, advancing the state of the art in computer vision. First, by moving a light source instead of a camera, we show how to reconstruct highly intricate shapes like wreaths, corals and tree branches from hundreds of 'views'. Second, we leverage the 'illumination dithering' in the micromirror array of DLP projectors to speed up virtually any active vision algorithm, resulting in high-speed 3D reconstruction, photometric stereo, appearance capture and high-frequency-preserving motion blur. Finally, we show what can be learned about the camera by observing the main outdoor illuminants (sky and sun) in a time-lapse image sequence, and how this can be used for appearance transfer across different scenes.

Bio: Srinivas Narasimhan is an Assistant Professor in the Robotics Institute at Carnegie Mellon University. He received his Master's and Doctoral degrees in Computer Science with distinction from Columbia University in 2000 and 2004, respectively. His research interests are in light transport analysis and computational illumination and imaging for applications in vision, graphics and robotics. His work on vision in bad weather received a Best Paper Honorable Mention award at IEEE CVPR 2000, and his work on medical endoscopy received the Adobe Best Paper award at the 2007 IEEE Workshop on Photometric Analysis in Computer Vision. He received the NSF CAREER award in 2007.

Abstract: We report on a continuing assessment of the imaging performance of an operator-independent breast imaging device based on the principles of acoustic tomography. The data were collected with a clinical prototype at the Karmanos Cancer Institute in Detroit, MI, from patients recruited at our breast center. Tomographic sets of images were constructed from the data and used to form 3-D image stacks corresponding to the volume of the breast. Our techniques generated whole-breast reflection images as well as images of the acoustic parameters of sound speed and attenuation. The combination of these images reveals major breast anatomy, including fat, parenchyma, fibrous stroma and masses. The three types of images are intrinsically co-registered because the reconstructions are performed using a common data set acquired by the prototype. Fusion imaging, utilizing thresholding, is shown to help visualize and characterize masses and to facilitate separation of cancerous from benign masses. These initial results indicate that operator-independent whole-breast imaging and the detection and characterization of cancerous breast masses are feasible using acoustic tomography techniques. Future improvements in image processing, including denoising and segmentation, will be discussed as a next step in improving the clinical performance of the prototype.

(Joint work with Peter Littrup, Cuiping Li, Olsi Rama, Lisa Bey-Knight, Steven Schmidt and Jessica Lupinacci.)

Page last modified on Tuesday, June 16, 2009