Variable-Aperture Photography (Project)
Samuel W. Hasinoff and Kiriakos N. Kutulakos


Publications

Samuel W. Hasinoff and Kiriakos N. Kutulakos, A Layer-Based Restoration Framework for Variable-Aperture Photography. Proc. 11th IEEE International Conference on Computer Vision, ICCV 2007, 8 pp. (DVD proceedings).

Paper abstract

We present variable-aperture photography, a new method for analyzing sets of images captured with different aperture settings, with all other camera parameters fixed. We show that by casting the problem in an image restoration framework, we can simultaneously account for defocus, high dynamic range exposure (HDR), and noise, all of which are confounded according to aperture. Our formulation is based on a layered decomposition of the scene that models occlusion effects in detail. Recovering such a scene representation allows us to adjust the camera parameters in post-capture, to achieve changes in focus setting or depth-of-field—with all results available in HDR. Our method is designed to work with very few input images: we demonstrate results from real sequences obtained using the three-image "aperture bracketing" mode found on consumer digital SLR cameras.
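One way to see why defocus and exposure are confounded by aperture is the textbook thin-lens model. This is an assumption for illustration; the abstract does not state the paper's exact blur model. Light gathered grows with the aperture area, while the diameter of the defocus blur circle grows linearly with the aperture diameter A = f/N. The hypothetical helper `blur_diameter_mm` below computes that diameter on the sensor:

```python
def blur_diameter_mm(focal_length_mm, f_number, focus_dist_mm, object_dist_mm):
    """Thin-lens circle-of-confusion diameter on the sensor, in mm.

    Assumes an ideal thin lens with aperture diameter A = f / N. This is a
    textbook model used for illustration, not necessarily the paper's model.
    """
    if focus_dist_mm <= focal_length_mm:
        raise ValueError("focus distance must exceed the focal length")
    aperture_mm = focal_length_mm / f_number
    # c = A * |S2 - S1| / S2 * f / (S1 - f), with S1 = focus, S2 = object distance
    return (aperture_mm
            * abs(object_dist_mm - focus_dist_mm) / object_dist_mm
            * focal_length_mm / (focus_dist_mm - focal_length_mm))


if __name__ == "__main__":
    # Illustrative values: 85mm lens focused at 2 m, object at 4 m.
    for n in (8, 4, 2):
        print(f"f/{n}: blur diameter {blur_diameter_mm(85, n, 2000, 4000):.3f} mm")
```

Under this model, each two-stop step (halving N) doubles the blur diameter while quadrupling the gathered light, so opening the aperture trades sharpness for signal.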

Supplementary material

analytic gradient calculation

Following the discussion in Appendix A, we provide the full analytic formulas used to compute the gradients of the objective function.

general experimental setup

To test our approach on real data, we captured sequences using a Canon EOS-1Ds Mark II, secured on a tripod, with an 85mm f/1.2L lens set to manual focus. In all our experiments we use the three-image "aperture bracketing" mode set to ±2 stops, and select the shutter speed so that the images are captured at f/8, f/4, and f/2 (yielding relative exposure levels of roughly 1, 4, and 16, respectively). We captured RAW images for increased dynamic range, and demonstrate our results on images downsampled to 500×333 pixels.
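The relative exposure levels quoted above follow from the f-numbers: at a fixed shutter speed, exposure scales with aperture area, i.e. with 1/N², and one stop changes N by a factor of √2. A minimal sketch with hypothetical helper names:

```python
def bracket_f_numbers(base_f_number, stop_spacing, count):
    """F-numbers for an aperture bracket that opens up from base_f_number.

    One stop changes N by sqrt(2), so s stops divide N by 2 ** (s / 2).
    """
    return [base_f_number / 2 ** (stop_spacing * k / 2) for k in range(count)]


def relative_exposures(f_numbers):
    """Exposure of each image relative to the first, at a fixed shutter speed: (N0 / N)^2."""
    n0 = f_numbers[0]
    return [(n0 / n) ** 2 for n in f_numbers]


if __name__ == "__main__":
    stops = bracket_f_numbers(8, 2, 3)   # f/8, f/4, f/2 with +-2 stop spacing
    print(stops, relative_exposures(stops))
```

Marked f-numbers on real lenses are nominal, so measured exposure ratios deviate slightly from these ideal values.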

All the result videos (MPEG-2) below include a side panel with three sliders that help visualize the camera settings used to synthesize each new image. Red zones on the sliders indicate extrapolation:
  1. aperture
    • from narrow to wide
    • ticks indicate the f-stops of the input images (f/8, f/4, f/2)
  2. focus
    • from near to far
    • ticks indicate the estimated relative depths of the scene layers, on a logarithmic scale
  3. exposure
    • from dark to bright
    • ticks indicate exposures corresponding to the input images
    • to indicate tonemapping, the slider is shown spanning its full range

"dumpster" dataset

Outdoor sequence, composed of three layers: a rusty dumpster, a pebbled wall, and a building. The foreground dumpster is relatively dark and nearly in focus.

"portrait" dataset

Indoor sequence, backlit by available light from the window. The nearly focused subject is dark compared to the background buildings, and a very dark, defocused chair sits in the foreground. Because the chair is under-exposed even in the widest-aperture image, artifacts appear at its boundary due to posterization and over-smoothing.
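A rough illustration of why under-exposure causes posterization: an idealized, hypothetical 12-bit linear sensor (not a model of the actual camera) maps a smooth but very dark radiance ramp onto only a handful of distinct codes, while a brighter ramp keeps many levels. The function name and radiance values are illustrative assumptions.

```python
def distinct_levels(radiances, exposure, bit_depth):
    """Count distinct quantized sensor codes produced by a set of scene radiances.

    Idealized sensor (an assumption): code = clip(round(exposure * r * max_code)).
    """
    max_code = 2 ** bit_depth - 1
    codes = {min(max_code, max(0, round(exposure * r * max_code))) for r in radiances}
    return len(codes)


if __name__ == "__main__":
    samples = 1000
    dark_ramp = [0.004 * i / (samples - 1) for i in range(samples)]    # e.g. a shadowed chair
    bright_ramp = [0.5 * i / (samples - 1) for i in range(samples)]    # e.g. a lit background
    print(distinct_levels(dark_ramp, 1.0, 12), distinct_levels(bright_ramp, 1.0, 12))
```

The dark ramp collapses to a few steps, so any smoothing applied afterwards has very little tonal information to work with near the chair's boundary.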

"pillars" dataset

Outdoor sequence, composed of two structures of very different brightness: a dark wall occluded by several bright stone pillars. Note how the method assigns slightly different depths to the two segments containing the gradually sloping background wall. The initial segmentation also misassigns the lower-rightmost portion of the foreground ledge to the background layer, although this is not as noticeable in the synthesized results.

"macro" dataset - failure case

Macro sequence (using a 10mm extension tube), composed of a miniature glass bottle whose inner surface is painted, and a dried bundle of green tea leaves. This is a challenging dataset for several reasons: the defocus is severe outside the very narrow depth of field, the scene contains both smooth and intricate geometry (the bottle and the tea leaves, respectively), and reflections on the glass surface lead to focusing at "virtual" depths. The initial segmentation yields a very coarse decomposition into layers, which the optimization does not improve. The worst resynthesis artifacts occur at layer boundaries: the bright "cracks" visible when refocusing are due to the incorrect segmentation combined with our diffusion-based inpainting algorithm.
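This page does not describe the inpainting algorithm beyond calling it diffusion-based. As a generic illustration only, the sketch below fills unknown pixels by repeatedly averaging their four neighbors (Jacobi iterations on Laplace's equation) while holding known pixels fixed. Because each filled value is a smooth blend of whatever surrounds the hole, a bright pixel wrongly assigned to the hole's boundary bleeds into the fill, which is one plausible route to visible "cracks". The function name and parameters are hypothetical.

```python
def diffusion_inpaint(image, mask, iterations=500):
    """Fill masked pixels by repeated 4-neighbor averaging (Jacobi on Laplace's equation).

    image: list of rows of floats; mask: same shape, True where the value is unknown.
    Known pixels stay fixed; unknown pixels converge to a harmonic interpolation.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                neighbors = [out[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                nxt[y][x] = sum(neighbors) / len(neighbors)
        out = nxt
    return out


if __name__ == "__main__":
    row = [[0.0, 0.0, 0.0, 0.0, 4.0]]
    hole = [[False, True, True, True, False]]
    print(diffusion_inpaint(row, hole)[0])   # approaches a linear ramp between 0 and 4
```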

"lena" dataset - synthetic

This synthetic dataset consists of an HDR version of the 512×512-pixel Lena image, in which we simulate HDR by dividing the image into three vertical bands and artificially exposing each band differently. We decompose the image into layers by assigning a different depth to each of three horizontal bands, and generate the input images by applying the forward image formation model. Finally, we add Gaussian noise to the input images with a standard deviation of 1% of the intensity range.
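The HDR simulation, layer assignment, and noise steps can be sketched as follows. The band gains and depths are hypothetical placeholder values (the page does not give them), images are plain lists of rows with an intensity range of 1.0 assumed, and the forward image formation model itself (per-layer defocus and exposure) is deliberately omitted.

```python
import random


def make_synthetic_hdr(image, band_gains):
    """Scale each vertical band (a range of columns) by its own gain to simulate HDR."""
    h, w = len(image), len(image[0])
    n = len(band_gains)
    return [[image[y][x] * band_gains[x * n // w] for x in range(w)] for y in range(h)]


def depth_labels(height, width, band_depths):
    """Assign one depth per horizontal band (a range of rows), giving a layered depth map."""
    n = len(band_depths)
    return [[band_depths[y * n // height]] * width for y in range(height)]


def add_noise(image, intensity_range=1.0, fraction=0.01, seed=0):
    """Add zero-mean Gaussian noise with std = fraction * intensity_range."""
    rng = random.Random(seed)
    sigma = fraction * intensity_range
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


if __name__ == "__main__":
    base = [[0.5] * 6 for _ in range(3)]
    hdr = make_synthetic_hdr(base, [0.25, 1.0, 4.0])   # hypothetical gains
    depths = depth_labels(3, 6, [1.0, 2.0, 3.0])       # hypothetical relative depths
    noisy = add_noise(hdr)
    print(hdr[0], [row[0] for row in depths])
```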

additional datasets - coming soon

Additional datasets will illustrate (1) how the method copes with smoothly varying depths (i.e., a tilted plane), and (2) how realistic results can be resynthesized even from very inaccurate depths (e.g., a complex object over an untextured background), as in image-based rendering from stereo.