Geri Grolinger


Structural Indexing Using Local Image Features

Object recognition remains one of the most challenging and important problems in computer vision. Today's recognition systems typically extract a sparse set of local features, each of which characterizes, as a vector, a small patch of distinctive image data in a way that is invariant to minor changes in lighting and viewpoint. The most popular such feature, the SIFT (scale-invariant feature transform) feature, specifies the position, scale, orientation, and low-order description of the image data contained in the patch. Features extracted from an image vote for the objects that contain them (a process called indexing), using an efficient nearest-neighbour search algorithm. Objects that receive a significant number of votes are promising candidates for explaining the image. However, since each feature votes independently, two different objects (or candidates) consisting of the same set of features in very different configurations will receive the same number of votes. A costly geometric consistency check must therefore be applied to each model candidate in order to determine the best matching model. As the database grows to contain millions of images, the ambiguity of a single image feature may grow to the point where each feature is a member of a large number of objects, leading to a potentially intractable number of candidates that must be verified. The ambiguity is compounded when the number of object features is small or represents a small fraction of the total number of image features (i.e., the target object is embedded in a cluttered scene). Instead of invoking strong geometric constraints at verification time, my research is exploring ways to incorporate these constraints at indexing time, leading to far fewer candidates that need to be verified.
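
The voting ambiguity described above can be illustrated with a small sketch. It is a toy under stated assumptions, not the method being developed in this research. The two-dimensional descriptors, the `MODELS` database, the `vote` and `layout_matches` functions, and all thresholds are hypothetical names and values chosen for readability. A brute-force search stands in for the efficient nearest-neighbour algorithm, and a comparison of pairwise distances stands in for a real geometric consistency check.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical toy database: each model is a list of (descriptor, (x, y)) pairs.
# Real SIFT descriptors are 128-dimensional; 2-D vectors keep the example small.
# Both models hold the same descriptors, but in different spatial configurations.
MODELS = {
    "object_a": [((0.0, 1.0), (0, 0)), ((1.0, 0.0), (10, 0)), ((1.0, 1.0), (0, 10))],
    "object_b": [((0.0, 1.0), (0, 10)), ((1.0, 0.0), (0, 0)), ((1.0, 1.0), (10, 0))],
}

# Hypothetical query image: object_a shifted by (5, 5), with slightly noisy descriptors.
IMAGE = [((0.0, 0.9), (5, 5)), ((1.0, 0.1), (15, 5)), ((0.9, 1.0), (5, 15))]


def vote(image_features, models, max_dist=0.25):
    """Each image feature votes independently for every model owning its nearest descriptor."""
    votes = Counter()
    for descriptor, _position in image_features:
        # Brute-force nearest-neighbour search over every stored descriptor.
        entries = [
            (dist(descriptor, model_desc), name)
            for name, features in models.items()
            for model_desc, _ in features
        ]
        best = min(d for d, _ in entries)
        if best > max_dist:
            continue  # no sufficiently similar descriptor in the database
        # An ambiguous descriptor votes once for each model that contains it.
        for name in {n for d, n in entries if d == best}:
            votes[name] += 1
    return votes


def layout_matches(image_features, model_features, tol=1.0):
    """Crude stand-in for geometric verification: compare pairwise feature distances."""
    # Pair each image feature with the position of its nearest model descriptor.
    matched = [
        min(model_features, key=lambda f: dist(f[0], desc))[1]
        for desc, _ in image_features
    ]
    image_pos = [pos for _, pos in image_features]
    return all(
        abs(dist(image_pos[i], image_pos[j]) - dist(matched[i], matched[j])) <= tol
        for i, j in combinations(range(len(image_pos)), 2)
    )


if __name__ == "__main__":
    print(vote(IMAGE, MODELS))  # both models receive 3 votes
    for name, features in MODELS.items():
        print(name, layout_matches(IMAGE, features))  # only object_a is consistent
```

In this sketch the vote counts cannot separate the two models; only the per-candidate layout check does. That check has to run once for every candidate that survives voting, which is the cost that incorporating geometric constraints at indexing time aims to reduce.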