David McAllester, Toyota Technological Institute at Chicago
Training Structured Predictors for Novel Loss Functions
As motivation, we consider the PASCAL image segmentation
challenge. Given an image and a target class, such as person, the challenge
is to segment the image into regions occupied by objects in that class
(person foreground) and regions not occupied by that class (non-person
background). At the current state of the art, the lowest pixel error rate is
achieved by predicting all background. However, the challenge is evaluated
with an intersection-over-union score, under which the all-background
prediction scores zero. This raises the question of how to incorporate a
particular loss function into the training of a structured predictor. A
standard approach is to incorporate the desired loss into the structured
hinge loss and observe that, for any loss, the structured hinge loss is an
upper bound on the desired loss. However, this upper bound is quite loose,
and it is far from clear that the structured hinge loss is an appropriate or
useful way to handle the PASCAL evaluation measure.
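The tension between pixel error and intersection over union can be seen with a little arithmetic. The sketch below is a hypothetical toy: a single flattened mask of 100 pixels, 10 of them foreground (the official PASCAL measure pools counts over the test set; here one mask stands in for that). The all-background prediction has the lower pixel error but an IoU of zero, while a noisier prediction that finds most of the object has higher pixel error and a much better IoU.

```python
def pixel_error(y_true, y_pred):
    """Fraction of pixels whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def iou(y_true, y_pred):
    """Intersection over union of the foreground (label 1) sets; 1.0 if both are empty."""
    inter = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    union = sum(1 for t, p in zip(y_true, y_pred) if t or p)
    return inter / union if union else 1.0


# Hypothetical toy mask: 10 foreground pixels followed by 90 background pixels.
y_true = [1] * 10 + [0] * 90

# Predict background everywhere.
all_bg = [0] * 100

# Finds 8 of the 10 object pixels but also marks 12 background pixels as foreground.
noisy = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

print(pixel_error(y_true, all_bg), iou(y_true, all_bg))  # 0.1 0.0
print(pixel_error(y_true, noisy), iou(y_true, noisy))    # 0.14 0.3636...
```

Minimizing pixel error would prefer `all_bg`, while the evaluation measure prefers `noisy` (8/22 versus 0).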
This talk reviews various approaches to this problem and presents a new training algorithm, which we call the good-label-bad-label algorithm. We prove that, in the data-rich regime, the good-label-bad-label algorithm follows the gradient of the training loss, assuming only that we can perform inference in the given graphical model. The algorithm is structurally similar to, but significantly different from, stochastic subgradient descent on the structured hinge loss, which does not follow the loss gradient.
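For reference, the baseline being contrasted here, stochastic subgradient descent on the structured hinge loss, can be sketched as follows. This is the standard technique, not the good-label-bad-label algorithm, whose update is not given in this abstract. Everything concrete is an assumption for illustration: the task loss is 1 − IoU, the score is linear in a hypothetical joint feature map `phi` (the sum of the feature vectors of pixels labelled foreground), and both inference and loss-augmented inference are done by brute-force enumeration over all 2^n labelings, which only works at toy sizes. The hinge value is at least the task loss of the predicted labeling, because the loss-augmented maximum is at least the value at the prediction, whose score is at least that of the true labeling.

```python
from itertools import product


def iou_loss(y_true, y):
    """Task loss 1 - IoU of the foreground sets; 0 when both are empty."""
    inter = sum(1 for t, p in zip(y_true, y) if t and p)
    union = sum(1 for t, p in zip(y_true, y) if t or p)
    return 0.0 if union == 0 else 1.0 - inter / union


def phi(x, y):
    """Hypothetical joint feature map: sum of feature vectors of foreground pixels."""
    out = [0.0] * len(x[0])
    for xi, yi in zip(x, y):
        if yi:
            out = [o + f for o, f in zip(out, xi)]
    return out


def score(w, x, y):
    return sum(wk * fk for wk, fk in zip(w, phi(x, y)))


def predict(w, x):
    """Exact inference by enumerating every labeling (toy sizes only)."""
    return max(product((0, 1), repeat=len(x)), key=lambda y: score(w, x, y))


def loss_augmented_argmax(w, x, y_true):
    """argmax_y [ loss(y_true, y) + score(w, x, y) ], also by enumeration."""
    return max(product((0, 1), repeat=len(x)),
               key=lambda y: iou_loss(y_true, y) + score(w, x, y))


def structured_hinge(w, x, y_true):
    """max_y [loss + score(y)] - score(y_true); upper-bounds loss(y_true, predict(w, x))."""
    y_aug = loss_augmented_argmax(w, x, y_true)
    return iou_loss(y_true, y_aug) + score(w, x, y_aug) - score(w, x, y_true)


def subgradient_step(w, x, y_true, eta=0.1):
    """One stochastic subgradient step: w <- w - eta * (phi(y_aug) - phi(y_true))."""
    y_aug = loss_augmented_argmax(w, x, y_true)
    return [wk - eta * (a - t)
            for wk, a, t in zip(w, phi(x, y_aug), phi(x, y_true))]


# Hypothetical 4-pixel example with 2 features per pixel.
x = [[1.0, 0.2], [0.9, 0.1], [-0.5, 1.0], [-0.8, 0.7]]
y_true = (1, 1, 0, 0)
w = subgradient_step([0.0, 0.0], x, y_true)
print(w, predict(w, x))
```

With zero weights every labeling ties and enumeration returns all background (task loss 1); a single step moves the weights toward the true labeling's features, after which prediction recovers `(1, 1, 0, 0)` on this toy input.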