A problem that cannot be solved using a
kernel that computes the similarity of a test
image to a training case
Suppose we have images that may contain a tank, but
with a cluttered background.
To recognize which ones contain a tank, it is no good
computing a global similarity between whole images.
A non-tank test image may have a very similar
background to a tank training image, so it will have
very high similarity if the tanks are only a small
fraction of the image.
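
The effect can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and not part of the original material: the images are 32x32 arrays of random noise standing in for cluttered backgrounds, the "tank" is a hypothetical 4x4 bright patch, and cosine similarity over raw pixels stands in for a global similarity kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_image(background, with_tank, tank=4):
    """Copy a background and optionally paste a small bright 'tank' patch."""
    img = background.copy()
    if with_tank:
        img[:tank, :tank] = 1.0  # hypothetical tank: 16 of 1024 pixels
    return img

def cosine_similarity(a, b):
    """Global similarity computed over all pixels of both images."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two unrelated cluttered backgrounds (random noise as a stand-in).
bg_a = rng.random((32, 32))
bg_b = rng.random((32, 32))

train_tank = make_image(bg_a, with_tank=True)     # tank, background A
test_no_tank = make_image(bg_a, with_tank=False)  # no tank, background A
test_tank = make_image(bg_b, with_tank=True)      # tank, background B

sim_wrong = cosine_similarity(train_tank, test_no_tank)
sim_right = cosine_similarity(train_tank, test_tank)

print(f"tank vs. non-tank, same background:  {sim_wrong:.3f}")
print(f"tank vs. tank, different background: {sim_right:.3f}")
```

In this toy setting the non-tank image scores higher than the true tank image, because the shared background dominates the pixel-level comparison.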
We need local features that are appropriate for the task.
So they must be learned, not pre-specified.
It's very appealing to convert a learning problem to a
convex optimization problem,
but we may end up ignoring aspects of the real
learning problem in order to make it convex.