• If we use a neural net to define the features, maybe we can use convex optimization for the final layer of weights and then backpropagate derivatives to “learn the kernel” (see the sketch after this list).
• The convex optimization is quadratic in the number of training cases, so this approach works best when most of the data is unlabelled.
– Unsupervised pre-training can then use the unlabelled data to learn features that are appropriate for the domain.
– The final convex optimization can use these features as well as possible and also provide derivatives that allow them to be fine-tuned.
– This seems better than just trying lots of kernels and selecting the best ones (which is the current method).
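
A minimal PyTorch sketch of this recipe, under toy assumptions: the data, layer sizes, and training schedules below are invented purely for illustration, and logistic regression fit by L-BFGS stands in for the convex final-layer solver (an SVM on the fixed features would play the same role). The three phases are: pre-train the feature net as an autoencoder on the unlabelled data, solve the convex problem for the final weights with the features held fixed, then backpropagate through that final layer to fine-tune the features.

```python
# Illustrative sketch only; all names and hyperparameters are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: lots of unlabelled inputs, few labelled ones.
X_unlab = torch.randn(1000, 20)
X_lab, y_lab = torch.randn(50, 20), torch.randint(0, 2, (50,)).float()

# Feature-defining net ("the kernel"), plus a decoder used only for pre-training.
encoder = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 16))
decoder = nn.Linear(16, 20)

# Phase 1: unsupervised pre-training (autoencoder) on the unlabelled data.
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    recon = decoder(encoder(X_unlab))
    loss = ((recon - X_unlab) ** 2).mean()
    loss.backward()
    opt.step()

# Phase 2: convex optimization of the final layer of weights.
# With the features held fixed, logistic regression in w is convex,
# so L-BFGS finds the global optimum of this sub-problem.
w = nn.Linear(16, 1)
feats = encoder(X_lab).detach()        # features fixed for the convex step
lbfgs = torch.optim.LBFGS(w.parameters(), max_iter=100)
bce = nn.BCEWithLogitsLoss()

def closure():
    lbfgs.zero_grad()
    l = bce(w(feats).squeeze(1), y_lab)
    l.backward()
    return l

lbfgs.step(closure)

# Phase 3: backpropagate through the optimized final layer to
# fine-tune the features, i.e. "learn the kernel".
ft = torch.optim.Adam(encoder.parameters(), lr=1e-4)
for _ in range(50):
    ft.zero_grad()
    l = bce(w(encoder(X_lab)).squeeze(1), y_lab)
    l.backward()
    ft.step()
```

In a fuller version one would alternate phases 2 and 3, re-solving the convex problem for the final weights as the features move.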