 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
 |
• |
Start by learning
one hidden layer.
|
|
|
• |
Then re-present
the data as the activities of the
|
|
|
hidden units.
|
|
|
|
– |
The
same learning algorithm can now be
|
|
|
applied
to the re-presented data.
|
|
|
• |
Can we prove
that each step of this greedy
|
|
|
learning
improves the log probability of the data
|
|
under the overall
model?
|
|
|
|
– |
What
is the overall model?
|
|