Another way to divide and conquer
Re-representing the data: each time the base
learner is called, it passes a transformed version
of the data on to the next learner.
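A minimal sketch of this re-representation scheme, assuming each "learner" is a tiny one-hidden-layer autoencoder trained with squared-error gradient descent (the function names and sizes here are illustrative, not from any particular library): after a layer is trained, its hidden activations become the transformed data handed to the next learner.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5):
    """Train one layer: sigmoid encoder (W, b), linear decoder (V, c)."""
    n = X.shape[1]
    W = rng.normal(0, 0.1, (n, n_hidden))
    V = rng.normal(0, 0.1, (n_hidden, n))
    b = np.zeros(n_hidden)
    c = np.zeros(n)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)           # encode
        Xhat = H @ V + c                 # linear reconstruction
        err = Xhat - X
        dV = H.T @ err / len(X)
        dc = err.mean(0)
        dH = err @ V.T * H * (1 - H)     # backprop through sigmoid
        dW = X.T @ dH / len(X)
        db = dH.mean(0)
        W -= lr * dW; b -= lr * db
        V -= lr * dV; c -= lr * dc
    return W, b

def greedy_stack(X, layer_sizes):
    """Learn layers bottom-up; each passes its code to the next learner."""
    params = []
    data = X
    for h in layer_sizes:
        W, b = train_autoencoder(data, h)
        params.append((W, b))
        data = sigmoid(data @ W + b)     # re-represented data for the next layer
    return params, data

X = rng.random((100, 8))
params, top_code = greedy_stack(X, [6, 4])
```

Each call to `train_autoencoder` only ever sees the current representation; no higher layer exists yet when a lower one is trained, which is exactly the setting the question below asks about.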
Can we learn a deep, densely connected DAG one
layer at a time, starting at the bottom, while still
guaranteeing that learning each layer improves
the overall model of the training data?
This seems very unlikely. Surely we need to know
the weights in the higher layers in order to learn the lower layers?