Decision Trees: Non-linear regression or
classification with very little computation
The idea is to divide up the input space into a
disjoint set of regions and to use a very simple
estimator of the output for each region
For regression, the predicted output is just the
mean of the training data in that region.
But we could fit a linear function in each region
For classification the predicted class is just
the most frequent class in the training data in
that region.
We could estimate class probabilities by the
frequencies in the training data in that region.