There are many variations (CART, ID3, C4.5), but the basic idea is to recursively partition the space by greedily picking the best test from a fixed set of possible tests at each step.
We need to decide on the set of possible tests:
Consider splits at every coordinate of every point in the training data (i.e. all axis-aligned hyper-planes that touch datapoints).
We could consider all possible hyper-planes, but that is usually too expensive in fitting time and model complexity.
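The axis-aligned test set described above can be enumerated directly. The following is a minimal sketch, assuming the training points are stored as a list of numeric tuples and each test has the form "coordinate j is at most t"; the function name `candidate_splits` and the `<=` convention are illustrative choices, not taken from any particular library.

```python
def candidate_splits(points):
    """Return every (feature_index, threshold) pair taken from the data.

    `points` is a list of equal-length numeric tuples. Each distinct
    coordinate value of each point defines one axis-aligned test
    "x[feature_index] <= threshold".
    """
    if not points:
        return []
    n_features = len(points[0])
    splits = []
    for j in range(n_features):
        # Keep distinct values only: repeated coordinates give the same test.
        for t in sorted({p[j] for p in points}):
            splits.append((j, t))
    return splits


points = [(1.0, 5.0), (2.0, 5.0), (3.0, 7.0)]
splits = candidate_splits(points)
print(splits)
```

With n points in d dimensions this yields at most n * d tests, which is what keeps the greedy search cheap compared with searching over arbitrary hyper-planes.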
We need a measure of how good a test is (purity):
For regression, compute the resulting sum-squared error over all partitions.
For classification it is a bit more complicated.
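For the regression case, the greedy procedure can be sketched end to end. The code below is one possible reading, not a reference implementation of CART, ID3, or C4.5: it scores each axis-aligned test by the summed squared error of the two resulting parts, picks the lowest, and recurses. The helper names, the `max_depth` stopping rule, and the choice to predict the mean at each leaf are assumptions made for the illustration.

```python
def sse(values):
    """Sum of squared deviations of `values` from their mean (0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y):
    """Greedily pick the axis-aligned test with the lowest total SSE.

    Returns (feature_index, threshold, total_sse), or None when no test
    separates the data into two non-empty parts.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # trivial test: everything falls on one side
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best


def build_tree(X, y, max_depth=2):
    """Recursively partition the data; each leaf predicts the mean target."""
    # Stop when the depth budget is spent or the targets are already constant.
    if max_depth == 0 or sse(y) == 0.0:
        return {"leaf": sum(y) / len(y)}
    split = best_split(X, y)
    if split is None:
        return {"leaf": sum(y) / len(y)}
    j, t, _ = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [v for _, v in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [v for _, v in right], max_depth - 1),
    }


def predict(tree, x):
    """Follow the tests from the root down to a leaf and return its value."""
    while "leaf" not in tree:
        go_left = x[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


X = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [1.0, 1.0, 5.0, 5.0]
tree = build_tree(X, y)
print(tree)
```

On this toy data the test "x[0] <= 2.0" separates the two target levels exactly, so the total error after the split is zero and both children become leaves. A classification version would keep the same recursion and swap `sse` for a class-purity score.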