We need to cluster the training cases into subsets, one
for each local model.
The aim of the clustering is NOT to find clusters of
similar input vectors.
We want each cluster to have a relationship between
input and output that can be well-modeled by one
local model
I/O
I