THE 1NN-1 METHOD
Prediction using one nearest neighbor of the test case
Radford Neal, 25 May 1996
The 1nn-1 method uses the simple procedure of predicting that the
target in a test case will be the same as the target in the closest
training case, with closeness measured by Euclidean distance between
the vectors of inputs. If there is a tie among two or more training
cases for closeness to the test case, it is broken by randomly
selecting one of the training cases.
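The procedure above can be sketched as follows (a Python sketch for
illustration, not the actual 1nn-1 program; the function and variable
names are made up here):

```python
import math
import random

def predict_1nn(train, test_input, rng=random):
    """Predict the target for one test case as the target of the
    nearest training case, with closeness measured by Euclidean
    distance between input vectors.  Ties for closest are broken
    by picking one of the tied training cases at random."""
    best_dist = None
    tied = []                       # targets of cases tied for closest
    for inputs, target in train:
        d = math.dist(inputs, test_input)
        if best_dist is None or d < best_dist:
            best_dist = d
            tied = [target]
        elif d == best_dist:
            tied.append(target)
    return rng.choice(tied)

# Tiny example: two training cases, one test input.
train = [((0.0, 0.0), "a"), ((2.0, 2.0), "b")]
print(predict_1nn(train, (0.5, 0.0)))
```

Since the unique nearest training case here is (0.0, 0.0), the example
prints "a"; the random choice matters only when two or more training
cases are at exactly the same distance.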
This method can be used for targets of any type (but currently only a
single target is allowed). The method is probably not very good for
regression tasks, however. It's not really very good for
classification, either, but is thought to be at least a bit more
competitive in this context.
The 1nn-1 procedure is sensitive to the way in which inputs are
encoded, but not to the way that targets are encoded. The standard
method is to use the default DELVE encodings for the inputs. The
target can be encoded in any way that gives a single field (eg, copy,
0-up, etc.), but the target must NOT be encoded as multiple fields.
In particular, the target cannot be encoded as 1-of-n, which is
the DELVE default for non-binary categorical targets.
The method produces only guesses, not predictive distributions. It
therefore cannot be used with the L or Q loss functions.
The method is implemented using the 1nn-1 program, as described below:
PROGRAM TO MAKE PREDICTIONS USING THE 1-NEAREST-NEIGHBOR METHOD.
Usage:
1nn-1 instance
Reads cases from train.n, where n is the instance number given as the
argument. This training data should be encoded numerically, except
perhaps for the target (of which there must be only one, the last item
on the line). Also reads a file of inputs for test cases from test.n,
and produces a file cguess.n, containing the predictions for the test
cases found by the 1-nearest-neighbor method.
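The way these files fit together can be sketched as follows (again a
Python sketch rather than the real 1nn-1 program; it assumes
whitespace-separated fields with the single target last on each
training line, which is inferred from the description above -- the
actual file formats are those defined by DELVE):

```python
import math
import random

def run_1nn(instance):
    """Read train.<n> and test.<n> for the given instance number, and
    write one 1-nearest-neighbor guess per test case to cguess.<n>.
    Inputs are assumed numeric; the target is the last field of each
    training line and is copied verbatim into the guess file."""
    with open("train.%s" % instance) as f:
        train = []
        for line in f:
            fields = line.split()
            train.append(([float(x) for x in fields[:-1]], fields[-1]))
    with open("test.%s" % instance) as f:
        tests = [[float(x) for x in line.split()] for line in f]
    with open("cguess.%s" % instance, "w") as out:
        for t in tests:
            best = min(math.dist(inp, t) for inp, _ in train)
            tied = [tgt for inp, tgt in train if math.dist(inp, t) == best]
            out.write(random.choice(tied) + "\n")
```

Copying the target field verbatim is what lets the same code serve any
target type that fits in a single field, as noted earlier.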