THE BASE-1 METHOD

Base-line prediction using means, medians, or class frequencies

Radford Neal, 6 May 1996

The base-1 method is intended to provide a base-line of performance that can be obtained by completely ignoring the input attributes, basing prediction solely on simple statistics regarding the targets in training cases - namely, the mean and median of the training targets, when these targets are numeric, and the frequencies of classes in the training set, when the targets are categorical.

The method is appropriate for use with all loss functions except log probability loss ("L"). Currently, the method cannot be used when there is more than one target attribute.

The base-1 method is implemented using two programs, "baser" and "basec", with "baser" being used for tasks where the targets are numeric, and "basec" being used when the targets are categorical.

Numeric targets for use with "baser" can be encoded with or without normalization (it should make no difference to the result). Categorical targets for use with "basec" should be encoded in "0-up" form - i.e., with the classes encoded as integers 0, 1, 2, etc., in the same order as they are listed in the dataset specification.

In detail, the two programs operate as follows:

BASER - BASE-LINE PREDICTION FOR NUMERIC TARGETS USING MEAN AND MEDIAN

Usage:

    baser instance

Reads cases from train.n, where n is the instance number given as the argument, ignoring all but the last number on each line, which should be the target in that training case (a number). Also reads a file of inputs for test cases from test.n, which it completely ignores, except to count how many test cases there are.

Writes two files, each having as many lines as there are test cases, with each line being the same.

The lines in the file cguess.S.n contain the mean of the training targets, which is a reasonable guess for squared-error loss.

The lines in the file cguess.A.n contain the median of the training targets, which is a reasonable guess for absolute-error loss. When the number of targets is even, the median is the average of the two middle targets.

This method does not produce the predictive distributions that would be required for evaluation by log probability loss.

BASEC - PREDICTION FOR CLASS TARGETS USING BASE RATES

Usage:

    basec #classes instance

Reads cases from train.n, where n is the instance number given as the second argument, ignoring all but the last number on each line, which should be the class of that training case (a number from 0 up to the number of classes minus 1). Also reads a file of inputs for test cases from test.n, which it completely ignores, except to count how many test cases there are.

Writes two files, each having as many lines as there are test cases, with each line being the same.

The lines in the file cguess.n contain the most frequent class from the training data, with ties resolved by picking the lower-numbered class. This is a reasonable guess for 0-1 loss.

The lines in the file prob.n contain the probabilities of the classes, estimated by the frequencies of the classes in the training set. The probability for a class can be zero, if the class does not occur as the target for any case in the training set. This may make this method unsuitable when log probability loss is being used, as the loss can be infinite.
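
The statistics computed by the two programs can be sketched in Python as follows. This is only an illustration, not the actual source of "baser" or "basec". It assumes that each training line holds whitespace-separated numbers with the target last, and the function names baser_guesses and basec_guesses are hypothetical. The sketch returns the guesses rather than writing the output files, whose exact line format is not described here.

```python
from collections import Counter
from statistics import mean, median


def baser_guesses(train_lines):
    """Return (mean, median) of the training targets.

    Assumption: the target is the last whitespace-separated number
    on each non-blank line; everything before it is ignored.
    """
    targets = [float(line.split()[-1]) for line in train_lines if line.strip()]
    # statistics.median averages the two middle values when the
    # number of targets is even, matching the rule described above.
    return mean(targets), median(targets)


def basec_guesses(train_lines, n_classes):
    """Return (most frequent class, list of class probabilities).

    Assumption: the class is the last number on each non-blank line,
    encoded as an integer from 0 up to n_classes - 1.
    """
    classes = [int(line.split()[-1]) for line in train_lines if line.strip()]
    counts = Counter(classes)
    # max() returns the first maximal item, and range() is ascending,
    # so ties go to the lower-numbered class.
    best = max(range(n_classes), key=lambda c: counts[c])
    # A class absent from the training set gets probability zero.
    probs = [counts[c] / len(classes) for c in range(n_classes)]
    return best, probs
```

For example, basec_guesses(["0.5 1", "0.1 0", "0.9 1", "0.2 0"], 3) gives class 0 as the guess, since classes 0 and 1 are tied, and a probability of zero for class 2, which is the situation in which log probability loss can be infinite.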