Empirically assessing the predictive performance of learning methods is an essential component of research in machine learning. The DELVE environment was developed to support such assessments. It provides a collection of datasets, a standard approach to conducting experiments with these datasets, and software for the statistical analysis of experimental results. In this paper, DELVE is used to assess the performance of neural network methods when the inputs available to the network have varying degrees of relevance. The results confirm that the Bayesian method of "Automatic Relevance Determination" (ARD) is often (but not always) helpful, and show that a variation on "early stopping" inspired by ARD is also beneficial. The experiments also reveal some other interesting characteristics of the methods tested. This example illustrates the essential role of empirical testing, and shows the strengths and weaknesses of the DELVE environment.
In C. M. Bishop (editor), Neural Networks and Machine Learning, pp. 97-129, Springer-Verlag, 1998.
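The core idea behind Automatic Relevance Determination is to give each input its own prior precision hyperparameter, so that inputs which do not help predict the target are driven toward large precision (i.e. their weights shrink toward zero). The paper applies ARD to neural networks via Bayesian methods; the sketch below shows the same idea in the much simpler setting of Bayesian linear regression with MacKay-style evidence (fixed-point) updates, purely as an illustration. The function name, initialization, and clipping bounds are choices made here, not taken from the paper.

```python
import numpy as np

def ard_linear_regression(X, y, n_iter=50):
    """Illustrative ARD for Bayesian linear regression (evidence approximation).

    Each input i gets its own prior precision alpha[i]; irrelevant inputs
    are driven to large alpha[i], pruning their weights.  This is a toy
    stand-in for ARD in neural networks, not the paper's method.
    """
    n, d = X.shape
    alpha = np.ones(d)   # per-input prior precisions (assumed initialization)
    beta = 1.0           # noise precision
    for _ in range(n_iter):
        # Posterior over weights given the current hyperparameters
        A = np.diag(alpha)
        Sigma = np.linalg.inv(A + beta * X.T @ X)
        m = beta * Sigma @ X.T @ y
        # MacKay's fixed-point updates; gamma[i] is the "effective number
        # of well-determined parameters" contributed by input i
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = np.clip(gamma / (m ** 2 + 1e-12), 1e-6, 1e6)
        beta = (n - gamma.sum()) / (np.sum((y - X @ m) ** 2) + 1e-12)
    return m, alpha

# Two inputs: the first determines y, the second is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
m, alpha = ard_linear_regression(X, y)
# Expect alpha[1] to be driven much larger than alpha[0], so the
# irrelevant second input is effectively pruned from the model.
```

The same mechanism, with per-input-group weight-decay hyperparameters on the input-to-hidden weights, is what ARD uses in the neural network setting studied in the paper.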