An important note to users with
version 1.0 of the software.

The Delve datasets and families are available from this page. Every dataset
(or family) has a brief overview page, and many also have detailed
documentation. You can download gzipped tar files of the datasets, but you
will need the Delve software environment to get the maximum benefit from
them. Datasets are categorized as primarily assessment, development, or
historical according to their recommended use. Within each category we have
distinguished datasets as regression or classification according to how their
prototasks have been created. Details on how to install the downloaded
datasets are given below. There is also a summary table of the datasets.

Datasets from this section are recommended to be used when reporting results
for your learning method. You should run your method once on each task and
report the results from that run. That is, you should not use results from the
testing data to modify your method, then re-run it.
Regression Datasets
- abalone. Download
abalone.tar.gz
Predict the age of abalone from physical measurements. From the
UCI repository of machine learning databases.
- bank. Download bank-family
A family of datasets synthetically generated from a simulation of how
bank-customers choose their banks. Tasks are based on predicting the
fraction of bank customers who leave the bank because of full queues.
- census-house. Download census-house.tar.gz
Predicting median house prices from 1990 US census data.
- comp-activ. Download
comp-activ.tar.gz
Predict a computer system's activity from system performance measures.
- pumadyn family of datasets.
Download pumadyn-family
This is a family of datasets synthetically generated from a realistic
simulation of the dynamics of a Unimation Puma 560 robot arm.
Classification Datasets
- adult. Download
adult.tar.gz
Predict if an individual's annual income exceeds $50,000 based on census data.
From the
UCI repository of machine learning databases.
- splice. Download splice.tar.gz
Recognize two classes of splice junctions in a DNA sequence.
From the
UCI repository of machine learning databases.
- titanic.
Download titanic.tar.gz
Information on passengers of the Titanic and whether they survived.

We recommend that you use datasets from this section while developing a new
learning method, or fine-tuning parameters. That is, you can re-run your
method several times on a dataset until you obtain the desired performance. If
you do use a dataset in this manner, you should not use it when
reporting your method's performance: you should use datasets from the
Assessment section.
Regression Datasets
- boston.
Download boston.tar.gz
Housing in the Boston Massachusetts area. From the
UCI repository of machine learning databases.
- demo. Download
demo.tar.gz
The demo dataset was invented to serve as an example for the Delve
manual and as a test case for Delve software and for software that
applies a learning procedure to Delve datasets.
- kin family of datasets. Download kin-family
This is a family of datasets synthetically generated from a realistic
simulation of the forward kinematics of an 8 link all-revolute robot arm.
Classification Datasets
- image-seg.
Download image-seg.tar.gz
Predict the object class of a 3x3 patch from an image of an outdoor
scene. From the
UCI repository of machine learning databases.
- letter.
Download letter.tar.gz
Classify an image as one of 26 upper case letters. The inputs are simple
statistical features derived from the pixels in the image. From the
UCI repository of machine learning databases.
- mushrooms.
Download mushrooms.tar.gz
Classify hypothetical samples of gilled mushrooms in the Agaricus and
Lepiota family as edible or poisonous.
From the
UCI repository of machine learning databases.

Datasets from this section have been included because they are
established in the literature. We have attempted to reproduce the
original usage as closely as possible to facilitate comparisons.
Regression Datasets
- add10.
Download add10.tar.gz
A synthetic function suggested by Jerome Friedman in his
"Multivariate Adaptive Regression Splines" paper.
- hwang.
Download hwang.tar.gz
Five real-valued functions of two variables used by Jenq-Neng Hwang et al.
and others to test nonparametric regression methods. Both noisy and
noise-free prototasks are defined based on these functions.
Classification Datasets
- ringnorm.
Download ringnorm.tar.gz
Leo Breiman's ringnorm example. Classify cases as coming from one
of two overlapping normal distributions.
- twonorm.
Download twonorm.tar.gz
Leo Breiman's two normal example. Classify a case as coming from
one of two normal distributions, where one distribution lies within the
other.

Before you can install the datasets, you must build and install the
Delve utilities.
Once you've done that, you can install the datasets. This involves simply
extracting the files from their tape archives into the proper directory: the
installed top-level Delve data directory. By default this directory is "/usr/local/lib/delve/data".
If you passed the "--prefix" option to the "configure" command when
building the Delve utilities, replace the "/usr/local" part of the above
path with that prefix.
Each tape archive will create a directory with the same base name as the
archive file. This directory will contain all the data and specification files
Delve needs to generate the tasks.
mv demo.tar.gz /usr/local/lib/delve/data
cd /usr/local/lib/delve/data
gunzip demo.tar.gz
tar -xvf demo.tar
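The same installation can be written as a small shell function that copes with a non-default prefix. This is only a sketch: the function name install_delve_dataset and the DELVE_PREFIX variable are hypothetical names chosen for illustration, not part of Delve. It assumes gunzip and tar are on your PATH and that you have write permission for the data directory.

```shell
# Hypothetical helper: unpack a downloaded Delve archive into a data directory.
# DELVE_PREFIX is an assumed variable name; Delve itself does not read it.
install_delve_dataset() {
    archive=$1
    # Second argument overrides the target; otherwise use <prefix>/lib/delve/data.
    data_dir=${2:-${DELVE_PREFIX:-/usr/local}/lib/delve/data}
    mkdir -p "$data_dir" || return 1
    # Decompress to stdout and unpack inside the data directory.
    # The original .tar.gz is left where it was downloaded.
    gunzip -c "$archive" | (cd "$data_dir" && tar -xf -)
}
```

For example, `install_delve_dataset demo.tar.gz` would unpack into /usr/local/lib/delve/data unless DELVE_PREFIX is set to the prefix you passed to configure.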
If you want to install a dataset in a private directory, you can do the
following:
- Create a directory called delve in your home directory (or
anywhere else, for that matter).
- In that directory create two more directories: data and methods.
- In the delve/data directory, untar the data file as described
above.
Once you've done that, you can work in your own private delve directory and
you will have access to the datasets you've downloaded, as well as the ones
installed in /usr/local/lib/delve/data.
Once you've extracted the data, you can safely remove the tar file.
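The private-directory steps above can be sketched as a shell function. The name setup_private_delve and its arguments are hypothetical, chosen only for illustration; the directory layout (a delve directory containing data and methods) is the one described in the steps.

```shell
# Hypothetical helper: create a private Delve directory and unpack one archive.
setup_private_delve() {
    delve_dir=$1    # e.g. "$HOME/delve", or any other location
    archive=$2      # e.g. demo.tar.gz
    # Create the two subdirectories named in the steps above.
    mkdir -p "$delve_dir/data" "$delve_dir/methods" || return 1
    # Unpack the archive inside delve/data without modifying the .tar.gz.
    gunzip -c "$archive" | (cd "$delve_dir/data" && tar -xf -) || return 1
    # Show which dataset directories are now present.
    ls "$delve_dir/data"
}
```

A call such as `setup_private_delve "$HOME/delve" demo.tar.gz` would leave the dataset in $HOME/delve/data/demo, after which the downloaded archive can be deleted.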