Datasets Summary Table

All datasets currently available in Delve are summarized in the table below. The columns are explained after the table.

Name          Usage  Origin  Task  Attrs  Cases  Methods  View
census-house  A      C       R     139    22784  2        click
abalone       A      N       R     9      4177   2        click
adult         A      N       C     16     48842  0        none
splice        A      N       C     61     3175   8        click
titanic       A      N       C     4      2201   3        click
bank-32fh     A      S       R     33     8192   9        none
bank-32fm     A      S       R     33     8192   9        none
bank-32nh     A      S       R     33     8192   9        none
bank-32nm     A      S       R     33     8192   9        none
bank-8fh      A      S       R     9      8192   9        none
bank-8fm      A      S       R     9      8192   9        none
bank-8nh      A      S       R     9      8192   9        none
bank-8nm      A      S       R     9      8192   9        none
pumadyn-32fh  A      S       R     33     8192   25       click
pumadyn-32fm  A      S       R     33     8192   25       click
pumadyn-32nh  A      S       R     33     8192   25       click
pumadyn-32nm  A      S       R     33     8192   25       click
pumadyn-8fh   A      S       R     9      8192   25       click
pumadyn-8fm   A      S       R     9      8192   25       click
pumadyn-8nh   A      S       R     9      8192   25       click
pumadyn-8nm   A      S       R     9      8192   25       click
demo          D      A       C, R  5      2048   11       click
mushrooms     D      A       C     23     8124   1        click
comp-activ    D      C       R     27     8192   2        click
image-seg     D      C       C     19     2310   8        click
boston        D      N       R     14     506    10       click
kin-32fh      D      S       R     33     8192   22       click
kin-32fm      D      S       R     33     8192   22       click
kin-32nh      D      S       R     33     8192   22       click
kin-32nm      D      S       R     33     8192   22       click
kin-8fh       D      S       R     9      8192   22       click
kin-8fm       D      S       R     9      8192   22       click
kin-8nh       D      S       R     9      8192   22       click
kin-8nm       D      S       R     9      8192   22       click
letter        D      S       C     17     20000  8        click
add10         H      A       R     11     9792   2        click
hwang         H      A       R     12     13600  0        none
ringnorm      H      A       C     21     7400   3        click
twonorm       H      A       C     21     7400   3        click

Summary table

The meanings of the columns are as follows:
  1. Clicking on the dataset name in the left column displays the documentation for the dataset.
  2. The suggested Usage of the dataset is coded by initial letter as one of Assessment (A), Development (D) or Historical (H). See the Delve manual, chapter 3, for an elaboration of these terms.
  3. The Origin of the dataset is coded by initial letter as one of Natural (N), Cultivated (C), Simulated (S) or Artificial (A). See the Delve manual, chapter 3, for an elaboration of these terms.
  4. Task type indicates the types of tasks associated with the dataset. We distinguish Regression, Classification and Density estimation task types depending on the prior information provided about the task's targets. It is possible for a dataset to have more than one kind of task type.
  5. Attrs is the total number of attributes in the dataset.
  6. Cases is the total number of cases in the dataset.
  7. Methods shows the number of learning methods in the Delve repository which have been run on one or more prototasks in the dataset. Clicking on the number lists the methods.
  8. Clicking in the View Results column gives a summary plot of the performance of the different methods on the dataset.
    This type of plot condenses a lot of information into a single figure. Briefly, for each training set size of a prototask, it shows the expected performance of each method as the height of a solid bar. Squared-error loss is used for regression prototasks and 0-1 loss for classification prototasks. The standard error of the mean is indicated by the thin line on top of each bar. Below the plot are boxes showing the P value of a paired test of the hypothesis that the performances of two learning methods differ. (See the Delve manual, chapter 8, for details.) The methods compared are listed down the left edge. The ordering of the bars from left to right within a training set size is the same as the method list from top to bottom. Therefore, an entry in the (i,j) cell of a box shows the P value of the comparison between methods i and j. Only P values less than 0.05 are indicated. Scanning along a row quickly indicates how a method compares to the others in the plot: an entry in the row indicates that the method is significantly worse than the method corresponding to the column. Poorly performing methods will have many entries in their rows. On the other hand, a column that has entries indicates a method with superior performance.
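
The quantities shown in such a plot can be illustrated with a small sketch. The code below is not Delve's own procedure (that is described in the Delve manual, chapter 8); it is a minimal illustration, assuming made-up squared-error losses for two hypothetical methods on the same eight training sets, and it swaps in a simple exact sign-flip permutation test as the paired test. The function names and the data are assumptions introduced for illustration only.

```python
import math
import statistics
from itertools import product


def mean_and_sem(losses):
    """Return the mean loss (bar height) and its standard error (thin line)."""
    mean = statistics.fmean(losses)
    sem = statistics.stdev(losses) / math.sqrt(len(losses))
    return mean, sem


def paired_sign_flip_p_value(losses_a, losses_b):
    """Two-sided exact sign-flip permutation test on paired loss differences.

    Illustrative stand-in for a paired test; NOT the test used by Delve.
    """
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate every sign assignment.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical losses of two methods on the same 8 training sets.
losses_a = [0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.51, 0.49]
losses_b = [0.40, 0.41, 0.43, 0.39, 0.42, 0.44, 0.38, 0.40]

mean_a, sem_a = mean_and_sem(losses_a)
mean_b, sem_b = mean_and_sem(losses_b)
p_value = paired_sign_flip_p_value(losses_a, losses_b)

print(f"method A: {mean_a:.3f} +/- {sem_a:.3f}")
print(f"method B: {mean_b:.3f} +/- {sem_b:.3f}")
# Mirror the plot's convention: only report P values below 0.05.
if p_value < 0.05:
    worse = "A" if mean_a > mean_b else "B"
    print(f"P = {p_value:.4f}: method {worse} is significantly worse")
```

In the plot's terms, the reported P value would appear in the row of the worse method and the column of the better one.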

Last Updated 21 May 1998