Datasets Summary Table

All datasets currently available in Delve are summarized in the table below. The interpretation of the columns is explained after the table.

Name	Usage	Origin	Task types	Attrs	Cases	Methods	View Results
census-house	A	C	R	139	22784	2	click
abalone	A	N	R	9	4177	2	click
adult	A	N	C	16	48842	0	none
splice	A	N	C	61	3175	8	click
titanic	A	N	C	4	2201	3	click
bank-32fh	A	S	R	33	8192	9	none
bank-32fm	A	S	R	33	8192	9	none
bank-32nh	A	S	R	33	8192	9	none
bank-32nm	A	S	R	33	8192	9	none
bank-8fh	A	S	R	9	8192	9	none
bank-8fm	A	S	R	9	8192	9	none
bank-8nh	A	S	R	9	8192	9	none
bank-8nm	A	S	R	9	8192	9	none
pumadyn-32fh	A	S	R	33	8192	25	click
pumadyn-32fm	A	S	R	33	8192	25	click
pumadyn-32nh	A	S	R	33	8192	25	click
pumadyn-32nm	A	S	R	33	8192	25	click
pumadyn-8fh	A	S	R	9	8192	25	click
pumadyn-8fm	A	S	R	9	8192	25	click
pumadyn-8nh	A	S	R	9	8192	25	click
pumadyn-8nm	A	S	R	9	8192	25	click
demo	D	A	C R	5	2048	11	click
mushrooms	D	A	C	23	8124	1	click
comp-activ	D	C	R	27	8192	2	click
image-seg	D	C	C	19	2310	8	click
boston	D	N	R	14	506	10	click
kin-32fh	D	S	R	33	8192	22	click
kin-32fm	D	S	R	33	8192	22	click
kin-32nh	D	S	R	33	8192	22	click
kin-32nm	D	S	R	33	8192	22	click
kin-8fh	D	S	R	9	8192	22	click
kin-8fm	D	S	R	9	8192	22	click
kin-8nh	D	S	R	9	8192	22	click
kin-8nm	D	S	R	9	8192	22	click
letter	D	S	C	17	20000	8	click
add10	H	A	R	11	9792	2	click
hwang	H	A	R	12	13600	0	none
ringnorm	H	A	C	21	7400	3	click
twonorm	H	A	C	21	7400	3	click

Summary table
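
As a rough illustration of how a row of this table might be read programmatically, the sketch below parses one tab-separated row and expands the single-letter codes. The letter-to-term mappings are an assumption inferred from the initial letters of the terms in the column descriptions on this page; `DatasetRow` and `parse_row` are hypothetical names and not part of Delve.

```python
from dataclasses import dataclass
from typing import List

# Assumed mappings: each code is taken to be the first letter of a term
# named in the column descriptions (not confirmed by the Delve manual here).
USAGE = {"A": "Assessment", "D": "Development", "H": "Historical"}
ORIGIN = {"N": "Natural", "C": "Cultivated", "S": "Simulated", "A": "Artificial"}
TASK = {"R": "Regression", "C": "Classification", "D": "Density estimation"}


@dataclass
class DatasetRow:  # hypothetical container for one table row
    name: str
    usage: str
    origin: str
    task_types: List[str]
    attrs: int
    cases: int
    methods: int
    has_results: bool


def parse_row(line: str) -> DatasetRow:
    """Parse one tab-separated row of the summary table."""
    name, usage, origin, tasks, attrs, cases, methods, view = (
        line.rstrip("\n").split("\t")
    )
    return DatasetRow(
        name=name,
        usage=USAGE[usage],
        origin=ORIGIN[origin],
        # A dataset may list several task types, separated by spaces.
        task_types=[TASK[code] for code in tasks.split()],
        attrs=int(attrs),
        cases=int(cases),
        methods=int(methods),
        # "click" marks a results plot; "none" marks its absence.
        has_results=(view == "click"),
    )


row = parse_row("demo\tD\tA\tC R\t5\t2048\t11\tclick")
print(row.name, row.usage, row.origin, row.task_types)
```

The `demo` row is used because it is the only one in the table with two task types.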

The meanings of the columns are as follows:

Clicking on the dataset name in the left column displays the documentation for the dataset.
The suggested Usage of the dataset is coded as one of Assessment, Development or Historical. See the Delve manual, chapter 3 for an elaboration of these terms.
The Origin of the dataset can be one of Natural, Cultivated, Simulated or Artificial. See the Delve manual, chapter 3 for an elaboration of these terms.
Task type indicates the types of tasks associated with the dataset. We distinguish Regression, Classification and Density estimation task types depending on the prior information provided about the task's targets. It is possible for a dataset to have more than one kind of task type.
Attrs is the total number of attributes in the dataset.
Cases is the total number of cases in the dataset.
Methods shows the number of learning methods in the Delve repository which have been run on one or more prototasks in the dataset. Clicking on the number lists the methods.
Clicking in the View Results column gives a summary plot of the performance of the different methods on the dataset.
This type of plot condenses a lot of information into a single figure. Briefly, for each training set size of a prototask, it shows the expected performance of each method as the height of a solid bar. Squared-error loss is used for regression prototasks and 0-1 loss for classification prototasks. The standard error of the mean is indicated by the thin line on top of each bar. Below the plot are boxes giving the P values of paired tests of the hypothesis that the performances of two learning methods differ. (See the Delve manual, chapter 8 for details.) The methods compared are listed down the left edge. The ordering of the bars from left to right within a training set size is the same as the method list from top to bottom. Therefore, an entry in the (i,j) cell of the box shows the P value of the comparison between methods i and j. Only P values less than 0.05 are indicated. Scanning along a row quickly indicates how a method compares to the others in the plot: an entry in a row indicates that the method is significantly worse than the method corresponding to the column. Poorly performing methods will have many entries in their rows. Conversely, a column that has many entries indicates a method with superior performance.
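
The quantities shown in such a plot can be sketched in a few lines. The example below computes a mean loss with its standard error, and a P value for paired losses using an exact sign-flip permutation test. This test is a stand-in chosen for illustration and is not necessarily the analysis Delve performs (see the Delve manual, chapter 8). The function names and the loss values are hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(losses):
    """Return the mean loss and the standard error of that mean."""
    return mean(losses), stdev(losses) / sqrt(len(losses))


def paired_sign_flip_p(losses_a, losses_b):
    """Two-sided exact sign-flip permutation test on paired differences.

    Illustrative stand-in for a paired test; enumerates all 2**n sign
    assignments, so it is only practical for a small number of pairs.
    """
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point round-off.
        if flipped >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical squared-error losses of two methods on the same 8 task instances.
method_a = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1]
method_b = [0.8, 0.9, 0.8, 0.9, 1.0, 0.9, 1.0, 0.8]

bar_a, sem_a = mean_and_sem(method_a)  # bar height and thin-line length
bar_b, sem_b = mean_and_sem(method_b)
p_value = paired_sign_flip_p(method_a, method_b)

# Mirror the plot's convention: only P values below 0.05 are reported.
if p_value < 0.05:
    worse = "method_a" if bar_a > bar_b else "method_b"
    print(f"{worse} is significantly worse (P = {p_value:.4f})")
```

Here `method_a` has the higher loss on every instance, so only the two extreme sign assignments match the observed difference and the P value is 2/256. In the plot's layout this would appear as an entry in the row for `method_a` and the column for `method_b`.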

Last Updated 21 May 1998
Comments and questions to: delve@cs.toronto.edu