Census dataset

The dataset was built from data provided by the US Census Bureau (under Lookup access: Summary Tape File 1).
The data were collected as part of the 1990 US census. They are mostly counts aggregated at different survey levels. For this dataset the State-Place level was used. Data were obtained for all states. Most of the counts were converted into appropriate proportions.
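
As an illustration of the count-to-proportion step, here is a minimal sketch in Python. The function name, the category names and the counts are made up for the example. The exact procedure used to build the dataset is not documented here, so the percentage scale is an assumption based on the wording of the input list.

```python
def to_percentages(counts):
    """Convert raw category counts into percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        # No units at all in this place: report 0% for every category.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical housing-unit counts for a single place (not real census data).
housing_units = {"occupied": 450, "vacant": 50}
shares = to_percentages(housing_units)
print(shares)  # {'occupied': 90.0, 'vacant': 10.0}
```

Working with shares rather than raw counts makes places of very different sizes comparable.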

There are 4 prototasks:

  1. house-price-8H
  2. house-price-8L
  3. house-price-16H
  4. house-price-16L

These are all concerned with predicting the median house price in a region from the region's demographic composition and the state of its housing market. The number in the name gives the dimensionality of the input. The letter that follows is a very rough indication of the difficulty of the task. For Low difficulty, the inputs chosen were those more strongly correlated with the target, as indicated by a univariate smooth fit of each input to the target. For High difficulty, the inputs were chosen to make the modeling harder, through higher variance or lower correlation with the target.
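
The naming convention can be decoded mechanically. The following is a small Python sketch; the function `parse_prototask` and the `DIFFICULTY` table are hypothetical helpers written for this example and are not part of any distributed tool.

```python
import re

# Mapping from the trailing letter to the difficulty label described above.
DIFFICULTY = {"H": "high", "L": "low"}


def parse_prototask(name):
    """Split a name such as 'house-price-16L' into (input dimension, difficulty)."""
    match = re.fullmatch(r"house-price-(\d+)([HL])", name)
    if match is None:
        raise ValueError(f"unrecognised prototask name: {name!r}")
    return int(match.group(1)), DIFFICULTY[match.group(2)]


for name in ["house-price-8H", "house-price-8L", "house-price-16H", "house-price-16L"]:
    print(name, parse_prototask(name))
```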

Each prototask has 6 tasks, of sizes 64, 128, 256, 512, 1024 and 2048.
The test sets are hierarchical: within each task, a separate test set is used for each instance.
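
The task sizes are consecutive powers of two, so the full grid of prototask and size combinations can be enumerated as in the sketch below. The variable names are illustrative only, and the number of instances per task is not stated here, so it is left out.

```python
from itertools import product

PROTOTASKS = ["house-price-8H", "house-price-8L", "house-price-16H", "house-price-16L"]

# Sizes 64 ... 2048 are the powers of two from 2**6 to 2**11.
TASK_SIZES = [2 ** k for k in range(6, 12)]

# Every (prototask, size) pair: 4 prototasks x 6 sizes.
tasks = list(product(PROTOTASKS, TASK_SIZES))

print(TASK_SIZES)
print(len(tasks))
```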

The inputs used by any of the four prototasks are as follows:

  1. P1 ---- total number of persons in the region
  2. P2 ---- total number of families in the region
  3. P3 ---- total number of households (HH's)
  4. P5.1 --- percentage of males
  5. P6.2 --- percentage of black people
  6. P6.4 --- percentage of people who are of Asian or Pacific Islander race
  7. P11.3 -- percentage of people between 25-64 years of age
  8. P11.4 -- percentage over 64 years old
  9. P14.6 -- percentage of never-married females
  10. P14.9 -- percentage of widowed females
  11. P15.1 -- percentage of people in family HH's
  12. P15.3 -- percentage of people in group quarters (incl jails)
  13. P16.1 -- percentage of HH's with 1 person
  14. P16.2 -- percentage of HH's with 2 or more persons which are family HH-lds
  15. P17A -- average family size
  16. P18.2 -- percentage of HH's with 1+ persons under 18 which are non-family HH-lds
  17. P19.2 -- percentage of HH's with black Householder (HH'lder)
  18. P19.4 -- percentage of HH's with Asian HH'lder
  19. P20.1 -- percentage of HH's with Hispanic HH'lder
  20. P25.1 -- percentage of HH's with more than two persons 65 years old or more
  21. P26.1 -- percentage of HH's with more than one non-relative living in
  22. P27.4 -- percentage of HH-lds which are non-family with 2+ persons
  23. H1 ---- total number of Housing Units (HU's)
  24. H2.1 --- percentage of HU's occupied
  25. H2.2 --- percentage of HU's vacant
  26. H3.1 --- percentage of occupied HU's which are owner-occupied
  27. H5.2 -- percentage of vacant HU's which are for sale only
  28. H5.6 -- percentage of vacant HU's which are not for rent, for sale, for migrant workers, or for seasonal, recreational or occasional use
  29. H8.1 --- percentage of occ-ed HU's with white HH-lder
  30. H8.2 --- percentage of occ-ed HU's with black HH-lder
  31. H10.1 -- percentage of occ-ed HU's with HH-lder not of Hispanic origin
  32. H10.2 -- percentage of occ-ed HU's with HH-lder of Hispanic origin
  33. H13.1 -- percentage of HU's with 1-4 rooms
  34. H15.1 -- average number of rooms in owner-occupied HU's
  35. H18.A -- average number of persons per ownOcc HU's
  36. H40.4 -- percentage of vacant-for-sale HU's vacant more than 6 months

Contributed by: Rafal Kustra.


Last Updated 8 October 1996
Comments and questions to: delve@cs.toronto.edu