This page presents datasets and resources for public use and research in natural language processing, computational linguistics and cognition.

The child overextension dataset, described here, includes over 230 word-referent pairs of overextended noun usages (e.g., ball → "balloon") recorded from young children, with an accompanying code repository.

The moral vignettes dataset, described here, includes over 700 moral vignettes (e.g., "people are starving animals to death") and human-judged moral categories, with an accompanying code repository.

The symmetry inference sentence (SIS) dataset, described here, includes 400 sentences of naturalistic usage for 40 English verbs, which range from highly symmetric predicates (e.g., A marry B = B marry A) to asymmetric ones (e.g., A eat B != B eat A), with an accompanying code repository.

The Corpus of Chinese Dynastic Histories (CCDH) dataset, described here, contains text corpora of 24 Chinese Dynastic Histories spanning approximately 2,000 years, from the 3rd century BCE to the 18th century CE.