This page includes datasets and resources for public use and research in natural language processing, computational linguistics, and cognitive science.
The historical noun-verb compositions dataset described here includes 10k entries of historical syntactic usages of verb-noun compositions (e.g., abandon ship.dobj) that emerged over the past 150 years (1850-2000) in the Google Syntactic-Ngrams English corpus.
- Reference: Yu, L. and Xu, Y. (2021) Predicting emergent linguistic compositions through time: Syntactic frame extension via multimodal chaining. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
- Reference: Grewal, K. and Xu, Y. (2021) Chaining algorithms and historical adjective extension. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen (eds.), Computational approaches to semantic change (Language Variation). Berlin: Language Science Press.
- Reference: Sun, Z., Zemel, R., and Xu, Y. (2021) A computational framework for slang generation. Transactions of the Association for Computational Linguistics, 9, 462-478.
- Reference: Kapron-King, A. and Xu, Y. (2021) A diachronic evaluation of gender asymmetry in euphemism. In Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change, ACL.
- Reference: Ferreira Pinto Jr., R. and Xu, Y. (2021) A computational theory of child overextension. Cognition, 206, 104472.
- Reference: Xie, J. Y., Hirst, G., and Xu, Y. (2020) Contextualized moral inference. arXiv preprint arXiv:2008.10762.
- Reference: Tanchip, C., Yu, L., Xu, A., and Xu, Y. (2020) Inferring symmetry in natural language. In Findings of the Association for Computational Linguistics: EMNLP 2020.
- Reference: Zinin, S. and Xu, Y. (2020) Corpus of Chinese dynastic histories: Gender analysis over two millennia. In Proceedings of the 12th International Conference on Language Resources and Evaluation.