This page includes datasets and resources for public use.
The OpenSub-Slang dataset is a large benchmark of 7,488 slang usages extracted from movie subtitles, intended for evaluating informal language processing in (large) language models.
- Reference: Sun, Z., Hu, Q., Gupta, R., Zemel, R., and Xu, Y. (2024) Toward informal language processing: Knowledge of slang in large language models. In Proceedings of the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
- Reference: Yu, L. and Xu, Y. (2021) Predicting emergent linguistic compositions through time: Syntactic frame extension via multimodal chaining. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
- Reference: Grewal, K. and Xu, Y. (2021) Chaining algorithms and historical adjective extension. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen (eds.), Computational approaches to semantic change (Language Variation). Berlin: Language Science Press.
- Reference: Sun, Z., Zemel, R., and Xu, Y. (2021) A computational framework for slang generation. Transactions of the Association for Computational Linguistics, 9, 462-478.
- Reference: Kapron-King, A. and Xu, Y. (2021) A diachronic evaluation of gender asymmetry in euphemism. In Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change, ACL.
- Reference: Ferreira Pinto Jr., R. and Xu, Y. (2021) A computational theory of child overextension. Cognition, 206, 104472.
- Reference: Xie, J. Y., Hirst, G., and Xu, Y. (2020) Contextualized moral inference. arXiv preprint arXiv:2008.10762.
- Reference: Tanchip, C., Yu, L., Xu, A., and Xu, Y. (2020) Inferring symmetry in natural language. In Findings of the Association for Computational Linguistics: EMNLP 2020.
- Reference: Zinin, S. and Xu, Y. (2020) Corpus of Chinese dynastic histories: Gender analysis over two millennia. In Proceedings of the 12th International Conference on Language Resources and Evaluation.