Professor of Computational Linguistics

University of Toronto, Department of Computer Science

Research

Resolving shell nouns

Shell nouns are abstract nouns, such as fact, issue, idea, and problem, which, among other functions, facilitate efficiency by avoiding repetition of long stretches of text. An example is shown in (1) below. Shell nouns encapsulate propositional content, and the process of identifying this content is referred to as shell noun resolution.

(1) Living expenses are much lower in rural India than in New York, but this fact is not fully captured if prices are converted with currency exchange rates.

Our research, led by Varada Kolhatkar and in collaboration with Heike Zinsmeister, developed computational methods for resolving shell nouns, that is, determining what they refer to. The research is guided by three primary questions:

How can an automated process determine the interpretation of shell nouns?

To what extent can knowledge derived from the linguistics literature help in this process?

To what extent are speakers of English able to interpret shell nouns?

We began with a pilot study to annotate and resolve occurrences of the specific shell-noun phrase "this issue" in Medline abstracts. The method involved manual annotation, feature extraction, and supervised machine learning. The results illustrated the feasibility of annotating and resolving shell nouns, at least in the closed domain of Medline abstracts.

We then developed general algorithms to resolve a variety of shell nouns in the newswire domain. The primary challenges were that each shell noun has its own idiosyncrasies and that no annotated data was available for this task. We therefore developed a number of computational methods for resolving shell nouns that do not rely on manually annotated data. The methods combine lexico-syntactic knowledge and features derived from the linguistics literature with techniques from statistical natural language processing.
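To make the idea of lexico-syntactic knowledge concrete, the following is a minimal illustrative sketch, not the method used in the research. It assumes a single hypothetical pattern, "<shell noun> that <clause>", in which the content of the shell noun follows it directly in the same sentence. The noun list comes from the examples above; the pattern, the function name `extract_shell_content`, and the sample sentence are assumptions made for illustration.

```python
import re

# Illustrative shell nouns, taken from the examples given in the text.
SHELL_NOUNS = ("fact", "issue", "idea", "problem")

# Hypothetical pattern (an assumption, not the authors' rule set):
# "<shell noun> that <clause>", with the clause running up to the next
# sentence-final or clause-final punctuation mark.
PATTERN = re.compile(
    r"\b(?P<noun>" + "|".join(SHELL_NOUNS) + r")\s+that\s+(?P<content>[^.;!?]+)",
    re.IGNORECASE,
)


def extract_shell_content(sentence):
    """Return (shell noun, candidate content) pairs found in the sentence."""
    return [
        (match.group("noun").lower(), match.group("content").strip())
        for match in PATTERN.finditer(sentence)
    ]


sentence = (
    "Economists often overlook the fact that living expenses "
    "are lower in rural India."
)
print(extract_shell_content(sentence))
```

A surface pattern like this only covers cases where the content follows the noun in the same sentence, and it will overshoot when the clause is followed by more material. Anaphoric uses such as "this fact" in example (1), where the content precedes the noun, need the richer features and statistical techniques described above.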

For evaluation, we used crowdsourcing to develop annotated corpora for shell nouns and their content. The annotation results showed that the annotators agreed to a large extent on the shell content. The evaluation of resolution methods showed that knowledge derived from the linguistics literature helps in the process of shell noun resolution, at least for shell nouns with strict semantic expectations.
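As a rough illustration of how agreement among crowd annotators might be quantified, the sketch below computes the fraction of annotators who selected the most common answer for one shell noun instance. This is an assumed, simplified measure chosen for illustration; the text above does not state which agreement measure was used, and the function name `majority_agreement` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_agreement(choices):
    """Fraction of annotators who chose the most common answer.

    `choices` holds one label per annotator, for example an identifier
    of the text span each annotator selected as the shell content.
    """
    if not choices:
        raise ValueError("need at least one annotation")
    _, top_count = Counter(choices).most_common(1)[0]
    return top_count / len(choices)


# Hypothetical labels from four annotators for a single instance.
print(majority_agreement(["span_a", "span_a", "span_b", "span_a"]))
```

A raw proportion like this does not correct for chance agreement, so a chance-corrected coefficient would normally be preferred for reporting.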

References

Kolhatkar, Varada and Hirst, Graeme. “Resolving ‘this-issue’ anaphora.” 2012 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), 1255–1265, July 2012, Jeju, Korea. [PDF]

Kolhatkar, Varada; Zinsmeister, Heike; and Hirst, Graeme. “Annotating anaphoric shell nouns with their antecedents.” Proceedings, The 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, August 2013, 112–121. [PDF]

Kolhatkar, Varada; Zinsmeister, Heike; and Hirst, Graeme. “Interpreting anaphoric shell nouns using cataphoric shell nouns as training data.” Proceedings, 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, October 2013, 300–310. [PDF]

Kolhatkar, Varada and Hirst, Graeme. “Resolving shell nouns.” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP-2014), Doha, Qatar, October 2014, 499–510. [PDF]

Kolhatkar, Varada. Resolving Shell Nouns. PhD thesis, Department of Computer Science, University of Toronto, 2015. [PDF]