Professor of Computational Linguistics

University of Toronto, Department of Computer Science

Research

Finding and applying threads of meaning in documents

In earlier research, Jane Morris and Graeme Hirst described a technique based on Roget's Thesaurus for finding lexical chains (‘chains’ of semantically related words within a text) and showed that these chains, or threads of meaning, are indicative of the structure of a text. David St-Onge and Graeme Hirst subsequently redesigned the technique for implementation with WordNet, and showed that the chains could also serve as an easily computed representation of context for detecting and correcting real-word spelling errors (malapropisms). Stephen Green used the technique to create hypertext links automatically within and between documents by examining their patterns of lexical chains and where and how the chains converge. In interactive information retrieval, for documents that lack abstracts (such as newspaper and magazine articles), this approach yields a characterization of the document that serves the browsing user far better than a simple list of keywords or the introductory paragraph. This work is continued in our project on semantic distance.
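To make the idea of a lexical chain concrete, here is a minimal sketch of greedy chaining, assuming the candidate words (typically nouns) have already been extracted from the text. The relation table `TOY_RELATIONS` and the functions `related` and `build_chains` are hypothetical illustrations: the table stands in for Roget's Thesaurus or WordNet. The sketch is not the published algorithm, which also weighs the strength of each relation and the distance between words in the text.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy thesaurus: each word maps to a set of related words.
# A real system would consult Roget's Thesaurus or WordNet instead.
TOY_RELATIONS: Dict[str, Set[str]] = {
    "car": {"vehicle", "engine", "driver"},
    "vehicle": {"car", "truck"},
    "truck": {"vehicle"},
    "engine": {"car"},
    "driver": {"car"},
    "bank": {"money", "loan"},
    "money": {"bank", "loan"},
    "loan": {"bank", "money"},
}


def related(a: str, b: str) -> bool:
    """Treat two words as related if identical or linked (in either direction) in the toy table."""
    return (
        a == b
        or b in TOY_RELATIONS.get(a, set())
        or a in TOY_RELATIONS.get(b, set())
    )


def build_chains(words: List[str]) -> List[List[str]]:
    """Greedily attach each word to the most recently extended chain that
    contains a related word; otherwise start a new chain."""
    chains: List[List[str]] = []  # ordered from least to most recently extended
    for word in words:
        target: Optional[int] = None
        # Search from the most recently extended chain backwards.
        for i in range(len(chains) - 1, -1, -1):
            if any(related(word, member) for member in chains[i]):
                target = i
                break
        if target is None:
            chains.append([word])
        else:
            # Extend the chain and move it to the end to mark it as most recent.
            chain = chains.pop(target)
            chain.append(word)
            chains.append(chain)
    return chains


if __name__ == "__main__":
    text = ["car", "bank", "vehicle", "money", "truck", "engine", "loan"]
    print(build_chains(text))
    # → [['car', 'vehicle', 'truck', 'engine'], ['bank', 'money', 'loan']]
```

The two resulting chains correspond to two threads of meaning interleaved in the input. Note that "truck" joins the first chain only through "vehicle", not through "car" directly; transitive attachment of this kind is what lets a chain follow a topic as it drifts across a text.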

References

Green, Stephen. “Lexical semantics and automatic hypertext construction.” ACM Computing Surveys, 31(4es), December 1999, article number 22.

Green, Stephen. “Building hypertext links by computing semantic similarity.” IEEE Transactions on Knowledge and Data Engineering, 11(5), September–October 1999, 713–730.

Green, Stephen J. “Automated link generation: Can we do better than term repetition?” Computer Networks and ISDN Systems, 30(1–7), April 1998, 75–84.

Green, Stephen J. “Automatically generating hypertext in newspaper articles by computing semantic relatedness.” Workshop on New Methods in Language Processing and Computational Natural Language Learning (NeMLaP3/CoNLL98), Sydney, January 1998, 101–110.

Green, Stephen. “Building hypertext links in newspaper articles using semantic similarity.” Third Workshop on Applications of Natural Language to Information Systems (NLDB '97), Vancouver, June 1997, 178–190.

Hirst, Graeme and St-Onge, David. “Lexical chains as representations of context for the detection and correction of malapropisms.” In: Christiane Fellbaum (editor), WordNet: An Electronic Lexical Database, Cambridge, MA: The MIT Press, 1998, 305–332.

Green, Stephen. Automatically generating hypertext by computing semantic similarity, PhD thesis, University of Toronto, August 1997.

St-Onge, David. Detecting and correcting malapropisms with lexical chains, MSc thesis, University of Toronto, March 1995.

Morris, Jane and Hirst, Graeme. “Lexical cohesion computed by thesaural relations as an indicator of the structure of text.” Computational Linguistics, 17(1), March 1991, 21–48.