Finding and applying threads of meaning in documents

In earlier research, Jane Morris and Graeme Hirst described a technique based on Roget's Thesaurus for finding lexical chains (chains of semantically related words within a text) and showed that these chains, or threads of meaning, indicate the structure of a text. David St-Onge and Graeme Hirst subsequently redesigned the technique for implementation with WordNet, and showed that the chains could also serve as an easily computable representation of context for detecting and correcting real-word spelling errors. Stephen Green used the technique to automatically create hypertext links within and between documents by examining their patterns of lexical chains and where and how the chains converge. In interactive information retrieval, for documents without abstracts (such as newspaper and magazine articles), the pattern of chains gives a characterization of the document that serves the browsing user far better than a simple list of keywords or the introductory paragraph.
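To make the idea of a lexical chain concrete, here is a minimal sketch of a greedy chainer. It is not the Morris and Hirst or the St-Onge and Hirst algorithm: the tiny category table (TOY_CATEGORIES), the relatedness test (related), and the distance limit (max_gap) are all hypothetical stand-ins for the thesaurus or WordNet relations and the distance constraints a real system would use.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy thesaurus: each word maps to a set of category labels.
# A real system would consult Roget's Thesaurus or WordNet instead.
TOY_CATEGORIES: Dict[str, Set[str]] = {
    "car": {"vehicle"},
    "truck": {"vehicle"},
    "driver": {"vehicle", "person"},
    "road": {"vehicle", "place"},
    "bank": {"finance"},
    "loan": {"finance"},
    "money": {"finance"},
}

Chain = List[Tuple[int, str]]  # (position in text, word)


def related(a: str, b: str) -> bool:
    """Assumed relatedness test: identical words or a shared category."""
    if a == b:
        return True
    return bool(TOY_CATEGORIES.get(a, set()) & TOY_CATEGORIES.get(b, set()))


def build_chains(words: List[str], max_gap: int = 5) -> List[Chain]:
    """Greedily attach each known word to the first chain that is both
    recent enough (within max_gap positions of the chain's last word)
    and contains a related word; otherwise start a new chain."""
    chains: List[Chain] = []
    for pos, word in enumerate(words):
        if word not in TOY_CATEGORIES:
            continue  # words missing from the toy thesaurus are skipped
        for chain in chains:
            last_pos = chain[-1][0]
            if pos - last_pos <= max_gap and any(
                related(word, w) for _, w in chain
            ):
                chain.append((pos, word))
                break
        else:
            chains.append([(pos, word)])
    return chains


if __name__ == "__main__":
    text = ("the driver parked the car near the bank to ask about "
            "a loan and money for a truck").split()
    for chain in build_chains(text):
        print(chain)
```

The positions stored with each word are what make the chains useful as an indicator of structure: where a chain starts, how long it runs, and where it stops suggest where a topic is introduced and dropped.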

This work is continued in our project on semantic distance.

References:

Also: Green thesis, St-Onge thesis.
