Distributional Measures of Concept-Distance:
A Task-oriented Evaluation

Saif Mohammad and Graeme Hirst

In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), July 2006, Sydney, Australia.
ABSTRACT: We propose a framework to derive the distance between concepts from distributional measures of word co-occurrences. We use the categories in a published thesaurus as coarse-grained concepts, allowing all possible distance values to be stored in a concept-concept matrix roughly 0.01% the size of that created by existing measures. We show that the newly proposed concept-distance measures outperform traditional distributional word-distance measures in the tasks of (1) ranking word pairs in order of semantic distance, and (2) correcting real-word spelling errors. In the latter task, of all the WordNet-based measures, only that proposed by Jiang and Conrath outperforms the best distributional concept-distance measures.

THE PAPER: Available in PDF and PostScript formats.
