Research (Doctorate)
Saif Mohammad
Advisor: Dr. Graeme Hirst

Thesis (working title): Distributional Profiles of Concepts and their Application.
You shall know a sense by the company it keeps.
ABSTRACT
In the 6th century B.C.E., Aesop wrote a fable with the moral "You shall know a man by the company he keeps". Inspired by this line, Dr. J. R. Firth stated in 1957, "You shall know a word by the company it keeps". The computational linguistics community embraced the idea, and much work followed, especially on determining the distributional similarity of words. We argue that a word keeps different "company" when it is used in different senses. For example, bank in the "river bank" sense will have words like coast, silt, and river around it, whereas in the "financial institution" sense, words like money, account, and ATM are more likely. Keeping a single profile for a word therefore merges the individual profiles of its different senses. While this has benefits (it yields the distributional similarity of words), we believe, and show, that keeping a separate profile for each sense of a word, that is, a Distributional Profile of a Concept (DPC), has benefits of its own. Determining DPCs directly requires sense-annotated data (sentences in which one or more words are labeled with their intended sense). Manually sense-annotated data is expensive to create and is available only in small quantities, predominantly for English. This work focuses on creating DPCs without manually sense-annotated data, relying only on raw text and a published thesaurus. We then analyze the extent to which semantic properties of concepts can be inferred from their DPCs. Specifically, we explore the use of DPCs in a number of natural language tasks, including determining word sense dominance, unsupervised word sense disambiguation, and estimating the relatedness of concepts.
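To make the idea of a concept-level profile concrete, here is a minimal Python sketch. It is an illustration only and not the method developed in the thesis: the toy thesaurus, the corpus, the concept names, and the `build_profiles` function are all hypothetical, and the whole sentence is assumed to be the context. Each word listed under a thesaurus concept contributes its co-occurring words to that concept's profile, so an ambiguous word such as bank contributes to every concept that lists it.

```python
from collections import Counter, defaultdict

# Hypothetical toy thesaurus: concept (category) -> words listed under it.
THESAURUS = {
    "RIVER_BANK": {"bank", "shore", "coast"},
    "FINANCIAL_INSTITUTION": {"bank", "lender", "treasury"},
}

# Hypothetical raw text with no sense annotation.
CORPUS = [
    "the shore was covered in silt from the river",
    "the lender froze the account and the money",
    "the bank of the river",
]


def build_profiles(sentences, thesaurus):
    """Count the words that co-occur with any word listed under each concept.

    Assumption: the context of a word is every other token in its sentence.
    """
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            context = tokens[:i] + tokens[i + 1:]
            for concept, members in thesaurus.items():
                if token in members:
                    # An ambiguous token updates every concept that lists it.
                    profiles[concept].update(context)
    return profiles


profiles = build_profiles(CORPUS, THESAURUS)
for concept, counts in profiles.items():
    print(concept, counts.most_common(5))
```

In this toy run, the unambiguous words shore and lender pull the two profiles apart, while the ambiguous bank adds river to both. That leakage is the noise a naive count cannot remove by itself, and function words such as "the" would need filtering in any realistic setting.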
Last updated: February 2006