M.Sc. Thesis abstract

We develop a general feature space that can be used for the semantic classification of English verbs. We design a technique to extract these features from a large corpus of English, while trying to maintain portability to other languages: the only language-specific tools we use to extract our core features are a part-of-speech tagger and a partial parser. We show that our general feature space reduces the error rate by 40% or more relative to chance in ten experiments involving from two to thirteen verb classes. We also show that it usually performs as well as features that are selected using specific linguistic expertise, and that it is therefore unnecessary to perform a manual linguistic analysis for each class distinction of interest. Finally, we consider the use of an automatic feature selection technique, stepwise feature selection, and show that it does not work well with our feature space.
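To make the "reduction relative to chance" figure concrete, the following is a minimal sketch of how such a number could be computed. It assumes equally sized verb classes, so that random guessing among k classes has an error rate of 1 - 1/k; the abstract does not state this baseline definition, and the function names and the sample error rate are hypothetical illustrations, not results from the thesis.

```python
def chance_error_rate(num_classes: int) -> float:
    """Error rate of random guessing, ASSUMING equally sized classes."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return 1.0 - 1.0 / num_classes


def relative_error_reduction(observed_error: float, num_classes: int) -> float:
    """Fraction of the chance error rate that the classifier removes."""
    baseline = chance_error_rate(num_classes)
    return (baseline - observed_error) / baseline


if __name__ == "__main__":
    # Illustrative values only: a two-class task with a 30% observed error rate.
    reduction = relative_error_reduction(observed_error=0.30, num_classes=2)
    print(f"chance error: {chance_error_rate(2):.2f}, reduction: {reduction:.0%}")
```

Under this assumption, a two-class task has a chance error rate of 50%, so a 40% reduction corresponds to an observed error rate of 30% or lower. For thirteen classes the chance error rate is 12/13, and the same relative reduction allows a higher absolute error rate.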
