Lexical nuances of style and meaning

The nuances of denotation and connotation that are part of everyday language are a serious problem in many applications of computational linguistics. For example, each word in the output of a machine translation system should be the closest possible match in meaning and connotation to the corresponding word in the input; but often the choice must be made from a set of near-synonyms, none of which precisely matches the input. A forest, for instance, differs from a woods along several fuzzy dimensions of size and 'wildness', and the distinctions are not quite the same as those between the nearest German translations, Wald and Gehölz. Formalisms that are conventionally used in machine translation (and artificial intelligence in general) cannot support the kind of fine-grained representation that is necessary for this task. Researchers working on lexical choice in natural language generation and machine translation have assumed extremely simplistic models of synonymy, and have instead concentrated on important but orthogonal issues such as filling out verb frames.

Manfred Stede's system, MOOSE, can produce a range of different paraphrases from the same input representation. The system is designed in such a way that the paraphrasing mechanism extends naturally to a multilingual (English and German) generator. The focus of the system is on lexical paraphrases, and one contribution of the research is in identifying, analyzing, and extending relevant linguistic research so that it can be used to handle the problems of lexical semantics in a language generation system. The lexical entries are more complex than in previous generators, and they separate the various aspects of word meaning, so that different ways of paraphrasing can be systematically related to the different motivations for saying a sentence in a particular way. One result of this is a formalization of a number of verb alternations.

Philip Edmonds extended this work to account for near-synonyms. He developed a new method of representation, supplementary to conventional formalisms, that permits the kind of very fine-grained distinctions that near-synonyms require, both within and across languages. In this method, groups of near-synonyms (possibly in more than one language) are represented by a single concept in the ontology and then differentiated one from another at the sub-conceptual level. The approach permits the representation of lexical connotations and relative emphases and nuances of meaning of the members of each group of near-synonyms.
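
The idea of one shared concept with sub-conceptual differentiation can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions, not Edmonds's actual formalism: the class names, the concept label TRACT-OF-TREES, the 1-to-5 integer scale, and every numeric value are hypothetical placeholders rather than linguistic data.

```python
from dataclasses import dataclass, field


@dataclass
class NearSynonym:
    word: str
    language: str
    # Sub-conceptual distinctions: dimension name -> value on an
    # assumed 1-5 scale (placeholder values, not measured data).
    distinctions: dict[str, int] = field(default_factory=dict)


@dataclass
class Cluster:
    # The single concept in the ontology shared by all members.
    concept: str
    members: list[NearSynonym] = field(default_factory=list)

    def differences(self, a: str, b: str) -> dict[str, int]:
        """Return per-dimension differences (a minus b) between two members."""
        by_word = {m.word: m for m in self.members}
        da = by_word[a].distinctions
        db = by_word[b].distinctions
        dims = sorted(set(da) | set(db))
        # A dimension missing from one member is treated as 0 here.
        return {dim: da.get(dim, 0) - db.get(dim, 0) for dim in dims}


# One cluster spanning English and German (hypothetical values).
cluster = Cluster(
    concept="TRACT-OF-TREES",
    members=[
        NearSynonym("forest", "en", {"size": 5, "wildness": 4}),
        NearSynonym("woods", "en", {"size": 3, "wildness": 2}),
        NearSynonym("Wald", "de", {"size": 4, "wildness": 3}),
        NearSynonym("Gehölz", "de", {"size": 2, "wildness": 2}),
    ],
)

diff = cluster.differences("forest", "woods")
```

Because all four words hang off the same concept, the ontology itself stays coarse, and the fine-grained contrasts, within one language or across two, live only in the per-word distinctions.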

However, using this kind of representation in practice requires a new type of lexical resource: a lexical knowledge base giving information about each near-synonym group in the language, and mappings between near-synonym groups across languages. Diana Inkpen developed a method to automatically acquire a knowledge base of near-synonym differences with an unsupervised decision-list algorithm that learns extraction patterns from a special dictionary of synonym differences. The patterns are then used to extract knowledge from the text of the dictionary. The initial knowledge base is later enriched with information from other machine-readable dictionaries. Information about the collocational behavior of the near-synonyms is acquired from free text. The knowledge base is used by Xenon, a natural language generation system that shows how the new lexical resource can be used to choose the best near-synonym in specific situations.
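
To make the notion of choosing the best near-synonym concrete, here is a toy sketch. It is not Xenon's actual method: the function name choose_near_synonym, the distance-based scoring, and all numeric values are assumptions made purely for illustration. It scores each candidate by how closely its recorded distinctions match a set of input preferences.

```python
def choose_near_synonym(candidates, preferences):
    """Pick the candidate whose distinctions are closest to the preferences.

    candidates:  word -> {dimension: value} (hypothetical knowledge-base entries)
    preferences: {dimension: desired value}
    """
    def distance(distinctions):
        # Sum of absolute differences over the preferred dimensions;
        # a dimension the word does not record counts as 0.
        return sum(
            abs(distinctions.get(dim, 0) - want)
            for dim, want in preferences.items()
        )

    return min(candidates, key=lambda word: distance(candidates[word]))


# Placeholder entries on an assumed 1-5 scale, not real lexical data.
candidates = {
    "forest": {"size": 5, "wildness": 4},
    "woods": {"size": 3, "wildness": 2},
}

small_and_tame = choose_near_synonym(candidates, {"size": 3, "wildness": 3})
large_and_wild = choose_near_synonym(candidates, {"size": 5, "wildness": 5})
```

A realistic system would also have to weigh the collocational information mentioned above, which this sketch leaves out.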

This work is a continuation of an earlier project on syntactic nuances of style and meaning.

