Research publications

The following are research papers published by our group, grouped by author. Multi-authored papers are listed under the name of each author. Papers are listed in reverse chronological order, with the most recent first. For each author, only publications related to research in or with the University of Toronto computational linguistics group are included.

For copies of papers not available for download, please email Graeme Hirst.

Yawar Ali (1)

Understanding adjectives,
Yawar Ali,
1985
Master's Thesis. Department of Computer Science, University of Toronto. January. Published as technical report CSRI-167.
Abstract

The first problem is to determine exactly what each adjective modifies. In general, this can only be done by taking account of the semantic properties of the adjective in question, as well as those of other adjectives to its right and of the noun itself. "Real-world" knowledge and contextual factors also play a role in this process. This is addressed by developing a classification scheme for adjectives which allows us to substantially reduce the number of candidate interpretations, in some cases to a single one. A system is presented which takes account of the disparate semantic behaviour of different classes of adjectives, word order, punctuation in the noun phrase, and a frame-based store of real-world knowledge, in order to determine the scope of adjectives within a noun phrase.

The second problem is to construct a representation of the description embodied in such a noun phrase. Here, it is desirable that the structure of the representation correspond to the structure of modification within the phrase. Particular adjectives are taken to indicate restrictions on the values that objects may take on for associated properties. These properties may be featural, dimensional, or functional in nature. Frame-like structures are used to represent the generic concepts that are taken to be associated with noun phrases.


(bibtex)
Afra Alishahi (5)

A computational model for early Argument Structure Acquisition,
Afra Alishahi and Suzanne Stevenson,
2007
Submitted
[Download pdf] (bibtex)

A computational usage-based model for learning general properties of semantic roles,
Afra Alishahi and Suzanne Stevenson,
2007
Proceedings of the 2nd European Cognitive Science Conference
Delphi, Greece
[Download pdf] (bibtex)

A cognitive model for the representation and acquisition of verb selectional preferences,
Afra Alishahi and Suzanne Stevenson,
2007
Proceedings of the ACL-2007 Workshop on Cognitive Aspects of Computational Language Acquisition
Prague, Czech Republic
[Download pdf] (bibtex)

A probabilistic model of early argument structure acquisition,
Afra Alishahi and Suzanne Stevenson,
2005
Proceedings of the 27th Annual Conference of the Cognitive Science Society, July, Stresa, Italy
Abstract
We present a computational model of usage-based learning of verb argument structure in young children. The model integrates Bayesian classification and prediction to learn from utterances paired with appropriate semantic representations. The model balances item-based and class-based knowledge in language use, demonstrating appropriate word order generalizations, and recovery from overgeneralizations with no negative evidence or change in learning parameters.

[Download pdf] (bibtex)

The acquisition and use of argument structure constructions,
Afra Alishahi and Suzanne Stevenson,
2005
Proceedings of the Second Workshop on Psychocomputational Models of Human Language Acquisition, pp. 82–90, June, Ann Arbor
Abstract
We present a Bayesian model for the representation, acquisition, and use of argument structure constructions, which is founded on a novel view of constructions as a mapping of a syntactic form to a probability distribution over semantic features. Our computational experiments demonstrate the feasibility of learning general constructions from individual examples of verb usage, and show that the acquired knowledge generalizes to novel or low-frequency situations in language use.

[Download pdf] (bibtex)
Daniel Ansari (2)

Generating warning instructions by planning accidents and injuries,
Daniel Ansari and Graeme Hirst,
1998
Proceedings, 9th International Workshop on Natural Language Generation, pp. 118–127, August, Niagara-on-the-Lake, Ontario
Abstract
We present a system for the generation of natural language instructions, as are found in instruction manuals for household appliances, that is able to automatically generate safety warnings to the user at appropriate points. Situations in which accidents and injuries to the user can occur are considered at every step in the planning of the normal operation of the device, and these "injury sub-plans" are then used to instruct the user to avoid these situations.

[Download pdf] (bibtex)

Deriving procedural and warning instructions from device and environment models,
Daniel Ansari,
1995
Master's Thesis. Department of Computer Science, University of Toronto. June. Published as technical report CSRI-329.
Abstract

There has been much interest lately in the automatic generation of documentation; however, much of this research has not considered the cost involved in the production of the natural language generation systems to be a major issue: the benefits obtained from automating the construction of the documentation should outweigh the cost of designing and coding the knowledge base.

This study is centred on the generation of instructional text, as is found in instruction manuals for household appliances. We show how knowledge about a device that already exists as part of the engineering effort, together with adequate, domain-independent knowledge about the environment, can be used for reasoning about natural language instructions.

The knowledge selected for communication can be planned for, and all the knowledge necessary for the planning should be contained (possibly in a more abstract form) in the knowledge of the artifact together with the world knowledge. We present the planning knowledge for two example domains, in the form of axioms in the situation calculus. This planning knowledge formally characterizes the behaviour of the artifact, and it is used to produce a basic plan of actions that both the device and user take to accomplish a given goal. We explain how the instructions are generated from the basic plan. This plan is then used to derive further plans for states to be avoided. We also explain how warning instructions about potentially dangerous situations are generated from these plans. These ideas have been implemented using Prolog and the Penman natural language generation system.

Finally, this thesis makes the claim that the planning knowledge should be derivable from the device and world knowledge; thus the need for cost effectiveness would be met. To this end, we suggest a framework for an integrated approach to device design and instruction generation.


[Download gz] (bibtex)
Melanie Baljko (7)

Computational simulations of mediated face-to-face multimodal communication,
Melanie Baljko,
2004
Ph.D. Thesis. Department of Computer Science, University of Toronto. July.
(bibtex)

Articulatory adaptation in multimodal communicative action,
Melanie Baljko,
2001
Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on Adaptation in Dialogue Systems (Cindi Thompson, Tim Paek, and Eric Horvitz, eds.), pp. 73–74, June, Pittsburgh, PA
[Download pdf] (bibtex)

The evaluation of microplanning and surface realization in the generation of multimodal acts of communication,
Melanie Baljko,
2001
Proceedings of the Workshop on Multimodal Communication and Context in Embodied Agents, Fifth International Conference on Autonomous Agents (AA'01) (Catherine Pelachaud and Isabella Poggi, eds.), pp. 89–94, May, Montreal, Quebec
Abstract
We describe an application domain which requires the computational simulation of human-human communication in which one of the interlocutors has an expressive communication disorder. The importance and evaluation of a process, called here microplanning and surface realization, for such communicative agents is discussed and a related exploratory study is described.

[Download pdf] (bibtex)

Incorporating multimodality in the design of interventions for communication disorders,
Melanie Baljko,
2000
Proceedings of the 4th Swedish Symposium on Multimodal Communication (SSoMC'00) (Patric Dahlqvist, ed.), pp. 13–14, October, Stockholm University/KTH
[Download pdf] (bibtex)

The computational simulation of multimodal, face-to-face communication constrained by physical disabilities,
Melanie Baljko,
2000
Proceedings, Workshop on Integrating Information from Different Channels in Multi-Media-Contexts, European Summer School in Logic, Language, and Information, pp. 1–10, August, Birmingham, UK
Abstract
In face-to-face interaction, interlocutors often use several modes of articulation simultaneously. An interlocutor's communication will often be multimodal even when he or she knows the other interlocutors cannot perceive all of the modes of communication (e.g., people often gesture while speaking on the telephone). Our present inquiry, which incorporates computational modeling in conjunction with the analysis of, and comparison to, empirical data, is motivated by the desire to understand a particular design space and is relevant to other research that seeks to understand these "complex signals" in human-human and human-computer interaction.

[Download pdf] (bibtex)

The importance of subjectivity in computational stylistic assessment,
Melanie Baljko and Graeme Hirst,
1999
Text Technology, 9(1), pp. 5–17, Spring
Abstract
Often, a text that has been written collaboratively does not "speak with a single voice." Such a text is stylistically incongruous, as opposed to merely stylistically inconsistent (a property that might or might not be deleterious to the quality of the text). This widespread problem reduces the overall quality of a text and reflects poorly on its authors. We would like to design a facility for revising style that augments the software environments in which collaborative writing takes place, but before doing so, a question must be answered: what is the role of subjectivity in stylistic assessment for a style-revision facility? We describe an experiment designed to measure the agreement between the stylistic assessments performed by a group of subjects based on a free-sort of writing samples. The results show that there is a statistically significant level of agreement between the subjects' assessments and, furthermore, that there is a small number of groupings (three) of even more similar stylistic assessments. The results also show that authorship is not a valid indicator of readers' perceptions of stylistic similarity between the writing samples.

[Download pdf] (bibtex)

Ensuring stylistic congruity in collaboratively written text: Requirements analysis and design issues,
Melanie Baljko,
1997
Master's Thesis. Department of Computer Science, University of Toronto. May. Published as technical report CSRI-365.
Abstract
Often, texts that have been written collaboratively do not "speak with a single voice." Eliminating stylistic incongruity, a difficult undertaking for both collaborative and singular writers, is the desired function of a software tool. This thesis describes the first cycle of an iterative software development process towards meeting this goal. The user requirements are analyzed with respect to a model that synthesizes established research, and then the requirements are taxonomized. Then, a framework for performing computational stylistic assessments is developed for later tool design. An experiment designed to measure the subjectivity in stylistic assessment (a relevant issue for making deterministic, computational stylistic assessments) was performed; the results indicate that future stylistic assessment tools must account for different patterns of assessment. Several design directions motivated by these results are suggested.

[Download pdf] (bibtex)
Faye Baron (2)

Identifying non-compositional idioms in text using WordNet synsets,
Faye Baron,
2007
Master's Thesis. Department of Computer Science, University of Toronto.
Abstract

Any natural language processing system that lacks knowledge of non-compositional idioms and their interpretation will make mistakes. Previous authors have attempted to automatically identify these expressions through the property of non-substitutability: similar words cannot be successfully substituted for words in non-compositional idiom expressions without changing their meaning.

In this study, we use the non-substitutability property of idioms to contrast and expand the ideas of previous work, drawing on WordNet for the attempted substitutions. We attempt to determine the best way to automatically identify idioms through the comparison of algorithms including frequency counts, pointwise mutual information (PMI), and PMI ranges; the evaluation of the importance of relative word position; and the assessment of the usefulness of syntactic relations. We discover that many of the techniques we try are not useful for identifying idioms, and confirm that non-compositionality does not appear to be a necessary or sufficient condition for idiomaticity.


[Download pdf] (bibtex)

Collocations as cues to semantic orientation,
Faye Baron and Graeme Hirst,
2003
[Download pdf] (bibtex)
Benjamin Bartlett (2)

Failing to find paraphrases using PNrule,
Benjamin Bartlett,
2007
January
[Download pdf] (bibtex)

Finding paraphrases using PNrule,
Benjamin Bartlett,
2006
Master's Thesis. Department of Computer Science, University of Toronto. September.
Abstract
In this thesis, we attempt to use the machine-learning algorithm PNrule, along with simple lexical and syntactic measures, to detect paraphrases in cases where they are rare. We choose PNrule because it was specifically developed for classification in instances where the target class is rare compared to other classes within the data. We test our system both on a dataset we develop based on movie reviews, and on the PASCAL RTE dataset; we obtain poor results on the former, and moderately good results on the latter. We examine why this is the case, and suggest improvements for future research.

[Download pdf] (bibtex)
Julian Brooke (1)

Patterns in the stream: Exploring the interaction of polarity, topic, and discourse in a large opinion corpus,
Julian Brooke and Matthew Hurst,
2009
Proceedings of the Conference on Information and Knowledge Management (CIKM), 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion Measurement, pp. 1–8, November, Hong Kong, China
Abstract
A qualitative examination of review texts suggests that there are consistent patterns to how topic and polarity are expressed in discourse. These patterns are visible in the text and paragraph structure, topic depth, and polarity flow. In this paper, we employ sentence-level sentiment classifiers and a hand-built tree ontology to investigate whether these patterns can be quantitatively identified in a large corpus of video game reviews. Our results indicate that the beginning and the end of major textual units (e.g. paragraphs) stand out in the flow of texts, showing a concentration of reliable opinion and key topic aspects, and that there are other important regularities in the expression of opinion and topic relevant to their ordering and the discourse markers with which they appear.

[Download pdf] (bibtex)
Barbara Brunson (1)

A processing model for Warlpiri syntax and implications for linguistic theory,
Barbara Brunson,
1986
Master's Thesis. Department of Linguistics, University of Toronto. September. Published as technical report CSRI-206.
Abstract

Much of the development of the current Government-Binding (GB) theory of syntax has progressed independently of concerns raised in theories of language processing. Similarly, models of syntactic processing are often proposed that lack any underpinning in syntactic theory. The work described in this report focuses on the language Warlpiri, an Australian aboriginal language with properties that are difficult to reconcile with most theories of Universal Grammar, such as free word order and discontinuity. This language is studied from the two-fold perspective of establishing a linguistically and computationally sound processing model. This forces the linguistic model to be sufficiently precise to satisfy the demands of implementation as well as forcing the implementation to proceed in a linguistically principled way.

This report presents a portion of Warlpiri grammar in a revised GB-based account, addressing the issues of parsability, as well as more theoretical syntactic issues, that together force a reassessment and parametrization of certain linguistic principles. In particular, a revised version of theta theory and the notion of thematic identification are readily interpreted into processing strategies that extend naturally to deal with adjuncts and non-subcategorized arguments in a wide range of languages. The complementary nature of the syntax and morpho-syntax in the satisfaction of syntactic principles as well as in the construction of syntactic representations is addressed, as is the crucial relevance of prosodic information for preserving determinism in the parsing algorithm.


(bibtex)
Alexander Budanitsky (6)

Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model,
L. Amber Wilcox-O'Hearn and Graeme Hirst and Alexander Budanitsky,
2008
Proceedings, 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2008) (Lecture Notes in Computer Science 4919, Springer-Verlag) (Alexander Gelbukh, ed.), pp. 605–616, February, Haifa
A conference poster with updated results is also available.
Abstract
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then improve the method further and experiment with a new variation that optimizes over fixed-length windows instead of over sentences.

[Download pdf] (bibtex)

Evaluating WordNet-based measures of semantic distance,
Alexander Budanitsky and Graeme Hirst,
2006
Computational Linguistics, 32(1), pp. 13–47, March
Abstract
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness.

[Download pdf] (bibtex)

Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model,
L. Amber Wilcox-O'Hearn and Graeme Hirst and Alexander Budanitsky,
2006
February. Superseded by the 2008 CICLing version.
Abstract
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then improve the method further and experiment with a new variation that optimizes over fixed-length windows instead of over sentences.

[Download pdf] (bibtex)

Correcting real-word spelling errors by restoring lexical cohesion,
Graeme Hirst and Alexander Budanitsky,
2005
Natural Language Engineering, 11(1), pp. 87–111, March
Get paper from publisher's Web site
Abstract
Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of up to 50% and precision of 18 to 25%, levels that approach practical usability.

[Download pdf] (bibtex)

Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures,
Alexander Budanitsky and Graeme Hirst,
2001
Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 29–34, June, Pittsburgh, PA
Abstract
Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath's measure gave the best results overall. That of Hirst and St-Onge seriously over-related, that of Resnik seriously under-related, and those of Lin and of Leacock and Chodorow fell in between.

[Download pdf] (bibtex)

Lexical Semantic Relatedness and its Application in Natural Language Processing,
Alexander Budanitsky,
1999
Department of Computer Science, University of Toronto, Technical Report Number CSRG-390, August
Abstract
A great variety of natural language processing tasks, from word sense disambiguation to text summarization to speech recognition, rely heavily on the ability to measure semantic relatedness or distance between words of a natural language. This report is a comprehensive study of recent computational methods of measuring lexical semantic relatedness. A survey of methods, as well as their applications, is presented, and the question of evaluation is addressed both theoretically and experimentally. Application to the specific task of intelligent spelling checking is discussed in detail: the design of a prototype system for the detection and correction of malapropisms (words that are similar in spelling or sound to, but quite different in meaning from, intended words) is described, and results of experiments on using various measures as plug-ins are considered. Suggestions for research directions in the areas of measuring semantic relatedness and intelligent spelling checking are offered.

[Download pdf] (bibtex)
Mark Catt (2)

An intelligent CALI system for grammatical error diagnosis,
Mark Catt and Graeme Hirst,
1990
Computer Assisted Language Learning, 3, pp. 3–26, November
Abstract

This paper describes an approach to computer-assisted language instruction based on the application of artificial intelligence technology to grammatical error diagnosis. We have developed a prototype system, Scripsi, capable of recognising a wide range of errors in the writing of language learners. Scripsi not only detects ungrammaticality, but hypothesizes its cause and provides corrective information to the student. These diagnostic capabilities rely on the application of a model of the learner's linguistic knowledge.

Scripsi operates interactively, accepting the text of the student's composition and responding with diagnostic information about its grammatical structure. In contrast to the narrowly defined limits of interaction available with automated grammatical drills, the framework of interactive composition provides students with the opportunity to express themselves in the language being learned.

Although Scripsi's diagnostic functions are limited to purely structural aspects of written language, the way is left open for the incorporation of semantic processing. The design of Scripsi is intended to lay the groundwork for the creation of intelligent tutoring systems for second language instruction. The development of such expertise will remedy many of the deficiencies of existing technology by providing a basis for genuinely communicative instructional tools: computerised tutors capable of interacting linguistically with the student.

The research is based on the assumption that the language produced by the language learner, "learner language", differs in systematic ways from that of the native speaker. In particular, the learner's errors can be attributed primarily to two causes: the operation of universal principles of language acquisition and the influence of the learner's native language. A central concern in the design of Scripsi has been the incorporation of a psychologically sound model of the linguistic competence of the second language learner.


[Download pdf] (bibtex)

Intelligent diagnosis of ungrammaticality in computer-assisted language instruction,
Mark Catt,
1988
Master's Thesis. Department of Computer Science, University of Toronto. October. Published as technical report CSRI-218.
Abstract

We describe an approach to grammatical error diagnosis in computer-assisted language instruction (CALI). Our prototype system, Scripsi, employs a model of the linguistic competence of the second language learner in diagnosing ungrammaticality in learners' writing. Scripsi not only detects errors, but hypothesises their cause and provides corrective information to the student.

Scripsi's grammatical model reflects the results of research in second language acquisition, which has identified language transfer and rule overgeneralisation as the chief sources of error in learner language. Thus, in characterizing the learner's "transitional competence", we model not only the grammar of the learner's native language, but also the strategies that give rise to overgeneralisation. Although the approach is language-independent, our implementation targets French-speaking and Chinese-speaking learners of English.

The computational realization of the model assumes that linguistic behaviour is rule-governed. We have adopted a rule-oriented grammatical formalism in which the processes of transfer and overgeneralisation are readily interpreted. Linguistic rules are expressed in a feature-based grammatical framework closely related to the Standard Theory of transformational grammar. We have extended the shift-reduce parsing algorithm in order to accommodate context-sensitive and transformational aspects of the formalism.

We argue that the development of expertise in intelligent grammatical diagnosis is a prerequisite for the next generation of CALI tools: genuinely communicative systems capable of interacting linguistically with the student.


[Download pdf] (bibtex)
Christopher Collins (4)

DocuBurst: Radial Space-Filling Visualization of Document Content,
Christopher Collins,
2007
Knowledge Media Design Institute, University of Toronto, Technical Report Number KMDI-TR-2007-1
Toronto, Canada
Abstract
We present the first visualization of document content which takes advantage of the human-created structure in lexical databases. We use an accepted design paradigm to generate visualizations which improve the usability and utility of WordNet as the backbone for document content visualization. A radial, space-filling layout of hyponymy (IS-A relation) is presented with interactive techniques of zoom, filter, and details-on-demand for the task of document visualization. The techniques can be generalized to multiple documents.

[Download pdf] (bibtex)

Visualizing Uncertainty in Lattices to Support Decision-Making,
Christopher Collins and Sheelagh Carpendale and Gerald Penn,
2007
Proceedings of the Eurographics/IEEE VGTC Symposium on Visualization, May, Norrköping, Sweden
http://diglib.eg.org
Abstract
Lattice graphs are used as underlying data structures in many statistical processing systems, including natural language processing. Lattices compactly represent multiple possible outputs and are usually hidden from users. We present a novel visualization intended to reveal the uncertainty and variability inherent in statistically-derived lattice structures. Applications such as machine translation and automated speech recognition typically present users with a best-guess about the appropriate output, with apparent complete confidence. Through case studies we show how our visualization uses a hybrid layout along with varying transparency, colour, and size to reveal the lattice structure, expose the inherent uncertainty in statistical processing, and help users make better-informed decisions about statistically-derived outputs.

(bibtex)

Head-driven parsing for word lattices,
Christopher Collins and Bob Carpenter and Gerald Penn,
2004
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, July, Barcelona, Spain
Abstract
We present the first application of the head-driven statistical parsing model of Michael Collins as a simultaneous language model and parser for large-vocabulary speech recognition. The model is adapted to an online left-to-right chart-parser for word lattices, integrating acoustic, n-gram, and parser probabilities. The parser uses structural and lexical dependencies not considered by n-gram models, conditioning recognition on more linguistically-grounded relationships. Experiments on the Wall Street Journal treebank and lattice corpora show word error rates competitive with the standard n-gram language model while extracting additional structural information useful for speech understanding.

[Download pdf] (bibtex)

Head-driven probabilistic parsing for word lattices,
Christopher Collins,
2004
Master's Thesis. Department of Computer Science, University of Toronto. January.
Abstract

This thesis presents the first application of the state-of-the-art head-driven statistical parsing model of Michael Collins as a simultaneous language model and parser for large-vocabulary speech recognition. The model is adapted to an online left-to-right chart-parser for word lattices, integrating acoustic, n-gram and parser probabilities.

The parser uses structural and lexical dependencies not considered by n-gram models, conditioning recognition on more linguistically-grounded relationships. By preferring paths through the word lattice for which a probable parse exists, word error rate can be reduced and important syntactic and semantic relationships can be determined in a single-step process.

New forms of heuristic search and pruning are employed to improve efficiency. Experiments on the Wall Street Journal treebank and lattice corpora show word error rates competitive with the standard n-gram language model while extracting additional structural information useful for speech understanding.


[Download pdf] (bibtex)
Paul Cook (7)

An Unsupervised Model for Text Message Normalization,
Paul Cook and Suzanne Stevenson,
2009
Proceedings of the NAACL HLT 2009 Workshop on Computational Approaches to Linguistic Creativity, pp. 71--79, June, Boulder, Colorado
Abstract
Cell phone text messaging users express themselves briefly and colloquially using a variety of creative forms. We analyze a sample of creative, non-standard text message word forms to determine frequent word formation processes in texting language. Drawing on these observations, we construct an unsupervised noisy-channel model for text message normalization. On a test set of 303 text message forms that differ from their standard form, our model achieves 59% accuracy, which is on par with the best supervised results reported on this dataset.

[Download pdf] (bibtex)

Unsupervised type and token identification of idiomatic expressions,
Afsaneh Fazly and Paul Cook and Suzanne Stevenson,
2009
Computational Linguistics, 35(1), pp. 61--103
Abstract
Idiomatic expressions are plentiful in everyday language, yet they remain mysterious, as it is not clear exactly how people learn and understand them. They are of special interest to linguists, psycholinguists, and lexicographers, mainly because of their syntactic and semantic idiosyncrasies as well as their unclear lexical status. Despite a great deal of research on the properties of idioms in the linguistics literature, there is not much agreement on which properties are characteristic of these expressions. Because of their peculiarities, idiomatic expressions have mostly been overlooked by researchers in computational linguistics. In this article, we look into the usefulness of some of the identified linguistic properties of idioms for their automatic recognition. Specifically, we develop statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text. We use these statistical measures in a type-based classification task where we automatically separate idiomatic expressions (expressions with a possible idiomatic interpretation) from similar-on-the-surface literal phrases (for which no idiomatic interpretation is possible). In addition, we use some of the measures in a token identification task where we distinguish idiomatic and literal usages of potentially idiomatic expressions in context.

[Download 08-010-R1-07-048] (bibtex)

The VNC-Tokens Dataset,
Paul Cook and Afsaneh Fazly and Suzanne Stevenson,
2008
Proceedings of the LREC Workshop on Towards a Shared Task for Multiword Expressions (MWE 2008), June, Marrakech, Morocco
Abstract
Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb-noun combination usages annotated as to whether they are literal or idiomatic. Previous research using this dataset is described, and other studies which could be evaluated more extensively using this resource are identified.

[Download pdf] (bibtex)

Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context,
Paul Cook and Afsaneh Fazly and Suzanne Stevenson,
2007
Proceedings of the ACL Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic
Abstract
Much work on idioms has focused on type identification, i.e. determining whether a sequence of words can form an idiomatic expression. Since an idiom type often has a literal interpretation as well, token classification of potential idioms in context is critical for NLP. We explore the use of informative prior knowledge about the overall syntactic behaviour of a potentially-idiomatic expression (type-based knowledge) to determine whether an instance of the expression is used idiomatically or literally (token-based knowledge). We develop unsupervised methods for the task, and show that their performance is comparable to that of state-of-the-art supervised techniques.

[Download pdf] (bibtex)

Automagically Inferring the Source Words of Lexical Blends,
Paul Cook and Suzanne Stevenson,
2007
Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING-2007), Melbourne, Australia
Abstract
Lexical blending is a highly productive and frequent process by which new words enter a language. A blend is formed when two or more source words are combined, with at least one of them shortened, as in brunch ("breakfast"+"lunch"). We use linguistic and cognitive aspects of this process to motivate a computational treatment of neologisms formed by blending. We propose statistical features that can indicate the source words of a blend, and whether an unknown word was formed by blending. We present computational experiments that show the usefulness in these tasks of features tapping into the recognizability of the source words in the blend, in combination with their semantic properties.

[Download pdf] (bibtex)

Automatically Classifying English Verb-Particle Constructions by Particle Semantics,
Paul Cook,
2006
Master's Thesis. Department of Computer Science, University of Toronto. August.
Abstract
We address the issue of automatically determining the semantic contribution of the particle in a verb-particle construction (VPC), a task which has been previously ignored in computational work on VPCs. Adopting a cognitive linguistic standpoint, we assume that every VPC is compositional, and that the semantic contribution of a particle corresponds to one of a small number of senses. We develop a feature space based on syntactic and semantic properties of verbs and VPCs for type classification of English VPCs according to the sense contributed by their particle. We focus on VPCs using the particle up since it is very frequent and exhibits a wide range of meanings. In our experiments on unseen test VPCs, features which are motivated by properties specific to verbs and VPCs outperform linguistically uninformed word co-occurrence features, and give a reduction in error rate of around 20-30% over a chance baseline.

[Download pdf] (bibtex)

Classifying particle semantics in English verb-particle constructions,
Paul Cook and Suzanne Stevenson,
2006
Proceedings of the ACL/COLING Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (MWE 2006), July, Sydney, Australia
Abstract
Previous computational work on learning the semantic properties of verb-particle constructions (VPCs) has focused on their compositionality, and has left unaddressed the issue of which meaning of the component words is being used in a given VPC. We develop a feature space for use in classification of the sense contributed by the particle in a VPC, and test this on VPCs using the particle up. The features that capture linguistic properties of VPCs that are relevant to the semantics of the particle outperform linguistically uninformed word co-occurrence features in our experiments on unseen test VPCs.

[Download pdf] (bibtex)
Adrian Corduneanu (1)

A Pylonic Decision-Tree Language Model with Optimal Question Selection,
Adrian Corduneanu,
1999
Proceedings of the 37th Annual Meeting, Association for Computational Linguistics, pp. 606--609, June, College Park, Maryland
Abstract
This paper discusses a decision-tree approach to the problem of assigning probabilities to words following a given text. In contrast with previous decision-tree language model attempts, an algorithm for selecting nearly optimal questions is considered. The model is to be tested on a standard task, The Wall Street Journal, allowing a fair comparison with the well-known trigram model.

[Download pdf] (bibtex)
Jean-Pierre Corriveau (5)

Time-constrained memory: A reader-based approach to text comprehension,
Jean-Pierre Corriveau,
1995
Mahwah NJ: Lawrence Erlbaum Associates
Publisher's Web site
Buy at Amazon.com
(bibtex)

Interpretation of definite reference with a time-constrained memory,
Jean-Pierre Corriveau,
1991
Proceedings, 13th Annual Conference of the Cognitive Science Society, pp. 678--681, August, Chicago IL
(bibtex)

Constraint satisfaction in time-constrained memory,
Jean-Pierre Corriveau,
1991
Workshop on parallel processing for artificial intelligence, at the International Joint Conference on Artificial Intelligence, August, Sydney
(bibtex)

Time-constrained memory for reader-based text comprehension,
Jean-Pierre Corriveau,
1991
Ph.D. Thesis. Department of Computer Science, University of Toronto. January.
Order published version from publisher
Buy published version at Amazon.com.
Abstract

Marvin Minsky writes at the beginning of The Society of Mind (1986, page 18) that ``to explain the mind, we have to show how minds are built from mindless stuff, from parts that are much smaller and simpler than anything we'd consider smart.'' In this dissertation, I develop a model of a strictly quantitative (i.e., non-semantic) memory that can be used to specify a conceptual analyzer for teuchistic (i.e., `constructionist') text comprehension. I view this model as a prototype of Minsky's ``agents of the mind''.

Most importantly, I acknowledge the real-time processing constraints derived from the biological constraint (Feldman, 1984) and therefore assume that linguistic comprehension is a race defined in terms of time-constrained memory processes.

Because I do not model an adaptable memory, I partition memory into a static component, which consists of a massively parallel network of simple computing elements whose processes allow for the construction of clusters, and a dynamic component, where these clusters reside. Through specification browsers, the user of the system can input and modify both the topology of the network and the individual behavior of each computing element of static memory, which forms a `knowledge' base. Clusters are built from the processing of an input text with respect to this `knowledge' base and constitute the output of the system. Given that there is widespread disagreement on the nature, modus operandi, and use of inferences in text comprehension, the focus in this work is not on the knowledge required for comprehension, but rather on its specification in terms of constraints to satisfy through the exchange of simple signals and sequences of primitive memory operations to execute upon constraint satisfaction. I demonstrate at length how typical rules for the problems of syntax, referential resolution, lexical and structural disambiguation, and bridging inferences can be encoded in the proposed representational scheme, and thus illustrate how a theory of text understanding may be `grounded' into a more fundamental quantitative time-constrained memory.


(bibtex)

On the role of time in reader-based text comprehension,
Jean-Pierre Corriveau,
1987
Proceedings of the Ninth Annual Conference of the Cognitive Science Society, pp. 794--801, July, Seattle
(bibtex)
Michael Demko (1)

Statistical Parsing with Context-Free Filtering Grammar,
Michael Demko,
2007
Master's Thesis. Department of Computer Science, University of Toronto.
Abstract

Statistical parsers that simultaneously generate both phrase-structure and lexical dependency trees have been limited in two important ways: the detection of non-projective dependencies has not been integrated with other parsing decisions, or the constraints between phrase-structure and dependency structure have been overly strict. I develop context-free filtering grammar as a generalization of the more restrictive lexicalized factored parsing model, and I develop for the new grammar formalism a scoring model to resolve parsing ambiguities. I demonstrate the flexibility of the new model by implementing a statistical parser for German, a freer-word-order language exhibiting a mixture of context-free and non-projective behaviours.


[Download pdf] (bibtex)
Chrysanne DiMarco (12)

Generation by selection and repair as a method for adapting text for the individual reader,
Chrysanne DiMarco and Graeme Hirst and Eduard Hovy,
1997
Proceedings of the Flexible Hypertext Workshop (held in conjunction with the 8th ACM International Hypertext Conference, Southampton, April 1997), pp. 36--43, August, Microsoft Research Institute, Macquarie University
Abstract

A recent and growing development in Web applications has been the advent of various tools that claim to ``customize'' access to information on the Web by allowing users to specify the kinds of information they want to receive without having to search for it or sift through masses of irrelevant material. But this kind of customization is really just a crude filtering of raw Web material in which the user simply selects the ``channels'' of information she wishes to receive; this selection of information sources is hardly more ``customization'' than someone deciding to tune their television to a certain station. True customization, or tailoring, of information would be done for the user by a system that had access to an actual model of the user, a profile of the user's interests and characteristics. And such tailoring would involve much more than just selecting streams of basic content: the content of the text, whether for an on-line Web page or a paper document, would be carefully selected, structured, and presented in the manner best calculated to appeal to a particular individual. Adaptive-hypertext presentation comes closest to achieving this kind of document tailoring, but the current techniques used for adapting the content of a document to a particular user generally only involve some form of selectively showing (or hiding) portions of text or choosing whole variants of larger parts of the document.

If the Web document designer wishes to write and present material in a way that will communicate well with the user, then just displaying the most relevant chunks of information will not be sufficient. For effective communication, both the form and content of the language used in a document should be tailored in rhetorically significant ways to best suit a user's particular personal characteristics and preferences. Ideally, we would have Web-based natural language generation systems that could produce fully customized and customizable documents on demand by individual users, according to a formal user model. As a first step in this direction, we have been investigating applications of our earlier work on pragmatics in natural language processing to building systems for the automated generation of Web documents tailored to the individual reader.


[Download pdf] (bibtex)

Authoring and generating health-education documents that are tailored to the needs of the individual patient,
Graeme Hirst and Chrysanne DiMarco and Eduard Hovy and Kimberley Parsons,
1997
User Modeling: Proceedings of the Sixth International Conference, UM97 (Anthony Jameson and Cécile Paris and Carlo Tasso ed.), pp. 107--118, June, Chia Laguna, Sardinia, Italy, Vienna and New York
Springer Wien New York
Abstract
Health-education documents can be much more effective in achieving patient compliance if they are customized for individual readers. For this purpose, a medical record can be thought of as an extremely detailed user model of a reader of such a document. The HealthDoc project is developing methods for producing health-information and patient-education documents that are tailored to the individual personal and medical characteristics of the patients who receive them. Information from an on-line medical record or from a clinician will be used as the primary basis for deciding how best to fit the document to the patient. In this paper, we describe our research on three aspects of the project: the kinds of tailoring that are appropriate for health-education documents; the nature of a tailorable master document and how it can be created; and the linguistic problems that arise when a tailored instance of the document is to be generated.

[Download pdf] (bibtex)

Automatic customization of health-education brochures for individual patient,
Graeme Hirst and Chrysanne DiMarco,
1996
Proceedings, Information Technology and Community Health Conference (ITCH-96), pp. 222--228, November, Victoria, B.C.
Abstract

Many studies have shown that health-education messages and patient instructions are more effective when closely tailored to the particular condition and characteristics of the individual recipient. But in situations where many factors interact -- for example, in explaining the pros and cons of hormone replacement therapy -- the number of different combinations is far too large for a set of appropriately tailored messages to be produced in advance.

The HealthDoc project is presently developing linguistic techniques for producing, on demand, health-education and patient-information brochures that are customized to the medical and personal characteristics of an individual patient.

For each topic, HealthDoc requires a `master document' written by an expert on the subject with the help of a program called an `authoring tool'. The writer decides upon the basic elements of the text -- clauses and sentences -- and the patient conditions under which each element should be included in the output. The program assists the writer in building correctly structured master-document fragments and annotating them with the relationships and conditions for inclusion.

When a clinician wishes to give a patient a particular brochure from HealthDoc, she will select it from a menu and specify the name of the patient. HealthDoc will use information from the patient's on-line medical record to then create and print a version of the document appropriate to that patient, by selecting the appropriate pieces of material and then performing the necessary linguistic operations to combine them into a single, coherent text.


[Download n] (bibtex)

HealthDoc: Customizing patient information and health education by medical condition and personal characteristics,
Chrysanne DiMarco and Graeme Hirst and Leo Wanner and John Wilkinson,
1995
Workshop on Artificial Intelligence in Patient Education, August, Glasgow, Scotland
Abstract
The HealthDoc project aims to provide a comprehensive approach to the customization of patient-information and health-education materials through the development of sophisticated natural language generation systems. We adopt a model of patient education that takes into account patient information ranging from simple medical data to complex cultural beliefs, so that our work provides both an impetus and testbed for research in multicultural health communication. We propose a model of language generation, `generation by selection and repair', that relies on a `master-document' representation that pre-determines the basic form and content of a text, yet is amenable to editing and revision for customization. The implementation of this model has so far led to the design of a sentence planner that integrates multiple complex planning tasks and a rich set of ontological and linguistic knowledge sources.

[Download pdf] (bibtex)

Usage notes as the basis for a representation of near-synonymy for lexical choice,
Chrysanne DiMarco and Graeme Hirst,
1993
Proceedings, 9th annual conference of the University of Waterloo Centre for the New Oxford English Dictionary and Text Research, pp. 33--43, September, Oxford
Abstract
The task of choosing between lexical near-equivalents in text generation requires the kind of knowledge of fine differences between words that is typified by the usage notes of dictionaries and books of synonym discrimination. These usage notes follow a fairly standard pattern, and a study of their form and content shows the kinds of differentiae adduced in the discrimination of near-synonyms. For appropriate lexical choice in text generation and machine translation systems, it is necessary to develop the concept of formal `computational usage notes', which would be part of the lexical entries in a conceptual knowledge base. The construction of a set of `computational usage notes' adequate for text generation is a major lexicographic task of the future.

[Download pdf] (bibtex)

A computational theory of goal-directed style in syntax,
Chrysanne DiMarco and Graeme Hirst,
1993
Computational Linguistics, 19(3), pp. 451--499, September
Abstract

The problem of style is highly relevant to computational linguistics, but current systems deal only superficially, if at all, with subtle but significant nuances of language. Expressive effects, together with their associated meaning, contained in the style of a text are lost to analysis and absent from generation.

We have developed an approach to the computational treatment of style that is intended to eventually incorporate three selected components: lexical, syntactic, and semantic. In this paper, we concentrate on certain aspects of syntactic style. We have designed and implemented a computational theory of goal-directed stylistics that can be used in various applications, including machine translation, second-language instruction, and natural language generation.

We have constructed a vocabulary of style that contains both primitive and abstract elements of style. The primitive elements describe the stylistic effects of individual sentence components. These elements are combined into patterns that are described by a stylistic meta-language, the abstract elements, that define the concordant and discordant stylistic effects common to a group of sentences. Higher-level patterns are built from the abstract elements and associated with specific stylistic goals, such as clarity or concreteness. Thus, we have defined rules for a syntactic stylistic grammar at three interrelated levels of description: primitive elements, abstract elements, and stylistic goals. Grammars for both English and French have been constructed, using the same vocabulary and the same development methodology. Parsers that implement these grammars have also been built.

The stylistic grammars codify aspects of language that were previously defined only descriptively. The theory is being applied to various problems in which the form of an utterance conveys an essential part of meaning and so must be precisely represented and understood.


[Download pdf] (bibtex)

A goal-based grammar of rhetoric,
Chrysanne DiMarco and Graeme Hirst and Marzena Makuta-Giluk,
1993
Association for Computational Linguistics, Workshop on Intentionality and Structure in Discourse Relations, pp. 15--18, June, Ohio State University
[Download pdf] (bibtex)

The semantic and stylistic differentiation of synonyms and near-synonyms,
Chrysanne DiMarco and Graeme Hirst and Manfred Stede,
1993
AAAI Spring Symposium on Building Lexicons for Machine Translation, pp. 114--121, March, Stanford CA
Abstract

If we want to describe the action of someone who is looking out a window for an extended time, how do we choose between the words gazing, staring, and peering? What exactly is the difference between an argument, a dispute, and a row? In this paper, we describe our research in progress on the problem of lexical choice and the representations of world knowledge and of lexical structure and meaning that the task requires. In particular, we wish to deal with nuances and subtleties of denotation and connotation (shades of meaning and of style) such as those illustrated by the examples above.

We are studying the task in two related contexts: machine translation and the generation of multilingual text from a single representation of content. In the present paper, we concentrate on issues in lexical representation. We describe a methodology, based on dictionary usage notes, that we are using to discover the dimensions along which similar words can be differentiated, and we discuss a two-part representation for lexical differentiation.


[Download pdf] (bibtex)

Focus shifts as indicators of style in paragraphs,
Mark Ryan and Chrysanne DiMarco and Graeme Hirst,
1992
Department of Computer Science, University of Waterloo, June
In DiMarco, Chrysanne et al., Four papers on computational stylistics.
[Download pdf] (bibtex)

Accounting for style in machine translation,
Chrysanne DiMarco and Graeme Hirst,
1990
Third International Conference on Theoretical Issues in Machine Translation, June, Austin TX
Abstract

A significant part of the meaning of any text lies in the author's style. Different choices of words and syntactic structure convey different nuances in meaning, which must be carried through in any translation if it is to be considered faithful. Up to now, machine translation systems have been unable to do this. Subtleties of style are simply lost to current machine-translation systems.

The goal of the present research is to develop a method to provide machine-translation systems with the ability to understand and preserve the intent of an author's stylistic characteristics. Unilingual natural language understanding systems could also benefit from an appreciation of these aspects of meaning. However, in translation, style plays an additional role, for here one must also deal with the generation of appropriate target-language style.

Consideration of style in translation involves two complementary, but sometimes conflicting, aims:

  • The translation must preserve, as much as possible, the author's stylistic intent: the information conveyed through the manner of presentation.
  • But it must have a style that is appropriate and natural to the target language.

The study of comparative stylistics is, in fact, guided by the recognition that languages differ in their stylistic approaches: each has its own characteristic stylistic preferences. The stylistic differences between French and English are exemplified by the predominance of the pronominal verb in French. This contrast allows us to recognize the greater preference of English for the passive voice:

  • (a) Le jambon se mange froid.
    (b) Ham is eaten cold.

Such preferences exist at the lexical, syntactic, and semantic levels, but reflect differences in the two languages that can be grouped in terms of more-general stylistic qualities. French words are generally situated at a higher level of abstraction than the corresponding English words, which tend to be more concrete (Vinay and Darbelnet 1958, 59). French aims for precision, while English is more tolerant of vagueness (Duron 1963, 109).

So, a French source text may be abstract and very precise in style, but the translated English text should be looser and less abstract, while still retaining the author's stylistic intent. Translators use this kind of knowledge about comparative stylistics as they clean up raw machine-translation output, dealing with various kinds of stylistic complexities.


(bibtex)

Computational stylistics for natural language translation,
Chrysanne DiMarco,
1990
Ph.D. Thesis. Department of Computer Science, University of Toronto. February. Published as technical report CSRI-239.
Abstract

The problem of style is highly relevant to machine translation, but current systems deal only superficially, if at all, with the preservation of stylistic effects. At best, MT output is syntactically correct but aims no higher than a strict uniformity in style. The expressive effects contained in the source text, together with their associated meaning, are lost.

I have developed an approach to the computational treatment of style that incorporates three selected components (lexical, syntactic, and semantic) and focuses on certain aspects of syntactic style. I have designed and implemented the foundations of a computational model of goal-directed stylistics that could serve as the basis of a system to preserve style in French-to-English translation. First, I developed a vocabulary of style that contains both primitive and abstract elements of style. The primitive elements describe the stylistic effects of individual sentence components. These elements are combined into patterns that are described by a stylistic meta-language, the abstract elements, that define the stylistic effects common to a group of sentences. These elements have as their basis the notions of concord and discord, for it is my contention that style is created by patterns of concord and discord giving an overall integrated arrangement. These patterns are built from the abstract elements and associated with specific stylistic goals, such as clarity or concreteness. Thus, I have developed a syntactic stylistic grammar at three interrelated levels of description: primitive shapes, abstract elements, and stylistic goals. Grammars for both French and English have been constructed, using the same vocabulary and the same development methodology. As well, Mark Ryan has used this vocabulary and methodology to construct a semantic stylistic grammar. Parsers that implement these grammars have also been built.

Together, the English and French parsers could form the basis of a system that would preserve many aspects of style in translation. The incorporation of stylistic analysis into MT systems should significantly reduce the current reliance on human post-editing and improve the quality of MT output.


(bibtex)

Stylistic grammars in language translation,
Chrysanne DiMarco and Graeme Hirst,
1988
Proceedings, 12th International conference on computational linguistics (COLING-88), pp. 148--153, August, Budapest
Abstract

We are developing stylistic grammars to provide the basis for a French and English stylistic parser. Our stylistic grammar is a branching stratificational model, built upon a foundation dealing with lexical, syntactic, and semantic stylistic realizations. Its central level uses a vocabulary of constituent stylistic elements common to both English and French, while the top level correlates stylistic goals, such as clarity and concreteness, with patterns of these elements.

Overall, we are implementing a computational schema of stylistics in French-to-English translation. We believe that the incorporation of stylistic analysis into machine translation systems will significantly reduce the current reliance on human post-editing and improve the quality of the systems' output.

[Download pdf] (bibtex)
Judith Dick (6)

A case-based representation of legal text for conceptual retrieval,
Judith Dick and Graeme Hirst,
1991
Workshop on Language and Information Processing, American Society for Information Science, October, Washington DC
[Download pdf] (bibtex)

On the usefulness of conceptual graphs in representing knowledge for intelligent retrieval,
Judith Dick,
1991
Proceedings, Sixth Annual Workshop on Conceptual Graphs, pp. 153--167, July, Binghamton NY
[Download pdf] (bibtex)

Intelligent text retrieval,
Judith Dick and Graeme Hirst,
1991
Text retrieval: Workshop notes from the Ninth National Conference on Artificial Intelligence (AAAI-91), July, Anaheim CA
[Download pdf] (bibtex)

Representation of legal text for conceptual retrieval,
Judith Dick,
1991
Proceedings, Third International Conference on Artificial Intelligence and Law, pp. 244--252, June, Oxford
[Download pdf] (bibtex)

A conceptual, case-relation representation of text for intelligent retrieval,
Judith Dick,
1991
Ph.D. Thesis. Department of Computer Science, University of Toronto. April. Published as technical report CSRI-265.
Abstract

Ideally, a case-law retrieval system would provide the lawyer with conceptual access to cases and help him or her to develop an argument. This research constitutes an attempt to move from contemporary information retrieval towards the ideal by using natural language understanding techniques.

A knowledge base of contract cases has been constructed to demonstrate the advantages of using a conceptual representation rather than keywords. The KB consists of knowledge representations of the cases, a lexicon of legal concepts and some semantic constraints. The ratio decidendi or principal argument of each of the cases has been analyzed according to Toulmin's ``good reasons'' argument model. The argument schema is used to structure the representation of the discourse. Sowa's conceptual graphs have been used as a near-first-order notation. Conceptual graphs have an established group of users and a growing software base. The notation is augmented by Somers's 28 definitive deep cases, which are designed to answer the strongest criticisms of case.

The lexicon of legal concepts is integrated with the argument representations. Each legal concept has its own definition and pointers to instances. In a full-scale implementation, as the KB grew, the legal concepts would be continuously augmented and redefined by knowledge from incoming cases. The open-textured concepts are used in the design to improve retrieval.

It might be argued that constructing such a KR is slow, requires human ability, and is impractical for large-scale applications. Nevertheless, KR construction can reasonably be expected to become automatic in the future. Here, we are not looking to write language in logic just yet, but to model conceptual content in order to facilitate the retrieval of information. We demonstrate that retrieval based on semantics and inference is perceptive and powerful.

The dissertation concludes with a retrieval demonstration using questions derived from cases following those represented in the KB. LOG, a frame-matching algorithm based on spreading activation, is used. The demonstration focuses on pattern-matching among conceptual definitions. Semantic constraints facilitate inference within the type hierarchy.


[Download pdf] (bibtex)

Conceptual retrieval and case law,
Judith Dick,
1987
Proceedings, First International Conference on Artificial Intelligence and Law, pp. 106--115, May, Boston
[Download pdf] (bibtex)
Philip Edmonds (9)

Near-synonymy and lexical choice,
Philip Edmonds and Graeme Hirst,
2002
Computational Linguistics, 28(2), pp. 105--144, June
Abstract

We develop a new computational model for representing the fine-grained meanings of near-synonyms and the differences between them. We also develop a sophisticated lexical-choice process that can decide which of several near-synonyms is most appropriate in a particular situation. This research has direct applications in machine translation and text generation.

We first identify the problems of representing near-synonyms in a computational lexicon and show that no previous model adequately accounts for near-synonymy. We then propose a preliminary theory to account for near-synonymy, relying crucially on the notion of granularity of representation, in which the meaning of a word arises out of a context-dependent combination of a context-independent core meaning and a set of explicit differences to its near-synonyms. That is, near-synonyms cluster together.

We then develop a clustered model of lexical knowledge, derived from the conventional ontological model. The model cuts off the ontology at a coarse grain, thus avoiding an awkward proliferation of language-dependent concepts in the ontology, and groups near-synonyms into subconceptual clusters that are linked to the ontology. A cluster differentiates near-synonyms in terms of fine-grained aspects of denotation, implication, expressed attitude, and style. The model is general enough to account for other types of variation, for instance in collocational behaviour.

An efficient, robust, and flexible fine-grained lexical-choice process is a consequence of a clustered model of lexical knowledge. To make it work, we formalize criteria for lexical choice as preferences to express certain concepts with varying indirectness, to express attitudes, and to establish certain styles. The lexical-choice process itself works on two tiers: between clusters and between near-synonyms of clusters. We describe our prototype implementation of the system, called I-Saurus.


[Download pdf] (bibtex)

Reconciling fine-grained lexical knowledge and coarse-grained ontologies in the representation of near-synonyms,
Philip Edmonds and Graeme Hirst,
2000
Proceedings of the Workshop on Semantic Approximation, Granularity, and Vagueness, April, Breckenridge CO
Abstract
A machine translation system must be able to adequately cope with near-synonymy, for there are often many slightly different translations available for any source language word that can significantly and differently affect the meaning or style of a translated text. Conventional models of lexical knowledge used in natural-language processing systems are inadequate for representing near-synonyms, because they are unable to represent fine-grained lexical knowledge. We will discuss a new model for representing fine-grained lexical knowledge whose basis is the idea of granularity of representation.

[Download pdf] (bibtex)

Semantic representations of near-synonyms for automatic lexical choice,
Philip Edmonds,
1999
Ph.D. Thesis. Department of Computer Science, University of Toronto. September. Published as technical report CSRI-399.
Abstract

We develop a new computational model for representing the fine-grained meanings of near-synonyms and the differences between them. We also develop a sophisticated lexical-choice process that can decide which of several near-synonyms is most appropriate in any particular context. This research has direct applications in machine translation and text generation, and also in intelligent electronic dictionaries and automated style-checking and document editing.

We first identify the problems of representing near-synonyms in a computational lexicon and show that no previous model adequately accounts for near-synonymy. We then propose a preliminary theory to account for near-synonymy in which the meaning of a word arises out of a context-dependent combination of a context-independent core meaning and a set of explicit differences to its near-synonyms. That is, near-synonyms cluster together.

After considering a statistical model and its weaknesses, we develop a clustered model of lexical knowledge, based on the conventional ontological model. The model cuts off the ontology at a coarse grain, thus avoiding an awkward proliferation of language-dependent concepts in the ontology, and groups near-synonyms into subconceptual clusters that are linked to the ontology. A cluster acts as a formal usage note that differentiates near-synonyms in terms of fine-grained aspects of denotation, implication, expressed attitude, and style. The model is general enough to account for other types of variation, for instance, in collocational behaviour.

We formalize various criteria for lexical choice as preferences to express certain concepts with varying indirectness, to express attitudes, and to establish certain styles. The lexical-choice process chooses the near-synonym that best satisfies the most preferences. The process uses an approximate-matching algorithm that determines how well the set of lexical distinctions of each near-synonym in a cluster matches a set of input preferences.

We implemented the lexical-choice process in a prototype sentence-planning system. We evaluate the system to show that it can make the appropriate word choices when given a set of preferences.


[Download pdf] (bibtex)

Choosing the word most typical in context using a lexical co-occurrence network,
Philip Edmonds,
1997
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 507--509, July, Madrid Spain
Abstract
This paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. We apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. The implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relations improves the performance of our implemented lexical choice program.

[Download pdf] (bibtex)

Evoking meaning by choosing the right words,
Philip Edmonds,
1996
Proceedings of the First Student Conference in Computational Linguistics in Montreal, pp. 80--87, June, Montreal Quebec
Abstract
Choosing the right word is difficult. One reason is that the context affects the meaning expressed by a word in complex ways. In particular, when a word is used in a context that is not normal for the word, it may evoke a special meaning. This paper presents a lexical choice process that chooses the word from a set of near-synonyms that best produces the desired effects in the given context. It relies on a clustered representation of lexical knowledge that unites both a statistical model of word co-occurrence (for determining when a word use will be marked) and a knowledge-based model (for determining what specific effects will occur).

[Download pdf] (bibtex)

Collaboration on reference to objects that are not mutually known,
Philip Edmonds,
1994
Proceedings of the 15th International Conference on Computational Linguistics (COLING-94), pp. 1118--1122, August, Kyoto Japan
Abstract
In conversation, a person sometimes has to refer to an object that is not previously known to the other participant. We present a plan-based model of how agents collaborate on reference of this sort. In making a reference, an agent uses the most salient attributes of the referent. In understanding a reference, an agent determines his confidence in its adequacy as a means of identifying the referent. To collaborate, the agents use judgment, suggestion, and elaboration moves to refashion an inadequate referring expression.

[Download pdf] (bibtex)

Repairing conversational misunderstandings and non-understandings,
Graeme Hirst and Susan McRoy and Peter A. Heeman and Philip Edmonds and Diane Horton,
1994
Speech Communication, 15(3--4), pp. 213--229, December
Abstract
Participants in a discourse sometimes fail to understand one another but, when aware of the problem, collaborate upon or negotiate the meaning of a problematic utterance. To address nonunderstanding, we have developed two plan-based models of collaboration in identifying the correct referent of a description: one covers situations where both conversants know of the referent, and the other covers situations such as direction-giving, where the recipient does not. In the models, conversants use the mechanisms of refashioning, suggestion, and elaboration to collaboratively refine a referring expression until it is successful. To address misunderstanding, we have developed a model that combines intentional and social accounts of discourse to support the negotiation of meaning. The approach extends intentional accounts by using expectations deriving from social conventions in order to guide interpretation. Reflecting the inherent symmetry of the negotiation of meaning, all our models can act as both speaker and hearer, and can play both the role of the conversant who is not understood or misunderstood and the role of the conversant who fails to understand.

[Download pdf] (bibtex)

A computational model of collaboration on reference in direction-giving dialogues,
Philip Edmonds,
1993
Master's Thesis. Department of Computer Science, University of Toronto. October. Published as technical report CSRI-289.
Abstract

In a conversation, a speaker sometimes has to refer to an object that is not previously known to the hearer. This type of reference occurs frequently in dialogues where the speaker is giving directions to a particular place. To make a reference, the speaker attempts to build a description of the object that will allow the hearer to identify it when she later reaches it.

This thesis presents a computational model of how an agent collaborates on reference in direction-giving dialogues. Viewing language as goal-oriented behaviour, we encode route descriptions, referring expressions, and discourse actions in the planning paradigm. This allows an agent to construct plans that achieve communicative goals by means of surface speech actions, and to infer plans and goals from these actions. The basis is that a referring expression plan is acceptable to an agent if she is confident that the plan is adequate as an executable identification plan. By considering the salience of the features used in a referring expression plan, an agent can evaluate her confidence in its adequacy. Driven by the implicit intention of making plans mutually acceptable, the conversants collaborate until the hearer is confident in the adequacy of the current referring expression plan. In doing so, the conversants use suggestion and elaboration discourse actions that operate on the current plan. While collaborating, an agent is in a mental state that includes the intention to achieve the goal of having the direction recipient understand the directions, the plan the agents are currently considering, and a focus of attention into the plan. This collaborative state governs the discourse by sanctioning both the adoption of goals and the mutual acceptance of plans. Reflecting the inherent symmetry in collaborative dialogue, the model can act as both speaker and hearer, and can play the roles of both the direction-giver and the recipient.


[Download pdf] (bibtex)

Translating near-synonyms: Possibilities and preferences in the interlingua,
Philip Edmonds,

Proceedings of the AMTA/SIG-IL Second Workshop on Interlinguas, pp. 23--30, Langhorne PA
Published in technical report MCCS-98-316, Computing Research Laboratory, New Mexico State University
Abstract
This paper argues that an interlingual representation must explicitly represent some parts of the meaning of a situation as possibilities (or preferences), not as necessary or definite components of meaning (or constraints). Possibilities enable the analysis and generation of nuance, something required for faithful translation. Furthermore, the representation of the meaning of words is crucial, because it specifies which nuances words can convey in which contexts.

[Download pdf] (bibtex)
Brenda Fawcett (2)

The detection and representation of ambiguities of intension and description,
Brenda Fawcett and Graeme Hirst,
1986
Proceedings of the 24th Annual Meeting, Association for Computational Linguistics, pp. 192--199, June, New York
Abstract

Ambiguities related to intension and their consequent inference failures are a diverse group, both syntactically and semantically. One particular kind of ambiguity that has received little attention so far is whether it is the speaker or the third party to whom a description in an opaque third-party attitude report should be attributed. The different readings lead to different inferences in a system modeling the beliefs of external agents.

We propose that a unified approach to the representation of the alternative readings of intension-related ambiguities can be based on the notion of a descriptor that is evaluated with respect to intensionality, the beliefs of agents, and a time of application. We describe such a representation, built on a standard modal logic, and show how it may be used in conjunction with a knowledge base of background assumptions to license restricted substitution of equals in opaque contexts.


[Download pdf] (bibtex)

The representation of ambiguity in opaque contexts,
Brenda Fawcett,
1985
Master's Thesis. Department of Computer Science, University of Toronto. October. Published as technical report CSRI-178.
Abstract

A knowledge of intensions, which are used to designate concepts of objects, is important for natural language processing systems. Certain linguistic phrases can refer either to the concept of an entity or to the entity itself. To properly understand a phrase and to prevent invalid inferences from being drawn, the system must determine the type of reference being asserted. We identify a set of ``opaque'' constructs and suggest that a common mechanism be developed to handle them.

To account for the ambiguities of opaque contexts, noun phrases are translated into descriptors. It must be made explicit to whom the descriptor is ascribed and whether its referent is non-specific or specific. Similarly, sentential constituents should be treated as propositions and evaluated relative to conjectured states of affairs. As a testbed for these ideas we define a Montague-style meaning representation and implement the syntactic and semantic components of a moderate-size NLP system in a logic programming environment.

One must also consider how to disambiguate and interpret such a representation with respect to a knowledge base. Much contextual and world knowledge is required. We characterize what facilities are necessary for an accurate semantic interpretation, considering what is and is not available in current knowledge representation systems.


[Download pdf] (bibtex)
Afsaneh Fazly (14)

Unsupervised type and token identification of idiomatic expressions,
Afsaneh Fazly and Paul Cook and Suzanne Stevenson,
2009
Computational Linguistics, 35(1), pp. 61--103
Abstract
Idiomatic expressions are plentiful in everyday language, yet they remain mysterious, as it is not clear exactly how people learn and understand them. They are of special interest to linguists, psycholinguists, and lexicographers, mainly because of their syntactic and semantic idiosyncrasies as well as their unclear lexical status. Despite a great deal of research on the properties of idioms in the linguistics literature, there is not much agreement on which properties are characteristic of these expressions. Because of their peculiarities, idiomatic expressions have mostly been overlooked by researchers in computational linguistics. In this article, we look into the usefulness of some of the identified linguistic properties of idioms for their automatic recognition. Specifically, we develop statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text. We use these statistical measures in a type-based classification task where we automatically separate idiomatic expressions (expressions with a possible idiomatic interpretation) from similar-on-the-surface literal phrases (for which no idiomatic interpretation is possible). In addition, we use some of the measures in a token identification task where we distinguish idiomatic and literal usages of potentially idiomatic expressions in context.

[Download 08-010-R1-07-048] (bibtex)

The VNC-Tokens Dataset,
Paul Cook and Afsaneh Fazly and Suzanne Stevenson,
2008
Proceedings of the LREC Workshop on Towards a Shared Task for Multiword Expressions (MWE 2008), June, Marrakech, Morocco
Abstract
Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb--noun combination usages annotated as to whether they are literal or idiomatic. Previous research using this dataset is described, and other studies which could be evaluated more extensively using this resource are identified.

[Download pdf] (bibtex)

Pulling their weight: Exploiting syntactic forms for the automatic identification of idiomatic expressions in context,
Paul Cook and Afsaneh Fazly and Suzanne Stevenson,
2007
Proceedings of the ACL Workshop on A Broader Perspective on Multiword Expressions, Prague, Czech Republic
Abstract
Much work on idioms has focused on type identification, i.e. determining whether a sequence of words can form an idiomatic expression. Since an idiom type often has a literal interpretation as well, token classification of potential idioms in context is critical for NLP. We explore the use of informative prior knowledge about the overall syntactic behaviour of a potentially-idiomatic expression (type-based knowledge) to determine whether an instance of the expression is used idiomatically or literally (token-based knowledge). We develop unsupervised methods for the task, and show that their performance is comparable to that of state-of-the-art supervised techniques.

[Download pdf] (bibtex)

Learning structured appearance models from captioned images of cluttered scenes,
Mike Jamieson and Afsaneh Fazly and Sven Dickinson and Suzanne Stevenson and Sven Wachsmuth,
2007
Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV), October, Rio de Janeiro, Brazil
[Download pdf] (bibtex)

Distinguishing subtypes of multiword expressions using linguistically-motivated statistical measures,
Afsaneh Fazly and Suzanne Stevenson,
2007
Proceedings of the ACL'07 Workshop on A Broader Perspective on Multiword Expressions, June, Prague, Czech Republic
[Download pdf] (bibtex)

Automatically learning semantic knowledge about multiword predicates,
Afsaneh Fazly and Suzanne Stevenson and Ryan North,
2007
Journal of Language Resources and Evaluation, 41(1)
(29 pages) The original publication is available from Springer.
(bibtex)

Automatic Acquisition of Lexical Knowledge about Multiword Predicates,
Afsaneh Fazly,
2007
Ph.D. Thesis. Department of Computer Science, University of Toronto.
[Download pdf] (bibtex)

Automatically constructing a lexicon of verb phrase idiomatic combinations,
Afsaneh Fazly and Suzanne Stevenson,
2006
Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp. 337--344, April, Trento Italy
Abstract
We investigate the lexical and syntactic flexibility of a class of idiomatic expressions. We develop measures that draw on such linguistic properties, and demonstrate that these statistical corpus-based measures can be successfully used for distinguishing idiomatic combinations from non-idiomatic ones. We also propose a means for automatically determining which syntactic forms a particular idiom can appear in, and hence should be included in its lexical representation.

[Download pdf] (bibtex)

Automatically determining allowable combinations of a class of flexible multiword expressions,
Afsaneh Fazly and Ryan North and Suzanne Stevenson,
2006
Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2006), pp. 81--92, February, Mexico City, Mexico
Lecture Notes in Computer Science, Volume 3878, Springer-Verlag
Abstract
We develop statistical measures for assessing the acceptability of a frequent class of multiword expressions. We also use the measures to estimate the degree of productivity of the expressions over semantically related nouns. We show that a linguistically-inspired measure outperforms a standard measure of collocation in its match with human judgments. The measure uses simple extraction techniques over non-marked-up web data.

[Download pdf] (bibtex)

Automatic acquisition of knowledge about multiword predicates,
Afsaneh Fazly and Suzanne Stevenson,
2005
Proceedings of the 19th Pacific Asia Conference on Language, Information, and Computation (PACLIC), December, Taipei, Taiwan
overview paper for an invited talk by Suzanne Stevenson
Abstract
Human interpretation of natural language relies heavily on cognitive processes involving metaphorical and idiomatic meanings. One area of computational linguistics in which such processes play an important, but largely unaddressed, role is the determination of the properties of multiword predicates (MWPs). MWPs such as give a groan and cut taxes involve metaphorical meaning extensions of highly frequent, and highly polysemous, verbs. Tools for automatically identifying such MWPs, and extracting their lexical and syntactic properties, are crucial to the adequate treatment of text in a computational system, due to the productive nature of MWPs across many languages. This paper gives an overview of our work addressing these issues. We begin by relating linguistic properties of metaphorical uses of verbs to their distributional properties. We devise automatic methods for assessing whether a verb phrase is literal, metaphorical, or idiomatic. Since metaphorical MWPs are generally semi-productive, we also develop computational measures of their individual acceptability and of their productivity over semantically related combinations. Our results demonstrate that combining statistical approaches with linguistic information is beneficial, both for the acquisition of knowledge about metaphorical and idiomatic MWPs, and for the organization of such knowledge in a computational lexicon.

[Download pdf] (bibtex)

Automatically distinguishing literal and figurative usages of highly polysemous verbs,
Afsaneh Fazly and Ryan North and Suzanne Stevenson,
2005
Proceedings of the ACL 2005 Workshop on Deep Lexical Acquisition, June, Ann Arbor, MI
Abstract
We investigate the meaning extensions of very frequent and highly polysemous verbs, both in terms of their compositional contribution to a light verb construction (LVC), and the patterns of acceptability of the resulting LVC. We develop compositionality and acceptability measures that draw on linguistic properties specific to LVCs, and demonstrate that these statistical, corpus-based measures correlate well with human judgments of each property.

[Download pdf] (bibtex)

Statistical measures of the semi-productivity of light verb constructions,
Suzanne Stevenson and Afsaneh Fazly and Ryan North,
2004
Proceedings of the ACL 2004 Workshop on Multiword Expressions: Integrating Processing, pp. 1--8, August, Barcelona, Spain
Abstract

We propose a statistical measure for the degree of acceptability of light verb constructions, such as take a walk, based on their linguistic properties. Our measure shows good correlations with human ratings on unseen test data. Moreover, we find that our measure correlates more strongly when the potential complements of the construction (such as walk, stroll, or run) are separated into semantically similar classes. Our analysis demonstrates the systematic nature of the productivity of these constructions.


[Download pdf] (bibtex)

Testing the efficacy of part-of-speech information in word completion,
Afsaneh Fazly and Graeme Hirst,
2003
Workshop on Language Modeling for Text Entry Methods, 11th Conference of the European Chapter of the Association for Computational Linguistics, April, Budapest, Hungary
Abstract
We investigate the effect of incorporating syntactic information into a word-completion algorithm. We introduce two new algorithms that combine part-of-speech tag trigrams with word bigrams, and evaluate them with a testbench constructed for the purpose. The results show a small but statistically significant improvement in keystroke savings for one of our algorithms over baselines that use only word n-grams.

[Download pdf] (bibtex)

The use of syntax in word completion utilities,
Afsaneh Fazly,
2002
Master's Thesis. Department of Computer Science, University of Toronto. January.
Abstract
Current word-prediction utilities rely on little more than word unigram and bigram frequencies. Can part-of-speech information help? To answer this question, we first built a testbench for word prediction and then introduced several new prediction algorithms that exploit part-of-speech tag information. We trained the prediction algorithms on a very large corpus of English and evaluated them in several experiments according to several performance measures. All the algorithms were compared with WordQ, a commercial word-prediction program. Our results confirm that strong word unigram and bigram models, collected from a very large corpus, give accurate predictions. All predictors, including that based on word unigram statistics, outperform the WordQ prediction algorithm. The predictor based on word bigrams works surprisingly well compared to the syntactic predictors. Although two of the syntactic predictors work slightly better than the bigram predictor, the ANOVA test shows that the difference is not statistically significant.

[Download pdf] (bibtex)
Ol'ga Feiguina (4)

Authorship attribution for small texts: Literary and forensic experiments,
Ol'ga Feiguina and Graeme Hirst,
2007
Proceedings, International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection, 30th Annual International ACM SIGIR Conference (SIGIR '07), July, Amsterdam
[Download pdf] (bibtex)

Bigrams of syntactic labels for authorship discrimination of short texts,
Graeme Hirst and Ol'ga Feiguina,
2007
Literary and Linguistic Computing, 22(4), pp. 405--417
Get paper from publisher's Web site doi: 10.1093/llc/fqm023
Abstract
We present a method for authorship discrimination that is based on the frequency of bigrams of syntactic labels that arise from partial parsing of the text. We show that this method, alone or combined with other classification features, achieves a high accuracy on discrimination of the work of Anne and Charlotte Brontë, which is very difficult to do by traditional methods. Moreover, high accuracies are achieved even on fragments of text little more than 200 words long.

(bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst,
2005
In: James G. Shanahan and Yan Qu and Janyce Wiebe (editors), Computing attitude and affect in text, Dordrecht, The Netherlands, Springer.
Supersedes March 2004 AAAI Symposium version
Abstract
We present experiments on modifying the semantic orientation of the near-synonyms in a text. We analyze a text into an interlingual representation and a set of attitudinal nuances, with particular focus on its near-synonyms. Then we use our text generator to produce a text with the same meaning but changed semantic orientation (more positive or more negative) by replacing, wherever possible, words with near-synonyms that differ in their expressed attitude.

[Download pdf] (bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst,
2004
AAAI Spring Symposium on Exploring Attitude and Affect in Text, March, Stanford University
published as AAAI technical report SS-04-07. Superseded by 2005 book version
(bibtex)
Mary Ellen Foster (1)

Automatically generating text to accompany information graphics,
Mary Ellen Foster,
1999
Master's Thesis. Department of Computer Science, University of Toronto. August. Published as technical report CSRI-397.
Abstract

Generally, when quantitative information is to be presented, some form of graphical presentation is used, often with a textual caption to ensure that the audience notices particular aspects of the data.

This thesis presents the principles that should be followed by a system aiming to produce such captions automatically. The process of caption generation is examined in the context of the standard tasks in text generation. Most previous systems in this area produce textual summaries intended to stand alone; the issues involved in producing a caption differ as the text must be coordinated with the graphic it is to accompany. The thesis also presents CAPUT, a prototype caption-generation system which follows these principles to generate single-sentence captions for information graphics of the type that might appear in a newspaper article. Finally, extensions to CAPUT that would bring it from a prototype to a full-fledged caption generation system are proposed.


[Download pdf] (bibtex)
Timothy Alexander Dalton Fowler (2)

LC Graphs for the Lambek calculus with product,
Timothy Alexander Dalton Fowler,
2007
Proceedings of Mathematics of Language 10, Los Angeles, California
Abstract
Since the introduction of the Lambek calculus in Lambek (1958), there has been a great deal of interest in its usefulness as a grammar for parsing in natural language. In 2003, Pentus proved that the version of the calculus with the product is NP-complete, while the version which omits the product has a computational complexity that is still unknown. This paper presents a graph formalism similar to that of Penn (2001) for the Lambek calculus with product and then examines the differences between the two calculi by way of this new graph formalism.

[Download pdf] (bibtex)

A graph formalism for proofs in the Lambek calculus with product,
Timothy Alexander Dalton Fowler,
2006
Master's Thesis. Department of Computer Science, University of Toronto.
Abstract

Since the introduction of the Lambek calculus in Lambek (1958), there has been a great deal of interest in its usefulness as a grammar for parsing in natural language. Several variants have been introduced and for each, questions of tractability and usefulness have been posited and answered. Pentus (2003) answered the question of tractability of the original calculus by providing an NP-completeness proof.

The simplest of these variants is the version of the original calculus without the product, the computational complexity of which is as yet unknown. This thesis seeks to identify the precise implications of the product for the complexity of parsing in the calculus. Towards this goal, the graph formalism of Penn (2001) is extended to proofs in the original calculus.

We then present a simplified, graphical NP-completeness proof for derivability in the Lambek calculus with product and consider the potential intractability of the Lambek calculus without product.


[Download pdf] (bibtex)
Ulrich Germann (2)

Yawat: Yet Another Word Alignment Tool,
Ulrich Germann,
2008
Proceedings of the ACL-08: HLT Demo Session, pp. 20--23, June, Columbus, Ohio
Association for Computational Linguistics
The associated poster
[Download pdf] (bibtex)

An iterative approach to pitch-marking of speech signals without electroglottographic data,
Ulrich Germann,
2006
April
Abstract

We propose an iterative approach to high-quality pitch-marking of speech recordings without the use of laryngographic data. Our method first identifies islands of pitch marks that can be determined with high confidence. These islands are then extended into neighboring regions. A second round of island identification and extension with lower quality requirements fills the remaining gaps. We evaluate this pitch-marking method against pitch marks produced with the Praat sound analysis software.


[Download pdf] (bibtex)
Angela Glover (2)

Automatically detecting stylistic inconsistencies in computer-supported collaborative writing,
Angela Glover,
1996
Master's Thesis. Ontario Institute for Studies in Education, University of Toronto. January. Published as technical report CSRI-340.
Abstract

Collaborative writing is increasingly common in both professional and academic fields. One difficulty that collaborative writers face is trying to produce a consistent style, as each writer may bring a distinctive style to the collaborative writing task. I investigated the viability of using stylostatistical techniques to discover describable, computationally tractable stylistic tests to help collaborative writers eliminate such differences.

Writing samples were collected by having graduate students watch two halves of a television episode, then write a summary of each half. Automatically generated syntactic information was used in statistical analyses to ascertain which halves differed significantly. Examination of the statistically significant results revealed a wide variety of inconsistencies on various levels. Many of these inconsistencies were not immediately obvious before the stylostatistical test results were known. I therefore conclude that stylostatistical techniques provide a promising approach for creating a computer tool to accelerate and improve people's detection of stylistic inconsistencies.


[Download pdf] (bibtex)

Detecting stylistic inconsistencies in collaborative writing,
Angela Glover and Graeme Hirst, 1996
In: Mike Sharples and Thea van der Geest (editors), The new writing environment: Writers at work in a world of technology, London, UK, Springer-Verlag.
Abstract

When two or more writers collaborate on a document by each contributing pieces of text, the problem can arise that while each might be an exemplary piece of writing, they do not cohere into a document that speaks with a single voice. That is, they are stylistically inconsistent. But given a stylistically inconsistent document, people often find it hard to articulate exactly where the problems lie. Rather, they feel that something is wrong but can't quite say why.

An example of stylistic inconsistency can be seen in the following sentence, which is from a brochure given to hospital patients who are to undergo a cardiac catheterization. (The parenthesized numbers are ours, to refer to the individual clauses.)

(1) Once the determination for a cardiac catheterization has been made, (2) various tests will need to be performed (3) to properly assess your condition prior to the procedure.

Clause 1 and (to a slightly lesser extent) clause 3 are in medical talk, as if in a formal communication from physician to physician; clause 2 is much more informal, and is expressed in ordinary lay language. The effect of the two styles mixed together in the one sentence is a feeling of incongruity, which was presumably not intended by the author or authors. This example, however, is unusual in its brevity. More often, the problem of inconsistency emerges only over longer stretches of text, especially where the granularity of the multiple authorship is at the paragraph, section, or chapter level. Moreover, while stylistic inconsistencies arise primarily in jointly written documents, we do not exclude the possibility of their occurrence in singly authored texts, especially those where different parts were written at different times or, initially, for different purposes.

Our ultimate goal in this research is to build software that will help with this problem: software that will point out stylistic inconsistencies in a document, and perhaps suggest how they can be fixed. In this paper, we report some of our initial explorations and data collection.


(bibtex)
Neil Graham (3)

Segmenting documents by stylistic character,
Neil Graham and Graeme Hirst and Bhaskara Marthi,
2005
Natural Language Engineering, 11(4), pp. 397--415, December
Supersedes August 2003 workshop version
Get paper from publisher's Web site
Abstract
As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author's syntactic preferences, whereas low-level and vocabulary-based features were not found to be useful. An alternative approach with character bigrams was not successful.

[Download pdf] (bibtex)

Segmenting a document by stylistic character,
Neil Graham and Graeme Hirst,
2003
Workshop on Computational Approaches to Style Analysis and Synthesis, 18th International Joint Conference on Artificial Intelligence, August, Acapulco, Mexico
Superseded by extended journal version
(bibtex)

Automatic Detection of Authorship Changes within Single Documents,
Neil Graham,
2000
Master's Thesis. Department of Computer Science, University of Toronto. January. Published as technical report CSRG-406.
Abstract

One of the most difficult tasks facing anyone who must compile or maintain any large, collaboratively-written document is to foster a consistent style throughout. In this thesis, we explore whether it is possible, even in principle, to identify stylistic inconsistencies within documents, given our understanding of how style can be captured statistically.

We carry out this investigation by computing stylistic statistics on very small samples of text comprising a set of synthetic collaboratively-written documents, and using these statistics to train and test a series of neural networks. We are able to show that this method does allow us to recover the boundaries of authors' contributions. We find that time-delay neural networks, hitherto ignored in this field, are especially effective in this regard. Along the way, we observe that statistics characterizing the syntactic style of a passage appear to hold much more information for small text samples than those concerned with lexical choice or complexity.


[Download pdf] (bibtex)
Stephen J. Green (7)

Lexical semantics and automatic hypertext construction,
Stephen J. Green,
1999
ACM Computing Surveys, 31(4es), December
Go to ACM Computing Surveys page for this issue
(bibtex)

Building hypertext links by computing semantic similarity,
Stephen J. Green,
1999
IEEE Transactions on Knowledge and Data Engineering, 11(5), pp. 713--731, September--October
Abstract
Most current automatic hypertext generation systems rely on term repetition to calculate the relatedness of two documents. There are well-recognized problems with such approaches; most notably, a vulnerability to the effects of synonymy (many words for the same concept) and polysemy (many concepts for the same word). We propose a novel method for automatic hypertext generation that is based on a technique called lexical chaining, a method for discovering sequences of related words in a text. This method uses a more general notion of document relatedness, and attempts to take into account the effects of synonymy and polysemy. We also present the results of an empirical study designed to test this method in the context of a question answering task from a database of newspaper articles.

[Download pdf] (bibtex)

Automated link generation: Can we do better than term repetition?,
Stephen J. Green,
1998
Proceedings of the Seventh International World Wide Web Conference, pp. 75--84, April, Brisbane, Australia
Shortlisted for conference's best-paper award.
Abstract
Most current automatic hypertext generation systems rely on term repetition to calculate the relatedness of two documents. There are well-recognized problems with such approaches; most notably, they are vulnerable to the linguistic effects of synonymy (many words for the same concept) and polysemy (many concepts for the same word). I propose a novel method for automatic hypertext generation that is based on a technique called lexical chaining, a method for discovering sets of related words in a text. I will also present the results of an empirical study designed to test this method in the context of a question answering task from a database of newspaper articles.

[Download pdf] (bibtex)

Automatically generating hypertext in newspaper articles by computing semantic relatedness,
Stephen J. Green,
1998
NeMLaP3/CoNLL98: New Methods in Language Processing and Computational Natural Language Learning (D.M.W. Powers ed.), January, Macquarie University
[Download pdf] (bibtex)

Automatically generating hypertext by computing semantic similarity,
Stephen J. Green,
1997
Ph.D. Thesis. Department of Computer Science, University of Toronto. August. Published as technical report CSRI-366.
Abstract

We describe a novel method for automatically generating hypertext links within and between newspaper articles. The method is based on lexical chaining, a technique for extracting the sets of related words that occur in texts. Links between the paragraphs of a single article are built by considering the distribution of the lexical chains in that article. Links between articles are built by considering how the chains in the two articles are related. By using lexical chaining we mitigate the problems of synonymy and polysemy that plague traditional information retrieval approaches to automatic hypertext generation.

In order to motivate our research, we discuss the results of a study that shows that humans are inconsistent when assigning hypertext links within newspaper articles. Even if humans were consistent, the time needed to build a large hypertext and the costs associated with the production of such a hypertext make relying on human linkers untenable. Thus we must turn to automatic hypertext generation.

Because we wish to determine how our hypertext generation methodology performs when compared to other proposed methodologies, we present a study comparing the hypertext linking methodology that we propose with a methodology based on a traditional information retrieval approach. In this study, subjects were asked to perform a question-answering task using a combination of links generated by our methodology and the competing methodology. The result is that links between articles generated using our methodology have a significant advantage over links generated by the competing methodology. We show combined results for all subjects tested, along with results based on subjects' experience in using the World Wide Web.

We detail the construction of a system for performing automatic hypertext generation in the context of an online newspaper. The proposed system is fully capable of handling large databases of news articles in an efficient manner.


[Download pdf] (bibtex)

Building hypertext links in newspaper articles using semantic similarity,
Stephen J. Green,
1997
Third Workshop on Applications of Natural Language to Information Systems (NLDB '97), pp. 178--190, June 26--27, Vancouver, B.C.
Abstract
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related words that captures a portion of the cohesive structure of a text. By considering the distribution of chains within an article, we can build links between the paragraphs. By computing the similarity of the chains contained in two different articles, we can decide whether or not to place a link between them.

[Download pdf] (bibtex)

Using lexical chains to build hypertext links in newspaper articles,
Stephen J. Green,
1996
Working notes of the AAAI workshop on Internet-based Information Systems, August, Portland, OR
Abstract
We discuss an automatic method for the construction of hypertext links within and between newspaper articles. The method comprises three steps: determining the lexical chains in a text, building links between the paragraphs of articles, and building links between articles. Lexical chains capture the semantic relations between words that occur throughout a text. Each chain is a set of related words that captures a portion of the cohesive structure of a text. By considering the distribution of chains within an article, we can build links between the paragraphs. By comparing the chains contained in two different articles, we can decide whether or not to place a link between them. We also present the results of an experiment designed to measure inter-linker consistency in the manual construction of hypertext links between the paragraphs of newspaper articles. The results show that inter-linker consistency is low, but better than that obtained in a previous experiment.

[Download pdf] (bibtex)
Peter A. Heeman (4)

Collaborating on referring expressions,
Peter A. Heeman and Graeme Hirst,
1995
Computational Linguistics, 21(3), pp. 351--382, September
Abstract
This paper presents a computational model of how conversational participants collaborate in order to make a referring action successful. The model is based on the view of language as goal-directed behavior. We propose that the content of a referring expression can be accounted for by the planning paradigm. Not only does this approach allow the processes of building referring expressions and identifying their referents to be captured by plan construction and plan inference, it also allows us to account for how participants clarify a referring expression by using meta-actions that reason about and manipulate the plan derivation that corresponds to the referring expression. To account for how clarification goals arise and how inferred clarification plans affect the agent, we propose that the agents are in a certain state of mind, and that this state includes an intention to achieve the goal of referring and a plan that the agents are currently considering. It is this mental state that sanctions the adoption of goals and the acceptance of inferred plans, and so acts as a link between understanding and generation.

[Download pdf] (bibtex)

Repairing conversational misunderstandings and non-understandings,
Graeme Hirst and Susan McRoy and Peter A. Heeman and Philip Edmonds and Diane Horton,
1994
Speech Communication, 15(3--4), pp. 213--229, December
Abstract
Participants in a discourse sometimes fail to understand one another but, when aware of the problem, collaborate upon or negotiate the meaning of a problematic utterance. To address nonunderstanding, we have developed two plan-based models of collaboration in identifying the correct referent of a description: one covers situations where both conversants know of the referent, and the other covers situations such as direction-giving, where the recipient does not. In the models, conversants use the mechanisms of refashioning, suggestion, and elaboration to collaboratively refine a referring expression until it is successful. To address misunderstanding, we have developed a model that combines intentional and social accounts of discourse to support the negotiation of meaning. The approach extends intentional accounts by using expectations deriving from social conventions in order to guide interpretation. Reflecting the inherent symmetry of the negotiation of meaning, all our models can act as both speaker and hearer, and can play both the role of the conversant who is not understood or misunderstood and the role of the conversant who fails to understand.

[Download pdf] (bibtex)

A computational model of collaboration on referring expressions,
Peter A. Heeman,
1991
Master's Thesis. Department of Computer Science, University of Toronto. October. Published as technical report CSRI-251.
Abstract

In order to refer to an object, a speaker attempts to build a description of the object that will allow the hearer to identify it. Since the description might not enable the hearer to identify the referent, the speaker and hearer might engage in a clarification subdialogue in which they collaborate in order to make the referring action successful.

This thesis presents a computational model of how conversational participants collaborate in order to make a referring action successful. The model is based on the view of language as goal-directed behavior. We propose that the content of a referring expression can be accounted for by the planning paradigm. Not only does this approach allow the process of building referring expressions and identifying their referent to be captured by plan construction and plan inference, it also allows us to account for clarifications of referring expressions by using meta-plans. These meta-plans reason about and manipulate referring expression plans in order to capture how conversational participants collaborate in order to make the referring action successful. To complete this picture we show how participants can infer goals underlying referring expression plans or clarification plans, and how these inferred goals can be used by the participant to adopt its own goals to clarify the referring expression. An important aspect of this process is that subsequent clarifications can either clarify the previous clarification or can clarify the referring expression resulting from the previous clarification. Hence, for the latter case, the same meta-plans that are used to clarify the original referring expression can be used to model the subsequent clarifications.


(bibtex)

Collaborating on referring expressions,
Peter A. Heeman,
1991
Proceedings, 29th annual meeting of the Association for Computational Linguistics, pp. 345--346, June, Berkeley, CA
[Download pdf] (bibtex)
Graeme Hirst (130)

Limitations of the Philosophy of Language Understanding Implicit in Computational Linguistics,
Graeme Hirst,
2009
Proceedings, 7th European Conference on Computing and Philosophy, pp. 108--109, July, Barcelona
Abstract
Contemporary computational linguistics (CL) strives to be strongly empirical and not rely on researchers' intuitions. Yet it relies on intuitions nonetheless: those of the annotators who mark up the data used for training and testing the models that CL develops. Implicit in this is a philosophy of language understanding in which there is a single linguistic reality, a single understanding or interpretation of a text or of its elements, which is open to native-speaker introspection or intuition. Hence all competent native speakers of a language (or dialect) will have the same intuition and, barring error or ill-defined annotation requirements, will annotate any given text or any given linguistic element within a text the same way.

This implicit philosophy is challenged in two ways:
1. Reader-based views of meaning and language understanding: Fish and other postmodernists claim that readers bring their own knowledge and experience to the interpretation of text, which is not necessarily the same as that of the writer or any other reader. Hence the annotation methodology is useful only if working within what Fish calls an "interpretive community".
2. Individual differences in cognitive language comprehension processes: It is well established that there are individual differences in the cognitive processes and strategies of language comprehension and that these can sometimes lead to different interpretations of a text.

It is thus not surprising that CL often finds itself with relatively low inter-annotator agreement on its marked-up data, and with systems whose performance is mediocre when trained or tested on this data. CL needs to recognize the limitations of its implicit philosophy of language understanding and aim instead for systems that are more explicitly adapted to the individual user: systems in which aspects of language that are subject to notable individual differences are indeed modelled on an individual basis.


[Download pdf] (bibtex)

Ontology and the lexicon,
Graeme Hirst, 2009
In: Steffen Staab and Rudi Studer (editors), Handbook on Ontologies (2nd edition), Berlin, Germany, Springer, pp. 269--292.
[Download pdf] (bibtex)

Who decides what a text means?,
Graeme Hirst,
2009
Invited talk given at the Gesellschaft für Sprachtechnologie und Computerlinguistik, Tagung 2009, October, Potsdam
Abstract
Writer-based and reader-based views of text-meaning are reflected by the respective questions "What is the author trying to tell me?" and "What does this text mean to me personally?" Contemporary computational linguistics, however, generally takes neither view. But this is not adequate for the development of sophisticated applications such as intelligence gathering and question answering. I discuss different views of text-meaning from the perspective of the needs of computational text analysis and the collaborative repair of misunderstanding.

[Download pdf] (bibtex)

Vocabulary changes in Agatha Christie's mysteries as an indication of dementia: A case study,
Ian Lancashire and Graeme Hirst,
2009
19th Annual Rotman Research Institute Conference, Cognitive Aging: Research and Practice, March, Toronto
Conference poster available here
Abstract
Although the novelist Agatha Christie was never diagnosed with dementia, it is believed to have been the cause of her decline in her later years. We analyzed the vocabulary size, the repeated use of fixed phrases, and the indefinite noun usage in 16 Agatha Christie novels written between ages 28 and 82. We found statistically significant drops in vocabulary, and increases in repeated phrases and indefinite nouns in 15 detective novels from The Mysterious Affair at Styles to Postern of Fate. These language effects are recognized as symptoms of memory difficulties associated with Alzheimer's disease. Our study supports the conclusion that Agatha Christie's last few novels show early signs of encroaching dementia.

[Download pdf] (bibtex)

Analyzing the text of clinical literature for question answering,
Yun Niu and Graeme Hirst, 2009
In: Violaine Prince and Mathieu Roche (editors), Information Retrieval in Biomedicine, Hershey, PA, IGI Global, pp. 190--220.
See publisher's website
Abstract
The task of question answering (QA) is to find an accurate and precise answer to a natural language question in some predefined text. Most existing QA systems handle fact-based questions that usually take named entities as the answers. In this chapter the authors take clinical QA as an example to deal with more complex information needs. They propose an approach using semantic class analysis as the organizing principle to answer clinical questions. They investigate three semantic classes that correspond to roles in the commonly accepted PICO format of describing clinical scenarios. The three semantic classes are: the description of the patient (or the problem), the intervention used to treat the problem, and the clinical outcome. The authors focus on automatic analysis of two important properties of the semantic classes.

(bibtex)

Extracting synonyms from dictionary definitions,
Tong Wang and Graeme Hirst,
2009
Proceedings, Recent Advances in Natural Language Processing (RANLP) 2009, September, Borovets, Bulgaria
Abstract
We investigate the problem of extracting synonyms from dictionary definitions. Our premise for using definition texts in dictionaries is that, in contrast to free text, their composition usually exhibits more regularity in syntax and style, and thus provides a better-controlled environment for synonym extraction. We propose three extraction methods, two rule-based and one using a maximum entropy model; each method is evaluated in three experiments: solving TOEFL synonym questions, comparing extraction results with existing thesauri, and labeling synonyms in definition texts. Results show that simple rule-based extraction methods perform surprisingly well on TOEFL synonym questions; they actually outperform the best reported lexicon-based method by a large margin, although they do not correlate as well with existing thesauri.

[Download pdf] (bibtex)

An Evaluation of the Contextual Spelling Checker of Microsoft Office Word 2007,
Graeme Hirst,
2008
January
Abstract
Microsoft Office Word 2007 includes a "contextual spelling checker" that is intended to find misspellings that nonetheless form correctly spelled words. In an evaluation on 1400 examples, it is found to have high precision but low recall; that is, it fails to find most errors, but when it does flag a possible error, it is almost always correct. However, its performance in terms of F-measure is inferior to that of the trigram-based method of Mays, Damerau, and Mercer (1991).
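To illustrate why high precision combined with low recall still yields a low F-measure, the minimal Python sketch below computes the standard balanced F-measure (the harmonic mean of precision and recall). The function name and the precision and recall figures are hypothetical, chosen only for illustration; they are not the results of the evaluation.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical figures, for illustration only (not from the evaluation):
# a checker that is right 95% of the time it flags, but finds only 20% of errors...
print(round(f_measure(0.95, 0.20), 3))  # prints 0.33
# ...scores lower than a checker with balanced, moderate precision and recall.
print(round(f_measure(0.60, 0.60), 3))  # prints 0.6
```

Because the harmonic mean is dominated by the smaller of its two inputs, a checker that flags few of the real errors cannot reach a high F, however precise its flags are.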

[Download pdf] (bibtex)

The future of text-meaning in computational linguistics,
Graeme Hirst,
2008
Proceedings, 11th International Conference on Text, Speech and Dialogue (TSD 2008) (Lecture Notes in Artificial Intelligence 5246, Springer-Verlag) (Sojka, Petr; Horák, Aleš; Kopeček, Ivan; and Pala, Karel ed.), pp. 1--9, September, Brno, Czech Republic
Abstract
Writer-based and reader-based views of text-meaning are reflected by the respective questions "What is the author trying to tell me?" and "What does this text mean to me personally?" Contemporary computational linguistics, however, generally takes neither view. But this is not adequate for the development of sophisticated applications such as intelligence gathering and question answering. I discuss different views of text-meaning from the perspective of the needs of computational text analysis and the collaborative repair of misunderstanding.

[Download pdf] (bibtex)

Computing Word-Pair Antonymy,
Saif Mohammad and Bonnie Dorr and Graeme Hirst,
2008
2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), October, Waikiki, Hawaii
Abstract

Knowing the degree of antonymy between words has widespread applications in natural language processing. Manually created lexicons have limited coverage and do not include most semantically contrasting word pairs. We present a new automatic and empirical measure of antonymy that combines corpus statistics with the structure of a published thesaurus. The approach is evaluated on a set of closest-opposite questions, obtaining a precision of over 80%. Along the way, we discuss what humans consider antonymous and how antonymy manifests itself in utterances.


[Download pdf] (bibtex)

Towards a Comparative Database of Dysarthric Articulation,
Frank Rudzicz and Pascal van Lieshout and Graeme Hirst and Gerald Penn and Fraser Shein and Talya Wolff,
2008
Proceedings of the eighth International Seminar on Speech Production (ISSP'08), December, Strasbourg, France
[Download pdf] (bibtex)

Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model,
L. Amber Wilcox-O'Hearn and Graeme Hirst and Alexander Budanitsky,
2008
Proceedings, 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2008) (Lecture Notes in Computer Science 4919, Springer-Verlag) (Alexander Gelbukh ed.), pp. 605--616, February, Haifa
Conference poster with updated results available here
Abstract
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then improve the method further and experiment with a new variation that optimizes over fixed-length windows instead of over sentences.
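The noisy-channel model under discussion can be sketched roughly as follows, using the usual description of the Mays, Damerau, and Mercer method: each word is assumed to be typed as intended with probability alpha, and otherwise to be replaced by one of its real-word spelling variations with equal probability; candidate sentences that differ from the input in at most one word are scored by trigram probability times channel probability, and the highest-scoring candidate is chosen. This is a minimal sketch under stated assumptions, not the paper's implementation: the trigram table, the variation sets (assumed symmetric), the floor value for unseen trigrams, the value of alpha, and the function names are all hypothetical. A real system would use a smoothed language model trained on a large corpus and would generate variations from a dictionary by edit distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Trigram = Tuple[str, str, str]


def sentence_logprob(words: Sequence[str], lm: Dict[Trigram, float], floor: float = 1e-6) -> float:
    """Log-probability of a sentence under a toy trigram table; unseen trigrams get a crude floor value."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    return sum(
        math.log(lm.get((padded[i - 2], padded[i - 1], padded[i]), floor))
        for i in range(2, len(padded))
    )


def correct(words: List[str], variations: Dict[str, List[str]],
            lm: Dict[Trigram, float], alpha: float = 0.99) -> List[str]:
    """Pick the most probable intended sentence, allowing at most one word to differ from the input."""
    n = len(words)
    # Baseline hypothesis: the sentence was typed as intended (each word correct with probability alpha).
    best = list(words)
    best_score = sentence_logprob(words, lm) + n * math.log(alpha)
    for i, observed in enumerate(words):
        for intended in variations.get(observed, []):
            candidate = words[:i] + [intended] + words[i + 1:]
            # P(observed | intended): the intended word was mistyped as one of its
            # equally likely real-word variations (variation sets assumed symmetric).
            error_prob = (1 - alpha) / len(variations[intended])
            score = (sentence_logprob(candidate, lm)
                     + (n - 1) * math.log(alpha) + math.log(error_prob))
            if score > best_score:
                best, best_score = candidate, score
    return best


# Toy data, invented for illustration only.
LM = {
    ("<s>", "<s>", "a"): 0.1, ("<s>", "a", "piece"): 0.01, ("<s>", "a", "peace"): 0.001,
    ("a", "piece", "of"): 0.5, ("piece", "of", "cake"): 0.2, ("of", "cake", "</s>"): 0.3,
}
VARIATIONS = {"peace": ["piece"], "piece": ["peace"]}

print(correct(["a", "peace", "of", "cake"], VARIATIONS, LM))  # prints ['a', 'piece', 'of', 'cake']
```

The value of alpha controls how readily the model overrides the input: the closer alpha is to 1, the stronger the language-model evidence a correction needs before it is accepted.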

[Download pdf] (bibtex)

Authorship attribution for small texts: Literary and forensic experiments,
Ol'ga Feiguina and Graeme Hirst,
2007
Proceedings, International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection, 30th Annual International ACM SIGIR Conference (SIGIR '07), July, Amsterdam
[Download pdf] (bibtex)

Views of text-meaning in computational linguistics: Past, present, and future,
Graeme Hirst, 2007
In: Gordana Dodig-Crnkovic and Susan Stuart (editors), Computation, Information, Cognition -- The Nexus and the Liminal, Newcastle-upon-Tyne, Cambridge Scholars Publishing, pp. 270--279.
Abstract
Three views of text-meaning compete in the philosophy of language: objective, subjective, and authorial, that is, meaning "in" the text, "in" the reader, or "in" the writer. Computational linguistics has ignored the competition and implicitly embraced all three, and rightly so; but different views have predominated at different times and in different applications. Contemporary applications mostly take the crudest view: meaning is objectively "in" a text. The more sophisticated applications now on the horizon, however, demand the other two views: as the computer takes on the user's purpose, it must also take on the user's subjective views; but sometimes, the user's purpose is to determine the author's intent. Accomplishing this requires, among other things, an ability to determine what could have been said but wasn't, and hence a sensitivity to linguistic nuance. It is therefore necessary to develop computational mechanisms for this sensitivity.

[Download pdf] (bibtex)

Bigrams of syntactic labels for authorship discrimination of short texts,
Graeme Hirst and Ol'ga Feiguina,
2007
Literary and Linguistic Computing, 22(4), pp. 405--417
Get paper from publisher's Web site doi: 10.1093/llc/fqm023
Abstract
We present a method for authorship discrimination that is based on the frequency of bigrams of syntactic labels that arise from partial parsing of the text. We show that this method, alone or combined with other classification features, achieves a high accuracy on discrimination of the work of Anne and Charlotte Brontë, which is very difficult to do by traditional methods. Moreover, high accuracies are achieved even on fragments of text little more than 200 words long.
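The feature extraction step described above can be sketched as follows, assuming the partial parser's output has already been reduced to a flat sequence of syntactic labels. The label names, the example sequence, and the function name are hypothetical, not the paper's tag set or code. Each text is represented by the relative frequencies of adjacent label pairs, which a classifier could then use alone or alongside other features.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def label_bigram_profile(labels: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Relative frequency of each adjacent pair of syntactic labels in a text."""
    bigrams = Counter(zip(labels, labels[1:]))
    total = sum(bigrams.values())
    # A text with fewer than two labels has no bigrams, so return an empty profile.
    return {bg: count / total for bg, count in bigrams.items()} if total else {}


# Hypothetical chunk labels for a short fragment
# (NP = noun phrase, VP = verb phrase, PP = prepositional phrase).
profile = label_bigram_profile(["NP", "VP", "NP", "PP", "NP", "VP", "NP"])
print(profile[("NP", "VP")])  # 2 of the 6 bigrams; prints 0.3333333333333333
```

Normalizing by the total number of bigrams makes profiles of texts with different lengths comparable, which matters when fragments are only a couple of hundred words long.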

(bibtex)

TOR, TORMD: Distributional profiles of concepts for unsupervised word sense disambiguation,
Saif Mohammad and Graeme Hirst and Philip Resnik,
2007
SemEval-2007: 4th International Workshop on Semantic Evaluations, June, Prague
The conference poster
Abstract

Words in the context of a target word have long been used as features by supervised word-sense classifiers. Mohammad and Hirst (2006a) proposed a way to determine the strength of association between a sense or concept and co-occurring words --- the distributional profile of a concept (DPC) --- without the use of manually annotated data. We implemented an unsupervised naive Bayes word sense classifier using these DPCs that was best or within one percentage point of the best unsupervised systems in the Multilingual Chinese--English Lexical Sample Task (task #5) and the English Lexical Sample Task (task #17). We also created a simple PMI-based classifier to attempt the English Lexical Substitution Task (task #10); however, its performance was poor.


[Download pdf] (bibtex)

Cross-lingual distributional profiles of concepts for measuring semantic distance,
Saif Mohammad and Iryna Gurevych and Graeme Hirst and Torsten Zesch,
2007
2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), June, Prague
Abstract

We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of semantic distance are evaluated on two tasks: (1) estimating semantic distance between words and ranking the word pairs according to semantic distance, and (2) solving Reader's Digest `Word Power' problems. In task (1), cross-lingual measures are superior to conventional monolingual measures based on a wordnet. In task (2), cross-lingual measures are able to solve more problems correctly, and despite scores being affected by many tied answers, their overall performance is again better than the best monolingual measures.


[Download pdf] (bibtex)

Identifying cores of semantic classes in unstructured text with a semi-supervised learning approach,
Yun Niu and Graeme Hirst,
2007
Proceedings, International Conference on Recent Advances in Natural Language Processing, September, Borovets, Bulgaria
[Download pdf] (bibtex)

Evaluating WordNet-based measures of semantic distance,
Alexander Budanitsky and Graeme Hirst,
2006
Computational Linguistics, 32(1), pp. 13--47, March
Abstract
The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content--based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness.

[Download pdf] (bibtex)

Foreword,
Graeme Hirst, 2006
In: Eneko Agirre and Philip Edmonds (editors), Word Sense Disambiguation: Applications and algorithms, Springer, pp. xvii--xix.
Word Sense Disambiguation: Applications and algorithms
(bibtex)

Building and using a lexical knowledge-base of near-synonym differences,
Diana Inkpen and Graeme Hirst,
2006
Computational Linguistics, 32(2), pp. 223--262, June
Abstract
Choosing the wrong word in a machine translation or natural language generation system can convey unwanted connotations, implications, or attitudes. The choice between near-synonyms such as error, mistake, slip, and blunder (words that share the same core meaning but differ in their nuances) can be made only if knowledge about their differences is available.

We present a method to automatically acquire a new type of lexical resource: a knowledge-base of near-synonym differences. We develop an unsupervised decision-list algorithm that learns extraction patterns from a special dictionary of synonym differences. The patterns are then used to extract knowledge from the text of the dictionary.

The initial knowledge-base is later enriched with information from other machine-readable dictionaries. Information about the collocational behavior of the near-synonyms is acquired from free text. The knowledge-base is used by Xenon, a natural language generation system that shows how the new lexical resource can be used to choose the best near-synonym in specific situations.


[Download pdf] (bibtex)

Distributional measures of concept-distance: A task-oriented evaluation,
Saif Mohammad and Graeme Hirst,
2006
Proceedings, 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), July, Sydney, Australia
Abstract
We propose a framework to derive the distance between concepts from distributional measures of word co-occurrences. We use the categories in a published thesaurus as coarse-grained concepts, allowing all possible distance values to be stored in a concept--concept matrix roughly 0.01% the size of that created by existing measures. We show that the newly proposed concept-distance measures outperform traditional distributional word-distance measures in the tasks of (1) ranking word pairs in order of semantic distance and (2) correcting real-word spelling errors. In the latter task, of all the WordNet-based measures, only that proposed by Jiang and Conrath outperforms the best distributional concept-distance measures.

[Download pdf] (bibtex)

Determining word sense dominance using a thesaurus,
Saif Mohammad and Graeme Hirst,
2006
Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics (EACL-2006), pp. 121--128, April, Trento, Italy
Abstract
The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. We propose four new methods to accurately determine word sense dominance using raw text and a published thesaurus. Unlike the McCarthy et al. (2004) system, these methods can be used on relatively small target texts, without the need for a similarly-sense-distributed auxiliary text. We perform an extensive evaluation using artificially generated thesaurus-sense-tagged data. In the process, we create a word--category co-occurrence matrix, which can be used for unsupervised word sense disambiguation and estimating distributional similarity of word senses, as well.

[Download pdf] (bibtex)

Using outcome polarity in sentence extraction for medical question-answering,
Yun Niu and Xiaodan Zhu and Graeme Hirst,
2006
Proceedings of the American Medical Informatics Association 2006 Annual Symposium, November, Washington, D.C.
Abstract
Answering a clinical question often requires multiple pieces of text describing various items of evidence from clinical trials. We explore a multi-document summarization approach to automatically find this information for questions about the effects of using a medication to treat a disease. A machine-learning approach ranks sentences in relevant documents according to various features; sentences with higher scores are deemed more important and are included in the summary. The presence of clinical outcomes and their polarity are incorporated into the approach as features for determining the importance of sentences, and the effectiveness of these features is investigated along with that of other textual features. The results show that information on clinical outcomes improves the performance of summarization.

[Download pdf] (bibtex)

Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model,
L. Amber Wilcox-O'Hearn and Graeme Hirst and Alexander Budanitsky,
2006
February. Superseded by 2008 CICLing version.
Abstract
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then improve the method further and experiment with a new variation that optimizes over fixed-length windows instead of over sentences.

[Download pdf] (bibtex)

Segmenting documents by stylistic character,
Neil Graham and Graeme Hirst and Bhaskara Marthi,
2005
Natural Language Engineering, 11(4), pp. 397--415, December
Supersedes August 2003 workshop version. Get paper from publisher's Web site
Abstract
As part of a larger project to develop an aid for writers that would help to eliminate stylistic inconsistencies within a document, we experimented with neural networks to find the points in a text at which its stylistic character changes. Our best results, well above baseline, were achieved with time-delay networks that used features related to the author's syntactic preferences, whereas low-level and vocabulary-based features were not found to be useful. An alternative approach with character bigrams was not successful.

[Download pdf] (bibtex)

Correcting real-word spelling errors by restoring lexical cohesion,
Graeme Hirst and Alexander Budanitsky,
2005
Natural Language Engineering, 11(1), pp. 87--111, March
Get paper from publisher's Web site
Abstract
Spelling errors that happen to result in a real word in the lexicon cannot be detected by a conventional spelling checker. We present a method for detecting and correcting many such errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context. Relatedness to context is determined by a measure of semantic distance initially proposed by Jiang and Conrath (1997). We tested the method on an artificial corpus of errors; it achieved recall of up to 50% and precision of 18 to 25%, levels that approach practical usability.

[Download pdf] (bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst, 2005
In: James G. Shanahan and Yan Qu and Janyce Wiebe (editors), Computing attitude and affect in text, Dordrecht, The Netherlands, Springer.
Supersedes March 2004 AAAI Symposium version
Abstract
We present experiments on modifying the semantic orientation of the near-synonyms in a text. We analyze a text into an interlingual representation and a set of attitudinal nuances, with particular focus on its near-synonyms. Then we use our text generator to produce a text with the same meaning but changed semantic orientation (more positive or more negative) by replacing, wherever possible, words with near-synonyms that differ in their expressed attitude.

[Download pdf] (bibtex)

Semantic knowledge in a word completion task,
Jianhua Li and Graeme Hirst,
2005
Proceedings, 7th International ACM SIGACCESS Conference on Computers and Accessibility, October, Baltimore, MD
Abstract
We propose a combinatory approach to interactive word-completion for users with linguistic disabilities in which semantic knowledge combines with n-gram probabilities to predict semantically more-appropriate words than n-gram methods alone. The semantic knowledge is used to measure the semantic association of completion candidates with the context. Experimental results show a performance improvement when using the combinatory model for the completion of nouns.

[Download pdf] (bibtex)

Distributional measures as proxies for semantic relatedness,
Saif Mohammad and Graeme Hirst,
2005
January
Abstract
Automatically ranking word pairs by their semantic relatedness, in a way that mimics human notions of semantic relatedness, has widespread applications. Two kinds of measures exist: those that rely on raw data (distributional measures) and those that use knowledge-rich ontologies. Although extensive studies have been performed to compare ontological measures with human judgment, the distributional measures have primarily been evaluated by indirect means. This paper is a detailed study of some of the major distributional measures; it lists their respective merits and limitations. New measures that overcome these drawbacks and are more in line with human notions of semantic relatedness are suggested. The paper concludes with an exhaustive comparison of the distributional and ontology-based measures. Along the way, significant research problems are identified. Work on these problems may lead to a better understanding of how semantic relatedness is to be measured.

[Download pdf] (bibtex)

The subjectivity of lexical cohesion in text,
Jane Morris and Graeme Hirst, 2005
In: James G. Shanahan and Yan Qu and Janyce Wiebe (editors), Computing attitude and affect in text, Dordrecht, The Netherlands, Springer.
Supersedes March 2004 AAAI Symposium version. The conference poster
Abstract
A reader's perception of even an ``objective'' text is to some degree subjective. We present the results of a pilot study in which we looked at the degree of subjectivity in readers' perceptions of lexical semantic relations, which are the building blocks of the lexical chains used in many applications in natural language processing.

[Download pdf] (bibtex)

Analysis of polarity information in medical text,
Yun Niu and Xiaodan Zhu and Jianhua Li and Graeme Hirst,
2005
Proceedings of the American Medical Informatics Association 2005 Annual Symposium, pp. 570--574, October, Washington, D.C.
Abstract
Knowing the polarity of clinical outcomes is important in answering questions posed by clinicians in patient treatment. We treat analysis of this information as a classification problem. Natural language processing and machine learning techniques are applied to detect four possibilities in medical text: no outcome, positive outcome, negative outcome, and neutral outcome. A supervised learning method is used to perform the classification at the sentence level. Five feature sets are constructed: UNIGRAMS, BIGRAMS, CHANGE PHRASES, NEGATIONS, and CATEGORIES. The performance of different combinations of feature sets is compared. The results show that generalization using the category information in the domain knowledge base, the Unified Medical Language System, is effective in the task. The effect of context information is significant. Combining linguistic features and domain knowledge leads to the highest accuracy.

[Download pdf] (bibtex)

Ontology and the lexicon,
Graeme Hirst, 2004
In: Steffen Staab and Rudi Studer (editors), Handbook on Ontologies, Berlin, Germany, Springer, pp. 209--229.
Superseded by 2009 version.
(bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst,
2004
AAAI Spring Symposium on Exploring Attitude and Affect in Text, March, Stanford University
Published as AAAI technical report SS-04-07. Superseded by 2005 book version.
(bibtex)

Non-classical lexical semantic relations,
Jane Morris and Graeme Hirst,
2004
Workshop on Computational Lexical Semantics, Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, May, Boston, MA
Reprinted in: Hanks, Patrick (editor), Lexicology: Critical Concepts in Linguistics, Routledge, 2007.
Abstract
NLP methods and applications need to take account not only of ``classical'' lexical relations, as found in WordNet, but also of the less-structural, more context-dependent ``non-classical'' relations that readers intuit in text. In a reader-based study of lexical relations in text, most were found to be of the latter type. The relationships themselves are analyzed, and consequences for NLP are discussed.

[Download pdf] (bibtex)

The subjectivity of lexical cohesion in text,
Jane Morris and Graeme Hirst,
2004
AAAI Spring Symposium on Exploring Attitude and Affect in Text, March, Stanford University
Published as AAAI technical report SS-04-07. Superseded by 2005 book version.
(bibtex)

Analysis of semantic classes in medical text for question answering,
Yun Niu and Graeme Hirst,
2004
Workshop on Question Answering in Restricted Domains, 42nd Annual Meeting of the Association for Computational Linguistics, July, Barcelona, Spain
Abstract
To answer questions from clinical evidence texts, we identify occurrences of the semantic classes --- disease, medication, patient outcome --- that are candidate elements of the answer, and the relations among them. Additionally, we determine whether an outcome is positive or negative.

[Download pdf] (bibtex)

Collocations as cues to semantic orientation,
Faye Baron and Graeme Hirst,
2003
[Download pdf] (bibtex)

Testing the efficacy of part-of-speech information in word completion,
Afsaneh Fazly and Graeme Hirst,
2003
Workshop on Language Modeling for Text Entry Methods, 11th Conference of the European Chapter of the Association for Computational Linguistics, April, Budapest, Hungary
Abstract
We investigate the effect of incorporating syntactic information into a word-completion algorithm. We introduce two new algorithms that combine part-of-speech tag trigrams with word bigrams, and evaluate them with a testbench constructed for the purpose. The results show a small but statistically significant improvement in keystroke savings for one of our algorithms over baselines that use only word n-grams.

[Download pdf] (bibtex)

Segmenting a document by stylistic character,
Neil Graham and Graeme Hirst,
2003
Workshop on Computational Approaches to Style Analysis and Synthesis, 18th International Joint Conference on Artificial Intelligence, August, Acapulco, Mexico
Superseded by extended journal version.
(bibtex)

Paraphrasing paraphrased,
Graeme Hirst,
2003
Invited talk at the Second International Workshop on Paraphrasing, 41st Annual Meeting of the Association for Computational Linguistics, July, Sapporo, Japan
[Download pdf] (bibtex)

Natural language processing, Disambiguation in,
Graeme Hirst, 2003
In: Encyclopedia of Cognitive Science, Volume 3, Nature Publishing Group (Macmillan), pp. 181--188.
Encyclopedia of Cognitive Science
(bibtex)

Near-synonym choice in natural language generation,
Diana Inkpen and Graeme Hirst,
2003
International Conference RANLP-2003 (Recent Advances in Natural Language Processing), pp. 204--211, September, Borovets, Bulgaria
Reprinted, slightly abridged, in Recent Advances in Natural Language Processing III, John Benjamins Publishing Company, 2004 (selected papers from RANLP 2003, edited by Nicolas Nicolov, Kalina Bontcheva, Galia Angelova, and Ruslan Mitkov).
Abstract
We present Xenon, a natural language generation system capable of distinguishing between near-synonyms. It integrates a near-synonym choice module with an existing sentence realization module. We evaluate Xenon using English and French near-synonyms.

[Download pdf] (bibtex)

Automatic sense disambiguation of the near-synonyms in a dictionary entry,
Diana Inkpen and Graeme Hirst,
2003
Proceedings, 4th Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), pp. 258--267, February, Mexico City, Mexico
Abstract
We present an automatic method to disambiguate the senses of the near-synonyms in the entries of a dictionary of synonyms. We combine different indicators that take advantage of the structure of the entries and of lexical knowledge in WordNet. We also present the results of human judges doing the disambiguation for 50 randomly selected entries. This small amount of annotated data is used to tune and evaluate our system.

[Download pdf] (bibtex)

Term relationships and their contribution to text semantics and information literacy through lexical cohesion,
Jane Morris and Clare Beghtol and Graeme Hirst,
2003
Proceedings of the 31st Annual Conference of the Canadian Association for Information Science, May--June, Halifax, NS
Abstract
An analysis of linguistic approaches to determining the lexical cohesion in text reveals differences in the types of lexical semantic relations (term relationships) that contribute to the continuity of lexical meaning in the text. Differences were also found in how these lexical relations join words together, sometimes with grammatical relations, to form larger groups of related words that sometimes exhibit a more tightly-knit internal structure than a simple chain of words. Further analysis of the lexical semantic relations indicates a specific need to focus on a neglected group of relations, referred to as non-classical relations, and a general need to focus on relations in the context of text. Experiments with human readers of text are suggested to investigate these issues, as well as to address the lack of research that uses human subjects to identify reader-oriented relations. Because lexical cohesion contributes to the semantic understanding of text, these reader-oriented relations have potential relevance to improving access to text-based information. As well, the structured groups of words formed using a combination of lexical and grammatical relations have potential computational benefits for lexical cohesion analysis of text.

[Download pdf] (bibtex)

Answering Clinical Questions with Role Identification,
Yun Niu and Graeme Hirst and Gregory McArthur and Patricia Rodriguez-Gianolli,
2003
Proceedings, Workshop on Natural Language Processing in Biomedicine, 41st Annual Meeting of the Association for Computational Linguistics, July, Sapporo, Japan
Abstract
We describe our work in progress on natural language analysis in medical question-answering in the context of a broader medical text-retrieval project. We analyze the limitations in the medical domain of the technologies that have been developed for general question-answering systems, and describe an alternative approach whose organizing principle is the identification of semantic roles in both question and answer texts that correspond to the fields of PICO format.

[Download pdf] (bibtex)

Near-synonymy and lexical choice,
Philip Edmonds and Graeme Hirst,
2002
Computational Linguistics, 28(2), pp. 105--144, June
Abstract

We develop a new computational model for representing the fine-grained meanings of near-synonyms and the differences between them. We also develop a sophisticated lexical-choice process that can decide which of several near-synonyms is most appropriate in a particular situation. This research has direct applications in machine translation and text generation.

We first identify the problems of representing near-synonyms in a computational lexicon and show that no previous model adequately accounts for near-synonymy. We then propose a preliminary theory to account for near-synonymy, relying crucially on the notion of granularity of representation, in which the meaning of a word arises out of a context-dependent combination of a context-independent core meaning and a set of explicit differences to its near-synonyms. That is, near-synonyms cluster together.

We then develop a clustered model of lexical knowledge, derived from the conventional ontological model. The model cuts off the ontology at a coarse grain, thus avoiding an awkward proliferation of language-dependent concepts in the ontology, and groups near-synonyms into subconceptual clusters that are linked to the ontology. A cluster differentiates near-synonyms in terms of fine-grained aspects of denotation, implication, expressed attitude, and style. The model is general enough to account for other types of variation, for instance in collocational behaviour.

An efficient, robust, and flexible fine-grained lexical-choice process is a consequence of a clustered model of lexical knowledge. To make it work, we formalize criteria for lexical choice as preferences to express certain concepts with varying indirectness, to express attitudes, and to establish certain styles. The lexical-choice process itself works on two tiers: between clusters and between near-synonyms within clusters. We describe our prototype implementation of the system, called I-Saurus.


[Download pdf] (bibtex)

Negotiation, compromise, and collaboration in interpersonal and human--computer conversations,
Graeme Hirst,
2002
Proceedings, Workshop on Meaning Negotiation, 18th National Conference on Artificial Intelligence, pp. 1--4, July, Edmonton, AB
Abstract
People are very adept at recognizing when something they said has been misunderstood by a conversational partner and at recognizing when they themselves have misunderstood something that was said earlier in the conversation. In either case, they will usually say something to repair the situation and regain mutual understanding. If computers are ever to converse with humans in natural language, they must be as adept as people are at detecting and repairing both their own occasional misunderstandings and those of their conversational partners. The processes through which conversational repairs take place include negotiation, collaboration, and construction of meaning. By modeling the mechanisms for collaboration and negotiation that natural language uses, we will be able to develop mechanisms for semantic interoperability in complex non-linguistic forms of communication.

[Download pdf] (bibtex)

Acquiring collocations for lexical choice between near-synonyms,
Diana Inkpen and Graeme Hirst,
2002
SIGLEX Workshop on Unsupervised Lexical Acquisition, 40th Meeting of the Association for Computational Linguistics, June, Philadelphia, PA
Abstract
We extend a lexical knowledge-base of near-synonym differences with knowledge about their collocational behaviour. This type of knowledge is useful in the process of lexical choice between near-synonyms. We acquire collocations for the near-synonyms of interest from a corpus (only collocations with the appropriate sense and part-of-speech). For each word that collocates with a near-synonym, we use a differential test to learn whether the word forms a less-preferred collocation or an anti-collocation with other near-synonyms in the same cluster. For this task we use a much larger corpus (the Web). We also look at associations (longer-distance co-occurrences) as a possible source of learning more about nuances that the near-synonyms may carry.

[Download pdf] (bibtex)

Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures,
Alexander Budanitsky and Graeme Hirst,
2001
Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 29--34, June, Pittsburgh, PA
Abstract
Five different proposed measures of similarity or semantic distance in WordNet were experimentally compared by examining their performance in a real-word spelling correction system. It was found that Jiang and Conrath's measure gave the best results overall. That of Hirst and St-Onge seriously over-related, that of Resnik seriously under-related, and those of Lin and of Leacock and Chodorow fell in between.

[Download pdf] (bibtex)

Review of The Longman Grammar of Spoken and Written English,
Graeme Hirst,
2001
Computational Linguistics, 27(1), pp. 132--139, March
[Review of: Longman Grammar of Spoken and Written English by Douglas Biber, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan; Harlow, Essex: Pearson Education Ltd, 1999]
[Download pdf] (bibtex)

Building a lexical knowledge-base of near-synonym differences,
Diana Inkpen and Graeme Hirst,
2001
Workshop on WordNet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics, June, Pittsburgh, PA
Abstract
In machine translation and natural language generation, making the wrong word choice from a set of near-synonyms can be imprecise or awkward, or convey unwanted implications. Using Edmonds's model of lexical knowledge to represent clusters of near-synonyms, our goal is to automatically derive a lexical knowledge-base from the Choose the Right Word dictionary of near-synonym discrimination. We do this by automatically classifying sentences in this dictionary according to the classes of distinctions they express. We use a decision-list learning algorithm to learn words and expressions that characterize the classes DENOTATIONAL DISTINCTIONS and ATTITUDE-STYLE DISTINCTIONS. These results are then used by an extraction module to actually extract knowledge from each sentence. We also integrate a module to resolve anaphors and word-to-word comparisons. We evaluate the results of our algorithm for several randomly selected clusters against a manually built standard solution, and compare them with the results of a baseline algorithm. Improvements on previous results are due in part to the addition of a coreference module.

[Download pdf] (bibtex)

Experiments on extracting knowledge from a machine-readable dictionary of synonym differences,
Diana Inkpen and Graeme Hirst, 2001
In: Gelbukh, Alexander (editor), Computational Linguistics and Intelligent Text Processing (Proceedings, Second Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, February 2001), Berlin, Springer-Verlag, pp. 264--278.
Published as Lecture Notes in Computer Science, vol. 2004.
Abstract
In machine translation and natural language generation, making the wrong word choice from a set of near-synonyms can be imprecise or awkward, or convey unwanted implications. Using Edmonds's model of lexical knowledge to represent clusters of near-synonyms, our goal is to automatically derive a lexical knowledge-base from the Choose the Right Word dictionary of near-synonym discrimination. We do this by automatically classifying sentences in this dictionary according to the classes of distinctions they express. We use a decision-list learning algorithm to learn words and expressions that characterize the classes DENOTATIONAL DISTINCTIONS and ATTITUDE-STYLE DISTINCTIONS. These results are then used by an extraction module to actually extract knowledge from each sentence. We also integrate a module to resolve anaphors and word-to-word comparisons. We evaluate the results of our algorithm for several randomly selected clusters against a manually built standard solution, and compare them with the results of a baseline algorithm.

[Download pdf] (bibtex)

Reconciling fine-grained lexical knowledge and coarse-grained ontologies in the representation of near-synonyms,
Philip Edmonds and Graeme Hirst,
2000
Proceedings of the Workshop on Semantic Approximation, Granularity, and Vagueness, April, Breckenridge, CO
Abstract
A machine translation system must be able to cope adequately with near-synonymy, for there are often many slightly different translations available for any source-language word, and these can significantly and differently affect the meaning or style of a translated text. Conventional models of lexical knowledge used in natural-language processing systems are inadequate for representing near-synonyms, because they are unable to represent fine-grained lexical knowledge. We discuss a new model for representing fine-grained lexical knowledge whose basis is the idea of granularity of representation.

[Download pdf] (bibtex)

Context as a spurious concept,
Graeme Hirst,
2000
Proceedings, Conference on Intelligent Text Processing and Computational Linguistics, pp. 273--287, February, Mexico City, Mexico
Supersedes 1997 version
Abstract
I take issue with AI formalizations of context, primarily the formalization by McCarthy and Buvac, that regard context as an undefined primitive whose formalization can be the same in many different kinds of AI tasks. In particular, any theory of context in natural language must take the special nature of natural language into account and cannot regard context simply as an undefined primitive. I show that there is no such thing as a coherent theory of context simpliciter---context pure and simple---and that context in natural language is not the same kind of thing as context in KR. In natural language, context is constructed by the speaker and the interpreter, and both have considerable discretion in so doing. Therefore, a formalization based on pre-defined contexts and pre-defined `lifting axioms' cannot account for how context is used in real-world language.

[Download pdf] (bibtex)

The importance of subjectivity in computational stylistic assessment,
Melanie Baljko and Graeme Hirst,
1999
Text Technology, 9(1), pp. 5--17, Spring
Abstract
Often, a text that has been written collaboratively does not ``speak with a single voice.'' Such a text is stylistically incongruous --- as opposed to merely stylistically inconsistent, which might or might not be deleterious to the quality of the text. This widespread problem reduces the overall quality of a text and reflects poorly on its authors. We would like to design a facility for revising style that augments the software environments in which collaborative writing takes place, but before doing so, a question must be answered: what is the role of subjectivity in stylistic assessment for a style-revision facility? We describe an experiment designed to measure the agreement between the stylistic assessments performed by a group of subjects based on a free-sort of writing samples. The results show that there is a statistically significant level of agreement between the subjects' assessments and, furthermore, that there is a small number of groupings (three) of even more similar stylistic assessments. The results also show that authorship is not a valid indicator of readers' perceptions of stylistic similarity between the writing samples.

[Download pdf] (bibtex)

What exactly are lexical concepts?,
Graeme Hirst,
1999
Behavioral and Brain Sciences, 22(1), pp. 45--46, February
(bibtex)

Generating warning instructions by planning accidents and injuries,
Daniel Ansari and Graeme Hirst,
1998
Proceedings, 9th International Workshop on Natural Language Generation, pp. 118--127, August, Niagara-on-the-Lake, Ontario
Abstract
We present a system for the generation of natural language instructions, as are found in instruction manuals for household appliances, that is able to automatically generate safety warnings to the user at appropriate points. Situations in which accidents and injuries to the user can occur are considered at every step in the planning of the normal operation of the device, and these ``injury sub-plans'' are then used to instruct the user to avoid these situations.

[Download pdf] (bibtex)

Lexical chains as representations of context for the detection and correction of malapropisms,
Graeme Hirst and David St-Onge, 1998
In: Christiane Fellbaum (editor), WordNet: An electronic lexical database, Cambridge, MA, The MIT Press, pp. 305--332.
Abstract

Because chains of semantically related words express semantic continuity, such lexical chains can play an important role in the detection of malapropisms. A malapropism is a correctly spelled word that does not fit in the context where it is used because it is the result of a spelling error on a different word that was intended.

Our first assumption is that such a word is much less likely to be inserted into any chain with other words. If this assumption is correct, words that fail to be inserted into a chain with other words can be considered potential malapropisms. A mechanism that generates spelling replacements can then be used to generate replacement candidates. The second assumption is that whenever a spelling replacement can be inserted into a chain with other words, this replacement is likely to be the intended word for which a malapropism has been substituted.

The algorithm proposed here to detect lexical chains uses the on-line thesaurus WordNet to automatically quantify semantic relations between words. Chains identified by the algorithm may have two major problems: over- or under-chaining. Under-chaining---the inability to link a pair of related words---might be caused by an inadequacy of WordNet's set of relations, a lack of connections in WordNet, a lack of consistency in the semantic proximity expressed by WordNet's links, or a poor algorithm for chaining. Over-chaining---the linking of two poorly related words---might happen whenever two semantically distant words are close to each other in WordNet's graph. Over-chaining often results in the merging of two chains.

The results of the experiment show the validity of the basic assumptions. However, improvements to the lexical-chaining algorithm are required before the malapropism detection algorithm can be integrated into a commercial spelling checker.


(bibtex)

Generation by selection and repair as a method for adapting text for the individual reader,
Chrysanne DiMarco and Graeme Hirst and Eduard Hovy,
1997
Proceedings of the Flexible Hypertext Workshop (held in conjunction with the 8th ACM International Hypertext Conference, Southampton, April 1997), pp. 36--43, August, Microsoft Research Institute, Macquarie University
Abstract

A recent and growing development in Web applications has been the advent of various tools that claim to ``customize'' access to information on the Web by allowing users to specify the kinds of information they want to receive without having to search for it or sift through masses of irrelevant material. But this kind of customization is really just a crude filtering of raw Web material in which the user simply selects the ``channels'' of information she wishes to receive; this selection of information sources is hardly more ``customization'' than someone deciding to tune their television to a certain station. True customization, or tailoring, of information would be done for the user by a system that had access to an actual model of the user, a profile of the user's interests and characteristics. And such tailoring would involve much more than just selecting streams of basic content: the content of the text, whether for an on-line Web page or a paper document, would be carefully selected, structured, and presented in the manner best calculated to appeal to a particular individual. Adaptive-hypertext presentation comes closest to achieving this kind of document tailoring, but the current techniques used for adapting the content of a document to a particular user generally only involve some form of selectively showing (or hiding) portions of text or choosing whole variants of larger parts of the document.

If the Web document designer wishes to write and present material in a way that will communicate well with the user, then just displaying the most relevant chunks of information will not be sufficient. For effective communication, both the form and content of the language used in a document should be tailored in rhetorically significant ways to best suit a user's particular personal characteristics and preferences. Ideally, we would have Web-based natural language generation systems that could produce fully customized and customizable documents on demand by individual users, according to a formal user model. As a first step in this direction, we have been investigating applications of our earlier work on pragmatics in natural language processing to building systems for the automated generation of Web documents tailored to the individual reader.


[Download pdf] (bibtex)

Context as a spurious concept,
Graeme Hirst,
1997
AAAI Fall Symposium on Context in Knowledge Representation and Natural Language, November, Cambridge, MA
Superseded by February 2000 version
Abstract
I take issue in this talk with AI formalizations of context, primarily the formalization by McCarthy and Buvac, that regard context as an undefined primitive whose formalization can be the same in many different kinds of AI tasks. In particular, any theory of context in natural language must take the special nature of natural language into account and cannot regard context simply as an undefined primitive. I show that there is no such thing as a coherent theory of context simpliciter---context pure and simple---and that context in natural language is not the same kind of thing as context in KR. In natural language, context is constructed by the speaker and the interpreter, and both have considerable discretion in so doing. Therefore, a formalization based on pre-defined contexts and pre-defined `lifting axioms' cannot account for how context is used in real-world language.

[Download pdf] (bibtex)

Authoring and generating health-education documents that are tailored to the needs of the individual patient,
Graeme Hirst and Chrysanne DiMarco and Eduard Hovy and Kimberley Parsons,
1997
User Modeling: Proceedings of the Sixth International Conference, UM97 (Anthony Jameson and Cécile Paris and Carlo Tasso, eds.), pp. 107--118, June, Chia Laguna, Sardinia, Italy. Vienna and New York, Springer Wien New York
Abstract
Health-education documents can be much more effective in achieving patient compliance if they are customized for individual readers. For this purpose, a medical record can be thought of as an extremely detailed user model of a reader of such a document. The HealthDoc project is developing methods for producing health-information and patient-education documents that are tailored to the individual personal and medical characteristics of the patients who receive them. Information from an on-line medical record or from a clinician will be used as the primary basis for deciding how best to fit the document to the patient. In this paper, we describe our research on three aspects of the project: the kinds of tailoring that are appropriate for health-education documents; the nature of a tailorable master document and how it can be created; and the linguistic problems that arise when a tailored instance of the document is to be generated.

[Download pdf] (bibtex)

Detecting stylistic inconsistencies in collaborative writing,
Angela Glover and Graeme Hirst, 1996
In: Mike Sharples and Thea van der Geest (editors), The new writing environment: Writers at work in a world of technology, London, UK, Springer-Verlag.
Abstract

When two or more writers collaborate on a document by each contributing pieces of text, the problem can arise that while each might be an exemplary piece of writing, they do not cohere into a document that speaks with a single voice. That is, they are stylistically inconsistent. But given a stylistically inconsistent document, people often find it hard to articulate exactly where the problems lie. Rather, they feel that something is wrong but can't quite say why.

An example of stylistic inconsistency can be seen in the following sentence, which is from a brochure given to hospital patients who are to undergo a cardiac catheterization. (The parenthesized numbers are ours, to refer to the individual clauses.)

(1) Once the determination for a cardiac catheterization has been made, (2) various tests will need to be performed (3) to properly assess your condition prior to the procedure.

Clause 1 and (to a slightly lesser extent) clause 3 are in medical talk, as if in a formal communication from physician to physician; clause 2 is much more informal, and is expressed in ordinary lay language. The effect of the two styles mixed together in the one sentence is a feeling of incongruity---which was presumably not intended by the author or authors. This example, however, is unusual in its brevity. More often, the problem of inconsistency emerges only over longer stretches of text, especially where the granularity of the multiple authorship is at the paragraph, section, or chapter level. Moreover, while stylistic inconsistencies arise primarily in jointly written documents, we do not exclude the possibility of their occurrence in singly authored texts, especially those where different parts were written at different times or, initially, for different purposes.

Our ultimate goal in this research is to build software that will help with this problem---that will point out stylistic inconsistencies in a document, and perhaps suggest how they can be fixed. In this paper, we report some of our initial explorations and data collection.


(bibtex)

Automatic customization of health-education brochures for individual patients,
Graeme Hirst and Chrysanne DiMarco,
1996
Proceedings, Information Technology and Community Health Conference (ITCH-96), pp. 222--228, November, Victoria, B.C.
Abstract

Many studies have shown that health-education messages and patient instructions are more effective when closely tailored to the particular condition and characteristics of the individual recipient. But in situations where many factors interact -- for example, in explaining the pros and cons of hormone replacement therapy -- the number of different combinations is far too large for a set of appropriately tailored messages to be produced in advance.

The HealthDoc project is presently developing linguistic techniques for producing, on demand, health-education and patient-information brochures that are customized to the medical and personal characteristics of an individual patient.

For each topic, HealthDoc requires a `master document' written by an expert on the subject with the help of a program called an `authoring tool'. The writer decides upon the basic elements of the text -- clauses and sentences -- and the patient conditions under which each element should be included in the output. The program assists the writer in building correctly structured master-document fragments and annotating them with the relationships and conditions for inclusion.

When a clinician wishes to give a patient a particular brochure from HealthDoc, she will select it from a menu and specify the name of the patient. HealthDoc will then use information from the patient's on-line medical record to create and print a version of the document appropriate to that patient, by selecting the appropriate pieces of material and then performing the necessary linguistic operations to combine them into a single, coherent text.


[Download n] (bibtex)

A formal and computational characterization of pragmatic infelicities,
Daniel Marcu and Graeme Hirst,
1996
Proceedings of the Twelfth European Conference on Artificial Intelligence, pp. 587--591, August, Budapest, Hungary
An earlier version of this paper appeared as: Marcu, Daniel and Hirst, Graeme. ``Detecting pragmatic infelicities.'' Working notes, AAAI Symposium on Computational implicature: Computational approaches to interpreting and generating conversational implicature, Stanford University, March 1996, pp. 64--70.
Abstract
We study the logical properties that characterize pragmatic inferences and show that the classical understanding of notions such as entailment and defeasibility is not enough if one wants to explain infelicities that occur when a pragmatic inference is cancelled. We show that infelicities can be detected if a special kind of inference is considered, namely infelicitously defeasible inference. We also show how one can use stratified logic, a linguistically motivated formalism that accommodates indefeasible, infelicitously defeasible, and felicitously defeasible inferences, to reason about pragmatic inferences and detect infelicities associated with utterances. The formalism yields an algorithm for detecting infelicities, which has been implemented in Lisp.

[Download pdf] (bibtex)

Language use in context,
Janyce M. Wiebe and Graeme Hirst and Diane Horton,
1996
Communications of the ACM, 39(1), pp. 102--111, January
Abstract

Any text or dialogue establishes a linguistic context within which subsequent utterances must be understood. And beyond the linguistic context is the participatory context. A speaker or writer directs an utterance or text toward a hearer or reader with a particular purpose --- to inform, to amuse, to collaborate in a task, perhaps. The form and content of the utterance are chosen accordingly, and the listener or reader must infer the underlying intent as part of their understanding.

This article explores recent research on language use in context that goes beyond sentence boundaries to process discourse --- treating texts or dialogues as whole units composed of interrelated parts, not merely as sequences of isolated sentences. The article discusses the comprehension and production of language, looking at both texts and dialogues. A text to be processed might be, for example, a newspaper or magazine article being translated into another language or whose content is to be ``understood'' or abstracted in an information storage and retrieval system. A dialogue to be processed might be a conversation, spoken or typed, between a human and a computer, in service of some collaborative task. Many of the problems described here occur in both kinds of discourse.

The underlying goal of the research described in this special section of Communications of the ACM is to move beyond `toy' systems and come to grips with `real language'. While the research described in the other articles in this section focuses on robustly processing massive amounts of text, the work described here focuses on understanding, in computational terms, the complexities and subtleties of language as people really use it.

In an article of this length, we cannot hope to describe all of the recent important work addressing language use in context. For example, we will not cover pronoun resolution, ellipsis, metaphor, or many aspects of belief ascription.


[Download pdf] (bibtex)

HealthDoc: Customizing patient information and health education by medical condition and personal characteristics,
Chrysanne DiMarco and Graeme Hirst and Leo Wanner and John Wilkinson,
1995
Workshop on Artificial Intelligence in Patient Education, August, Glasgow, Scotland
Abstract
The HealthDoc project aims to provide a comprehensive approach to the customization of patient-information and health-education materials through the development of sophisticated natural language generation systems. We adopt a model of patient education that takes into account patient information ranging from simple medical data to complex cultural beliefs, so that our work provides both an impetus and testbed for research in multicultural health communication. We propose a model of language generation, `generation by selection and repair', that relies on a `master-document' representation that pre-determines the basic form and content of a text, yet is amenable to editing and revision for customization. The implementation of this model has so far led to the design of a sentence planner that integrates multiple complex planning tasks and a rich set of ontological and linguistic knowledge sources.

[Download pdf] (bibtex)

Collaborating on referring expressions,
Peter A. Heeman and Graeme Hirst,
1995
Computational Linguistics, 21(3), pp. 351--382, September
Abstract
This paper presents a computational model of how conversational participants collaborate in order to make a referring action successful. The model is based on the view of language as goal-directed behavior. We propose that the content of a referring expression can be accounted for by the planning paradigm. Not only does this approach allow the processes of building referring expressions and identifying their referents to be captured by plan construction and plan inference, it also allows us to account for how participants clarify a referring expression by using meta-actions that reason about and manipulate the plan derivation that corresponds to the referring expression. To account for how clarification goals arise and how inferred clarification plans affect the agent, we propose that the agents are in a certain state of mind, and that this state includes an intention to achieve the goal of referring and a plan that the agents are currently considering. It is this mental state that sanctions the adoption of goals and the acceptance of inferred plans, and so acts as a link between understanding and generation.

[Download pdf] (bibtex)

Near-synonymy and the structure of lexical knowledge,
Graeme Hirst,
1995
Working notes, AAAI Symposium on Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity, pp. 51--56, March, Stanford University
Abstract
The need to deal adequately with near-synonymy in tasks such as lexical choice is the basis for two alternatives to conventional models of lexical knowledge: a Saussurean approach and a prototype-theory approach. I discuss these approaches, showing that the latter is troublesome but the former is likely to succeed, and also discuss the consequences for computational models of lexical knowledge.

[Download pdf] (bibtex)

A uniform treatment of pragmatic inferences in simple and complex utterances and sequences of utterances,
Daniel Marcu and Graeme Hirst,
1995
Proceedings, 33rd Annual Meeting, Association for Computational Linguistics, pp. 144--150, June, Cambridge, MA
Abstract
Drawing appropriate defeasible inferences has proven to be one of the most pervasive puzzles of natural language processing and a recurrent problem in pragmatics. This paper provides a theoretical framework, called stratified logic, that can accommodate defeasible pragmatic inferences. The framework yields an algorithm that computes the conversational, conventional, scalar, clausal, and normal state implicatures, and the presuppositions that are associated with utterances. The algorithm applies equally to simple and complex utterances and sequences of utterances.

[Download ps] (bibtex)

The repair of speech act misunderstandings by abductive inference,
Susan McRoy and Graeme Hirst,
1995
Computational Linguistics, 21(4), pp. 435--478, December
Abstract

During a conversation, agents can easily come to have different beliefs about the meaning or discourse role of some utterance. Participants normally rely on their expectations to determine whether the conversation is proceeding smoothly: if nothing unusual is detected, then understanding is presumed to occur. Conversely, when an agent says something that is inconsistent with another's expectations, then the other agent may change her interpretation of an earlier turn and direct her response to the reinterpretation, accomplishing what is known as a fourth-turn repair.

Here we describe an abductive account of the interpretation of speech acts and the repair of speech act misunderstandings. Our discussion considers the kinds of information that participants use to interpret an utterance, even if it is inconsistent with their beliefs. It also considers the information used to design repairs. We describe a mapping between the utterance-level forms (semantics) and discourse-level acts (pragmatics), and a relation between the discourse acts and the beliefs and intentions that they express. We specify, for each discourse act, the acts that might be expected if the hearer has understood the speaker correctly. We also describe our account of belief and intention, distinguishing the beliefs agents actually have from the ones they act as if they have when they perform a discourse act. To support repair, we model how misunderstandings can lead to unexpected actions and utterances and describe the processes of interpretation and repair. To illustrate the approach we show how it accounts for an example repair.


[Download pdf] (bibtex)

Repairing conversational misunderstandings and non-understandings,
Graeme Hirst and Susan McRoy and Peter A. Heeman and Philip Edmonds and Diane Horton,
1994
Speech Communication, 15(3--4), pp. 213--229, December
Abstract
Participants in a discourse sometimes fail to understand one another but, when aware of the problem, collaborate upon or negotiate the meaning of a problematic utterance. To address nonunderstanding, we have developed two plan-based models of collaboration in identifying the correct referent of a description: one covers situations where both conversants know of the referent, and the other covers situations such as direction-giving, where the recipient does not. In the models, conversants use the mechanisms of refashioning, suggestion, and elaboration to collaboratively refine a referring expression until it is successful. To address misunderstanding, we have developed a model that combines intentional and social accounts of discourse to support the negotiation of meaning. The approach extends intentional accounts by using expectations deriving from social conventions in order to guide interpretation. Reflecting the inherent symmetry of the negotiation of meaning, all our models can act as both speaker and hearer, and can play both the role of the conversant who is not understood or misunderstood and the role of the conversant who fails to understand.

[Download pdf] (bibtex)

Natural language analysis by computer,
Graeme Hirst, 1994
In: R.E. Asher (editor), The encyclopedia of language and linguistics, Volume 5, Oxford, Pergamon Press, pp. 2730--2736.
(bibtex)

Natural language processing,
C.S. Mellish and Graeme Hirst, 1994
In: R.E. Asher (editor), The encyclopedia of language and linguistics, Volume 5, Oxford, Pergamon Press, p. 2748.
(bibtex)

Reference and anaphor resolution in natural language processing,
Graeme Hirst, 1994
In: R.E. Asher (editor), The encyclopedia of language and linguistics, Volume 7, Oxford, Pergamon Press, pp. 3487--3489.
(bibtex)

Semantic interpretation in natural language processing,
Graeme Hirst, 1994
In: R.E. Asher (editor), The encyclopedia of language and linguistics, Volume 7, Oxford, Pergamon Press, pp. 3801--3804.
(bibtex)

An implemented formalism for computing linguistic presuppositions and existential commitments,
Daniel Marcu and Graeme Hirst,
1994
Proceedings, International Workshop on Computational Semantics, pp. 141--150, December, Tilburg, The Netherlands
Abstract
We rely on the strength of linguistic and philosophical perspectives in constructing a framework that offers a unified explanation for presuppositions and existential commitment. We use a rich ontology and a set of methodological principles that embed the essence of Meinong's philosophy and Grice's conversational principles into a stratified logic under an unrestricted interpretation of the quantifiers. The result is a logical formalism that yields a tractable computational method that uniformly calculates all the presuppositions of a given utterance including the existential ones.

[Download ps] (bibtex)

Parsing as an energy minimization problem,
Bart Selman and Graeme Hirst, 1994
In: Geert Adriaens and Udo Hahn (editors), Parallel natural language processing, Norwood, NJ, Ablex Publishing, pp. 238--254.
Abstract

We show how parsing can be formulated as an energy minimization problem. We describe our model in terms of a connectionist network. In this network we use a parallel computation scheme similar to the one used in the Boltzmann machine and apply simulated annealing, a stochastic optimization method. We show that at low temperatures the time average of the visited states at thermal equilibrium represents the correct parse of the input sentence.

Our treatment of parsing as an energy minimization problem enables us to formulate general rules for the setting of weights and thresholds in the network. We also consider an alternative to the commonly used energy function. Using this new function, one can choose a provably correct set of weights and thresholds for the network and still have an acceptable rate of convergence in the annealing process.

The parsing scheme is built from a small set of connectionist primitives that represent the grammar rules. These primitives are linked together using pairs of computing units that behave like discrete switches. These units are used as binders between concepts represented in the network. They can be linked in such a way that individual rules can be selected from a collection of rules, and are very useful in the construction of connectionist schemes for any form of rule-based processing.


(bibtex)

Usage notes as the basis for a representation of near-synonymy for lexical choice,
Chrysanne DiMarco and Graeme Hirst,
1993
Proceedings, 9th annual conference of the University of Waterloo Centre for the New Oxford English Dictionary and Text Research, pp. 33--43, September, Oxford
Abstract
The task of choosing between lexical near-equivalents in text generation requires the kind of knowledge of fine differences between words that is typified by the usage notes of dictionaries and books of synonym discrimination. These usage notes follow a fairly standard pattern, and a study of their form and content shows the kinds of differentiae adduced in the discrimination of near-synonyms. For appropriate lexical choice in text generation and machine translation systems, it is necessary to develop the concept of formal `computational usage notes', which would be part of the lexical entries in a conceptual knowledge base. The construction of a set of `computational usage notes' adequate for text generation is a major lexicographic task of the future.

[Download pdf] (bibtex)

A computational theory of goal-directed style in syntax,
Chrysanne DiMarco and Graeme Hirst,
1993
Computational Linguistics, 19(3), pp. 451--499, September
Abstract

The problem of style is highly relevant to computational linguistics, but current systems deal only superficially, if at all, with subtle but significant nuances of language. Expressive effects, together with their associated meaning, contained in the style of a text are lost to analysis and absent from generation.

We have developed an approach to the computational treatment of style that is intended to eventually incorporate three selected components---lexical, syntactic, and semantic. In this paper, we concentrate on certain aspects of syntactic style. We have designed and implemented a computational theory of goal-directed stylistics that can be used in various applications, including machine translation, second-language instruction, and natural language generation.

We have constructed a vocabulary of style that contains both primitive and abstract elements of style. The primitive elements describe the stylistic effects of individual sentence components. These elements are combined into patterns that are described by a stylistic meta-language, the abstract elements, that define the concordant and discordant stylistic effects common to a group of sentences. Higher-level patterns are built from the abstract elements and associated with specific stylistic goals, such as clarity or concreteness. Thus, we have defined rules for a syntactic stylistic grammar at three interrelated levels of description: primitive elements, abstract elements, and stylistic goals. Grammars for both English and French have been constructed, using the same vocabulary and the same development methodology. Parsers that implement these grammars have also been built.

The stylistic grammars codify aspects of language that were previously defined only descriptively. The theory is being applied to various problems in which the form of an utterance conveys an essential part of meaning and so must be precisely represented and understood.


[Download pdf] (bibtex)

A goal-based grammar of rhetoric,
Chrysanne DiMarco and Graeme Hirst and Marzena Makuta-Giluk,
1993
Association for Computational Linguistics, Workshop on Intentionality and Structure in Discourse Relations, pp. 15--18, June, Ohio State University
[Download pdf] (bibtex)

The semantic and stylistic differentiation of synonyms and near-synonyms,
Chrysanne DiMarco and Graeme Hirst and Manfred Stede,
1993
AAAI Spring Symposium on Building Lexicons for Machine Translation, pp. 114--121, March, Stanford, CA
Abstract

If we want to describe the action of someone who is looking out a window for an extended time, how do we choose between the words gazing, staring, and peering? What exactly is the difference between an argument, a dispute, and a row? In this paper, we describe our research in progress on the problem of lexical choice and the representations of world knowledge and of lexical structure and meaning that the task requires. In particular, we wish to deal with nuances and subtleties of denotation and connotation---shades of meaning and of style---such as those illustrated by the examples above.

We are studying the task in two related contexts: machine translation and the generation of multilingual text from a single representation of content. In the present paper, we concentrate on issues in lexical representation. We describe a methodology, based on dictionary usage notes, that we are using to discover the dimensions along which similar words can be differentiated, and we discuss a two-part representation for lexical differentiation.


[Download pdf] (bibtex)

Not all reflexive reasoning is deductive,
Graeme Hirst and Dekai Wu,
1993
Behavioral and brain sciences, 16(3), pp. 462--463, September
(bibtex)

Knowledge about planning: On the meaning and representation of plan decomposition,
Diane Horton and Graeme Hirst,
1993
AAAI Spring Symposium on Reasoning about Mental States: Formal Theories and Applications, pp. 74--78, March, Stanford, CA
Abstract
Plan-related inference has been one of the most studied problems in Artificial Intelligence. Pollack (1990) has argued that a plan should be seen as a set of mental attitudes towards a structured object. Although the objects of these attitudes have received far more attention to date than the attitudes themselves, little has been said about the exact meaning of one of their key components -- the decomposition relation. In developing a plan representation for our work on plan misinference in dialogue, we have explored two of the possible meanings, their implications, and the relationship between them. These issues underlie the literature, and in this paper, we step back and discuss them explicitly.

[Download pdf] (bibtex)

Misunderstanding and the negotiation of meaning,
Susan McRoy and Graeme Hirst,
1993
AAAI Fall Symposium on Human--Computer Collaboration, pp. 57--62, October, Raleigh, NC
Abstract
Most computational accounts of dialogue have assumed that once listeners have interpreted an utterance, they never change this interpretation. However, human interactions routinely violate this assumption. This is because people are necessarily limited in how much information they can make explicit. As a result, misunderstandings might occur---discourse participants might differ in their beliefs about the meaning of what has been said or about its relevance to the discourse. To address this possibility, participants rely in part on their expectations to determine whether they have understood each other. If a speaker fails to notice anything unusual, she may assume that the conversation is proceeding smoothly. But if she hears something that seems inconsistent with her expectations, she may hypothesize that there has been a misunderstanding and attempt to reinterpret part of the discourse, initiating a repair. In other words, speakers' inferences about discourse are nonmonotonic, because speakers may learn things that conflict with their earlier reasoning and cause them to re-evaluate what happened before. Because their utterances can only make a limited amount of information explicit, discourse participants can only surmise---abduce---each other's intentions. They must reason from observed utterances to causes or goals that might account for them.

[Download pdf] (bibtex)

Abductive explanations of dialogue misunderstandings,
Susan McRoy and Graeme Hirst,
1993
Proceedings, 6th Conference of the European Chapter of the Association for Computational Linguistics, pp. 277--286, April, Utrecht, The Netherlands
Abstract
Most computational accounts of dialogue have assumed that once listeners have interpreted an utterance, they never change this interpretation. However, human interactions routinely violate this assumption. This is because people are necessarily limited in how much information they can make explicit. As a result, misunderstandings might occur---discourse participants might differ in their beliefs about the meaning of what has been said or about its relevance to the discourse. To address this possibility, participants rely in part on their expectations to determine whether they have understood each other. If a speaker fails to notice anything unusual, she may assume that the conversation is proceeding smoothly. But if she hears something that seems inconsistent with her expectations, she may hypothesize that there has been a misunderstanding and attempt to reinterpret part of the discourse, initiating a repair. In other words, speakers' inferences about discourse are nonmonotonic, because speakers may learn things that conflict with their earlier reasoning and cause them to re-evaluate what happened before. Because their utterances can only make a limited amount of information explicit, discourse participants can only surmise---abduce---each other's intentions. They must reason from observed utterances to causes or goals that might account for them.

[Download pdf] (bibtex)

Mixed-depth representations for natural language text,
Graeme Hirst and Mark Ryan, 1992
In: Paul S. Jacobs (editor), Text-based intelligent systems, Hillsdale, NJ, Lawrence Erlbaum Associates, pp. 59--82.
Abstract

Intelligent text-based systems will vary as to the degree of difficulty of the texts they deal with. Some may have a relatively easy time with texts for which fairly superficial processes will get useful results, such as, say, The New York Times or Julia Child's Favorite Recipes. But many systems will have to work on more difficult texts. Often, it is the complexity of the text that makes the system desirable in the first place. It is for such systems that we need to think about making the deeper methods that are already studied in AI and computational linguistics more robust and suitable for processing long texts without interactive human help. The dilemma is that on one hand, we have the limitations of raw text databases and superficial processing methods; on the other we have the difficulty of deeper methods and conceptual representations. Our proposal here is to have the best of both, and accordingly we develop the notion of a heterogeneous, or mixed, type of representation.

In our model, a text base permits two parallel representations of meaning: the text itself, for presentation to human users, and a conceptual encoding of the text, for use by intelligent components of the system. The two representations are stored in parallel; that is, there are links between each unit of text (a sentence or paragraph in most cases) and the corresponding conceptual encoding. This encoding could be created en masse when the text was entered into the system. But if it is expected that only a small fraction of the text base will ever be looked at by processes that need the conceptual representations, then the encoding could be performed on each part of the text as necessary for inference and understanding to answer some particular request. The results could then be stored so that they don't have to be redone if the same area of the text is searched again. Thus, a text would gradually grow its encoding as it continues to be used. (And the work will never be done for texts or parts of texts that are never used.)

So far, this is straightforward. But we can go one step further. The encoding itself may be deep or shallow at different places, depending on what happened to be necessary at the time it was generated---or on what was possible. Or, to put it a different way, we can view natural-language text and AI-style knowledge representations as two ends of a spectrum.


[Download pdf] (bibtex)

Word sense and case slot disambiguation,
Graeme Hirst and Eugene Charniak,
1982
Proceedings of the Second National Conference on Artificial Intelligence (AAAI-82), pp. 95--98, August, Pittsburgh
[Download pdf] (bibtex)

An intelligent computer assistant for stylistic instruction,
Julie Payette and Graeme Hirst,
1992
Computers and the Humanities, 26(2), pp. 87--102
Abstract

This article describes an intelligent computer-assisted language instruction system that is designed to teach principles of syntactic style to students of English. Unlike conventional style checkers, the system performs a complete syntactic analysis of its input, and takes the student's stylistic intent into account when providing a diagnosis. Named STASEL, for Stylistic Treatment At the Sentence Level, the system is specifically developed for the teaching of style, and makes use of artificial intelligence techniques in natural language processing to analyze free-form input sentences interactively.

An important contribution of STASEL is its ability to provide stylistic guidance according to the specific writing goals of clarity and conciseness. In an attempt to remedy some of the deficiencies of existing instructional software, STASEL's design demonstrates how stylistic instruction can be effectively computerized, while laying the groundwork for the creation of intelligent tutoring systems for teaching writing.


(bibtex)

Knowledge and knowledge acquisition in the computational context,
Stephen Regoczei and Graeme Hirst, 1992
In: Robert R. Hoffman (editor), The psychology of expertise: Cognitive research and empirical AI, New York, Springer-Verlag, pp. 12--25.
Abstract

The enterprise of artificial intelligence has given rise to a new class of software systems. These software systems, commonly called expert systems, or knowledge-based systems, are distinguished in that they contain, and can apply, knowledge of some particular skill or expertise in the execution of a task. These systems embody, in some form, humanlike expertise. The construction of such software therefore requires that we somehow get hold of the knowledge and transfer it into the computer, representing it in a form usable by the machine. This total process has come to be called knowledge acquisition (KA). The necessity for knowledge representation---the describing or writing down of the knowledge in machine-usable form---underlies and shapes the whole KA process and the development of expert system software.

Concern with knowledge is nothing new, but some genuinely new issues have been introduced by the construction of expert systems. The processes of KA and KR are envisaged as the means through which software is endowed with expertise-producing knowledge. This vision, however, is problematic. The connection between knowledge and expertise itself is not clearly understood, though the phrases ``knowledge-based system'' and ``expert system'' tend to be used interchangeably, as if all expertise were knowledge-like. This haziness about basics also leads to the unrealistic expectation that the acquisition of knowledge in machine-usable form will convey powers of expert performance upon computer software. These assumptions are questionable. For a deeper understanding, we must clarify the concepts of knowledge acquisition and knowledge representation, and the concept of knowledge itself, as they are used in the computer context. That is the first goal of this chapter. The second goal is to explicate the issues involved in KA and show how they are amenable to research by experimental or cognitive psychologists.

The chapter will be organized as follows. In the second section we will set the stage for cross-disciplinary discussion by sketching the history of artificial intelligence and KA. In the third section, we try to answer the question ``What is knowledge?'' by examining the various approaches that people have taken in trying to grasp the nature of knowledge. In the fourth section, we discuss the KA problem. In particular, we present a model of the KA process to reconcile and pull together the various approaches to KA that are found in the literature. This basic model of KA will be used in the commentaries chapter (chapter 17 of this book) to compare the contributions to this volume. In the present introductory chapter, we outline some crucial current issues, especially those that could be fruitfully addressed by experimental psychologists, and as a conclusion we try to point to some future directions for research.


(bibtex)

Focus shifts as indicators of style in paragraphs,
Mark Ryan and Chrysanne DiMarco and Graeme Hirst,
1992
Department of Computer Science, University of Waterloo, June
In DiMarco, Chrysanne et al., Four papers on computational stylistics.
[Download pdf] (bibtex)

A case-based representation of legal text for conceptual retrieval,
Judith Dick and Graeme Hirst,
1991
Workshop on Language and Information Processing, American Society for Information Science, October, Washington DC
[Download pdf] (bibtex)

Intelligent text retrieval,
Judith Dick and Graeme Hirst,
1991
Text retrieval: Workshop notes from the Ninth National Conference on Artificial Intelligence (AAAI-91), July, Anaheim CA
[Download pdf] (bibtex)

Does Conversation Analysis have a role in computational linguistics?,
Graeme Hirst,
1991
Computational linguistics, 17(2), pp. 211--227, June
[Review of: Luff, Paul; Gilbert, Nigel; and Frohlich, David (editors). Computers and conversation. London: Academic Press, 1990]
Abstract

Computational linguists are a vicious warrior tribe. Unconstrained by traditional disciplinary boundaries, they invade and plunder neighboring disciplines for attractive theories to incorporate into their own. The latest victim of these attacks is sociology---in particular, a branch of ethnomethodological sociolinguistics called Conversation Analysis.

The book reviewed here, Computers and conversation, is the first significant result of this. It is based on a symposium held at the University of Surrey in September 1989. Its purpose is to show what CA has to offer to the study of human--computer interaction, especially interaction in natural language. In this review article, I use the book to explore at some length the question of whether Conversation Analysis has indeed anything to say to computational linguistics.


[Download pdf] (bibtex)

Existence assumptions in knowledge representation,
Graeme Hirst,
1991
Artificial intelligence, 49, pp. 199--242, May
(This issue of the journal was reprinted as: Brachman, Ronald J.; Levesque, Hector J.; and Reiter, Raymond (editors). Knowledge representation. Cambridge, MA: The MIT Press, 1992.)
Abstract

If knowledge representation formalisms are to be suitable for semantic interpretation of natural language, they must be more adept with representations of existence and non-existence than they presently are. Quantifiers must sometimes range over non-existent entities. I review the philosophical background, including Anselm and Kant, and exhibit some ontological problems that natural language sentences pose for knowledge representation. The paraphrase methods of Russell and Quine are unable to deal with many of the problems.

Unfortunately, the shortcomings of the Russell--Quine ontology are reflected in most current knowledge representation formalisms in AI. Several alternatives are considered, including some intensional formalisms and the work of Hobbs, but all have problems. Free logics and possible worlds don't help either. But useful insights are found in the Meinongian theory of Parsons, in which a distinction between nuclear and extranuclear kinds of predicates is made and used to define a universe over which quantification ranges. If this is combined with a naive ontology, with about eight distinct kinds of existence, a better approach to the representation of non-existence can be developed within Hobbs's basic formalism.


[Download pdf] (bibtex)

Planning the future of natural language research (even in Canada),
Graeme Hirst,
1991
Canadian artificial intelligence, Number 26, pp. 10--13, February
Abstract
This talk is a cynical and offensive assessment of the state of research in Canada in 1990 on natural language understanding and computational linguistics.

[Download ps] (bibtex)

Discrepancies in discourse models and miscommunication in conversation,
Diane Horton and Graeme Hirst,
1991
AAAI Fall Symposium on Discourse Structure in Natural Language Understanding and Generation, pp. 31--32, November, Pacific Grove, CA
(bibtex)

An abductive account of repair in conversation,
Susan McRoy and Graeme Hirst,
1991
AAAI Fall Symposium on Discourse Structure in Natural Language Understanding and Generation, pp. 52--57, November, Pacific Grove, CA
(bibtex)

Repairs in communication are abductive inferences,
Susan McRoy and Graeme Hirst,
1991
AAAI Fall Symposium on Knowledge and Action at the Social and Organizational Levels, pp. 89--91, November, Pacific Grove, CA
(bibtex)

Lexical cohesion, the thesaurus, and the structure of text,
Jane Morris and Graeme Hirst,
1991
Computational linguistics, 17(1), pp. 21--48, March
Abstract
In text, lexical cohesion is the result of chains of related words that contribute to the continuity of lexical meaning. These lexical chains are a direct result of units of text being ``about the same thing'', and finding text structure involves finding units of text that are about the same thing. Hence, computing the chains is useful, since they will have a correspondence to the structure of the text. Determining the structure of text is an essential step in determining the deep meaning of the text. In this paper, a thesaurus is used as the major knowledge base for computing lexical chains. Correspondences between lexical chains and structural elements are shown to exist. Since the lexical chains are computable, and exist in non-domain-specific text, they provide a valuable indicator of text structure. The lexical chains also provide a semantic context for interpreting words, concepts, and sentences.

[Download pdf] (bibtex)

Computer-assisted instruction in syntactic style,
Julie Payette and Graeme Hirst,
1991
Proceedings, ACH/ALLC '91, ``Making connections'' [International Conference on Computers and the Humanities], pp. 333--340, March, Tempe, AZ
(bibtex)

The corporation as mind: Lessons for AI,
Stephen Regoczei and Graeme Hirst,
1991
AAAI Fall Symposium on Knowledge and Action at the Social and Organizational Levels, pp. 95--97, November, Pacific Grove, CA
Abstract

The most successful example of distributed artificial intelligence ever constructed is the modern corporation. It is our view that a corporation is an instance of AI---a generalized, distributed AI system. We believe that this more general view carries lessons for the construction of computer-based intelligence.

We use the term corporation to include not only large companies, but also similar organizations such as government departments. Such organizations are typically composed of small management groups that, in turn, are organized into larger sections or divisions. The result is usually a tree structure, or something close to that.

The hardware from which this structure is built is of two kinds: equipment and people. The equipment includes all the corporation's buildings, machinery, furniture, and so on. The people are its employees. But, we submit, the intelligence in the corporation lies not so much in the people as in the other components of its hierarchy. Each management group, each corporate division, and, indeed, the corporation itself may be regarded as an abstract intelligent agent: an entity with its own `mind', a mind that the other components influence through symbolic communication.


[Download pdf] (bibtex)

An intelligent CALI system for grammatical error diagnosis,
Mark Catt and Graeme Hirst,
1990
Computer Assisted Language Learning, 3, pp. 3--26, November
Abstract

This paper describes an approach to computer-assisted language instruction based on the application of artificial intelligence technology to grammatical error diagnosis. We have developed a prototype system, Scripsi, capable of recognising a wide range of errors in the writing of language learners. Scripsi not only detects ungrammaticality, but hypothesizes its cause and provides corrective information to the student. These diagnostic capabilities rely on the application of a model of the learner's linguistic knowledge.

Scripsi operates interactively, accepting the text of the student's composition and responding with diagnostic information about its grammatical structure. In contrast to the narrowly defined limits of interaction available with automated grammatical drills, the framework of interactive composition provides students with the opportunity to express themselves in the language being learned.

Although Scripsi's diagnostic functions are limited to purely structural aspects of written language, the way is left open for the incorporation of semantic processing. The design of Scripsi is intended to lay the groundwork for the creation of intelligent tutoring systems for second language instruction. The development of such expertise will remedy many of the deficiencies of existing technology by providing a basis for genuinely communicative instructional tools --- computerised tutors capable of interacting linguistically with the student.

The research is based on the assumption that the language produced by the language learner, ``learner language'', differs in systematic ways from that of the native speaker. In particular, the learner's errors can be attributed primarily to two causes: the operation of universal principles of language acquisition and the influence of the learner's native language. A central concern in the design of Scripsi has been the incorporation of a psychologically sound model of the linguistic competence of the second language learner.


[Download pdf] (bibtex)

Accounting for style in machine translation,
Chrysanne DiMarco and Graeme Hirst,
1990
Third International Conference on Theoretical Issues in Machine Translation, June, Austin TX
Abstract

A significant part of the meaning of any text lies in the author's style. Different choices of words and syntactic structure convey different nuances in meaning, which must be carried through in any translation if it is to be considered faithful. Up to now, machine translation systems have been unable to do this. Subtleties of style are simply lost to current machine-translation systems.

The goal of the present research is to develop a method to provide machine-translation systems with the ability to understand and preserve the intent of an author's stylistic characteristics. Unilingual natural language understanding systems could also benefit from an appreciation of these aspects of meaning. However, in translation, style plays an additional role, for here one must also deal with the generation of appropriate target-language style.

Consideration of style in translation involves two complementary, but sometimes conflicting, aims:

  • The translation must preserve, as much as possible, the author's stylistic intent --- the information conveyed through the manner of presentation.
  • But it must have a style that is appropriate and natural to the target language.

The study of comparative stylistics is, in fact, guided by the recognition that languages differ in their stylistic approaches: each has its own characteristic stylistic preferences. The stylistic differences between French and English are exemplified by the predominance of the pronominal verb in French. This contrast allows us to recognize the greater preference of English for the passive voice:

  • (a) Le jambon se mange froid.
    (b) Ham is eaten cold.

Such preferences exist at the lexical, syntactic, and semantic levels, but reflect differences in the two languages that can be grouped in terms of more-general stylistic qualities. French words are generally situated at a higher level of abstraction than the corresponding English words, which tend to be more concrete (Vinay and Darbelnet 1958, 59). French aims for precision, while English is more tolerant of vagueness (Duron 1963, 109).

So, a French source text may be abstract and very precise in style, but the translated English text should be looser and less abstract, while still retaining the author's stylistic intent. Translators use this kind of knowledge about comparative stylistics as they clean up raw machine-translation output, dealing with various kinds of stylistic complexities.


(bibtex)

Mixed-depth representations for natural language text,
Graeme Hirst,
1990
AAAI Spring Symposium on Text-Based Intelligent Systems, pp. 25--29, March, Stanford
(bibtex)

Planning the future of natural language research (even in Canada),
Graeme Hirst,
1990
Invited talk at the Conference of the Canadian Society for Computational Studies of Intelligence, May
Abstract
This is the script of the original talk.

[Download ps] (bibtex)

A frame-based semantics for focusing subjuncts,
Dan Lyons and Graeme Hirst,
1990
Proceedings of the 28th Annual Meeting, Association for Computational Linguistics, pp. 54--61, June, Pittsburgh, PA
Abstract
A compositional semantics for focusing subjuncts---words such as only, even, and also---is developed from Rooth's theory of association with focus. By adapting the theory so that it can be expressed in terms of a frame-based semantic formalism, a semantics that is more computationally practical is arrived at. This semantics captures pragmatic subtleties by incorporating a two-part representation, and recognizes the contribution of intonation to meaning.

[Download pdf] (bibtex)

Race-based parsing and syntactic disambiguation,
Susan McRoy and Graeme Hirst,
1990
Cognitive science, 14(3), pp. 313--353, July--September
Abstract
We present a processing model that integrates some important psychological claims about the human sentence-parsing mechanism; namely, that processing is influenced by limitations on working memory and by various syntactic preferences. The model uses time-constraint information to resolve conflicting preferences in a psychologically plausible way. The starting point for this proposal is the Sausage Machine model (Fodor and Frazier, 1980; Frazier and Fodor, 1978). From there, we attempt to overcome the original model's dependence on ad hoc aspects of its grammar, and its omission of verb-frame preferences. We also add mechanisms for lexical disambiguation and semantic processing in parallel with syntactic processing.

[Download pdf] (bibtex)

The meaning triangle as a tool for the acquisition of abstract, conceptual knowledge,
Stephen Regoczei and Graeme Hirst,
1990
International journal of man--machine studies, 33(5), pp. 505--520, November
Previously published in Proceedings, 3rd Workshop on Knowledge Acquisition for Knowledge-Based Systems, Banff, November 1988. (Also published as technical report CSRI-211, Computer Systems Research Institute, University of Toronto, May 1988)
Abstract
The meaning triangle is presented as a useful diagramming tool for organizing knowledge in the informant-analyst interaction-based, natural language-mediated knowledge acquisition process. In concepts-oriented knowledge acquisition, the knowledge explication phase dominates. During the conceptual analysis process, it is helpful to separate verbal, conceptual, and referent entities. Diagramming these entities on an agent-centred meaning triangle clarifies for both informant and analyst the ontological structure that underlies the discourse and the creation of domains of discourse.

[Download pdf] (bibtex)

Computational models of ambiguity resolution,
Graeme Hirst, 1989
In: David S. Gorfein (editor), Resolving semantic ambiguity (Cognitive science series), Springer-Verlag, pp. 255--275.
Abstract

Research on computer programs for understanding natural language can be both a consumer of psycholinguistic research on ambiguity and a contributor to such work. The fields have different ways of thinking, different concerns, and different criteria as to what counts as a result. I will show that, nevertheless, both fields may profit from cooperation, by giving examples from my own work and from that of some of my students showing how issues raised or results obtained in one field can be of use in the other. The examples will include computer systems for lexical, structural, and thematic disambiguation that were inspired by recent psycholinguistic work. I will also show computational work on ambiguities of description attribution and other ``large ambiguities'' that raise, I will argue, interesting problems for psycholinguists.


(bibtex)

Ontological assumptions in knowledge representation,
Graeme Hirst,
1989
Proceedings, First International Conference on Principles of Knowledge Representation and Reasoning, pp. 157--169, May, Toronto
San Mateo, CA: Morgan Kaufmann Publishers
Abstract

If knowledge representation formalisms are to be suitable for semantic interpretation of natural language, they must be more adept with representations of existence and non-existence than they presently are. I review the philosophical background, and exhibit some ontological problems for KR. I then look at the shortcomings of current approaches, including several intensional formalisms and the work of Hobbs. The Meinongian theory of Parsons is considered. Lastly, I present a naïve ontology for knowledge representation, identifying about nine distinct kinds of existence.


(bibtex)

Sortal analysis with SORTAL, a software assistant for knowledge acquisition,
Stephen Regoczei and Graeme Hirst,
1989
Proceedings, Fourth Workshop on Knowledge Acquisition for Knowledge-Based Systems, October, Banff
Also published as technical report CSRI-232, Computer Systems Research Institute, University of Toronto, August 1989
Abstract

SORTAL is a software assistant for performing meaning-triangle-based sortal analysis. This paper describes its architecture and possible implementations. Conceptual analysis and conceptual modelling are central components of the informant-and-analyst-based, natural language-mediated knowledge acquisition process, but focusing on concepts is not enough. The ``aboutness'' of the language used in the interview forces the analyst to recognize distinctions between words, concepts, referents, and cogniting agents. Creating frame-like representations for agent-centred meaning triangles, as well as updating ontologies, keeping track of multiple domains of discourse, and the creation of knowledge bases for use in other systems are tasks that can be assisted by a software tool such as SORTAL. We sketch the requirements for such an assistant, give examples of its operation, and address implementation issues.


[Download pdf] (bibtex)

Stylistic grammars in language translation,
Chrysanne DiMarco and Graeme Hirst,
1988
Proceedings, 12th International conference on computational linguistics (COLING-88), pp. 148--153, August, Budapest
Abstract

We are developing stylistic grammars to provide the basis for a French and English stylistic parser. Our stylistic grammar is a branching stratificational model, built upon a foundation dealing with lexical, syntactic, and semantic stylistic realizations. Its central level uses a vocabulary of constituent stylistic elements common to both English and French, while the top level correlates stylistic goals, such as clarity and concreteness, with patterns of these elements.

Overall, we are implementing a computational schema of stylistics in French-to-English translation. We believe that the incorporation of stylistic analysis into machine translation systems will significantly reduce the current reliance on human post-editing and improve the quality of the systems' output.

[Download pdf] (bibtex)

Resolving lexical ambiguity computationally with spreading activation and Polaroid Words,
Graeme Hirst, 1988
In: Steven Small and Garrison Cottrell and Michael Tanenhaus (editors), Lexical ambiguity resolution, Los Altos, CA: Morgan Kaufmann, pp. 73--107.
Reprinted, with a new epilogue, in: Pustejovsky, James and Wilks, Yorick (editors), Readings in the Lexicon, The MIT Press, to appear. 2003 epilogue to this paper: PDF
Abstract

Any computer system for understanding natural language input (even in relatively weak senses of the word understanding) needs to be able to resolve lexical ambiguities. In this paper, I describe the lexical ambiguity resolution component of one such system.

The basic strategy used for disambiguation is ``do it the way people do.'' While cognitive modeling is not the primary goal of this work, it is often a good strategy in artificial intelligence to consider cognitive modeling anyway; finding out how people do something and trying to copy them is a good way to get a program to do the same thing. In developing the system below, I was strongly influenced by psycholinguistic research on lexical access and negative priming---in particular by the results of Swinney; Seidenberg, Tanenhaus, Leiman, and Bienkowski; and Reder. I will discuss the degree to which the system is a model of ambiguity resolution in people.


[Download pdf] (bibtex)

Semantic interpretation and ambiguity,
Graeme Hirst,
1988
Artificial intelligence, 34(2), pp. 131--177, March
Abstract

A new approach to semantic interpretation in natural language understanding is described, together with mechanisms for both lexical and structural disambiguation that work in concert with the semantic interpreter.

ABSITY, the system described, is a Montague-inspired semantic interpreter. Like Montague formalisms, its semantics is compositional by design and is strongly typed, with semantic rules in one-to-one correspondence with the meaning-affecting rules of a Marcus parser. The Montague semantic objects---functors and truth conditions---are replaced with elements of the frame language FRAIL. ABSITY's partial results are always well-formed FRAIL objects.

A semantic interpreter must be able to provide feedback to the parser to help it handle structural ambiguities. In ABSITY, this is done by the ``Semantic Enquiry Desk,'' a process that answers the parser's questions on semantic preferences. Disambiguation of word senses and of case slots is done by a set of procedures, one per word or slot, each of which determines the word or slot's correct sense, in cooperation with the other processes.

It is from the fact that partial results are always well-formed semantic objects that the system gains much of its power. This, in turn, comes from the strict correspondence between syntax and semantics in ABSITY. The result is a foundation for semantic interpretation superior to previous approaches.


[Download pdf] (bibtex)

Presuppositions as beliefs,
Diane Horton and Graeme Hirst,
1988
Proceedings, 12th International conference on computational linguistics (COLING-88), pp. 255--260, August, Budapest
Abstract

Most theories of presupposition implicitly assume that presuppositions are facts, and that all agents involved in a discourse share belief in the presuppositions that it generates. These unrealistic assumptions can be eliminated if each presupposition is treated as the belief of an agent. However, it is not enough to consider only the beliefs of the speaker; we show that the beliefs of other agents are often involved. We describe a new model, including an improved definition of presupposition, that treats presuppositions as beliefs and considers the beliefs of all agents involved in the discourse. We show that treating presuppositions as beliefs makes it possible to explain phenomena that cannot be explained otherwise.


[Download pdf] (bibtex)

Semantic interpretation and the resolution of ambiguity, Graeme Hirst, 1987
Cambridge, England: Cambridge University Press
Studies in natural language processing. Reprinted, 1992.
Abstract
While parsing techniques have been greatly improved in recent years, the approach to semantics has generally been ad hoc and with little theoretical basis. In this monograph, I present a new theoretically motivated foundation for semantic interpretation (conceptual analysis) by computer, and show how this framework facilitates the resolution of both lexical and syntactic ambiguities. The approach is interdisciplinary, drawing on research in computational linguistics, artificial intelligence, Montague semantics, and cognitive psychology.

(bibtex)

Semantics,
Graeme Hirst, 1987
In: David Eckroth (editor), Encyclopedia of artificial intelligence, New York: Wiley-Interscience / John Wiley, pp. 1024--1029.
(bibtex)

Parsing as an energy minimization problem,
Bart Selman and Graeme Hirst, 1987
In: Lawrence Davis (editor), Genetic algorithms and simulated annealing (Research notes in artificial intelligence), Pitman, pp. 141--154.
Revised version appears in: Geert Adriaens and Udo Hahn (editors). Parallel natural language processing. Norwood, NJ: Ablex Publishing, 1994, pp. 238--254.
(bibtex)

The detection and representation of ambiguities of intension and description,
Brenda Fawcett and Graeme Hirst,
1986
Proceedings of the 24th Annual Meeting, Association for Computational Linguistics, pp. 192--199, June, New York
Abstract

Ambiguities related to intension and their consequent inference failures are a diverse group, both syntactically and semantically. One particular kind of ambiguity that has received little attention so far is whether it is the speaker or the third party to whom a description in an opaque third-party attitude report should be attributed. The different readings lead to different inferences in a system modeling the beliefs of external agents.

We propose that a unified approach to the representation of the alternative readings of intension-related ambiguities can be based on the notion of a descriptor that is evaluated with respect to intensionality, the beliefs of agents, and a time of application. We describe such a representation, built on a standard modal logic, and show how it may be used in conjunction with a knowledge base of background assumptions to license restricted substitution of equals in opaque contexts.


[Download pdf] (bibtex)

Why dictionaries should list case structures,
Graeme Hirst,
1986
Advances in lexicology: Proceedings of the Second Annual Conference of the University of Waterloo Centre for the New Oxford English Dictionary, pp. 147--162, November, University of Waterloo
(bibtex)

A rule-based connectionist parsing system,
Bart Selman and Graeme Hirst,
1985
Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 212--221, August, Irvine
(bibtex)

A semantic process for syntactic disambiguation,
Graeme Hirst,
1984
Proceedings, Fourth National Conference on Artificial Intelligence (AAAI-84), pp. 148--152, August, Austin
Abstract
Structural ambiguity in a sentence cannot be resolved without semantic help. We present a process for structural disambiguation that uses verb expectations, presupposition satisfaction, and plausibility, and an algorithm for making the final choice when these cues give conflicting information. The process, called the Semantic Enquiry Desk, is part of a semantic interpreter that makes sure all its partial results are well-formed semantic objects; it is from this that it gains much of its power.

[Download pdf] (bibtex)

Jumping to conclusions: Psychological reality and unreality in a word disambiguation program,
Graeme Hirst,
1984
Proceedings, Sixth Meeting of the Cognitive Science Society, pp. 179--182, June, Boulder
Abstract

Human language understanding sometimes jumps to conclusions without having all the information it needs or even using all that it has. So, therefore, should any psychologically real language-understanding program. How this can be done in a discrete computational model is not obvious. In this paper, I look at three aspects of the problem:

  • When is information ignored?
  • When is a decision made out of impatience?
  • When is no decision made at all?
I give illustrations of these problems in the domain of word sense disambiguation with the Polaroid Words system.


(bibtex)

A foundation for semantic interpretation,
Graeme Hirst,
1983
Proceedings of the 21st Annual Meeting, Association for Computational Linguistics, pp. 64--73, June, Cambridge, MA USA
Abstract

Traditionally, translation from the parse tree representing a sentence to a semantic representation (such as frames or procedural semantics) has always been the most ad hoc part of natural language understanding (NLU) systems. However, recent advances in linguistics, most notably the system of formal semantics known as Montague semantics, suggest ways of putting NLU semantics onto a cleaner and firmer foundation. We are using a Montague-inspired approach to semantics in an integrated NLU and problem-solving system that we are building. Like Montague's, our semantics are compositional by design and strongly typed, with semantic rules in one-to-one correspondence with the meaning-affecting rules of a Marcus-style parser. We have replaced Montague's semantic objects (functors and truth conditions) with the elements of the frame language Frail, and added a word sense and case slot disambiguation system. The result is a foundation for semantic interpretation that we believe to be superior to previous approaches.


[Download pdf] (bibtex)

An evaluation of evidence for innate sex differences in linguistic ability,
Graeme Hirst,
1982
Journal of psycholinguistic research, 11(2), pp. 95--113, March
[Download pdf] (bibtex)

Anaphora in Natural Language Understanding, Graeme Hirst, 1981
Berlin: Springer-Verlag
Lecture notes in computer science 119
(bibtex)

Discourse-oriented anaphora resolution in natural language understanding: A review,
Graeme Hirst,
1981
American Journal of Computational Linguistics, 7(2), pp. 85--98, April-June
Abstract

Recent research in anaphora resolution has emphasized the effects of discourse structure and cohesion in determining what concepts are available as possible referents, and how discourse cohesion can aid reference resolution. Five approaches, all within this paradigm and yet all distinctly different, are presented, and their strengths and weaknesses evaluated.


[Download pdf] (bibtex)

What should Computer Scientists read?,
Graeme Hirst and Nadia Talent,
1978
Proceedings of the Eighth Australian Computer Conference, pp. 1707--1716, August, Canberra, Australia
(bibtex)

Discipline impact factors: A method for determining core journal lists,
Graeme Hirst,
1978
Journal of the American Society for Information Science, 29(4), pp. 171--172, July
[Download pdf] (bibtex)

Computer Science journals --- An iterated citation analysis,
Graeme Hirst and Nadia Talent,
1977
IEEE Transactions on Professional Communication, PC-20(4), pp. 233--238, December
[Download pdf] (bibtex)
Kenneth Hoetmer (2)

Higher-order types for grammar engineering,
Kenneth Hoetmer,
2005
Master's Thesis. Department of Computer Science, University of Toronto. March.
Abstract
Linguistically precise general-purpose grammars of natural language enable a detailed semantic analysis that is currently unavailable to corpus-based approaches. Unfortunately, the engineering of such grammars is often tedious, time-consuming, error-prone, and inaccessible to new developers. This work seeks to alleviate the engineering problem by discovering, documenting, and exploiting structural patterns of current grammar signatures. More specifically, it mines the English Resource Grammar (ERG) for evidence of intended patterns of type usage and documents those patterns within the framework of Alexandrian design patterns. The structural patterns are then exploited by way of parametric types, special higher-order type constructors and methods for automatic type selection. The applicability of the patterns is illustrated by ICEBERG, a higher-order refactoring of the ERG.

(bibtex)

In search of epistemic primitives in the English Resource Grammar (or why HPSG can't live without higher-order datatypes),
Gerald Penn and Kenneth Hoetmer,
2003
Proceedings of the 10th International Conference on Head-Driven Phrase Structure Grammar, pp. 318--337, July, East Lansing, MI
Abstract

This paper seeks to improve HPSG engineering through the design of more terse, readable, and intuitive type signatures. It argues against the exclusive use of IS-A networks and, with reference to the English Resource Grammar, demonstrates that a collection of higher-order datatypes are already acutely in demand in contemporary HPSG design. Some default specification conventions to assist in maximizing the utility of higher-order type constructors are also discussed.


[Download pdf] (bibtex)
Diane Horton (7)

Language use in context,
Janyce M. Wiebe and Graeme Hirst and Diane Horton,
1996
Communications of the ACM, 39(1), pp. 102--111, January
Abstract

Any text or dialogue establishes a linguistic context within which subsequent utterances must be understood. And beyond the linguistic context is the participatory context. A speaker or writer directs an utterance or text toward a hearer or reader with a particular purpose --- to inform, to amuse, to collaborate in a task, perhaps. The form and content of the utterance are chosen accordingly, and the listener or reader must infer the underlying intent as part of their understanding.

This article explores recent research on language use in context: going beyond sentence boundaries and processing discourse --- treating texts or dialogues as whole units composed of interrelated parts, not merely as sequences of isolated sentences. The article discusses the comprehension and production of language, looking at both texts and dialogues. A text to be processed might be, for example, a newspaper or magazine article being translated into another language or whose content is to be ``understood'' or abstracted in an information storage and retrieval system. A dialogue to be processed might be a conversation, spoken or typed, between a human and a computer, in service of some collaborative task. Many of the problems described here occur in both kinds of discourse.

The underlying goal of the research described in this special section of Communications of the ACM is to move beyond `toy' systems and come to grips with `real language'. While the research described in the other articles in this section focuses on robustly processing massive amounts of text, the work described here focuses on understanding, in computational terms, the complexities and subtleties of language as people really use it.

In an article of this length, we cannot hope to describe all of the recent important work addressing language use in context. For example, we will not cover pronoun resolution, ellipsis, metaphor, or many aspects of belief ascription.


[Download pdf] (bibtex)

Repairing conversational misunderstandings and non-understandings,
Graeme Hirst and Susan McRoy and Peter A. Heeman and Philip Edmonds and Diane Horton,
1994
Speech communication, 15(3--4), pp. 213--229, December
Abstract
Participants in a discourse sometimes fail to understand one another but, when aware of the problem, collaborate upon or negotiate the meaning of a problematic utterance. To address nonunderstanding, we have developed two plan-based models of collaboration in identifying the correct referent of a description: one covers situations where both conversants know of the referent, and the other covers situations such as direction-giving, where the recipient does not. In the models, conversants use the mechanisms of refashioning, suggestion, and elaboration to collaboratively refine a referring expression until it is successful. To address misunderstanding, we have developed a model that combines intentional and social accounts of discourse to support the negotiation of meaning. The approach extends intentional accounts by using expectations deriving from social conventions in order to guide interpretation. Reflecting the inherent symmetry of the negotiation of meaning, all our models can act as both speaker and hearer, and can play both the role of the conversant who is not understood or misunderstood and the role of the conversant who fails to understand.

[Download pdf] (bibtex)

Knowledge about planning: On the meaning and representation of plan decomposition,
Diane Horton and Graeme Hirst,
1993
AAAI Spring Symposium on Reasoning about Mental States: Formal Theories and Applications, pp. 74--78, March, Stanford, CA
Abstract
Plan-related inference has been one of the most-studied problems in Artificial Intelligence. Pollack (1990) has argued that a plan should be seen as a set of mental attitudes towards a structured object. Although the objects of these attitudes have received far more attention to date than the attitudes themselves, little has been said about the exact meaning of one of their key components -- the decomposition relation. In developing a plan representation for our work on plan misinference in dialogue, we have explored two of the possible meanings, their implications, and the relationship between them. These issues underlie the literature, and in this paper, we step back and discuss them explicitly.

[Download pdf] (bibtex)

Review of Context and Presupposition by Rob A. van der Sandt,
Diane Horton,
1991
Linguistics, 29(4), pp. 730--738, November
[Download ps] (bibtex)

Discrepancies in discourse models and miscommunication in conversation,
Diane Horton and Graeme Hirst,
1991
AAAI Fall Symposium on Discourse Structure in Natural Language Understanding and Generation, pp. 31--32, November, Pacific Grove, CA
(bibtex)

Presuppositions as beliefs,
Diane Horton and Graeme Hirst,
1988
Proceedings, 12th International conference on computational linguistics (COLING-88), pp. 255--260, August, Budapest
Abstract

Most theories of presupposition implicitly assume that presuppositions are facts, and that all agents involved in a discourse share belief in the presuppositions that it generates. These unrealistic assumptions can be eliminated if each presupposition is treated as the belief of an agent. However, it is not enough to consider only the beliefs of the speaker; we show that the beliefs of other agents are often involved. We describe a new model, including an improved definition of presupposition, that treats presuppositions as beliefs and considers the beliefs of all agents involved in the discourse. We show that treating presuppositions as beliefs makes it possible to explain phenomena that cannot be explained otherwise.


[Download pdf] (bibtex)

Incorporating agents' beliefs in a model of presupposition,
Diane Horton,
1986
Master's Thesis. Department of Computer Science, University of Toronto. October. Published as technical report CSRI-201.
Abstract

The full communicative content of an utterance consists of its direct meaning as well as a variety of indirect information which can be inferred from the utterance. Presupposition is one category of such information.

Many theories of presupposition have been postulated. Most implicitly assume that presuppositions are facts, and that all agents involved in discourse share knowledge of them. We argue that these are unrealistic assumptions and propose a new view which considers presuppositions to be beliefs associated with particular agents. We then develop a definition of presupposition which embodies this view. We conclude that a model of presupposition which incorporates agents' beliefs, in addition to being more correct, is able to account for presuppositional phenomena which could not be accounted for otherwise.


[Download pdf] (bibtex)
Diana Inkpen (9)

Building and using a lexical knowledge-base of near-synonym differences,
Diana Inkpen and Graeme Hirst,
2006
Computational Linguistics, 32(2), pp. 223--262, June
Abstract
Choosing the wrong word in a machine translation or natural language generation system can convey unwanted connotations, implications, or attitudes. The choice between near-synonyms such as error, mistake, slip, and blunder --- words that share the same core meaning, but differ in their nuances --- can be made only if knowledge about their differences is available.

We present a method to automatically acquire a new type of lexical resource: a knowledge-base of near-synonym differences. We develop an unsupervised decision-list algorithm that learns extraction patterns from a special dictionary of synonym differences. The patterns are then used to extract knowledge from the text of the dictionary.

The initial knowledge-base is later enriched with information from other machine-readable dictionaries. Information about the collocational behavior of the near-synonyms is acquired from free text. The knowledge-base is used by Xenon, a natural language generation system that shows how the new lexical resource can be used to choose the best near-synonym in specific situations.


[Download pdf] (bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst, 2005
In: James G. Shanahan and Yan Qu and Janyce Wiebe (editors), Computing attitude and affect in text, Dordrecht, The Netherlands, Springer.
Supersedes March 2004 AAAI Symposium version
Abstract
We present experiments on modifying the semantic orientation of the near-synonyms in a text. We analyze a text into an interlingual representation and a set of attitudinal nuances, with particular focus on its near-synonyms. Then we use our text generator to produce a text with the same meaning but changed semantic orientation (more positive or more negative) by replacing, wherever possible, words with near-synonyms that differ in their expressed attitude.

[Download pdf] (bibtex)

Generating more-positive and more-negative text,
Diana Inkpen and Ol'ga Feiguina and Graeme Hirst,
2004
AAAI Spring Symposium on Exploring Attitude and Affect in Text, March, Stanford University
Published as AAAI technical report SS-04-07. Superseded by 2005 book version
(bibtex)

Building a Lexical Knowledge-Base of Near-Synonym Differences,
Diana Inkpen,
2003
Ph.D. Thesis. Department of Computer Science, University of Toronto. October.
Abstract

Current natural language generation or machine translation systems cannot distinguish among near-synonyms --- words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about differences between near-synonyms in existing computational lexical resources.

The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences (LKB of NS) from multiple sources, and to show how it can be used in a practical natural language processing system.

I designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision-list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step. The extraction of distinctions between near-synonyms is entirely automatic. The main types of distinctions are: stylistic (for example, inebriated is more formal than drunk), attitudinal (for example, skinny is more pejorative than slim), and denotational (for example, blunder implies accident and ignorance, while error does not).

I enriched the initial LKB of NS with information extracted from other sources. First, information about the senses of the near-synonyms was added (WordNet senses). The other near-synonyms in the same dictionary entry and the text of the entry provide a strong context for disambiguation. Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Collocations between a word and the near-synonyms in a dictionary entry were classified into: preferred collocations, less-preferred collocations and anti-collocations. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary). These distinctions were merged with the initial LKB of NS, and inconsistencies were resolved.

The generic LKB of NS needs to be customized in order to be used in a natural language processing system. The parts that need customization are the core denotations and the strings that describe peripheral concepts in the denotational distinctions. To show how the LKB of NS can be used in practice, I present Xenon, a natural language generation system that chooses the near-synonym that best matches a set of input preferences. I implemented Xenon by adding a near-synonym choice module and a near-synonym collocation module to an existing general-purpose surface realizer.


[Download pdf] (bibtex)

Near-synonym choice in natural language generation,
Diana Inkpen and Graeme Hirst,
2003
International Conference RANLP-2003 (Recent Advances in Natural Language Processing), pp. 204--211, September, Borovets, Bulgaria
Reprinted, slightly abridged, in Recent Advances in Natural Language Processing III, John Benjamins Publishing Company, 2004 (Selected papers from RANLP 2003 edited by Nicolas Nicolov, Kalina Bontcheva, Galia Angelova, and Ruslan Mitkov)
Abstract
We present Xenon, a natural language generation system capable of distinguishing between near-synonyms. It integrates a near-synonym choice module with an existing sentence realization module. We evaluate Xenon using English and French near-synonyms.

[Download pdf] (bibtex)

Automatic sense disambiguation of the near-synonyms in a dictionary entry,
Diana Inkpen and Graeme Hirst,
2003
Proceedings, 4th Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), pp. 258--267, February, Mexico City, Mexico
Abstract
We present an automatic method to disambiguate the senses of the near-synonyms in the entries of a dictionary of synonyms. We combine different indicators that take advantage of the structure of the entries and of lexical knowledge in WordNet. We also present the results of human judges doing the disambiguation for 50 randomly selected entries. This small amount of annotated data is used to tune and evaluate our system.

[Download pdf] (bibtex)

Acquiring collocations for lexical choice between near-synonyms,
Diana Inkpen and Graeme Hirst,
2002
SIGLEX Workshop on Unsupervised Lexical Acquisition, 40th meeting of the Association for Computational Linguistics, June, Philadelphia, PA
Abstract
We extend a lexical knowledge-base of near-synonym differences with knowledge about their collocational behaviour. This type of knowledge is useful in the process of lexical choice between near-synonyms. We acquire collocations for the near-synonyms of interest from a corpus (only collocations with the appropriate sense and part-of-speech). For each word that collocates with a near-synonym we use a differential test to learn whether the word forms a less-preferred collocation or an anti-collocation with other near-synonyms in the same cluster. For this task we use a much larger corpus (the Web). We also look at associations (longer-distance co-occurrences) as a possible source of learning more about nuances that the near-synonyms may carry.

[Download pdf] (bibtex)

Building a lexical knowledge-base of near-synonym differences,
Diana Inkpen and Graeme Hirst,
2001
Workshop on WordNet and Other Lexical Resources, Second meeting of the North American Chapter of the Association for Computational Linguistics, June, Pittsburgh, PA
Abstract
In machine translation and natural language generation, making the wrong word choice from a set of near-synonyms can be imprecise or awkward, or convey unwanted implications. Using Edmonds's model of lexical knowledge to represent clusters of near-synonyms, our goal is to automatically derive a lexical knowledge-base from the Choose the Right Word dictionary of near-synonym discrimination. We do this by automatically classifying sentences in this dictionary according to the classes of distinctions they express. We use a decision-list learning algorithm to learn words and expressions that characterize the classes DENOTATIONAL DISTINCTIONS and ATTITUDE-STYLE DISTINCTIONS. These results are then used by an extraction module to actually extract knowledge from each sentence. We also integrate a module to resolve anaphors and word-to-word comparisons. We evaluate the results of our algorithm for several randomly selected clusters against a manually built standard solution, and compare them with the results of a baseline algorithm. Improvements on previous results are due in part to the addition of a coreference module.

[Download pdf] (bibtex)

Experiments on extracting knowledge from a machine-readable dictionary of synonym differences,
Diana Inkpen and Graeme Hirst, 2001
In: Gelbukh, Alexander (editor), Computational Linguistics and Intelligent Text Processing (Proceedings, Second Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, February 2001), Berlin, Springer-Verlag, pp. 264--278.
Published as Lecture Notes in Computer Science, vol. 2004
Abstract
In machine translation and natural language generation, making the wrong word choice from a set of near-synonyms can be imprecise or awkward, or convey unwanted implications. Using Edmonds's model of lexical knowledge to represent clusters of near-synonyms, our goal is to automatically derive a lexical knowledge-base from the Choose the Right Word dictionary of near-synonym discrimination. We do this by automatically classifying sentences in this dictionary according to the classes of distinctions they express. We use a decision-list learning algorithm to learn words and expressions that characterize the classes DENOTATIONAL DISTINCTIONS and ATTITUDE-STYLE DISTINCTIONS. These results are then used by an extraction module to actually extract knowledge from each sentence. We also integrate a module to resolve anaphors and word-to-word comparisons. We evaluate the results of our algorithm for several randomly selected clusters against a manually built standard solution, and compare them with the results of a baseline algorithm.

[Download pdf] (bibtex)
Nathalie Japkowicz (3)

A system for translating locative prepositions from English into French,
Nathalie Japkowicz and Janyce M. Wiebe,
1991
Proceedings, 29th annual meeting of the Association for Computational Linguistics, pp. 153--160, June, Berkeley
Abstract
Machine translation of locative prepositions is not straightforward, even between closely related languages. This paper discusses a system of translation of locative prepositions between English and French. The system is based on the premises that English and French do not always conceptualize objects in the same way, and that this accounts for the major differences in the ways that locative prepositions are used in these languages. This paper introduces knowledge representations of conceptualizations of objects, and a method for translating prepositions based on these conceptual representations.

[Download pdf] (bibtex)

Using conceptual information to translate locative prepositions from English into French,
Nathalie Japkowicz and Janyce Wiebe,
1990
Current trends in SNePS---Proceedings of the 1990 workshop (Syed Ali, Hans Chalupsky, and Deepak Kumar, editors)
(bibtex)

The translation of basic topological prepositions from English into French,
Nathalie Japkowicz,
1990
Master's Thesis. Department of Computer Science, University of Toronto. October.
Abstract

Machine translation of locative prepositions is difficult, even between closely related languages such as English and French. We investigate translating the three prepositions in, on, and at into the French prepositions dans, sur, and à. Often, in corresponds to dans, on to sur, and at to à. This correspondence, however, is not perfect: in a number of cases, the uses of these prepositions were observed to differ from one language to the other. These cases are not simply exceptional. Following recent work in cognitive science, we use the notion of conceptualization to account for this problem. A conceptualization (or metaphor) is a mental representation of an object or an idea. We believe that the differences in the uses of locative prepositions are caused by differences in the way objects are conceptualized in English and French.

We have implemented a system that translates sentences involving one of the locative prepositions we studied from English to French, based on this idea. In addition, our system is able to detect ambiguities, as well as errors and abnormalities in input sentences.


(bibtex)
Eric Joanis (5)

A general feature space for automatic verb classification,
Eric Joanis and Suzanne Stevenson and David James,
2008
Natural Language Engineering, 14(3), pp. 337--367
Also published by Cambridge Journals Online on December 19, 2006
Abstract
Lexical semantic classes of verbs play an important role in structuring complex predicate information in a lexicon, thereby avoiding redundancy and enabling generalizations across semantically similar verbs with respect to their usage. Such classes, however, require many person-years of expert effort to create manually, and methods are needed for automatically assigning verbs to appropriate classes. In this work, we develop and evaluate a feature space to support the automatic assignment of verbs into a well-known lexical semantic classification that is frequently used in natural language processing. The feature space is general – applicable to any class distinctions within the target classification; broad – tapping into a variety of semantic features of the classes; and inexpensive – requiring no more than a POS tagger and chunker. We perform experiments using support vector machines (SVMs) with the proposed feature space demonstrating a reduction in error rate ranging from 48% to 88% over a chance baseline accuracy, across classification tasks of varying difficulty. In particular, we attain performance comparable to or better than that of feature sets manually selected for the particular tasks. Our results show that the approach is generally applicable, and reduces the need for resource-intensive linguistic analysis for each new classification task. We also perform a wide range of experiments to determine the most informative features in the feature space, finding that simple, easily extractable features suffice for good verb classification performance.

[Download pdf] (bibtex)
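The error-rate reductions quoted in this abstract are measured against a chance baseline. A minimal Python sketch of that arithmetic follows. It assumes the usual definition (the fraction of the baseline's errors that the classifier removes) and, for the chance baseline, equally sized classes; the paper's exact baseline may be defined differently, and the function names are illustrative, not taken from the paper.

```python
def error_rate_reduction(accuracy: float, baseline_accuracy: float) -> float:
    """Fraction of the baseline's errors that a classifier removes.

    Both arguments are accuracies in [0, 1]; the baseline must be below 1.
    """
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - accuracy
    return (baseline_error - model_error) / baseline_error


def chance_baseline(num_classes: int) -> float:
    """Accuracy of uniform guessing, assuming equally sized classes."""
    return 1.0 / num_classes


# Two-way task: chance is 50%, so 75% accuracy removes half of the errors.
print(error_rate_reduction(0.75, chance_baseline(2)))
```

Expressing the gain relative to the baseline's error, rather than as raw accuracy, makes results comparable across tasks with different numbers of classes.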

A general feature space for automatic verb classification,
Eric Joanis and Suzanne Stevenson,
2003
Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-03), April, Budapest, Hungary
Abstract
We develop a general feature space for automatic classification of verbs into lexical semantic classes. Previous work was limited in scope by the need for manual selection of discriminating features, through a linguistic analysis of the target verb classes (Merlo and Stevenson, 2001). We instead analyze the classification structure at a higher level, using the possible defining characteristics of classes as the basis for our feature space. The general feature space achieves reductions in error rates of 42--69%, on a wider range of classes than investigated previously, with comparable performance to feature sets manually selected for the particular classification tasks. Our results show that the approach is generally applicable, and avoids the need for resource-intensive linguistic analysis for each new task.

[Download pdf] (bibtex)

Semi-supervised Verb Class Discovery Using Noisy Features,
Suzanne Stevenson and Eric Joanis,
2003
In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-2003), June, Edmonton, Canada
Abstract

We cluster verbs into lexical semantic classes, using a general set of noisy features that capture syntactic and semantic properties of the verbs. The feature set was previously shown to work well in a supervised learning setting, using known English verb classes. In moving to a scenario of verb class discovery, using clustering, we face the problem of having a large number of irrelevant features for a particular clustering task. We investigate various approaches to feature selection, using both unsupervised and semi-supervised methods, comparing the results to subsets of features manually chosen according to linguistic properties. We find that the unsupervised method we tried cannot be consistently applied to our data. However, the semi-supervised approach (using a seed set of sample verbs) overall outperforms not only the full set of features, but the hand-selected features as well.


[Download pdf] (bibtex)

Automatic Verb Classification Using a General Feature Space,
Eric Joanis,
2002
Master's Thesis. Department of Computer Science, University of Toronto. October.
Abstract
We develop a general feature space that can be used for the semantic classification of English verbs. We design a technique to extract these features from a large corpus of English, while trying to maintain portability to other languages---the only language-specific tools we use to extract our core features are a part-of-speech tagger and a partial parser. We show that our general feature space reduces the chance error rate by 40% or more in ten experiments involving from two to thirteen verb classes. We also show that it usually performs as well as features that are selected using specific linguistic expertise, and that it is therefore unnecessary to manually do linguistic analysis for each class distinction of interest. Finally, we consider the use of an automatic feature selection technique, stepwise feature selection, and show that it does not work well with our feature space.

[Download pdf] (bibtex)

Review of the Literature on Aggregation in Natural Language Generation,
Eric Joanis,
1999
Department of Computer Science, University of Toronto, Technical Report Number CSRG-398, September
Abstract
In this paper, we review the significant body of research on aggregation, especially in the past decade. The linguistic phenomena labelled as aggregation are distinguished and classified and their use in Natural Language Generation is analyzed. Several systems using aggregation are described and used as examples of what can be done, and their architectures are compared. Finally, a chronology of significant contributions to aggregation is provided.

[Download pdf] (bibtex)
Siavash Kazemian (2)

A Critical Assessment of Spoken Utterance Retrieval through Approximate Lattice Representations,
Siavash Kazemian,
2009
Master's Thesis. Department of Computer Science, University of Toronto. January.
Abstract
This paper compares the performance of Position-specific Posterior Lattices (PSPL) and Confusion Networks (CN) applied to Spoken Utterance Retrieval, and tests these recent proposals against several baselines, namely 1-best transcription, the whole lattice, and the set-of-words baseline. The set-of-words baseline is used for the first time in the context of Spoken Utterance Retrieval. PSPL and CN provide compact representations that generalize the original segment lattices and provide greater recall robustness, but have yet to be evaluated against each other in multiple WER conditions for Spoken Utterance Retrieval. Our comparisons suggest that while PSPL and Confusion Networks have comparable recall, the former is slightly more precise, although its merit appears to be coupled to the assumptions of low-frequency search queries and low-WER environments. While in low-WER environments all methods tested have comparable performance, both PSPL and CN significantly outperform the 1-best transcription in high-WER environments but perform similarly to the whole lattice and set-of-words baselines.

[Download pdf] (bibtex)

A Critical Assessment of Spoken Utterance Retrieval through Approximate Lattice Representations,
Siavash Kazemian and Frank Rudzicz and Gerald Penn and Cosmin Munteanu,
2008
Proceedings of the ACM International Conference on Multimedia Information Retrieval (MIR2008), October, Vancouver, Canada
(bibtex)
Grzegorz Kondrak (6)

Phonetic alignment and similarity,
Grzegorz Kondrak,
2003
Computers and the Humanities, 37(3), pp. 273--291, August
Abstract
The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.

[Download pdf] (bibtex)

Determining recurrent sound correspondences by inducing translation models,
Grzegorz Kondrak,
2002
Proceedings, 19th International Conference on Computational Linguistics (COLING-2002), August, Taipei, Taiwan
Abstract
I present a novel approach to the determination of recurrent sound correspondences in bilingual wordlists. The idea is to relate correspondences between sounds in wordlists to translational equivalences between words in bitexts (bilingual corpora). My method induces models of sound correspondence that are similar to models developed for statistical machine translation. The experiments show that the method is able to determine recurrent sound correspondences in bilingual wordlists in which less than 30% of the pairs are cognates. By employing the discovered correspondences, the method can identify cognates with higher accuracy than the previously reported algorithms.

[Download pdf] (bibtex)

Algorithms for Language Reconstruction,
Grzegorz Kondrak,
2002
Ph.D. Thesis. Department of Computer Science, University of Toronto. July.
Abstract

Genetically related languages originate from a common proto-language. In the absence of historical records, proto-languages have to be reconstructed from surviving cognates, that is, words that existed in the proto-language and are still present in some form in its descendants. Language reconstruction methods have so far been largely based on informal and intuitive criteria. In this thesis, I present techniques and algorithms for performing various stages of the reconstruction process automatically.

The thesis is divided into three main parts that correspond to the principal steps of language reconstruction. The first part presents a new algorithm for the alignment of cognates, which is sufficiently general to align any two phonetic strings that exhibit some affinity. The second part introduces a method of identifying cognates directly from the vocabularies of related languages on the basis of phonetic and semantic similarity. The third part describes an approach to the determination of recurrent sound correspondences in bilingual wordlists by inducing models similar to those developed for statistical machine translation.

The proposed solutions are firmly grounded in computer science and incorporate recent advances in computational linguistics, articulatory phonetics, and bioinformatics. The applications of the new techniques are not limited to diachronic phonology, but extend to other areas of computational linguistics, such as machine translation.


[Download pdf] (bibtex)

Identifying cognates by phonetic and semantic similarity,
Grzegorz Kondrak,
2001
Proceedings, Second meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), pp. 103--110, June, Pittsburgh, PA
Abstract
I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than ``orthographic'' measures such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% of cognates at 50% precision.

[Download pdf] (bibtex)
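The two orthographic baselines named in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: it assumes LCSR is the length of the longest common subsequence divided by the length of the longer word, and that Dice's coefficient is computed over character bigrams counted as multisets. The function names are hypothetical.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    # prev[j] holds the LCS length of the prefix of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def dice_bigrams(a: str, b: str) -> float:
    """Dice's coefficient over character bigrams, counted as multisets."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        return 0.0
    shared = sum((bigrams_a & bigrams_b).values())  # multiset intersection
    return 2 * shared / total


print(lcsr("night", "nacht"), dice_bigrams("night", "nacht"))
```

For example, lcsr("night", "nacht") is 0.6 (the common subsequence is n-h-t) and dice_bigrams("night", "nacht") is 0.25 (only the bigram "ht" is shared). Both measures compare letters as opaque symbols, which is the limitation that a phonetic, feature-based measure addresses.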

A new algorithm for the alignment of phonetic sequences,
Grzegorz Kondrak,
2000
Proceedings, First meeting of the North American Chapter of the Association for Computational Linguistics (NAACL-2000), pp. 288--295, May, Seattle, WA
Abstract
Alignment of phonetic sequences is a necessary step in many applications in computational phonology. After discussing various approaches to phonetic alignment, I present a new algorithm that combines a number of techniques developed for sequence comparison with a scoring scheme for computing phonetic similarity on the basis of multivalued features. The algorithm performs better on cognate alignment, in terms of accuracy and efficiency, than other algorithms reported in the literature.

[Download pdf] (bibtex)

Alignment of Phonetic Sequences,
Grzegorz Kondrak,
1999
Department of Computer Science, University of Toronto, Technical Report Number CSRG-402, December
Abstract
Alignment of phonetic sequences is a necessary step in many applications in computational phonology. In this paper, I critically evaluate several approaches to phonetic alignment that have been reported within the last few years. I then present a new algorithm that is geared towards the alignment of cognates. I show how the basic dynamic programming algorithm for sequence comparison can be modified to deal with a range of phonological phenomena. For computing phonetic similarity, I propose a scoring scheme that is based on multivalued features. Finally, I provide a complete test set that demonstrates that the new algorithm performs better on cognate alignment, in terms of accuracy and efficiency, than other algorithms reported in the literature.

[Download ps] (bibtex)
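The basic dynamic programming algorithm for sequence comparison mentioned in this abstract can be sketched as a global alignment that maximizes total similarity. This Python sketch is an illustration under stated assumptions, not the report's algorithm: toy_sim is a hypothetical placeholder rather than the proposed multivalued-feature scoring scheme, and the report's modifications for phonological phenomena are omitted.

```python
from typing import Callable, List, Tuple

VOWELS = set("aeiou")


def toy_sim(x: str, y: str) -> float:
    """Hypothetical placeholder score, NOT the report's feature-based scheme."""
    if x == y:
        return 2.0
    if x in VOWELS and y in VOWELS:
        return 1.0
    return -1.0


def align(a: str, b: str, sim: Callable[[str, str], float],
          skip: float) -> Tuple[float, List[Tuple[str, str]]]:
    """Global alignment maximizing total similarity; '-' marks a gap."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + skip
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + skip
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # pair two segments
                score[i - 1][j] + skip,  # a[i-1] against a gap
                score[i][j - 1] + skip,  # b[j-1] against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + skip:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


print(align("night", "nacht", toy_sim, -1.0))
```

With these toy scores, align("night", "nacht", toy_sim, -1.0) returns a score of 6.0 and pairs the two words segment by segment with no gaps. Swapping in a different similarity function changes the alignments without touching the recurrence, which is why the scoring scheme can be designed separately from the alignment procedure.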
Yves Lespérance (2)

A formal theory of indexical knowledge and action,
Yves Lespérance,
1991
Ph.D. Thesis. Department of Computer Science, University of Toronto. January. Published as technical report CSRI-248.
Abstract

Agents act upon and perceive the world from a particular perspective. It is important to recognize this relativity to perspective if one is not to be overly demanding in specifying what they need to know in order to be able to achieve goals through action. An agent may not know where he is, what time it is, which objects are around him, what the absolute positions of these objects are, and even who he is, and still be able to achieve his goals. This is because the knowledge required is indexical knowledge, knowledge about how one is related to things in one's environment or to events in one's history.

This thesis develops a formal theory of knowledge and action that handles the distinction between indexical and objective knowledge and allows a proper specification of the knowledge prerequisites and effects of action. The theory is embodied in a logic. The semantics of knowledge proposed is a natural extension of the standard possible-world semantic scheme. The notion of ``ability to achieve a goal by doing an action'' is formalized within the logic; the formalization does not require an agent to know in an absolute sense who he is or what time it is.

We then formalize various domains within the logical system proposed. These examples show how actions can be specified so as to avoid making excessive requirements upon the knowledge of agents, and how such specifications can be used to prove that an agent is able to achieve a goal by doing an action if he knows certain facts. We direct a significant part of our formalization efforts at a robotics domain since this kind of application provides the most intuitive examples of situations where indexical knowledge is sufficient for ability. But we also formalize other domains involving temporal knowledge, knowledge of the phone system, and knowledge of one's position in a data structure. The examples involved show that the notion of indexical knowledge is more abstract than one might first imagine. On the basis of evidence provided by the temporal examples, we provide an argument to the effect that the distinction between indexical and objective knowledge accommodated by our framework is of practical interest and that it cannot be handled within previous theories.


(bibtex)

Toward a computational interpretation of situation semantics,
Yves Lespérance,
1986
Computational Intelligence, 2(1), pp. 9--27, February
Abstract

Situation semantics proposes novel and attractive treatments for several problem areas of natural language semantics, such as efficiency (context sensitivity) and propositional attitude reports. Its focus on the information carried by utterances makes the approach very promising for accounting for pragmatic phenomena. However, situation semantics seems to oppose several basic assumptions underlying current approaches to natural language processing and the design of intelligent systems in general. It claims that efficiency undermines the standard notions of logical form, entailment, and proof theory, and objects to the view that mental processes necessarily involve internal representations. The paper attempts to clarify these issues and discusses the impact of situation semantics' criticisms on natural language processing, knowledge representation, and reasoning. I claim that the representational approach is the only currently practical one for the design of large intelligent systems, but argue that the representations used should be efficient in order to account for the system's embedding in its environment. The paper concludes by stating some constraints that a computational interpretation of situation semantics should obey and discussing remaining problems.


(bibtex)
Jianhua Li (3)

Modelling Semantic Knowledge for a Word Completion Task,
Jianhua Li,
2006
Master's Thesis. Department of Computer Science, University of Toronto. October.
Abstract
To assist people with physical disabilities in text entry, we have studied the contribution of semantic knowledge to the word completion task. We first constructed a semantic knowledge base (SKB) that stores the semantic association between word pairs. To create the SKB, a novel Lesk-like relatedness filter is employed. On the basis of the SKB, we have proposed an integrated semantics-based word completion model. The model combines the semantic knowledge in the SKB with n-gram probabilities. To deal with potential problems in the model, we propose the strategy of using salient terms and an ad hoc algorithm for OOV recognition. We tested our model, compared it with a model using n-gram probabilities of words and parts of speech alone, and found that our model achieved a significant performance improvement. In addition, experiments on the algorithm for OOV recognition show a notable enhancement of the system performance.

[Download pdf] (bibtex)

Semantic knowledge in a word completion task,
Jianhua Li and Graeme Hirst,
2005
Proceedings, 7th International ACM SIGACCESS Conference on Computers and Accessibility, October, Baltimore, MD
Abstract
We propose a combinatory approach to interactive word-completion for users with linguistic disabilities in which semantic knowledge combines with n-gram probabilities to predict semantically more-appropriate words than n-gram methods alone. The semantic knowledge is used to measure the semantic association of completion candidates with the context. Experimental results show a performance improvement when using the combinatory model for the completion of nouns.

[Download pdf] (bibtex)

Analysis of polarity information in medical text,
Yun Niu and Xiaodan Zhu and Jianhua Li and Graeme Hirst,
2005
Proceedings of the American Medical Informatics Association 2005 Annual Symposium, pp. 570--574, October, Washington, D.C.
Abstract
Knowing the polarity of clinical outcomes is important in answering questions posed by clinicians in patient treatment. We treat analysis of this information as a classification problem. Natural language processing and machine learning techniques are applied to detect four possibilities in medical text: no outcome, positive outcome, negative outcome, and neutral outcome. A supervised learning method is used to perform the classification at the sentence level. Five feature sets are constructed: UNIGRAMS, BIGRAMS, CHANGE PHRASES, NEGATIONS, and CATEGORIES. The performance of different combinations of feature sets is compared. The results show that generalization using the category information in the domain knowledge base Unified Medical Language System is effective in the task. The effect of context information is significant. Combining linguistic features and domain knowledge leads to the highest accuracy.

[Download pdf] (bibtex)
Robert Lizée (1)

A study of natural language quantification and anaphora through families of sets and binary relations,
Robert Lizée,
1995
Master's Thesis. Department of Computer Science, University of Toronto. February. Published as technical report CSRI-314.
Abstract
In this thesis, we study the use of families of sets and binary relations to represent natural language quantification and anaphora. We focus on quantification in a dependency-free context (no variables) using the idea of Barwise and Cooper (1981) of expressing quantifiers as families of sets. A language where sentences are expressed as subset relations between generalized quantifiers is shown equivalent to the variable-free Montagovian syntax of McAllester and Givan (1992), relating their notion of obvious inference to the transitive closure. To account for anaphora, we propose to use an extended algebra of binary relations (Suppes, 1976; Bottner, 1992), in practice restricting the number of variables to one. We contribute to the formalism with a family of composition operators, allowing us to account for sentences with determiners other than `every', `some', or `no'. Moreover, we show how to handle some cases of long-distance anaphora, some cases involving the word `other', and some donkey-sentence anaphora.

[Download ps] (bibtex)
Dan Lyons (1)

A frame-based semantics for focusing subjuncts,
Dan Lyons and Graeme Hirst,
1990
Proceedings of the 28th Annual Meeting, Association for Computational Linguistics, pp. 54--61, June, Pittsburgh, PA
Abstract
A compositional semantics for focusing subjuncts---words such as only, even, and also---is developed from Rooth's theory of association with focus. By adapting the theory so that it can be expressed in terms of a frame-based semantic formalism, a semantics that is more computationally practical is arrived at. This semantics captures pragmatic subtleties by incorporating a two-part representation, and recognizes the contribution of intonation to meaning.

[Download pdf] (bibtex)
Meghana Marathe (1)

Lexical Chains using Distributional Measures of Concept Distance,
Meghana Marathe,
2009
Master's Thesis. Department of Computer Science, University of Toronto.
Abstract

In practice, lexical chains are typically built using term reiteration or resource-based measures of semantic distance. The former approach misses out on a significant portion of the inherent semantic information in a text, while the latter suffers from the limitations of the linguistic resource it depends upon. In this paper, chains are constructed using the framework of distributional measures of concept distance, which combines the advantages of resource-based and distributional measures of semantic distance. These chains were evaluated on the task of text segmentation and in a study that asked linguistically-trained judges to rate them qualitatively. While performing as well as or better than state-of-the-art methods in the former task, they were rated significantly lower for coherence than chains built using Lin's WordNet-based measure.


[Download pdf] (bibtex)
Daniel Marcu (18)

The Theory and Practice of Discourse Parsing and Summarization,
Daniel Marcu,
2000
The MIT Press, November. ISBN 0-262-13372-5
Abstract

Until now, most discourse researchers have assumed that full semantic understanding is necessary to derive the discourse structure of texts. This book documents the first serious attempt to construct automatically and use nonsemantic computational structures for text summarization. Daniel Marcu develops a semantics-free theoretical framework that is both general enough to be applicable to naturally occurring texts and concise enough to facilitate an algorithmic approach to discourse analysis. He presents and evaluates two discourse parsing methods: one uses manually written rules that reflect common patterns of usage of cue phrases such as ``however'' and ``in addition to''; the other uses rules that are learned automatically from a corpus of discourse structures. By means of a psycholinguistic experiment, Marcu demonstrates how a discourse-based summarizer identifies the most important parts of texts at levels of performance that are close to those of humans.

Marcu also discusses how the automatic derivation of discourse structures may be used to improve the performance of current natural language generation, machine translation, summarization, question answering, and information retrieval systems.


(bibtex)

Perlocutions: The Achilles' heel of speech act theory,
Daniel Marcu,
2000
Journal of Pragmatics, 32(12), pp. 1719--1741, November
An earlier version of this paper was published in Working Notes AAAI Fall Symposium on Communicative Action in Humans and Machines, MIT, Cambridge, MA, November 1997, pp. 51--58
Abstract
This paper criticizes previous approaches to perlocutions and previous formalizations of perlocutionary effects of communicative actions by showing that some of their fundamental assumptions are inconsistent with data from communication studies, psychology, and social studies of persuasion. Consequently, it argues for a data-driven approach to pragmatics, one that permits pragmatic theories to be falsified and improved. The paper also offers an introductory account of a formal theory that can explain the difference in persuasiveness between messages that are characterized by the same set of locutionary and illocutionary acts; and the difference in persuasiveness of the same message with respect to different hearers. The formal theory is developed using the language of situation calculus.

[Download ps] (bibtex)

The rhetorical parsing of unrestricted texts: A surface-based approach,
Daniel Marcu,
2000
Computational Linguistics, 26(3), pp. 395--448, September
Abstract

Coherent texts are not just simple sequences of clauses and sentences, but rather complex artifacts that have highly elaborate rhetorical structure. This paper explores the extent to which well-formed rhetorical structures can be automatically derived by means of surface-form-based algorithms. These algorithms identify discourse usages of cue phrases and break sentences into clauses, hypothesize rhetorical relations that hold among textual units, and produce valid rhetorical structure trees for unrestricted natural language texts. The algorithms are empirically grounded in a corpus analysis of cue phrases and rely on a first-order formalization of rhetorical structure trees.

The algorithms are evaluated both intrinsically and extrinsically. The intrinsic evaluation assesses the resemblance between automatically and manually constructed rhetorical structure trees. The extrinsic evaluation shows that automatically derived rhetorical structures can be successfully exploited in the context of text summarization.


[Download pdf] (bibtex)

Extending a formal and computational model of Rhetorical Structure Theory with intentional structures à la Grosz and Sidner,
Daniel Marcu,
2000
Proceedings, 18th International Conference on Computational Linguistics (COLING-2000), pp. 523--529, August, Saarbrücken, Germany
Abstract
In the last decade, members of the computational linguistics community have adopted a perspective on discourse based primarily on either Rhetorical Structure Theory or Grosz and Sidner's Theory. However, only recently have researchers started to investigate the relationship between the two approaches. In this paper, we use Moser and Moore's (1996) work as a departure point for extending Marcu's formalization of RST (1996). The result is a first-order axiomatization of the mathematical properties of text structures and of the relationship between the structure of text and intentions. The axiomatization enables one to use intentions for reducing the ambiguity of discourse and the structure of discourse for deriving intentional inferences.

[Download pdf] (bibtex)

Discourse trees are good indicators of importance in text,
Daniel Marcu,
1999
Advances in Automatic Text Summarization (Inderjeet Mani and Mark T. Maybury ed.), pp. 123--136, The MIT Press
Abstract

Researchers in computational linguistics have long speculated that the nuclei of the rhetorical structure tree of a text form an adequate ``summary'' of the text for which that tree was built. However, to my knowledge, there has been no experiment to confirm how valid this speculation really is.

In this paper, I describe a psycholinguistic experiment that shows that the concepts of discourse structure and nuclearity can be used effectively in text summarization. More precisely, I show that there is a strong correlation between the nuclei of the discourse structure of a text and what readers perceive to be the most important units in that text. In addition, I propose and evaluate the quality of an automatic discourse-based summarization system that implements the methods that were validated by the psycholinguistic experiment. The evaluation indicates that although the system does not yet match the results that would be obtained if discourse trees had been built manually, it still significantly outperforms both a baseline algorithm and Microsoft's Office97 summarizer.


[Download ps] (bibtex)

A surface-based approach to identifying discourse markers and elementary textual units in unrestricted texts,
Daniel Marcu,
1998
Proceedings, COLING/ACL'98 Workshop on Discourse Relations and Discourse Markers, pp. 1--7, August, Montreal, QC
Abstract
I present a surface-based algorithm that employs knowledge of cue phrase usages in order to determine automatically clause boundaries and discourse markers in unrestricted natural language texts. The knowledge was derived from a comprehensive corpus analysis.

[Download ps] (bibtex)

To build text summaries of high quality, nuclearity is not sufficient,
Daniel Marcu,
1998
Working notes, AAAI Spring Symposium on Intelligent Text Summarization, pp. 1--8, March, Stanford
Abstract
Researchers in discourse have long hypothesized that the nuclei of a rhetorical structure tree provide a good summary of the text for which that tree was built. In this paper, I discuss a psycholinguistic experiment that validates this hypothesis, but that also shows that the distinction between nuclei and satellites is not sufficient if we want to build summaries of very high quality. I empirically compare various techniques for mapping discourse trees into partial orders that reflect the importance of the elementary textual units in texts and I discuss both their strengths and weaknesses.

[Download ps] (bibtex)

The rhetorical parsing, summarization, and generation of natural language texts,
Daniel Marcu,
1998
Ph.D. Thesis. Department of Computer Science, University of Toronto. January. Published as technical report CSRI-371.
Abstract

This thesis is an inquiry into the nature of the high-level rhetorical structure of unrestricted natural language texts, the computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically.

The thesis proposes a first-order formalization of the high-level rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid.

The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of a sequence of units among which some rhetorical relations hold. Two algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques.

The formalization and the algorithms mentioned so far correspond to the theoretical facet of the thesis. An exploratory corpus analysis of cue phrases provides the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms was designed to determine the elementary textual units of a text, to hypothesize the rhetorical relations that hold among these units, and eventually to derive the discourse structure of that text. The process that finds the discourse structure of unrestricted natural language texts is called rhetorical parsing.

The thesis explores two possible applications of the text theory that it proposes. The first application concerns a discourse-based summarization system, which is shown to significantly outperform both a baseline algorithm and a commercial system. An empirical psycholinguistic experiment not only provides an objective evaluation of the summarization system, but also confirms the adequacy of using the text theory proposed here to determine the most important units in a text. The second application concerns a set of text planning algorithms that natural language generation systems can use to construct text plans in cases where the high-level communicative goal is to map an entire knowledge pool into text.


[Download pdf] (bibtex)

The rhetorical parsing of natural language texts,
Daniel Marcu,
1997
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, pp. 96--103, July, Madrid, Spain
Abstract
We derive the rhetorical structures of texts by means of two new surface-form-based algorithms: one that identifies discourse usages of cue phrases and breaks sentences into clauses, and one that produces valid rhetorical structure trees for unrestricted natural language texts. The algorithms use information that was derived from a corpus analysis of cue phrases.

[Download ps] (bibtex)

From discourse structures to text summaries,
Daniel Marcu,
1997
Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarization, pp. 82--88, July, Madrid, Spain
Abstract
We describe experiments that show that the concepts of rhetorical analysis and nuclearity can be used effectively for determining the most important units in a text. We show how these concepts can be implemented and we discuss results that we obtained with a discourse-based summarization program.

[Download ps] (bibtex)

From local to global coherence: A bottom-up approach to text planning,
Daniel Marcu,
1997
Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), pp. 629--635, July, Providence, RI
Abstract
We present a new, data-driven approach to text planning, which can be used not only to map full knowledge pools into natural language texts, but also to generate texts that satisfy multiple, high-level communicative goals. The approach explains how global coherence can be achieved by exploiting the local coherence constraints of rhetorical relations. The local constraints were derived from a corpus analysis.

[Download ps] (bibtex)

A formal and computational characterization of pragmatic infelicities,
Daniel Marcu and Graeme Hirst,
1996
Proceedings of the Twelfth European Conference on Artificial Intelligence, pp. 587--591, August, Budapest, Hungary
An earlier version of this paper appeared as: Marcu, Daniel and Hirst, Graeme. ``Detecting pragmatic infelicities.'' Working notes, AAAI Symposium on Computational implicature: Computational approaches to interpreting and generating conversational implicature, Stanford University, March 1996, pp. 64--70.
Abstract
We study the logical properties that characterize pragmatic inferences and we show that the classical understanding of notions such as entailment and defeasibility is not enough if one wants to explain infelicities that occur when a pragmatic inference is cancelled. We show that infelicities can be detected if a special kind of inference is considered, namely infelicitously defeasible inference. We also show how one can use stratified logic, a linguistically motivated formalism that accommodates indefeasible, infelicitously defeasible, and felicitously defeasible inferences, to reason about pragmatic inferences and detect infelicities associated with utterances. The formalism yields an algorithm for detecting infelicities, which has been implemented in Lisp.

[Download pdf] (bibtex)

The conceptual and linguistic facets of persuasive arguments,
Daniel Marcu,
1996
Proceedings, Gaps and bridges: New directions in planning and natural language generation, pp. 43--46, August, Budapest, Hungary
(Workshop at the 1996 European Conference on Artificial Intelligence, ECAI 96)
Abstract
This paper provides a body of knowledge that characterizes persuasive arguments and that is thoroughly grounded in empirical data derived from communication studies, psychology, and social studies of persuasion. The paper also discusses the limitations of current theories of argumentation and of current systems in accommodating both the conceptual and linguistic facets of persuasive arguments.

[Download ps] (bibtex)

Building up rhetorical structure trees,
Daniel Marcu,
1996
Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pp. 1069--1074, August, Portland, OR
Abstract
I use the distinction between the nuclei and the satellites that pertain to discourse relations to introduce a compositionality criterion for discourse trees. I provide a first-order formalization of rhetorical structure trees and, on its basis, I derive an algorithm that constructs all the valid rhetorical trees that can be associated with a given discourse.

[Download ps] (bibtex)

Distinguishing between coherent and incoherent texts,
Daniel Marcu,
1996
Proceedings, Student Conference on Computational Linguistics in Montreal, pp. 136--143, June, Montreal, QC
Abstract
In this paper, I show that current discourse theories are not able to explain why different orderings of the same textual segments exhibit different properties with respect to coherence. I then propose a criterion of coherence that exploits both the strong tendency of textual units that are associated with certain rhetorical relations to obey a canonical ordering and the inclination of semantically and rhetorically related information to cluster into larger textual spans. I formalize this criterion as a constraint satisfaction problem and I show how it can yield a decision procedure that is capable of distinguishing between coherent and incoherent texts. The procedure has been implemented in Lisp.

[Download ps]