Theoretical issues in CL

Professor of Computational Linguistics

University of Toronto, Department of Computer Science

Research

Theoretical issues of representation and meaning in computational linguistics

Presuppositions of existence in logical formalisms (Hirst 1991): A problem that arose in some of our early research on intelligent retrieval of legal texts is the representation of existence in logical formalisms. Many of the texts talked about whether or not something exists, usually an abstract entity like liability. It has been generally accepted in philosophy since Kant that existence is not a simple predicate; that is, one cannot say things like exists(liability) as one would say red(ball). Taking existence as a predicable property leads to logical fallacies. The problem in AI, then, is how existence can be represented without the danger of obtaining fallacious inferences. Unfortunately, the now-conventional Russellian approach of making existence a quantifier also causes many problems in representing natural language. But we have identified one philosophical theory (by Terence Parsons) that may be adaptable. This approach uses two different kinds of predication, one of which has special behaviour and may be used for existence and certain other properties that are not normally well behaved. Graeme Hirst has reconciled this approach with conventional AI approaches by refining the notion of existence into about eight different types, each a separate predicate, and dividing the universe into kosher and tref parts. Quantifiers may scope only over the kosher area, and entities in the tref area may be mentioned but not used.
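The restriction on quantifier scope can be illustrated with a toy sketch. This is not the formalism of Hirst (1991): it only shows quantifiers ranging over one part of a divided universe while entities in the other part remain nameable. The Entity class, the helper functions, and the placement of "liability" in the tref part are all assumptions made for illustration, and the eight types of existence are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Entity:
    name: str
    kosher: bool  # True if quantifiers may range over this entity


def exists_q(universe: Iterable[Entity], pred: Callable[[Entity], bool]) -> bool:
    """Existential quantifier restricted to the kosher part of the universe."""
    return any(pred(e) for e in universe if e.kosher)


def forall_q(universe: Iterable[Entity], pred: Callable[[Entity], bool]) -> bool:
    """Universal quantifier restricted to the kosher part of the universe."""
    return all(pred(e) for e in universe if e.kosher)


# Hypothetical universe; which part each entity belongs to is assumed here.
universe = [
    Entity("ball", kosher=True),
    Entity("contract", kosher=True),
    Entity("liability", kosher=False),  # tref: mentionable, not quantified over
]


def is_abstract(e: Entity) -> bool:
    # Illustrative property, true only of the tref entity in this toy universe.
    return e.name == "liability"


# A tref entity can still be referred to by name ...
mentioned = next(e for e in universe if e.name == "liability")

# ... but the quantifiers never see it.
found_abstract = exists_q(universe, is_abstract)
all_concrete = forall_q(universe, lambda e: not is_abstract(e))
```

In this sketch, found_abstract is false and all_concrete is true even though "liability" can be named, which is the sense in which a tref entity may be mentioned without taking part in quantified inference.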

Context in language is not the same as context in knowledge representation (Hirst 2000): AI formalizations of context, particularly the formalization by McCarthy and Buvac, regard context as an undefined primitive whose formalization can be the same in many different kinds of AI tasks. This is not appropriate. Any theory of context in natural language must take the special nature of natural language into account and cannot regard context simply as an undefined primitive. Graeme Hirst has shown that there is no such thing as a coherent theory of context simpliciter — context pure and simple — and that context in natural language is not the same kind of thing as context in KR. In natural language, context is constructed by the speaker and the interpreter, and both have considerable discretion in so doing. Therefore, a formalization based on pre-defined contexts and pre-defined 'lifting axioms' cannot account for how context is used in real-world language.

Text-meaning in computational linguistics (Hirst 2007, 2008, 2009a): There is an unseen interaction between, on one hand, the methodologies of CL and NLP and, on the other hand, the way in which we implicitly view the roles of the writer and the reader in determining the meaning of text, and this has consequences for our research agenda. With a review of the history of CL, I show that as statistical and machine-learning-based methods came to dominate the field, its goals became less user-centred, as there was no longer a role for the user in these approaches. But, as I explain, this is not adequate for the continuing development of sophisticated applications such as intelligence gathering and question answering, which require user-centred approaches to be fully effective. These papers will help new researchers understand the context in which they work: (a) CL has a history 50 years long, and its recent successes have come at the price of scaling back the goals of the field in ways that we don't really want. (b) Our research agenda is shaped in ways that we aren't even consciously aware of when, developing a new hammer, we go out looking for nails and forget that our original goal was to fasten some screws; and we need to be sensitive to this effect.

The relationship between ontologies and lexicons (Hirst 2009b): A lexicon is a linguistic object and hence is not the same thing as an ontology, which is non-linguistic. Nonetheless, word senses are in many ways similar to ontological concepts and the relationships found between word senses resemble the relationships found between concepts. I explain these ideas, with examples, and show that although the arbitrary and semi-arbitrary distinctions made by natural languages limit the degree to which these similarities can be exploited, a lexicon can nonetheless serve in the development of an ontology, especially in a technical domain.

Overcoming linguistic barriers to the Multilingual Semantic Web (Hirst 2015): I analyze Berners-Lee, Hendler, and Lassila’s conception of the Semantic Web, discussing what it implies for a Multilingual Semantic Web and the barriers that the nature of language itself puts in the way of that vision. Issues raised include the mismatch between natural language lexicons and hierarchical ontologies, the limitations of a purely writer-centered view of meaning, and the benefits of a reader-centered view. I then discuss how we can start to overcome these barriers by taking a different view of the problem and considering distributional models of semantics in place of purely symbolic models.

References

Hirst, Graeme. “Existence assumptions in knowledge representation.” Artificial Intelligence, 49, May 1991, 199–242. (This issue of the journal was reprinted as: Brachman, Ronald J.; Levesque, Hector J.; and Reiter, Raymond (editors). Knowledge Representation. Cambridge, MA: The MIT Press, 1992.) [PDF]

Hirst, Graeme. “Context as a spurious concept.” Proceedings, Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, February 2000, 273–287. [PDF]

Hirst, Graeme. “Views of text-meaning in computational linguistics: Past, present, and future.” In: Computation, Information, Cognition — The Nexus and the Liminal, Dodig Crnkovic, Gordana and Stuart, Susan (editors), Cambridge Scholars Publishing, Newcastle-upon-Tyne, 2007, 270–279. [PDF]

Hirst, Graeme. “The future of text-meaning in computational linguistics.” In: Sojka, Petr; Horák, Aleš; Kopeček, Ivan; and Pala, Karel (editors), Proceedings, 11th International Conference on Text, Speech and Dialogue (TSD 2008) [Brno, Czech Republic, September 2008], (Lecture Notes in Artificial Intelligence 5246), Berlin: Springer-Verlag, 2008, 1–9. [PDF]

Hirst, Graeme. “Limitations of the philosophy of language understanding implicit in computational linguistics.” Proceedings, 7th European Conference on Computing and Philosophy, Barcelona, July 2009a, 108–109. [PDF]

Hirst, Graeme. “Ontology and the lexicon.” In: Staab, Steffen and Studer, Rudi (editors), Handbook on Ontologies (second edition), Berlin: Springer-Verlag (International Handbooks on Information Systems), 2009b, 269–292. (Revision of the 2004 version from the first edition of the book.) [PDF]

Hirst, Graeme. “Overcoming linguistic barriers to the Multilingual Semantic Web.” In: Paul Buitelaar and Philipp Cimiano (editors), Towards the Multilingual Semantic Web, Springer, 2015, 1–14. [PDF]