Discourse structure, rhetorical parsing, and text summarization
Daniel Marcu has developed a first-order formalization of the high-level, rhetorical structure of text. He proposed, analyzed theoretically, and compared empirically four algorithms for determining the valid text structures of a sequence of units among which some rhetorical relations hold. Two of the algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques. An exploratory corpus analysis of cue phrases then provided the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms was designed to determine the elementary textual units of a text, to hypothesize rhetorical relations that hold among these units, and, eventually, to derive the discourse structure of that text. The process of finding the discourse structure of unrestricted natural language texts is called rhetorical parsing.
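As a rough illustration of the first two steps (finding elementary units and hypothesizing relations from cue phrases), the following Python sketch may help. It is not the algorithm from the work described above: the cue-phrase table, the relation labels, the splitting rule, and the function names `segment` and `hypothesize` are all assumptions made for this example.

```python
import re

# Hypothetical cue-phrase table (an assumption for illustration):
# each phrase is mapped to the relation it is assumed to signal.
CUE_RELATIONS = {
    "although": "CONCESSION",
    "because": "CAUSE",
    "but": "CONTRAST",
    "for example": "EXAMPLE",
}

# Longest cues first, so multi-word cues are tried before shorter ones.
_CUES = sorted(CUE_RELATIONS, key=len, reverse=True)
_CUE_ALTERNATION = "|".join(re.escape(c) for c in _CUES)


def segment(text):
    """Split text into candidate units at sentence ends and at a
    comma that is directly followed by a known cue phrase."""
    pattern = r"(?<=[.!?])\s+|,\s+(?=(?:" + _CUE_ALTERNATION + r")\b)"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p and p.strip()]


def hypothesize(units):
    """Return (relation, left_index, right_index) triples for each unit
    that starts with a cue phrase and has a preceding unit."""
    hypotheses = []
    for i, unit in enumerate(units):
        if i == 0:
            continue
        lowered = unit.lower()
        for cue in _CUES:
            if re.match(re.escape(cue) + r"\b", lowered):
                hypotheses.append((CUE_RELATIONS[cue], i - 1, i))
                break
    return hypotheses


sample = ("The parser is fast, but it needs cue phrases. "
          "It fails on some texts, because markers are ambiguous.")
units = segment(sample)
relations = hypothesize(units)
```

Here each hypothesis only links a unit to its immediate left neighbour. A real rhetorical parser would also have to consider relations between larger spans and then choose among the competing hypotheses when building the tree.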
Marcu has also explored two possible applications of the text theory that he proposes. The first is a discourse-based summarization system, which was shown to significantly outperform both a baseline algorithm and a commercial system. The second is a set of text planning algorithms that natural language generation systems can use to construct text plans when the high-level communicative goal is to map an entire knowledge pool into text.
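To make the idea of discourse-based summarization more concrete, here is a minimal Python sketch of one way a discourse tree could be used to rank units. The text above does not specify the method, so everything here is an assumption for illustration: the `Node` class, the nucleus ("N") and satellite ("S") roles, the rule that a node promotes the units of its nuclear children, and the ranking by the shallowest depth at which a unit is promoted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    role: str                   # "N" (nucleus) or "S" (satellite), assumed labels
    unit: Optional[int] = None  # index of the text unit, for leaves only
    children: List["Node"] = field(default_factory=list)


def promotion(node):
    """Units promoted to this node: a leaf promotes itself, an internal
    node promotes whatever its nuclear children promote."""
    if node.unit is not None:
        return {node.unit}
    promoted = set()
    for child in node.children:
        if child.role == "N":
            promoted |= promotion(child)
    return promoted


def rank_units(root):
    """Order units by the shallowest depth at which they are promoted
    (smaller depth is treated as more important); ties keep text order."""
    depth_of = {}

    def visit(node, depth):
        for unit in promotion(node):
            # Ancestors are visited first, so the first depth is the shallowest.
            depth_of.setdefault(unit, depth)
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return sorted(depth_of, key=lambda u: (depth_of[u], u))


def summarize(root, k):
    """Pick the k top-ranked units and return them in text order."""
    return sorted(rank_units(root)[:k])


# Hypothetical four-unit tree: units 0 and 3 are nuclei of their spans.
example_tree = Node("N", children=[
    Node("N", children=[Node("N", unit=0), Node("S", unit=1)]),
    Node("S", children=[Node("S", unit=2), Node("N", unit=3)]),
])
```

With this toy tree, `summarize(example_tree, 2)` selects units 0 and 3, the two units promoted closest to the root. The choice of depth as the only importance signal is a simplification made for the sketch.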
References:
- Marcu, Daniel. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press, November 2000. ISBN 0-262-13372-5.
- Marcu, Daniel. The rhetorical parsing, summarization, and generation of natural language texts, PhD thesis, Department of Computer Science, University of Toronto, January 1998.
- Marcu, Daniel. "The rhetorical parsing of natural language texts." Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, July 1997, 96–103.
- Marcu, Daniel. "Building up rhetorical structure trees." Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, Oregon, August 1996, 1069–1074.
Also: Marcu 1998.