Discourse structure, rhetorical parsing, and text summarization

Daniel Marcu developed a first-order formalization of the high-level rhetorical structure of text. He proposed four algorithms for determining the valid text structures of a sequence of units among which certain rhetorical relations hold, analyzed them theoretically, and compared them empirically. Two of the algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques. An exploratory corpus analysis of cue phrases then provided the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms was designed to determine the elementary textual units of a text, to hypothesize the rhetorical relations that hold among these units, and finally to derive the discourse structure of that text. This process of finding the discourse structure of unrestricted natural language texts is called rhetorical parsing.
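To make "valid text structures" concrete, here is a simplified bottom-up chart sketch. It is not one of Marcu's four algorithms. It assumes that hypothesized relations are given as (name, satellite, nucleus) facts between elementary units, and that a relation may join two adjacent spans only if its arguments belong to the promotion sets (the most salient units) of those spans, a constraint along the lines of the compositionality principle in Marcu's formalization. The names `Relation`, `Tree`, `combine`, and `valid_structures` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Relation:
    name: str
    sat: int                     # satellite unit (first nucleus if multinuclear)
    nuc: int                     # nucleus unit (second nucleus if multinuclear)
    multinuclear: bool = False


@dataclass(frozen=True)
class Tree:
    start: int
    end: int
    promotion: frozenset         # salient units promoted to this node
    label: Optional[str] = None  # relation name; None for a leaf
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

    def __str__(self) -> str:
        if self.label is None:
            return str(self.start)
        return f"{self.label}({self.left}, {self.right})"


def combine(left: Tree, right: Tree, relations: List[Relation]) -> Iterator[Tree]:
    """Yield each parent licensed by a relation whose arguments are promoted
    by the two adjacent children (hypothetical constraint, see lead-in)."""
    lp, rp = left.promotion, right.promotion
    for r in relations:
        if r.multinuclear:
            if (r.sat in lp and r.nuc in rp) or (r.nuc in lp and r.sat in rp):
                # Multinuclear: both children's salient units are promoted.
                yield Tree(left.start, right.end, lp | rp, r.name, left, right)
        elif r.sat in lp and r.nuc in rp:
            # Satellite-nucleus: only the nucleus's salient units are promoted.
            yield Tree(left.start, right.end, rp, r.name, left, right)
        elif r.nuc in lp and r.sat in rp:
            # Nucleus-satellite: only the nucleus's salient units are promoted.
            yield Tree(left.start, right.end, lp, r.name, left, right)


def valid_structures(n: int, relations: List[Relation]) -> List[Tree]:
    """Enumerate every binary tree over units 1..n allowed by `relations`."""
    chart = {(i, i): [Tree(i, i, frozenset({i}))] for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell: List[Tree] = []
            for k in range(i, j):  # split point: [i..k] + [k+1..j]
                for left in chart[(i, k)]:
                    for right in chart[(k + 1, j)]:
                        cell.extend(combine(left, right, relations))
            chart[(i, j)] = cell
    return chart[(1, n)]


rels = [
    Relation("elaboration", sat=2, nuc=1),
    Relation("contrast", sat=2, nuc=3, multinuclear=True),
]
for tree in valid_structures(3, rels):
    print(tree)  # elaboration(1, contrast(2, 3))
```

With these two hypothesized relations only one structure survives. The alternative bracketing, with units 1 and 2 grouped first, is rejected: once unit 2 is an elaboration satellite, only unit 1 is promoted from that span, so no node can license the contrast between units 2 and 3.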

Marcu also explored two possible applications of his proposed theory of text structure. The first is a discourse-based summarization system, which was shown to significantly outperform both a baseline algorithm and a commercial system. The second is a set of text-planning algorithms that natural language generation systems can use to construct text plans when the high-level communicative goal is to map an entire knowledge pool into text.
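One way a discourse tree can drive summarization is to rank units by how close to the root they are promoted, so that nuclei of high-level relations outrank deeply embedded satellites. The sketch below assumes that ranking scheme, ignores refinements such as parenthetical units, and is not presented as the evaluated system. `Node`, `leaf`, `depth`, `score`, and `summarize` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Units promoted to this node (its most salient units); a leaf promotes itself.
    promotion: frozenset
    children: list = field(default_factory=list)


def leaf(unit):
    return Node(frozenset({unit}))


def depth(node):
    return 1 + max((depth(c) for c in node.children), default=0)


def score(unit, node, d):
    """Return d if `unit` is promoted to `node`; otherwise the best score
    found among the children, one level further down (d - 1)."""
    if unit in node.promotion:
        return d
    return max((score(unit, c, d - 1) for c in node.children), default=0)


def summarize(root, units, k):
    """Return the k highest-scoring units in text order (ties go to the earlier unit)."""
    d = depth(root)
    ranked = sorted(units, key=lambda u: (-score(u, root, d), u))
    return sorted(ranked[:k])


# elaboration(1, contrast(2, 3)): unit 1 is the nucleus of the whole text.
tree = Node(frozenset({1}), [leaf(1), Node(frozenset({2, 3}), [leaf(2), leaf(3)])])
d = depth(tree)
print({u: score(u, tree, d) for u in (1, 2, 3)})  # {1: 3, 2: 2, 3: 2}
print(summarize(tree, [1, 2, 3], 2))              # [1, 2]
```

Returning the selected units in text order rather than score order keeps the extract readable as running text.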

This project is continued in our subsequent research on discourse coherence and rhetorical parsing.


Marcu, Daniel. The Theory and Practice of Discourse Parsing and Summarization. The MIT Press, November 2000.

Marcu, Daniel. The rhetorical parsing, summarization, and generation of natural language texts. PhD thesis, Department of Computer Science, University of Toronto, January 1998.

Marcu, Daniel. “The rhetorical parsing of natural language texts.” Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, July 1997, 96–103.

Marcu, Daniel. “Building up rhetorical structure trees.” Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, Oregon, August 1996, 1069–1074.

Graeme Hirst

Professor of Computational Linguistics

University of Toronto, Department of Computer Science