Discourse coherence and rhetorical parsing
Vanessa Feng developed an RST-style text-level discourse parser that builds on the HILDA discourse parser of Hernault et al. and significantly improves HILDA's tree-building step by incorporating rich linguistic features. She also analyzed why traditional sentence-level discourse parsing is difficult to extend to text-level parsing, by comparing discourse-parsing performance under different discourse conditions.
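For readers unfamiliar with HILDA-style tree building, here is a minimal Python sketch of greedy bottom-up construction. Starting from elementary discourse units (EDUs), it repeatedly merges the adjacent pair of spans that a structure scorer rates highest, then labels the new node with a relation. The scorer and labeler are hypothetical stand-ins for trained classifiers. The rich linguistic features that Feng's parser adds would feed into those classifiers and are not modeled here, and nuclearity is omitted. The names `Span`, `build_tree`, `structure_score`, and `label` are illustrative and are not taken from either parser.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Span:
    """A discourse span covering EDUs start..end (inclusive)."""
    start: int
    end: int
    text: str
    left: Optional["Span"] = None
    right: Optional["Span"] = None
    relation: Optional[str] = None


def build_tree(edus: List[str],
               structure_score: Callable[[Span, Span], float],
               label: Callable[[Span, Span], str]) -> Span:
    """Greedy bottom-up tree building (hypothetical scorer/labeler)."""
    # Start with one leaf span per elementary discourse unit.
    spans = [Span(i, i, text) for i, text in enumerate(edus)]
    while len(spans) > 1:
        # Score every pair of adjacent spans.
        scores = [structure_score(spans[i], spans[i + 1])
                  for i in range(len(spans) - 1)]
        # Pick the highest-scoring pair; ties go to the leftmost pair.
        best = max(range(len(scores)), key=lambda i: scores[i])
        left, right = spans[best], spans[best + 1]
        # Replace the two children with their labeled parent node.
        parent = Span(left.start, right.end, left.text + " " + right.text,
                      left, right, label(left, right))
        spans[best:best + 2] = [parent]
    return spans[0]
```

With a toy scorer that prefers merging narrower spans, every adjacent pair of leaves ties. As a result, `build_tree(["a", "b", "c"], ...)` merges the first two EDUs before attaching the third.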
In her earlier work on structure and coherence in discourse, Feng extended the original entity-based coherence model of Barzilay and Lapata so that it learns finer-grained coherence preferences from the training data. The original model used only pairwise rankings. In Feng's extension, multiple ranks are instead assigned to the set of permutations derived from the same source document. She also studied how the choice of permutations used in training, and the coreference component used in entity extraction, affect performance. Without requiring any additional manual annotation, the extended model outperforms the original model on two tasks: sentence ordering and summary coherence rating.
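The sketch below illustrates the two ideas above in Python. In Barzilay and Lapata's entity grid, each entity gets a column recording its role in each sentence: S for subject, O for object, X for other, or "-" when the entity is absent. A document is then represented by the relative frequencies of role transitions. The multiple-rank idea is also shown: every pair of permutations of one document with different ranks becomes a training pair, not just (original, permutation) pairs. Several assumptions apply. Sentences are given as hypothetical dicts from entity to role, with entity extraction and coreference already done. Ranking permutations by Kendall tau distance from the original order is a stand-in dissimilarity measure chosen for illustration. The ranking learner that consumes the pairs is omitted, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from typing import Dict, List, Tuple

ROLES = "SOX-"  # subject, object, other, absent


def entity_grid(sentences: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each entity to its column of roles, using '-' where it is absent."""
    entities = sorted({e for s in sentences for e in s})
    return {e: [s.get(e, "-") for s in sentences] for e in entities}


def transition_features(sentences: List[Dict[str, str]],
                        k: int = 2) -> Dict[Tuple[str, ...], float]:
    """Relative frequency of every length-k role transition in the grid."""
    counts: Counter = Counter()
    for column in entity_grid(sentences).values():
        for i in range(len(column) - k + 1):
            counts[tuple(column[i:i + k])] += 1
    total = sum(counts.values()) or 1
    return {t: counts[t] / total for t in product(ROLES, repeat=k)}


def kendall_tau_distance(order: List[int]) -> int:
    """Number of sentence pairs whose relative order differs from the original."""
    return sum(1 for i, j in combinations(range(len(order)), 2)
               if order[i] > order[j])


def ranked_pairs(permutations: List[List[int]]) -> List[Tuple[List[int], List[int]]]:
    """(more coherent, less coherent) pairs among all permutations of one
    document, where smaller distance from the original order ranks higher."""
    scored = [(kendall_tau_distance(p), p) for p in permutations]
    pairs = []
    for (d1, p1), (d2, p2) in combinations(scored, 2):
        if d1 < d2:
            pairs.append((p1, p2))
        elif d2 < d1:
            pairs.append((p2, p1))
        # Equal ranks yield no preference, so no training pair is produced.
    return pairs
```

Consider the original order `[0, 1, 2]` together with the permutations `[1, 0, 2]` and `[2, 1, 0]`. The pairwise scheme yields only two training pairs, each one comparing the original against a permutation. `ranked_pairs` additionally prefers `[1, 0, 2]` over `[2, 1, 0]`, because the former is closer to the original order.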
Erick Galani Maziero, a guest student from the University of São Paulo, São Carlos, and Muyu Zhang, a guest student from the Harbin Institute of Technology, adapted this model to Portuguese and Chinese, respectively.
This project is a continuation of our earlier work on
Feng, Vanessa Wei and Hirst, Graeme. “Text-level discourse parsing with rich linguistic features.” Proceedings, 50th Annual Meeting of the Association for Computational Linguistics, Jeju, Korea, July 2012, 60–68. 
Feng, Vanessa Wei and Hirst, Graeme. “Extending the entity-based coherence model with multiple ranks.” Proceedings, 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, April 2012, 315–324. 
Maziero, Erick Galani; Hirst, Graeme; and Pardo, Thiago Alexandre Salgueiro. “Adaptation of discourse parsing models for Portuguese language.” Proceedings, 4th Brazilian Conference on Intelligent Systems (BRACIS), Natal, Brazil, November 2015, 140–145. 
Maziero, Erick Galani; Hirst, Graeme; and Pardo, Thiago Alexandre Salgueiro. “Semi-supervised never-ending learning in discourse parsing.” Proceedings, Recent Advances in Natural Language Processing (RANLP), Hissar, Bulgaria, September 2015, 436–442. 
Zhang, Muyu; Feng, Vanessa Wei; Qin, Bing; Hirst, Graeme; Liu, Ting; and Huang, Jingwen. “Encoding world knowledge in the evaluation of local coherence.” Proceedings, 2015 Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies, Denver, June 2015, 1087–1096.
Professor of Computational Linguistics
University of Toronto, Department of Computer Science