- LINEAR-TIME BOTTOM-UP TEXT-LEVEL RST-STYLE DISCOURSE PARSER 2.01 (Feng and Hirst, 2014)
A more efficient and more accurate version of our RST-style discourse parser. The older version (1.01) is described below.
We have also worked hard to improve the robustness of our parser.
Unlike v1.01, this version implements its own EDU segmentation module. For details, see our arXiv paper.
OUR PARSER IN USE
- Vanessa Wei Feng, Ziheng Lin, and Graeme Hirst, 2014. The Impact of Deep Hierarchical Discourse Structures in the Evaluation of Text Coherence. In Proceedings of the 25th International Conference on Computational Linguistics (COLING-2014), Dublin, Ireland.
- TEXT-LEVEL RST-STYLE DISCOURSE PARSER 1.01 (Feng and Hirst, 2012)
An RST-style discourse parser that produces discourse tree structures at the full-text level.
A quick introduction to RST (Rhetorical Structure Theory) can be found here.
The program runs on Linux systems. It was originally written in Python and packaged using PyInstaller.
The overall software workflow is based on the third-party HILDA discourse parser (duVerle and Prendinger, 2009). Our software keeps the EDU segmentation component of their parser unchanged. The developers of the HILDA discourse parser receive full credit for the overall workflow and the EDU segmentation component.
The major contributions of this software include:
- Revising the tree-building component by incorporating rich linguistic features into the binary classifier Struct and the multi-class classifier Label. For full information, please refer to our ACL 2012 paper.
- Optimizing the workflow of HILDA's prototype, so that batch-mode parsing is possible without reloading all components each time.
- Providing an option that allows users to specify their own EDU segmentation (e.g., gold-standard EDUs from the RST Discourse Treebank), so that experimental evaluation based on user-specified EDUs is possible.
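For readers unfamiliar with the HILDA-style tree-building procedure into which the Struct and Label classifiers plug, the following is a minimal Python sketch of a greedy bottom-up parse, assuming the HILDA approach: at each step every pair of adjacent spans is scored by a structure classifier, the highest-scoring pair is merged, and a relation classifier labels the new node. The `struct_score` and `label` callables and the `Node` class are hypothetical stand-ins for the trained Struct and Label models; this is an illustration, not the parser's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Node:
    """An internal discourse-tree node joining two adjacent spans."""
    left: Tree
    right: Tree
    relation: str


# A leaf is a single EDU (represented here simply as its text).
Tree = Union[str, Node]


def greedy_parse(edus: List[str],
                 struct_score: Callable[[Tree, Tree], float],
                 label: Callable[[Tree, Tree], str]) -> Tree:
    """Greedy bottom-up tree building over a sequence of EDUs.

    struct_score and label are hypothetical stand-ins for the Struct
    (binary) and Label (multi-class) classifiers.
    """
    if not edus:
        raise ValueError("need at least one EDU")
    spans: List[Tree] = list(edus)
    while len(spans) > 1:
        # Score every adjacent pair of spans with the structure classifier.
        scores = [struct_score(spans[i], spans[i + 1])
                  for i in range(len(spans) - 1)]
        # Pick the best pair (the leftmost one on ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        left, right = spans[best], spans[best + 1]
        # Replace the pair by a single node labeled by the relation classifier.
        spans[best:best + 2] = [Node(left, right, label(left, right))]
    return spans[0]
```

Rescoring every pair in each round, as above, is quadratic in the number of EDUs; since only the pairs adjacent to the newly merged node change, a practical implementation would cache the other scores.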
- April 18, 2013: In this version, we have not implemented the contextual features described in our ACL 2012 paper. We plan to explore incorporating them into post-processing procedures in future versions.
- April 20, 2013: Newer versions of NLTK appear to use a different interface for tree objects, and our implementation is based on NLTK 2.0b9. If you encounter errors such as "'function' object has no attribute 'parent'", either replace your NLTK installation with version 2.0b9, or go through the source code under the src/ folder and add parentheses to every *.parent, *.left_sibling, *.right_sibling, *.root, *.treepos, and *.treepositions (e.g., *.parent becomes *.parent()).
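Editing every attribute access by hand is tedious. The following is a minimal Python 3 sketch of the second workaround, assuming the sources are plain `.py` files under `src/`; the helper names `add_call_parens` and `patch_tree` are hypothetical and not part of the parser. It turns each listed attribute access into a method call, leaves accesses that are already calls and plain assignments alone, and keeps a `.bak` copy of every file it changes. Because a regular expression cannot tell an NLTK tree from any other object with an attribute such as `root`, review the resulting changes before running the parser.

```python
import re
import sys
from pathlib import Path

# Tree attributes that became methods in newer NLTK versions (per the note above).
ATTRS = ("parent", "left_sibling", "right_sibling", "root", "treepos", "treepositions")

# Match ".attr" as a whole word, unless it is already followed by "(" (already a
# call) or by a single "=" (an assignment); "==" comparisons are still converted.
PATTERN = re.compile(r"\.(" + "|".join(ATTRS) + r")\b(?!\s*\()(?!\s*=(?!=))")


def add_call_parens(source: str) -> str:
    """Turn e.g. 'node.parent' into 'node.parent()' for the listed attributes."""
    return PATTERN.sub(lambda m: m.group(0) + "()", source)


def patch_tree(src_dir: str) -> None:
    """Rewrite every .py file under src_dir in place, keeping a .bak copy."""
    for path in Path(src_dir).rglob("*.py"):
        old = path.read_text()
        new = add_call_parens(old)
        if new != old:
            path.with_name(path.name + ".bak").write_text(old)
            path.write_text(new)
            print(f"patched {path}")


if __name__ == "__main__":
    # Usage: python add_parens.py src/
    if len(sys.argv) == 2:
        patch_tree(sys.argv[1])
    else:
        print("usage: python add_parens.py SRC_DIR")
```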
- May 08, 2013: Fixed a minor bug in the handling of the "-E" option.
- Nov 11, 2013: Added tracebacks, so detailed information about any errors encountered during parsing is now displayed.
Also, make sure that you download NLTK's data (see NLTK's website for instructions) after installing NLTK on your system.
- Nov 15, 2013: Replaced the original classification model files with SVMperf and SVMmulticlass models, as used in our original experiments in Feng and Hirst (2012). As a result, both overall performance and parsing efficiency have improved.
- Nov 20, 2013: Modified the README file to reflect the change to the classifiers used in the software.
- Jan 13, 2014: Included a BSD license in the package.
OUR PARSER IN USE
- Jun-Ping Ng, Min-Yen Kan, Ziheng Lin, Wei Feng, Bin Chen, Jian Su and Chew Lim Tan, 2013. Exploiting Discourse Analysis for Article-Wide Temporal Classification. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Seattle, USA.
- Peter Jansen, Mihai Surdeanu, and Peter Clark, 2014. Discourse Complements Lexical Semantics for Non-factoid Answer Reranking. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), Baltimore, USA.
If you would like to be added to the mailing list, please send a request to email@example.com.