Professor of Computational Linguistics

University of Toronto, Department of Computer Science


Digitizing and analyzing parliamentary proceedings

The Digging into Linked Parliamentary Data project (“Dilipad”) — an interdisciplinary and international collaboration between researchers at the University of Toronto, the University of Amsterdam, and the Institute of Historical Research (University of London) — set out to standardize, enrich and distribute the parliamentary proceedings of the Netherlands (1815–present), the United Kingdom (1803–present) and Canada (1901–present). The project was funded by the Digging into Data Challenge (DiD), an international consortium of granting agencies seeking to promote the dissemination of data in the humanities and social sciences. The goals of the Dilipad project were threefold: the first was to create a uniform, extensible format for the digitized records of parliamentary proceedings in Canada, the UK, and the Netherlands; the second was to facilitate analyses of these proceedings by researchers as well as by non-academic stakeholders such as activists, journalists and enthusiasts; the final goal was to leverage these new data to address substantive research questions about such topics as gender, ideology, immigration and the detection of emotion.

Beelen et al. (2017) digitized and annotated the English-language proceedings of the Canadian House of Commons, dating back to 1901. (For a number of reasons, including difficulties with the optical character recognition of accented Roman characters, they were unable to reliably process either the accompanying French translation of the debates or any of the debates from the nineteenth century.) Beelen et al. introduce Lipad, an online platform designed as a hub for archiving Canadian political data, with the parliamentary proceedings at the centre of its architecture. Their paper describes the structure of the database and provides guidelines for prospective users.

Rheault et al. (2016) use natural language processing methods and a digitized corpus spanning a century of parliamentary debates in the United Kingdom to analyze the emotional states of politicians in parliament. They use this approach to examine changes in aggregate levels of emotional polarity in the British parliament, and to test a hypothesis about the emotional response of politicians to economic recessions. Their findings suggest that, contrary to popular belief, the mood of politicians has become more positive in recent decades, and that variations in emotional polarity can be predicted by the state of the national economy.

Naderi and Hirst (2018) present an automated approach to distinguishing true, false, stretch, and dodge statements in questions and answers in the Canadian Parliament. They leverage the truthfulness annotations of a U.S. fact-checking corpus by training a neural network model on it and incorporating its prediction probabilities into their models. They find that, in concert with other linguistic features, these probabilities can improve the multi-class classification results. They further show that dodge statements can be detected with an F1 measure as high as 82.57% in binary classification settings.


Beelen, Kaspar; Alberdingk Thijm, Timothy; Cochrane, Christopher; Halvemaan, Kees; Hirst, Graeme; Kimmins, Michael; Lijbrink, Sander; Marx, Maarten; Naderi, Nona; Rheault, Ludovic; Polyanovsky, Roman; Whyte, Tanya. “Digitization of the Canadian parliamentary debates.” Canadian Journal of Political Science, 50(3), September 2017, 849–864. [PDF]

Naderi, Nona and Hirst, Graeme. “Automated fact-checking of claims in argumentative parliamentary debates.” Proceedings of the First Workshop on Fact Extraction and Verification (FEVER), Brussels, November 2018, 60–65. [PDF]

Rheault, Ludovic; Beelen, Kaspar; Cochrane, Christopher; and Hirst, Graeme. “Measuring emotion in parliamentary debates with automated textual analysis.” PLoS ONE, 2016, 11(12): e0168843. [PDF]