Comparative European legislative research in the age of large-scale computational text analysis: A review article
Published date | 01 January 2025 |
DOI | http://doi.org/10.1177/01925121231199904 |
Author | Miklós Sebők, Sven-Oliver Proksch, Christian Rauh, Péter Visnovitz, Gergő Balázs, Jan Schwalbach |
International Political Science Review
2025, Vol. 46(1) 18–39
© The Author(s) 2023
Miklós Sebők
Centre for Social Sciences, Hungary
Sven-Oliver Proksch
University of Cologne, Germany
Christian Rauh
WZB Berlin Social Science Center, Germany
Péter Visnovitz
Centre for Social Sciences, Hungary
Gergő Balázs
Centre for Social Sciences, Hungary
Jan Schwalbach
GESIS – Leibniz Institute for the Social Sciences, Germany
Abstract
Advances in data accessibility and analytical methods opened new frontiers for comparative studies of
European legislative activities. However, these advances still need to be fully harnessed by legislative scholars
for multiple reasons. We provide an overview of extant research agendas to identify these reasons and
explore the opportunities for tapping the potential of big data and quantitative text analysis. We present
significant data collection efforts, such as ParlSpeech, the Comparative Agendas Project and CLARIN, and
highlight their respective value for, primarily, large-N comparative research focusing on European Union
member states and the European Union itself. Our review highlights the most consequential gaps in the
literature and shortcomings of available data and analysis. These include the lack of extensive historical and
geographical coverage, missing harmonisation and cross-linking between separate efforts, no unified speech
and document (bill, law) databases, and the unavailability of good-quality full-text variables.
Keywords
Legislative studies, comparative politics, European politics, quantitative text analysis
Corresponding author:
Miklós Sebők, Centre for Social Sciences, 4 Tóth Kálmán utca, Budapest, 1097, Hungary.
Email: sebok.miklos@tk.hu
In this article we provide an overview of extant research agendas and explore opportunities for
tapping the potential of big data and quantitative text analysis in comparative legislative research.
We present significant data collection efforts, such as ParlSpeech, the Comparative Agendas
Project (CAP) and the Common Language Resources and Technology Infrastructure (CLARIN),
and highlight their respective value for, primarily, large-N comparative research focusing on
European Union (EU) member states and the EU itself. Our review highlights the most consequential
gaps in the literature and shortcomings of available data and analysis. These include the lack of
extensive historical and geographical coverage, missing harmonisation and cross-linking between
separate efforts, no unified speech and document (bill, law) databases, and the unavailability of
good-quality full-text variables.
Parliaments are vital venues for political representation and policymaking in democratic states.
In parliaments, elected representatives publicly communicate with each other and their voters
(Bäck et al., 2021; Proksch and Slapin, 2015). This allows them to position themselves towards
broader societal cleavage lines or specific initiatives (Borghetto and Chaqués-Bonafont, 2019),
with legislative debates constituting ‘the formal end-games of a long political process’ (Laver,
2021). Besides informing the broader political discourse by highlighting political issues and
stances in speeches, Members of Parliament (MPs) also propose or amend bills, thereby fixing
political preferences in binding laws and regulations. Thus, the content of parliamentary debate
and its legislative documents offer invaluable information on the political agendas and conflict
lines structuring collective decision-making in democracies.
Analysing and comparing the content of legislative speeches and documents have thus been a
long-standing focus of empirical political science. However, our ability to extract systematic
information from large legislative text corpora at scale has changed over recent decades through
algorithmic approaches that treat (qualitative) text as (quantitative) data (Goplerud, 2021; Grimmer and
Stewart, 2013). This review article outlines the tremendous potential and practical limits of exploiting
these tools to enhance our understanding of the functioning of representative democracy. Our
assessment aims to review the extant literature and spot gaps to explore how the field could and
should proceed in leveraging available big data sources and newly developed innovative methods.
These gaps (and the respective takeaways) include a lack of historical and geographical coverage,
missing harmonisation and cross-linking between separate efforts and the unavailability of
good-quality full-text variables.
We proceed in two steps. First, we showcase successful applications of text-as-data methods to
critical questions of legislative politics. Scholars have effectively used text data to measure essential
concepts such as issue attention, ideological positioning and polarisation, rhetorical strategies
or legislative influence. This demonstrates the promise that text-as-data methods hold for understanding
the inner workings of democracies. However, the lack of readily available, machine-readable
text corpora of speeches, bills and laws across a broad set of countries presents a bottleneck
for innovative comparative research.
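To make the idea of treating text as data concrete, the following is a minimal sketch of one way issue attention could be approximated: the share of a speech's tokens that match a keyword list for each issue. The keyword dictionary, the function names and the example sentence are hypothetical illustrations and are not drawn from the studies reviewed here; published work typically relies on validated dictionaries or trained classifiers rather than a hand-picked list.

```python
import re
from collections import Counter

# Hypothetical keyword dictionary (an assumption for illustration only).
ISSUE_KEYWORDS = {
    "economy": {"budget", "tax", "inflation"},
    "health": {"hospital", "vaccine", "nurses"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def issue_attention(speech):
    """Return, per issue, the share of tokens that match its keywords."""
    tokens = tokenize(speech)
    total = len(tokens)
    if total == 0:
        # An empty speech pays no measurable attention to any issue.
        return {issue: 0.0 for issue in ISSUE_KEYWORDS}
    counts = Counter(tokens)
    return {
        issue: sum(counts[word] for word in words) / total
        for issue, words in ISSUE_KEYWORDS.items()
    }


# Invented example sentence, not taken from any parliamentary corpus.
speech = "The budget raises tax on hospital supplies."
attention = issue_attention(speech)
print(attention)
```

Applied to every speech in a corpus and aggregated by party or year, shares of this kind yield a simple quantitative attention measure, although the result depends heavily on the quality of the keyword list.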
Thus, secondly, we report the findings from an encompassing stock-taking exercise on the
availability of parliamentary text corpora across democracies in the EU and beyond. We provide
readers with an aggregate overview and a searchable database. We also discuss a few of
the broadest data collection efforts thus far (ParlSpeech, the CAP and the CLARIN infrastructure).
These projects highlight the fact that the availability of parliamentary text data has massively
improved.
But we also show that geographical and temporal biases persist, that access to legislative documents
is less developed when compared to speeches, that our ability to meaningfully link different
types of text produced in the legislative process is still suboptimal, and that datasets with