Measuring the landscape of civil war: Evaluating geographic coding decisions with historic data from the Mau Mau rebellion
Rex W Douglass
Department of Political Science, University of California, San Diego
Kristen A Harkness
School of International Relations, University of St Andrews
Corresponding author: kh81@st-andrews.ac.uk
Abstract
Subnational conflict research increasingly utilizes georeferenced event datasets to understand contentious politics and
violence. Yet, how exactly locations are mapped to particular geographies, especially from unstructured text sources
such as newspaper reports and archival records, remains opaque and few best practices exist for guiding researchers
through the subtle but consequential decisions made during geolocation. We begin to address this gap by developing
a systematic approach to georeferencing that articulates the strategies available, empirically diagnoses problems of bias
created by both the data generating process and researcher-controlled tasks, and provides new generalizable tools for
simultaneously optimizing both the recovery and accuracy of coordinates. We then empirically evaluate our process
and tools against new micro-level data on the Mau Mau rebellion (colonial Kenya 1952–60), drawn from 20,000
pages of recently declassified British military intelligence reports. By leveraging a subset of these data that includes
map codes alongside natural language location descriptions, we demonstrate how inappropriately georeferenced data can have important downstream consequences, systematically biasing coefficients or altering statistical significance, and how our tools can help alleviate these problems.
Keywords
archival data, armed conflict, event data, georeferencing, Kenya, Mau Mau
How do we determine where historical acts of violence took place? At the heart of many questions in conflict studies is the where of violence: where were civilians or communities targeted by conflict actors? Where did counterinsurgency tactics succeed or fail? Underlying these analyses are decisions and assumptions about how to map events to the geography that they occupied. Determining whether an event took place at a specific latitude and longitude, or in a village or particular administrative unit, is a matter of weighing evidence, comparing sources, and making judgment calls between different possible georeferencing strategies. These important decisions, however, are usually made in an ad hoc manner, on the basis of opaque assumptions, and without the benefit of empirically tested guidance.
We begin to address this gap by developing a systematic approach to georeferencing that articulates the strategies available, empirically diagnoses problems of bias created by both the data generating process and researcher-controlled tasks, and provides new generalizable tools for simultaneously optimizing both the recovery and accuracy of coordinates. We are able to do this by exploiting a new source of conflict data from the Mau Mau uprising in late colonial Kenya.
Drawn from over 20,000 pages of archival records, comprising raw intelligence reports written by British security personnel, our events contain either a natural language location description, or a precise military map code, or both. This allows us to explicitly test differences between georeferencing strategies and their relative performance by leveraging comparisons between imputed and military coordinates.
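To make that comparison concrete, the minimal Python sketch below (an illustration only, not the pipeline developed in this article) imputes a coordinate for a free-text location through a toy gazetteer lookup and scores it against a ground-truth point standing in for a decoded military map code, using great-circle (haversine) distance. The gazetteer entries, place names, and coordinates are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical gazetteer: free-text place name -> (lat, lon).
# A real pipeline would draw on GeoNames or a historical gazetteer.
GAZETTEER = {
    "nyeri": (-0.4169, 36.9510),
    "fort hall": (-0.7839, 37.1526),   # present-day Murang'a
    "nairobi": (-1.2864, 36.8172),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def impute_coordinates(description):
    """Toy georeferencing strategy: exact gazetteer match on the lowercased text."""
    return GAZETTEER.get(description.strip().lower())

# One event carrying both a natural-language description and a 'ground-truth'
# coordinate (standing in here for a decoded military map code).
event = {"description": "Fort Hall", "map_code_latlon": (-0.79, 37.15)}

imputed = impute_coordinates(event["description"])
if imputed is None:
    print("recovery failure: no gazetteer match")
else:
    error_km = haversine_km(*imputed, *event["map_code_latlon"])
    print(f"positional error: {error_km:.1f} km")
```

Recovery (whether any coordinate is returned at all) and accuracy (the positional error when one is) can then be tallied across many such events to characterize a strategy's performance.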
The article proceeds as follows: the next two sections discuss existing debates over error and bias in spatial data and provide a brief survey of how georeferencing decisions are made across contemporary conflict research. We next introduce a new dataset of conflict events during the Mau Mau rebellion, describing the archival records and how we sampled and coded them. We then discuss the tasks, decisions, and problems of georeferencing event data, including the trade-offs that various strategies entail for recovery versus accuracy. We also develop two new ensemble methods of georeferencing that leverage empirical diagnostic information to overcome these trade-offs, flexibly combining strategies and data sources to compensate for the weaknesses of any one. Finally, we empirically demonstrate how inappropriately georeferenced data can have important downstream consequences in terms of systematically biasing coefficients and altering statistical significance.
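The ensemble methods themselves are developed later in the article; purely as a generic illustration of the underlying idea, the Python sketch below chains two hypothetical strategies with different recovery/accuracy profiles, an exact gazetteer match and a coarser administrative-centroid fallback, and keeps the first that succeeds. All names, coordinates, and accuracy figures are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

Coordinate = Tuple[float, float]
# A strategy returns ((lat, lon), rough accuracy estimate in km) or None on failure.
Strategy = Callable[[str], Optional[Tuple[Coordinate, float]]]

def exact_gazetteer(text: str) -> Optional[Tuple[Coordinate, float]]:
    """High accuracy, low recovery: resolves only names it knows verbatim."""
    gazetteer = {"nyeri": (-0.4169, 36.9510)}          # hypothetical entries
    hit = gazetteer.get(text.strip().lower())
    return (hit, 1.0) if hit else None

def district_centroid(text: str) -> Optional[Tuple[Coordinate, float]]:
    """Low accuracy, high recovery: falls back to an administrative-unit centroid."""
    centroids = {"central province": (-0.65, 36.95)}   # hypothetical entries
    for name, latlon in centroids.items():
        if name in text.lower():
            return (latlon, 40.0)                      # error on the order of tens of km
    return None

def georeference(text: str, strategies: List[Strategy]) -> Optional[Tuple[Coordinate, float]]:
    """Try strategies in order of expected accuracy; keep the first that succeeds."""
    for strategy in strategies:
        result = strategy(text)
        if result is not None:
            return result
    return None  # recovery failure: the event remains un-georeferenced

print(georeference("patrol contact near Nyeri township, Central Province",
                   [exact_gazetteer, district_centroid]))
# -> ((-0.65, 36.95), 40.0): recovered only via the coarser fallback strategy
```

How the strategies are ordered, and whether a less precise but more reliable source should ever be preferred, is precisely the kind of trade-off that the empirical diagnostics discussed below are meant to inform.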
Error and bias in spatial conflict event data
Scholars have turned increasingly to analyzing subnational variation in protests, rebel violence, and repression. These works have transformed our understanding of contentious politics, allowing us to evaluate theories on the local dynamics of violence, the influence of terrain and infrastructure on war, and the myriad strategic interactions between states, their challengers, and civilians. This rigorous micro-level work depends on georeferenced conflict event datasets, which have proliferated in recent years. Regional or global event datasets include the Armed Conflict Location and Event Dataset (ACLED) (Raleigh et al., 2010), the UCDP Georeferenced Event Dataset (UCDP GED) (Sundberg & Melander, 2013), and the Social Conflict in Africa Database (SCAD) (Salehyan et al., 2012), among others.¹ Intensive endeavors to compile georeferenced event data on particular conflicts have complemented such cross-national efforts (see Table I).
This increasing reliance on event data raises important questions over the data generating process and how systematic sources of bias could undermine valid causal inferences. Conflict event datasets never represent full ‘ground truth’. Rather, social actors with their own perspectives and agendas select events for observation, recording, and archiving (Woolley, 2000: 157). Journalistic sources, from which many event data have been coded, have come under heavy criticism for the small fraction of total events they capture, for coverage fatigue, and for bias towards large-scale, violent, and urban events (Baum & Zhukov, 2015; Davenport, 2010: 7; Davenport & Ball, 2002; Earl et al., 2004; Eck, 2012; O’Loughlin et al., 2010; Weidmann, 2016; Woolley, 2000). Archival records, on the other hand, are usually generated by government actors who have their own motives for collecting information during conflict. Indeed, they often selectively destroy or censor records and may systematically undercount civilian deaths resulting from their own operations (Balcells & Sullivan, 2018; Bennett, 2013: 3; Byman, 2013: 36).
Error and bias induced by the researcher-controlled processes of data recovery, extraction, and coding have received far less attention. Machine coding has been critiqued as a method of event compilation due to its apparently amplified urban bias and typical reliance on English language news sources (Hammond & Weidmann, 2014;
Table I. Sample of single country georeferenced event datasets

Contemporary
Afghanistan 2004–11        Berman et al. (2011); O’Loughlin et al. (2010)
Chechnya 2000–05           Lyall (2009)
Colombia 1988–2000         Albertus & Kaplan (2012)
India 1984–96              Hoelscher, Miklian & Vadlamannati (2012)
Iraq 2004–10               Berman, Shapiro & Felter (2011); Condra & Shapiro (2012)
Israel/Palestine 2000–05   Benmelech, Berrebi & Klor (2015)
North Caucasus 2000–08     Toft & Zhukov (2012); Zhukov (2012)
Northern Ireland 1968–98   Loyle, Sullivan & Davenport (2014)
Pakistan 1988–2011         Bueno de Mesquita et al. (2015)
Philippines 1997–2010      Berman et al. (2011); Crost, Felter & Johnston (2014)

Historic
Greece 1943–44             Kalyvas (2006); Kalyvas & Kocher (2007)
Guatemala 1975–85          Sullivan (2016)
Spain 1936–39              Balcells (2010)
Ukraine 1943–55            Zhukov (2015)
Vietnam 1965–75            Douglass (2016); Kocher, Pepinsky & Kalyvas (2011)
¹ See also the University of Maryland’s Study of Terrorism and Responses to Terrorism (START), the RAND Database of Worldwide Terrorism Incidents (RDWTI), the Integrated Crisis Early Warning System (ICEWS), and the Global Database of Events, Language, and Tone (GDELT).