An Introduction to Multilevel Regression and Post-Stratification for Estimating Constituency Opinion

DOI: 10.1177/1478929919864773
Published date: 01 November 2020
Subject matter: Professional Section: Methods
https://doi.org/10.1177/1478929919864773
Political Studies Review
2020, Vol. 18(4) 630–645
© The Author(s) 2019
Chris Hanretty
Abstract
This article provides an overview of multilevel regression and post-stratification. It reviews the
stages in estimating opinion for small areas, identifies circumstances in which multilevel regression
and post-stratification can go wrong, or go right, and provides a worked example for the UK using
publicly available data sources and a previously published post-stratification frame.
Keywords
small area estimation, multilevel regression and post-stratification, public opinion, Bayesian
methods, R
Accepted: 30 June 2019
Introduction
Multilevel regression and post-stratification (MRP) is a technique for estimating public
opinion in small areas using large national samples. ‘Small areas’ are usually anything
smaller than nations, and past work using MRP has produced estimates for areas ranging
from those as large as US states (average population: 6.5 million) to those as small as
Westminster constituencies (average population: 100,000) (Hanretty et al., 2018; Park et al., 2004). ‘Large
samples’ also vary in size, and depend on the context: some MRP work (typically work
producing estimates for a small number of small areas) has used national samples of
around 1500 (Leemann and Wasserfallen, 2017), while some very large work in election
forecasting has used samples of more than 80,000 (Lauderdale et al., submitted).
Researchers and practitioners use MRP because they are interested in subnational
opinion. Different people can be interested in subnational opinion for different reasons.
Political scientists tend to be interested in subnational opinion because they are interested
in whether subnational opinion is reflected in legislatures. Election forecasters are
interested in subnational opinion because in many electoral systems national vote shares
are a poor guide to relevant electoral outcomes. Others still may be interested in
subnational opinion for commercial reasons.

Royal Holloway, University of London, Egham, UK
Corresponding author:
Chris Hanretty, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK.
Email: chris.hanretty@rhul.ac.uk

MRP is used because the alternatives are either very poor or very expensive. A poor
alternative is simply splitting a large sample into (much) smaller geographic subsamples.
This is a poor alternative because there is no guarantee that a sample which is
representative at the national level will be representative when it is broken down into
smaller groups. This approach is also only possible when the number of respondents per
small area is relatively large. Lax and Phillips (2009a) combine four different national
surveys on same-sex marriage into a ‘mega-poll’ of 6458 respondents. The expected number
of respondents per state is therefore around 130. Splitting up a large sample is a
plausible strategy with this many respondents per state. It would not be plausible for
estimating opinion in the 435 congressional districts (expected respondents per area: 15).
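The arithmetic behind these figures can be checked with a short sketch. It assumes, purely for illustration, that respondents are spread evenly across areas and that each subsample behaves like a simple random sample; the function names are hypothetical, and the margin of error uses the textbook worst-case formula for a proportion of 0.5 rather than anything taken from the studies cited.

```python
import math


def expected_per_area(sample_size: int, n_areas: int) -> float:
    """Average respondents per area, assuming an even split (a simplification)."""
    return sample_size / n_areas


def worst_case_margin(n: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion near 0.5,
    assuming simple random sampling within the area."""
    return z * math.sqrt(0.25 / n)


mega_poll = 6458  # pooled sample size quoted in the text
per_state = expected_per_area(mega_poll, 50)      # 50 US states
per_district = expected_per_area(mega_poll, 435)  # 435 congressional districts

moe_state = worst_case_margin(per_state)
moe_district = worst_case_margin(per_district)

print(f"per state:    {per_state:.0f} respondents, margin about {moe_state:.2f}")
print(f"per district: {per_district:.0f} respondents, margin about {moe_district:.2f}")
```

Under these assumptions the margin for a typical state is under ten percentage points, while for a typical congressional district it is roughly a quarter of the whole scale, which is why simple disaggregation stops being useful as areas get smaller.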
An expensive alternative is conducting polls in each small area for which we want to
estimate public opinion. This strategy is possible where the number of small areas is
relatively small. Most of the 50 US states are large enough to support state polling
companies. It would be possible, though expensive, to conduct surveys in each of these
states. It would not, however, be feasible to conduct a survey in each of the 650
Westminster constituencies: any research design which involves surveying more than half
a million people is probably beyond the reach of any private concern.
Because these alternatives are often not possible or not desirable, and because MRP is
now an established technique, many researchers are curious about the analyses MRP
makes possible. There is therefore a need for a guide which sets out, in practical terms,
the issues involved in producing MRP estimates of opinion for small areas, and which
provides a template for such analyses.
This document tries to provide such a template. I begin by describing the history of
MRP, before describing the principal stages in any MRP analysis. I then set out the scope
of MRP, the design considerations, and other practical issues relating to implementation.
I finally provide code and a worked example for researchers working in the UK.
The History of MRP
The basic idea behind MRP is that it is possible to group people into different types based
on their sociodemographic characteristics, and to make predictions for each of those types
on the basis of an appropriate statistical model. These predictions can then be used to
make estimates for small areas if we can use or generate information on how many voters
of each type are present in each area.
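The post-stratification step described here can be sketched in a few lines. Everything in the sketch is hypothetical: the voter types, the predicted levels of support, the area names and the counts are invented for illustration, and the predictions are simply typed in rather than produced by a fitted multilevel model. For simplicity the predictions vary only by type, whereas in practice a model's predictions would usually also differ between areas.

```python
from collections import defaultdict

# Hypothetical model predictions: probability of supporting a policy, by voter type.
# In a real analysis these would come from a fitted multilevel regression.
predicted_support = {
    ("18-34", "degree"): 0.70,
    ("18-34", "no degree"): 0.55,
    ("35+", "degree"): 0.50,
    ("35+", "no degree"): 0.30,
}

# Hypothetical post-stratification frame: (area, voter type, number of people).
frame = [
    ("Area A", ("18-34", "degree"), 3000),
    ("Area A", ("18-34", "no degree"), 2000),
    ("Area A", ("35+", "degree"), 3000),
    ("Area A", ("35+", "no degree"), 2000),
    ("Area B", ("18-34", "degree"), 1000),
    ("Area B", ("18-34", "no degree"), 2000),
    ("Area B", ("35+", "degree"), 2000),
    ("Area B", ("35+", "no degree"), 5000),
]


def poststratify(predictions, frame):
    """Population-weighted average of type-level predictions within each area."""
    population = defaultdict(float)
    supporters = defaultdict(float)
    for area, voter_type, count in frame:
        population[area] += count
        supporters[area] += count * predictions[voter_type]
    return {area: supporters[area] / population[area] for area in population}


estimates = poststratify(predicted_support, frame)
for area, share in estimates.items():
    print(f"{area}: {share:.2f}")
```

The two invented areas receive different estimates only because they contain different mixes of voter types, which is the sense in which information on how many voters of each type live in each area turns type-level predictions into area-level estimates.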
This basic idea emerged in the 1960s (see the discussion in Park et al. (2004)), but it was
not until the early 2000s that statistical modelling techniques were sufficiently
well developed to make MRP feasible for (advanced) applied researchers. A methodological
paper by Park et al. (2004) was quickly followed by substantive applications in the United
States (Lax and Phillips, 2009b; Warshaw and Rodden, 2012). MRP methods were subsequently
applied in the UK (Hanretty et al., 2018), Switzerland (Leemann and Wasserfallen,
2017) and (in a slightly different form) Germany (Selb and Munzert, 2011).
Since that time, development of MRP as a method has focused on how to use the richest
possible post-stratification frames (Lauderdale et al., submitted; Leemann and Wasserfallen,
2017) and how to ensure that the multilevel regression models employ as rich and as extensive
a range of predictor variables as possible (Ghitza and Gelman, 2013; Goplerud et al., 2018).
