Assessing Sampling Error in Pseudo‐Panel Models

Published date: 01 June 2021
Author: Rumman Khan
DOI: http://doi.org/10.1111/obes.12416
Assessing Sampling Error in Pseudo-Panel Models
RUMMAN KHAN
School of Economics, University of Bristol, Bristol, UK (e-mail: rumman.khan.econ@gmail.com)
Abstract
While pseudo-panels are useful when only repeated cross-section data are available,
estimates are likely to be attenuated and suffer from sampling error if cell sizes
(the number of individuals grouped together in a cohort) are too small. However, there is no
consensus on how large cell sizes need to be, with recommendations ranging from 100
to several thousand. This is due to sampling error being affected by both cell size and
three important types of variation in the cohort data (across and within cohorts and
over time). We combine these into a single metric, called CAWAR, and demonstrate
its relationship to sampling error using Monte Carlo simulations and an empirical
application. We produce recommended values for CAWAR beyond which sampling
error bias is minimal and from these one can easily calculate the required cell size.
I. Introduction
The advantages of using panel data, which include both time-series and cross-section
dimensions, for empirical analysis are well known. However, in many settings, such
data may not be available due to the cost and difficulty of following the same set of
individuals over time. Instead, what is often available are repeated cross-sections
(henceforth RCS), where a different set of individuals is observed in each wave.
Pseudo-panels allow panel-type estimation with RCS data by grouping individuals into
cohorts based on common characteristics that are fixed over time, a popular example
being the birth year of individuals, and treating the cohort means as if they are
observations in an actual panel.
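The grouping step described above can be sketched as follows. This is a minimal illustration, not the author's code: the records, the choice of birth year as the cohort variable, and the helper name build_pseudo_panel are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repeated cross-section records: (survey_wave, birth_year, outcome).
# Different individuals are observed in each wave; the values are made up.
records = [
    (2000, 1960, 10.0), (2000, 1960, 12.0), (2000, 1970, 8.0), (2000, 1970, 9.0),
    (2005, 1960, 13.0), (2005, 1960, 15.0), (2005, 1970, 11.0), (2005, 1970, 10.0),
]

def build_pseudo_panel(rows):
    """Group individuals into (cohort, wave) cells and return (cell mean, cell size)."""
    cells = defaultdict(list)
    for wave, birth_year, outcome in rows:
        cells[(birth_year, wave)].append(outcome)
    return {key: (mean(values), len(values)) for key, values in sorted(cells.items())}

pseudo_panel = build_pseudo_panel(records)
for (cohort, wave), (cell_mean, cell_size) in pseudo_panel.items():
    # Each row now plays the role of one observation in an actual panel.
    print(cohort, wave, cell_mean, cell_size)
```

Each birth-year cohort is then followed across waves through its cell mean, even though no individual is observed twice. The cell size is kept alongside the mean because it governs how noisy that mean is.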
Pseudo-panels offer several advantages. Firstly, they are likely to be more
representative of the underlying population as available panels tend to have smaller
samples than nationally representative datasets (that are often RCS), such as household
surveys or demographic and health surveys, or to cover only a specific subset of the
population. Secondly, panels tend to cover shorter time periods making them
unsuitable for long-run analysis; even when panel data are available over a long span
of time, non-random attrition and non-response can cause selection bias. In contrast,
RCS data suffer much less from such concerns as each wave surveys a different set of
individuals. Hence, many microeconomic datasets available across a number of
decades are RCS. Well-known examples include the UK's Family Expenditure Survey
and the USA's Current Population Survey. Finally, the averaging process involved in
constructing pseudo-panels eliminates individual-level measurement error that is
prevalent in survey data; measurement error is also less systematic given each
individual is observed only once in the sample.
JEL Classification numbers: C18, C21, C23, C81.
© 2021 The Department of Economics, University of Oxford and John Wiley & Sons Ltd.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 83, 3 (2021) 0305-9049
Application of pseudo-panels began with models of consumption and labour supply
where only RCS data were available but cross-sectional estimates suffered from
omitted variable bias due to the presence of fixed effects at the household or individual
level. With panel data, this is easily resolved by, for example, using a within
transformation. Deaton (1985) demonstrated how such models could also be
consistently estimated in a similar fashion using pseudo-panels with cohort fixed
effects. Browning, Deaton and Irish (1985) were the first to estimate such models
empirically and pseudo-panels have since been applied in a wide range of empirical
settings; some recent examples include Fulford (2014), Meng et al. (2014) and
Arestoff and Djemai (2016).
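To illustrate why cohort fixed effects matter, the sketch below applies a within transformation to a small set of cohort means. It is a noise-free toy example under stated assumptions: the data are invented so that the true slope is exactly 2 and the cohort effects are negatively correlated with the regressor, and the helper names are hypothetical. It does not reproduce Deaton's (1985) estimator, which additionally treats the cohort means as error-ridden measurements.

```python
from collections import defaultdict

# Hypothetical cohort-level means: (cohort, x, y) with y = alpha_cohort + 2 * x.
# Cohort "A" has alpha = 5, cohort "B" has alpha = -10.
cohort_means = [
    ("A", 1.0, 7.0), ("A", 2.0, 9.0), ("A", 3.0, 11.0),
    ("B", 4.0, -2.0), ("B", 5.0, 0.0), ("B", 6.0, 2.0),
]

def ols_slope(xs, ys):
    """Slope of a simple regression of ys on xs (with intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_slope(rows):
    """Demean x and y by cohort, then regress demeaned y on demeaned x."""
    by_cohort = defaultdict(list)
    for cohort, x, y in rows:
        by_cohort[cohort].append((x, y))
    x_dev, y_dev = [], []
    for obs in by_cohort.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        x_dev.extend(x - x_bar for x, _ in obs)
        y_dev.extend(y - y_bar for _, y in obs)
    # Demeaned series have mean zero, so no intercept is needed.
    return sum(a * b for a, b in zip(x_dev, y_dev)) / sum(a * a for a in x_dev)

pooled = ols_slope([x for _, x, _ in cohort_means], [y for _, _, y in cohort_means])
within = within_slope(cohort_means)
print("pooled slope:", pooled)
print("within slope:", within)
```

In this constructed example the pooled regression has the wrong sign because it confounds the slope with the cohort effects, whereas the within transformation removes the cohort effects and recovers the slope used to generate the data.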
The pseudo-panel framework has since been expanded to enable estimation of other
panel models with RCS data. Of these, the most developed have been dynamic models,
with theoretical contributions from Moffitt (1993), Collado (1997), McKenzie (2004)
and Verbeek and Vella (2005) as well as a number of empirical applications, including
Propper, Rees and Green (2001), Antman and McKenzie (2007) and Cuesta, Nopo and
Pizzolitto (2011). Other extensions of the pseudo-panel framework include allowing
the fixed effects to be multiplicative rather than additive as Deaton assumed (Juodis,
2018), inclusion of spatial effects (Baltagi, Bresson and Etienne, 2015), and quantile
regressions (Imai et al., 2014). Thus pseudo-panels continue to be a fecund area for
both theoretical and applied research.
The main drawback of pseudo-panel models is sampling error, which arises when the
cohort sample means are not representative of the underlying cohort population means
and for which the literature has been unable to find an adequate solution. As cell size
(the number of individuals within a cohort) tends to
infinity, cohort samples become more representative of the population and sampling
error is reduced. However, how large cell sizes need to be in practice is more difficult
to ascertain as sampling error also depends on the level of variation created in the
cohort data. The latter has a large effect on the cell size required to minimize sampling
error bias; in some cases a cell size of 100 may be sufficient (Verbeek and Nijman,
1992), while in others, substantial bias can persist even with cell sizes in the thousands
(Devereux, 2007a). As the majority of applied pseudo-panel studies have cell sizes
between 100 and 500, one cannot judge whether sampling error has been adequately
addressed without explicit reference to the variation created in the cohort data.
Therefore, as no practical measures for assessing the cohort level variation exist, the
consistency of pseudo-panel models will always be in question unless cell sizes are in
the thousands. However, the latter is too restrictive a requirement to be relevant in
practice and negates many of the data advantages of using RCS data.
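The interplay between cell size and cohort-level variation can be sketched with a small simulation. The first function estimates, by Monte Carlo, how far a cohort sample mean strays from its population mean for a given cell size. The second uses the textbook classical errors-in-variables approximation, in which the estimate is scaled by the ratio of true variation to true variation plus sampling-error variance. Both are stylised assumptions made for illustration (normal draws, unit within-cell variance, hypothetical function names and parameter values); they are not the CAWAR metric developed in this paper.

```python
import random
from statistics import pstdev

def sampling_error_sd(cell_size, sigma=1.0, reps=300, seed=0):
    """Monte Carlo SD of a cohort sample mean around a true cohort mean of zero."""
    rng = random.Random(seed)
    sample_means = [
        sum(rng.gauss(0.0, sigma) for _ in range(cell_size)) / cell_size
        for _ in range(reps)
    ]
    return pstdev(sample_means)

def attenuation_factor(true_var, within_cell_var, cell_size):
    """Classical errors-in-variables factor: signal / (signal + noise).

    true_var is the variation in the true cohort means that identifies the
    coefficient; within_cell_var / cell_size is the sampling-error variance
    of an observed cohort mean. Stylised assumption, not the paper's formula.
    """
    return true_var / (true_var + within_cell_var / cell_size)

# Sampling error shrinks roughly with the square root of the cell size.
sd_by_size = {n: sampling_error_sd(n) for n in (100, 400, 1600)}
for n, sd in sd_by_size.items():
    print(f"cell size {n:5d}: simulated SD of cohort mean = {sd:.4f}")

# The same cell size can be adequate or inadequate depending on how much
# true variation the cohort means contain.
for true_var in (1.0, 0.01):
    for n in (100, 1600):
        factor = attenuation_factor(true_var, 1.0, n)
        print(f"true variance {true_var}, cell size {n}: factor = {factor:.3f}")
```

Under these assumptions, a cell size of 100 leaves the estimate almost unaffected when the true cohort means vary a great deal, but roughly halves it when they vary little, which is consistent with the wide range of recommended cell sizes noted above.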
To get more practical and precise values for the required cell sizes, we identify
three sources of variation at the cohort level that the literature has shown to be