To Pool or Not to Pool: Revisited

Published date: 01 April 2018
DOI: http://doi.org/10.1111/obes.12220
©2017 The Department of Economics, University of Oxford and John Wiley & Sons Ltd.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 80, 2 (2018) 0305–9049
To Pool or Not to Pool: Revisited*
M. Hashem Pesaran† and Qiankun Zhou
Department of Economics, USC Dornsife INET, University of Southern California, and
Trinity College, Cambridge, UK (e-mail: pesaran@usc.edu)
Department of Economics, Louisiana State University, Baton Rouge, LA, 70803,
USA (e-mail: qzhou@lsu.edu)
Abstract
This paper provides a new comparative analysis of pooled least squares and fixed effects
(FE) estimators of the slope coefficients in the case of panel data models when the time
dimension (T) is fixed while the cross section dimension (N) is allowed to increase without
bounds. The individual effects are allowed to be correlated with the regressors, and the
comparison is carried out in terms of an exponent coefficient, $\delta$, which measures the degree
of pervasiveness of the FE in the panel. The use of $\delta$ allows us to distinguish between
poolability of small N dimensional panels with large T from large N dimensional panels
with small T. It is shown that the pooled estimator remains consistent so long as $\delta < 1$, and
is asymptotically normally distributed if $\delta < 1/2$, for a fixed T and as $N \to \infty$. It is further
shown that when $\delta < 1/2$, the pooled estimator is more efficient than the FE estimator. We
also propose a Hausman type diagnostic test of $\delta < 1/2$ as a simple test of poolability, and
propose a pretest estimator that could be used in practice. Monte Carlo evidence supports
the main theoretical findings and gives some indications of gains to be made from pooling
when $\delta < 1/2$.
JEL Classification numbers: C01, C23, C33.
*We thank the editor, two anonymous referees, Ron Smith and Carlos Lamarche for helpful comments.
I. Introduction
This paper re-examines the issue of pooling in standard panel data models with exogenous
regressors in terms of an exponent coefficient, $0 \le \delta \le 1$, which measures the degree of
pervasiveness of correlated individual effects, defined by
$$\sum_{i=1}^{N} E|\eta_i| = O(N^{\delta}),$$
where N is the cross-section dimension of the panel, and $\eta_i$ is the mean zero random part of
the individual effects. The use of the exponent $\delta$ allows us to distinguish between poolability
of small N dimensional panels with large T from large N dimensional panels with small
T. A set of coefficients could be heterogeneous for a finite N, but can nevertheless be deemed
asymptotically homogeneous if their dispersion tends to zero as $N \to \infty$. We use this
idea to motivate conditions under which pooling is valid in large N dimensional panels,
both when T is fixed and when it rises with N.
Throughout we allow for non-zero correlations between the individual effects and the
regressors, and as a result the pooled estimators will be biased in the standard case where $\delta = 1$.
We show that the choice between the pooled least squares (PLS) estimator and the fixed
effects (FE) estimator depends on the value of $\delta$, with the PLS estimator being consistent
for all values of $\delta$ except when $\delta = 1$. For inference, the validity of the PLS estimator
requires $\delta < 1/2$. Both of these conditions are significantly weaker than the homogeneity
assumption made in the literature requiring that $E|\eta_i| = 0$ for all i. For example, when $\delta = 0$
we could have a finite number of non-zero $E|\eta_i|$, or more generally $E|\eta_i| = K\rho^{i}$, for a
fixed positive constant K, and $0 < \rho < 1$. This corresponds to the sparsity assumption often
made in the context of penalized regressions. But our analysis covers non-sparse structures
by allowing the number of non-zero $E|\eta_i|$s to rise with N but not proportionately. The
degree to which the number of units with non-zero $E|\eta_i|$ is allowed to rise with N is governed
by $\delta$. For example, when $\delta = 1/2$ the number of cross-section units with non-zero random
effects (RE) could rise with N, with the proportion of such units in total declining to
zero at the rate of $N^{-1/2}$.
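To make the counting in the last example concrete, the following minimal Python sketch (not taken from the paper) assumes a stylized sparse design in which exactly floor(N^δ) units have an effect of unit magnitude and all other units have none, so that the sum of |η_i| equals floor(N^δ) and is therefore O(N^δ). The function name and the unit-magnitude design are illustrative assumptions.

```python
import math


def sparse_effects_summary(N, delta):
    """Count and share of units with non-zero effects in a stylized sparse design.

    Assumption (illustrative, not from the paper): exactly floor(N**delta)
    units have |eta_i| = 1 and the remaining units have eta_i = 0.
    """
    m = min(N, math.floor(N ** delta))  # number of units with non-zero eta_i
    return m, m / N                     # (count, proportion of all units)


if __name__ == "__main__":
    for N in (100, 10_000, 1_000_000):
        count, share = sparse_effects_summary(N, 0.5)
        print(f"N={N}: non-zero units={count}, share={share:.4f}")
```

Under this design with δ = 1/2, the count of affected units grows like N^{1/2} while their share of the panel shrinks like N^{-1/2}, which is the pattern described in the text.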
The exponent of pervasiveness of individual effects is also closely related to the
exponent of cross-sectional dependence, $\alpha$, recently introduced in Bailey, Kapetanios and
Pesaran (2016) to measure the degree of cross-sectional dependence in panels. Both
exponents measure the degree of pervasiveness of heterogeneity: $\delta$ relates to the
heterogeneity of the individual effects, and $\alpha$ to the heterogeneity of factor loadings in a panel
data model with a factor error structure. In a broad sense, $\delta$ can also be viewed as an
exponent of cross-sectional dependence applied to the intercepts viewed as a common
factor.
Our analysis complements, and provides further insights into, the discussion of ‘pool or
not to pool’ in the panel literature.¹ See, for example, Baltagi, Griffin and Xiong (2000), and
Baltagi, Bresson and Pirotte (2008). More specifically, we derive the asymptotic properties
of the PLS estimator when N is large and T is fixed for different values of $\delta$, derive
the bias of PLS when $\delta = 1$, and show that the pooled estimator is more efficient than
the FE estimator if $\delta < 1/2$. We also establish the asymptotic equivalence of RE and PLS
estimators when $\delta < 1$. In the case where N and $T \to \infty$, such that $T = O(N^{d})$, for some
d > 0, the condition for poolability generalizes to $\delta < (1 - d)/2$.
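As a quick arithmetic illustration of the generalized condition $\delta < (1 - d)/2$, the values of d below are chosen purely as examples and do not come from the paper. Setting d = 0 corresponds to the fixed T case and recovers the bound of 1/2 stated earlier.

```latex
T = O(N^{d}), \qquad \delta < \frac{1 - d}{2}: \qquad
d = 0 \;\Rightarrow\; \delta < \tfrac{1}{2}, \qquad
d = \tfrac{1}{2} \;\Rightarrow\; \delta < \tfrac{1}{4}.
```

The faster T grows relative to N, the smaller the degree of pervasiveness that pooling can tolerate.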
The analysis of this paper also shows the importance of knowing $\delta$ in the choice between
PLS (or RE) and FE estimators. In the case of large N and T panels, estimation of $\delta$ can
be carried out using the approach of Bailey et al. (2016). But for short T panels, which
are of concern in this paper, such an approach will not be applicable and other suitable
techniques will be required. Accordingly, we propose a Hausman type diagnostic test of
$\delta < (1 - d)/2$ which could be used in practice as a simple test of poolability of panel data
models. Finally, as an alternative strategy, we also propose a pretest estimator using a
Hausman type diagnostic test and derive its asymptotic properties.
¹ There is also a related literature that considers the problem of pooling more generally and discusses the issue
of pooling in the case of panel data models with heterogeneous slopes. As a recent example, see Paap, Wang and
Zhang (2015) and references cited therein. In this paper we focus on the issue of pooling in the context of standard
panel data models with homogeneous slopes. But our approach and generalization of the concept of cross-sectional
heterogeneity can also be applied to panel data models with heterogeneous slopes.
Monte Carlo simulations are conducted to compare the finite sample properties of PLS,
FE and the pretest estimators. The results confirm our main theoretical findings and give
some indication of the magnitudes of the gains involved from pooling when $\delta < (1 - d)/2$.
The Monte Carlo results also place the small sample performance of the pretest estimator
somewhere between those of the PLS and FE estimators, and the pretest estimator is to be
recommended in practice when it is not known whether $\delta < (1 - d)/2$.
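The idea of a pretest estimator can be sketched as follows. This is only a generic illustration for a single slope coefficient based on the textbook Hausman contrast; the paper's own test statistic and pretest estimator are developed in its Sections IV and V, which are not reproduced here, and may differ. The function name, its arguments and the fallback rule for a non-positive variance difference are assumptions made for this sketch. The default critical value 3.841 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
def hausman_pretest(b_pls, b_fe, var_pls, var_fe, critical_value=3.841):
    """Pick between pooled and FE slope estimates via a Hausman-type contrast.

    Hypothetical helper for a scalar slope (not the paper's statistic):
    b_pls, b_fe     -- pooled and fixed effects slope estimates
    var_pls, var_fe -- their estimated variances
    Returns (chosen_estimate, statistic); statistic is None when the
    variance difference is not positive and the contrast is undefined.
    """
    var_diff = var_fe - var_pls
    if var_diff <= 0:
        # Assumption of this sketch: fall back to FE when the contrast
        # cannot be formed.
        return b_fe, None
    stat = (b_fe - b_pls) ** 2 / var_diff
    # Small statistic: no evidence against pooling, keep the pooled estimate.
    chosen = b_pls if stat < critical_value else b_fe
    return chosen, stat


if __name__ == "__main__":
    print(hausman_pretest(1.01, 1.00, 0.0001, 0.0004))  # estimates close
    print(hausman_pretest(1.50, 1.00, 0.0001, 0.0004))  # estimates far apart
```

When the two estimates are close relative to their sampling variability the rule keeps the pooled estimate, and otherwise it switches to FE.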
The rest of the paper is organized as follows. Section II sets out the model and its
assumptions. Section III presents the main theoretical results on the consistency and asymptotic
normality of PLS and FE estimators in terms of different values of $\delta$. The diagnostic test
of poolability is presented in Section IV. The pretest estimator is discussed in Section V.
Monte Carlo simulations are provided in Section VI, with some concluding remarks in
Section VII. All mathematical derivations are provided in the Appendix.
II. Panel data model
Consider the standard panel data model
$$y_{it} = \alpha_i + \beta' x_{it} + u_{it}, \quad \text{for } i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{1}$$
$$\alpha_i = \alpha + \eta_i, \quad \text{for } i = 1, 2, \ldots, N, \tag{2}$$
where $\alpha_i$ are the individual effects, and $x_{it}$ is a $k \times 1$ vector of regressors which we decompose
as
$$x_{it} = \alpha_i g_t + w_{it}, \quad \text{for } i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T, \tag{3}$$
where $\alpha_i g_t$ represents the part of $x_{it}$ which is correlated with the individual effects, $\alpha_i$,
with $g_t$ being a $k \times 1$ vector of time effects, and $w_{it}$ is the part of $x_{it}$ which is distributed
independently of the individual effects. This is a fairly general specification which
allows for non-zero, possibly time-varying, correlations between $x_{it}$ and $\alpha_i$, and allows the
regressors to have individual-specific effects and be cross-sectionally correlated.
Additional individual-specific effects can be included in $x_{it}$ through $w_{it}$. For example, using
equation (3), and assuming that $\bar{g} = T^{-1}\sum_{t=1}^{T} g_t \neq 0$, then
$$\alpha_i = \theta' \bar{x}_i + v_i, \tag{4}$$
where
$$\theta = (\bar{g}'\bar{g})^{-1}\bar{g}, \quad v_i = -(\bar{g}'\bar{g})^{-1}\bar{g}'\bar{w}_i,$$
$$\bar{x}_i = T^{-1}\sum_{t=1}^{T} x_{it}, \quad \text{and} \quad \bar{w}_i = T^{-1}\sum_{t=1}^{T} w_{it},$$
which is the same as the Mundlak (1978) formulation of the individual effects in standard
panel data models.
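The model in equations (1) to (3) can be simulated to see how the pooled and FE estimators behave for different degrees of pervasiveness. The sketch below is not the paper's Monte Carlo design. It assumes a single regressor (k = 1), Rademacher draws for the non-zero η_i, standard normal w_it and u_it, and an arbitrary time-effect vector g_t with non-zero mean. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_panel(N, T, delta, beta=1.0, alpha=1.0, seed=0):
    """Draw (y, x), each of shape (N, T), from a scalar version of (1)-(3).

    Illustrative assumptions: floor(N**delta) units get eta_i = +/-1 with
    equal probability and the rest get eta_i = 0; w_it, u_it ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    m = min(N, int(np.floor(N ** delta)))      # units with non-zero eta_i
    eta = np.zeros(N)
    eta[:m] = rng.choice([-1.0, 1.0], size=m)  # mean zero random part
    alpha_i = alpha + eta                      # equation (2)
    g = np.linspace(0.5, 1.5, T)               # time effects, mean is non-zero
    w = rng.standard_normal((N, T))
    x = alpha_i[:, None] * g[None, :] + w      # equation (3)
    u = rng.standard_normal((N, T))
    y = alpha_i[:, None] + beta * x + u        # equation (1)
    return y, x


def pooled_ls(y, x):
    """Pooled least squares slope with a common intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def fixed_effects(y, x):
    """Within (FE) slope: demean each unit's data over time, then regress."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return float((xd * yd).sum() / (xd ** 2).sum())


if __name__ == "__main__":
    for delta in (0.25, 1.0):
        y, x = simulate_panel(N=2000, T=5, delta=delta, seed=1)
        print(f"delta={delta}: PLS={pooled_ls(y, x):.3f}, FE={fixed_effects(y, x):.3f}")
```

In this stylized design the pooled slope should stay close to the true value when δ is small and move away from it when δ = 1, while the within estimator is unaffected because demeaning removes α_i.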
Throughout we assume T is fixed and carry out our analysis for N large. Except for the
assumption regarding the individual effects, $\alpha_i$, we make the following standard
assumptions: