A Partially Heterogeneous Framework for Analyzing Panel Data

Published date: 01 April 2015
DOI: http://doi.org/10.1111/obes.12062
©2014 The Department of Economics, University of Oxford and John Wiley & Sons Ltd.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 77, 2 (2015) 0305–9049
A Partially Heterogeneous Framework for Analyzing Panel Data*
Vasilis Sarafidis† and Neville Weber
Department of Econometrics and Business Statistics, Caulfield East, VIC 3145, Australia
(e-mail: vasilis.sarafidis@monash.edu)
School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia
(e-mail: neville.weber@sydney.edu.au)
Abstract
This article proposes a partially heterogeneous framework for the analysis of panel data
with fixed T. In particular, the population of cross-sectional units is grouped into clusters,
such that slope parameter homogeneity is maintained only within clusters. Our method
assumes no a priori information about the number of clusters and cluster membership and
relies on the data instead. The unknown number of clusters and the corresponding partition
are determined based on the concept of ‘partitional clustering’, using an information-based
criterion. It is shown that this criterion is strongly consistent, that is, it selects the true number of
clusters with probability one as N→∞. Simulation experiments show that the proposed
criterion performs well even with moderate N and the resulting parameter estimates are
close to the true values. We apply the method to a panel data set of commercial banks in
the US and we find five clusters, with significant differences in the slope parameters across
clusters.
I. Introduction
Slope parameter homogeneity is often an assumption that is difficult to justify in panel
data models, both on theoretical grounds and from a practical point of view. On the other
hand, the alternative of imposing no structure on how these coefficients may vary across
individual units may be rather extreme. This argument is in line with evidence provided
by a substantial body of applied work. For instance, Baltagi and Griffin (1997) reject the
hypothesis of coefficient homogeneity in a panel of gasoline demand regressions across the
OECD countries and Burnside (1996) rejects the hypothesis of homogeneous production
function parameters in a panel of US manufacturing industries. Even so, both studies show
*We are grateful to two anonymous referees for helpful comments and suggestions. Excellent research assistance
was provided by Genliang Guan. We have also benefited from helpful comments by Geert Dhaene, Daniel Oron,
Tom Wansbeek, Yuehua Wu and seminar participants at the Erasmus University Rotterdam, University of Leuven,
University of York and the Tinbergen Institute. Financial support from the Research Unit of the Faculty of Economics
and Business at University of Sydney is gratefully acknowledged.
JEL Classification numbers: C13, C33, C51.
that fully heterogeneous models lead to very imprecise estimates of the parameters, which
in some cases even have the wrong sign. Baltagi and Griffin note that this is the case
despite the fact that there is a relatively long time series in the panel – to the extent that the
traditional pooled estimators are superior in terms of root mean square error and forecasting
performance. Furthermore, Burnside suggests that in general his estimates show significant
differences between the homogeneous and the heterogeneous models and the conclusions
about the degree of returns to scale in the manufacturing industry would heavily depend on
which one of these two models is used. As pointed out by Browning and Carro (2007), there
is usually a lot more heterogeneity than what empirical researchers allow for in econometric
modelling, although the level of heterogeneity and how one allows for it can make a large
difference for outcomes of interest.
This article takes the view that the modelling frameworks of slope parameter homogeneity
(pooling) and full heterogeneity may be polar cases and other intermediate cases
may often provide more realistic solutions in practice. One such example is the pooled
mean group estimator of Pesaran, Shin and Smith (1999) which imposes homogeneity
restrictions with respect to the long-run coefficients of the model, for reasons attributed
to budget constraints, arbitrage conditions and common technologies, while the short-run
dynamics are left completely unrestricted. Here we propose a modelling framework that
imposes partially heterogeneous restrictions not with respect to the dynamics of the model,
but with respect to the cross-sectional dimension, N. In particular, the population of cross-
sectional units is grouped into distinct clusters, such that within each cluster the slope
parameters are homogeneous and all intra-cluster heterogeneity is attributed to a function
of unobserved individual-specific and/or time-specific effects. The clusters themselves are
heterogeneous, that is, the slope parameters vary across clusters.
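
To make the structure described above concrete, one simple special case can be written as follows. The notation is an assumption introduced here for illustration only and is not taken from the article: \Omega denotes the number of clusters, C_{\omega} the set of units in cluster \omega, \beta_{\omega} the cluster-specific slope vector, and \eta_{i} an individual-specific effect (time-specific effects could be added in the same way).

```latex
\[
  y_{it} = \beta_{\omega}^{\prime} x_{it} + \eta_{i} + \varepsilon_{it},
  \qquad i \in C_{\omega}, \quad \omega = 1, \dots, \Omega, \quad t = 1, \dots, T,
\]
\[
  \beta_{\omega} \neq \beta_{\omega'} \quad \text{for } \omega \neq \omega'.
\]
```

Setting \Omega = 1 gives the pooled (homogeneous) model, while \Omega = N with one unit per cluster gives the fully heterogeneous model, so the two polar cases are nested in this sketch.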
Naturally, the practical issue of how to group the individuals into clusters is central
in the article. If there is a priori information about cluster membership and the number
of clusters, the problem reduces to a split-sample standard panel data regression. In many
cases, while it might be plausible to think of a set of factors to which slope parameter
heterogeneity can be attributed, such as differences in tastes, beliefs, abilities, skills or
constraints, these are often unobserved and moreover provide no guidance as to what
the appropriate partitioning is, or how many clusters exist. In addition, there are often
several ways to partition the sample and while the formed clusters may be economically
meaningful, they may not be optimal from a statistical point of view.
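
When cluster membership is known a priori, the split-sample regression mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's estimator: it assumes a balanced panel, strictly exogenous regressors and additive individual effects, which are removed by the within transformation. The function names `simulate_panel` and `split_sample_fe`, and the simulated design, are hypothetical and chosen only for this example.

```python
import numpy as np


def simulate_panel(n_per_cluster, T, betas, seed=0):
    """Simulate a balanced panel whose slopes differ across known clusters."""
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)           # shape (clusters, K)
    labels = np.repeat(np.arange(len(betas)), n_per_cluster)
    N, K = labels.size, betas.shape[1]
    X = rng.normal(size=(N, T, K))                   # regressors
    eta = rng.normal(size=(N, 1))                    # individual effects
    eps = 0.5 * rng.normal(size=(N, T))              # idiosyncratic errors
    y = np.einsum("ntk,nk->nt", X, betas[labels]) + eta + eps
    return y, X, labels


def within_transform(a):
    """Subtract each unit's time average, which removes individual effects."""
    return a - a.mean(axis=1, keepdims=True)


def split_sample_fe(y, X, labels):
    """Run within (fixed-effects) OLS separately on each known cluster.

    y: (N, T) outcomes; X: (N, T, K) regressors; labels: (N,) cluster ids.
    Returns a dict mapping cluster id to its (K,) slope estimate.
    """
    y_dm, X_dm = within_transform(y), within_transform(X)
    slopes = {}
    for c in np.unique(labels):
        members = labels == c
        Xc = X_dm[members].reshape(-1, X.shape[2])   # stack units and periods
        yc = y_dm[members].reshape(-1)
        beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        slopes[int(c)] = beta
    return slopes


# Two clusters with different slope vectors (illustrative values).
true_betas = [[1.0, -0.5], [0.2, 0.8]]
y, X, labels = simulate_panel(n_per_cluster=200, T=5, betas=true_betas)
estimates = split_sample_fe(y, X, labels)
for c, b in estimates.items():
    print(c, np.round(b, 3))
```

The sketch presupposes that `labels` is given. When it is not, as the paragraph above argues is the usual situation, both the partition and the number of clusters have to be inferred from the data.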
Clustering methods have already been advocated in the econometric panel data liter-
ature by some researchers; for instance, Durlauf and Johnson (1995) propose clustering
the individuals using regression tree analysis and Vahid (1999) suggests a classification
algorithm based on a measure of complexity using the principles of minimum description
length and minimum message length, which are often employed in coding theory.1 Both
these methods are based on the concept of hierarchical clustering, which involves building
a ‘hierarchy’ from the individual units by progressively merging them into larger clusters.
The proposed algorithms provide a consistent estimate of the true number of clusters for
T→∞ only. In contrast, this article proposes estimating the unknown number of
1 Kapetanios (2006) proposes an information criterion, based on simulated annealing, to address a related problem:
in particular, how to decompose a set of series into a set of poolable series for which there is evidence of a common
parameter subvector and a set of series for which there is no such evidence.
