Classical and Bayesian Inference for Income Distributions using Grouped Data

Published date: 01 February 2021
DOI: http://doi.org/10.1111/obes.12396
Authors: Bastian Gribisch, Tobias Eckernkemper
©2020 The Authors. Oxford Bulletin of Economics and Statistics published by Oxford University and John Wiley & Sons Ltd.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided
the original work is properly cited.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 83, 1 (2021) 0305–9049
doi: 10.1111/obes.12396
Tobias Eckernkemper† and Bastian Gribisch
Institute of Econometrics and Statistics, University of Cologne, Universitaetsstr. 22a,
D-50937 Cologne, Germany (e-mail: eckernkemper@statistik.uni-koeln.de;
bastian.gribisch@statistik.uni-koeln.de)
Abstract
We propose a general framework for Maximum Likelihood (ML) and Bayesian estimation
of income distributions based on grouped data information. The asymptotic properties of
the ML estimators are derived and Bayesian parameter estimates are obtained by Markov
Chain Monte Carlo (MCMC) techniques. A comprehensive simulation experiment shows
that obtained estimates of the income distribution are very precise and that the proposed
estimation framework improves the statistical precision of parameter estimates relative to
the classical multinomial likelihood. The estimation approach is finally applied to a set of
countries included in the World Bank database PovcalNet.
I. Introduction
The empirical analysis of welfare, income inequality and poverty requires precise estimates
of the distribution of income. Overviews of the vast and growing literature on statistical
inference for income distributions are provided, for example, by Kleiber and Kotz (2003),
Chotikapanich (2008) and Bandourian, McDonald and Turley (2003). If the data are fully
released, the distribution can be estimated by standard parametric or non-parametric
methods like Maximum Likelihood (ML) or kernel density estimation. Especially for developing
countries it is, however, common that researchers can only access grouped income data,
which are, for example, provided by the World Bank and the World Institute for Development
Economics Research (WIDER). The data typically consist of population shares and
group-specific mean incomes for 10 to 20 income groups, where the group boundaries
are not provided. This limited data structure causes problems related to partial identification
of unrestricted income distributions and derived inequality measures (see e.g. Cowell,
1991 and Stoye, 2010), and shifts the objective to estimating the parameters of prespecified
parametric income distributions, which are well known to provide a good fit to observed
income data (see e.g. McDonald, 1984, and Hajargasht et al., 2012).
JEL Classification numbers: C21, C51, D31.
The literature provides a variety of parametric income distributions including, but
not limited to, Pareto’s distribution, the lognormal distribution, Champernowne’s
distribution, Fisk’s distribution, and the gamma, generalized gamma, Weibull, Singh–Maddala
and Dagum distributions (see e.g. Kleiber and Kotz, 2003). McDonald (1984) proposed the
generalized beta distribution of the second kind (GB2 distribution), which nests the
lognormal, generalized gamma, Singh–Maddala, Beta-2 and Dagum distributions. Parker (1999)
showed that the GB2 distribution can be derived from microeconomic principles and the
distribution has therefore become very popular in applied economic research. An alternative,
flexible way of income modelling is based on mixture distributions, which are, for
example, analysed by Griffiths and Hajargasht (2012).
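For reference, the GB2 density is commonly written as below. This is the standard textbook parameterization rather than a formula taken from this excerpt; the symbols a, b, p, q (all positive) and the beta function B(p, q) are notation assumed here for illustration.

```latex
f(y;\, a, b, p, q) \;=\;
  \frac{a\, y^{ap-1}}{b^{ap}\, B(p, q)\, \left[\, 1 + (y/b)^{a} \,\right]^{p+q}},
  \qquad y > 0 .
```

In this parameterization, setting p = 1 gives the Singh–Maddala distribution, q = 1 the Dagum distribution, a = 1 the Beta-2 distribution and p = q = 1 the Fisk distribution, while the generalized gamma and lognormal distributions arise as limiting cases.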
Contributions on statistical inference for grouped income data are rare. The traditional
and most frequently applied method is ML based on sample proportions using a multinomial
likelihood function (see e.g. McDonald, 1984, and Bandourian et al., 2003). This
approach is inefficient in the majority of practical applications since it neglects the
information content of observed group means and does not account for unknown group boundaries.
Subsequent work then focused on nonlinear least squares and GMM estimation, where
relative population and income shares are effectively matched to their theoretical
counterparts (see e.g. Wu and Perloff, 2005; Wu, 2006; Chotikapanich, Griffiths and Rao, 2007;
Chotikapanich et al., 2012). Hajargasht et al. (2012) and Griffiths and Hajargasht (2015)
propose GMM frameworks which account for unknown group boundaries and observed
group means but lack a solid statistical foundation with respect to the underlying data
generating process (DGP). Hajargasht and Griffiths (2020) shift the focus from income
distributions to parametric Lorenz curves and provide a GMM framework covering two
DGPs of empirical relevance, and Chen (2018) generalizes the GMM framework to
incorporate varying data information. Bayesian approaches to the estimation of parametric
income distributions are provided by Chotikapanich and Griffiths (2000), Kakamu (2016)
and Kakamu and Nishino (2019). All Bayesian methods employ Markov Chain Monte
Carlo (MCMC) techniques based on the Metropolis-Hastings (MH) algorithm in order
to obtain samples from the parameters’ joint posterior distribution. While Chotikapanich
and Griffiths (2000) employ the standard multinomial likelihood of McDonald (1984), the
recent contributions of Kakamu (2016) and Kakamu and Nishino (2019) employ the joint
likelihood of a set of order statistics as proposed by Nishino and Kakamu (2011), which
is, however, appropriate for quantile data only. Moreover, neither Bayesian setting
accounts for unknown group boundaries, and both ignore the information of observed group mean
incomes.
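The multinomial approach described above can be sketched as follows. This is a minimal illustration, not the estimator used in the cited papers: it assumes a lognormal income distribution, known group boundaries and hypothetical group counts, and it uses a coarse grid search as a stand-in for a numerical optimizer. All function and variable names are invented for this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal income distribution at x > 0."""
    return STD_NORMAL.cdf((math.log(x) - mu) / sigma)

def multinomial_loglik(counts, bounds, mu, sigma):
    """Multinomial log-likelihood (up to an additive constant) of the
    group counts, given known interior group boundaries."""
    cdf = [0.0] + [lognormal_cdf(b, mu, sigma) for b in bounds] + [1.0]
    loglik = 0.0
    for n_j, lo, hi in zip(counts, cdf[:-1], cdf[1:]):
        p_j = max(hi - lo, 1e-300)  # guard against log(0)
        loglik += n_j * math.log(p_j)
    return loglik

# Hypothetical grouped data: four groups split at incomes 1, 2 and 4.
bounds = [1.0, 2.0, 4.0]
counts = [244, 256, 256, 244]

# Coarse grid search over (mu, sigma); an assumption made for simplicity.
candidates = ((m / 100, s / 100) for m in range(0, 201) for s in range(20, 201))
mu_hat, sigma_hat = max(
    candidates, key=lambda ms: multinomial_loglik(counts, bounds, *ms)
)
print(mu_hat, sigma_hat)
```

Only the counts enter this likelihood; observed group means play no role, which is the source of the inefficiency noted above.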
Interestingly, while the recent contributions which account for the informational
content of group mean incomes focus exclusively on GMM, the early work of Hitomi et al.
(2008) already developed an asymptotically efficient Quasi-Maximum Likelihood (QML)
approach incorporating the information of group means under observed and predetermined
group boundaries. Their QML approach is asymptotically equivalent to ML and provides
the same asymptotic properties as the subsequent GMM approaches of Hajargasht et al.
(2012) and Griffiths and Hajargasht (2015). In the present paper we develop a QML
estimation scheme which is similar in nature and asymptotically equivalent to the approach of
Hitomi et al. (2008), and which extends the Hitomi framework to unknown group boundaries and
two different DGPs of practical relevance, which involve likelihoods containing different
data information. Moreover, we find that our QML framework comes particularly close
to the true likelihood for reasonable sample sizes, and combining the derived likelihoods
with prior information therefore allows for the implementation of a straightforward MH
sampling scheme for Bayesian inference. Bayesian estimation using MCMC techniques
is especially attractive for income distributions, since it directly provides valid inference
for nonlinear functions of the distribution parameters, such as the Gini coefficient or the
Headcount ratio. To our knowledge, the proposed setting is the first to incorporate
the information of observed group means into Bayesian estimation of parametric income
distributions under grouped data.
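To illustrate why posterior draws make inference on nonlinear functions straightforward, the sketch below maps draws of a lognormal scale parameter into draws of the Gini coefficient, using the known lognormal result G = 2Φ(σ/√2) − 1. The lognormal choice is an assumption made for simplicity, and the draws are simulated stand-ins, not output of the sampler proposed in the paper.

```python
import math
import random
from statistics import NormalDist, fmean

def lognormal_gini(sigma):
    """Gini coefficient of a lognormal distribution with log-scale sigma."""
    return 2.0 * NormalDist().cdf(sigma / math.sqrt(2.0)) - 1.0

random.seed(0)

# Stand-in for MCMC output: hypothetical posterior draws of sigma.
sigma_draws = [abs(random.gauss(1.0, 0.05)) for _ in range(5000)]

# Transform each draw; the results are draws from the posterior of the Gini.
gini_draws = sorted(lognormal_gini(s) for s in sigma_draws)

gini_mean = fmean(gini_draws)
lower = gini_draws[int(0.025 * len(gini_draws))]      # 2.5% quantile
upper = gini_draws[int(0.975 * len(gini_draws)) - 1]  # 97.5% quantile
print(f"posterior mean {gini_mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

No delta-method approximation is needed: the interval is read directly from the transformed draws.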
We therefore contribute to the literature by offering a comprehensive discussion of
classical and Bayesian estimation of parametric income distributions for grouped income
data with potentially unknown boundaries while accounting for two methods of grouping
observations. The first method (DGP1) builds on proportions of observations in each income
group, which have been fixed prior to sampling. As a result, the group income means
and group boundaries are random. In the second method of grouping (DGP2) the group
boundaries are predetermined prior to sampling. Hence both the number of observations
and the income means in each group are random. Income data from the World Bank or
WIDER typically correspond to DGP1 with unknown group boundaries. Depending on the
type of DGP, the likelihood comprises varying data information including group population
proportions, group means and group boundaries. The multinomial ML method of McDonald
(1984) fits DGP2 with known boundaries and observed population proportions; the
informational content of the group means is ignored. The QML approach of Hitomi et al.
(2008) fits DGP2 with known group boundaries and observed group means and population
proportions. Both likelihoods are misspecified under DGP1. Finally, the order-statistic
based ML approach of Nishino and Kakamu (2011) fits DGP1 with known boundaries but
ignores the informational content of observed mean incomes.
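The difference between the two grouping schemes can be made concrete with a small simulation. The sketch below is illustrative only: the lognormal population, the sample size, the four equal shares and the boundary values are all assumptions, and the function names are invented here.

```python
import math
import random
from statistics import fmean

def group_dgp1(incomes, shares):
    """DGP1: population shares are fixed in advance, so the group
    boundaries (order statistics) and group means are random."""
    xs = sorted(incomes)
    n = len(xs)
    groups, start, cum = [], 0, 0.0
    for i, share in enumerate(shares):
        cum += share
        end = n if i == len(shares) - 1 else round(cum * n)
        groups.append(xs[start:end])
        start = end
    bounds = [g[-1] for g in groups[:-1]]  # largest income in each lower group
    return [len(g) / n for g in groups], [fmean(g) for g in groups], bounds

def group_dgp2(incomes, bounds):
    """DGP2: boundaries are fixed in advance, so the group counts
    (population shares) and group means are random."""
    n = len(incomes)
    edges = [0.0] + list(bounds) + [math.inf]
    groups = [[x for x in incomes if lo < x <= hi]
              for lo, hi in zip(edges[:-1], edges[1:])]
    means = [fmean(g) if g else float("nan") for g in groups]
    return [len(g) / n for g in groups], means, list(bounds)

random.seed(1)
incomes = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]

shares1, means1, bounds1 = group_dgp1(incomes, [0.25, 0.25, 0.25, 0.25])
shares2, means2, bounds2 = group_dgp2(incomes, [0.5, 1.0, 2.0])
print("DGP1 shares (fixed):", shares1, "boundaries (random):", bounds1)
print("DGP2 shares (random):", shares2, "boundaries (fixed):", bounds2)
```

Rerunning with a different seed changes the DGP1 boundaries and the DGP2 shares, while the DGP1 shares and the DGP2 boundaries stay put.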
Extending the ML approach of McDonald (1984) to incorporate the informational
content of the group means requires the derivation of the joint (conditional) density of
the mean incomes. This distribution is unknown for all relevant income distributions, but
for reasonable sample sizes it is well approximated by a Gaussian due to standard central
limit arguments. We approximate the joint density of the group means by a product of
Normals with moments given by their asymptotic counterparts. Under DGP1 the group
boundaries constitute random order statistics and can easily be included in the likelihood
(known boundaries, comparable to the ML approach of Nishino and Kakamu, 2011). If
the boundaries are unknown, we exploit asymptotic results of Beach and Davidson (1983)
and maximize the resulting Gaussian likelihood approximation for the group means
conditional on the parameters of the income distribution. Under DGP2 both group means
and relative population shares are random and the likelihood results from the product of
the joint conditional density of group means and the multinomial likelihood. If group
boundaries are unknown, we can simply estimate them along with the remaining model
parameters. Bayesian estimation is implemented by combining the derived likelihoods
with corresponding prior information and sampling the resulting posterior using an
independent MH sampler based on a Gaussian approximation to the posterior distribution.
Since the proposed likelihood functions are based on Gaussian approximations, they
essentially resemble QML functions. However, as our simulation experiments show, the
resulting estimation error is very small and the QML functions appear close to the true
likelihoods.
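The structure of the DGP2 likelihood with known boundaries (multinomial part times a product of Normals for the group means) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' exact expressions: income is taken to be lognormal, the within-group moments come from the standard lognormal partial-moment formula, each group mean is treated as Normal with variance equal to the within-group variance divided by the group count, and the data are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def partial_moment(k, lo, hi, mu, sigma):
    """E[X^k * 1{lo < X <= hi}] for lognormal X (lo >= 0, hi may be inf)."""
    def shifted_cdf(x):
        if x <= 0.0:
            return 0.0
        if math.isinf(x):
            return 1.0
        return PHI((math.log(x) - mu - k * sigma**2) / sigma)
    scale = math.exp(k * mu + 0.5 * (k * sigma) ** 2)
    return scale * (shifted_cdf(hi) - shifted_cdf(lo))

def quasi_loglik(counts, means, bounds, mu, sigma):
    """Multinomial log-likelihood of the counts plus Gaussian
    log-densities of the observed group means (up to a constant)."""
    edges = [0.0] + list(bounds) + [math.inf]
    loglik = 0.0
    for n_j, ybar_j, lo, hi in zip(counts, means, edges[:-1], edges[1:]):
        p_j = partial_moment(0, lo, hi, mu, sigma)        # group probability
        m_j = partial_moment(1, lo, hi, mu, sigma) / p_j  # within-group mean
        v_j = partial_moment(2, lo, hi, mu, sigma) / p_j - m_j**2
        var_mean = max(v_j, 1e-12) / n_j                  # variance of group mean
        loglik += n_j * math.log(max(p_j, 1e-300))
        loglik += -0.5 * math.log(2.0 * math.pi * var_mean)
        loglik += -0.5 * (ybar_j - m_j) ** 2 / var_mean
    return loglik

# Hypothetical data constructed at mu = 0, sigma = 1 for illustration.
bounds = [0.5, 1.0, 2.0]
counts = [244, 256, 256, 244]
edges = [0.0] + bounds + [math.inf]
means = [partial_moment(1, lo, hi, 0.0, 1.0) / partial_moment(0, lo, hi, 0.0, 1.0)
         for lo, hi in zip(edges[:-1], edges[1:])]

print(quasi_loglik(counts, means, bounds, 0.0, 1.0))
print(quasi_loglik(counts, means, bounds, 0.3, 1.0))
```

With unknown boundaries under DGP2, the entries of `bounds` would become additional arguments to be estimated alongside `mu` and `sigma`, in line with the description above.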