Semiparametric Estimator for Binary‐outcome Sample Selection: Prejudice Matters in Election

DOI: http://doi.org/10.1111/obes.12207
Published date: 01 June 2018
Author: Jin‐Young Choi
©2017 The Department of Economics, University of Oxford and John Wiley & Sons Ltd.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 80, 3 (2018) 0305–9049
Semiparametric Estimator for Binary-outcome
Sample Selection: Prejudice Matters in Election*
Jin-Young Choi
Faculty of Economics and Business, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 4,
60629 Frankfurt am Main, Germany (e-mail: choi@econ.uni-frankfurt.de)
Abstract
I propose a semiparametric estimator for binary-outcome sample-selection models that
imposes only single index assumptions on the selection and outcome equations without
specifying the error term distribution. I adopt the idea in Lewbel (2000) using a ‘special
regressor’ to transform the binary response Y so that the transformed Y becomes linear
in the latent index, which then makes it possible to remove the selection correction term
by differencing the transformed Y equation. There are several versions of the estimator,
which trade off bias and variance differently. I conduct a simulation study and
then apply the estimators to US presidential election data from 2008 and 2012 to assess the
impact of racial prejudice on the elections, as a black candidate was involved for the first
time in US history.
I. Introduction
A sample-selection model consists of a selection equation and an outcome equation. For
a continuously distributed outcome/response variable, many semiparametric estimators
have been proposed, in addition to the fully parametric maximum likelihood estimator
(MLE) and the nearly parametric Heckman’s (1979) two-stage estimator: Newey, Powell
and Walker (1990), Ahn and Powell (1993), Donald (1995), Powell (1987, 2001), Chen
(1999), Lewbel (2007), Chen and Zhou (2010), D’Haultfoeuille and Maurel (2013), and
Escanciano and Zhu (2015). These estimators differ in the strength of their assumptions;
specifically, among others, (i) in requiring an exclusion restriction, i.e. that a regressor
appears in the selection equation but not in the outcome equation, (ii) in allowing for an
unknown form of heteroscedasticity, and (iii) in identifying the outcome equation intercept,
which is not identified by some estimators.
JEL Classification numbers: C14, C35, D72.
*I am grateful to the Editor, two anonymous reviewers, Arthur Lewbel, Myoung-jae Lee and the seminar
participants at Korea University and Sogang University for their helpful comments and relevant
references. All errors are my own.
For binary responses, sample-selection models are much more difficult to deal with: call
them ‘binary-outcome selection (models)’. The most popular estimator for the binary-outcome
selection model is the MLE in van de Ven and van Praag (1981) under the joint normality
of the two equation error terms and independence between the errors and the regressors.
This estimator is widely available in popular econometric software such as Stata. The
MLE, however, runs the risk of misspecification, such as non-normality or heteroscedastic
errors. Also, the MLE estimates the correlation coefficient between the two errors,
which sometimes causes convergence problems. For binary-outcome selection models,
there is no direct analogue to Heckman’s (1979) two-stage estimator, although the
usual selection correction term (the ‘inverse Mills ratio’) is sometimes added to the
latent response equation on an ad hoc basis.
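To make the setup concrete, the following is a minimal sketch of a binary-outcome selection model with jointly normal errors, the case assumed by the MLE discussed above. The coefficients, the error correlation `rho`, the excluded regressor `z` and the function name `simulate_selection` are illustrative assumptions, not values taken from the paper. The sketch only shows that the share of Y = 1 in the selected sample can differ markedly from the share in the full population.

```python
import math
import random


def simulate_selection(n=100_000, rho=0.8, seed=0):
    """Simulate a binary-outcome selection model (illustrative values only).

    Selection: D = 1[0.5*x + z + u > 0]   (z is an excluded regressor)
    Outcome:   Y = 1[x + e > 0], observed only when D = 1
    (u, e) are standard normal with correlation rho.
    Returns (share of Y=1 in the full sample, share of Y=1 among the selected).
    """
    rng = random.Random(seed)
    ones_all = 0
    ones_selected = 0
    n_selected = 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        # Build e so that corr(u, e) = rho and var(e) = 1.
        e = rho * u + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        d = 0.5 * x + z + u > 0
        y = x + e > 0
        ones_all += y
        if d:
            n_selected += 1
            ones_selected += y
    return ones_all / n, ones_selected / n_selected


if __name__ == "__main__":
    share_all, share_selected = simulate_selection()
    print(f"P(Y=1), full sample:     {share_all:.3f}")
    print(f"P(Y=1), selected sample: {share_selected:.3f}")
```

In this design the selected sample over-represents Y = 1 for two reasons: the regressor x enters both equations, and the errors are positively correlated. Only the second channel is the selection-on-unobservables problem that a selection correction is meant to address.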
Semiparametric estimators for binary-outcome selection models are relatively scarce.
Klein, Shen and Vella (2015) propose a quasi-MLE under a double linear index assumption.
They require each index to contain at least one continuous (i.e. continuously distributed)
regressor, in addition to imposing an exclusion restriction with a continuous
regressor. Escanciano, Jacho-Chavez and Lewbel (2014) introduce a semiparametric estimator
for binary-outcome selection as an example of more general semiparametric index
models with weighted kernel-based residuals. Their quasi-MLE can be seen as a
generalized version of Klein and Spady (1993) in that it semiparametrically estimates
the likelihood function after plugging in a first-stage non-parametric index estimator.
Escanciano, Jacho-Chavez and Lewbel (2016) provide a more general version with double
indices for the outcome equation, one linear and the other unknown. Escanciano et al.
(2014, 2016) do not require any exclusion restriction, but use nonlinearity in one index
instead.
In this paper, I propose a new semiparametric estimator for binary-outcome selection
models that imposes only single index assumptions on the selection and outcome
equations without specifying the error term distribution. The estimator requires a continuous
‘special regressor’ as in Lewbel (2000, 2007), and an excluded variable which may
be discretely distributed. Compared with the parametric and semiparametric estimators
for binary-outcome selection in the literature, the proposed estimator has a closed-form
expression and therefore does not require any numerical optimization except for estimating
the selection equation. The selection equation can be estimated in various ways (with MLE
or a semiparametric estimator), some of which may entail numerical optimization, but any
resulting numerical difficulty is certainly smaller than that of estimating the joint distribution
of the two errors in the MLE.
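As a rough illustration of the special-regressor idea in Lewbel (2000), the sketch below applies the transformation to a plain binary choice model without sample selection. It is not the estimator proposed in this paper. It assumes a special regressor V that is uniform on [-half_width, half_width], independent of the other regressor and the error, with coefficient normalized to one and a known density; the coefficients, the uniform error and the name `special_regressor_ols` are all hypothetical choices. Under these assumptions the transformed response Y* = (Y - 1[V > 0]) / f(V) has conditional mean equal to the latent index, so ordinary least squares, a closed-form estimator, recovers the coefficients without specifying the error distribution.

```python
import random


def special_regressor_ols(n=200_000, beta0=0.5, beta1=1.0,
                          half_width=4.0, seed=0):
    """OLS on a special-regressor-transformed binary response (illustrative).

    Model: Y = 1[V + beta0 + beta1*x + e > 0], with V ~ U(-half_width, half_width)
    independent of (x, e). The error e is uniform (non-normal) with mean zero.
    The support of V must cover -(beta0 + beta1*x + e); the bounded x and e
    used here guarantee this for the default parameter values.
    Returns the estimated (intercept, slope).
    """
    rng = random.Random(seed)
    e_bound = 3 ** 0.5                    # uniform error with unit variance
    density = 1.0 / (2.0 * half_width)    # known density of V (an assumption)
    xs, y_stars = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        e = rng.uniform(-e_bound, e_bound)
        v = rng.uniform(-half_width, half_width)
        y = 1.0 if v + beta0 + beta1 * x + e > 0 else 0.0
        # Transformed response: linear in the latent index in expectation.
        y_star = (y - (1.0 if v > 0 else 0.0)) / density
        xs.append(x)
        y_stars.append(y_star)
    mean_x = sum(xs) / n
    mean_y = sum(y_stars) / n
    s_xy = sum((x - mean_x) * (ys - mean_y) for x, ys in zip(xs, y_stars))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    b0, b1 = special_regressor_ols()
    print(f"intercept ~ {b0:.3f}, slope ~ {b1:.3f}")
```

In practice the density of V is unknown and must be estimated, and the transformed response can be very noisy when that density is small, which is one source of the bias and variance trade-offs that arise with estimators of this type.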
I conduct simulations to compare the performance of the aforementioned estimators
for binary-outcome selection. I find that my estimator is robust to heteroscedasticity
and non-normality of the errors. Also, it is computationally not as demanding
as likelihood-function-based estimators because it has a closed-form expression. I also
apply the estimator to US presidential election data from 2008 and 2012 to assess the impact
of racism, specifically a variable that measures prejudice, on the election of Barack
Obama.
The rest of this paper is organized as follows. Section II introduces the estimator and its
different versions in implementation. Section III presents the simulation study. Section IV
presents the empirical analysis. Finally, Section V concludes.