Beware of ‘Good’ Outliers and Overoptimistic Conclusions*

Date: 01 June 2009
Published date: 01 June 2009
Authors: Marjorie Gassner, Catherine Dehon, Vincenzo Verardi
DOI: http://doi.org/10.1111/j.1468-0084.2009.00543.x
©Blackwell Publishing Ltd and the Department of Economics, University of Oxford, 2009. Published by Blackwell Publishing Ltd,
9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 71, 3 (2009) 0305-9049
doi: 10.1111/j.1468-0084.2008.00543.x
PRACTITIONERS’ CORNER
Beware of ‘Good’ Outliers and Overoptimistic Conclusions*
Catherine Dehon†, Marjorie Gassner† and Vincenzo Verardi‡†
†ECARES and CKE, Université Libre de Bruxelles, B-1050 Brussels, Belgium
(e-mail: cdehon@ulb.ac.be; mgassner@ulb.ac.be)
‡CRED, University of Namur, B-5000 Namur, Belgium (e-mail: vverardi@ulb.ac.be)
Abstract
The main goal of this paper is to warn practitioners of the danger of neglecting outliers
in regression analysis, in particular, good leverage points (i.e. points lying close to the
regression hyperplane but outlying in the x-dimension). While the types of outliers
which do influence regression estimates (vertical outliers and bad leverage points)
have been extensively investigated, good leverage points have been largely ignored,
probably because they do not affect the estimated regression parameters. However,
their effect on inference is far from negligible. We propose a step-by-step procedure to
identify and treat all types of outliers. The paper of Persson and Tabellini [American
Economic Review (2004) Vol. 94, pp. 25–46] linking the degree of proportionality of
an electoral system to the size of government is discussed to illustrate how the choice
of a measure and the existence of atypical observations may substantially influence
results.
*The authors would like to thank Gisèle Hites and their other colleagues at ECARES (European Center for
Advanced Research in Economics and Statistics), CKE (Centre for Knowledge Economics) and CRED (Centre
for Research in Economic Development) as well as two anonymous referees for helpful comments. Vincenzo
Verardi is Associate Researcher of the FNRS and gratefully acknowledges their financial support. All remaining
errors are the authors’ responsibility.
JEL Classification numbers: C12, C21, H11.
I. Introduction
The main objective of econometrics is to confront economic theory with reality. This
is done by estimating a statistical model on the basis of a sample drawn from the
population. Obviously, this is a simplified representation of reality and the estimated
parameters are meaningful only under a set of strict assumptions which are only
too often neglected in practice. A fundamental hypothesis for classical estimators
such as least-squares (LS) is that all individuals behave in accordance with a unique
underlying model. Now, consider the case of a population composed of two very
unequal-sized groups where individuals behave similarly within, but not between
groups. If sampling is performed adequately, the sample must be representative of
the entire population and thus include individuals of both types. However, if the
model does not take into account the differences that exist between them, classical
methods may lead to results that are representative of neither group. This problem is
not uncommon in practical applications, because the existence of a small number of
unusual observations is often not known in advance.
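The two-group situation described above can be illustrated with a small sketch. The data are entirely made up for illustration (a majority of 95 individuals following y = 1 + 2x and a minority of 5 following y = 50 - x); the helper name ols_line is hypothetical and none of the numbers come from the paper.

```python
# Illustrative, made-up data: a large majority group and a small minority
# group that follow different linear relations (an assumption for this sketch).

def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Majority: 95 individuals with y = 1 + 2x.
x_major = [i / 10 for i in range(95)]
y_major = [1 + 2 * x for x in x_major]

# Minority: 5 individuals with y = 50 - x, located elsewhere in the x-dimension.
x_minor = [20 + j for j in range(5)]
y_minor = [50 - x for x in x_minor]

_, slope_major = ols_line(x_major, y_major)
_, slope_minor = ols_line(x_minor, y_minor)
_, slope_pooled = ols_line(x_major + x_minor, y_major + y_minor)

print(f"majority slope: {slope_major:.3f}")
print(f"minority slope: {slope_minor:.3f}")
print(f"pooled slope:   {slope_pooled:.3f}")
```

In this toy sample the pooled LS slope lies strictly between the two group slopes, so it describes neither the majority nor the minority.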
The difference in behaviour between the groups of individuals could be due to
a measurement or transcription error, but the most common and problematic case is
that of an intrinsic difference between them. In that context, a thorough knowledge
of both types of individuals is needed to implement an appropriate correction. In
practice, this may be quite cumbersome. Robust methods are helpful in this situation
because they allow the researcher to grasp the pattern underlying the vast majority of
the observations while controlling for the influence of outliers. The minority group
of individuals can then be studied separately from the bulk of the data.
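As a rough illustration of how a robust fit can recover the pattern of the majority, the sketch below applies the Theil-Sen estimator (median of pairwise slopes) to made-up data. Theil-Sen is used here only because it is short to write down; it is not claimed to be the estimator used in this paper, and the data and the function name theil_sen are assumptions of this sketch.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Theil-Sen line: median of pairwise slopes, then median of y - slope * x."""
    pts = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(pts, 2)
        if x2 != x1  # skip pairs with identical x (undefined slope)
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in pts)
    return intercept, slope

# Made-up sample: 95 majority points on y = 1 + 2x, 5 minority points on y = 50 - x.
xs = [i / 10 for i in range(95)] + [20 + j for j in range(5)]
ys = [1 + 2 * x for x in xs[:95]] + [50 - x for x in xs[95:]]

ts_intercept, ts_slope = theil_sen(xs, ys)
print(f"Theil-Sen intercept: {ts_intercept:.3f}, slope: {ts_slope:.3f}")
```

Because the vast majority of pairwise slopes come from pairs of majority points, the fitted line follows the majority relation; the five minority points then stand out through their large residuals and can be examined separately.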
In cross-sectional regression analysis, three types of outliers can cause LS to
break down and lead to estimations that are not representative of the population.
Rousseeuw and Leroy (1987) define them as vertical outliers, bad leverage points
and good leverage points. Vertical outliers are observations that are outlying in the
dependent variable (the y-dimension) but are not outlying in the design space (the
x-dimension). Their existence affects the estimation of the intercept but only mildly
influences that of the regression coefficients. Bad leverage points are observations that
are both outlying in the design space and located far away from the regression line.
They affect the estimation of the intercept and the slope coefficients. Finally, good
leverage points are observations that are outlying in the design space but are located
close to the regression line. Their existence only marginally influences the estimation
of both the intercept and the regression coefficients but, as will be explained later,
affects inference.
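The three types of outliers described above can be compared on a small numerical sketch. The sample below is invented for illustration (20 points around y = 1 + 0.5x with a deterministic +/-1 disturbance, plus one added point per scenario), and the helper ols_fit is a hypothetical name; the sketch only shows the direction of the effects on a simple LS regression, not results from the paper.

```python
from math import sqrt

def ols_fit(points):
    """Simple-regression LS fit: returns (intercept, slope, std. error of slope)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    se_slope = sqrt(rss / (n - 2) / sxx)  # classical LS standard error
    return intercept, slope, se_slope

# Clean sample (made up): y = 1 + 0.5x plus a deterministic +/-1 disturbance.
clean = [(float(i), 1 + 0.5 * i + (1 if i % 2 == 0 else -1)) for i in range(20)]

scenarios = {
    "clean": clean,
    "vertical outlier": clean + [(10.0, 36.0)],  # ordinary x, y far from the line
    "bad leverage": clean + [(60.0, 1.0)],       # extreme x, far from the line
    "good leverage": clean + [(60.0, 31.0)],     # extreme x, on the line 1 + 0.5x
}

results = {name: ols_fit(pts) for name, pts in scenarios.items()}
for name, (a, b, se) in results.items():
    print(f"{name:17s} intercept={a:6.3f} slope={b:6.3f} se(slope)={se:.4f}")
```

In this toy sample the vertical outlier mainly shifts the intercept, the bad leverage point pulls the slope far from its clean value, and the good leverage point leaves the slope nearly unchanged while markedly shrinking its estimated standard error.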
Treatment for vertical outliers and bad leverage points has been investigated extensively,
given their influence on the coefficient estimates (see, among others, Rousseeuw
and van Zomeren, 1990 and Temple, 2000, for some theoretical arguments
and Temple, 1998, for an interesting practical implementation). On the other hand,
good leverage points have largely been ignored. Nevertheless, Ruppert and Simpson
(1990) and Croux (2006) emphasize that their effect is not negligible because good
leverage points might lead to a severe underestimation of standard errors. The intuition
for this is simple: imagine a statistically insignificant linear relation between a
dependent variable and an independent variable (where the estimated parameter is
not exactly zero). If one point is added in such a way that it is sufficiently far from