a biased and inconsistent estimation of the effect of interest. This analytical flexibility was
described as a key threat to the reliability of inferences in observational research (Hendry,
1980; Leamer, 1983; Sims, 1988).
Various terms have been used to describe the search by authors of primary studies for
estimates with a p-value below the common thresholds of 0.05 or 0.1. We follow the usage
of Simonsohn, Nelson and Simmons (2014), who coined the term ‘p-hacking’ to denote the
selection of statistically significant estimates within each study, while using ‘publication
bias’ to refer to researchers placing studies without statistically significant estimates in the
file drawer (Rosenthal, 1979). The classical view on publication bias is that in the worst
case only those 5% of studies that produce statistically significant estimates by chance are
published and the remaining 95% of studies remain in the file drawer. However, this view
ignores that each study may engage in p-hacking, resulting in a substantially increased rate
of false-positive findings (Simonsohn et al., 2014). Only those studies that fail to produce
significant estimates after p-hacking may remain in the file drawer.
p-hacking is prevalent in both experimental and observational research and may be more
frequent in economics than in other disciplines, at least for impact evaluation studies (Vivalt,
2015). p-hacking probably originates in the incentive structure of academic publishing and
limits the reliability of inferences that can be drawn from published empirical studies
(Ioannidis, 2005; Glaeser, 2008). Researchers engaging in p-hacking usually look for
estimates that are not only significant but also confirm the theory or hypothesis of interest.
Fanelli (2010) shows that the probability that a paper finds support for its hypothesis
is high across all research disciplines. The pressure to provide significant and theory-
confirming results is increased by declining acceptance rates in top economics journals
and the need to publish in these journals in order to start or advance an academic career
(Card and DellaVigna, 2013). As a result, Young, Ioannidis and Al-Ubaydli (2008) compare
the publication process to the winner’s curse in auction theory. The most spectacular or
exaggerated results are rewarded with publication in the top journals, although in this case
it is the scientific community rather than the author that is cursed.
In extreme cases, strong theoretical presumptions may lead authors to search for theory-
confirming results (Card and Krueger, 1995). As soon as potentially false theories become
established, empirical research may be characterized by the selection of results that meet
the anticipated expectations of reviewers (Frey, 2003) rather than those that falsify the false
theory. Null results may only be considered for publication if a series of articles previously
established the presence of a genuine effect (De Long and Lang, 1992).
The combination of flexible observational research designs in economics and incen-
tives to select for specific results may introduce severe biases in published empirical find-
ings. Experimental sciences improve the reliability of inferences by using meta-analyses
that integrate the evidence of multiple studies while controlling for publication bias (e.g.
Sutton et al., 2000). Such meta-analytic tools are increasingly being used to synthesize
observational research in economics. The Precision-Effect Test (PET), which relates the
t-value of an estimated regression coefficient to the precision of the estimate (Stanley,
2008), is commonly used (e.g. Doucouliagos and Stanley, 2009; Doucouliagos, Stanley and
Viscusi, 2014). If a genuine effect is present, the coefficient’s t-value and its precision are
associated, and this relation is used to test for the presence of a genuine effect. However,
such an association between a coefficient’s t-value and its precision might also occur in the
©2017 The Department of Economics, University of Oxford and John Wiley & Sons Ltd
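The PET regression described above can be illustrated with a minimal simulation sketch. The code below assumes the standard form of the test, in which each study's t-value is regressed on the precision (the inverse of the standard error) of its estimate; a slope significantly different from zero indicates a genuine effect, while the intercept captures funnel asymmetry. All numbers (number of studies, effect size, range of standard errors) are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative meta-analytic sample (assumed values): 200 studies estimate
# a genuine effect of 0.5, each with its own sampling standard error.
n = 200
true_effect = 0.5
se = rng.uniform(0.05, 0.5, n)            # study-level standard errors
estimates = true_effect + rng.normal(0, se)  # reported coefficients

# PET regression (Stanley, 2008): t_i = b0 + b1 * (1/SE_i) + error.
# Under a genuine effect, the slope b1 recovers the underlying effect size;
# the intercept b0 reflects funnel asymmetry (small-study effects).
t = estimates / se
precision = 1.0 / se
X = np.column_stack([np.ones(n), precision])
beta, *_ = np.linalg.lstsq(X, t, rcond=None)
b0, b1 = beta
print(f"intercept (asymmetry): {b0:.3f}, slope (effect): {b1:.3f}")
```

With a genuine effect and no selection, the slope estimate lies close to the true effect of 0.5; the caveat raised in the text is that p-hacking and publication bias can induce a similar t-value–precision association even without a genuine effect.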