Economists and Data

Date: 01 June 1992
Author: D. P. O'Brien
Published date: 01 June 1992
DOI: http://doi.org/10.1111/j.1467-8543.1992.tb00774.x
British Journal of Industrial Relations 30:2 June 1992 0007-1080 $3.00
Economists and Data
D. P. O’Brien*
Final version accepted 1 October 1991.
Abstract
This paper is concerned with one central question: the choice between
theories, and the role played by data in that choice. It deals with the uses
economists may make of data and the importance of understanding the
institutional basis that gives rise to the data - an area in which labour
economists have traditionally been particularly strong - and with the
relevance of assumptions. It deals with the ultimate need to choose between
competing theories (despite the role of conventionalism) on the basis of data
rather than retreating into a comfortable ‘methodological pluralism’. It
considers the role of test replication, with reference to the practice in
natural science (and its role there in checking scientific fraud) and
concludes that, despite extensive technical problems of testing, economists
have to accept a data check if the rhetoric of mathematical technicality is
not to overwhelm the need to explain. Parallels are drawn with experience in
physics (and the implications of the development of Chaos and Catastrophe for
a naively predictionist view are noted), medicine and history (of which, it
is argued, modern economists are far too neglectful).
1. Introduction
This paper is concerned with one central question: the choice between
theories, and the role played by data in that choice. It seems incontestable
that theory (rather than what Marshall called ‘mere crude unanalysed
history’ - Pigou 1925: 437) must have primacy. In a labour market context we
cannot investigate the effects on labour supply of tax changes without some
explicit, or implicit, theory about the relationship between economic
variables. But there is usually no shortage of competing theories, and thus
theory choice is necessary. For instance, labour economics provides the
examples of ‘screening’ and ‘productivity’ approaches to investment in
education, a wide variety of competing theories of pay structure, and a
number of economic models of trade unions. Data help us not only to choose
between theories but also to judge the circumstances to which a particular
theory may be applied, and can also suggest different theoretical approaches.
* Professor of Economics, University of Durham
2. Data and social science
Direct information
Inspection of data does not directly provide reliable information. In the
first place, in order to interpret the data we need not merely some system of
concepts - Popper’s World 3 (Popper 1972) - but at least some explicit
hypotheses about possible relationships between those concepts. Secondly,
observation of a particular relationship within data provides no logical
reason for believing in a necessary association in the future. Thirdly, data
on their own tell us nothing about causality - hypotheses about causality are
external to the data. Fourthly, as Popper has pointed out, the probability of
a particular data relationship and its informational content are inversely
related (Popper 1959: 270) - it is precisely those things that are not obvious
which are interesting.
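The inverse relation between probability and content noted in the fourth point can be illustrated with the conjunction of two statements. What follows is only a sketch: the symbols a, b, P and Ct are illustrative, and measuring content as one minus probability is a convention assumed here, not something stated in the text.

```latex
% Assumption: the content of a statement a is measured as Ct(a) = 1 - P(a).
% The conjunction of a and b can never be more probable than a alone,
% so under this measure it can never have less content than a alone.
\[
  P(a \wedge b) \le P(a)
  \quad\Longrightarrow\quad
  \mathrm{Ct}(a \wedge b) = 1 - P(a \wedge b) \ge 1 - P(a) = \mathrm{Ct}(a)
\]
```

On this reading, a statement that says more is for that very reason less probable, which is why the relationships that are not obvious are the informative ones.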
All these problems pose particular difficulties for economists who wish to
obtain information directly from data; for, not only can we not inspect the
data without preconceptions, but the very process of construction of the data
involves prior concepts; anyone who doubts this should have a look at the
National Accounts Statistics: Sources and Methods produced by HMSO
(Maurice 1968). Yet economists routinely ignore the fact that official data
may be prepared on a conceptual basis which is different from that of their
model. One small but illuminating example is provided by official figures for
rent: not only do these not correspond with the economic concept, but rent
is lumped together with income from self-employment ‘and the imputed
charge for the consumption of non-trading capital’ in Economic Trends.
Of course, the problem that data are not independent of the observer is
certainly not confined to economics. The paradox of Schroedinger’s cat -
which, in a box with a phial of poison, is neither dead nor alive but in a
‘superposition’ state, as described by Schroedinger’s wave equations, until
observed - is well known to physicists. But the cavalier nature with which
published data are employed by many economists (though, emphatically, not
all - Henry Phelps Brown is an important exception) would suggest a certain
insensitivity to the underlying problem. Certainly, attempts to discover a
‘data generation process’, by powerful time-series techniques in which ‘it is
easily possible to generate more test statistics than there are data points
in the sample’ (Johnson 1991: 55), encounter this problem.
There is also the problem of instability in the data, which has of course
implications far wider than mere attempts to learn directly from the raw
data. The problem of parameter shift (as distinct from the problem of
unstable parameter estimates in the presence of multicollinearity) has led,
following the apparent breakdown of demand-for-money equations in the
years 1971-3, to what is known widely as ‘Goodhart’s Law’ (Goodhart 1984:
96). Whether that particular episode is in fact a good example of such
instability is another matter; but that such instability exists can hardly be
doubted, given that economic data are the unique result of an historical
process.
That data relationships do not establish causal relationships is so well
understood in general philosophical terms that ‘post hoc ergo propter hoc’ is
a phrase in universal use. It is true that economists have been persuaded to
employ the word ‘causality’ in a sense not recognized by the dictionary,
namely, that associated with Granger where, ‘if prediction of the current
value of y is enhanced by using past values of x’ (Kennedy 1985: 64), there is
said to exist causality. But this is not causality in any ordinary sense of
the term, but rather association.
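The distinction between predictive ‘causality’ (the usual spelling of the name is Granger) and causality in the ordinary sense can be illustrated with a small simulation. The sketch below is hypothetical throughout: the series z, x and y, the noise levels, and the helper names solve and rss are assumptions made for illustration, and a comparison of residual sums of squares stands in for a formal test. A common driver z moves x one period later and y two periods later, so past x improves the prediction of y even though x has no effect on y by construction.

```python
import random

def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def rss(rows, target):
    """Residual sum of squares from a least-squares fit of target on rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(bj * v for bj, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

# Hypothetical data: z drives both series; x never enters the equation for y.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.0] + [z[t - 1] + rng.gauss(0, 0.1) for t in range(1, n)]
y = [0.0, 0.0] + [z[t - 2] + rng.gauss(0, 0.1) for t in range(2, n)]

periods = range(3, n)
target = [y[t] for t in periods]
own_past = [[1.0, y[t - 1]] for t in periods]               # constant and lagged y
with_past_x = [[1.0, y[t - 1], x[t - 1]] for t in periods]  # adds lagged x

rss_restricted = rss(own_past, target)
rss_unrestricted = rss(with_past_x, target)
print("RSS using past y only:   ", round(rss_restricted, 2))
print("RSS using past y and x:  ", round(rss_unrestricted, 2))
```

Past values of x sharply reduce the prediction error for y, so x would be said to ‘cause’ y in the predictive sense, yet the only thing linking the two series in this construction is their shared dependence on z.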
Some economists have attempted ‘probabilistic induction’. The leading
exponent of this was Roy Harrod (1956). (An excellent summary is provided
by Phelps Brown 1980a: 28-9; see also Braithwaite 1958.) Harrod believed
that to be in general correct, on the basis of experience, was a sufficient
justification of induction, whereas, as critics pointed out, what was relevant
was the likelihood of being correct on any one occasion - and this could not
be established logically without smuggling in what was required to be
proved, namely the idea of a uniformity of nature. (Induction in this sense is,
of course, to be distinguished from the logical procedure of pure
mathematics.)
None the less, economists have continued to hanker after induction. This
seems to have been for a variety of reasons. In some cases it has apparently
been based on the (usually implicit) belief that this is the only way to obtain
knowledge. Wesley Mitchell is usually cited in this respect; the charge
however was unfair in that particular case (Mitchell 1928: 3; Burns and
Mitchell 1946: 9-10), as was the widely quoted charge of ‘measurement
without theory’. Others appear to have believed that induction is at least an
important way to obtain knowledge. This is true of such outstanding figures
in the field of economic methodology as John Stuart Mill (1868, VI: ch. 5)
and J. N. Keynes (1917: ch. 6). Others believe that it is an important check
on a priorism. While agreeing wholeheartedly that such a check is needed,
for reasons to be discussed below, the data alone will not provide this.
Indirect information - hypotheses
We cannot obtain information directly from data. But we can obtain
knowledge indirectly through the formation of hypotheses. These
hypotheses may themselves be suggested by the recognition of patterns
within data, even though the patterns themselves do not constitute
information. The importance of pattern recognition is acknowledged in
natural science (Ziman 1978: 43-56). Indeed, it is something that requires a
great deal of specialized practice (p. 54). Historically it has proved to be of
critical importance in medicine - epidemiology - in which the link
between smoking and lung cancer is the most familiar example. The fact
that, as we shall see, such an approach can be abused does not invalidate its
