Model Selection in Equations with Many ‘Small’ Effects*
Author | Jennifer L. Castle, Jurgen A. Doornik, David F. Hendry |
Date | 01 February 2013 |
Published date | 01 February 2013 |
DOI | http://doi.org/10.1111/j.1468-0084.2012.00727.x |
©Blackwell Publishing Ltd and the Department of Economics, University of Oxford 2012. Published by Blackwell Publishing Ltd,
9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 75, 1 (2013) 0305-9049
doi: 10.1111/j.1468-0084.2012.00727.x
Jennifer L. Castle†, Jurgen A. Doornik‡ and David F. Hendry‡
†Magdalen College and Institute for New Economic Thinking at the Oxford Martin School,
University of Oxford, UK (e-mail: jennifer.castle@magd.ox.ac.uk)
‡Economics Department and Institute for New Economic Thinking at the Oxford Martin School,
University of Oxford, UK (e-mails: jurgen.doornik@economics.ox.ac.uk; david.hendry@economics.ox.ac.uk)
Abstract
High-dimensional general unrestricted models (GUMs) may include important individual determinants, many small relevant effects, and irrelevant variables. Automatic model selection procedures can handle more candidate variables than observations, allowing substantial dimension reduction from GUMs with salient regressors, lags, nonlinear transformations, and multiple location shifts, together with all the principal components, possibly representing ‘factor’ structures, as perfect collinearity is also unproblematic. ‘Factors’ can capture small influences that selection may not retain individually. The final model can implicitly include more variables than observations, entering via ‘factors’. We simulate selection in several special cases to illustrate.
*This research was supported in part by grants from the Open Society Foundations and the Oxford Martin School.
JEL Classification numbers: C52, C22.
I. Introduction
Macroeconomic time series are complicated processes, with many intercorrelated potential explanatory variables, long dynamic interactions, various non-stationarities, nonlinearities, and multiple structural breaks. Building econometric models of such phenomena from data measured with non-negligible errors requires that all aspects of the time series be captured, as any omissions ‘contaminate’ the included effects. High-dimensional initial models are therefore likely. The potential set of explanatory variables may include individual determinants with significant explanatory power, irrelevant variables, and relevant variables whose individual effects are too small to be significant at conventional levels. Such variables are then not retained when selection is undertaken, which distorts inference (see Leeb and Pötscher, 2003, for an analysis). As theory models will always abstract from various aspects of reality, selection is inevitable, so we address how this third group could be captured in part by combining variables with small relevant effects to increase their joint explanatory power, and hence raise the probability of retention. We propose doing so by capturing the otherwise unexplained co-movements of the observable time series using their principal components, which could also embody relevant common forces.
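A back-of-the-envelope sketch, not taken from the article, of why combining small effects can raise the probability of retention. It assumes the t-statistic is approximately N(ψ, 1), where ψ is its non-centrality, a two-sided test with critical value 1.96 (roughly 5%), and k mutually orthogonal regressors of equal variance that share one small coefficient. Under those assumptions their simple sum, standing in here for a component that loads on all of them, has non-centrality √k ψ. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def retention_probability(psi: float, c_alpha: float = 1.96) -> float:
    """Approximate P(|t| >= c_alpha) when t ~ N(psi, 1)."""
    return normal_cdf(psi - c_alpha) + normal_cdf(-psi - c_alpha)


def composite_noncentrality(psi_single: float, k: int) -> float:
    """Non-centrality of the sum of k orthogonal, equal-variance regressors
    sharing a common coefficient, each with non-centrality psi_single."""
    return sqrt(k) * psi_single


if __name__ == "__main__":
    psi = 1.0  # a 'small' individual effect
    print(round(retention_probability(psi), 3))
    print(round(retention_probability(composite_noncentrality(psi, 16)), 3))
```

With ψ = 1 each variable is retained only about 17% of the time, whereas the composite of 16 such variables (non-centrality 4) is retained about 98% of the time. Setting ψ = 0 recovers the nominal size of roughly 5%.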
Model selection is then applied to a general unrestricted model (GUM). Absent omniscience, that GUM needs to include all the individual variables as well as their principal components, and so will be perfectly collinear. We exploit the ability of automatic selection algorithms to handle such a problem (see, e.g., Hendry and Krolzig, 2005; Doornik, 2009a). Alternatively, significant individual variables could be selected first, and principal components then computed for the non-retained variables to capture additional small effects.
Both procedures are evaluated below.
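A minimal numerical sketch of the two procedures, assuming demeaned data, principal components obtained from a singular value decomposition, and simulated regressors. The retained set in the second procedure is hypothetical, standing in for whatever a first selection step would keep. Stacking all n variables with all n components gives 2n columns but only rank n, which is the perfect collinearity the selection algorithm must cope with; the two-stage version has n columns of full rank.

```python
import numpy as np


def principal_components(X: np.ndarray) -> np.ndarray:
    """All principal components (T x n) of the column-demeaned data X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are eigenvectors of Xc'Xc, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T


rng = np.random.default_rng(0)
T, n = 50, 8
X = rng.standard_normal((T, n))
Xc = X - X.mean(axis=0)
Z = principal_components(X)

# Procedure 1: every variable plus every component, 2n columns but rank n.
gum_joint = np.hstack([Xc, Z])
print(gum_joint.shape[1], np.linalg.matrix_rank(gum_joint))  # 16 8

# Procedure 2: keep a (hypothetical) retained set, then add components of the
# non-retained variables only, giving n columns with full column rank.
retained = [0, 1]
rest = [j for j in range(n) if j not in retained]
gum_two_stage = np.hstack([Xc[:, retained], principal_components(X[:, rest])])
print(gum_two_stage.shape[1], np.linalg.matrix_rank(gum_two_stage))  # 8 8
```

The components are mutually orthogonal by construction, which is why they can summarize co-movements compactly, but they span exactly the same space as the variables they are computed from.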
The structure of the article is as follows. Section II describes the reductions involved when the model is an over-specification of the DGP, with some substantively relevant and some irrelevant variables, as well as many small relevant effects. Section III considers representing the last group by their principal components; Section IV considers the issues of perfect collinearity and more variables than observations introduced by this approach. Sections V–VII examine Monte Carlo evidence, evaluating the properties of selection: (i) under the null when no variables or factors are relevant; (ii) when principal components are used to parsimoniously approximate many small effects; (iii) when there are both individually relevant variables and small effects, as well as irrelevant variables. Section VIII concludes.
II. Dimension reduction
To formalize reductions, let $\{x_t\}$ denote the time series of $n$ potential explanatory variables modelling $y_t$, where $z_t$ is the complete set of their principal components, both with up to $s$ lags, and let $1_{\{i=t\}}$, $t = 1, \ldots, T$, be a saturating set of impulse indicators. Then the GUM is:
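$$
y_t = \sum_{i=1}^{n}\sum_{j=0}^{s}\beta_{i,j}\,x_{i,t-j} + \sum_{i=1}^{n}\sum_{j=0}^{s}\gamma_{i,j}\,z_{i,t-j} + \sum_{j=1}^{s}\lambda_j\,y_{t-j} + \sum_{i=1}^{T}\delta_i\,1_{\{i=t\}} + e_t \qquad (1)
$$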
resulting in $N > T$ regressors. Of these, $L$ are relevant, as defined by non-zero non-centralities of the population t-values in the local data generating process (LDGP: the DGP for the set of variables under consideration; see, e.g., Hendry, 2009):
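$$
y_t = \sum_{i=1}^{K}\rho_i\,u_{i,t} + \epsilon_t \qquad (2)
$$
where $\rho_i \neq 0$ if $u_{i,t}$ denotes the set of relevant variables, with $K \geq L$ as equation (1) may also omit relevant effects (such as parameter changes), and $\epsilon_t \sim \mathsf{IN}[0, \sigma^2_{\epsilon}]$.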
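Counting the terms in equation (1) makes the claim $N > T$ concrete: there are $n(s+1)$ terms in $x$, $n(s+1)$ terms in $z$, $s$ lagged values of $y$, and $T$ impulse indicators, so $N = 2n(s+1) + s + T$, which exceeds $T$ for any $n \geq 1$. The helper below is a hypothetical illustration that only lists column labels; the values $n = 10$, $s = 1$, $T = 100$ are arbitrary.

```python
def gum_regressors(n: int, s: int, T: int) -> list[str]:
    """Hypothetical helper: labels for every regressor in the GUM of equation (1)."""
    labels: list[str] = []
    for name in ("x", "z"):  # the variables, then their principal components
        for i in range(1, n + 1):
            for j in range(s + 1):  # lags 0..s
                labels.append(f"{name}[{i},t-{j}]")
    labels += [f"y[t-{j}]" for j in range(1, s + 1)]  # lagged dependent variable
    labels += [f"1{{{i}=t}}" for i in range(1, T + 1)]  # saturating impulse indicators
    return labels


n, s, T = 10, 1, 100
N = len(gum_regressors(n, s, T))
print(N, N == 2 * n * (s + 1) + s + T, N > T)  # 141 True True
```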
As we anticipate high-dimensional GUMs, reduction and selection take five distinct
forms:
(i) conventional selection, where variables with insignificant estimated coefficients
are eliminated;
(ii) lag-length reduction;
(iii) reducing a saturating set of impulse indicators (i.e., one for every observation);
(iv) representing potentially very high-dimensional nonlinear reactions in a low-dimensional form;
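Form (iii) may look infeasible, since a saturating set has as many indicators as observations. One common way to handle it, used here purely as an illustrative assumption rather than as the procedure of any particular software, is a split-half approach: add indicators for half the sample, keep the significant ones, repeat for the other half, then re-estimate with the union and keep those still significant. This sketch assumes a location-only model $y_t = \mu + e_t$, a two-sided critical value of 2.58 (roughly 1%), and simulated data with two hypothetical outliers; the function names are hypothetical.

```python
import numpy as np


def ols_tvalues(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """OLS coefficient t-values for y on X (X assumed to have full column rank)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se


def split_half_iis(y: np.ndarray, c: float = 2.58) -> list[int]:
    """Split-half impulse-indicator saturation for a location model y_t = mu + e_t."""
    T = len(y)
    const = np.ones((T, 1))
    impulses = np.eye(T)  # column t is the impulse indicator for observation t
    kept: list[int] = []
    for block in (np.arange(T // 2), np.arange(T // 2, T)):
        # Each half adds at most T/2 indicators, so every regression is estimable.
        t = ols_tvalues(y, np.hstack([const, impulses[:, block]]))[1:]
        kept.extend(block[np.abs(t) > c].tolist())
    if kept:
        # Final stage: re-estimate with the union and keep those still significant.
        t = ols_tvalues(y, np.hstack([const, impulses[:, kept]]))[1:]
        kept = [k for k, tv in zip(kept, t) if abs(tv) > c]
    return sorted(kept)


rng = np.random.default_rng(1)
y = rng.standard_normal(100)
y[[10, 70]] += 10.0  # two hypothetical outliers (0-based indices)
print(split_half_iis(y))
```

The two outliers should be flagged, possibly alongside an occasional spurious indicator, as expected when testing many indicators at a 1% significance level.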