Model Selection Criteria for Factor‐Augmented Regressions*

Published date: 01 February 2013
Authors: George Kapetanios, Jan J. J. Groen
DOI: http://doi.org/10.1111/j.1468-0084.2012.00721.x
©Blackwell Publishing Ltd and the Department of Economics, University of Oxford 2012. Published by Blackwell Publishing Ltd,
9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 75, 1 (2013) 0305-9049
doi: 10.1111/j.1468-0084.2012.00721.x
Model Selection Criteria for Factor-Augmented
Regressions*
Jan J. J. Groen† and George Kapetanios
Research and Statistics Group, Federal Reserve Bank of New York, 33 Liberty Street, New York,
NY 10045, US (e-mail: jan.groen@ny.frb.org)
Department of Economics, Queen Mary University of London, Mile End Road, London E1 4NS,
UK (e-mail: g.kapetanios@qmul.ac.uk)
Abstract
Existing dynamic factor selection criteria determine the appropriate number of factors in
a large-dimensional panel of explanatory variables, but not all of these have to be
relevant for modeling a specific dependent variable within a factor-augmented regression. We
develop theoretical conditions that selection criteria have to meet in order to get consistent
estimates of the relevant factor dimension for such a regression. These incorporate factor
estimation error and do not depend on specific factor estimation methodologies. Using this
framework, we modify standard model selection criteria, and simulation and empirical
applications indicate that these are useful in determining appropriate factor-augmented
regressions.
I. Introduction
When forecasting an economic variable, it is often necessary to incorporate information
from a large set of potential explanatory variables into the forecasting model. Most
traditional macroeconomic prediction approaches, however, are unable to deal with this,
because it is either inefficient or downright impossible to incorporate a large number of
variables in a single forecasting model and estimate it using standard econometric techniques.
As an alternative approach to this problem, factor-augmented regressions have gained a
prominent place. A seminal application is Stock and Watson (2002a), where a limited number of
principal components extracted from a large data set are added to a standard linear
regression model, which is then used to forecast key macroeconomic variables. Stock and Watson
(2002b) and Bai (2003) formalized the underlying asymptotic theory, which allows the
use of principal components in very large data sets to identify the common factors in such
a data set.
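The construction described above can be sketched in a few lines. This is a minimal illustration on simulated data, not the procedure of any of the cited papers: the function names `pca_factors` and `factor_augmented_ols`, the panel dimensions and the data-generating process are all assumptions made for the example.

```python
import numpy as np

def pca_factors(X, r):
    """Return the first r principal-component factors of a T x N panel X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each series
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return np.sqrt(Z.shape[0]) * U[:, :r]           # scaled so that F'F / T = I_r

def factor_augmented_ols(y, F):
    """OLS of y on a constant and the estimated factors F."""
    W = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(W, y, rcond=None)
    return beta, y - W @ beta

# Simulated panel (illustrative assumption): 3 common factors drive 50 series,
# but only the first factor matters for the target variable y.
rng = np.random.default_rng(0)
T, N, r = 200, 50, 3
F_true = rng.standard_normal((T, r))
X = F_true @ rng.standard_normal((r, N)) + rng.standard_normal((T, N))
y = 1.0 + 0.8 * F_true[:, 0] + 0.5 * rng.standard_normal(T)

F_hat = pca_factors(X, r)
beta_hat, resid = factor_augmented_ols(y, F_hat)
```

Note that principal components recover the factor space only up to a rotation, so an individual coefficient in `beta_hat` need not line up with a particular column of `F_true`.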
*We thank two anonymous referees as well as the Editor, Anindya Banerjee, for helpful comments, and Craig
Kennedy for excellent research assistance. The views expressed in this article are those of the authors and do not
necessarily reflect those of the Federal Reserve Bank of New York or the Federal Reserve System.
JEL Classification numbers: C22, C52, E37.
Dynamic factor research in econometrics has spent substantial effort on developing
tests and selection criteria aimed at determining the number of factors that best describes
the dynamics in a large data set of explanatory variables. A well-known contribution is Bai
and Ng (2002), who derive a range of consistent information criteria that can be used to
identify the common factor space underlying a large panel of predictor series. While the
number of factors selected in such a way provides an upper bound for the number of factors
that should enter the forecasting regression for a particular variable, there is no a priori
reason to suppose that all factors should enter this regression. Therefore, it is important
that a form of factor selection is carried out that is tailored to determining a factor-based
forecasting model for a specific variable. This problem has received far less attention in
the literature than the aforementioned issue of determining the number of factors that best
explains the dynamics in large data sets of explanatory variables.
One further important reason for considering this problem has to do with the
well-known evidence (see, e.g. Kapetanios (2010)) that determining the number of factors in
large datasets is a difficult undertaking. As a result, the performance of existing
methods suffers considerably under a variety of circumstances. On the other hand, determining
the identity and number of variables in a regression, through information criteria, is a
well-understood problem. Further, such information criteria have desirable properties both
asymptotically and in finite samples. Therefore, it seems reasonable to try to use such
criteria for the problem at hand, even if all factors in a large dataset appear in the regression
under consideration.
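As a point of reference, the standard regression case mentioned above can be sketched as follows. This is the textbook BIC applied to observed regressors, and it deliberately ignores factor estimation error, so it is not one of the modified criteria this article develops. The function name `bic_select`, the restriction to the leading k columns and the simulated data are assumptions made for the example.

```python
import numpy as np

def bic_select(y, F, kmax):
    """Choose k in 0..kmax minimizing the standard BIC of a regression of y
    on a constant and the first k columns of F."""
    T = len(y)
    best_k, best_bic = 0, np.inf
    for k in range(kmax + 1):
        W = np.column_stack([np.ones(T), F[:, :k]])
        beta, *_ = np.linalg.lstsq(W, y, rcond=None)
        sigma2 = np.mean((y - W @ beta) ** 2)            # residual variance
        bic = np.log(sigma2) + (k + 1) * np.log(T) / T   # fit plus penalty
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k

# Simulated regressors treated as observed (illustrative assumption):
# five candidates, of which only the first two enter the regression.
rng = np.random.default_rng(1)
T, kmax = 300, 5
F = rng.standard_normal((T, kmax))
y = 0.5 + F[:, 0] + F[:, 1] + 0.5 * rng.standard_normal(T)

k_hat = bic_select(y, F, kmax)
```

Searching only over the leading k columns keeps the sketch short; a search over arbitrary subsets of factors would need a loop over combinations instead.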
Intuitively, as the aim is to specify a regression model for a single variable, standard
information criteria may be considered useful in selecting the optimal number of
factors for a particular forecasting regression. However, factor variables are not observed
and have to be estimated, and the resulting estimation error may matter and make standard
information criteria invalid. Stock and Watson (1998) make this point and propose a selection
criterion that takes into account this estimation error. However, their criterion does not take
into account the sharper asymptotic analysis of Bai and Ng (2006) and, therefore, the form of the
penalty term they propose and the conditions under which it is valid can be improved
upon. Building on Bai and Ng (2006), Bai and Ng (2009) propose a final prediction
error (FPE) criterion in which an extra penalty term is added to proxy for the effect of
factor estimation error on the forecasting regression. Optimizing this FPE will yield the
number of factors that asymptotically minimizes the prediction error, but it does not
necessarily provide an asymptotically consistent estimate of the number of factors present
in the regression of interest. Also, the finite sample performance of this FPE criterion
depends on the choice of a consistent estimator of the factor estimation error variance.
Alternatively, one can follow Bai and Ng (2008) and select a subgroup of predictors from
the overall macropanel with the best fit for the target variable, based on some
threshold rule, and subsequently apply principal components on these ‘targeted predictors’ in
order to get the most relevant factors for forecasting. In this article, we instead focus,
like Stock and Watson (1998) and Bai and Ng (2009), on the construction of appropriate
selection criteria that can provide the econometrician with the optimal factor-augmented
regression.
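The ‘targeted predictors’ idea mentioned above can be illustrated with a short sketch. The text only says that selection is based on some threshold rule, so the rule used here (an absolute t-statistic cutoff from a simple regression of the target on each series) is an assumption made for illustration, as are the function name `targeted_factors`, the cutoff value `t_cut` and the simulated data.

```python
import numpy as np

def targeted_factors(y, X, r, t_cut=1.65):
    """Keep the series in X whose simple-regression t-statistic for y exceeds
    t_cut in absolute value, then extract r principal components from them."""
    T = len(y)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized panel
    yc = y - y.mean()
    b = Z.T @ yc / T                              # slopes, since z'z = T
    resid_var = ((yc[:, None] - Z * b) ** 2).sum(axis=0) / (T - 2)
    t_stats = b / np.sqrt(resid_var / T)
    keep = np.abs(t_stats) > t_cut                # hypothetical threshold rule
    U, _, _ = np.linalg.svd(Z[:, keep], full_matrices=False)
    return np.sqrt(T) * U[:, :r], keep            # fewer than r columns if few series pass

# Simulated panel (illustrative assumption): the first 10 series load on f1,
# which drives y; the remaining 20 load on an unrelated factor f2.
rng = np.random.default_rng(2)
T = 200
f1, f2 = rng.standard_normal(T), rng.standard_normal(T)
X = np.column_stack([
    f1[:, None] + rng.standard_normal((T, 10)),
    f2[:, None] + rng.standard_normal((T, 20)),
])
y = f1 + 0.5 * rng.standard_normal(T)

F_targeted, keep = targeted_factors(y, X, r=1)
```

In this design the first principal component of the retained series should track `f1` closely, because most series tied to the irrelevant factor are screened out before the factors are extracted.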
We propose a number of novel insights with respect to this issue of determining the
relevant factors for a specific factor-augmented regression. Firstly, we show that standard