
econstor (a service of ZBW - Leibniz Information Centre for Economics)

Le, Vo Phuong Mai; Meenagh, David; Minford, Patrick; Wickens, Michael; Xu, Yongdeng
Working Paper: Testing macro models by indirect inference: A survey for users
Cardiff Economics Working Papers, No. E2015/9
Provided in Cooperation with: Cardiff Business School, Cardiff University

Suggested Citation: Le, Vo Phuong Mai; Meenagh, David; Minford, Patrick; Wickens, Michael; Xu, Yongdeng (2015): Testing macro models by indirect inference: A survey for users, Cardiff Economics Working Papers, No. E2015/9, Cardiff University, Cardiff Business School, Cardiff.

This Version is available at: http://hdl.handle.net/10419/146392
www.econstor.eu
Cardiff Economics Working Papers
Working Paper No. E2015/9

Testing macro models by indirect inference: a survey for users

Vo Phuong Mai Le, David Meenagh, Patrick Minford, Michael Wickens and Yongdeng Xu

July 2015

Cardiff Business School, Aberconway Building, Colum Drive, Cardiff CF10 3EU, United Kingdom
t: +44 (0)29 2087 4000  f: +44 (0)29 2087 4419
business.cardiff.ac.uk

This working paper is produced for discussion purposes only. These working papers are expected to be published in due course, in revised form, and should not be quoted or cited without the author's written permission.

Cardiff Economics Working Papers are available online from: econpapers.repec.org/paper/cdfwpaper/ and business.cardiff.ac.uk/research/academic-sections/economics/working-papers

Enquiries: EconWP@cardiff.ac.uk
Testing macro models by indirect inference: a survey for users

Vo Phuong Mai Le (Cardiff University)
David Meenagh (Cardiff University)
Patrick Minford (Cardiff University and CEPR)
Michael Wickens (Cardiff University, University of York and CEPR)
Yongdeng Xu (Cardiff University)

January 2015

Abstract

With Monte Carlo experiments on models in widespread use we examine the performance of indirect inference (II) tests of DSGE models in small samples. We compare these tests with ones based on direct inference (using the Likelihood Ratio, LR). We find that both tests have power so that a substantially false model will tend to be rejected by both; but that the power of the II test is substantially greater, both because the LR is applied after reestimation of the model error processes and because the II test uses the false model's own restricted distribution for the auxiliary model's coefficients.
This greater power allows users to focus this test more narrowly on features of interest, trading off power against tractability.

JEL Classification: C12, C32, C52, E1
Keywords: Bootstrap, DSGE, New Keynesian, New Classical, indirect inference, Wald statistic, likelihood ratio

We thank participants in the 2014 Brunel Macro-Finance Conference for helpful comments on an earlier version. We are also grateful to High Performance Computing Wales and our HPC partner, OSTC, for access to the HPC super-computing network. Programmes to implement the methods described in this paper can be downloaded freely and at no cost from www.patrickminford.net/indirectinference.

Author contacts: LeVP@cf.ac.uk; Meenaghd@cf.ac.uk; Patrick.minford@btinternet.com; Xuy16@cf.ac.uk. All authors: Cardiff Business School, Cardiff University, Aberconway Building, Colum Drive, Cardiff, CF10 3EU, UK.

1 Introduction

An unresolved issue in macroeconomics is how best to evaluate the empirical performance of DSGE models. In this paper we compare a relatively new type of test, indirect inference, with a standard procedure, the Likelihood Ratio test. Our main concern is the performance of these tests in small samples, though we will refer to asymptotic properties where known. Our main finding is that the power of the likelihood ratio test is rather weak relative to that of the indirect inference test. We consider why we find this. We also show how this new testing procedure enables users such as policymakers to exploit the ability of the test and its associated estimator to focus on key features of macro behaviour; this allows them to find tractable models that are relevant to their purposes and then to discover whether these models can with total reliability evaluate the policy reforms they are interested in.

The paper is set out as follows. In section 2 we consider how in recent work DSGE models have been evaluated empirically. In section 3 we review the main features of the indirect inference testing procedure as implemented in this paper. In section 4 we compare the small sample properties of tests based on indirect inference with the Likelihood Ratio test that is used in direct inference. The comparison is based on Monte Carlo experiments on the widely used DSGE model introduced by Christiano, Eichenbaum and Evans (2005) and estimated by Smets and Wouters (2003, 2007) on EU and US data. Initially, we use stationary data. In section 5 we extend the analysis to non-stationary data and to the three-equation New Keynesian representation of the model of Clarida, Gali and Gertler (1999), again on both stationary and non-stationary data. In section 6 we consider why the two testing methods have such different power, drawing on available asymptotic analysis as well as further Monte Carlo experiments. In section 7 we show how the testing methods we propose can be used in practice to reduce model uncertainty for a user with a clear purpose such as policy reform. Our final section presents our conclusions.

2 The empirical evaluation of DSGE models

DSGE models emerged largely as a response to the perceived shortcomings of previous formulations of macroeconometric models.
The main complaints were that these macroeconometric models were not structural despite being referred to as structural macroeconometric models - and so were subject to Lucas's critique that they could not be used for policy evaluation (Lucas, 1976), that they were not general equilibrium models of the economy but, rather, they comprised a set of partial equilibrium equations with no necessary coherent structure, that they incorporated 'incredible' identifying restrictions (Sims, 1980) and that they over-fitted the data through data-mining.

For all their theoretical advantages, the strong simplifying restrictions on the structure of DSGE models resulted in a severe deterioration of fit compared to structural macroeconometric models with their ad hoc supply and demand functions, their flexible lagged adjustment mechanisms and their serially correlated structural errors.

There have been various reactions to the empirical failures of DSGE models. The early version of the DSGE model, the RBC model, was perceived to have four main faults: predicted consumption was too smooth compared with the data, real wages were too flexible resulting in employment being too stable, the predicted real interest rate was too closely related to output and the model, being real, could not admit real effects arising from nominal rigidities. In retrospect, however, this empirical examination was limited and flawed. Typically, the model was driven by a single real stochastic shock (to productivity); there were no nominal shocks or mechanisms causing them to affect real variables; and the model's dynamic structure was derived solely from budget constraints and the capital accumulation equation. Subsequent developments of the DSGE model aimed to address these limitations, and other specification issues, and they had some empirical success. Nevertheless, even this success has been questioned; for example Le et al. (2011) reject the widely acclaimed model of Smets-Wouters (2007).

Another reaction, mainly from econometricians, is the criticism that DSGE models have been calibrated (to an economy) rather than estimated and tested using traditional methods, and when estimated and tested using classical econometric methods, such as the Likelihood Ratio test, they are usually found to perform poorly and are rejected. Sargent1, discussing the response of Lucas and Prescott to these rejections, is quoted as saying that they thought that 'those tests were rejecting too many good models'. Current practice is to try to get around this problem by estimating DSGE models using Bayesian rather than classical estimation methods. Compared with calibration, Bayesian methods allow some flexibility in the prior beliefs about the structural parameters and permit the data to affect the final estimates. Calibrated parameters or, equivalently, the priors used in Bayesian estimation, often come from other studies or from micro-data estimates. Hansen and Heckman (1996) point out that the justification for these is weak: other studies generally come up with a wide variety of estimates, while micro-estimates may well not survive aggregation. If the priors cannot be justified and uninformative priors are substituted, then Bayesian estimation simply amounts to classical ML in which case test statistics are usually based on the Likelihood Ratio. The frequency of rejection by such classical testing methods is an issue of concern in this paper.
1 In a recent interview Sargent remarked of the early days of testing DSGE models: '...my recollection is that Bob Lucas and Ed Prescott were initially very enthusiastic about rational expectations econometrics. After all, it simply involved imposing on ourselves the same high standards we had criticized the Keynesians for failing to live up to. But after about five years of doing likelihood ratio tests on rational expectations models, I recall Bob Lucas and Ed Prescott both telling me that those tests were rejecting too many good models.' Tom Sargent, interviewed by Evans and Honkapohja (2005).

A more radical reaction to the empirical failures of DSGE models has been to say that they are all misspecified and so should not be tested by the usual econometric methods which would always reject them - see Canova (1994). If all models are false, instead of testing them in the classical manner under the null hypothesis that they are true, one should use a descriptive statistic to assess the 'closeness' of the model to the data. Canova (1994), for example, remarks that one should ask "how true is your false model?" and assess this using a closeness measure. Various econometricians - for example Watson (1993), Canova (1994, 1995, 2005), Del Negro and Schorfheide (2004, 2006) - have shown an interest in evaluating DSGE models in this way.

We adopt a somewhat different approach that restores the role of formal statistical tests of DSGE models and echoes the widely accepted foundations of economic testing methodology laid down by Friedman (1953). Plainly no DSGE model, or indeed no model of any sort, can be literally true as the 'real world' is too complex to be represented by a model that is 'true' in this literal sense and the 'real world' is not a model. In this sense, therefore, all DSGE models are literally false or 'mis-specified'. Nevertheless an abstract model plus its implied residuals, which represent other influences as exogenous error processes, may be able to mimic the data; if so, then according to usual econometric usage, the model would be 'well specified'. The criterion by which Friedman judged a theory was its potential explanatory power in relation to its simplicity. He gave the example of perfect competition which, although never actually existing, closely predicts the behaviour of industries with a high degree of competition. According to Friedman, a model should be tested, not for its 'literal truth', but 'as if it is true'. Thus, even though a macroeconomic model may be a gross simplification of a more complex reality, it should be tested on its ability to explain the data it was designed to account for by measuring the probability that the data could be generated by the model.

In this spirit we assess a model using formal misspecification tests. The probability of rejection gives a measure of the model's 'closeness' to the facts. This procedure can be extended to a sub-set of the variables of the model rather than all variables. In this way, it should be possible to isolate which features of the data the model is able to mimic; different models have different strengths and weaknesses ('horses for courses') and our procedure can tease these out of the tests. The test criterion may be formulated in a number of ways.
It could, for example, be interpreted as a comparison of the values of the likelihood function for the DSGE model, or of a model designed to represent the DSGE model (an auxiliary model), or it could be based on the mean square prediction error of the raw data or on the impulse response functions obtained from these models or, as explained in more detail later, it could be based on a comparison of the coefficients of the auxiliary model associated with the DSGE model. These criteria fall into two main groups: on the one hand, closeness to raw data, size of mean squared errors and 'likelihood' and, on the other hand, closeness to data features, to stylised facts or to coefficients of VARs or VECMs. Within each of these two categories the criteria can be regarded as mapping into each other so that there are equivalences between them; for example, a VAR implies sets of moments/cross-moments and vice versa. We discuss both types in this paper; we treat the Likelihood Ratio as our representative of the first type and the coefficients of a VAR as our representative of the second.

Before DSGE models were proposed as an alternative to structural macroeconometric models, in response to the latter's failings, Sims (1980) suggested modelling the macroeconomy as a VAR. VARs are now widely used in macroeconometrics as a way of representing the data in a theory-free manner in order, for example, to estimate impulse response functions or for forecasting, where they perform as well as, or sometimes better than, structural models, including DSGE models; see Wieland and Wolters (2012) and Wickens (2014). Moreover, it can be shown that the solution to a (possibly linearized) DSGE model where the exogenous variables are generated by a VAR is, in general, a VAR with restrictions on its coefficients (Wickens, 2014). It follows that a VAR is the natural auxiliary model to use for evaluating how closely a DSGE model fits the data, whichever of the measures above is chosen for the comparison. The data can be represented by an unrestricted VAR and the DSGE model by the appropriately restricted VAR; the two sets of estimates can then be compared according to the chosen measure.

The apparent difficulty in implementing this procedure lies in estimating the restricted VAR. Indirect inference provides a simple solution. Having estimated the DSGE model by whatever means - the most widely used at present being Bayesian estimation - the model can be simulated to provide data consistent with the estimated model using the errors backed out of the model. The auxiliary model is then estimated unrestrictedly both on these simulated data and on the original data. The properties of the two sets of VAR estimates can then be compared using the chosen measure. More precise details of how we carry out this indirect inference procedure in this paper are given in the next section.2

2 In the appendix we review some recent studies of macro models using this method.

3 Model evaluation by indirect inference

Indirect inference provides a classical statistical inferential framework for judging a calibrated or already, but maybe partially, estimated model whilst maintaining the basic idea employed in the evaluation of the early RBC models of comparing the moments generated by data simulated from the model with actual data. An extension of this procedure is to posit a general but simple formal model (an auxiliary model), in effect the conditional mean of the distribution of the data, and base the comparison on features of this model, estimated from simulated and actual data. If necessary these features can be supplemented with moments and other measures directly generated by the data and model simulations.
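As a concrete illustration of this moment-comparison idea, the following minimal sketch (ours, not the authors' released programs; the arrays are random placeholders standing in for actual and model-simulated series) compares the standard deviations of the observed data with their distribution across model simulations.

```python
import numpy as np

def moments(data):
    """Standard deviations and contemporaneous correlations of a (T x n) data array."""
    return data.std(axis=0, ddof=1), np.corrcoef(data, rowvar=False)

# Random placeholders: 'actual' stands for the observed series, 'simulated' for
# samples generated from the model under test (e.g. output, inflation, interest rate).
rng = np.random.default_rng(0)
actual = rng.standard_normal((200, 3))
simulated = [rng.standard_normal((200, 3)) for _ in range(1000)]

sd_actual, corr_actual = moments(actual)
sd_sim = np.array([moments(s)[0] for s in simulated])

# Where does each actual standard deviation lie in the model-implied distribution?
print("actual s.d.:", sd_actual)
print("model-implied percentile:", (sd_sim < sd_actual).mean(axis=0))
```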
Indirect inference on structural models may be distinguished from indirect estimation of structural models. Indirect estimation has been widely used for some time; see Smith (1993), Gregory and Smith (1991, 1993), Gourieroux et al. (1993), Gourieroux and Monfort (1995) and Canova (2005). In indirect estimation the parameters of the structural model are chosen so that when this model is simulated it generates estimates of the auxiliary model similar to those obtained from actual data. The optimal choice of parameters for the structural model are those that minimise the distance between the two sets of estimated coefficients of the auxiliary model. Common choices for the auxiliary model are the moments of the data, the score and a VAR. Indirect estimates are asymptotically normal and consistent, like ML. These properties do not depend on the precise nature of the auxiliary model provided the function to be tested is a unique mapping of the parameters of the auxiliary model. Clearly, the auxiliary model should also capture as closely as possible the data features of the DSGE model on the hypothesis that it is true.

Using indirect inference for model evaluation does not necessarily involve the estimation of the parameters of the structural model. These can be taken as given. They might be calibrated or obtained using Bayesian or some other form of estimation. If the structural model is correct then its predictions about the auxiliary model estimated from data simulated from the given structural model should match those based on actual data. These predictions relate to particular properties (functions of the parameters) of the auxiliary model such as its coefficients, its impulse response functions or just the data moments. A test of the structural model may be based on the significance of the difference between estimates of these functions derived from the two sets of data. On the null hypothesis that the structural model is 'true' there should be no significant difference. In carrying out this test, rather than rely on the asymptotic distribution of the test statistic, we estimate its small sample distribution and use this.

Our choice of auxiliary model exploits the fact that the solution to a log-linearised DSGE model can be represented as a restricted VARMA and also often by a VAR (or if not then closely represented by a VAR). For further discussion on the use of a VAR to represent a DSGE model, see for example Canova (2005), Dave and DeJong (2007), Del Negro and Schorfheide (2004, 2006) and Del Negro et al. (2007a,b) (together with the comments by Christiano (2007), Gallant (2007), Sims (2007), Faust (2007) and Kilian (2007)), and Fernandez-Villaverde et al. (2007). A levels VAR can be used if the shocks are stationary, but a VECM is required, as discussed below, if there are non-stationary shocks. The structural restrictions of the DSGE model are reflected in the data simulated from the model and will be consistent with a restricted version of the VAR.3

3 This requires that the model is identified, as assumed here. Le, Minford and Wickens (2013) propose a numerical test for identification based on indirect inference and show that both the SW and the New Keynesian 3-equation models are identified according to it.
The model can therefore be tested by comparing unrestricted VAR estimates (or some function of these estimates such as the value of the log-likelihood function or the impulse response functions) derived using data simulated from the DSGE model with unrestricted VAR estimates obtained from actual data. The model evaluation criterion we use is based on the difference between the vector of relevant VAR coefficients from simulated and actual data, as represented by a Wald statistic. If the DSGE model is correct (the null hypothesis) then the simulated data, and the VAR estimates based on these data, will not be significantly different from those derived from the actual data. The method is in essence extremely simple; although it is numerically taxing, with modern computer resources it can be carried out quickly.

The simulated data from the DSGE model are obtained by bootstrapping the model using the structural shocks implied by the given (or previously estimated) model and computed from the historical data. The test then compares the VAR coefficients estimated on the actual data with the distribution of VAR coefficient estimates derived from multiple independent sets of the simulated data. We then use a Wald statistic (WS) based on the difference between a_T, the estimates of the VAR coefficients derived from actual data, and a_S(θ_0), the mean of their distribution based on the simulated data, which is given by:

    WS = (a_T − a_S(θ_0))′ W(θ_0) (a_T − a_S(θ_0))

where W(θ_0) is the inverse of the variance-covariance matrix of the distribution of simulated estimates a_S and θ_0 is the vector of parameters of the DSGE model on the null hypothesis that it is true.

As previously noted, we are not compelled to use the VAR coefficients in this formula: thus one could use other data 'descriptors' considered to be key features of the data that the model should match - these could be particular impulse response functions (such as to a monetary policy shock) or particular moments (such as the correlations of various variables with output). However, such measures are functions of the VAR coefficients and it seems that a parsimonious set of features is these coefficients themselves. There are still issues about which variables to include in the VAR (or equivalently whether to focus only on a subset of VAR coefficients related to these variables) and what order of lags the VAR should be. Also it is usual to include the variances of the data or of the VAR residuals as a measure of the model's ability to match variation. We discuss these issues further below.

We can show where in the Wald statistic's bootstrap distribution the Wald statistic based on the data lies (the Wald percentile). We can also show the Mahalanobis Distance based on the same joint distribution, normalised as a t-statistic, and also the equivalent Wald p-value, as an overall measure of closeness between the model and the data.4

4 The Mahalanobis Distance is the square root of the Wald value. As the square root of a chi-squared distribution, it can be converted into a t-statistic by adjusting the mean and the size. We normalise this here by ensuring that the resulting t-statistic is 1.645 at the 95% point of the distribution.
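The Wald calculation just described can be sketched numerically as follows. This is our own illustration rather than the authors' released programs: a_T stands for the VAR coefficient vector estimated on the actual data, and a_S is a matrix holding the VAR coefficient vectors estimated on each bootstrap simulation of the DSGE model (here both are filled with random placeholders).

```python
import numpy as np

def wald(a, a_mean, W):
    d = a - a_mean
    return float(d @ W @ d)

def ii_wald_test(a_T, a_S, alpha=0.05):
    """a_T: (k,) VAR coefficients from actual data.
       a_S: (N, k) VAR coefficients, one row per bootstrap simulation of the DSGE model."""
    a_bar = a_S.mean(axis=0)
    W = np.linalg.inv(np.cov(a_S, rowvar=False))    # inverse variance-covariance of simulated coefficients
    ws_data = wald(a_T, a_bar, W)
    ws_boot = np.array([wald(a, a_bar, W) for a in a_S])
    critical = np.quantile(ws_boot, 1 - alpha)      # 95% point of the bootstrap Wald distribution
    percentile = (ws_boot < ws_data).mean() * 100   # the 'Wald percentile'
    return ws_data, critical, percentile, ws_data > critical

# Illustrative call with random placeholders for the two ingredients:
rng = np.random.default_rng(1)
a_S = rng.standard_normal((1000, 9))   # e.g. 3-variable VAR(1): 9 coefficients per simulation
a_T = rng.standard_normal(9)
print(ii_wald_test(a_T, a_S))
```

A 'directed' version of the test, discussed below, would simply apply the same calculation to a selected subset of the coefficient vector.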
In Le et al. (2011) we applied this test to a well-known model of the US, that of Smets and Wouters (2007; qv). We found that the Bayesian estimates of the Smets and Wouters (SW) model were rejected for both the full post-war sample and for a more limited post-1984 (Great Moderation) sample. We then modified the model by adding competitive goods and labour market sectors. Using a powerful Simulated Annealing algorithm, we searched for values of the parameters of the modified model that might improve the Wald statistic and succeeded in finding such a set of parameters for the post-1984 sample.

A variety of practical issues concerning the use of the bootstrap and the robustness of these methods more generally are dealt with in Le et al. (2011). A particular concern with the bootstrap has been its consistency under conditions of near-unit roots. Several authors (e.g. Basawa et al., 1991; Hansen, 1999; Horowitz, 2001a,b) have noted that asymptotic distribution theory is unlikely to provide a good guide to the bootstrap distribution of the AR coefficient if the leading root of the process is a unit root or is close to a unit root. This is also likely to apply to the coefficients of a VAR when the leading root is close to unity and may therefore affect indirect inference where a VAR is used as the auxiliary model. In Le et al. (2011) we carried out a Monte Carlo experiment to check whether this was a problem in models such as the SW model. We found that the bootstrap was reasonably accurate in small samples, converged asymptotically on the appropriate chi-squared distribution and, being asymptotically chi-squared, satisfied the usual requirement for consistency of being asymptotically pivotal.

4 Comparing Indirect and Direct Inference testing methods

It is useful to consider how indirect inference is related to the familiar benchmark of direct inference. We focus on the Likelihood Ratio as representative of direct inference. We seek to compare the distribution of the Wald statistic for a test of certain features of the data with the corresponding distribution for likelihood ratio tests. We are particularly interested in the behaviour of these distributions on the null hypothesis and the power of the tests as the model deviates increasingly from its specification under the null hypothesis. We address these questions using Monte Carlo experiments.

4.1 Some preliminary experiments comparing indirect with direct inference

We base our comparison on tests of the performance of DSGE models. Our first comparison is based on the SW model of the US, estimated over the whole post-war sample (1947Q1-2004Q4), and with a VAR as the auxiliary model. We treat the SW model as true. The focus of the two tests is slightly different: direct inference asks how closely the model forecasts current data while indirect inference asks how closely the model replicates properties of the auxiliary model estimated from the data. For direct inference we use a likelihood ratio (LR) test of the DSGE model against the unrestricted VAR. In effect, this test shows how well the DSGE model forecasts the 'data' compared with an unrestricted VAR estimated on that data. We examine the power of the Wald test by positing a variety of false models, increasing in their order of falseness. We generate the falseness by introducing a rising degree of numerical mis-specification for the model parameters.
Thus we construct a False DSGE model whose parameters were moved x% away from their true values in both directions in an alternating manner (even-numbered parameters positive, odd ones negative); similarly, we alter the higher moments of the error processes (standard deviation, skewness and kurtosis) by the same +/- x%. We may think of this False Model as having been proposed as potentially 'true' following previous calibration or estimation of the original model but which was at the time thought to be mis-specified.5

5 The 'falseness' of the original model specification may arise due to the researcher not allowing the data to force the estimated parameters beyond some range that has been wrongly imposed by incorrect theoretical requirements placed on the model. If the researcher specifies a general model that nests the true model then estimation by indirect inference would necessarily converge on the parameter estimates that are not rejected by the tests. Accordingly tests would not reject this (well-specified) model. Thus the tests have power against estimated models that are mis-specified so that the true parameters cannot be recovered. Any estimation procedure that incorrectly imposes parameter values on a true model will generate such mis-specification. In the case of the LR test the same argument applies, except that the estimator in this case is FIML. Thus again the LR test cannot have power against a well-specified model that is freely estimated by FIML.

Many of the structural disturbances in the SW model are serially correlated, some very highly. These autocorrelated errors in a DSGE model are regarded as exogenous shocks (or combinations of shocks) to the model's specification, such as preferences, mark-ups, or technological change, the type of shock depending on which equation they appear in. Although they are, therefore, effectively the model's exogenous variables, they are not observable except as structural residuals in these equations. The significance of this is that, when the False models are constructed, the autocorrelation processes of the resulting structural errors are likely to be different. This difference is a marker of the model's mis-specification, as is the falseness of the structural coefficients. In order to give the model the best chance of not being rejected by the LR test, therefore, it is normal to re-estimate the autocorrelation processes of the structural errors. For the Wald test we falsify all model elements, structural and autocorrelation coefficients, and innovation properties, by the same +/- x%.

In evaluating the power of the test based on indirect inference using our Monte Carlo procedure we generate 10,000 samples from some True model (where we take an error distribution with the variance, skewness and kurtosis found in the SW model errors), and find the distribution of the Wald for these True samples. We then generate a set of 10,000 samples from the False model with its falsified parameters and calculate the Wald distribution for this False Model. We then calculate how many of the actual samples from the True model would reject the False Model on this calculated distribution with 95% confidence. This gives us the rejection rate for a given percentage degree +/- x of mis-specification, spread evenly across the elements of the model. We use 10,000 samples because the size of the variance-covariance matrix of the VAR coefficients is large for VARs with a large number of variables.6

6 We assume in this the accuracy of the bootstrap itself as an estimate of the distribution; the bootstrap substitutes repeated drawings from errors in a particular sample for repeated drawings from the underlying population. Le et al. (2011) evaluate the accuracy of the bootstrap for the Wald distribution and find it to be fairly high.
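A minimal sketch of this falsification-and-power calculation follows (ours; the coefficient samples are random placeholders for what would come from bootstrapping the True and False DSGE models and re-estimating the auxiliary VAR on each sample).

```python
import numpy as np

def falsify(params, x):
    """Move each parameter x% away from its true value, alternating the sign:
       even-numbered parameters up, odd-numbered parameters down."""
    signs = np.where(np.arange(len(params)) % 2 == 0, 1.0, -1.0)
    return params * (1.0 + signs * x / 100.0)

def rejection_rate(a_true_samples, a_false_samples, alpha=0.05):
    """Share of True-model samples whose Wald statistic, computed against the False
       model's own bootstrap distribution, exceeds that distribution's 95% point."""
    a_bar = a_false_samples.mean(axis=0)
    W = np.linalg.inv(np.cov(a_false_samples, rowvar=False))
    ws = lambda a: (a - a_bar) @ W @ (a - a_bar)
    critical = np.quantile([ws(a) for a in a_false_samples], 1 - alpha)
    return float(np.mean([ws(a) > critical for a in a_true_samples]))

# Illustrative use with placeholders: in practice the coefficient samples come from
# bootstrapping the True model and the x%-falsified model and re-estimating the VAR each time.
rng = np.random.default_rng(2)
print(falsify(np.array([5.74, 1.38, 0.71, 0.70]), 5))   # +5%, -5%, +5%, -5%
a_true = rng.standard_normal((10000, 9))
a_false = rng.standard_normal((10000, 9)) + 0.05        # a small shift stands in for model falseness
print(rejection_rate(a_true, a_false))
```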
In evaluating the power of the test under direct inference we need to determine how well the DSGE model forecasts the simulated data generated by the True Model compared with a VAR model fitted to these data. We use the first 1000 samples; no more are needed in this case. The DSGE model is given a parameter set and for each sample the residuals and their autoregressive parameters are extracted by LIML (McCallum, 1976; Wickens, 1982). The IV procedure is implemented using the VAR to project the rational expectations in each structural equation; the residual is then backed out of the resulting equation. In the forecasting test the model is given at each stage the lagged data, including the lagged errors. We assume that since the lagged errors are observed in each simulated sample, the researcher can also estimate the implied autoregressive coefficients for the sample errors and use these in the forecast. We assume the researcher does this by LIML which is a robust method; clearly the DSGE model's forecasting capacity is helped by the presence of these autoregressive error processes. We find the distribution of the LR under the True model. We then apply the 5% critical value from this to the False model LR value for each True sample and obtain the rejection rate for the False Model. Further False models are obtained by changing the parameters by + or - x%.7

7 The two tests are compared for the same degree of falseness of the structural coefficients, with the error properties determined according to each test's own logic. Thus for the Wald test, the error properties have the same degree of falseness as the structural coefficients so that overall model falseness is the same, rising steadily to give a smooth power function. For the LR test, the error properties are determined by reestimation, the normal test practice; the model's falseness rises smoothly with the falseness of the structural coefficients, and their accompanying implied error processes. Were the LR error properties set at the same degree of falseness as for the Wald, the model's forecasting performance would go off track and the test would sharply reject, simply for this reason. Thus it would not be testing the model but arbitrarily false residuals - hence normal practice. If, per contra, we were to reestimate the errors in the Wald test for conformity with the LR test, the falseness of the error properties would rise sharply due to estimation error, raising overall model falseness with it, so derailing the smooth rise in falseness for the power function. To obtain exactly the same overall falseness of both tests, one needs to compare them with the same (true) error properties; this comparison is done in section 6, where it again shows much greater power from the Wald test. Of course in practice neither test would be appropriately carried out this way, nor could they since the tester is not told the true errors. The comparisons of the two power functions as done here represent how rejection rates rise as these two different tests are applied in practice to models of smoothly increasing falseness.

Percent Mis-specified    True   1%     3%     5%     7%     10%     15%     20%
Indirect Inference       5.0    19.8   52.1   87.3   99.4   100.0   100.0   100.0
Direct Inference         5.0    6.3    8.8    13.1   21.6   53.4    99.3    99.7

Table 1: Rejection Rates for Wald and Likelihood Ratio for 3 Variable VAR(1)

Figure 1: Scatter plots of Indirect Inference (Wald) v. Direct Inference (LR) for 1000 samples of the True Model (3 Variable VAR(1)). [Two panels: True Model, correlation = -0.0078; 3% False Model, correlation = -0.066.]
Figure 2: Scatter plot of Indirect Inference (Wald) v. Direct Inference (LR) for True and False Models, from True through 20% false (some outliers taken out for clarity of scale) (3 Variable VAR(1)). [Correlation = -0.70574.]

Table 1 shows that the power of the Indirect Inference Wald test is substantially greater than that of the Direct Inference LR test. With 5% mis-specification, the Wald statistic rejects 99% of the time (at the 95% confidence level) while the LR test rejects 15% of the time. At a sufficiently high degree of falseness both reject 100% of the time. Nonetheless, the LR test also has reasonable power. Figure 1, which shows the correlation coefficients between the two tests for the true and 3% false models, shows that there is little or no correlation between the two tests across samples. However, Figure 2, which is a scatter diagram of the correlations between the two test statistics on the same samples but for increasing degrees of falseness, shows that as the model becomes more false, both tests increase their rejection rate. Taken together, these findings suggest that, when one measure is well-fitting, it may be well-fitting or badly-fitting on the other measure. A possible explanation for these findings is that the two tests are measuring different things; the LR test is measuring the forecasting ability of the model while the Wald test is measuring the model's ability to explain the sample data behaviour.

4.1.1 Comparison of the tests with different VAR variable coverage and VAR lag order

Tests based on indirect inference that use VARs with a high order of lags, or VARs with more than just a few variables, are extremely stringent and they tend to reject uniformly. In Le et al. (2011) we proposed 'directed' Wald tests where the information used in evaluating a DSGE model was deliberately reduced to cover only 'essential features' of the data; of course, all Wald tests are based on chosen features of the data and therefore are always to some degree 'directed'. Our use of the term is when the Wald test is focused on only a small subset of variables, or aspects of their behaviour.

We find in Table 2 that for the indirect inference test the power of the Wald statistic tends to rise as the number of variables in the VAR or its lag order is increased. But the power of direct inference based on a Likelihood Ratio test (using the LIML method on the residuals) does not appear to vary in any systematic way with the benchmark VAR used, either in terms of the number of variables included or the order of the VAR. Why this is the case is a matter for future research. Our conjecture is that forecasting performance across different variables is highly correlated and that the most recent information provides the dominant input. If so, then adding variables or more lags would make little difference.
With indirect inference the addition of variables or VAR detail adds to the complexity of behaviour that the DSGE model must match; the more complexity, the less well can the matching occur when the model is moderately false. Again, this brings out the essential difference in the two measures of performance.

INDIRECT INFERENCE
VAR — no of coeffs        TRUE   1%      3%      5%      7%      10%      15%      20%
3 variable VAR(1) — 9     5.00   19.76   52.14   87.30   99.38   100.00   100.00   100.00
3 variable VAR(2) — 18    5.00   38.24   68.56   84.10   99.64   100.00   100.00   100.00
3 variable VAR(3) — 27    5.00   38.22   65.56   92.28   99.30   100.00   100.00   100.00
5 variable VAR(1) — 25    5.00   28.40   77.54   97.18   99.78   100.00   100.00   100.00
7 variable VAR(3) — 147   5.00   75.10   99.16   99.96   100.00  100.00   100.00   100.00

DIRECT INFERENCE
VAR — no of coeffs        TRUE   1%      3%      5%      7%      10%      15%      20%
3 variable VAR(1) — 9     5.00   6.30    8.80    13.10   21.60   53.40    99.30    99.70
3 variable VAR(2) — 18    5.00   6.00    8.30    13.40   23.10   55.10    99.40    99.70
3 variable VAR(3) — 27    5.00   6.00    7.90    13.10   21.90   52.30    99.50    99.70
5 variable VAR(1) — 25    5.00   6.00    8.20    11.70   15.90   29.30    93.30    99.70
7 variable VAR(3) — 147   5.00   5.50    7.10    11.40   18.80   49.90    99.60    99.70

Table 2: Rejection Rates at 95% level for varying VARs

4.1.2 Estimation and test power

In the above power comparisons we took the values of the DSGE model parameters as given - perhaps by calibration or Bayesian estimation (where the priors may keep them away from the true values) or by some inefficient estimation process that fails to get close to the true parameter values. Suppose instead that we use maximum likelihood (FIML) estimates or indirect inference (II) estimates that minimise the Wald criterion. It is of interest to ask whether this would affect the previous power comparisons as we would then expect the model to be rejected only if it was mis-specified. For example, the model might assume Calvo price/wage setting when there was general competition or vice versa.

First, we examine the small sample properties of the two estimators. While we know from earlier work that the estimators have similar asymptotic properties, there is no work comparing their small sample properties. We assess the small sample bias of the two estimators using the same Monte Carlo experiment on the SW model. Thus, we endow the econometrician with the true general specification and re-estimate the model for each of the 1000 samples of data simulated from the true specification of the model. The percentage mean biases and the percentage absolute mean biases are reported in Table 3. We obtain a familiar result that the FIML estimates are heavily biased in small samples. By contrast, we find that the II estimator has very small bias; on average it is roughly half the FIML bias and the absolute mean bias is around 4%.

Second, we now check the power of each test for the re-estimated SW model against its general misspecification, which we require to be substantial otherwise the tests would have trivial power.8 The type of mis-specification that we consider relates to the assumed degree of nominal rigidity in the model. The original SW model is New Keynesian (NK) with 100% Calvo contracting. An alternative specification is a New Classical (NC) version with 100% competitive markets and a one-quarter information lag about prices by households/workers. We then apply the II test of NC to data generated by NK, allowing full re-estimation by II for each sample, and vice versa with a test of NK on data generated by NC. This is repeated using the LR test with re-estimation of each sample by FIML - technically we do this by minimising the LR on each sample.

8 We can translate our results under re-estimation into terms of the 'degree of falseness' of the model as in the power functions used above. This will not be removed by the reestimation process. Re-estimation will take the model's parameters to the corner solution where the estimates cannot get closer to the data without violating the model's general mis-specification.
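The II re-estimation step used in this exercise can be sketched as follows. This is our own stand-in rather than the authors' code: they search with a Simulated Annealing algorithm, whereas the sketch uses a crude random search, and simulate_and_fit_var is a hypothetical placeholder for solving and bootstrapping the DSGE model at candidate parameters and re-estimating the auxiliary VAR.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_and_fit_var(theta, n_boot=500, k=9):
    """Hypothetical placeholder: bootstrap the DSGE model at parameters theta and
       return an (n_boot, k) array of auxiliary-VAR coefficient estimates."""
    return rng.standard_normal((n_boot, k)) + theta.sum() * 0.01   # stand-in only

def wald_criterion(theta, a_T):
    a_S = simulate_and_fit_var(theta)
    d = a_T - a_S.mean(axis=0)
    W = np.linalg.inv(np.cov(a_S, rowvar=False))
    return d @ W @ d

def ii_estimate(a_T, bounds, n_draws=200):
    """Crude random-search stand-in for the Simulated Annealing search over theta."""
    lo, hi = bounds
    best_theta, best_val = None, np.inf
    for _ in range(n_draws):
        theta = rng.uniform(lo, hi)
        val = wald_criterion(theta, a_T)
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

a_T = rng.standard_normal(9)                      # VAR coefficients from the actual data
bounds = (np.zeros(3), np.ones(3) * 2.0)          # illustrative bounds on three structural parameters
print(ii_estimate(a_T, bounds))
```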
The results in Table 4 strikingly confirm the relative lack of power of the LR test. On NK data, the rejection rate of the NC model with 95% confidence is 0%, and on NC data the rejection rate of the NK model is also 0%. It would seem, therefore, that with sufficient ingenuity the NC model can be re-estimated so as to forecast the data generated by the NK model even better than the NK model itself (and vice versa), so that it is not rejected at all. By contrast when II is used, the power against general mis-specification is high. The NC model is rejected (with 95% confidence) 99.6% of the time on NK data and the NK model is rejected 78% of the time on NC data. The implication of this exercise is that the II test is indeed also far more powerful as a detector of general mis-specification than LR.

Parameter                                        Starting coef   Mean Bias (%)        Absolute Mean Bias (%)
                                                                 II        FIML       II        FIML
Steady-state elasticity of capital adjustment    5.74            0.900     5.297      0.900     5.297
Elasticity of consumption                        1.38            5.804     7.941      5.804     7.941
External habit formation                         0.71            13.403    21.240     13.403    21.240
Probability of not changing wages                0.70            0.480     3.671      0.480     3.671
Elasticity of labour supply                      1.83            0.759     8.086      0.759     8.086
Probability of not changing prices               0.66            1.776     0.027      1.776     0.027
Wage indexation                                  0.58            0.978     6.188      0.978     6.188
Price indexation                                 0.24            0.483     3.228      0.483     3.228
Elasticity of capital utilisation                0.54            13.056    29.562     13.056    29.562
Share of fixed costs in production (+1)          1.50            1.590     2.069      1.590     2.069
Taylor Rule response to inflation                2.04            7.820     2.815      7.820     2.815
Interest rate smoothing                          0.81            0.843     0.089      0.843     0.089
Taylor Rule response to output                   0.08            4.686     29.825     4.686     29.825
Taylor Rule response to change in output         0.22            5.587     0.171      5.587     0.171
Average                                                          2.861     5.758      4.155     8.586

Table 3: Small Sample Estimation Bias Comparison (II v. LR)

        Percentage Rejected
        NK data (NC model)   NC data (NK model)
II      99.6%                77.6%
LR      0%                   0%

Table 4: Power of the test to reject a false model

5 Extending the test comparison

We consider two extensions to the above experiments. First, instead of applying stationary shocks to the Smets-Wouters model as above, we apply non-stationary shocks. Second, partly in order to investigate whether these findings are model-specific, we carry out the same analysis, under both stationary and non-stationary shocks, on another widely-used DSGE model: the 3-equation (forward-looking IS curve, Phillips Curve and Taylor Rule) New Keynesian model of Clarida et al. (1999). We find that the previous conclusions do not change in any essential way for either model.

5.1 Non-stationary shocks applied to the SW model

If the data are non-stationary then, in order to use the previous tests, we need to create an auxiliary model whose errors are stationary. We therefore use a VECM as the auxiliary model.
Following Meenagh et al. (2012), and after log-linearisation, a DSGE model can usually be written in the form

    A(L) y_t = B E_t y_{t+1} + C(L) x_t + D(L) e_t    (A1)

where y_t are p endogenous variables and x_t are q exogenous variables which we assume are driven by

    Δx_t = a(L) Δx_{t-1} + d + c(L) ε_t.    (A2)

The exogenous variables may consist of both observable and unobservable variables such as a technology shock. The disturbances e_t and ε_t are both iid variables with zero means. It follows that both y_t and x_t are non-stationary. L denotes the lag operator, z_{t-s} = L^s z_t, and A(L), B(L) etc. are polynomial functions with roots outside the unit circle. The general solution of y_t is

    y_t = G(L) y_{t-1} + H(L) x_t + f + M(L) e_t + N(L) ε_t    (A3)

where the polynomial functions have roots outside the unit circle. As y_t and x_t are non-stationary, the solution has the p cointegration relations

    y_t = [I − G(1)]^{-1} [H(1) x_t + f] = Π x_t + g.    (A4)

The long-run solution to the model is

    ȳ_t = Π x̄_t + g
    x̄_t = [1 − a(1)]^{-1} [d t + c(1) ξ_t]
    ξ_t = Σ_{s=0}^{t} ε_s.

Hence the long-run solution to x_t, namely x̄_t = x̄_t^D + x̄_t^S, has a deterministic trend x̄_t^D = [1 − a(1)]^{-1} d t and a stochastic trend x̄_t^S = [1 − a(1)]^{-1} c(1) ξ_t. The solution for y_t can therefore be re-written as the VECM

    Δy_t = −[I − G(1)](y_{t-1} − Π x_{t-1}) + P(L) Δy_{t-1} + Q(L) Δx_t + f + M(L) e_t + N(L) ε_t
         = −[I − G(1)](y_{t-1} − Π x_{t-1}) + P(L) Δy_{t-1} + Q(L) Δx_t + f + ω_t    (A5)
    ω_t = M(L) e_t + N(L) ε_t

implying that, in general, the disturbance ω_t is a mixed moving average process. This suggests that the VECM can be approximated by the VARX

    Δy_t = K(y_{t-1} − Π x_{t-1}) + R(L) Δy_{t-1} + S(L) Δx_t + g + ζ_t    (A6)

where ζ_t is an iid zero-mean process. As x̄_t = x̄_{t-1} + [1 − a(1)]^{-1} [d + c(1) ε_t], the VECM can also be written as

    Δy_t = K[(y_{t-1} − ȳ_{t-1}) − Π(x_{t-1} − x̄_{t-1})] + R(L) Δy_{t-1} + S(L) Δx_t + h + ζ_t.    (A7)

Either of equations (A6) or (A7) can act as the auxiliary model. Here we focus on equation (A7), which distinguishes between the effect of the trend component of x_t and the temporary deviation of x_t from trend. These two components have different effects in our models and so should be distinguished in the data in order to allow the tests to provide the fullest discrimination. It is possible to estimate equation (A7) in one stage by OLS. Using Monte Carlo experiments, Meenagh et al. (2012) show that this procedure is extremely accurate. We therefore use this auxiliary model as our benchmark both for the II test and the LR test.
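A one-stage OLS estimation of an auxiliary model of the form (A7) can be sketched as follows (our illustration: the trend components ȳ and x̄ are passed in as given inputs, whereas in practice they would be constructed as in Meenagh et al. (2012); the two deviation terms are entered as separate regressors so the error-correction coefficients are estimated unrestrictedly; the data below are random-walk placeholders).

```python
import numpy as np

def fit_varx(y, x, ybar, xbar):
    """One-stage OLS estimate of a VARX in the spirit of (A7):
       dy_t on (y_{t-1} - ybar_{t-1}), (x_{t-1} - xbar_{t-1}), dy_{t-1}, dx_t and a constant.
       y, ybar are (T x p); x, xbar are (T x q); trend components taken as given."""
    dy, dx = np.diff(y, axis=0), np.diff(x, axis=0)
    Y = dy[1:]                                   # explain dy_t for t = 2..T-1
    X = np.column_stack([
        (y - ybar)[1:-1],                        # y_{t-1} - ybar_{t-1}
        (x - xbar)[1:-1],                        # x_{t-1} - xbar_{t-1}
        dy[:-1],                                 # dy_{t-1}
        dx[1:],                                  # dx_t
        np.ones((len(Y), 1)),
    ])
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (k x p) matrix of VARX coefficients
    return coeffs

# Illustrative call on random-walk placeholders for the data and their trends.
rng = np.random.default_rng(4)
T, p, q = 200, 3, 2
y = np.cumsum(rng.standard_normal((T, p)), axis=0)
x = np.cumsum(rng.standard_normal((T, q)), axis=0)
print(fit_varx(y, x, ybar=np.zeros_like(y), xbar=np.zeros_like(x)).shape)
```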
To generate non-stationary data from the DSGE model we endow it with one or more non-stationary error processes. These are constructed by generating AR processes for differences in the structural errors. For the SW model we add banking and money and give it a non-stationary productivity shock. Full details of this version of the SW model are in Le, Meenagh and Minford (2012). The rejection probabilities for the Wald and LR tests are reported in Table 5. Once more the test based on indirect inference has far more power than the direct LR test.

INDIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
3 variable VAR(1) — 9     5.0    7.9    49.2   97.8   100.0   100.0   100.0   100.0
3 variable VAR(2) — 18    5.0    9.2    45.0   99.2   100.0   100.0   100.0   100.0
3 variable VAR(3) — 27    5.0    7.1    40.5   98.6   100.0   100.0   100.0   100.0
5 variable VAR(1) — 25    5.0    11.1   57.9   99.6   100.0   100.0   100.0   100.0
7 variable VAR(3) — 147   5.0    19.9   77.4   100.0  100.0   100.0   100.0   100.0

DIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
3 variable VAR(1) — 9     5.0    5.2    5.8    6.2    7.4     9.6     15.6    26.5
3 variable VAR(2) — 18    5.0    5.1    5.8    6.0    7.3     9.4     15.1    26.2
3 variable VAR(3) — 27    5.0    5.3    5.8    6.1    7.3     9.5     15.5    26.3
5 variable VAR(1) — 25    5.0    5.7    6.1    7.2    7.9     9.6     12.6    21.6
7 variable VAR(3) — 147   5.0    5.0    6.0    7.1    8.3     10.7    15.0    25.3

Table 5: Rejection Rates at 95% level for varying VARs (non-stationary data)

5.2 Extension to the 3-equation New Keynesian model

The results for the 3-equation New Keynesian inflation model are reported for stationary data in Table 6 and for non-stationary data in Table 7. The results are not much different from those for the much larger Smets-Wouters model. For stationary data the power of the indirect inference test rises rapidly with the degree of falseness, but that of the Likelihood Ratio is much poorer and rises less fast. For non-stationary data the power of the indirect inference test rises less fast than for the Smets-Wouters model, while the power of the LR test is very low and hardly increases with the degree of falseness. These findings suggest that, if one is only interested in these three major macro variables, there is no substantial power penalty in moving to a more aggregative model of the economy if indirect inference is used. The power of the LR test is also similar for the two models - but lower than the Wald test - for stationary data and much lower for non-stationary data.

INDIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
2 variable VAR(1) — 4     5.0    16.8   82.6   99.6   100.0   100.0   100.0   100.0
3 variable VAR(1) — 9     5.0    25.1   97.7   100.0  100.0   100.0   100.0   100.0
3 variable VAR(2) — 18    5.0    16.1   77.2   98.4   100.0   100.0   100.0   100.0
3 variable VAR(3) — 27    5.0    14.4   73.0   97.5   99.7    100.0   100.0   100.0

DIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
2 variable VAR(1) — 4     5.0    6.0    7.5    9.9    13.2    18.7    29.2    39.3
3 variable VAR(1) — 9     5.0    5.2    6.9    9.0    12.3    18.8    32.3    51.3
3 variable VAR(2) — 18    5.0    5.7    7.2    10.3   13.0    18.8    32.8    51.6
3 variable VAR(3) — 27    5.0    5.4    7.4    9.6    12.3    19.1    33.0    51.6

Table 6: 3-EQUATION MODEL, STATIONARY data: Rejection Rates at 95% level for varying VARs

INDIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
2 variable VAR(1) — 4     5.0    9.6    35.6   78.6   93.6    100.0   100.0   100.0
3 variable VAR(1) — 9     5.0    2.9    9.4    40.6   63.1    99.4    100.0   100.0
3 variable VAR(2) — 18    5.0    3.7    12.0   34.8   62.8    96.8    100.0   100.0
3 variable VAR(3) — 27    5.0    3.1    10.8   34.7   55.3    96.9    100.0   100.0

DIRECT INFERENCE
VAR — no of coeffs        TRUE   1%     3%     5%     7%      10%     15%     20%
2 variable VAR(1) — 4     5.0    5.3    5.4    5.6    6.3     7.5     9.2     10.7
3 variable VAR(1) — 9     5.0    5.2    5.3    5.5    5.5     5.7     5.7     5.9
3 variable VAR(2) — 18    5.0    5.2    5.3    5.5    5.5     5.7     5.7     5.9
3 variable VAR(3) — 27    5.0    5.2    5.3    5.5    5.5     5.7     5.7     5.9

Table 7: 3-EQUATION MODEL, NON-STATIONARY data: Rejection Rates at 95% level for varying VARs

6 Why does the indirect inference test have greater power than the Likelihood Ratio test?

What we have shown so far is that in small samples the direct inference LR test has far less power than the Indirect Inference Wald test. The LR test is familiar. Let us review exactly the way the Indirect Inference test is carried out. Notice that we simulate the DSGE model to find its implied distribution for the VAR coefficients; the Wald test then checks whether the data-estimated VAR coefficients lie within the 95% bounds of this distribution - i.e. whether the DSGE-model-restricted distribution 'covers' the data-based VAR coefficients at the specified significance level.

However, we could have done the test differently, in effect 'the other way round', creating the distribution of the data-estimated VAR coefficients and asking whether this data-based distribution covers the DSGE-model-restricted VAR coefficients (which we can obtain as the mean of the model-implied VAR coefficients distribution). This is the way in which a standard classical Wald test is performed: thus the data-based distribution (which comes from the true model, unrestricted) is treated as the null and the alternative hypothesis is tested against it in this way. This unrestricted Wald is a transformation of the LR test - as is familiar from standard econometrics. We can also obtain it by bootstrapping the estimated VAR. This distribution is unrestricted because it uses the estimated VAR without imposing on it the restrictions of the true (but unknown) model. Thus when bootstrapping the estimated VAR one holds the estimated VAR coefficients a_T constant, merely bootstrapping the VAR errors (which are linear combinations of the structural errors); whereas if one bootstrapped the true structural model, one would be capturing the overall variation in a_S across samples due to both the errors and their interaction with the structure of the DSGE model. It turns out that this is an important distinction between the two Walds. We will see below that the Wald using the restricted distribution - the 'restricted Wald' - creates a more powerful test than the one based on the unrestricted distribution - the 'unrestricted Wald'. For now we will simply explore the theoretical differences between the restricted Wald on the one hand and the LR statistic or the unrestricted Wald on the other.
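The distinction between the two bootstrap schemes can be sketched as follows (our illustration: estimate_var1 fits a VAR(1) by OLS, and simulate_structural_model is a hypothetical placeholder for re-solving and re-simulating the DSGE model, which is where the model's cross-equation restrictions enter the restricted draws).

```python
import numpy as np

rng = np.random.default_rng(5)

def estimate_var1(data):
    """OLS VAR(1): returns (coefficient matrix incl. intercept, residuals)."""
    Y, X = data[1:], np.column_stack([np.ones(len(data) - 1), data[:-1]])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B, Y - X @ B

def unrestricted_draws(data, n_boot=1000):
    """Hold the estimated VAR fixed and resample only its residuals."""
    B, resid = estimate_var1(data)
    draws = []
    for _ in range(n_boot):
        e = resid[rng.integers(0, len(resid), len(resid))]
        y = [data[0]]
        for t in range(len(e)):
            y.append(np.concatenate([[1.0], y[-1]]) @ B + e[t])
        draws.append(estimate_var1(np.array(y))[0].ravel())
    return np.array(draws)

def restricted_draws(simulate_structural_model, n_boot=1000):
    """Re-simulate the structural model itself each time, then re-estimate the VAR,
       so the draws reflect the model's restrictions as well as the shocks."""
    return np.array([estimate_var1(simulate_structural_model())[0].ravel()
                     for _ in range(n_boot)])

# Placeholder structural simulator and data, purely for illustration.
simulate_structural_model = lambda: rng.standard_normal((200, 3)).cumsum(axis=0) * 0.1
data = simulate_structural_model()
print(unrestricted_draws(data, 50).shape, restricted_draws(simulate_structural_model, 50).shape)
```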
Meenagh et al. (2015), whom we follow closely in this section, show that the three tests are asymptotically equivalent when the DSGE model being tested is true. However, when the DSGE model is false the restricted Wald test is not asymptotically equivalent to the other two. By using the distribution of the model-restricted VAR coefficients it generates increased precision of the variance matrix of the coefficients of the auxiliary model and so improves the power of the Wald test.

6.1 Summary: why the power is different

With these introductory remarks we are now in a position to analyse the reasons for the difference in power we have found between the two small sample tests, LR and our Indirect Inference Wald, IIW. In summary we find two main reasons: a) they are carried out with different procedures; b) even when the same procedures are followed, the two tests differ in power by construction. Let us now discuss these in turn.

6.1.1 Reason a): the tests employ different procedures so the comparison is of different models

We have seen above how when reestimation is permitted using the LR test, power is reduced. Thus when one is finding the rejection rate when parameter values are falsified, we saw that with the LR test the reestimation of the error process to bring the model back on track reduced the rejection rate.
This can be illustrated by comparing the power of the LR test in which the autoregressive coefficients are re-estimated, as above, with an LR test in which the degree of falsification of the autoregressive coefficients is pre-specified, as for the Wald test above. We employ a 3-equation NK model for the comparison. As expected, the results in Table 8 below show that the LR test with pre-specified autoregressive coefficients has considerably greater power than the test using re-estimated autoregressive coefficients.

3-equation NK model (no lags)
Rejection rate of false models at 95% confidence: T=200
                 Re-estimated ρ's   Pre-specified ρ's
True             5.0                5.0
1% False         5.0                5.0
3% False         5.3                9.6
5% False         6.1                20.2
7% False         8.0                39.1
10% False        15.4               63.7
15% False        48.1               90.7
20% False        75.6               98.9

Table 8: Comparing power due to wrong parameter values

We further found that the power of the LR test against a completely mis-specified model was virtually nil, because the FIML estimator of the mis-specified model manages to 'data mine' highly effectively in fitting the wrong model - see Table 8 above. The point here is that the power is again eliminated by bringing the model, across all its parameters and not merely the AR ones, onto track with the data.

6.1.2 Reason b): Comparative power when the LR and Indirect Inference Wald procedures are like-for-like

In the above comparison of the joint distribution of the two coefficients of interest, the data simulated from the structural model gave serially correlated structural error processes. In order to make the estimates of their joint distribution compatible with the original Smets-Wouters estimation strategy, first-order autoregressive processes were fitted to these structural errors for each bootstrap sample. In calculating the power of the tests we proceed a little differently in order that the tests are based on the same assumptions when the structural model is falsified. We now fix both θ (the vector of structural coefficients of the DSGE model) and ρ (the vector of coefficients of the autoregressive error processes). Each is falsified by x%. We do not, however, falsify the innovations, maintaining them as having the original true distribution. This last is a matter of convenience: we could extract the exact implied false error innovations, as implied by each data sample, θ and ρ. But this extraction is a long and computationally intensive process requiring substantial iteration (because the model expectations depend on the errors while the errors in turn depend on the expectations). We simply assume, therefore, that the model is false in all respects except for the innovations. For our purposes here, which is to determine the relative power of the two tests when faced with exactly the same falsified models, this creates no problems. We use the SW model as the true model with a sample size of 200 throughout. Our findings are reported in Table 9. We find that the two test statistics, LR and Wald, generate similar power when the unrestricted Wald test is used, i.e. based on the observed data (the unrestricted VAR). This is what we would expect since the unrestricted Wald, as we have seen, is simply a transformation of the LR test. Focusing on the main case, which is a 3VAR1, and taking 5% falseness as our basic comparison, we see that the rejection rate for the LR test is 38%. For the unrestricted Wald test, based on the unrestricted VAR, the rejection rate is 31%.
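Before turning to Table 9, a one-line sketch of the falsification scheme may help: both θ and ρ are moved x% away from their true values, with the sign of the perturbation alternating across parameters (as described in section 7.1 below), while the innovations keep their true distribution. The helper below is our own illustration of that scheme, not the authors' code.

    import numpy as np

    def falsify(params, x):
        # Return an x%-false copy of a parameter vector: each element is moved
        # x% away from its true value, with the perturbation sign alternating
        # across elements.
        params = np.asarray(params, dtype=float)
        signs = np.where(np.arange(len(params)) % 2 == 0, 1.0, -1.0)
        return params * (1.0 + signs * x / 100.0)

    # e.g. theta_false = falsify(theta_true, 5); rho_false = falsify(rho_true, 5)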
WALD TEST (with unrestricted VAR)
                    Coeffs   TRUE   1%     3%     5%     7%     10%     15%     20%
2 variable VAR(1)   4        5.0    6.2    20.3   69.6   61.0   99.8    100.0   100.0
3 variable VAR(1)   9        5.0    3.4    7.5    30.7   75.0   97.4    100.0   100.0
3 variable VAR(2)   18       5.0    3.8    5.2    19.1   57.5   84.3    98.4    99.5
3 variable VAR(3)   27       5.0    3.9    6.4    21.6   54.5   84.0    97.5    98.7
5 variable VAR(1)   25       5.0    2.8    3.2    2.6    5.4    6.2     4.5     100.0
7 variable VAR(3)   147      5.0    5.1    3.4    1.4    0.9    0.2     0.0     100.0

IIWALD TEST (with restricted VAR)
                    Coeffs   TRUE   1%     3%     5%     7%     10%     15%     20%
2 variable VAR(1)   4        5.0    9.8    37.7   80.8   96.8   100.0   100.0   100.0
3 variable VAR(1)   9        5.0    9.5    36.1   71.0   98.1   100.0   100.0   100.0
3 variable VAR(2)   18       5.0    8.3    35.5   80.9   96.9   100.0   100.0   100.0
3 variable VAR(3)   27       5.0    9.2    32.9   78.0   95.1   100.0   100.0   100.0
5 variable VAR(1)   25       5.0    17.8   85.5   99.8   100.0  100.0   100.0   100.0
7 variable VAR(3)   147      5.0    77.6   99.2   100.0  100.0  100.0   100.0   100.0

LIKELIHOOD RATIO TEST
                    Coeffs   TRUE   1%     3%     5%     7%     10%     15%     20%
2 variable VAR(1)   4        5.0    12.0   28.3   45.9   63.4   83.2    97.0    99.7
3 variable VAR(1)   9        5.0    9.4    21.8   37.5   58.9   84.0    99.0    100.0
3 variable VAR(2)   18       5.0    8.9    20.7   36.8   57.6   82.9    98.7    100.0
3 variable VAR(3)   27       5.0    8.9    20.4   36.7   56.7   82.2    98.7    100.0
5 variable VAR(1)   25       5.0    8.9    22.4   44.3   68.6   89.6    99.6    100.0
7 variable VAR(3)   147      5.0    5.7    10.6   23.6   46.3   83.2    99.6    100.0

Table 9: Comparison of rejection rates at 95% level for Indirect Inference and Direct Inference

However, using the restricted Wald (IIW) test the power rises to 85%, over double that of the two other tests.9

9 The unrestricted Wald test uses the variance matrix of the auxiliary model. When the VAR has a very large number of coefficients the variance matrix of the coefficients has a tendency to become unstable; this occurs even when the number of bootstraps is raised massively (e.g. to 10,000). This is due to over-fitting in small samples (here the sample size is 200); there is then insufficient information to measure the variance matrix of the VAR coefficients.

Understanding the extra power provided by using the restricted rather than the unrestricted Wald test

In our numerical comparison of the two tests our structural model is the Smets-Wouters (2007) model. This is a DSGE model which has a high degree of over-identification (as established by Le et al, 2013). It has 12 structural parameters and 8 parameters in the error processes. It implies a reduced-form VAR of order 4 with seven observable endogenous variables, i.e. a 7VAR4 (Wright, 2015). This has 196 coefficients. The order of the VAR and the number of variables used in an IIW test are usually lower than a 7VAR4. We concentrate on the dynamic response to own shocks of inflation and the short-term nominal interest rate. We focus on the three variables of the above New Keynesian model: inflation, the output gap and the nominal interest rate. We use a 3VAR1 in these variables as the auxiliary model. We then examine the own-lag coefficients for inflation and the short-term interest rate. We estimate the coefficients of the 3VAR1 using the observed data for these three variables. We then find the distribution of the estimates of the two coefficients of interest by bootstrapping the VAR innovations. Next, we estimate the 3VAR1 using data for these three variables obtained by simulating the full SW model. The distribution of these estimates of the two coefficients is obtained by bootstrapping the structural innovations generating that sample. The graphs below show the densities of the joint distribution of the two coefficients. Figure 3 displays the joint distributions of the two VAR coefficients based on 1) the observed data (the unrestricted VAR), 2) simulated data from the original estimates of the structural model (the restricted VAR), and 3) false specifications of the structural model by 5% and 10% (the 5% false and 10% false restricted VARs).
One can see clearly that 2), the joint distribution based on simulated data from the original structural model, is both more concentrated and more elliptical (implying a higher correlation between the coefficients) than 1), that using the observed data. Increasing the falseness of the model causes 3), the joint distributions from the 5% and 10% false DSGE models, to become a little more dispersed and more elliptical; they are also located slightly differently, but this is not shown as the distribution is centred on zero in all cases.

Figure 3: Restricted VAR and Unrestricted VAR Coefficient Distributions

Figure 4 shows how this affects the power of the Wald test for a model that is 5% false. The green dot in the Figure is the mean of the distribution. The test of this false model can be carried out in two ways. We have drawn the diagram as if the joint test of the two chosen VAR coefficients has the same power as the overall test of all VAR coefficients. The first way is to use the unrestricted Wald, using the observed data to estimate a 3VAR1 representation and to derive the joint distribution of the two coefficients by bootstrapping. The 5% contour of such a bootstrap distribution is given by the dashed green line; the thick green line shows the critical frontier at which the 5% false model is just rejected. The second way is to use the restricted Wald, using the distribution implied by the simulated data. The red ellipse shows the 5% contour of the resulting joint distribution. The results show that the second method has nearly double the power of the first. (Increasing the degree of falseness to 10% raises the power of both to 100%.)

Figure 4: Two 95% contours for tests of the 5% False Model. Green = Unrestricted; Red = Restricted.

Exploiting the extra power of the Wald-type test with DSGE-model-restricted variance matrix

Thus when we eliminate the difference in procedures and test like-for-like, we found the two tests to be reasonably comparable in power when the indirect inference test is performed using the unrestricted Wald test, which uses the variance of the unrestricted VAR (auxiliary) model. This turns out to be because the tests are approximately equivalent on a like-for-like basis. However, we showed above that extra power is delivered by the IIW test set out here, under which the DSGE model being tested is treated as the null hypothesis: in this case the Wald statistic uses the variance restricted by the DSGE model under test. This gives this restricted Wald test still greater power.

It may be possible to raise the power of the Wald test further. We suggest two ways this might be achieved: 1) extending the Wald test to include elements of the variance matrix of the coefficients of the auxiliary model; 2) including more of the structural model's variables in the VAR, increasing the order of the VAR, or both. The basic idea here is to extend the features of the structural model that the auxiliary model seeks to match.
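One simple example of extending the feature set, used in the applications reported in Appendix 1 and in Appendix 2, is to add the data variances of the endogenous variables to the vector of auxiliary-model features tested. The sketch below shows how such a descriptor vector might be built; the function name and arguments are our own illustrative assumptions, not the authors' programmes.

    import numpy as np

    def auxiliary_features(var_coeffs, data, include_variances=True):
        # Stack the estimated VAR coefficients into one descriptor vector and,
        # optionally, append the variances of the observed variables, so that
        # the Wald test also checks the model's ability to match data volatility.
        features = np.asarray(var_coeffs, dtype=float).ravel()
        if include_variances:
            features = np.concatenate([features, np.var(data, axis=0)])
        return features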
The former is likely to increase the power of the restricted Wald test, but not the LR test, as the latter can only ask whether the DSGE model is forecasting sufficiently accurately; including more variables is likely to increase the power of both. There is, of course, a limit to the number of features of the DSGE model that can be included in the test. If, for example, we employ the full model then we run into the objection raised by Lucas and Prescott against tests of DSGE models, that "too many good models are being rejected by the data". The point is that the model may offer a good explanation of features of interest but not of other features of less interest, and it is the latter that results in the rejection of the model by conventional hypothesis tests. Focusing on particular features is a major strength of the Wald test. Table 10 reports the power of the Wald test for the basic 3-equation NK model, whose reduced-form solution is a VAR(1), using 3-variable VAR(1) and VAR(2) auxiliary models.

3-equation NK model - no lags (VAR(1) reduced form)
Rejection rates at 95% confidence: T=200
                 3 variable VAR(1)   3 variable VAR(2)
True             5.0                 5.0
1% False         4.9                 4.3
3% False         7.3                 7.1
5% False         16.1                21.7
7% False         37.0                40.3
10% False        73.3                76.3
15% False        99.4                99.8
20% False        100.0               100.0

Table 10: Comparing power due to VAR order (3-equation NK model with no lags)

Consider now including an indexing lag in the Phillips Curve. This increases the number of structural parameters to 9 and the reduced-form solution is a VAR(2). The power of the Wald test is reported in Table 11. Increasing the number of lags in the auxiliary model has clearly raised the power of the test.

3-equation NK model - with indexing lag (VAR(2) reduced form)
Rejection rates at 95% confidence: T=200
                 3 variable VAR(1)   3 variable VAR(2)
True             5.0                 5.0
1% False         10.6                6.0
3% False         20.7                19.5
5% False         47.5                57.9
7% False         65.6                91.2
10% False        89.6                100.0
15% False        98.8                100.0
20% False        99.9                100.0

Table 11: Comparing power due to VAR order (3-equation NK model with indexing lag)

This additional power is related to the identification of the structural model. The more over-identified the model, the greater the power of the test. Adding an indexation lag has increased the number of over-identifying restrictions exploitable by the reduced form. A DSGE model that is under-identified would produce the same reduced-form solution for different values of the unidentified parameters and would, therefore, have zero power for tests involving these parameters. In practice, most DSGE models will be over-identified - see Le et al (2013). In particular, the SW model is highly over-identified. The reduced form of the SW model is approximately a 7VAR(4), which has 196 coefficients. Depending on the version used, the SW model has around 15 (estimable) structural parameters and around 10 ARMA parameters. The 196 coefficients of the VAR are all non-linear functions of the 25 model parameters, indicating a high degree of over-identification. The over-identifying restrictions may also affect the variance matrix of the reduced-form errors. If true, these extra restrictions may be expected to produce more precise estimates of the coefficients of the auxiliary model and thereby increase its power. It also suggests that the power of the test may be further increased by using these variance restrictions to provide further features to be included in the test.

7 Using these methods to test a model

In this final section we discuss the results we have found in using the Smets-Wouters model for monetary and fiscal policy purposes in the context of the recent crisis and its aftermath.
This work is all on US data for the period since the mid-1980s; we have not found it possible to mimic US behaviour for earlier data, we think because there was substantial regime change before then (Le et al, 2014). We start from the position that the model has credible micro-foundations but that we are searching for a variant of it that a) can allow for a banking system with the monetary base (M0) as an input into it; b) can integrate the zero bound on the risk-free interest rate and Quantitative Easing, together with bank regulation, as policy tools; and c) can explain the behaviour of the three key macro variables: output, inflation and interest rates. This is because we want to find a model within which we can reliably explore policies that would improve these variables' behaviour, especially their crisis behaviour. There is of course a large macro literature in which claims are made for the efficacy of a variety of policy prescriptions; but here we just focus on the set of policies investigated for this model, to illustrate the power of our methods.

We will discuss the model's properties with these policies in a moment. But first let us note that we can test it in two ways: by a Likelihood Ratio test for three key macro variables (inflation, output and interest rates) and also by an IIW test on the same three variables. We choose these because they are focused on the behaviour of the three variables of interest to us as policymakers. The LR test measures how close the model gets to the data - essentially a forecasting test; notice at once that this is not really our interest, but we are using it as a general specification test. It turns out that the LR test is not sensitive, at least for the SW model, to which variables are included in the test, no doubt because if a model forecasts some variables well, it must also be forecasting well the other variables that are closely linked to them. We carry out the LR test in the usual way, allowing the ρ's of the error processes extracted by LIML to be re-estimated. The IIW test looks at how close the model gets to these three variables' data behaviour, which we are deeply interested in matching and which we represent here by a VECM (rewritten as a VARX) as the data is non-stationary. Thus with the IIW test we have carefully chosen its focus to match our policy interests; we could have chosen a broader group of variables, which would have raised the test power but at the cost of possibly not finding a model that would fit their broader behaviour. Thus we see here that the focus of the test is a crucial aspect of the IIW test. We now reproduce some Monte Carlo experiments for the SW model from Tables 1 and 5 above:

                         Stationary data        Non-stationary data
Percent Mis-specified    Wald       LR          Wald       LR
True                     5.0        5.0         5.0        5.0
1                        19.8       6.3         7.9        5.2
3                        52.1       8.8         49.2       5.8
5                        87.3       13.1        97.8       6.2
7                        99.4       21.6        100.0      7.4
10                       100.0      53.4        100.0      9.6
15                       100.0      99.3        100.0      15.6
20                       100.0      99.7        100.0      26.5

Table 12: Rejection Rates for Wald and Likelihood Ratio for 3 Variable VAR(1)

The basic point we want to emphasise from this comparison is that if this model passes the IIW test, we can be sure it is less than 7% False, whereas if it passes the LR test we can only be sure it is less than 15% False under stationarised data; under non-stationary data, the relevant case here, we cannot even be sure it is less than 20% False - in fact we find that it requires the model to be as much as 50% False for it to be rejected roughly 100% of the time.
When we now apply the two tests to the Monetary model discussed above, it passes both tests. We can now compare how our policy analysis would vary with the two test approaches. Our basic policy results when we treat the model as True are summarised in the first row of the following Table 13:

Frequency of crisis (expected crises per 1000 years)

Policy exercise            Base case   Monetary Reform   PLT    NGDPT   PLT+Mon.Reform   NGDPT+Mon.Reform
when model is True         20.8        6.62              2.15   1.83    1.41             1.31
when model is 7% False     57.4        18.6              10.3   8.7     11.8             10.3
when model is 15% False    63.6        Explosive         19.4   19.6    19.4             17.4
when model is 50% False    70.4        Explosive         33.3   33.4    34.4             34.2

Notes: Base Case: monetary policies as estimated over the sample period; Monetary Reform: running a Monetary Base rule targeted on the credit premium side by side with a Taylor Rule; PLT: substituting a Price Level Target for the Inflation Target in the Taylor Rule; NGDPT: substituting a Nominal GDP target for the inflation and output targets in the Taylor Rule.

Table 13: Policy analysis when the model has varying falseness

If we use the IIW test we know that our model could be up to 7% False but no more. We can discover the effect of this degree of Falseness on our policy results by redoing the whole policy exercise with the parameters disturbed by 7%. We obtain the results shown in the second row of Table 13. In investigating the power of the test, we have simply assumed that we are presented with a False set of parameters somehow from the estimation process. We can then ask what power we have against a quite mis-specified model whose parameters are simply different. We have looked at this for the model here, by asking what the power is against a quite different model - say a New Classical model versus an assumed True SW model. The power is 100%; it is always rejected. So we can be quite sure the True model is not something quite different. Between these two things we therefore have a lot of reassurance. First, if the model is not well-specified, it will certainly be rejected. Second, if the model is well-specified, then models up to 7% distant from it could be True; and our policy conclusions can be tested for robustness within this range, as we have done here.

If we use the LR test we know the model could be up to 50% False - we cannot guarantee to reject a model that is less false than this. For example, a 15% False model will be rejected only a third of the time. If we now redo the exercise for a 15% disturbance to the parameters we obtain the third row of Table 13. Now our policy is plainly vulnerable. The frequency of crises under the current regime goes up to once every 15 years; with NGDPT+monetary reform it only comes down to once every 50-60 years. This is on the borderline of acceptability. The 50% false case, shown in the last row of Table 13, is disastrous. First, only just under half of the bootstrap simulations have sensible solutions. If we take those that do, we can see that the prevalence of crises under the existing regime would be much greater, at one every 14 years. As with 15% False, the monetary reform regime is explosive. The other regimes all generate a crisis frequency of around one every 30 years, which is far from acceptable. To make matters worse, we have seen that the LR test has virtually no power against model misspecification, so that we cannot be sure that a misspecified model with yet other, possibly even worse, results might not be at work.
What this is showing us is that, according to the LR test, versions of our model that could be true imply a much higher frequency of crises than in the estimated case, and the monetary policy regimes suggested as improvements could either give explosive results or produce an improvement in the crisis frequency that is quite inadequate for policy purposes. In other words the policymaker cannot rely on the model's policy results. But using the IIW test we can be sure that the recommended policies will deliver the results we claim.

7.1 Can estimation protect us against Falseness?

But would this vulnerability not be reduced if we take ML estimation seriously? Unfortunately, as we saw above, estimation by ML gives us no guarantees of getting close to the true parameters. It is well known to be a highly biased estimator in small samples, with an average absolute estimation bias across all parameters of nearly 9% in our Monte Carlo experiment above (see Table 3). Bearing in mind that our 'falseness' measure assumes x as the absolute bias, alternating plus and minus, this suggests that FIML will on average give us this degree of falseness; in any particular sample it could therefore be much larger. We also looked above at whether the Indirect Inference estimator could give us any guarantees in this respect. This estimator was much less biased in small samples, with an average absolute bias about half that of FIML, as again shown in Table 3. However, again this can give us no guarantees of the accuracy of the estimates in any particular sample. It follows that we are essentially reliant on the power of the test, in the sense that this can guarantee that our model is both well specified and no more than 7% false under indirect inference, because were either of these conditions violated it would have been rejected with complete certainty.

So far we have carried out this examination of the model's reliability in the dimension of what we might call 'general falseness'. It may also be that the model's performance is sensitive to the values of one or two particular parameters, and if so we would also need to focus on the extent to which these might be false, how far the test's power can protect us against this, and how sensitive the model is within this range. This further investigation can be carried out in essentially the same way as the one we have illustrated with general falseness.

7.2 Choosing the testing procedure

Thus what we have illustrated in this section is how macro models can be estimated and tested by a user with a particular purpose in mind. The dilemma a user faces is the trade-off between test power (i.e. the robustness to being false of a model that marginally passes the test) and model tractability (i.e. the relevance for the facts to be explained of a model that marginally passes the test). Different testing procedures give different trade-offs, as we have seen and as is illustrated in Figure 5 below. Thus the Full Wald test gives the greatest power; but a model that passes this test will have to reflect the full complexity of detailed behaviour and thus be highly intractable. At the other extreme, the LR test is easy to pass for a simple and tractable model, but it has very low power. In between lie Wald statistics with increasing 'narrowness' of focus as we move away from the Full Wald. These offer lower power in return for higher tractability; somewhere along this trade-off will be chosen by the policymaker.
In order for us to find a tractable model we have to allow a degree of falseness in the model with respect to the data features other than those the policymaker prizes. The way to do this is to choose an indirect inference test that focuses tightly (in a 'directed' way) on the features of the data that are relevant to our modelling purposes. To apply these methods it is necessary to a) estimate and test the model, b) assess which 'directed' test to choose, and c) assess the power in the case of the model being used. We have programmes to do these things which we are making available freely to users; Appendix 2 shows the steps involved in finding the Wald statistic, as carried out in these programmes.10

10 Programmes to implement the methods described in this paper can be downloaded freely and at no cost from www.patrickminford.net/indirectinference.

Figure 5: Maximising Friedman Utility

8 Conclusions

In this paper we examine the workings of the Indirect Inference Wald test of a macroeconomic model, typically a DSGE model. We show how the model can be estimated by Indirect Inference and how much power the test has in small samples against falseness in the estimated parameters as well as against complete model misspecification. We perform numerous Monte Carlo experiments with widely-used DSGE models to establish the extent of this power. We consider how the test can be focused narrowly (via a 'directed Wald') on features of the model in which the user is interested, echoing Friedman's advice that models should be tested 'as if true' according to their ability to explain features of the data the user is concerned about. For a user of a model with a clear purpose, for example a monetary policymaker, this testing method offers an attractive trade-off between the chances of finding a model to pass the test and the power of the test to reject false models. Thus the user can determine whether the model found can be assumed to be reliable enough to use for a policy exercise, by seeing whether it is robust to the potential degree of falseness it could be open to. In this way users can discover whether their models are 'good enough', in Friedman's original sense, for the purposes intended, and the model uncertainty facing them can be reduced and even eliminated. Tailor-made programmes to carry out this procedure are now available to applied macroeconomists.

We benchmarked the IIW test against the widely-used Likelihood Ratio (LR) test. A key finding is that, in small samples, tests based on the IIW test have much greater power than those based on the LR test. This finding is at first sight puzzling, as the LR test can be transformed into a standard Wald test, which in turn can be obtained by indirect inference using the unrestricted variance matrix of the auxiliary model coefficients estimated on the data. We attempted to explain why this result occurs. We find that the difference in power in small samples of the two tests can be attributed to two things. First, for the LR test the autoregressive processes of the structural errors are normally re-estimated when carrying out the test. This 'brings the model back on track' and as a result undermines the power of this test, as it is, in effect, based on the relative accuracy of one-step-ahead forecasts compared with those obtained from an auxiliary VAR model. Second, additional power of the IIW test arises from its use of the restricted variance matrix of the auxiliary model's coefficients, determined from data simulated using the restrictions of the DSGE model.
These may both give more precise estimates of these coefficients and provide further features of the model to test. The greater the degree of over-identification of the DSGE model, the stronger this effect. This suggests that for a complex, highly restricted model like that of Smets and Wouters, the power of the indirect inference Wald test can be made very high even in small samples. Because a test of all of the properties of a DSGE model is likely to lead to its rejection, it is preferable to focus on particular features of the model and their implications for the data. This is where the IIW test can flexibly be tailored to optimise the ratio of power to tractability.

In sum, we find that the IIW test can become a formidable weapon in the armoury of the users of macro models, enabling them to estimate a model that can pass the test when suitably focused and then to check its reliability in use against such potential inaccuracy as cannot be ruled out by the power of the test.

References

[1] Basawa, I.V., Mallik, A.K., McCormick, W.P., Reeves, J.H., Taylor, R.L., 1991. Bootstrapping unstable first-order autoregressive processes. Annals of Statistics 19, 1098-1101.
[2] Canova, F., 1994. Statistical Inference in Calibrated Models. Journal of Applied Econometrics 9, S123-S144.
[3] Canova, F., 1995. Sensitivity Analysis and Model Evaluation in Dynamic Stochastic General Equilibrium Models. International Economic Review 36, 477-501.
[4] Canova, F., 2005. Methods for Applied Macroeconomic Research. Princeton University Press, Princeton.
[5] Canova, F., Sala, L., 2009. Back to square one: Identification issues in DSGE models. Journal of Monetary Economics 56, 431-449.
[6] Christiano, L., 2007. Comment on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 143-151.
[7] Christiano, L.J., Eichenbaum, M., Evans, C.L., 2005. Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy. Journal of Political Economy 113(1), 1-45.
[8] Clarida, R., Gali, J. and Gertler, M.L., 1999. The Science of Monetary Policy: A New Keynesian Perspective. Journal of Economic Literature 37(4), 1661-1707.
[9] Dai, L., Minford, P. and Zhou, P., 2014. A DSGE model of China. Cardiff Economics Working Paper No E2014/4, Cardiff University, Cardiff Business School, Economics Section; also CEPR discussion paper 10238, CEPR, London.
[10] Dave, C., De Jong, D.N., 2007. Structural Macroeconometrics. Princeton University Press.
[11] Del Negro, M., Schorfheide, F., 2004. Priors from General Equilibrium Models for VARs. International Economic Review 45, 643-673.
[12] Del Negro, M., Schorfheide, F., 2006. How good is what you've got? DSGE-VAR as a toolkit for evaluating DSGE models. Economic Review, Federal Reserve Bank of Atlanta, issue Q2, 21-37.
[13] Del Negro, M., Schorfheide, F., Smets, F., Wouters, R., 2007a. On the fit of new Keynesian models. Journal of Business and Economic Statistics 25, 123-143.
[14] Del Negro, M., Schorfheide, F., Smets, F., Wouters, R., 2007b. Rejoinder to Comments on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 159-162.
[15] Evans, R. and Honkapohja, S., 2005. Interview with Thomas J. Sargent. Macroeconomic Dynamics 9, 561-583.
[16] Faust, J., 2007. Comment on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 154-156.
[17] Fernandez-Villaverde, J., Rubio-Ramirez, F., Sargent, T. and Watson, M., 2007. ABCs (and Ds) of Understanding VARs. American Economic Review 97(3), 1021-1026.
[18] Friedman, M., 1953. The methodology of positive economics, in Essays in Positive Economics. Chicago: University of Chicago Press.
[19] Gallant, A.R., 2007. Comment on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 151-152.
[20] Gourieroux, C., Monfort, A., 1995. Simulation Based Econometric Methods. CORE Lectures Series, Louvain-la-Neuve.
[21] Gourieroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8, S85-S118.
[22] Gregory, A., Smith, G., 1991. Calibration as testing: Inference in simulated macro models. Journal of Business and Economic Statistics 9, 293-303.
[23] Gregory, A., Smith, G., 1993. Calibration in macroeconomics, in: Maddala, G. (Ed.), Handbook of Statistics vol. 11, Elsevier, St. Louis, Mo., pp. 703-719.
[24] Hansen, B.E., 1999. The Grid Bootstrap and the Autoregressive Model. The Review of Economics and Statistics 81, 594-607.
[25] Hansen, L.P. and Heckman, J.J., 1996. The empirical foundations of calibration. Journal of Economic Perspectives 10(1), 87-104.
[26] Horowitz, J.L., 2001a. The bootstrap, in: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics, vol. 5, ch. 52, 3159-3228, Elsevier.
[27] Horowitz, J.L., 2001b. The Bootstrap and Hypothesis Tests in Econometrics. Journal of Econometrics 100, 37-40.
[28] Juillard, M., 2001. DYNARE: a program for the simulation of rational expectations models. Computing in Economics and Finance 213, Society for Computational Economics.
[29] Kilian, L., 2007. Comment on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 156-159.
[30] Le, V.P.M., Meenagh, D., Minford, P., Wickens, M., 2010. Two Orthogonal Continents? Testing a Two-country DSGE Model of the US and the EU Using Indirect Inference. Open Economies Review 21(1), 23-44.
[31] Le, V.P.M., Meenagh, D., Minford, P., Wickens, M., 2011. How much nominal rigidity is there in the US economy? Testing a New Keynesian model using indirect inference. Journal of Economic Dynamics and Control 35(12), 2078-2104.
[32] Le, V.P.M., Meenagh, D. and Minford, P., 2012. What causes banking crises? An empirical investigation. Cardiff Economics Working Paper No E2012/14, Cardiff University, Cardiff Business School, Economics Section; also CEPR discussion paper no 9057, CEPR, London.
[33] Le, V.P.M., Minford, P., Wickens, M., 2013. A Monte Carlo procedure for checking identification in DSGE models. Cardiff Economics Working Paper No E2013/4, Cardiff University, Cardiff Business School, Economics Section; also CEPR discussion paper 9411.
[34] Le, V.P.M., Matthews, K., Meenagh, D., Minford, P. and Xiao, Z., 2014. Banking and the Macroeconomy in China: A Banking Crisis Deferred? Open Economies Review 25(1), 123-161.
[35] Liu, C. and Minford, P., 2014a. Comparing behavioural and rational expectations for the US post-war economy. Economic Modelling 43, 407-415.
[36] Liu, C. and Minford, P., 2014b. How important is the credit channel? An empirical study of the US banking crisis. Journal of Banking and Finance 41, 119-134.
[37] Lucas, R.E., 1976. Econometric policy evaluation: A critique. Carnegie-Rochester Conference Series on Public Policy No. 1, The Phillips Curve and Labour Markets, K. Brunner and A. Meltzer (eds.), supplement to the Journal of Monetary Economics.
[38] McCallum, B.T., 1976. Rational expectations and the natural rate hypothesis: some consistent estimates. Econometrica 44, 43-52.
[39] Meenagh, D., Minford, P. and Wickens, M.R., 2009. Testing a DSGE Model of the EU Using Indirect Inference. Open Economies Review 20(4), 435-471.
[40] Meenagh, D., Minford, P., Wickens, M.R. and Xu, Y., 2015. Comparing indirect inference and likelihood testing methods: asymptotic and small sample results. Cardiff Economics Working Paper, Cardiff University, Cardiff Business School, Economics Section.
[41] Meenagh, D., Minford, P., Nowell, E. and Sofat, P., 2010. Can a real business cycle model without price and wage stickiness explain UK real exchange rate behaviour? Journal of International Money and Finance 29(6), 1131-1150.
[42] Meenagh, D., Minford, P. and Wickens, M.R., 2012. Testing macroeconomic models by indirect inference on unfiltered data. Cardiff Economics Working Paper No E2012/17, Cardiff University, Cardiff Business School, Economics Section; also CEPR discussion paper no 9058, CEPR, London.
[43] Minford, P. and Ou, Z., 2013. Taylor Rule or optimal timeless policy? Reconsidering the Fed's behavior since 1982. Economic Modelling 32, 113-123.
[44] Minford, P., Theodoridis, K. and Meenagh, D., 2009. Testing a Model of the UK by the Method of Indirect Inference. Open Economies Review 20(2), 265-291.
[45] Minford, P., Ou, Z. and Wickens, M., 2012. Revisiting the Great Moderation: policy or luck? Cardiff Economics Working Paper, Cardiff University, Cardiff Business School, Economics Section; forthcoming, Open Economies Review.
[46] Sims, C.A., 1980. Macroeconomics and reality. Econometrica 48, 1-48.
[47] Sims, C.A., 2007. Comment on 'On the fit of new Keynesian models'. Journal of Business and Economic Statistics 25, 152-154.
[48] Smets, F., Wouters, R., 2003. An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area. Journal of the European Economic Association 1(5), 1123-1175.
[49] Smets, F., Wouters, R., 2007. Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach. American Economic Review 97, 586-606.
[50] Smith, A., 1993. Estimating nonlinear time-series models using simulated vector autoregressions. Journal of Applied Econometrics 8, S63-S84.
[51] Watson, M., 1993. Measures of fit for calibrated models. Journal of Political Economy 101, 1011-1041.
[52] Wieland, V. and Wolters, M.H., 2012. Forecasting and policy making, in G. Elliott and A. Timmermann (eds.), Handbook of Economic Forecasting, Vol. 2, Elsevier.
[53] Wickens, M.R., 1982. The efficient estimation of econometric models with rational expectations. Review of Economic Studies 49, 55-67.
[54] Wickens, M.R., 2014. How useful are DSGE macroeconomic models for forecasting? Open Economies Review 25, 171-193.

9 Appendix 1: Available IIW tests of macro models

Le et al. (2011) found that, after re-estimation by indirect inference, the SW model on post-1984 (but pre-crisis) data passed the indirect inference test comfortably. It is of interest to examine the outcome from using a likelihood ratio test. The II test used a VAR(1) with three variables - output, inflation and interest rates - as the auxiliary model, as well as a VAR(1) with more variables than these three. With a higher-order VAR, or with more than these three variables, the model progressively performed worse, being rejected most of the time. Le et al. interpreted this to mean that the model is able to capture the 'broad outlines' of the behaviour of these key macroeconomic variables but that the model is not the full 'truth'.
We choose as the benchmark for the LR test a VAR(1) with 3 variables, as we have seen that the power does not vary with the lag order of the VAR or with the number of variables. For both the LR and Wald tests we generate 1000 sets of bootstrap data from the model's errors, from which we obtain critical values from estimates of the distributions of the test statistics under the null that the model is true. The probabilities of rejecting the null that the model is correct and the VAR is restricted, against the alternative of an unrestricted VAR, are reported in Table 14. We have found in our Monte Carlo experiments that the power of the LR test is considerably lower than that of the Wald test; with more variables in the VAR and with higher-order lags, we found that the power of the Wald test rose substantially, while remaining little changed for the direct inference LR test. This is consistent with what we find here for the modified SW model. Of the two tests, the LR fails to reject at all, while the Wald rejects for any VAR with 18 or more coefficients. We can also see that for our main focus on three variables with a VAR(1) (the first line of Table 14) both tests give consistent results.

VAR - no. of coefficients     Wald†    LR
3 variable VAR(1) - 9         83.5     71.7
3 variable VAR(2) - 18        99.6     71.4
3 variable VAR(3) - 27        100      67.7
4 variable VAR(1) - 16        90.1     82.8
5 variable VAR(1) - 25        96.6     74.2
7 variable VAR(3) - 147       100      13.4
† The Wald test includes the variances of the data in each case.

Table 14: Tests using varying VARs

Comparing the outcomes for the two tests, the LR tests are all passed rather easily, indicating that the model is well 'on track'. This was noticed by Smets and Wouters for their original model, on which they performed various forecasting tests that are closely related to the LR test used here. In contrast, the model passes the Wald test only using a VAR(1) with 3 or 4 key variables, which is a coarse description of the inter-relationships. For finer descriptions or with more variables, the model fails. This provides information about what the model can do. In general we find that macro models cannot match the details of consumption and investment, even when they can match the key variables: output, inflation and interest rates. A possible reason is that the data on consumption and investment are poor; for example, we know that durable consumption goods, which should be treated as capital, are routinely included in consumption.

Table 15 summarises the results of many of the recent applications of the indirect inference evaluation procedure. The Wald statistic used is based on the coefficients of the auxiliary VAR model and the data variances. The first columns denote the country, sample episode, the model studied and the estimation method; the results column shows that models can be found that are not rejected for key sets of macro variables such as output, inflation and interest rates; the final column gives the authors and the reference. The findings of Le et al (2010, 2011, 2014) are that, in general, models which can match a VAR(1) on a limited number of variables do not perform as well on VARs with many more variables, and are typically rejected for higher-order VARs than a VAR(1).
Another common finding is that the 3-equation New Keynesian model originally proposed by Clarida, Gali and Gertler (1999) passes the test after re-estimation and can even match higher-order VARs - see Minford and Ou (2013), Liu and Minford (2012, 2014), and also Minford, Ou and Wickens (2013) for similar results. A possible explanation is the relative lack of tight cross-equation restrictions in these small models compared with those imposed by the more elaborate model of Smets and Wouters.

Country   Episode      Model                            Estimation method     Result/Wald %tile†   Reference
UK        1975-2004    Liverpool Model (3 regimes)      Calibrated            Marginal/98.8        Minford et al. (2009)
EA        1975-1999    Smets-Wouters                    Bayesian              Reject/100           Meenagh et al. (2009)
EA+US     1975-1999    Smets-Wouters 'world'            Bayesian              Outputs,RXR/94.2     Le et al. (2010)
US        1982-2007    3-eqn NK-M§                      Calibrated            y,pi,R/96.5          Minford and Ou (2013)
UK        1959-2007    RBC open economy                 Calibrated            RXR/94.2             Meenagh et al. (2010)
US        1947-2004    Smets-Wouters hybrid¶            Indirect estimation   y,pi,R/98.7          Le et al. (2011)
US        1984-2004    Smets-Wouters hybrid¶            Indirect estimation   y,pi,R/83.8          Le et al. (2011)
US        1981-2010    3-eqn NK (Rational Exp.)         Indirect estimation   y,pi,R/79.8          Liu and Minford (2014a)
US        1981-2010    3-eqn NK (Behavioural Exp.)      Indirect estimation   Reject/100           Liu and Minford (2014a)
US        1981-2010    4-eqn NK+banking‡                Indirect estimation   y,pi,R/45.4          Liu and Minford (2014b)
China     1978-2007    Smets-Wouters hybrid¶            Indirect estimation   y,pi,R/69.0          Dai et al. (2014)
China     1991-2011    Smets-Wouters hybrid+banking¶‡   Indirect estimation   y,pi,R/89.2          Le et al. (2014)

Notes: † Results column shows the variables included in the Wald and the Wald rejection status, with the Wald percentile. ¶ Smets-Wouters with addition of a competitive sector. ‡ Addition of the banking sector model of Bernanke et al. (1999). § New Keynesian with imposition of a timeless monetary policy rule. Some of the studies use non-stationary data.

Table 15: Summary of recent tests of DSGE models

10 Appendix 2: Steps in deriving the Wald statistic

The following steps summarise our implementation of the Wald test by bootstrapping:

Step 1: Estimate the errors of the economic model conditional on the observed data and θ0. Estimate the structural errors εt of the DSGE macroeconomic model, xt(θ0), given the stated values θ0 and the observed data. The number of independent structural errors is taken to be less than or equal to the number of endogenous variables. The errors are not assumed to be normally distributed. Where the equations contain no expectations, the errors can simply be backed out of the equation and the data. Where there are expectations, estimation is required for the expectations; here we carry this out using the robust instrumental variables methods of McCallum (1976) and Wickens (1982), with the lagged endogenous data as instruments, so that effectively we use the auxiliary model VAR. An alternative method for expectations estimation is the 'exact' method; here we use the model itself to project the expectations and, because these depend on the extracted residuals, there is iteration between the two elements until convergence.

Step 2: Derive the simulated data. On the null hypothesis the errors εt (t = 1, ..., T) are the structural errors. The simulated disturbances are drawn from these errors. In some DSGE models, including the SW model, many of the structural errors are assumed to be generated by autoregressive processes rather than being serially independent. If they are, then under our method we need to estimate them.
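As a concrete illustration of Step 1 and of the estimation of the error processes just mentioned, the sketch below proxies any expectations terms by projections on lagged data (the McCallum/Wickens instrumental-variables idea), backs the structural errors out of an equation, and then fits an AR(1) to each extracted error. The function names and the single-equation setting are our own illustrative assumptions, not the downloadable programmes.

    import numpy as np

    def project_expectation(y_lead, lagged_data):
        # Proxy for E_t y_{t+1}: fitted values from a regression of the realised
        # lead on lagged endogenous data (the instruments), in the spirit of
        # McCallum (1976) and Wickens (1982).
        Z = np.column_stack([np.ones(len(y_lead)), lagged_data])
        b, *_ = np.linalg.lstsq(Z, y_lead, rcond=None)
        return Z @ b

    def back_out_error(lhs, structural_rhs):
        # With theta0 given, the structural error is the gap between the two
        # sides of the equation evaluated at the data (and proxied expectations).
        return lhs - structural_rhs

    def fit_ar1(eps):
        # Estimate the AR(1) coefficient rho of a structural error process and
        # return rho together with the implied innovations.
        x, y = eps[:-1], eps[1:]
        rho = (x @ y) / (x @ x)
        return rho, y - rho * x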
We derive the simulated data by drawing the bootstrapped disturbances by time vector, to preserve any simultaneity between them, and solving the resulting model using Dynare (Juillard, 2001). To obtain the N bootstrapped simulations we repeat this, drawing each sample independently. We set N = 1000.

Step 3: Compute the Wald statistic. We estimate the auxiliary model - a VAR(1) - using both the actual data and the N samples of simulated data to obtain estimates a_T and a_S(θ0) of the vector a. The distribution of a_T - a_S(θ0) and its covariance matrix W(θ0)^(-1) are estimated by bootstrapping a_S(θ0). The bootstrapping proceeds by drawing N bootstrap samples of the structural model and estimating the auxiliary VAR on each, thus obtaining N values of a_S(θ0); we obtain the covariance of the simulated variables directly from the bootstrap samples. The resulting set of a_k vectors (k = 1, ..., N) represents the sampling variation implied by the structural model, from which estimates of its mean, covariance matrix and confidence bounds may be calculated directly. Thus, the estimate of W(θ0)^(-1) is

(1/N) Σ_{k=1}^{N} (a_k - ā)'(a_k - ā),   where ā = (1/N) Σ_{k=1}^{N} a_k.

We then calculate the Wald statistic for the data sample; we estimate the bootstrap distribution of the Wald from the N bootstrap samples. We note that the auxiliary model used is a VAR(1) and is for a limited number of key variables: the major macro quantities, which include GDP, consumption, investment, inflation and interest rates. By raising the lag order of the VAR and increasing the number of variables, the stringency of the overall test of the model is increased. If we find that the structural model is already rejected by a VAR(1), we do not proceed to a more stringent test based on a higher-order VAR.11

Rather than focus our tests on just the parameters of the auxiliary model or the impulse response functions, we also attach importance to the ability to match data variances, hence their inclusion in a. As highlighted in the debates over the 'Great Moderation' and the recent banking crisis, there is a major concern over the scale of real and nominal volatility. In this way our test procedure is within the traditions of RBC analysis. We refer to the Wald statistic based on the full set of variables as the Full Wald test; it checks whether the a vector lies within the DSGE model's implied joint distribution and is a test of the DSGE model's specification in a wide sense. We show where in the Wald bootstrap distribution the Wald based on the data lies (the Wald percentile). We also show the Mahalanobis Distance based on the same joint distribution, normalised as a t-statistic, and also the equivalent Wald p-value, as an overall measure of closeness between the model and the data.12

11 This increasing stringency is illustrated by the worsening performance of the model tested in Table 7 below for higher-order VARs, as noted in footnote 7. In fact the general representation of a stationary log-linearised DSGE model is a VARMA, which would imply that the true VAR should be of infinite order, at least if any DSGE model is the true model. However, for the same reason that we have not raised the VAR order above one, we have also not added any MA element. As DSGE models do better in meeting the challenge this could be considered.
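The covariance estimate and the bootstrap distribution of the Wald in Step 3 translate almost line for line into code. The sketch below is our schematic reading of the step, assuming the N x k array of bootstrap coefficient vectors a_k (and the data-based vector a_T) have been produced as above; it is not the downloadable programmes themselves.

    import numpy as np

    def wald_test(a_T, a_k):
        # a_T: descriptor vector estimated on the data; a_k: N x k array of the
        # same descriptor estimated on the N bootstrap simulations of the model.
        a_bar = a_k.mean(axis=0)
        cov = (a_k - a_bar).T @ (a_k - a_bar) / len(a_k)   # estimate of W(theta0)^(-1)
        precision = np.linalg.inv(cov)                     # i.e. W(theta0)

        def wald(a):
            d = a - a_bar
            return d @ precision @ d

        W_data = wald(a_T)                                 # Wald statistic for the data sample
        W_boot = np.array([wald(a) for a in a_k])          # bootstrap distribution of the Wald
        percentile = 100.0 * np.mean(W_boot < W_data)      # the Wald percentile reported in the tables
        return W_data, percentile                          # reject at the 95% level if percentile > 95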
We also consider a second Wald test, which we refer to as a 'Directed Wald statistic'. This focuses on more limited features of the structural model. Here we seek to know how well a particular variable or limited set of variables is modelled, and we use the corresponding auxiliary equations for these variables in the VAR as the basis of our test. For example, we may wish to know how well the model can reproduce the behaviour of US output and inflation by creating a Wald statistic based on the VAR equations for these two variables alone. A Directed Wald test can also be used to determine how well the structural model captures the effects of a particular set of shocks. This requires creating the joint distribution of the IRFs for these shocks alone. For example, to determine how well the model deals with supply shocks, we construct the joint distribution of the IRFs for the supply shocks and calculate a Wald statistic for this. Even if the full model is misspecified, a Directed Wald test provides information about whether the model is well-specified enough to deal with specific aspects of economic behaviour.

12 The Mahalanobis Distance is the square root of the Wald value. As the square root of a chi-squared distribution, it can be converted into a t-statistic by adjusting the mean and the size. We normalise this here by ensuring that the resulting t-statistic is 1.645 at the 95% point of the distribution.
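Computationally, a Directed Wald is the same calculation applied to a subset of the descriptor vector, and footnote 12's normalisation can be written in two lines. Both sketches below reflect our reading of the text (the index argument and the exact rescaling are our assumptions), not the published programmes.

    import numpy as np

    def directed_wald(a_T, a_k, idx):
        # Restrict the test to the chosen features (e.g. the coefficients of the
        # output and inflation equations of the auxiliary VAR) by selecting
        # their positions idx within the descriptor vector.
        sub_T = np.asarray(a_T)[idx]
        sub_k = np.asarray(a_k)[:, idx]
        a_bar = sub_k.mean(axis=0)
        V = np.cov(sub_k, rowvar=False)
        d = sub_T - a_bar
        return d @ np.linalg.solve(V, d)

    def normalised_t(W_data, W_boot):
        # Mahalanobis Distance (square root of the Wald), rescaled so that the
        # statistic equals 1.645 at the 95% point of the bootstrap distribution.
        M_data, M_boot = np.sqrt(W_data), np.sqrt(W_boot)
        return 1.645 * (M_data - M_boot.mean()) / (np.percentile(M_boot, 95) - M_boot.mean())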