
Preprint, August 2005

Abstract

A frequent criticism of unit root tests concerns the poor power and size properties that many such tests exhibit. However, during the past decade or so intensive research has been conducted to alleviate these problems and great advances have been made. The present paper provides a selective survey of recent contributions to improve upon both the size and power of unit root tests and, in so doing, the approach of using rigorous statistical optimality criteria in the development of such tests is stressed. In addition to presenting tests where improved size can be achieved by modifying the standard Dickey-Fuller class of tests, the paper presents the theory of optimal testing and the construction of power envelopes for unit root tests under different conditions allowing for serial correlation, deterministic components, assumptions regarding the initial condition, non-Gaussian errors, and the use of covariates.

7 Improving Size and Power in Unit Root Testing

Niels Haldrup and Michael Jansson

7.1 Introduction
7.2 Unit root testing
    7.2.1 The augmented Dickey–Fuller and Phillips–Perron classes of tests
    7.2.2 Size distortions of unit root tests
    7.2.3 Modified unit root tests with good size
    7.2.4 Deterministics
7.3 Power envelopes for unit root tests
    7.3.1 The leading special case
    7.3.2 Serial correlation
    7.3.3 Deterministics
    7.3.4 The initial condition
    7.3.5 Non-Gaussian errors
    7.3.6 Covariates
7.4 Conclusion

7.1 Introduction

Since the mid-1980s there has been a veritable explosion of research on the importance of unit roots in the analysis of economic and other time series data.
The reasons for this are manifold, but perhaps the most important motivation for this work is the fact that the development of the notion of cointegration by Granger (1981) and Engle and Granger (1987) has stressed the significance of unit roots and the importance of making valid statistical inference in the presence of nonstationary time series data. There is a vast literature on developing statistical theory for unit root (integrated) processes and the list of empirical applications using unit root testing is even more impressive. There is also a tremendous literature examining the power and size of unit root tests, not least by adopting numerical simulation (Monte Carlo) techniques under a multitude of different designs of the underlying process. Referring to exhaustive lists of contributions to the theoretical, numerical and empirical literature in this field goes far beyond the space limitation we have here, but surveys referring to many of these contributions can be found in, inter alia, Stock (1994), Maddala and Kim (1998), and Phillips and Xiao (1998). The present chapter reviews some of the results and findings in the unit roots literature that we believe are the most important contributions made over the past decade or so. The review is deliberately chosen to be selective and focuses on those contributions we believe are most likely to be fruitful for future developments of theory as well as in applications. Historically, the criticisms of unit root testing have concerned both the power and size properties of conventional unit root tests (e.g., Schwert, 1989; Agiakloglou and Newbold, 1992; and DeJong, Nankervis, Savin, and Whiteman, 1992a, 1992b). Stimulated in part by these influential Monte Carlo studies, a considerable amount of effort has been devoted to improving the size and/or power properties of unit root tests and much progress has been made. 
Most of these advances have been made with the help of rigorous statistical theory and are therefore potentially of general methodological interest. Here we survey (a subset of) these recent contributions. We focus narrowly on the problem of testing for the presence of autoregressive unit roots, partly for concreteness and partly because this branch of nonstationary time series analysis appears to be the one in which the pertinent problems are best understood at this point. Consequently, several important recent advances in the analysis of nonstationary data are abstracted from in the present exposition. These advances include models of higher orders of integration (e.g., Dickey and Pantula, 1987; Haldrup, 1998), tests of seasonally integrated processes (e.g., Hylleberg, Engle, Granger, and Yoo, 1990; Franses, 1996; Ghysels and Osborn, 2001), tests of unit roots against structural breaks (e.g., Perron, 1989, 2005), fractionally integrated processes (e.g., Granger and Joyeux, 1980; Baillie, 1996; Velasco, 2005), testing stationarity against nonstationarity (e.g., Kwiatkowski, Phillips, Schmidt, and Shin, 1992; Saikkonen and Luukkonen, 1993a, 1993b; Jansson, 2004), and panel data unit root tests (e.g., Levin, Lin, and Chu, 2002; Im, Pesaran, and Shin, 2003; and Choi, 2005). Also, a recent literature has developed bootstrap methods for unit root tests (e.g., Park, 2002, 2003; Paparoditis and Politis, 2003; Davidson and MacKinnon, 2005).

The chapter proceeds as follows. Section 7.2 is concerned with the Dickey–Fuller class of tests (Dickey and Fuller, 1979) and the modifications of this testing framework that have been suggested in order to accommodate (particularly) serially dependent processes and the size distortions that this complication generally implies.
The modifications discussed herein include the proposal of Said and Dickey (1984) to use long autoregressions to approximate general dependent processes and the nonparametric corrections of Phillips (1987a) and Phillips and Perron (1988). Special emphasis will be put on the further improvements towards alleviating size distortions made in a series of papers by Ng and Perron, culminating in Ng and Perron (2001). The focus of this recent work is on the importance of appropriately estimating the long-run variance by use of an autoregressive-based spectral density estimator, whilst simultaneously accounting for the serious bias of the least squares estimator of the autoregressive coefficient, a bias that persists even in large samples. Extending the approach to allow for deterministic components by exploiting the GLS detrending procedure of Elliott, Rothenberg, and Stock (1996), the class of tests suggested by Ng and Perron (2001) is argued to exhibit excellent power and size performance. In fact, the tests of Ng and Perron are "nearly efficient," in the sense that they almost achieve the asymptotic power envelope for unit root tests.

The concepts of test efficiency and power envelopes in the construction of tests for unit roots are discussed in depth in section 7.3. To highlight the nonstandard aspects of the unit root testing problem, the discussion of efficiency starts from the benchmark case of a zero mean Gaussian AR(1) model, a fully parametric model in which there are no nuisance parameters. Subsequent subsections discuss how Elliott, Rothenberg, and Stock's (1996) efficiency results for the benchmark change under modifications of that model.
The discussion emphasizes the role of nuisance parameters caused by the accommodation of serial correlation (Elliott, Rothenberg, and Stock, 1996), deterministic components (Elliott, Rothenberg, and Stock, 1996), and/or a nonzero initial condition (Müller and Elliott, 2003), but also touches upon two sources of potentially important power gains in unit root testing, namely non-Gaussian errors and stationary covariates.

7.2 Unit root testing

Suppose the observed data $\{y_t : t = 1, \ldots, T\}$ is generated by the AR(1) model

$$ y_t = \rho y_{t-1} + \varepsilon_t, \tag{7.1} $$

where $y_0 = 0$ and the $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$ are unobserved errors. In this model, the unit root testing problem is that of testing

$$ H_0 : \rho = 1 \quad \text{vs} \quad H_1 : \rho < 1. $$

Let

$$ \hat\rho = \frac{\sum_{t=2}^{T} y_{t-1} y_t}{\sum_{t=2}^{T} y_{t-1}^2} $$

denote the OLS estimator of $\rho$ and let

$$ t_{\hat\rho} = \frac{\hat\rho - 1}{s \big/ \sqrt{\sum_{t=2}^{T} y_{t-1}^2}}, \qquad s^2 = T^{-1} \sum_{t=2}^{T} (y_t - \hat\rho y_{t-1})^2, $$

be the t-statistic associated with the unit root hypothesis. As is well known, $\hat\rho$ and $t_{\hat\rho}$ exhibit nonstandard large sample behavior under the unit root hypothesis. Indeed, when $\rho = 1$,

$$ T(\hat\rho - 1) \to_d \frac{\int_0^1 W(r)\,dW(r)}{\int_0^1 W(r)^2\,dr} \tag{7.2} $$

and

$$ t_{\hat\rho} \to_d \frac{\int_0^1 W(r)\,dW(r)}{\sqrt{\int_0^1 W(r)^2\,dr}}, \tag{7.3} $$

where $W$ is a standard Brownian motion (i.e., a Wiener process) and "$\to_d$" signifies convergence in distribution. Although the results in the preceding equations can be traced back to White (1958), the limiting distributions in (7.2) and (7.3) are frequently referred to as the Dickey–Fuller distributions, in recognition of the contribution of Dickey and Fuller (1979).

7.2.1 The augmented Dickey–Fuller and Phillips–Perron classes of tests

An important implication of (7.2) and (7.3) is that both $T(\hat\rho - 1)$ and $t_{\hat\rho}$ are asymptotically pivotal under the null hypothesis; that is, the limiting null distributions of these statistics do not depend on unknown nuisance parameters.
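The two statistics defined above for model (7.1) can be computed directly from their formulas. The following is a minimal sketch, not code from the chapter: the function names `dickey_fuller_stats` and `simulate_ar1` are hypothetical, only the Python standard library is assumed, and the sums run over $t = 2, \ldots, T$ exactly as in the text.

```python
import math
import random


def dickey_fuller_stats(y):
    """Return (rho_hat, T * (rho_hat - 1), t_stat) for the AR(1) regression without constant.

    `y` holds y_1, ..., y_T; all sums run over t = 2, ..., T as in the text.
    """
    T = len(y)
    lagged, current = y[:-1], y[1:]              # y_{t-1} and y_t for t = 2, ..., T
    sum_sq = sum(v * v for v in lagged)          # sum of y_{t-1}^2
    rho_hat = sum(a * b for a, b in zip(lagged, current)) / sum_sq
    resid = [b - rho_hat * a for a, b in zip(lagged, current)]
    s2 = sum(e * e for e in resid) / T           # s^2 = T^{-1} * sum of squared residuals
    t_stat = (rho_hat - 1.0) / (math.sqrt(s2) / math.sqrt(sum_sq))
    return rho_hat, T * (rho_hat - 1.0), t_stat


def simulate_ar1(T, rho, seed):
    """Simulate y_t = rho * y_{t-1} + e_t with y_0 = 0 and standard normal e_t (illustrative)."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(T):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        y.append(prev)
    return y


if __name__ == "__main__":
    y = simulate_ar1(T=200, rho=1.0, seed=0)
    print(dickey_fuller_stats(y))
```

Both statistics would be compared with critical values from the Dickey–Fuller distributions in (7.2) and (7.3), not with normal or Student-t tables.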
However, the practical usefulness of these results is limited by the implausibility of the assumptions regarding the errors $\varepsilon_t$. The impact of serially correlated errors can be illustrated by means of the model

$$ y_t = \rho y_{t-1} + u_t, \qquad u_t = \psi(L)\varepsilon_t, \tag{7.4} $$

where $y_0 = 0$, $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$, and $\psi(L) = \sum_{j=0}^{\infty} \psi_j L^j$ is a lag polynomial whose coefficients $\{\psi_j\}$ satisfy $\sum_{j=1}^{\infty} j|\psi_j| < \infty$. The model (7.4) imposes more general assumptions on the serial correlation pattern of $y_t - \rho y_{t-1}$ than does the AR(1) model in (7.1). As in the AR(1) model, the parameter of interest is $\rho$, the unit root testing problem being $H_0 : \rho = 1$ vs $H_1 : \rho < 1$.

Relative to the AR(1) model (7.1), the unit root testing problem in model (7.4) is complicated by the presence of the nuisance parameters $\{\psi_j\}$. Phillips (1987a) shows that in this case (7.2) and (7.3) are modified as follows:

$$ T(\hat\rho - 1) \to_d \frac{\int_0^1 W(r)\,dW(r) + \lambda}{\int_0^1 W(r)^2\,dr} \tag{7.5} $$

and

$$ t_{\hat\rho} \to_d \frac{\omega}{\sigma}\, \frac{\int_0^1 W(r)\,dW(r) + \lambda}{\sqrt{\int_0^1 W(r)^2\,dr}}, \tag{7.6} $$

where $\lambda = (\omega^2 - \sigma^2)/(2\omega^2)$, $\sigma^2 = E(u_t^2) = \sigma_\varepsilon^2 \sum_{j=0}^{\infty} \psi_j^2$ is the variance of $u_t$, and $\omega^2 = \lim_{T\to\infty} T^{-1} E\big[(\sum_{t=1}^{T} u_t)^2\big] = \sigma_\varepsilon^2 \big(\sum_{j=0}^{\infty} \psi_j\big)^2$ is the long-run variance of $u_t$.$^{1}$ When the innovations $u_t$ are i.i.d. (i.e., when $\psi_j = 0$ for $j \geq 1$), $\omega^2 = \sigma^2$ and the limiting distributions in (7.5) and (7.6) simplify to the nuisance parameter free limiting distributions in (7.2) and (7.3).

Different routes have been followed in the literature to account for the presence of nuisance parameters in the limiting distributions of $T(\hat\rho - 1)$ and $t_{\hat\rho}$. It was shown by Dickey and Fuller (1979) that when $u_t$ is an AR process of (finite) order $k$, $T(\tilde\rho - 1)$ and $t_{\tilde\rho}$ calculated from the regression

$$ y_t = \tilde\rho y_{t-1} + \sum_{j=1}^{k-1} \tilde\gamma_j \Delta y_{t-j} + \tilde v_{tk} \qquad (t = k+1, \ldots, T) \tag{7.7} $$

will indeed have the limiting null distributions in (7.2) and (7.3).
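The augmented regression (7.7) is an ordinary least squares fit on the lagged level and $k-1$ lagged differences. The sketch below is one way to set it up, assuming NumPy is available; `adf_regression` is a hypothetical helper name, no deterministic terms are included, and the residual variance uses the divisor $T-k$ without a degrees-of-freedom correction, which is an assumption of this illustration.

```python
import numpy as np


def adf_regression(y, k):
    """OLS fit of y_t on y_{t-1} and dy_{t-1}, ..., dy_{t-(k-1)} for t = k+1, ..., T.

    Returns (rho_tilde, gamma_tilde, t_stat, residuals), with k >= 1.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    dy = np.diff(y)  # dy[i] is the first difference dated i + 2 in 1-based time
    # Column 0: y_{t-1}; column j: dy_{t-j}, each aligned with t = k+1, ..., T.
    columns = [y[k - 1:T - 1]] + [dy[k - j - 1:T - j - 1] for j in range(1, k)]
    X = np.column_stack(columns)
    target = y[k:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (T - k)               # residual variance, divisor T - k
    cov = sigma2 * np.linalg.inv(X.T @ X)          # conventional OLS covariance matrix
    t_stat = (coef[0] - 1.0) / np.sqrt(cov[0, 0])  # t-statistic for rho = 1
    return coef[0], coef[1:], t_stat, resid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.standard_normal(300))        # a driftless random walk
    rho_tilde, gammas, t_stat, _ = adf_regression(y, k=4)
    print(rho_tilde, gammas, t_stat)
```

With `k=1` the regression has no lagged differences and reduces to the simple regression behind (7.1).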
However, if $u_t$ is an ARMA($p$, $q$) process (with $q \geq 1$), then the auxiliary regression model (7.7) will inadequately solve the nuisance parameter problem, at least if $k$ is held fixed. On the other hand, utilizing results of the Berk (1974) variety and generalizing the celebrated results of Said and Dickey (1984), it has been shown by Chang and Park (2002) that, if $u_t$ is an ARMA($p$, $q$) process, then the limiting null distributions of $T(\tilde\rho - 1)$ and $t_{\tilde\rho}$ coincide with the limiting distributions in (7.2) and (7.3) provided $E(\varepsilon_t^4) < \infty$, $k \to \infty$, and $k = o(T^{1/2 - \delta})$ for some $\delta > 0$.$^{2}$

Rather than solving the nuisance parameter problem by employing an autoregressive sieve, Phillips (1987a) and Phillips and Perron (1988) use consistent estimators of $\omega^2$ and $\sigma^2$ to transform the statistics $T(\hat\rho - 1)$ and $t_{\hat\rho}$ in a manner that eliminates the influence of nuisance parameters. More specifically, they suggest the statistics

$$ Z_\rho = T(\hat\rho - 1) - \frac{\hat\omega^2 - \hat\sigma^2}{2 T^{-2} \sum_{t=2}^{T} y_{t-1}^2} \tag{7.8} $$

and

$$ Z_t = \frac{\hat\sigma}{\hat\omega}\, t_{\hat\rho} - \frac{\hat\omega^2 - \hat\sigma^2}{2 \sqrt{\hat\omega^2\, T^{-2} \sum_{t=2}^{T} y_{t-1}^2}}, \tag{7.9} $$

where $\hat\omega^2$ and $\hat\sigma^2$ are consistent estimators of $\omega^2$ and $\sigma^2$. The limiting null distributions of $Z_\rho$ and $Z_t$ coincide with the limiting distributions in (7.2) and (7.3).

7.2.2 Size distortions of unit root tests

Several studies (for example, Schwert, 1989; Agiakloglou and Newbold, 1992) have documented that the tests discussed in the previous subsection generally exhibit significant size distortions in finite samples when errors are serially correlated, especially when the errors are of the moving average type with a root approaching minus one. Because a near cancellation of roots occurs when the MA root is very close to minus one, it is not surprising that any unit root test will suffer from severe size distortions in that case.
However, it has been found that size can be seriously inflated even for negative roots of moderate magnitude. Schwert (1989) found that the test with the least size distortion is the (Said–Dickey) t-test based on $t_{\tilde\rho}$ obtained from a high order autoregression, but even for this test the size problem is not negligible. Moreover, even though long autoregressions may moderate the size problems, the use of a long autoregression typically leads to a nontrivial loss of power (for example, DeJong, Nankervis, Savin, and Whiteman, 1992a, 1992b).

Ng and Perron (1995, 2001) further scrutinized rules for truncating long autoregressions when performing unit root tests based on (7.7). Consider the information criterion

$$ IC(k) = \log \tilde\sigma_k^2 + k C_T / T, \qquad \tilde\sigma_k^2 = (T - k)^{-1} \sum_{t=k+1}^{T} \tilde v_{tk}^2, \tag{7.10} $$

where $\{C_T\}$ is a positive sequence satisfying $C_T = o(T)$. The Akaike Information Criterion (AIC) sets $C_T = 2$, whereas the Schwarz or Bayesian Information Criterion (BIC) puts $C_T = \log T$. Ng and Perron (1995) found that, although these information criteria satisfy the requirement that $k = o(T^{1/3})$, they generally select too low a value of $k$, with size distortions as a consequence. Ng and Perron (1995) also demonstrate that using a sequential data dependent procedure, where the significance of coefficients on additional lags is sequentially tested, will yield a test with improved size. However, a problem with the latter procedure is that in other cases it tends to overparameterize, thereby resulting in a loss of power.

More recently, Ng and Perron (2001) have developed an information criterion with a penalty function adequate for integrated time series. The idea is to select some lag order $k$ in the interval between 0 and a preselected value $k_{\max}$, where the upper bound $k_{\max}$ satisfies $k_{\max} = o(T)$.
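Lag selection with the conventional criterion (7.10) amounts to fitting regression (7.7) for each candidate $k$ and minimising $IC(k)$. A minimal sketch under stated assumptions: NumPy is available, the helper names `residual_variance` and `select_lag` are hypothetical, and each $k$ uses its own sample $t = k+1, \ldots, T$ as written in (7.10).

```python
import numpy as np


def residual_variance(y, k):
    """Residual variance (T - k)^{-1} * sum of squared residuals from regression (7.7)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    dy = np.diff(y)  # dy[i] is the first difference dated i + 2 in 1-based time
    columns = [y[k - 1:T - 1]] + [dy[k - j - 1:T - j - 1] for j in range(1, k)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    resid = y[k:] - X @ coef
    return resid @ resid / (T - k)


def select_lag(y, k_max, penalty="aic"):
    """Minimise IC(k) = log(sigma_k^2) + k * C_T / T over k = 1, ..., k_max.

    C_T = 2 for "aic" and C_T = log(T) for "bic". Returns (best_k, {k: IC(k)}).
    """
    T = len(y)
    c_T = 2.0 if penalty == "aic" else float(np.log(T))
    ic = {k: float(np.log(residual_variance(y, k))) + k * c_T / T
          for k in range(1, k_max + 1)}
    return min(ic, key=ic.get), ic


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.cumsum(rng.standard_normal(200))
    print(select_lag(y, k_max=8, penalty="aic")[0])
    print(select_lag(y, k_max=8, penalty="bic")[0])
```

This sketch implements only the conventional criteria; it does not include the data dependent penalty of the modified criterion of Ng and Perron (2001).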
Their preferred criterion, which can be interpreted as a modified form of the AIC, is given by

$$ MAIC(k) = \log \tilde\sigma_k^2 + 2(\tau_T(k) + k)/(T - k_{\max}), \tag{7.11} $$

where $\tilde\sigma_k^2 = (T - k_{\max})^{-1} \sum_{t=k_{\max}+1}^{T} \tilde v_{tk}^2$ and $\tau_T(k) = \tilde\sigma_k^{-2} (\tilde\rho - 1)^2 \sum_{t=k_{\max}+1}^{T} y_{t-1}^2$.$^{3}$ It is interesting to observe that the penalty function of the modified criteria is data dependent, which is a way to account for the fact that the bias in the sum of the autoregressive coefficients (i.e., $\tilde\rho - 1$) is highly dependent upon the selected $k$. Even though Ng and Perron (2001) did not examine directly the effect on the size of the Said–Dickey test by using the above rules, simulation results in other contexts demonstrate that modified information criteria are superior to conventional information criteria in truncating long autoregressions with integrated variables when moving average errors are present.

In the implementation of the (Phillips–Perron) tests based on $Z_\rho$ and $Z_t$, the estimator

$$ \hat\sigma^2 = T^{-1} \sum_{t=2}^{T} \hat u_t^2, \qquad \hat u_t = y_t - \hat\rho y_{t-1}, $$

serves as a consistent estimator of $\sigma^2$. With respect to estimation of $\omega^2$, a wide range of kernel estimators have been considered. These kernel estimators are of the form

$$ \hat\omega^2_{KER} = T^{-1} \sum_{t=2}^{T} \hat u_t^2 + 2 \sum_{j=1}^{T-1} w(j/M_T) \left( T^{-1} \sum_{t=j+2}^{T} \hat u_t \hat u_{t-j} \right), \tag{7.12} $$

where $w(\cdot)$ is some kernel (weight) function and $M_T$ is a bandwidth parameter (see for example, Newey and West, 1987; Andrews, 1991).$^{4}$ The vast majority of unit root tests proposed in the literature use such kernel estimators of the long-run variance $\omega^2$ to remove the influence of nuisance parameters in the asymptotic distributions.
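The kernel estimator (7.12) is a weighted sum of residual autocovariances. The sketch below assumes the Bartlett weight function $w(x) = \max(0, 1 - |x|)$ as one common choice; the function names are hypothetical, and the residuals $\hat u_2, \ldots, \hat u_T$ are taken as given.

```python
def bartlett(x):
    """Bartlett (triangular) kernel weight, assumed here for illustration."""
    return max(0.0, 1.0 - abs(x))


def kernel_long_run_variance(u, bandwidth, weight=bartlett):
    """Kernel estimate of the long-run variance in the form of (7.12).

    `u` holds the residuals u_2, ..., u_T, so the sample size is T = len(u) + 1.
    `bandwidth` plays the role of M_T and `weight` the role of w(.).
    """
    T = len(u) + 1
    omega2 = sum(v * v for v in u) / T                   # T^{-1} * sum of u_t^2
    for j in range(1, T - 1):                            # lags with a nonempty inner sum
        w = weight(j / bandwidth)
        if w == 0.0:
            continue
        # Inner sum over t = j+2, ..., T of u_t * u_{t-j}; list index i stands for t = i + 2.
        autocov = sum(u[i] * u[i - j] for i in range(j, len(u))) / T
        omega2 += 2.0 * w * autocov
    return omega2


if __name__ == "__main__":
    residuals = [1.0, -1.0, 1.0, -1.0]
    print(kernel_long_run_variance(residuals, bandwidth=2))
```

With a bandwidth of 1 every Bartlett weight at lag $j \geq 1$ is zero, so the estimate collapses to $\hat\sigma^2$.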
Notwithstanding, it has been shown by Perron and Ng (1996) that no spectral density estimator can completely eliminate size distortions and, in fact, kernel-based estimators tend to aggravate the size distortions.$^{5}$ This finding is due to the fact that estimation of $\rho$ and $\omega^2$ are coupled, in the sense that the least squares estimator $\hat\rho$ is used in constructing $\hat u_t$ and hence affects $\hat\omega^2_{KER}$. Because the least squares estimator $\hat\rho$ is well known to be seriously biased in finite (and even in large) samples when $u_t$ exhibits strong serial correlation, the nuisance parameter estimator $\hat\omega^2_{KER}$ is expected to be very imprecise in critical regions of the serial correlation parameter space. A seemingly obvious way to alleviate this problem is to construct an estimator where residuals are calculated under the null hypothesis, i.e., by using $\Delta y_t$ instead of $\hat u_t$ in (7.12), but it has been shown by Phillips and Ouliaris (1990) that this leads to inconsistent unit root tests.

To produce an estimator that is consistent under the unit root null whilst attenuating the dependence on $\hat\rho$, Perron and Ng (1996, 1998), following earlier work by Berk (1974) and Stock (1999), suggested using an autoregressive spectral density estimator based on estimation of the long autoregression (7.7):

$$ \hat\omega^2_{AR} = \frac{\tilde\sigma_k^2}{\big(1 - \sum_{j=1}^{k-1} \tilde\gamma_j\big)^2}, \tag{7.13} $$

where $k$ is chosen according to one of the information criteria discussed above. The motivation for using the regression (7.7) to estimate $\sum_{j=1}^{k-1} \tilde\gamma_j$, rather than basing estimation on an autoregressive model in first differences (i.e., the model under the null), is that this ensures a consistent unit root test (e.g., Stock, 1999). The construction (7.13) decouples estimation of $\omega^2$ from the estimation of $\rho$ and, therefore, helps avoid the problems caused by the bias of $\hat\rho$. In particular, $\hat\omega^2_{AR}$ is immune to potentially severe biases in $\hat\rho$ that are caused by the presence of serial correlation in the errors.
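The autoregressive spectral density estimator (7.13) needs only the residual variance and the coefficients on the lagged differences from regression (7.7). A minimal sketch, assuming NumPy and a lag order `k` supplied by the user; the function name is hypothetical and the divisor $T-k$ follows (7.10).

```python
import numpy as np


def ar_spectral_long_run_variance(y, k):
    """Autoregressive spectral density estimate of the long-run variance, as in (7.13).

    Fits y_t on y_{t-1} and dy_{t-1}, ..., dy_{t-(k-1)} for t = k+1, ..., T and returns
    sigma_k^2 / (1 - sum of gamma_j)^2. The lag order k >= 1 is taken as given.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    dy = np.diff(y)  # dy[i] is the first difference dated i + 2 in 1-based time
    columns = [y[k - 1:T - 1]] + [dy[k - j - 1:T - j - 1] for j in range(1, k)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    resid = y[k:] - X @ coef
    sigma2_k = resid @ resid / (T - k)       # residual variance of the long autoregression
    gamma_sum = coef[1:].sum()               # zero when k = 1 (no lagged differences)
    return float(sigma2_k / (1.0 - gamma_sum) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    e = rng.standard_normal(501)
    u = e[1:] - 0.5 * e[:-1]                 # MA(1) errors, chosen only for illustration
    y = np.cumsum(u)
    print(ar_spectral_long_run_variance(y, k=6))
```

The estimate uses the level regression (7.7), not an autoregression in first differences, in line with the consistency argument given above.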
In a comparison of the size properties of the Phillips–Perron tests using the estimator $\hat\omega^2_{AR}$ and the tests using a Bartlett kernel estimator of $\omega^2$, it was found that significant size improvements can be achieved in the most critical parameter space. Nevertheless, size distortions are still severe and remain so even if $\hat\omega^2_{AR}$ is replaced by the (unknown) true value $\omega^2$ (Perron and Ng, 1996), a finding which indicates that biases in $\hat\rho$ are an important source of size distortions. The next subsection discusses developments that seek to obtain further improvements by addressing this issue.

7.2.3 Modified unit root tests with good size

Perron and Ng (1996) consider modified Phillips–Perron tests (referred to as M-tests in the following) that appear to have much improved size properties compared to any other unit root test. Moreover, the tests can be designed such that they satisfy desirable optimality criteria in terms of power, a topic to which we will return later. The tests belong to a class originally suggested by Stock (1999), which exploits the fact that a series converges at different rates under the null and the alternative hypotheses. The first statistic reads

$$ MZ_\rho = \frac{T^{-1} y_T^2 - \hat\omega^2_{AR}}{2 T^{-2} \sum_{t=2}^{T} y_{t-1}^2}, \tag{7.14} $$

which can also be written in terms of $Z_\rho$ as

$$ MZ_\rho = Z_\rho + \frac{T}{2} (\hat\rho - 1)^2. \tag{7.15} $$

Because $\hat\rho - 1 = O_p(T^{-1})$ under the null (superconsistency), it is seen that $Z_\rho$ and $MZ_\rho$ will be asymptotically equivalent (under the null), implying in particular that the limiting null distribution of $MZ_\rho$ is the one given in (7.2).

The next statistic reads

$$ MSB = \sqrt{\hat\omega^{-2}_{AR}\, T^{-2} \sum_{t=2}^{T} y_{t-1}^2}, \tag{7.16} $$

which is stochastically bounded under the null and is $O_p(T^{-1})$ (and thus tends to zero) under fixed alternatives.
This test is related to the Sargan and Bhargava (1983) test, hence the name, and critical values are reported by Stock (1999). Finally, because $Z_t = MSB \cdot Z_\rho$, a modified Phillips–Perron t test can be defined as

$$ MZ_t = Z_t + \frac{T}{2}\, MSB\, (\hat\rho - 1)^2. \tag{7.17} $$

Even though $\hat\rho$ is superconsistent (under the null), implying that the correction terms associated with $Z_\rho$ and $Z_t$ are asymptotically negligible, it is still the case that the correction factors can be important even in moderately large samples, the reason being that $\hat\rho$ is severely biased in the presence of strong serial correlation.

Simulation experiments reported in Perron and Ng (1996) show that the M-tests have markedly lower size distortion than other unit root tests that are available in the literature. However, it is essential that the autoregressive spectral density estimator $\hat\omega^2_{AR}$ is used as an estimator of $\omega^2$, so as to decouple estimation of the unit root and the long-run variance, for the tests to have good size properties. For instance, for an MA root of $-0.8$, the actual size of the $MZ_t$ test is about 6% at a nominal 5% level, whereas Phillips–Perron tests or modified Phillips–Perron tests using kernel estimates of $\omega^2$ have size close to 100%.

The M-tests also appear to be robust to, e.g., measurement errors and additive outliers in the observed series. Franses and Haldrup (1994) and Haldrup, Montañés, and Sansó (2005) show that in these cases with data contamination, unit root inference using standard tests becomes seriously size affected. Vogelsang (1999) shows that the M-tests effectively solve these problems in terms of test size.$^{6}$

7.2.4 Deterministics

In practical applications the underlying model will also contain deterministic components. These can be accommodated by generalizing (7.4) to a components representation of the form

$$ y_t = \mu_t + z_t, \qquad z_t = \rho z_{t-1} + u_t, \tag{7.18} $$

where $z_0 = 0$, $u_t$ is as in (7.4), and $\mu_t$ is a deterministic component.
Most of the specifications of $\mu_t$ used in applications are linear-in-parameters specifications of the form

$$ \mu_t = d_t' \beta, \tag{7.19} $$

where $d_t$ are known $k$-vectors of deterministic terms (for some $k \geq 1$), while $\beta$ is a $k$-vector of unknown parameters. The leading special cases of linear-in-parameters specifications are the constant mean and linear trend specifications, in which $d_t = 1$ and $d_t = (1, t)'$, respectively.$^{7}$

Appropriate detrending of the data is needed if one tests the unit root hypothesis against a trend stationary alternative (i.e., $\rho < 1$ in (7.18) and $d_t = (1, t)'$ in (7.19)). Specifically, the Dickey–Fuller (or Said–Dickey) regressions should take the alternative form

$$ y_t = \tilde\mu_t + \tilde\rho y_{t-1} + \sum_{j=1}^{k-1} \tilde\gamma_j \Delta y_{t-j} + \tilde v_{tk}. \tag{7.20} $$

Appropriate treatment of deterministics is extremely important. For instance, failure to include a time trend regressor in the auxiliary regression when power against the trend stationary alternative is wanted will lead to a test with zero asymptotic power. Similarly, the Phillips–Perron class of tests allows inclusion of deterministic components or, alternatively, detrending of the series prior to unit root testing can be done. For all cases where the model is augmented with deterministics the relevant distributions change accordingly, as Brownian motion processes are replaced with demeaned and detrended Brownian motions of the form

$$ W^d(r) = W(r) - D(r)' \left( \int_0^1 D(s) D(s)'\,ds \right)^{-1} \left( \int_0^1 D(s) W(s)\,ds \right), \tag{7.21} $$

where $D(r) = 1$ when $d_t = 1$ and $D(r) = (1, r)'$ when $d_t = (1, t)'$.

With respect to the M-tests of the previous subsection, Ng and Perron (2001) suggest an alternative way of dealing with deterministics. The alternative detrending method, local GLS detrending, is in the spirit of Elliott, Rothenberg, and Stock (1996) and has the advantage of yielding tests that are "nearly" efficient, in the sense that they nearly achieve the asymptotic power envelopes for unit root tests.
(A discussion of these power envelopes will be provided in section 7.3.) The GLS detrending method can be described as follows. For any series $\{x_t\}_{t=1}^{T}$ of length $T$ and any constant $\bar c$, define $x_{\bar c} = (x_1, \Delta x_2 - \bar c T^{-1} x_1, \ldots, \Delta x_T - \bar c T^{-1} x_{T-1})'$. The GLS detrended series $\{\tilde y_t\}$ is given by

$$ \tilde y_t = y_t - d_t' \tilde\beta, \qquad \tilde\beta = \arg\min_{\beta}\, (y_{\bar c} - d_{\bar c}' \beta)' (y_{\bar c} - d_{\bar c}' \beta). $$

Elliott, Rothenberg, and Stock (1996) suggested $\bar c = -7$ and $\bar c = -13.5$ for $d_t = 1$ and $d_t = (1, t)'$, respectively, as these values of $\bar c$ correspond to the local alternatives against which the local asymptotic power envelope for 5 percent tests equals 50 percent. The M-tests constructed using GLS detrended data (and using $\hat\omega^2_{AR}$ together with the modified information criteria of Section 7.2.2) are denoted, respectively, $MZ_\rho^{GLS}$, $MSB^{GLS}$, and $MZ_t^{GLS}$. These tests are shown by Ng and Perron (2001) to have excellent size and local asymptotic power.

In conclusion, unit root tests can be constructed with both excellent size and local asymptotic power properties, but to achieve these dual objectives it is necessary to use GLS detrended data.

7.3 Power envelopes for unit root tests

This section discusses power envelopes (efficiency bounds) for tests of the unit root hypothesis. Power envelopes for unit root tests have proven to be useful for two reasons. First, being attainable upper bounds on power, they give an objective standard against which the power properties of any feasible unit root test can be compared. For instance, the fact that the M-tests discussed in the previous section have local asymptotic power "close" to the appropriate power envelopes (Ng and Perron, 2001) implies that these are "nearly" efficient. Second, the derivation of power envelopes is useful because it suggests how admissible unit root tests with good overall power properties can be constructed.
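The GLS detrending step described in Section 7.2.4 is a least squares regression on quasi-differenced data followed by subtraction of the fitted deterministic component from the original series. A minimal sketch, assuming NumPy; the function names are hypothetical, and the default values of $\bar c$ are the ones attributed above to Elliott, Rothenberg, and Stock (1996).

```python
import numpy as np


def quasi_difference(x, c_bar):
    """Return (x_1, dx_2 - c_bar/T * x_1, ..., dx_T - c_bar/T * x_{T-1}) along axis 0."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    out = x.copy()
    # dx_t - (c_bar / T) * x_{t-1} equals x_t - (1 + c_bar / T) * x_{t-1}.
    out[1:] = x[1:] - (1.0 + c_bar / T) * x[:-1]
    return out


def gls_detrend(y, trend=False, c_bar=None):
    """GLS detrend y using d_t = 1 (trend=False) or d_t = (1, t)' (trend=True)."""
    y = np.asarray(y, dtype=float)
    T = y.size
    if c_bar is None:
        c_bar = -13.5 if trend else -7.0
    d = np.ones((T, 1))
    if trend:
        d = np.column_stack([np.ones(T), np.arange(1, T + 1, dtype=float)])
    y_c = quasi_difference(y, c_bar)
    d_c = quasi_difference(d, c_bar)
    beta, *_ = np.linalg.lstsq(d_c, y_c, rcond=None)   # least squares on transformed data
    return y - d @ beta                                # detrend the original series


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    y = 5.0 + np.cumsum(rng.standard_normal(150))
    print(gls_detrend(y)[:5])
```

A series that is exactly a constant, or exactly a linear trend, is mapped to zero by the corresponding specification, which gives a quick sanity check on the implementation.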
Indeed, the GLS detrending method, which is the key to accommodating deterministic components without sacrificing efficiency, is a natural by-product of the derivation of the power envelope in the presence of deterministic components (Elliott, Rothenberg, and Stock, 1996).

7.3.1 The leading special case

A natural starting point for the discussion of power envelopes for unit root tests is the known-variance, zero-mean Gaussian AR(1) model.$^{8}$ In this model, the observed data $\{y_t : t = 1, \ldots, T\}$ is generated as

$$ y_t = \rho y_{t-1} + \varepsilon_t, \tag{7.22} $$

where $y_0 = 0$ and $\varepsilon_t \sim \text{i.i.d. } N(0, 1)$. Any (possibly randomized) unit root test can be represented by means of a test function $\phi_T : \mathbb{R}^T \to [0, 1]$, such that $H_0$, the unit root hypothesis, is rejected with probability $\phi_T(Y)$ if $Y_T = (y_1, \ldots, y_T)' = Y$. The power (function) associated with $\phi_T(\cdot)$ is given by $E_\rho \phi_T(Y_T)$, where the subscript on "$E$" indicates the distribution with respect to which the expectation is taken (i.e., the argument of the power function).

When evaluating the power properties of a unit root test, a power envelope is very useful. By definition, a power envelope for a class of unit root tests gives an attainable upper bound on $E_\rho \phi_T(Y_T)$ for tests in the class. Throughout this section, the class of tests under consideration will be the class of tests of (asymptotic) size $\alpha$ or some subset thereof. The power envelope for size $\alpha$ tests is the function $\Pi_T^\alpha(\cdot)$ given by$^{9}$

$$ \Pi_T^\alpha(\rho) = \max_{\phi_T(\cdot)\,:\,E_1 \phi_T(Y_T) = \alpha} E_\rho \phi_T(Y_T). \tag{7.23} $$

By construction, $\Pi_T^\alpha(\rho)$ is an upper bound on $E_\rho \phi_T(Y_T)$ for test functions $\phi_T(\cdot)$ associated with tests of size $\alpha$. Moreover, the power envelope is attainable (pointwise) in the sense that, for every $\rho$, $E_\rho \phi_T(Y_T) = \Pi_T^\alpha(\rho)$ for some $\phi_T(\cdot)$ corresponding to a test of size $\alpha$.

There is a simple and constructive way to derive the power envelope.
For the known-variance, zero-mean Gaussian AR(1) model, the log likelihood function, $L_T(\cdot)$, satisfies the relation

$$ L_T(\rho) - L_T(1) = T(\rho - 1) S_T - \frac{1}{2} [T(\rho - 1)]^2 H_T, \tag{7.24} $$

where $S_T = T^{-1} \sum_{t=2}^{T} y_{t-1} \Delta y_t$ and $H_T = T^{-2} \sum_{t=2}^{T} y_{t-1}^2$. An application of the Neyman–Pearson lemma (e.g., Lehmann, 1994, Theorem 3.1) therefore yields

$$ \Pi_T^\alpha(\rho) = \Pr\nolimits_\rho \left[ T(\rho - 1) S_T - \frac{1}{2} [T(\rho - 1)]^2 H_T > k_T^\alpha(\rho) \right], \tag{7.25} $$

where $k_T^\alpha(\rho)$ satisfies $\Pr_1 \big[ T(\rho - 1) S_T - \frac{1}{2} [T(\rho - 1)]^2 H_T > k_T^\alpha(\rho) \big] = \alpha$ and the subscript on "$\Pr$" indicates the distribution with respect to which the probability is evaluated.

In addition to providing a formula for computing the power envelope, expression (7.25) delivers a characterization of the test that attains the power envelope at any given value of $\rho$. Specifically, it follows from (7.25) that the power envelope $\Pi_T^\alpha(\rho)$ is attained by the test which rejects for large values of $T(\rho - 1) S_T - \frac{1}{2} [T(\rho - 1)]^2 H_T$. Because the functional form of the optimal test against any specific alternative $\rho < 1$ depends on $\rho$, the unit root testing problem does not admit a uniformly most powerful (UMP) size $\alpha$ test, in spite of the fact that it is a one-sided testing problem without any nuisance parameters.

Applying the preceding arguments to other simple hypotheses on $\rho$, it can be verified that nonexistence of a UMP size $\alpha$ test is a property shared by all one-sided hypothesis tests on the autoregressive coefficient $\rho$ in (7.22). In other words, the nonexistence of a UMP size $\alpha$ test is not specific to the unit root hypothesis. What is somewhat special about the unit root hypothesis is the fact that nonexistence of a UMP size $\alpha$ test holds even asymptotically.$^{10}$

By analogy with the finite sample situation, the asymptotic power envelope for a class of (sequences of) unit root tests gives an attainable upper bound on local asymptotic power for (sequences of) tests in the class.
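The finite-sample relation (7.24) can be checked numerically. The sketch below evaluates the Gaussian log likelihood of model (7.22) up to an additive constant and compares the likelihood ratio with the expression in $S_T$ and $H_T$; all function names are hypothetical and only the standard library is used.

```python
def log_likelihood(y, rho):
    """Gaussian log likelihood of (7.22) up to an additive constant (unit variance, y_0 = 0)."""
    total, prev = 0.0, 0.0
    for value in y:
        total -= 0.5 * (value - rho * prev) ** 2
        prev = value
    return total


def s_and_h(y):
    """Return (S_T, H_T), with sums over t = 2, ..., T as in the text."""
    T = len(y)
    s_T = sum(y[i - 1] * (y[i] - y[i - 1]) for i in range(1, T)) / T
    h_T = sum(y[i - 1] ** 2 for i in range(1, T)) / T ** 2
    return s_T, h_T


def log_likelihood_ratio(y, rho):
    """Right-hand side of (7.24): T(rho - 1) S_T - 0.5 [T(rho - 1)]^2 H_T."""
    T = len(y)
    s_T, h_T = s_and_h(y)
    a = T * (rho - 1.0)
    return a * s_T - 0.5 * a * a * h_T


if __name__ == "__main__":
    y = [1.0, 0.5, 2.0, 1.5]
    lhs = log_likelihood(y, 0.9) - log_likelihood(y, 1.0)
    print(lhs, log_likelihood_ratio(y, 0.9))   # the two numbers agree up to rounding
```

The term for $t = 1$ cancels in the difference because $y_0 = 0$, which is why the sums in $S_T$ and $H_T$ can start at $t = 2$.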
Assuming the limit exists, the local asymptotic power function of a sequence $\{\phi_T(\cdot)\}$ of unit root tests is the function (with argument $c \leq 0$) $\lim_{T\to\infty} E_{1 + T^{-1} c}\, \phi_T(Y_T)$.$^{11}$ A sequence $\{\phi_T(\cdot)\}$ of unit root tests is said to have asymptotic size $\alpha$ if $\lim_{T\to\infty} E_1 \phi_T(Y_T) = \alpha$. The asymptotic power envelope for tests asymptotically of size $\alpha$ is the function $\Pi_\infty^\alpha(\cdot)$ given by

$$ \Pi_\infty^\alpha(c) = \max_{\{\phi_T(\cdot)\}\,:\,\lim_{T\to\infty} E_1 \phi_T(Y_T) = \alpha}\ \lim_{T\to\infty} E_{1 + T^{-1} c}\, \phi_T(Y_T). \tag{7.26} $$

An explicit formula for the asymptotic power envelope is available. Because the optimal unit root test against the alternative $\rho = 1 + T^{-1} c$ rejects for large values of $c S_T - \frac{1}{2} c^2 H_T$, it stands to reason that the asymptotic power envelope $\Pi_\infty^\alpha(c)$ is attained by the sequence of tests with rejection regions of the form $\{ c S_T - \frac{1}{2} c^2 H_T > k_\infty^\alpha(c) \}$, where $k_\infty^\alpha(c)$ is such that the sequence has asymptotic size $\alpha$. Indeed, it can be shown that

$$ \Pi_\infty^\alpha(c) = \lim_{T\to\infty} \Pr\nolimits_{1 + T^{-1} c} \left[ c S_T - \frac{1}{2} c^2 H_T > k_\infty^\alpha(c) \right] = \Pr \left[ c \int_0^1 W_c(r)\,dW(r) + \frac{1}{2} c^2 \int_0^1 W_c(r)^2\,dr > k_\infty^\alpha(c) \right], \tag{7.27} $$

where $k_\infty^\alpha(c)$ satisfies $\Pr \big[ c \int_0^1 W(r)\,dW(r) - \frac{1}{2} c^2 \int_0^1 W(r)^2\,dr > k_\infty^\alpha(c) \big] = \alpha$, $W$ is a Wiener process, and $W_c$ is an Ornstein–Uhlenbeck process satisfying the stochastic differential equation $dW_c(r) = c W_c(r)\,dr + dW(r)$ with initial condition $W_c(0) = 0$.$^{12}$

As is true of its finite sample counterpart, the asymptotic power envelope can only be attained pointwise. In other words, there does not exist a sequence of tests (asymptotically of size $\alpha$) which attains $\Pi_\infty^\alpha(c)$ for all values of $c$. In the absence of a UMP test, it seems natural to try to derive tests enjoying weaker optimality properties in the hope that these tests will have good overall power properties. Two complementary notions of optimality, local optimality and point optimality, have been employed to derive unit root tests with demonstrable optimality properties.
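A point on the envelope in (7.27) can be approximated by Monte Carlo simulation with a large but finite $T$: draw the statistic $c S_T - \frac{1}{2} c^2 H_T$ under the null to obtain a critical value, then under $\rho = 1 + c/T$ to obtain the rejection frequency. The sketch below assumes NumPy; the function names, sample size, replication count, and seed are illustrative choices and not values from the chapter.

```python
import numpy as np


def simulate_statistic(c_alt, c_true, T, reps, rng):
    """Draw `reps` values of c_alt * S_T - 0.5 * c_alt^2 * H_T when rho = 1 + c_true / T."""
    rho = 1.0 + c_true / T
    eps = rng.standard_normal((reps, T))     # Gaussian errors with known unit variance
    y_prev = np.zeros(reps)                  # y_0 = 0 in every replication
    s_sum = np.zeros(reps)
    h_sum = np.zeros(reps)
    for t in range(T):
        y_now = rho * y_prev + eps[:, t]
        # The first pass contributes nothing because y_0 = 0, so sums start at t = 2.
        s_sum += y_prev * (y_now - y_prev)
        h_sum += y_prev ** 2
        y_prev = y_now
    s_T = s_sum / T
    h_T = h_sum / T ** 2
    return c_alt * s_T - 0.5 * c_alt ** 2 * h_T


def envelope_point(c, T=200, reps=4000, alpha=0.05, seed=0):
    """Monte Carlo approximation to the power envelope at the local alternative c."""
    rng = np.random.default_rng(seed)
    null_draws = simulate_statistic(c, 0.0, T, reps, rng)
    alt_draws = simulate_statistic(c, c, T, reps, rng)
    critical_value = np.quantile(null_draws, 1.0 - alpha)   # reject for large values
    return float(np.mean(alt_draws > critical_value))


if __name__ == "__main__":
    for c in (-2.0, -7.0, -13.5, -20.0):
        print(c, envelope_point(c))
```

Because each call re-optimises the test for its own value of $c$, the resulting curve is an envelope and not the power function of any single test.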
The notion of local optimality leads to a test which maximizes (the scaled limit of) the derivative of the power function under the null hypothesis. For the unit root testing problem, the locally most powerful test rejects for small values of $S_T$.¹³ Although admissible, the locally most powerful unit root test turns out to have rather poor local asymptotic power properties (Stock, 1994). In contrast, the notion of point optimality has been found to deliver (admissible) unit root tests with excellent power properties. By definition, a point optimal unit root test maximizes power against a specific (point) alternative (e.g., King, 1988).¹⁴

The family of point optimal unit root tests is obtained as a by-product of the power envelope. It consists of all tests with rejection regions of the form $\{cS_T - \tfrac{1}{2}c^2 H_T > k_\infty^{\alpha}(c)\}$, where $c$ indexes the local alternative against which optimal power is desired. Elliott, Rothenberg, and Stock (1996) found that point optimal unit root tests have local asymptotic power functions essentially identical to the power envelope for a wide range of values of the index $c$. A popular choice, advocated in the unit root context by Elliott, Rothenberg, and Stock (1996), is the value of $c$ such that the asymptotic power envelope for 5 percent tests equals 50 percent when evaluated at $c$; that is, the recommended value of $c$ solves the equation $\Pi_\infty^{0.05}(c) = 0.5$.

Numerous other unit root tests have been proposed (for a review, see Stock, 1994). The most well-known examples are probably the Dickey–Fuller (1979) tests (i.e., the tests based on $T(\hat\rho - 1)$ and $t_{\hat\rho}$) and their asymptotic equivalents (e.g., the tests based on $Z_\rho$ and $Z_t$ discussed in section 7.2.1).
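The power envelope and the power of a point optimal test can be approximated by simulation. The following is a rough Monte Carlo sketch rather than a definitive implementation: it replaces the limit in (7.27) with a finite sample size `T`, and the names `simulate_s_h` and `rejection_rate`, the number of replications, and the seed are all illustrative assumptions. The value $c = -7$ used in the example is the one commonly attributed to Elliott, Rothenberg, and Stock (1996) for the 50 percent criterion; it is not stated in this section and is treated here as an assumption.

```python
import numpy as np


def simulate_s_h(c, T, reps, rng):
    """Draw reps copies of (S_T, H_T) when rho = 1 + c/T, y_0 = 0, e_t ~ N(0, 1)."""
    rho = 1.0 + c / T
    e = rng.standard_normal((reps, T))
    y = np.zeros((reps, T))
    y[:, 0] = e[:, 0]
    for t in range(1, T):
        y[:, t] = rho * y[:, t - 1] + e[:, t]
    lag = y[:, :-1]
    dy = np.diff(y, axis=1)
    S = (lag * dy).sum(axis=1) / T
    H = (lag**2).sum(axis=1) / T**2
    return S, H


def rejection_rate(c_true, c_bar, alpha=0.05, T=200, reps=4000, seed=0):
    """Simulated power at c_true of the test rejecting for large
    c_bar*S_T - 0.5*c_bar^2*H_T, with a simulated null critical value.
    Setting c_bar == c_true approximates the envelope at c_true."""
    rng = np.random.default_rng(seed)
    S0, H0 = simulate_s_h(0.0, T, reps, rng)
    crit = np.quantile(c_bar * S0 - 0.5 * c_bar**2 * H0, 1.0 - alpha)
    S1, H1 = simulate_s_h(c_true, T, reps, rng)
    return float(np.mean(c_bar * S1 - 0.5 * c_bar**2 * H1 > crit))


# Approximate envelope at c = -7, and power of the c_bar = -7 test elsewhere.
envelope_at_7 = rejection_rate(-7.0, -7.0)
power_at_15_using_7 = rejection_rate(-15.0, -7.0)
envelope_at_15 = rejection_rate(-15.0, -15.0)
```

Comparing `power_at_15_using_7` with `envelope_at_15` gives a rough sense of how close a single point optimal test comes to the envelope away from the alternative it was designed for; the simulated numbers are subject to Monte Carlo error.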
In the case where the error variance is known to equal unity, a Dickey–Fuller type t-test rejects for small values of

$$\frac{\hat\rho - 1}{\sqrt{1\big/\sum_{t=2}^{T} y_{t-1}^2}} = \frac{S_T}{\sqrt{H_T}}.$$

This test can be interpreted as a (signed) likelihood ratio test.¹⁵ Moreover, the test has excellent local asymptotic power properties (Elliott, Rothenberg, and Stock, 1996). In spite of this, the test does not seem to enjoy any conventional optimality properties.¹⁶ On the other hand, the Dickey–Fuller estimator test, which rejects for small values of $T(\hat\rho - 1) = S_T/H_T$, is a member of the class of point optimal tests and is therefore admissible (Stock, 1994).

Power envelopes have been derived for a variety of extensions of the basic model (7.22). The remainder of this section discusses five such extensions. These are all of practical and theoretical interest, the practical interest being due to the empirical relevance of the extensions. To simplify the exposition, we discuss each of the extensions in isolation, in each case studying a model which departs as little as possible from the known-variance, zero-mean Gaussian AR(1) model.

7.3.2 Serial correlation

This subsection discusses the impact of serial correlation on the form of the asymptotic power envelope. As was explained in section 7.2, the presence of serial correlation introduces nontrivial complications for practitioners wanting to employ unit root tests.
In contrast, the presence of serial correlation has no impact on the asymptotic properties of the unit root testing problem, because the asymptotic power envelope (7.27) remains valid under rather general assumptions on the short-run temporal dependence properties of the quasi-difference process $\{y_t - \rho y_{t-1}\}$.

Consider the Gaussian variant of (7.4) in which

$$y_t = \rho y_{t-1} + u_t, \qquad u_t = \psi(L)\varepsilon_t, \tag{7.28}$$

where $y_0 = 0$, $\varepsilon_t \sim$ i.i.d. $N(0,1)$, and $\psi(L) = \sum_{j=0}^{\infty}\psi_j L^j$ is a lag polynomial whose (unknown) coefficients $\{\psi_j\}$ satisfy $\sum_{j=0}^{\infty}|\psi_j| < \infty$ and $\psi(e^{ir}) = \sum_{j=0}^{\infty}\psi_j e^{ijr} \ne 0$ for all $r \in \mathbb{R}$.

The construction of the asymptotic power envelope proceeds in two steps. In the first step, the Neyman–Pearson lemma is used to derive the asymptotic power envelope under the (counterfactual) assumption that $\{\psi_j\}$ is known. The second step then shows that this envelope is indeed the asymptotic power envelope for unit root tests in the model (7.28) by showing that the bound can be attained (pointwise) without knowledge of $\{\psi_j\}$.

For now, suppose the parameters $\{\psi_j\}$ are known. In that case, the derivation of an asymptotic power envelope is conceptually straightforward because $\rho$ is the only parameter of the model. By the Neyman–Pearson lemma, the unit root test with optimal power against the local alternative $\rho = 1 + T^{-1}c$ rejects for large values of the log likelihood ratio $L_T^{\psi}(1 + T^{-1}c) - L_T^{\psi}(1)$, where $L_T^{\psi}(\cdot)$ is the log likelihood function. A slick derivation of the asymptotic power envelope can be based on the fact that

$$L_T^{\psi}(1 + T^{-1}c) - L_T^{\psi}(1) = cS_T^{\psi} - \tfrac{1}{2}c^2 H_T^{\psi} + o_p(1) \tag{7.29}$$

under the null hypothesis, where $S_T^{\psi} = \tfrac{1}{2}(\omega^{-2}T^{-1}y_T^2 - 1)$, $H_T^{\psi} = \omega^{-2}T^{-2}\sum_{t=2}^{T} y_{t-1}^2$, and $\omega^2 = \psi(1)^2$ is the long-run variance of $\psi(L)\varepsilon_t$.¹⁷
Using the quadratic expansion (7.29) and the theory of limits of experiments (see, for example, Le Cam and Yang, 2000; van der Vaart, 1998), it can be shown that the asymptotic power envelope for size $\alpha$ tests is attained (at the point $c$) by the sequence of tests with rejection regions of the form $\{cS_T^{\psi} - \tfrac{1}{2}c^2 H_T^{\psi} > k_\infty^{\alpha,\psi}(c)\}$, where $k_\infty^{\alpha,\psi}(c)$ is such that the sequence has asymptotic size $\alpha$. Under the null and contiguous alternatives, the limiting distribution of $(S_T^{\psi}, H_T^{\psi})$ does not depend on $\{\psi_j\}$. By implication, the critical value function $k_\infty^{\alpha,\psi}(c)$ does not depend on $\{\psi_j\}$. More importantly, the asymptotic power envelope is invariant with respect to $\{\psi_j\}$ and is given by the function $\Pi_\infty^{\alpha}(\cdot)$ defined in (7.27).

To show that the upper bound $\Pi_\infty^{\alpha}(\cdot)$ constitutes the asymptotic power envelope for the model (7.28), it must be shown that $\Pi_\infty^{\alpha}(\cdot)$ is attainable without knowledge of $\{\psi_j\}$. To do so, it suffices to exhibit a pair $(\hat S_T^{\psi}, \hat H_T^{\psi})$, computable without knowledge of $\{\psi_j\}$, such that $(\hat S_T^{\psi}, \hat H_T^{\psi}) = (S_T^{\psi}, H_T^{\psi}) + o_p(1)$ under the unit root hypothesis (irrespective of the value of $\{\psi_j\}$). (Assuming such a pair can be found, the test which rejects for large values of $c\hat S_T^{\psi} - \tfrac{1}{2}c^2 \hat H_T^{\psi}$ will attain the asymptotic power envelope.) The asymptotic equivalence requirement is met by $\hat S_T^{\psi} = \tfrac{1}{2}(\hat\omega^{-2}T^{-1}y_T^2 - 1)$ and $\hat H_T^{\psi} = \hat\omega^{-2}T^{-2}\sum_{t=2}^{T} y_{t-1}^2$, where $\hat\omega^2$ is any consistent (under the unit root hypothesis) estimator of $\omega^2$.¹⁸

7.3.3 Deterministics

Proceeding as in section 7.2.4, deterministic terms can be accommodated by extending the basic model (7.22) as follows:

$$y_t = \mu_t + z_t, \qquad z_t = \rho z_{t-1} + \varepsilon_t, \tag{7.30}$$

where $\mu_t$ is an unknown deterministic component, $z_0 = 0$, and $\varepsilon_t \sim$ i.i.d. $N(0,1)$. In this model, $\{\mu_t\}$ is a nuisance feature in the unit root testing problem.
When deriving an asymptotic power envelope in the presence of $\{\mu_t\}$, it is tempting to try to employ the same strategy as in the previous subsection; that is, it is tempting to first derive the asymptotic power envelope assuming $\{\mu_t\}$ is known and then attempt to find a feasible test which attains the bound obtained under the assumption that $\{\mu_t\}$ is known. That method of construction breaks down in general, however. On the one hand, because $z_t = y_t - \mu_t$ is generated by the model of section 7.3.1, it is obvious that the asymptotic power envelope is given by the function $\Pi_\infty^{\alpha}(\cdot)$ (defined in (7.27)) when $\{\mu_t\}$ is known. On the other hand, for many specifications of $\{\mu_t\}$ used in practice, it turns out to be impossible to find tests that attain $\Pi_\infty^{\alpha}(\cdot)$ without knowledge of $\{\mu_t\}$.

When $\mu_t = d_t'\beta$ (i.e., $\mu_t$ is of the linear-in-parameters form (7.19)), the principle of invariance (e.g., Lehmann, 1994, chapter 6) can be employed to eliminate the deterministic component from the unit root testing problem. Any testing problem regarding $\rho$ is invariant under transformations of the form $g_b(y_1, \ldots, y_T) = (y_1 + d_1'b, \ldots, y_T + d_T'b)$ (where $b \in \mathbb{R}^k$), the induced transformation in the parameter space being $\bar g_b(\rho, \beta) = (\rho, \beta + b)$. When $\beta$ is treated as an unknown nuisance parameter, it therefore seems natural to restrict attention to unit root tests that are invariant in the sense that their test functions $\phi_T(\cdot)$ satisfy $\phi_T(y_1 + d_1'b, \ldots, y_T + d_T'b) = \phi_T(y_1, \ldots, y_T)$ for every $b \in \mathbb{R}^k$. In other words, a test is invariant if the conclusion drawn by the test depends on the observed data $\{y_t\}$ only through the unobserved series $\{z_t\}$. As a consequence, an invariant test has a test function whose distribution depends only on $\rho$.¹⁹ Therefore, reduction by invariance eliminates the nuisance parameter $\beta$ from the unit root testing problem, thereby making it possible to obtain power envelopes by means of the Neyman–Pearson lemma.
Drawing on the work of King (1980) and King and Hillier (1985), Dufour and King (1991) used this insight to obtain point optimal invariant tests of simple hypotheses on $\rho$ in the AR(1) model (7.30). Asymptotic power envelopes were obtained (in a model accommodating serial correlation) for the unit root testing problem by Elliott, Rothenberg, and Stock (1996) in the case where $d_t$ is a polynomial trend term. The functional form of the asymptotic power envelope depends on the order of the polynomial trend. It is given by $\Pi_\infty^{\alpha}(\cdot)$ in the constant mean case, but not otherwise.

As was true in the model without deterministic components, the derivation of the asymptotic power envelopes in model (7.30) is constructive in the sense that tests attaining the asymptotic power envelope are obtained as a by-product. The optimal invariant test against the local alternative $\rho = 1 + T^{-1}c$ rejects for large values of the profile log likelihood ratio $\max_\beta L_T(1 + T^{-1}c, \beta) - \max_\beta L_T(1, \beta)$, where $L_T(\cdot)$ is the log likelihood function. As in the model without deterministic components, no test attains the power envelope uniformly, but appropriately chosen point optimal invariant tests are “nearly efficient” in the sense that their local asymptotic power functions are “close” to the asymptotic power envelopes (Elliott, Rothenberg, and Stock, 1996).

This “near-efficiency” property is not shared by the popular Dickey–Fuller (1979) tests, whose local asymptotic power functions fall well short of the asymptotic power envelope. Nevertheless, the class of “nearly efficient” invariant unit root tests contains tests other than the point optimal tests obtained in the derivation of the asymptotic power envelope. Examples include the DF-GLS test of Elliott, Rothenberg, and Stock (1996) and Ng and Perron’s (2001) $M^{GLS}$ tests discussed in section 7.2.4. Consequently, the $M^{GLS}$ tests have both excellent size properties and “nearly” optimal power properties.
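The GLS detrending step that underlies the DF-GLS and $M^{GLS}$ tests can be sketched as follows. This is an illustrative reconstruction of the quasi-differencing regression usually associated with Elliott, Rothenberg, and Stock (1996), not code from the chapter: the function names are hypothetical, and the default values $\bar c = -7$ (constant) and $\bar c = -13.5$ (constant and linear trend) are assumptions drawn from common practice rather than from this section.

```python
import numpy as np


def quasi_difference(a, rho_bar):
    """Return (a_1, a_2 - rho_bar*a_1, ..., a_T - rho_bar*a_{T-1}) along axis 0."""
    a = np.asarray(a, dtype=float)
    out = a.copy()
    out[1:] = a[1:] - rho_bar * a[:-1]
    return out


def gls_detrend(y, trend="c", c_bar=None):
    """GLS-detrend y under the local alternative rho_bar = 1 + c_bar/T.

    trend="c" uses d_t = 1; trend="ct" uses d_t = (1, t)'.
    Returns the detrended series y_t - d_t'beta_hat and beta_hat.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    if trend == "c":
        d = np.ones((T, 1))
        c_bar = -7.0 if c_bar is None else c_bar      # assumed default
    elif trend == "ct":
        d = np.column_stack([np.ones(T), np.arange(1, T + 1)])
        c_bar = -13.5 if c_bar is None else c_bar     # assumed default
    else:
        raise ValueError("trend must be 'c' or 'ct'")
    rho_bar = 1.0 + c_bar / T
    # Regress quasi-differenced data on quasi-differenced deterministics.
    beta_hat, *_ = np.linalg.lstsq(
        quasi_difference(d, rho_bar), quasi_difference(y, rho_bar), rcond=None
    )
    return y - d @ beta_hat, beta_hat


rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(150))
detrended, beta_hat = gls_detrend(y, trend="ct")
```

Any statistic computed from `detrended` is invariant to the transformations $y_t \mapsto y_t + d_t'b$ discussed in the text, because adding $d_t'b$ to the data shifts `beta_hat` by exactly $b$ and leaves the detrended series unchanged.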
In view of the inferiority of the Dickey–Fuller (1979) tests in the model (7.30), it would appear that practitioners ought to abandon the use of these tests. More recent research, examining the role of the initial condition $y_0$, has arrived at a slightly less drastic conclusion. A brief discussion of that literature is provided in the next subsection.

7.3.4 The initial condition

In the known-variance, zero-mean Gaussian AR(1) model of section 7.3.1, the (unobserved) initial observation $y_0$ is assumed to be equal to zero. Analogous assumptions are made in the models of sections 7.3.2 and 7.3.3. At first sight, these would appear to be innocuous normalizations because it is easy to show that the initial observation is asymptotically negligible whenever $T^{-1/2}y_0 = o_p(1)$, a condition that is also satisfied if $y_0$ is treated as a nuisance parameter (i.e., modeled as a constant) or modeled as a random variable with a fixed distribution.

Now, if $\{y_t : 0 \le t \le T\}$ is generated by a stationary Gaussian AR(1) model with autoregressive coefficient $\rho$ and innovation variance equal to unity, the initial observation will satisfy $y_0 \sim N[0, 1/(1-\rho^2)]$. In that case, the initial observation is not asymptotically negligible, the limiting distribution of $T^{-1/2}y_0$ being $N[0, -1/(2c)]$ under local-to-unity asymptotics with $\rho = 1 + T^{-1}c$ for some $c < 0$. To the extent that stationarity is a plausible alternative to the unit root hypothesis, these considerations suggest that the role of the initial condition is worth investigating.

The role of the initial condition can be explored by means of the following stripped-down version of the model studied by Müller and Elliott (2003):

$$y_t = \mu + z_t, \qquad z_t = \rho z_{t-1} + \varepsilon_t, \tag{7.31}$$

where $\varepsilon_t \sim$ i.i.d. $N(0,1)$ and $z_0 \sim N(0, \kappa\sigma_0^2(\rho))$, where $\kappa \ge 0$ is a known constant, $\sigma_0^2(\rho) := 1(|\rho| < 1)/(1-\rho^2)$, $1(\cdot)$ is the indicator function, and $z_0$ is independent of $\{\varepsilon_t\}$.
The model (7.31) reduces to the model of section 7.3.3 when $\kappa = 0$. When $\kappa = 1$, in contrast, $\{z_t\}$ is generated by a stationary AR(1) whenever $|\rho| < 1$ and the model reduces to a special case of the model studied by Elliott (1999). Irrespective of the value of $\kappa$, the initial condition is $z_0 = 0$ under the unit root hypothesis. Due to the inclusion of the constant term $\mu$, this assumption is simply a normalization.

The derivation of the asymptotic power envelope for the unit root testing problem proceeds as in section 7.3.3. First, the principle of invariance can be employed to remove the nuisance parameter $\mu$. Then, the Neyman–Pearson lemma can be used to characterize the functional form of point optimal invariant tests. The optimal invariant test against the local alternative $\rho = 1 + T^{-1}c$ rejects for large values of the profile log likelihood ratio $\max_\beta L_T^{\kappa}(1 + T^{-1}c, \beta) - \max_\beta L_T^{\kappa}(1, \beta)$, where $L_T^{\kappa}(\cdot)$ is the log likelihood function.

The functional form of the point optimal tests and the shape of the asymptotic power envelope both depend on the value of the constant $\kappa$. Moreover, although the tests discussed in section 7.3.3 (“nearly efficient” when $\kappa = 0$ in the model considered here) are asymptotically similar for any value of $\kappa$, these tests have power well below the power envelopes corresponding to moderately large values of $\kappa$ (Müller and Elliott, 2003).²⁰

Müller and Elliott (2003) emphasize an alternative interpretation of the power envelopes discussed in the previous paragraph. If the initial condition $z_0$ is treated as an unknown nuisance parameter (as opposed to a random variable with a known distribution), the unit root testing problem is complicated by the presence of an unidentified nuisance parameter under the null hypothesis, the parameters $\mu$ and $z_0$ appearing in the likelihood only through their sum.
Müller and Elliott (2003) deal with this problem by applying a weighted average power criterion in the spirit of Andrews and Ploberger (1994) when deriving asymptotic power envelopes. The weighting functions employed by Müller and Elliott (2003) correspond to the distributional assumption on $z_0$ made in (7.31) and give rise to the same asymptotic power envelopes.

In addition to deriving asymptotic power envelopes for the model (7.31), Müller and Elliott (2003) explore the extent to which existing unit root tests can be “rationalized” as being point optimal tests in model (7.31) for appropriately selected values of the constant $\kappa$ and the local-to-unity parameter $c$. They find that the tests proposed by Bhargava (1986) can be interpreted as (limiting versions of) point optimal tests in the model (7.31), as can the locally best invariant tests derived by Dufour and King (1991) and Nabeya and Tanaka (1990). Moreover, Müller and Elliott (2003) argue that, although the popular Dickey–Fuller (1979) tests cannot be “rationalized” in this way, there is a sense in which the Dickey–Fuller tests are well approximated by certain members of the class of point optimal tests in the model (7.31), albeit with rather large values of $\kappa$.

7.3.5 Non-Gaussian errors

All of the power envelopes discussed so far have been derived under the assumption that the latent errors $\{\varepsilon_t\}$ are (standard) normally distributed. Because the normality assumption is implausible in most empirical applications of unit root tests, it is of interest to develop asymptotic power envelopes for unit root tests in (possibly) non-Gaussian environments. Consider the model²¹

$$y_t = \rho y_{t-1} + \varepsilon_t, \tag{7.32}$$

where $y_0 = 0$ and $\{\varepsilon_t\}$ are i.i.d. errors from an unknown (possibly non-Gaussian) distribution with mean zero and variance one.

The derivation of the asymptotic power envelope in section 7.3.1 made use of two results, the Neyman–Pearson lemma and the fact that

$$(S_T - cH_T,\ H_T) \to_{d_c} \left(\int_0^1 W_c(r)\,dW(r),\ \int_0^1 W_c(r)^2\,dr\right), \qquad c \le 0.$$

The displayed convergence result, which was used to characterize the local asymptotic power of point optimal tests (derived by means of the Neyman–Pearson lemma), remains valid in the model (7.32). In other words, the limiting representation of $(S_T, H_T)$, the minimal sufficient statistic under the assumption of normality, is invariant with respect to the distribution of $\{\varepsilon_t\}$ as long as $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = 1$.²² By implication, the local asymptotic power function of the point optimal tests from section 7.3.1 does not depend on the distribution of $\{\varepsilon_t\}$ in the model (7.32). The Gaussian asymptotic power envelope of section 7.3.1 therefore gives a lower bound on maximal attainable local asymptotic power in the model (7.32).

An upper bound on the magnitude of the power gains available when the errors in the model (7.32) are non-Gaussian can be obtained by deriving the asymptotic power envelope under the (counterfactual) assumption that the underlying error distribution is known. Assuming the errors are generated by a continuous distribution with density $f(\cdot)$, it follows from the Neyman–Pearson lemma that the point optimal unit root test against the local alternative $\rho = 1 + T^{-1}c$ rejects for large values of the log likelihood ratio $L_T^f(1 + T^{-1}c) - L_T^f(1)$, where $L_T^f(\cdot)$ is the log likelihood function. Under appropriate smoothness conditions on $f(\cdot)$, the Neyman–Pearson test admits the following quadratic expansion under the unit root hypothesis:

$$L_T^f(1 + T^{-1}c) - L_T^f(1) = \sum_{t=2}^{T}\log f(\Delta y_t - cT^{-1}y_{t-1}) - \sum_{t=2}^{T}\log f(\Delta y_t) = cS_T^f - \tfrac{1}{2}c^2 H_T^f + o_p(1), \tag{7.33}$$

where $S_T^f = T^{-1}\sum_{t=2}^{T} y_{t-1}\ell_f(\Delta y_t)$, $H_T^f = \mathcal{I}_{ff}\,T^{-2}\sum_{t=2}^{T} y_{t-1}^2$, and $\ell_f(\cdot)$ is a function satisfying $E[\ell_f(\varepsilon_t)] = 0$, $E[\varepsilon_t\ell_f(\varepsilon_t)] = 1$, and $1 \le \mathcal{I}_{ff} = E[\ell_f(\varepsilon_t)^2] < \infty$.
As the notation suggests, the function $\ell_f(\cdot)$ can be interpreted as a score function and $\mathcal{I}_{ff}$ is the associated Fisher information for location.²³ Jeganathan (1995) gives (absolute continuity and moment) conditions on $f(\cdot)$ under which (7.33) holds with $\ell_f(\varepsilon) = \partial\log f(\varepsilon - \theta)/\partial\theta\,|_{\theta=0}$, while Jansson (2005) shows that differentiability in quadratic mean, an even weaker condition, is sufficient. By implication, the expansion (7.33) is valid for a wide range of error distributions.

Using (7.33) and the theory of limits of experiments (e.g., Le Cam and Yang, 2000; van der Vaart, 1998), it can be shown that an upper bound on the local asymptotic power of a unit root test (asymptotically of size $\alpha$) in the model (7.32) is given by the function (of $c \le 0$)

$$\lim_{T\to\infty}\Pr\nolimits_{1+T^{-1}c}\left[\mathcal{I}_{ff}^{-1}\left(cS_T^f - \tfrac{1}{2}c^2 H_T^f\right) > k_\infty^{\alpha,f}(c)\right] = \Pr\left[c\,\mathcal{I}_{ff}^{-1/2}\int_0^1 W_c(r)\,dB_f(r) + \tfrac{1}{2}c^2\int_0^1 W_c(r)^2\,dr > k_\infty^{\alpha,f}(c)\right], \tag{7.34}$$

where $k_\infty^{\alpha,f}(c)$ satisfies $\Pr\left[c\,\mathcal{I}_{ff}^{-1/2}\int_0^1 W(r)\,dB_f(r) - \tfrac{1}{2}c^2\int_0^1 W(r)^2\,dr > k_\infty^{\alpha,f}(c)\right] = \alpha$, $W$ and $B_f$ are correlated Wiener processes with coefficient of correlation $\mathcal{I}_{ff}^{-1/2}$, and $W_c$ satisfies the stochastic differential equation $dW_c(r) = cW_c(r)\,dr + dW(r)$ with initial condition $W_c(0) = 0$.

The upper bound (7.34) depends on the density $f(\cdot)$ through $\mathcal{I}_{ff}$, which equals one when the error $\varepsilon_t$ is perfectly correlated with $\ell_f(\varepsilon_t)$ and is strictly greater than one otherwise.²⁴ Rothenberg and Stock (1997) evaluated (7.34) for various values of $\mathcal{I}_{ff}$ and found large increases in power as $\mathcal{I}_{ff}$ increased. Although this result suggests that non-normality may be an important source of power in unit root testing, a potential problem with the result is that it is derived under the counterfactual assumption that $f(\cdot)$ is known. The upper bound (7.34) is attained by the test which rejects for large values of $cS_T^f - \tfrac{1}{2}c^2 H_T^f$.
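To give a sense of the magnitude of $\mathcal{I}_{ff}$, the Fisher information for location can be computed numerically for a given density. The sketch below is illustrative only: the function names, the integration grid, and the choice of a unit-variance Student t density as the example are assumptions made here, not part of the chapter. The score is taken to be $\ell_f(\varepsilon) = -\,d\log f(\varepsilon)/d\varepsilon$, which agrees with the definition $\partial\log f(\varepsilon - \theta)/\partial\theta\,|_{\theta=0}$ given above.

```python
import math

import numpy as np


def fisher_information_location(log_density, lo=-60.0, hi=60.0, n=400001):
    """Approximate I_ff = E[l_f(e)^2] by a trapezoid rule on [lo, hi].

    The score l_f(e) = -d/de log f(e) is obtained by central differences.
    """
    x = np.linspace(lo, hi, n)
    h = 1e-5
    score = -(log_density(x + h) - log_density(x - h)) / (2.0 * h)
    integrand = score**2 * np.exp(log_density(x))
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))


def log_normal(x):
    """Log density of the standard normal distribution."""
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * x**2


def log_student_unit_variance(nu):
    """Log density of a Student t with nu > 2 degrees of freedom,
    rescaled so that its variance equals one."""
    s2 = (nu - 2.0) / nu  # squared scale giving unit variance
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(nu * math.pi * s2))

    def logf(x):
        return const - 0.5 * (nu + 1.0) * np.log1p(x**2 / (nu * s2))

    return logf


info_normal = fisher_information_location(log_normal)
info_t5 = fisher_information_location(log_student_unit_variance(5.0))
corr_t5 = info_t5 ** -0.5  # correlation between W and B_f implied by I_ff
```

In the Gaussian case the computed value should be close to one, consistent with the statement that $\mathcal{I}_{ff} = 1$ when $\varepsilon_t$ is perfectly correlated with $\ell_f(\varepsilon_t)$; for heavier-tailed unit-variance densities the value exceeds one.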
Following Jansson (2005), adaptation is said to be possible if there exists a pair $(\hat S_T^f, \hat H_T^f)$, computable without knowledge of $f(\cdot)$, such that $(\hat S_T^f, \hat H_T^f) = (S_T^f, H_T^f) + o_p(1)$ under the unit root hypothesis. (If adaptation is possible, then the power envelope (7.34) is attained by the test which rejects for large values of $c\hat S_T^f - \tfrac{1}{2}c^2\hat H_T^f$.) Jansson (2005) shows that adaptation is possible when $f(\cdot)$ is known to be symmetric, but not in general. Section 7.3.6, the final subsection of this section, discusses a source of nontrivial power gains which is available in many cases.

7.3.6 Covariates

In most applications of unit root tests, the series $\{y_t\}$ being tested for a unit root is not observed in isolation. Instead, one typically observes at least one time series, say $\{x_t\}$, in addition to the time series $\{y_t\}$ of interest. As observed by Hansen (1995), the additional time series $\{x_t\}$ contains exploitable information about $\{y_t\}$ whenever its order of integration is known.²⁵

As in section 7.3.1, suppose $\{y_t\}$ is generated by the model

$$y_t = \rho y_{t-1} + \varepsilon_t, \tag{7.35}$$

where $y_0 = 0$ and $\varepsilon_t \sim$ i.i.d. $N(0,1)$. To accommodate a covariate with a known order of integration, suppose an additional time series $\{x_t : 1 \le t \le T\}$ is observed whose generating mechanism is

$$\begin{pmatrix}\varepsilon_t \\ x_t\end{pmatrix} \sim \text{i.i.d. } N\left[\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}1 & \delta \\ \delta & 1\end{pmatrix}\right], \tag{7.36}$$

where $\delta$ is known (and satisfies $|\delta| < 1$). The log likelihood function $L_T^{\delta}(\cdot)$ associated with the model (7.35)–(7.36) satisfies the relation

$$L_T^{\delta}(1 + T^{-1}c) - L_T^{\delta}(1) = cS_T^{\delta} - \tfrac{1}{2}c^2 H_T^{\delta}, \tag{7.37}$$

where $S_T^{\delta} = T^{-1}\sum_{t=2}^{T} y_{t-1}(\Delta y_t - \delta x_t)$ and $H_T^{\delta} = T^{-2}\sum_{t=2}^{T} y_{t-1}^2$. By the Neyman–Pearson lemma, the test which rejects for large values of $cS_T^{\delta} - \tfrac{1}{2}c^2 H_T^{\delta}$ is the point optimal unit root test against the local alternative $\rho = 1 + T^{-1}c$. Unless $\{x_t\}$ is independent of $\{y_t\}$ (in which case $\delta$ equals zero), this point optimal test makes use of the information in $\{x_t\}$.
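The covariate-adjusted statistics $S_T^{\delta}$ and $H_T^{\delta}$ defined above are simple to compute. The following sketch simulates data from (7.35)–(7.36) under the unit root hypothesis and evaluates the point optimal statistic; the function names and simulation settings are hypothetical choices made for illustration, and $\delta$ is taken as known, as in the text.

```python
import numpy as np


def simulate_null_with_covariate(T, delta, rng):
    """Simulate (y, x) from (7.35)-(7.36) with rho = 1 and y_0 = 0.

    e_t and x_t have unit variances and correlation delta.
    """
    x = rng.standard_normal(T)
    eta = rng.standard_normal(T)
    e = delta * x + np.sqrt(1.0 - delta**2) * eta
    return np.cumsum(e), x


def covariate_statistics(y, x, delta):
    """Return (S_T^delta, H_T^delta), with sums running over t = 2..T."""
    T = len(y)
    lag = y[:-1]                            # y_{t-1}
    adjusted = np.diff(y) - delta * x[1:]   # Delta y_t - delta * x_t
    return lag @ adjusted / T, lag @ lag / T**2


def point_optimal_statistic(y, x, delta, c):
    """c*S_T^delta - 0.5*c^2*H_T^delta; large values are evidence against rho = 1."""
    S, H = covariate_statistics(y, x, delta)
    return c * S - 0.5 * c**2 * H


rng = np.random.default_rng(2)
y, x = simulate_null_with_covariate(20000, 0.8, rng)
stat = point_optimal_statistic(y, x, 0.8, -7.0)
adjusted_variance = float(np.var(np.diff(y) - 0.8 * x[1:]))
```

Under the null hypothesis the adjusted increment $\Delta y_t - \delta x_t$ has variance $1 - \delta^2$ rather than one, which is the sense in which a covariate with $\delta \neq 0$ carries exploitable information.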
By implication, the stationary covariate $\{x_t\}$ contains exploitable information about $\rho$ unless it is independent of $\{y_t\}$.

The magnitude of the gains in local asymptotic power achievable by exploiting the information in the covariate $\{x_t\}$ can be evaluated by deriving the asymptotic power envelope for the model (7.35)–(7.36). The asymptotic power envelope (for unit root tests asymptotically of size $\alpha$) is given by the function

$$\lim_{T\to\infty}\Pr\nolimits_{1+T^{-1}c}\left[cS_T^{\delta} - \tfrac{1}{2}c^2 H_T^{\delta} > k_\infty^{\alpha,\delta}(c)\right] = \Pr\left[c\sqrt{1-\delta^2}\int_0^1 W_c(r)\,dV(r) + \tfrac{1}{2}c^2\int_0^1 W_c(r)^2\,dr > k_\infty^{\alpha,\delta}(c)\right], \tag{7.38}$$

where $k_\infty^{\alpha,\delta}(c)$ satisfies $\Pr\left[c\sqrt{1-\delta^2}\int_0^1 W(r)\,dV(r) - \tfrac{1}{2}c^2\int_0^1 W(r)^2\,dr > k_\infty^{\alpha,\delta}(c)\right] = \alpha$, $W$ and $V$ are correlated Wiener processes with coefficient of correlation $\sqrt{1-\delta^2}$, and $W_c$ satisfies the stochastic differential equation $dW_c(r) = cW_c(r)\,dr + dW(r)$ with initial condition $W_c(0) = 0$.

The functional form of the power envelope (7.38) is exactly the same as the functional form of the (infeasible) power envelope (7.34) associated with (known) non-Gaussian error distributions, the quantity $\sqrt{1-\delta^2}$ playing the same role here as $\mathcal{I}_{ff}^{-1/2}$ did there. As a consequence, appreciable power gains are available whenever “good” covariates (having $\delta^2$ moderately large) can be found.

The asymptotic power envelope (7.38) was derived by Hansen (1995). In addition to (implicitly) proposing a family of point optimal tests, Hansen (1995) proposed a regression-based unit root test, the covariate augmented Dickey–Fuller (CADF) test. The CADF test is “nearly efficient” in the model (7.35)–(7.36). If the model is extended to include deterministic components, however, the local asymptotic power of the CADF test is well below the asymptotic power envelope for invariant tests (Elliott and Jansson, 2003).
In contrast, the point optimal tests of Elliott and Jansson (2003) are “nearly efficient” in the presence of deterministic components.

7.4 Conclusion

This chapter has reviewed recent advances in the literature on unit root testing, emphasizing developments aimed at reducing size distortions and/or boosting the power of unit root tests. As should be apparent from the discussion herein, significant advances in both directions have been made during the decade since the publication of Stock (1994). The literature now seems to have reached a relatively mature state and it is difficult to predict if any major advances will occur over the next decade. Nevertheless, it seems worth pointing out two potential shortcomings of the existing body of knowledge.

In relation to the first of the two main themes of the present survey, size distortions, it remains an open question whether the use of refined asymptotic approximations can enhance the theoretical understanding of the properties of the bootstrap and/or guide the choice among asymptotically equivalent testing procedures. On the power front, it would appear to be useful to further investigate the extent to which non-Gaussianity and/or extraneous information (other than information about the integration properties of observed covariates, as in section 7.3.6) can be an exploitable source of power in unit root testing applications.

Notes

1. Under the stated assumptions regarding $u_t$, the long-run variance $\omega^2$ equals $2\pi f_u(0)$, where $f_u(\cdot)$ is the spectral density of $u_t$.
2. As shown by Chang and Park (2002), the ARMA($p$, $q$) assumption on $u_t$ can actually be replaced by the weaker assumption that $\sum_{j=0}^{\infty}\psi_j z^j \ne 0$ for every $|z| \le 1$ (and $\sum_{j=0}^{\infty} j|\psi_j| < \infty$).
3. A similarly modified form of the BIC is $\mathrm{MBIC}(k) = \log\tilde\sigma_k^2 + \log(T - k_{\max})(\tau_T(k) + k)/(T - k_{\max})$.
4. When $\{y_t\}$ is generated by (7.4), the estimator $\hat\omega^2_{KER}$ is consistent if $M_T^{-1} + T^{-1/2}M_T = o(1)$ and $\omega(\cdot)$ satisfies the conditions of Jansson (2002).
5. Also, Kim and Schmidt (1990) demonstrate, by means of Monte Carlo simulations, that a range of different kernel estimators and different ways of selecting the bandwidth parameter seem to deliver unit root tests with fairly similar finite sample properties.
6. Haldrup, Montanés, and Sanso (2005) show that the presence of additive and other types of outliers (as well as measurement errors) has implications for the (moving average) serial correlation structure of the data, so Vogelsang’s (1996) results are consistent with Perron and Ng’s (1996) Monte Carlo results on the behavior of the M-tests in the presence of MA errors.
7. The class of linear-in-parameters specifications also includes structural break models with a known break date. In contrast, structural break models with an unknown break date do not belong to the class of linear-in-parameters specifications. For a survey of structural break models, see Perron (2005).
8. The following discussion draws on Stock (1994).
9. More generally, the power envelope for a class $\mathcal{F}_T$ of test functions is given by $\sup_{\phi_T\in\mathcal{F}_T} E_\rho\,\phi_T(Y_T)$. In (7.23), “sup” has been replaced with “max” in recognition of the fact that the sup is attained.
10. It follows from (7.24) and the properties of exponential families (e.g., Lehmann, 1994) that the model admits a two-dimensional minimal sufficient statistic whose distribution belongs to a curved exponential family (Efron, 1975). Therefore, the functional form of a most powerful test of a simple null against a simple alternative in the known-variance, zero-mean Gaussian AR(1) model depends on the alternative, implying that one-sided testing problems do not admit UMP size $\alpha$ tests. The limiting experiment associated with a simple null hypothesis on $\rho$ with $|\rho| < 1$ corresponds to a full exponential family model (the log likelihood ratios are locally asymptotically normal (Le Cam, 1960)) and therefore admits a UMP size $\alpha$ test. In contrast, the limiting experiment associated with the unit root hypothesis corresponds to a curved exponential family model (the log likelihood ratios are locally asymptotically quadratic (Jeganathan, 1995), but not locally asymptotically (mixed) normal) and does not admit a UMP size $\alpha$ test.
11. For details on local asymptotic power and local-to-unity asymptotics, see Stock (1994) and the references therein [e.g., Chan and Wei (1987) and Phillips (1987b)].
12. A proof of (7.27) can be based on the inequality $E_{1+T^{-1}c}\,\phi_T(Y_T) \le \Pr_{1+T^{-1}c}\left[cS_T - \tfrac{1}{2}c^2 H_T > k_T^{\alpha_T}(1 + T^{-1}c)\right]$, $\alpha_T = E_1\,\phi_T(Y_T)$, and the fact that $(S_T - cH_T, H_T) \to_{d_c} \left(\int_0^1 W_c(r)\,dW(r), \int_0^1 W_c(r)^2\,dr\right)$, where $\to_{d_c}$ signifies convergence in distribution when $\rho = 1 + T^{-1}c$.
13. Local optimality of $S_T$ follows from a Neyman–Pearson argument and the fact that $\frac{d}{dc}E_{1+T^{-1}c}\,\phi_T(Y_T)\big|_{c=0} = T^{-1}\frac{d}{d\rho}E_\rho\,\phi_T(Y_T)\big|_{\rho=1} = E_1[S_T\,\phi_T(Y_T)]$ for any test function $\phi_T(\cdot)$.
14. Following Davies (1969), point optimal testing procedures are sometimes referred to as beta-optimal testing procedures.
15. The (signed) likelihood ratio test rejects for large values of $\max_{\rho\le 1} L_T(\rho) - L_T(1) = \min(S_T, 0)^2/(2H_T)$, a decreasing function of $S_T/\sqrt{H_T}$.
16. In fact, it appears to be unknown if the Dickey–Fuller (1979) t-test is even admissible, in the sense that there does not exist a unit root test with uniformly superior local asymptotic power. Studying a general class of models with locally asymptotically quadratic log likelihood ratios, Ploberger (2004) gives a complete class result for a two-sided testing problem and shows that the likelihood ratio test is not a member of his (essentially) complete class of tests.
17. Elliott, Rothenberg, and Stock (1992) obtained the expansion (7.29) under the assumption that $\{y_t\}$ is generated by a Gaussian AR model of finite order. Using ingenious arguments, Elliott, Rothenberg, and Stock (1996) established (7.29) for model (7.28) under the additional assumption that $\sum_{j=1}^{\infty} j|\psi_j| < \infty$. The latter assumption can be removed by using a slightly modified version of the proof employed by Elliott, Rothenberg, and Stock (1996).
18. The existence of such estimators follows from Jansson (2002), who shows that standard kernel estimators of $\omega^2$ [e.g., Newey and West (1987), Andrews (1991)] are consistent under the assumptions of this subsection.
19. A formal proof of this claim can be based on Lehmann (1994, Theorem 6.3) and the fact that $\rho$ is a maximal invariant under the group of transformations of the form $\bar g_b(\rho, \beta)$, where $b \in \mathbb{R}^k$.
20. Asymptotic similarity of the tests discussed in section 7.3.3 follows from the fact that the null distribution of $\{y_t\}$ does not depend on $\kappa$. The same fact implies that the test which rejects for large values of $\max_\beta L_T^{\kappa}(1 + T^{-1}c, \beta) - \max_\beta L_T^{\kappa}(1, \beta)$ is a point optimal unit root test even if $\kappa$ is treated as an unknown nuisance parameter. By implication, the power envelope for the model in which $\kappa$ is treated as an unknown nuisance parameter coincides with the family (indexed by $\kappa$) of power envelopes derived under the assumption that $\kappa$ is a known constant.
21. The following discussion draws on Jansson (2005).
22. This invariance result follows from Donsker’s theorem (e.g., Billingsley, 1999) and the continuous mapping theorem.
23. Indeed, $\ell_f(\cdot)$ is the score function, evaluated at $\theta = 0$, of the location model $X_i = \theta + \varepsilon_i$, where the errors $\{\varepsilon_i\}$ are i.i.d. with density function $f(\cdot)$.
24. The correlation between $\varepsilon_t$ and $\ell_f(\varepsilon_t)$ is unity when the underlying distribution is Gaussian, the score function of the standard normal location model being $\ell_f(\varepsilon) = \varepsilon$.
25. An important example of a unit root testing problem in which a covariate with a known order of integration is observed is the problem of testing for absence of cointegration when the (potentially) cointegrating vector is prespecified (Elliott, Jansson, and Pesavento, 2005; Zivot, 2000).

References

Agiakloglou, C. and P. Newbold (1992) Empirical evidence on Dickey–Fuller type tests. Journal of Time Series Analysis 13, 471–83.
Andrews, D.W.K. (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–58.
Andrews, D.W.K. and W. Ploberger (1994) Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62, 1383–1414.
Baillie, R. (1996) Long memory processes and fractional integration in econometrics. Journal of Econometrics 73, 6–59.
Berk, K.N. (1974) Consistent autoregressive spectral estimates. Annals of Statistics 2, 489–502.
Bhargava, A. (1986) On the theory of testing for unit roots in observed time series. Review of Economic Studies 53, 369–84.
Billingsley, P. (1999) Convergence of Probability Measures, 2nd edn. New York: Wiley.
Chan, N.H. and C.Z. Wei (1987) Asymptotic inference for nearly nonstationary AR(1) processes. Annals of Statistics 15, 1050–1063.
Chang, Y. and J.Y. Park (2002) On the asymptotics of ADF tests for unit roots. Econometric Reviews 21, 431–47.
Choi, I. (2005) Nonstationary panels. Palgrave Handbooks of Econometrics: Vol. 1, forthcoming.
Davidson, R. and J. MacKinnon (2005) Bootstrap methods in econometrics. Palgrave Handbooks of Econometrics: Vol. 1, forthcoming.
Davies, R.B. (1969) Beta-optimal tests and an application to the summary evaluation of experiments. Journal of the Royal Statistical Society, Series B 31, 524–38.
DeJong, D., J. Nankervis, N.
Savin and C. Whiteman (1992a) Integration versus trendstationarity in macroeconomic time series. Econometrica 60, 423–34. DeJong, D., J. Nankervis, N. Savin and C. Whiteman (1992b) The power problems of unit root tests for time series with autoregressive errors. Journal of Econometrics 53, 323–43. Dickey, D.A. and W.A. Fuller (1979) Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–31. Dickey, D.A. and S.G. Pantula (1987) Determining the order of differencing in autoregressive processes. Journal of Business and Economic Statistics 5, 455–61. Dufour, J.-M. and M.L. King (1991) Optimal invariant tests for the autocorrelation coefficient in linear regressions with stationary or nonstationary AR(1) errors. Journal of Econometrics 47, 115–43. Efron, B. (1975) Defining the curvature of a statistical problem (with applications to second order efficiency). Annals of Statistics 3, 1189–1242. Elliott, G. (1999) Efficient tests for a unit root when the initial observation is drawn from its unconditional distribution. International Economic Review 40, 767–83. Elliott, G. and M. Jansson (2003) Testing for unit roots with stationary covariates. Journal of Econometrics 115, 75–89. Elliott, G., M. Jansson and E. Pesavento (2005) Optimal power for testing potential cointegrating vectors with known parameters for nonstationarity. Journal of Business and Economic Statistics 23, 34–48. Elliott, G., T.J. Rothenberg and J.H. Stock (1992) Efficient tests for an autoregressive unit root. NBER Technical Working Paper No. 130. Elliott, G., T.J. Rothenberg and J.H. Stock (1996) Efficient tests for an autoregressive unit root. Econometrica 64, 813–36. Engle, R.F. and C.W.J. Granger (1987) Cointegration and error correction: representation, estimation, and testing. Econometrica 55, 251–76. Franses, P. (1996) Periodicity and Stochastic Trends in Economic Time Series. Oxford: Oxford University Press. 
Franses, P.H. and N. Haldrup (1994) The effects of additive outliers on tests for unit roots and cointegration. Journal of Business and Economic Statistics 12, 471–8. Ghysels, E. and D. Osborn (2001) The Econometric Analysis of Seasonal Time Series. Cambridge, UK: Cambridge University Press. Granger, C.W.J. (1981) Some properties of time series data and their use in econometric model specification. Journal of Econometrics 16, 121–30. Granger, C.W.J. and R. Joyeux (1980) An introduction to long memory time series models and fractional differencing. Journal of Time Series Analysis 1, 15–29. Haldrup, N. (1998) An econometric analysis of I(2) variables. Journal of Economic Surveys 12, 595–650. Haldrup, N., A. Montanés and A. Sanso (2005) Measurement errors and outliers in seasonal unit root testing. Journal of Econometrics 127, 103–28. Please update. 276 Improving Size and Power in Unit Root Testing Hansen, B.E. (1995) Rethinking the univariate approach to unit root testing: using covariates to increase power. Econometric Theory 11, 1148–1171. Hylleberg, S., R.F. Engle, C.W.J. Granger and B. Yoo (1990) Seasonal integration and cointegration. Journal of Econometrics 44, 215–38. Im, K.S., M.H. Pesaran and Y. Shin (2003) Testing for unit roots in heterogeneous panels. Journal of Econometrics 115, 53–74. Jansson, M. (2002) Consistent covariance matrix estimation for linear processes. Econometric Theory 18, 1449–1459. Jansson, M. (2004) Stationarity testing with covariates. Econometric Theory 20, 56–94. Jansson, M. (2005) Semiparametric power envelopes for tests of the unit root hypothesis. Manuscript, UC Berkeley. Jeganathan, P. (1995) Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11, 818–87. Kim, K. and P. Schmidt (1990) Some evidence on the accuracy of Phillips-Perron tests using alternative estimates of nuisance parameters. Economics Letters 34, 345–50. King, M.L. 
(1980) Robust tests for spherical symmetry and their application to least squares regression. Annals of Statistics 8, 1265–1271. King, M.L. (1988) Towards a theory of point optimal testing. Econometric Reviews 6, 169–218. King, M.L. and G.H. Hillier (1985) Locally best invariant tests of the error covariance matrix of the linear regression model. Journal of the Royal Statistical Society, Series B, 47, 98–102. Kwiatkowski D., P.C.B. Phillips, P. Schmidt and Y. Shin (1992) Testing the null hypothesis of stationarity against the alternative of a unit Root: how sure are we that economic time series have a unit root? Journal of Econometrics 54, 159–78. Le Cam, L. (1960) Locally asymptotically normal families of distributions. University of California Publications in Statistics 3, 37–98. Le Cam, L. and G.L. Yang (2000) Asymptotics in Statistics: Some Basic Concepts. 2nd edn. New York: Springer-Verlag. Lehmann, E.L. (1994) Testing Statistical Hypotheses. 2nd edn. New York: Chapman and Hall. Levin, A., C.-F. Lin and C.-S.J. Chu (2002) Unit root tests in panel data: asymptotic and finitesample properties. Journal of Econometrics 108, 1–25. Maddala, G. and I.M. Kim (1998) Unit Roots, Cointegration and Structural Change. Cambridge: Cambridge University Press. Müller, U.K. and G. Elliott (2003) Tests for unit root and the initial condition. Econometrica 71, 1269–1286. Nabeya, S. and K. Tanaka (1990) Limiting power of unit-root tests in time-series regression. Journal of Econometrics 46, 247–71. Newey, W.K. and K.D. West (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–8. Ng, S. and P. Perron (1995) Unit root tests is ARMA models with data-dependent methods for the selection of the truncation lag. Journal of the American Statistical Association 90, 268–81. Ng, S. and P. Perron (2001) Lag length selection and the construction of unit root tests with good size and power. Econometrica 69, 1519–1554. 
Paparoditis, E. and D.N. Politis (2003) Residual-based block bootstrap for unit root testing. Econometrica 71, 813–55. Park, J.Y. (2002) An invariance principle for sieve bootstrap in time series. Econometric Theory 18, 469–90. Park, J.Y. (2003) Bootstrap unit root tests. Econometrica 71, 1845–1895. Perron, P. (1989) The great crash, the oil price shock and the unit root Hypothesis. Econometrica 57, 1361–1401. Perron, P. (2005) ‘‘Dealing with structural breaks.’’ Palgrave Handbooks of Econometrics: vol. 1, forthcoming. Perron P. and S. Ng (1996) Useful modifications to unit root tests with dependent errors and their local asymptotic properties. Review of Economic Studies 63, 435–65. Niels Haldrup and Michael Jansson 277 Perron, P. and S. Ng (1998) An autoregressive spectral density estimator at frequency zero for nonstationarity tests. Econometric Theory 14, 560–603. Phillips, P.C.B. (1987a): Time series regression with a unit root. Econometrica 55, 277–301. Phillips, P.C.B. (1987b): Towards a unified asymptotic theory for autoregression. Biometrika 74, 535–47. Phillips, P.C.B. and S. Ouliaris (1990) Asymptotic properties of residual based tests for cointegration. Econometrica 58, 165–93. Phillips, P.C.B. and P. Perron (1988) Testing for a unit root in time series regression. Biometrika 75, 335–46. Phillips, P.C.B. and Z. Xiao (1998) A primer on unit root testing. Journal of Economic Surveys 12, 423–69. Ploberger, W. (2004) A complete class of tests when the likelihood is locally asymptotically quadratic. Journal of Econometrics 118, 67–94. Rothenberg, T.J. and J.H. Stock (1997) Inference in a Nearly Integrated Autoregressive Model with Nonnormal Innovations. Journal of Econometrics 80, 269–86. Said, S.E. and D.A. Dickey (1984) Testing for unit roots in autoregressive-moving average models of unknown order. Biometrika 71, 599–607. Saikkonen, P. and R. Luukkonen (1993a): Point optimal tests for testing the order of differencing in ARIMA models. 
Econometric Theory 9, 343–62. Saikkonen, P. and R. Luukkonen (1993b): Testing for a moving average unit root in autoregressive integrated moving average models. Journal of the American Statistical Association 88, 596–601. Sargan, J. and A. Bhargava (1983) Testing for residuals from least squares regression being generated by Gaussian random walk. Econometrica 51, 153–74. Schwert, G.W. (1989) Test for unit roots: a Monte Carlo investigation. Journal of Business and Economic Statistics, 7, 147–60. Stock, J.H. (1994) Unit roots, structural breaks and trends. In R.F. Engle and D.L. McFadden (eds), Handbook of Econometrics, Volume IV. New York: North Holland, pp. 2739–2841. Stock, J.H. (1999) A class of tests for integration and cointegration. In Cointegration, Causality, and Forecasting: A Festschrift for Clive W.J. Granger. Oxford: Oxford University Press, pp. 135–67. van der Vaart, A.W. (1998) Asymptotic Statistics. Cambridge: Cambridge University Press. Velasco, C. (2005) Semi-Parametric Estimation of Long Memory Models, Palgrave Handbooks of Econometrics: Vol. 1, forthcoming. Vogelsang, T.J. (1999) Two simple procedures for testing for a unit root when there are additive outliers. Journal of Time Series Analysis 20, 237–52. White, J.S. (1958) The limiting distribution of the serial correlation coefficient in the explosive case. Annals of Mathematical Statistics 29, 1188–1197. Zivot, E. (2000) The power of single equation tests for cointegration when the cointegrating vector is prespecified. Econometric Theory 16, 407–39.