
An Improved Nonparametric Unit-Root Test


Jiti Gao and Maxwell King
Department of Econometrics and Business Statistics
Monash University, Melbourne, Australia

ISSN 1440-771X. Working Paper 16/12, August 2012.
http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/

Abstract: This paper proposes a simple and improved nonparametric unit-root test. An asymptotic distribution of the proposed test is established, finite sample comparisons with an existing nonparametric test are discussed, and some possible extensions are outlined.

Key words: Autoregression, nonparametric unit-root test, nonstationary time series, specification testing.

JEL Classification: C12, C14, C22.

1 Introduction

While there is a long literature on parametric unit-root testing, the use of non- and semi-parametric tests for unit-root specification in nonlinear time series models has attracted little attention. To the best of our knowledge, the only published work available to us is Gao et al (2009a); a follow-up extension is given in Gao and King (2011). As discussed in both papers, the main advantage of estimating and testing the mean function simultaneously is that one avoids the possible mis-specification that arises from pre-specifying a parametric linear model before testing a linear unit-root hypothesis. As the finite sample studies in both papers show, a nonparametric test remains directly applicable when the mean function involves unknown nonlinearity and nonstationarity; in such cases, existing parametric tests designed for the linear unit-root case are not valid. This is the main motivation for addressing nonparametric unit-root testing issues here.

In the related literature on nonparametric specification for nonstationary time series, tests of the same type have been proposed and studied in Gao et al (2009b) and Wang and Phillips (2012) for parametric model specification in nonlinear co-integrating regression models. Other related estimation and testing issues are discussed in two recent survey papers by Gao (2012) and Sun and Li (2012).

As may be seen from the relevant literature, existing test statistics are all of the same type: standardised quadratic forms of nonstationary time series. While such statistics converge in distribution to the standard normal law, both the establishment and the implementation of an asymptotically normal test of this kind involve unnecessary complexities and technicalities, particularly when an autoregressive structure is involved, as in Gao et al (2009a) and Gao and King (2011). To address both the theoretical and the computational issues, this paper develops a simple and improved nonparametric test. As shown in Section 2 below, the limiting distribution of the proposed test is a functional of standard Brownian motion, and its proof is quite concise. This is natural, since existing parametric unit-root test statistics also have functionals of standard Brownian motion as their limiting distributions. The finite-sample performance of the proposed test is then compared with that of its natural competitor proposed in Gao et al (2009a).
Our conclusion is that the proposed test is easy to implement and is more powerful than its natural competitor. In addition, the proposed test is able to detect a nonlinear unit-root structure against a sequence of asymptotically local alternatives.

The organisation of the paper is as follows. Section 2 establishes the proposed test and develops its asymptotic theory. Section 3 discusses possible extensions of the proposed test from the univariate case to a multivariate case. Simulated examples are used in Section 4 to evaluate the finite-sample performance of the proposed test. Some concluding remarks are given in Section 5. Mathematical assumptions and proofs are given in Appendices A and B.

2 Test statistic and theory

2.1 Asymptotic theory

Consider a nonlinear time series model of the form
$$y_t = m(x_t) + e_t, \quad t = 1, 2, \ldots, n, \quad x_t = y_{t-1} \ \text{and} \ E[e_t \mid x_t] = 0, \qquad (2.1)$$
where $y_0 = 0$, $m(\cdot)$ is an unknown function defined on $R^1 = (-\infty, \infty)$, and $\{e_t\}$ is a sequence of martingale differences satisfying Assumption A.1 listed in Appendix A below. Our interest in this paper is to test
$$H_0: P(m(x_t) = x_t) = 1 \quad \text{versus} \quad H_1: P(m(x_t) = x_t + \Delta_n(x_t)) = 1, \qquad (2.2)$$
where $\Delta_n(x)$ is a local 'departure' function such that $\min_{n \ge 1} \inf_{x \in R^1} |\Delta_n(x)| > 0$. In other words, we are interested only in a kind of local departure from the null hypothesis, because of the explosive nature of the integrated structure of $\{x_t\}$. When $m(x)$ is parametrically specified as $m(x) = \theta x$, the literature focuses on testing $H_0: \theta = 1$.

Before constructing our test, we estimate $m(\cdot)$ by minimising
$$\frac{1}{n} \sum_{t=1}^{n} (y_t - \beta)^2 K\!\left(\frac{x_t - x}{h}\right) \qquad (2.3)$$
over $\beta = m(x)$, where $K(\cdot)$ is a probability kernel function and $h$ is a bandwidth parameter. The function $m(x)$ is then estimated by
$$\hat{m}(x) = \frac{\sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right) y_t}{\sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right)}. \qquad (2.4)$$

To test $H_0$, the main idea is to compare $m(x)$ and $x$ through some distance function, such as
$$L_2(m) = \int_{-\infty}^{\infty} (m(x) - x)^2\, \pi(x)\,dx, \qquad (2.5)$$
where $\pi(x)$ is a known probability weight function satisfying $0 < \int_{-\infty}^{\infty} \pi^2(x)\,dx < \infty$. Recall that $m(x)$ is estimated by $\hat{m}(x)$ in (2.4). In order to construct our test, we introduce a smoothed version of $x$ of the form
$$\tilde{m}(x) = \frac{\sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right) x_t}{\sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right)}. \qquad (2.7)$$
We then define the following quantities:
$$\hat{p}(x) = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right), \quad \hat{q}(x) = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right) y_t, \quad \tilde{q}(x) = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} K\!\left(\frac{x_t - x}{h}\right) x_t. \qquad (2.8)$$

This paper proposes a test statistic of the form
$$L_n(h) \equiv \sqrt{nh^2} \int_{-\infty}^{\infty} (\hat{q}(x) - \tilde{q}(x))^2\, \pi(x)\,dx = \sqrt{nh^2} \int_{-\infty}^{\infty} (\hat{m}(x) - \tilde{m}(x))^2\, \hat{p}^2(x)\, \pi(x)\,dx, \qquad (2.9)$$
which is in the spirit of the original proposal of Härdle and Mammen (1993) for the independent-sample case. As shown in Appendix A below, we have as $n \to \infty$
$$L_n(h) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \hat{e}_t^2\, \pi(x_t) \int_{-\infty}^{\infty} K^2(u)\,du + \frac{2}{\sqrt{n}} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \hat{e}_t \hat{e}_s\, \pi(x_s)\, L\!\left(\frac{x_t - x_s}{h}\right) + o_P(1) \equiv \tilde{L}_n(h) + o_P(1), \qquad (2.10)$$
where $\hat{e}_t = y_t - y_{t-1}$ under $H_0$ and $L(u) = \int_{-\infty}^{\infty} K(v) K(u + v)\,dv$, and
$$\tilde{L}_n(h) = \frac{\sigma^2(K)}{\sqrt{n}} \sum_{t=1}^{n} \hat{e}_t^2\, \pi(x_t) + \frac{2}{\sqrt{n}} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \hat{e}_t \hat{e}_s\, \pi(x_s)\, L\!\left(\frac{x_t - x_s}{h}\right) \equiv S_{1n} + S_{2n}, \qquad (2.11)$$
where $\sigma^2(K) = \int_{-\infty}^{\infty} K^2(u)\,du$. Since $L_n(h)$ involves an integral over $R^1 = (-\infty, \infty)$, it is not computationally convenient; we therefore use $\tilde{L}_n(h)$ in both the simulated and real-data examples.
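For concreteness, $\tilde{L}_n(h)$ in (2.11) can be computed in $O(n^2)$ operations. The following is a minimal sketch, assuming the Gaussian kernel and the Cauchy weight $\pi(x) = 1/(\pi(1+x^2))$ that are used later in Section 4; the function name and interface are illustrative only, not the authors' code.

```python
import numpy as np

SQRT_PI = np.sqrt(np.pi)

def L_tilde_n(y, h):
    """Approximate statistic tilde-L_n(h) in (2.11), a sketch assuming the
    Gaussian kernel K and the Cauchy weight pi(x) = 1/(pi*(1+x^2)).
    For this kernel, sigma^2(K) = 1/(2*sqrt(pi)) and
    L(u) = exp(-u^2/4)/(2*sqrt(pi))."""
    n = len(y)
    x = np.concatenate(([0.0], y[:-1]))        # x_t = y_{t-1}, with y_0 = 0
    e_hat = y - x                              # residuals under H0
    pi_x = 1.0 / (np.pi * (1.0 + x ** 2))      # weight pi(x_t)
    # S_1n: (sigma^2(K)/sqrt(n)) * sum_t e_t^2 * pi(x_t)
    s1 = np.sum(e_hat ** 2 * pi_x) / (2.0 * SQRT_PI * np.sqrt(n))
    # S_2n: (2/sqrt(n)) * sum_{s<t} e_t e_s pi(x_s) L((x_t - x_s)/h)
    s2 = 0.0
    for t in range(1, n):
        u = (x[t] - x[:t]) / h
        L_u = np.exp(-u ** 2 / 4.0) / (2.0 * SQRT_PI)
        s2 += e_hat[t] * np.sum(e_hat[:t] * pi_x[:t] * L_u)
    return s1 + 2.0 * s2 / np.sqrt(n)
```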
As also shown in Appendix A below, $S_{1n}$ converges in distribution to a non-degenerate random variable, while $S_{2n}$ converges to zero in probability. Mainly because of these facts, there is no need to standardise $L_n(h)$ in order to establish asymptotic normality as the limiting distribution of a standardised version of $L_n(h)$. In the case where $\{x_t\}$ is stationary, however, this kind of standardisation is needed, because
$$\frac{1}{nh} \sum_{t=1}^{n} \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx\; e_t^2 = \frac{1}{n} \sum_{t=1}^{n} \pi(x_t)\, e_t^2 \int_{-\infty}^{\infty} K^2(u)\,du + o_P(1) \to_P C_2(K, \pi, \sigma_e^2), \qquad (2.12)$$
where $C_2(K, \pi, \sigma_e^2) = \sigma_e^2 \cdot E[\pi(x_1)] \cdot \int_{-\infty}^{\infty} K^2(u)\,du$ is a non-random quantity.

We now state the main theorem of this paper; its proof is given in Appendix A.

Theorem 2.1. Suppose that Assumptions A.1 and A.2 listed in Appendix A below hold. Then under $H_0$, as $n \to \infty$,
$$L_n(h) = \sqrt{nh^2} \int_{-\infty}^{\infty} (\hat{q}(x) - \tilde{q}(x))^2\, \pi(x)\,dx \to_D \sigma^2(K) \cdot \sigma_e^2 \cdot L_B(1, 0), \qquad (2.13)$$
where $\sigma^2(K) = \int_{-\infty}^{\infty} K^2(v)\,dv$, $\sigma_e^2 = E[e_1^2]$, and $L_B(1, 0)$ is a random variable whose cumulative distribution function is given by
$$F_L(x) = P(L_B(1, 0) \le x) = \begin{cases} 2\Phi(x) - 1, & x \ge 0, \\ 0, & x < 0, \end{cases}$$
in which $\Phi(x)$ is the cdf of $N(0, 1)$; that is, $L_B(1, 0)$ has the same distribution as $|N(0, 1)|$.

Remark 2.1. (i) When $\sigma_e^2$ is unknown, it can be estimated under $H_0$ by $\hat{\sigma}_e^2 = \frac{1}{n} \sum_{t=1}^{n} (y_t - y_{t-1})^2$.

(ii) Note that it is quite common in the parametric case for a unit-root test statistic to have a functional of Brownian motion as its limiting distribution.

2.2 Discussion of power properties

As pointed out in the introductory section, an existing test for the autoregressive case is the one proposed in Gao et al (2009a):
$$M_n(h) = \frac{\sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \hat{e}_s\, K\!\left(\frac{x_t - x_s}{h}\right) \hat{e}_t}{\hat{\sigma}_n}, \quad \text{where} \quad \hat{\sigma}_n^2 = 2 \sum_{t=2}^{n} \sum_{s=1}^{t-1} K^2\!\left(\frac{x_t - x_s}{h}\right) \hat{e}_s^2\, \hat{e}_t^2. \qquad (2.14)$$
We then have under $H_0$:
$$\hat{\sigma}_n^2 = 2 \sum_{t=2}^{n} \sum_{s=1}^{t-1} K^2\!\left(\frac{x_t - x_s}{h}\right) u_s^2\, u_t^2, \quad \sigma_n^2 = E[\hat{\sigma}_n^2] = C(1 + o(1)) \cdot n^{3/2} h, \qquad (2.15)$$
where $u_t = y_t - y_{t-1} = e_t$ under $H_0$ and $C > 0$ is a constant. In a similar fashion to the derivations used in either the proof of Lemma B.5 of Li et al (2011) or the proof of Theorem 3.2 of Wang and Phillips (2012), one may show that there are constants $0 < C_1 < C_2 < \infty$ such that under $H_0$:
$$\lim_{n \to \infty} P\!\left(C_1 n^{3/2} h \le \hat{\sigma}_n^2 \le C_2 n^{3/2} h\right) = 1. \qquad (2.16)$$

This section is then interested in a sequence of local departure functions of the form
$$\Delta_n(x) = \delta_n \cdot \Delta(x), \qquad (2.17)$$
where $\delta_n \to 0$ as $n \to \infty$ and $\Delta(x)$ is chosen such that, for $j = 1, 2$,
$$\int_{-\infty}^{\infty} \Delta^2(x)\,dx < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} \Delta^2(x)\, \pi^j(x)\,dx < \infty. \qquad (2.18)$$

Let $M_{1n} = \sum_{t=1}^{n} \sum_{s=1}^{n} K\!\left(\frac{x_t - x_s}{h}\right) \hat{e}_s \hat{e}_t$, and note that $\hat{e}_t = y_t - x_t = e_t + \Delta_n(x_t)$ under $H_1$. It may be shown under $H_1$ that
$$M_{1n} = \sum_{t=1}^{n} \sum_{s=1}^{n} K\!\left(\frac{x_t - x_s}{h}\right) e_s e_t + \sum_{t=1}^{n} \sum_{s=1}^{n} K\!\left(\frac{x_t - x_s}{h}\right) \Delta_n(x_s)\, \Delta_n(x_t) + o_P(1) \ge \sum_{t=1}^{n} \sum_{s=1}^{n} K\!\left(\frac{x_t - x_s}{h}\right) \Delta_n(x_s)\, \Delta_n(x_t) + o_P(1) \equiv M_{2n} + o_P(1),$$
where, as shown in Appendix B, we have as $n \to \infty$
$$R_{1n} \equiv \frac{E[M_{2n}]}{\sigma_n} = C \delta_n^2 \cdot \frac{nh}{\sqrt{n^{3/2} h}}\,(1 + o(1)) = C \delta_n^2\, \sqrt{\sqrt{n}\,h}\,(1 + o(1)) \qquad (2.19)$$
when $\int_{-\infty}^{\infty} \Delta^2(x)\,dx < \infty$. Meanwhile, under $H_1$, Lemma A.2 in Appendix A below shows that as $n \to \infty$
$$L_n(h) = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \hat{e}_s \hat{e}_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx \ge \frac{\delta_n^2}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \Delta(x_s)\, \Delta(x_t) \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx + o_P(1) \equiv L_{1n} + o_P(1),$$
where we have as $n \to \infty$
$$E[L_{1n}] = C(1 + o(1)) \cdot \delta_n^2 \cdot \sqrt{n}\,h \qquad (2.20)$$
when $\int_{-\infty}^{\infty} \Delta^2(x)\, \pi(x)\,dx < \infty$. It follows from equations (2.19) and (2.20) that there is some $C_0 > 0$ such that
$$\frac{E[L_{1n}]}{R_{1n}} = \frac{E[L_{1n}]\, \sigma_n}{E[M_{2n}]} = C_0\, \sqrt{\sqrt{n}\,h} \to \infty, \qquad (2.21)$$
which implies that $L_n(h)$ is more powerful than $M_n(h)$ under the sequence of local departure functions given by (2.17) and (2.18). Detailed derivations of equations (2.19)-(2.21) are given in Appendix B below.
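For implementation, Theorem 2.1 makes the asymptotic critical values of the proposed test explicit: since $L_B(1, 0)$ has the distribution of $|N(0, 1)|$, the level-$r$ critical value solves $2\Phi(x) - 1 = 1 - r$, i.e. $x = \Phi^{-1}(1 - r/2)$, and the test rejects $H_0$ when $\tilde{L}_n(h)$ exceeds $\sigma^2(K)\,\hat{\sigma}_e^2\,\Phi^{-1}(1 - r/2)$. A minimal sketch, assuming the Gaussian kernel (the default value of sigma2_K) and the estimator of Remark 2.1(i); names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def asymptotic_cv(y, r, sigma2_K=1.0 / (2.0 * np.sqrt(np.pi))):
    """Asymptotic critical value for tilde-L_n(h) at level r.  By Theorem 2.1,
    L_B(1,0) ~ |N(0,1)|, so P(L_B(1,0) > x) = r gives x = Phi^{-1}(1 - r/2);
    sigma_e^2 is estimated as in Remark 2.1(i).  The default sigma2_K is an
    assumption corresponding to the Gaussian kernel."""
    e_hat = np.diff(np.concatenate(([0.0], y)))    # y_t - y_{t-1}, with y_0 = 0
    sigma2_e = np.mean(e_hat ** 2)
    return sigma2_K * sigma2_e * norm.ppf(1.0 - r / 2.0)

# e.g. reject H0 at the 5% level when L_tilde_n(y, h) > asymptotic_cv(y, 0.05)
```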
Section 4 below evaluates the finite sample performance of $L_n(h)$ and $M_n(h)$.

3 Extensions

This section discusses possible extensions of model (2.1), and of the proposed test, to the case where a set of stationary time series regressors also enters a nonlinear time series model of the form
$$y_t = m(x_t, z_t) + e_t, \quad t = 1, 2, \ldots, n, \quad x_t = y_{t-1} \ \text{and} \ E[e_t \mid x_t, z_t] = 0, \qquad (3.1)$$
where $y_0 = 0$, $\{z_t\}$ is a vector of stationary regressors, and $m(\cdot, \cdot)$ is an unknown function. The interest here is in specification testing of the form
$$H_0: P(m(x_t, z_t) = x_t + g(z_t; \theta_0)) = 1 \quad \text{versus} \quad H_1: P(m(x_t, z_t) = x_t + g(z_t; \theta_0) + \Delta_n(x_t, z_t)) = 1, \qquad (3.2)$$
where $g(\cdot; \theta_0)$ is a parametrically known function indexed by $\theta_0$, a vector of unknown parameters, and $\Delta_n(x, z)$ is a sequence of departure functions.

Under $H_0$, model (3.1) suggests estimating $\theta_0$ by the value $\hat{\theta}$ that minimises
$$\frac{1}{n} \sum_{t=1}^{n} \left[y_t - x_t - g(z_t; \theta)\right]^2 \qquad (3.3)$$
over all possible $\theta$. Meanwhile, model (3.1) also suggests estimating $m(\cdot, \cdot)$ by
$$\hat{m}(x, z) = \frac{\sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right) y_t}{\sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right)}, \qquad (3.4)$$
where $K_i(\cdot)$, $i = 1, 2$, are probability kernel functions and $h_i$, $i = 1, 2$, are bandwidth parameters.

To test $H_0$, the discussion in Section 2 suggests constructing a test based on a distance between $\hat{m}(x, z)$ and $x + g(z; \hat{\theta})$. In order to construct our test, we introduce a smoothed version of $x + g(z; \theta_0)$ of the form
$$\tilde{m}(x, z; \theta_0) = \frac{\sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right) (x_t + g(z_t; \theta_0))}{\sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right)}. \qquad (3.5)$$
We may then introduce a distance function between $\hat{m}(x, z)$ and $\tilde{m}(x, z; \hat{\theta})$. To avoid random-denominator problems, we propose using a modified distance based on the quantities
$$\hat{q}(x, z) = \frac{1}{\sqrt{n}\,h_1 h_2} \sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right) y_t \quad \text{and} \quad \tilde{q}(x, z; \theta_0) = \frac{1}{\sqrt{n}\,h_1 h_2} \sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right) (x_t + g(z_t; \theta_0)). \qquad (3.6)$$
We then propose a test statistic of the form
$$L_n(h_1, h_2) = \sqrt{n h_1^2 h_2^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left(\hat{q}(x, z) - \tilde{q}(x, z; \hat{\theta})\right)^2 \pi_1(x)\, \pi_2(z)\,dz\,dx = \sqrt{n h_1^2 h_2^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left(\hat{m}(x, z) - \tilde{m}(x, z; \hat{\theta})\right)^2 \hat{p}^2(x, z)\, \pi_1(x)\, \pi_2(z)\,dz\,dx, \qquad (3.7)$$
where $\pi_1(\cdot)$ and $\pi_2(\cdot)$ are known probability weight functions satisfying $0 < \int_{-\infty}^{\infty} \pi_i^2(u)\,du < \infty$ for $i = 1, 2$, and $\hat{p}(x, z) = \frac{1}{\sqrt{n}\,h_1 h_2} \sum_{t=1}^{n} K_1\!\left(\frac{x_t - x}{h_1}\right) K_2\!\left(\frac{z_t - z}{h_2}\right)$.

Since there is an autoregressive structure involved in model (3.1), and $y_t$ and $z_t$ are highly correlated and dependent on each other, it is not clear whether Theorem 2.1 for the univariate case can be extended to $L_n(h_1, h_2)$ in the multivariate case. Section 4 below, however, shows that $L_n(h_1, h_2)$ works well numerically.
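Although the asymptotic theory for $L_n(h_1, h_2)$ is not established here, the estimators (3.4) and (3.5) on which it is built are straightforward to compute. A minimal sketch, assuming Gaussian kernels $K_1 = K_2$ (the choice used in Section 4); function and argument names are illustrative:

```python
import numpy as np

def gauss(u):
    """Standard normal kernel."""
    return np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

def m_hat(x0, z0, x, z, y, h1, h2):
    """Product-kernel estimator (3.4) of m(x0, z0)."""
    w = gauss((x - x0) / h1) * gauss((z - z0) / h2)
    return np.sum(w * y) / np.sum(w)

def m_tilde(x0, z0, x, z, null_fit, h1, h2):
    """Smoothed null model (3.5): the same weights applied to
    null_fit_t = x_t + g(z_t; theta0)."""
    w = gauss((x - x0) / h1) * gauss((z - z0) / h2)
    return np.sum(w * null_fit) / np.sum(w)
```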
4 Examples of implementation

4.1 Computational aspects

This section introduces an approximate version of $L_n(h_1, h_2)$ and then a natural extension of the test proposed in Gao et al (2009a), before a bandwidth selection method is discussed. Similarly to the derivation of $\tilde{L}_n(h)$ in Section 2 above, we may approximate $L_n(h_1, h_2)$ by
$$\tilde{L}_n(h_1, h_2) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \hat{e}_t^2\, \pi_1(x_t)\, \pi_2(z_t) \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} K_1^2(u)\, K_2^2(v)\,dv\,du + \frac{2}{\sqrt{n}} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \hat{e}_t \hat{e}_s\, \pi_1(x_s)\, \pi_2(z_s)\, L_1\!\left(\frac{x_t - x_s}{h_1}\right) L_2\!\left(\frac{z_t - z_s}{h_2}\right), \qquad (4.1)$$
where $\pi_1(x) = \frac{1}{\pi(1 + x^2)}$, $\pi_2(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}$, $L_i(u) = \int_{-\infty}^{\infty} K_i(v)\, K_i(u + v)\,dv$, and $\hat{e}_t = y_t - x_t - g(z_t; \hat{\theta})$, in which $\hat{\theta}$ is defined by (3.3). Note that $\int_{-\infty}^{\infty} K_i^2(u)\,du = \frac{1}{2\sqrt{\pi}}$ and $L_i(u) = \frac{1}{2\sqrt{\pi}} e^{-\frac{u^2}{4}}$ when $K_i(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ for $i = 1, 2$.

Meanwhile, a natural extension of the test proposed in Gao et al (2009a) for the univariate case can be defined by
$$M_n(h_1, h_2) = \frac{\sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \hat{e}_s\, K_1\!\left(\frac{x_s - x_t}{h_1}\right) K_2\!\left(\frac{z_s - z_t}{h_2}\right) \hat{e}_t}{\sqrt{\sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \hat{e}_s^2\, K_1^2\!\left(\frac{x_s - x_t}{h_1}\right) K_2^2\!\left(\frac{z_s - z_t}{h_2}\right) \hat{e}_t^2}}, \qquad (4.2)$$
where $\hat{e}_t = y_t - x_t - g(z_t; \hat{\theta})$ and $\hat{\theta}$ is the same as in (3.3).

As may be seen from the proposed tests, certain bandwidth parameters are involved. In Table 4.1a below a fixed bandwidth is used; in general, we propose using a cross-validation based method to choose suitable bandwidth parameters. Because Edgeworth expansions for the distributions of $\tilde{L}_n(h_1, h_2)$ and $M_n(h_1, h_2)$ are not readily available, we are unable to adopt the power-function approach to the choice of optimal bandwidths (as discussed in Li et al 2011 for the univariate case). Instead, we propose using estimation-based optimal bandwidths of the form
$$\left(\hat{h}_{1cv}, \hat{h}_{2cv}\right) = \arg\min_{(h_1, h_2) \in H_{cv}} \frac{1}{n} \sum_{t=1}^{n} \left(y_t - \hat{m}_{-t}(x_t, z_t; h_1, h_2)\right)^2, \qquad (4.3)$$
where
$$\hat{m}_{-t}(x_t, z_t; h_1, h_2) = \frac{\sum_{s=1, \ne t}^{n} K_1\!\left(\frac{x_t - x_s}{h_1}\right) K_2\!\left(\frac{z_t - z_s}{h_2}\right) y_s}{\sum_{u=1, \ne t}^{n} K_1\!\left(\frac{x_t - x_u}{h_1}\right) K_2\!\left(\frac{z_t - z_u}{h_2}\right)}$$
and
$$H_{cv} = \left[c_1 n^{-\frac{1}{12} - c_0},\ c_2 n^{-\frac{1}{12} + c_0}\right] \times \left[d_1 n^{-\frac{1}{6} - d_0},\ d_2 n^{-\frac{1}{6} + d_0}\right]$$
for some $0 < c_1 < c_2 < \infty$, $0 < c_0 < \frac{1}{48}$, $0 < d_1 < d_2 < \infty$ and $0 < d_0 < \frac{1}{24}$. Before selecting $H_{cv}$, we calculated equation (4.3) over all possible intervals; our computations indicate that $H_{cv}$ is the smallest possible interval on which the cross-validation function attains its smallest value.

Let $l_r$ be the asymptotic critical value of the distribution of the proposed test in each case. In both Examples 4.2 and 4.3 below, we then use the chosen bandwidths in a regression bootstrap method to select a simulated critical value $l_r^*$ in each case. Let $Q_n(h_1, h_2)$ denote either $\tilde{L}_n(h_1, h_2)$ or $M_n(h_1, h_2)$.

Our experience with Examples 4.2 and 4.3 shows that $Q_n(\hat{h}_{1cv}, \hat{h}_{2cv})$ already has stable sizes and good power values under the choice $(\hat{h}_{1cv}, \hat{h}_{2cv})$. This may be because this pair of bandwidths is either exactly identical or extremely close to the bandwidth values that maximise the power function while controlling the size function. In the stationary time series case, the theory developed in Chapter 3 of Gao (2007) shows that such estimation-based optimal bandwidth values may also be optimal for testing purposes. We then propose using the following bootstrap method to approximate $l_r$ by $l_r^*$ in each case.

Step 1: Generate the bootstrap residuals $\{e_t^*\}$ by $e_t^* = \hat{\sigma}_e\, \eta_t^*$, where
$$\hat{\sigma}_e^2 = \frac{1}{n} \sum_{t=1}^{n} \left(y_t - x_t - g(z_t; \hat{\theta})\right)^2, \qquad (4.4)$$
in which $\{\eta_t^*, 1 \le t \le n\}$ is a sequence of i.i.d. random variables drawn from
$$P\!\left(\eta_t^* = -\frac{\sqrt{5} - 1}{2}\right) = \frac{\sqrt{5} + 1}{2\sqrt{5}} \quad \text{and} \quad P\!\left(\eta_t^* = \frac{\sqrt{5} + 1}{2}\right) = \frac{\sqrt{5} - 1}{2\sqrt{5}}. \qquad (4.5)$$

Step 2: Obtain $y_t^* = x_t + g(z_t; \hat{\theta}) + e_t^*$. The resulting sample $\{(y_t^*, x_t, z_t), 1 \le t \le n\}$ is called a bootstrap sample.

Step 3: Use the data set $\{(y_t^*, x_t, z_t), 1 \le t \le n\}$ to re-estimate $(\alpha, \beta, \gamma)$, and denote the estimators by $(\hat{\alpha}^*, \hat{\beta}^*, \hat{\gamma}^*)$. Then calculate the test statistic $\hat{Q}_n^*(\hat{h}_{1cv}, \hat{h}_{2cv})$, which is the version of $\hat{Q}_n(\hat{h}_{1cv}, \hat{h}_{2cv})$ obtained by replacing $\{(y_t, x_t, z_t)\}$ and $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ with $\{(y_t^*, x_t, z_t)\}$ and $(\hat{\alpha}^*, \hat{\beta}^*, \hat{\gamma}^*)$, respectively.

Step 4: Repeat Steps 1-3 $M_b = 250$ times to produce $M_b$ versions $\hat{Q}_{n,m}^*(\hat{h}_{1cv}, \hat{h}_{2cv})$, $m = 1, 2, \ldots, M_b$, and construct the empirical distribution of $\hat{Q}_n^*(\hat{h}_{1cv}, \hat{h}_{2cv})$; that is,
$$P^*\!\left(\hat{Q}_n^*(\hat{h}_{1cv}, \hat{h}_{2cv}) \le x\right) = P\!\left(\hat{Q}_n^*(\hat{h}_{1cv}, \hat{h}_{2cv}) \le x \mid W_n\right),$$
where $W_n = \{(y_t, x_t, z_t), 1 \le t \le n\}$. For each pair $(\hat{h}_{1cv}, \hat{h}_{2cv})$, choose $l_r^*$ such that
$$P^*\!\left(\hat{Q}_n^*(\hat{h}_{1cv}, \hat{h}_{2cv}) > l_r^*\right) = r,$$
and estimate $l_r$ by $l_r^*$.
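The two-point distribution in (4.5) has mean zero and unit variance, so the bootstrap errors reproduce the first two moments of the estimated residuals. A minimal sketch of Steps 1-2, under the assumption that the fitted null values $g(z_t; \hat{\theta})$ have already been computed; the names g_fit and sigma_e_hat are illustrative:

```python
import numpy as np

def eta_star(n, rng):
    """Draw n i.i.d. variables from the two-point distribution (4.5);
    it satisfies E[eta] = 0 and E[eta^2] = 1."""
    s5 = np.sqrt(5.0)
    low, high = -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0
    p_low = (s5 + 1.0) / (2.0 * s5)            # P(eta = low)
    return np.where(rng.random(n) < p_low, low, high)

def bootstrap_sample(x, g_fit, sigma_e_hat, rng):
    """Steps 1-2: y*_t = x_t + g(z_t; theta_hat) + sigma_e_hat * eta*_t."""
    return x + g_fit + sigma_e_hat * eta_star(len(x), rng)

# usage: rng = np.random.default_rng(0); y_star = bootstrap_sample(x, g_fit, s_e, rng)
```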
Equation (4.3) is used for the choice of $(h_1, h_2)$ in the implementation of $\tilde{L}_n(h_1, h_2)$ and $M_n(h_1, h_2)$ in Tables 4.2 and 4.3 below. A special case of (4.3), combined with the proposed bootstrap method, is used for the univariate case implemented in Table 4.1b below.

4.2 Simulated examples

This section evaluates the finite sample performance of the proposed test and its competitors. Since existing studies (such as Gao et al 2009a; Gao and King 2011) already show that $M_n(h)$ is needed and has better finite-sample performance than tests proposed for the linear unit-root case, such as the Dickey-Fuller test and its various versions, this paper focuses on the finite-sample comparison between $M_n(h)$ and $L_n(h)$.

Example 4.1. Consider a linear time series model of the form
$$H_0: y_t = \beta_0 x_t + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.6)$$
versus
$$H_1: y_t = \beta_1 x_t + \Delta_n(x_t) + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.7)$$
where $x_t = y_{t-1}$, the data are generated as $y_t = y_{t-1} + u_t$ with $y_0 = 0$ and $u_t \sim N(0, 1)$, $\beta_i = 1$ for $i = 0, 1$, and
$$\Delta_n(x) = \frac{\delta_n}{\sqrt{1 + x^2}} \quad \text{with} \quad \delta_n = \frac{\log(n)}{2 n^{1/8}}. \qquad (4.8)$$
The choice of $\delta_n$ can be discussed in the same way as for the general case in (4.14) below.

This section then compares the finite sample performance of the following two test statistics:
$$L_{1n}(h) \equiv M_n(h) \quad \text{and} \quad L_{2n}(h) \equiv \tilde{L}_n(h), \qquad (4.9)$$
in which $\pi(x) = \frac{1}{\pi(1 + x^2)}$, $K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$ and $L(u) = \int_{-\infty}^{\infty} K(v)\, K(u + v)\,dv$ are chosen for the computation of the two test statistics. Note that $\int_{-\infty}^{\infty} K^2(u)\,du = \frac{1}{2\sqrt{\pi}}$ and $L(u) = \frac{1}{2\sqrt{\pi}} e^{-\frac{u^2}{4}}$ for this choice of $K(x)$.

In this example, we use an asymptotic critical value (acv) and a fixed bandwidth of $h = n^{-1/4}$ in each case. For $L_{1n}$, we use $z_{0.01} = 2.33$ at the 1% level and $z_{0.05} = 1.645$ at the 5% level. For $L_{2n}$, we use the critical value $l_r$ of $\sigma^2(K)\, L_B(1, 0)$ at the 1% and the 5% levels. The number of replications is $M = 1{,}000$, and the simulations are done for data sets of sizes $n = 100$, $300$ and $500$. The rejection frequencies of $L_{1n}(h) > z_r$ and of $L_{2n}(h) > l_r$ under $H_0$ and $H_1$ are given in Table 4.1a.

In addition, we also consider using a regression bootstrap method to choose bootstrap critical values (bcv) $z_r^*$ and $l_r^*$. Let $Q_n(h)$ denote either $L_{1n}(h)$ or $L_{2n}(h)$. Table 4.1b gives the corresponding results for the bootstrap case.

Table 4.1a: Non-bootstrap with M = 1000

              L1n               L2n
     n     1%      5%        1%      5%
  Under H0:
   100   0.001   0.004     0.000   0.005
   300   0.001   0.006     0.009   0.026
   500   0.009   0.028     0.018   0.051
  Under H1:
   100   0.466   0.557     0.843   0.861
   300   0.863   0.898     0.941   0.944
   500   0.966   0.975     0.985   0.990

Table 4.1b: Bootstrap with Mb = 250, M = 1000

             L1n(h)             L2n(h)
     n     1%      5%        1%      5%
  Under H0:
   100   0.009   0.022     0.018   0.068
   300   0.005   0.013     0.011   0.068
   500   0.009   0.025     0.009   0.056
  Under H1:
   100   0.476   0.567     0.961   0.971
   300   0.863   0.900     0.995   0.997
   500   0.962   0.976     0.994   0.994

Tables 4.1a and 4.1b show that the proposed test works well in the finite sample case.
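For replication, the data-generating process of Example 4.1 can be sketched as follows; the function name is illustrative and rng is a numpy Generator:

```python
import numpy as np

def generate_example_4_1(n, under_null, rng):
    """Generate {y_t} for Example 4.1: a random walk with N(0,1) errors
    under H0; under H1 the local shift delta_n/sqrt(1+x_t^2) is added,
    with delta_n = log(n)/(2*n^{1/8}) as in (4.8)."""
    delta_n = np.log(n) / (2.0 * n ** 0.125)
    y = np.zeros(n + 1)                         # y[0] is y_0 = 0
    u = rng.standard_normal(n)
    for t in range(1, n + 1):
        x_t = y[t - 1]
        shift = 0.0 if under_null else delta_n / np.sqrt(1.0 + x_t ** 2)
        y[t] = x_t + shift + u[t - 1]
    return y[1:]
```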
While Table 4.1a shows that the proposed test works well when an asymptotic critical value is combined with a fixed bandwidth, Table 4.1b shows that both the sizes and the power values can be improved when a bootstrap method is used in association with a data-driven bandwidth in each case. Meanwhile, both Tables 4.1a and 4.1b show that $L_{2n}$ is more powerful than $L_{1n}$.

Similarly to equation (4.9), we introduce the following definitions:
$$L_{1n}(h_1, h_2) \equiv M_n(h_1, h_2) \quad \text{and} \quad L_{2n}(h_1, h_2) \equiv \tilde{L}_n(h_1, h_2). \qquad (4.10)$$
Examples 4.2 and 4.3 below evaluate the finite sample performance of $L_{1n}(h_1, h_2)$ and $L_{2n}(h_1, h_2)$.

Example 4.2. Consider a bivariate linear time series model of the form
$$H_0: y_t = \alpha + \beta y_{t-1} + \gamma z_t + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.11)$$
versus
$$H_1: y_t = \alpha + \beta y_{t-1} + \gamma z_t + \Delta_n(y_{t-1}, z_t) + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.12)$$
where $y_0 = 0$, $\alpha = 0$ and $\beta = \gamma = 1$,
$$\begin{pmatrix} e_t \\ z_t \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \qquad (4.13)$$
with $\rho = 0$ or $\rho = 0.9$, and
$$\Delta_n(y, z) = \frac{\delta_n\, z^2}{\sqrt{1 + y^2}} \quad \text{with} \quad \delta_n = \frac{\log(n)}{2 n^{1/8}}. \qquad (4.14)$$

Note that there is some endogeneity between $e_t$ and $z_t$ when $\rho = E[e_t z_t] = 0.9$. Note also that the choice of $\delta_n$ in theory is to ensure that $\delta_n \to 0$ and $\delta_n^2\, \sqrt{n}\, h_1 h_2 \to \infty$. Since the leading orders of $h_1$ and $h_2$ are chosen as $n^{-1/12}$ and $n^{-1/6}$, respectively, in the cross-validation method in (4.3), the choice of $\delta_n$ in (4.8) satisfies the theoretical requirements.

Table 4.2 shows that the extended version $L_n(h_1, h_2)$ of the proposed test $L_n(h)$ also works well numerically when there is a linear unit-root structure involved in the model under $H_0$. Meanwhile, Table 4.2 demonstrates that the proposed test is still applicable, and even works well, in the case where there is some endogeneity between $z_t$ and $e_t$. This motivates us to develop an asymptotic theory for the proposed test under certain endogeneity assumptions. In the same pattern as seen in Tables 4.1a and 4.1b, Table 4.2 additionally indicates that $L_{2n}$ is more powerful than $L_{1n}$.

Table 4.2: Bootstrap with Mb = 250, M = 1000

                     rho = 0                           rho = 0.9
          L1n(h1,h2)      L2n(h1,h2)       L1n(h1,h2)      L2n(h1,h2)
     n     1%      5%      1%      5%       1%      5%      1%      5%
  Under H0:
   100   0.002   0.012   0.021   0.076    0.004   0.017   0.025   0.063
   300   0.006   0.022   0.015   0.060    0.011   0.034   0.015   0.068
   500   0.009   0.031   0.011   0.052    0.012   0.033   0.016   0.051
  Under H1:
   100   0.195   0.322   0.515   0.698    0.200   0.379   0.887   0.944
   300   0.436   0.540   0.614   0.797    0.590   0.715   0.960   0.974
   500   0.568   0.655   0.736   0.856    0.651   0.739   0.960   0.979

Example 4.3. Consider a nonlinear time series model of the form
$$H_0: y_t = \beta y_{t-1} + \frac{1}{\tau + y_{t-1}^2} + \gamma z_t + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.15)$$
versus
$$H_1: y_t = \beta y_{t-1} + \frac{1}{\tau + y_{t-1}^2} + \gamma z_t + \Delta_n(y_{t-1}, z_t) + e_t, \quad t = 1, 2, \ldots, n, \qquad (4.16)$$
where $y_0 = 0$, $\beta = \gamma = \tau = 1$,
$$\begin{pmatrix} e_t \\ z_t \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right), \qquad (4.17)$$
with $\rho = 0$ or $\rho = 0.9$, and $\Delta_n(y, z) = \frac{\delta_n\, z^2}{\sqrt{1 + y^2}}$, in which $\delta_n = \frac{\log(n)}{2 n^{1/8}}$.

Table 4.3 reveals that the tests are applicable to the case where a nonlinear unit-root structure is involved in model (4.15) under $H_0$, in addition to supporting the findings reported in Table 4.2 that $L_{2n}(h_1, h_2)$ is more powerful than $L_{1n}(h_1, h_2)$ and that both tests are applicable to the case where there is some endogeneity between $z_t$ and $e_t$. Note that $\{y_t\}$ generated by (4.15) is a $\frac{1}{2}$-null recurrent Markov chain (see, for example, Gao, Tjøstheim and Yin 2011), even though it is not linearly integrated. This may imply that the proposed tests are applicable to the case where $\{y_t\}$ is nonstationary but not necessarily a linear unit-root time series.
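The data-generating process of Example 4.3 (and, with the obvious change to the mean function, Example 4.2) can be sketched as follows; names are illustrative:

```python
import numpy as np

def generate_example_4_3(n, rho, under_null, rng):
    """Generate {(y_t, z_t)} for Example 4.3 with beta = gamma = tau = 1:
    (e_t, z_t) are jointly normal with correlation rho, and under H1 the
    local shift delta_n*z_t^2/sqrt(1+y_{t-1}^2) is added."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    ez = rng.multivariate_normal(np.zeros(2), cov, size=n)
    e, z = ez[:, 0], ez[:, 1]
    delta_n = np.log(n) / (2.0 * n ** 0.125)
    y = np.zeros(n + 1)                         # y[0] is y_0 = 0
    for t in range(1, n + 1):
        x_t = y[t - 1]
        m = x_t + 1.0 / (1.0 + x_t ** 2) + z[t - 1]
        if not under_null:
            m += delta_n * z[t - 1] ** 2 / np.sqrt(1.0 + x_t ** 2)
        y[t] = m + e[t - 1]
    return y[1:], z
```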
Table 4.3: Bootstrap with Mb = 250 and M = 1000

                     rho = 0                           rho = 0.9
          L1n(h1,h2)      L2n(h1,h2)       L1n(h1,h2)      L2n(h1,h2)
     n     1%      5%      1%      5%       1%      5%      1%      5%
  Under H0:
   100   0.003   0.017   0.020   0.080    0.009   0.023   0.020   0.054
   300   0.004   0.023   0.016   0.061    0.011   0.039   0.015   0.052
   500   0.018   0.048   0.014   0.052    0.011   0.034   0.012   0.048
  Under H1:
   100   0.142   0.245   0.500   0.614    0.325   0.467   0.884   0.921
   300   0.262   0.373   0.655   0.756    0.590   0.701   0.946   0.958
   500   0.368   0.484   0.717   0.819    0.672   0.755   0.957   0.974

5 Conclusions and discussion

This paper has proposed a simple and improved nonparametric test for specifying a unit-root structure in a nonlinear time series model. The proposed test has been compared with an existing one both theoretically and empirically. Meanwhile, the proposed test has been extended to a multivariate version for the case where both stationary and nonstationary regressors are involved simultaneously in the same model. The finite sample performance of the proposed tests and their competitors has been evaluated, while the theory for the multivariate version of the proposed test has not been established in this paper. Examples 4.2 and 4.3 also show that the tests under study are all applicable to cases where there is some endogeneity between the regressors and the error terms, although we have not developed our theory to cover this important case. Further discussion is left for future research.

6 Acknowledgments

The authors would like to thank the participants of the 3rd WISE-Humboldt Workshop on "High Dimensional and Nonstationary Time Series", held on 19-20 May 2012 in Xiamen, China, for their constructive comments, particularly those from Professor Wolfgang Härdle and Professor Yongmiao Hong. The authors would also like to thank Dr Jiying Yin for his excellent computing assistance. Thanks also go to the Australian Research Council Discovery Grants Program for its support under Grant Number DP1096374.

7 Appendices

7.1 Appendix A

In order to prove Theorem 2.1, we introduce the following assumptions.

Assumption A.1. (i) Let $\mathcal{F}_t$ be the $\sigma$-field generated by $\{e_s: 1 \le s \le t\}$. Let $\{e_t\}$ be a sequence of stationary martingale differences satisfying $E[e_t \mid \mathcal{F}_{t-1}] = 0$ and $E[e_t^2 \mid \mathcal{F}_{t-1}] = \sigma_e^2$ almost surely, where $0 < \sigma_e^2 < \infty$ is some constant. In addition, $\max_{t \ge 1} E\!\left[|e_t|^{2+\delta_0} \mid \mathcal{F}_{t-1}\right] < \infty$ almost surely for some $\delta_0 > 0$.

(ii) Let $p(u)$ be the marginal density function of $e_1$ and $p_\tau(v, w)$ be the joint density of $(e_1, e_{1+\tau})$ for any $\tau \ge 1$. Suppose that $p(u)$ is continuous in $u$ and $p_\tau(v, w)$ is continuous in $(v, w)$.

(iii) For any positive integers $1 \le t_1 < t_2 < \cdots < t_n \le n$, define $S_{ij} = \sum_{k=t_i+1}^{t_j} e_k$ for $1 \le i < j \le n$. Let $q_{ij}(w \mid v_{kl})$ be the conditional density function of $\frac{S_{ij}}{\sqrt{t_j - t_i}}$ given $S_{kl} = v_{kl}$ for all $1 \le k \le i - 1$ and $1 \le l \le j - 1$. Suppose that there is some constant $0 < C_s < \infty$ such that $\max_{(i,j,k,l)} \sup_{(w, v_{kl})} q_{ij}(w \mid v_{kl}) \le C_s$. Let $g_t(\cdot)$ be the marginal density of $\frac{1}{\sqrt{t}} \sum_{s=1}^{t} e_s$ and suppose that $\max_{t \ge 1} \sup_u g_t(u) \le C_g$ for some constant $0 < C_g < \infty$.

Assumption A.2. (i) Let $K(\cdot)$ be a symmetric and continuous probability kernel function satisfying, for $j = 0, 1, 2$,
$$\int_{-\infty}^{\infty} |u|^j K^2(u)\,du < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} |v|^j K(u + v)\, K(v)\,dv\right)^2 du < \infty.$$

(ii) The bandwidth $h$ satisfies $h \to 0$, $nh^2 \to \infty$ and $nh^4 = o(1)$ as $n \to \infty$.
(iii) Let $\pi(\cdot)$ be a known probability weight function such that $\int_{-\infty}^{\infty} \pi^{2+\delta_0}(u)\,du < \infty$ for the same $\delta_0 > 0$ as in Assumption A.1(i).

(iv) In addition, there is some function $D(x)$ satisfying $\int_{-\infty}^{\infty} D(x)\,dx < \infty$ such that $|\pi(y) - \pi(x)| \le D(x) \cdot |y - x|$ for any $(x, y) \in \Omega(\epsilon) = \{(x, y): |y - x| \le \epsilon,\ x, y \in R^1\}$, where $\epsilon > 0$ is some small constant.

Assumptions A.1 and A.2 are quite reasonable and easily verifiable. Assumption A.1(i) imposes the martingale difference structure in order to avoid imposing a mixing condition on $\{e_t\}$, in which case one would need existing inequalities (such as Lemma A.1 of Gao 2007) to deal with terms of the form $E[e_i e_j e_k e_l]$ with distinct indices $i, j, k, l$. As may be seen from the proof of Theorem 2.1 below, the derivations involving $E[e_t \mid \mathcal{F}_{t-1}] = 0$ may be replaced by
$$|E[e_s e_t] - E[e_s]\, E[e_t]| \le C\, \alpha^{\frac{\delta_0}{4 + \delta_0}}(|t - s|)$$
when an $\alpha$-mixing condition is used, in which $\alpha(k)$ denotes the mixing coefficient (as defined in Lemma A.1 of Gao 2007, for example). In summary, this paper adopts the martingale-difference assumption to avoid dealing with the technicalities that are a consequence of imposing a mixing condition on $\{e_t\}$.

Assumption A.1(ii) is standard, and Assumption A.1(iii) basically imposes the boundedness of the conditional density function for all $(t_i, t_j)$. When $t_j - t_i \to \infty$, Lemma B.1 in Appendix B below implies that $q_{ij}(w \mid v) \to \phi(w)$, where $\phi(\cdot)$ is the density function of the standard normal random variable $U \sim N(0, 1)$. When $(i, j)$ is fixed and the support of $x_s$ is compact, the boundedness of $q_{ij}(w \mid v)$ follows from the common assumption of the continuity of $q_{ij}(w \mid v)$ in $(w, v)$; when $(i, j)$ is fixed and $q_{ij}(w \mid v) \to 0$ as $w \to \infty$, the boundedness also follows trivially. The second part of Assumption A.1(iii) follows similarly and trivially. In summary, it is not unreasonable to assume the boundedness in Assumption A.1(iii). Assumption A.2 is also quite standard, except that Assumption A.2(ii) imposes a stronger condition than $nh^5 = O(1)$; the existing literature, such as Gao et al (2009a), has to assume $nh^{10/3} \to 0$.

We now introduce some necessary lemmas before proving Theorem 2.1.

Lemma A.1. Let the conditions of Theorem 2.1 hold. Under $H_0$, we have as $n \to \infty$
$$\hat{S}_{1n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \hat{e}_t^2 \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} e_t^2 \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx \equiv S_{1n}, \qquad (A.1)$$
$$\hat{S}_{2n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \hat{e}_s \hat{e}_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} e_s e_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx \equiv S_{2n}. \qquad (A.2)$$

Lemma A.2. Let the conditions of Theorem 2.1 hold. Under $H_1$, we have as $n \to \infty$
$$\hat{S}_{1n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \hat{e}_t^2 \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx \ge \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \Delta_n^2(x_t) \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx + o_P(1), \qquad (A.3)$$
$$\hat{S}_{2n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \hat{e}_s \hat{e}_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx \ge \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} \Delta_n(x_s)\, \Delta_n(x_t) \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx + o_P(1). \qquad (A.4)$$

Lemma A.3. Let Assumptions A.1 and A.2 hold. Then as $n \to \infty$
$$\frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2\, \psi(x_t) \to_D L_B(1, 0) \cdot \sigma_e^2 \cdot \int_{-\infty}^{\infty} \psi(x)\,dx, \qquad (A.5)$$
where $\psi(\cdot) = \pi(\cdot)$ or $D(\cdot)$.

The proofs of Lemmas A.1-A.3 are given in Appendix B below. We now give the proof of Theorem 2.1.

Proof of Theorem 2.1. Without loss of generality, we let $\sigma_e^2 \equiv 1$ throughout Appendices A and B. In view of Lemma A.1, in order to prove Theorem 2.1 it suffices to show that as $n \to \infty$
$$S_{1n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} e_t^2 \int_{-\infty}^{\infty} K^2\!\left(\frac{x_t - x}{h}\right) \pi(x)\,dx \to_D L_B(1, 0) \cdot \int_{-\infty}^{\infty} K^2(u)\,du, \qquad (A.6)$$
$$S_{2n} = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1, \ne t}^{n} e_s e_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx = o_P(1). \qquad (A.7)$$

We start with the proof of (A.6). Under Assumptions A.1 and A.2, using Lemma A.3, we have as $n \to \infty$
$$S_{1n} = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2 \int_{-\infty}^{\infty} K^2(u)\, \pi(x_t - uh)\,du = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2\, \pi(x_t) \int_{-\infty}^{\infty} K^2(u)\,du + \frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2 \int_{-\infty}^{\infty} K^2(u) \left(\pi(x_t - uh) - \pi(x_t)\right) du$$
$$= \frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2\, \pi(x_t) \int_{-\infty}^{\infty} K^2(u)\,du + O(h) \cdot \frac{1}{\sqrt{n}} \sum_{t=1}^{n} e_t^2\, D(x_t) \int_{-\infty}^{\infty} |u|\, K^2(u)\,du \to_D L_B(1, 0) \cdot \int_{-\infty}^{\infty} K^2(u)\,du, \qquad (A.8)$$
which completes the proof of (A.6).

We then prove (A.7). Let
$$B(s, t) \equiv B(x_s, x_t) = \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx, \quad A_t \equiv A(x_1, \ldots, x_t; e_1, \ldots, e_{t-1}) = \frac{2}{h} \sum_{s=1}^{t-1} B(s, t)\, e_s, \quad S_{2n} = \frac{1}{\sqrt{n}} \sum_{t=2}^{n} A_t e_t. \qquad (A.9)$$
Similarly to the derivations in (A.8), we have
$$B(s, t) = \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x_s}{h} + \frac{x_s - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx = h \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x_s}{h} + u\right) K(u)\, \pi(x_s - uh)\,du$$
$$= h\, \pi(x_s) \cdot L\!\left(\frac{x_t - x_s}{h}\right) + h \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x_s}{h} + u\right) K(u) \left(\pi(x_s - uh) - \pi(x_s)\right) du \equiv h\, (B_1(s, t) + B_2(s, t)), \qquad (A.10)$$
where $L(v) = \int_{-\infty}^{\infty} K(u + v)\, K(u)\,du$. We then have
$$S_{2n} = \frac{1}{\sqrt{n}} \sum_{t=2}^{n} A_t e_t = \frac{2}{\sqrt{n}} \sum_{t=2}^{n} \left(\sum_{s=1}^{t-1} B_1(s, t)\, e_s\right) e_t + \frac{2}{\sqrt{n}} \sum_{t=2}^{n} \left(\sum_{s=1}^{t-1} B_2(s, t)\, e_s\right) e_t \equiv S_{2n1} + S_{2n2}. \qquad (A.11)$$

We first deal with $E[S_{2n1}^2]$. Observe that
$$E[S_{2n1}^2] = \frac{8}{n} \sum_{t=2}^{n} \sum_{s=1}^{t-1} E\left[e_s^2 e_t^2\, \pi^2(x_s)\, L^2\!\left(\frac{x_t - x_s}{h}\right)\right] + \frac{8}{n} \sum_{t=3}^{n} \sum_{s=2}^{t-1} \sum_{v=1}^{s-1} E\left[e_s e_v e_t^2\, \pi(x_s)\, \pi(x_v)\, L\!\left(\frac{x_t - x_s}{h}\right) L\!\left(\frac{x_t - x_v}{h}\right)\right]$$
$$+ \frac{8}{n} \sum_{t_1=3}^{n} \sum_{t_2=2}^{t_1-1} E\left[\left(\sum_{s_1=1}^{t_1-1} B_1(s_1, t_1)\, e_{s_1}\right)\left(\sum_{s_2=1}^{t_2-1} B_1(s_2, t_2)\, e_{s_2}\right) e_{t_2} e_{t_1}\right] \equiv J_{1n} + J_{2n} + J_{3n}. \qquad (A.12)$$

Before we evaluate the order of $J_{1n}$, we introduce the following definitions. Let $u_t = y_t - x_t = y_t - y_{t-1}$; note that $u_t = e_t$ under $H_0$. Let $p_t(x)$ and $q_t(x)$ be the probability density functions of $x_t$ and $\frac{x_t}{\sqrt{t}}$, respectively, so that $p_t(x) = \frac{1}{\sqrt{t}}\, q_t\!\left(\frac{x}{\sqrt{t}}\right)$. For $s < t$, let $q_{st}(u \mid v)$ be the conditional density function of $\frac{x_t}{\sqrt{t}}$ given $\frac{x_s}{\sqrt{s}}$.

Since $x_t = \sum_{i=1}^{t-1} u_i$, $x_t - x_s = \sum_{j=s}^{t-1} u_j$ and $x_t = x_s + u_s + \sum_{j=s+1}^{t-1} u_j$, one additionally needs the joint distributions of $u_s$, $v_s = x_s$ and $w_{st} = \sum_{j=s+1}^{t-1} u_j$ to evaluate the order of $J_{1n}$ in equation (A.12). Let $p_{st}(u, v, w)$ and $q_{st}(u, v, w)$ be the joint densities of $(u_s, v_s, w_{st})$ and $\left(u_s, \frac{x_s}{\sqrt{s}}, \frac{w_{st}}{\sqrt{t-s-1}}\right)$, respectively. Then we have the following expressions involving the joint density functions:
$$p_{st}(u, v, w) = \frac{1}{\sqrt{s}\, \sqrt{t-s-1}}\, q_{st}\!\left(u, \frac{v}{\sqrt{s}}, \frac{w}{\sqrt{t-s-1}}\right) = \frac{1}{\sqrt{s}\, \sqrt{t-s-1}}\, q_{st}\!\left(\frac{w}{\sqrt{t-s-1}}\, \Big|\, u, \frac{v}{\sqrt{s}}\right) q_s\!\left(\frac{v}{\sqrt{s}}\, \Big|\, u\right) p(u), \qquad (A.13)$$
where $q_s(\cdot \mid u)$ denotes the conditional density of $\frac{x_s}{\sqrt{s}}$ given $u$, and $p(u)$ is the marginal density function of $u_s$.

In view of the notation introduced below (A.12) and Assumption A.1(iii), we have as $n \to \infty$
$$J_{1n} = \frac{8}{n} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u^2\, \pi^2(v)\, L^2\!\left(\frac{u + w}{h}\right) \frac{1}{\sqrt{s}\, \sqrt{t-s}}\, q_{st}\!\left(\frac{w}{\sqrt{t-s}}\, \Big|\, \frac{v}{\sqrt{s}}, u\right) q_s\!\left(\frac{v}{\sqrt{s}}\, \Big|\, u\right) p(u)\,dw\,dv\,du$$
$$= \frac{8h}{n} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \frac{1}{\sqrt{s}\, \sqrt{t-s}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u^2\, \pi^2(v)\, L^2(w)\, q_{st}\!\left(\frac{wh - u}{\sqrt{t-s}}\, \Big|\, \frac{v}{\sqrt{s}}, u\right) q_s\!\left(\frac{v}{\sqrt{s}}\, \Big|\, u\right) p(u)\,dw\,dv\,du$$
$$\le C\, \frac{8h}{n} \sum_{t=2}^{n} \sum_{s=1}^{t-1} \frac{1}{\sqrt{s}\, \sqrt{t-s}} \int_{-\infty}^{\infty} u^2 p(u)\,du \cdot \int_{-\infty}^{\infty} \pi^2(v)\,dv \cdot \int_{-\infty}^{\infty} L^2(w)\,dw = C(1 + o(1)) \cdot \frac{nh}{n} = O(h) = o(1). \qquad (A.14)$$

Similarly to equation (A.13), one may deal with $J_{2n}$. Let $z_{sv} = \sum_{k=v+1}^{s-1} u_k$, and note that $x_t - x_v = (x_t - x_s) + (x_s - x_v) = u_s + \sum_{j=s+1}^{t-1} u_j + u_v + \sum_{k=v+1}^{s-1} u_k$.
The joint density functions of $\left(u_v, u_s, \frac{w_{st}}{\sqrt{t-s-1}}, \frac{z_{sv}}{\sqrt{s-v-1}}\right)$ are then given by
$$p_{st}(u_1, u_2, w, z) = \frac{1}{\sqrt{t-s-1}\, \sqrt{s-v-1}}\, q_{st}\!\left(u_1, u_2, \frac{w}{\sqrt{t-s-1}}, \frac{z}{\sqrt{s-v-1}}\right) = \frac{1}{\sqrt{t-s-1}\, \sqrt{s-v-1}}\, q_{st}\!\left(\frac{w}{\sqrt{t-s-1}}\, \Big|\, u_1, u_2, \frac{z}{\sqrt{s-v-1}}\right) q_{sv}\!\left(\frac{z}{\sqrt{s-v-1}}\, \Big|\, u_1, u_2\right) p(u_1, u_2), \qquad (A.15)$$
where $p(u_1, u_2)$ denotes the joint density of $(u_v, u_s)$. In a similar fashion to the derivations in (A.14), after a change of variables based on (A.15) and Assumption A.1(iii), we have as $n \to \infty$
$$E[|J_{2n}|] \le \frac{8}{n} \sum_{t=3}^{n} \sum_{s=2}^{t-1} \sum_{v=1}^{s-1} E\left[\pi(x_s)\, \pi(x_v)\, L\!\left(\frac{x_t - x_s}{h}\right) L\!\left(\frac{x_t - x_v}{h}\right) |e_s e_v|\, e_t^2\right]$$
$$\le C(1 + o(1)) \cdot \frac{8h^2}{n} \sum_{t=3}^{n} \sum_{s=2}^{t-1} \sum_{v=1}^{s-1} \frac{1}{\sqrt{v-1}\, \sqrt{s-v-1}\, \sqrt{t-s-1}} \cdot \int_{-\infty}^{\infty} \pi^2(v_5)\,dv_5 \cdot \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} L(v_3)\, L(v_3 + v_4)\,dv_4\,dv_3 \cdot \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |v_1 v_2|\, p(v_1, v_2)\,dv_1\,dv_2 = O\!\left(\sqrt{n}\, h^2\right) = o(1) \qquad (A.16)$$
when $nh^4 = o(1)$. Meanwhile, it is obvious from the martingale structure that
$$J_{3n} = 0. \qquad (A.17)$$
Equations (A.12)-(A.16) then imply that as $n \to \infty$
$$E[S_{2n1}^2] = o(1). \qquad (A.18)$$
In a similar way to the derivations of (A.12)-(A.16), using Assumption A.2(iv) in particular, we have as $n \to \infty$
$$E[S_{2n2}^2] \le C h, \qquad (A.19)$$
which implies $S_{2n2} = o_P(1)$. The proof of Theorem 2.1 then follows from equations (A.6), (A.7) and (A.17)-(A.19).

7.2 Appendix B

This appendix gives the proofs of Lemmas A.1-A.3, and the derivations of equations (2.19) and (2.20) are given in the last part of this appendix.

We first introduce a very useful lemma, which has been used in the verification of Assumption A.1(iii). The proof of Lemma B.1 below follows from some standard conditional central limit theorems (see, for example, Awad 1981; Denker and Gordin 2003). Let $\{e_j\}$ satisfy Assumption A.1(i), let $\hat{\phi}_k(x)$ be the probability density function of $L_k = \frac{1}{\sigma_e \sqrt{k}} \sum_{j=1}^{k} e_j$, and let $\hat{\phi}_k(x \mid \mathcal{F}_{k-1})$ be the conditional probability density function of $L_k$ given $\mathcal{F}_{k-1}$, where $\{\mathcal{F}_k\}$ is the sequence of $\sigma$-fields generated by $\{e_i, 1 \le i \le k\}$ and $\sigma_e^2$ is the same as in Assumption A.1(i).

Lemma B.1. Under Assumption A.1(i)(ii), we have as $k \to \infty$
$$\sup_{x \in R^1} \left|\hat{\phi}_k(x) - \phi(x)\right| \to 0 \quad \text{and} \quad \sup_{x \in R^1} \left|\hat{\phi}_k(x \mid \mathcal{F}_{k-1}) - \phi(x)\right| \to 0 \ \text{almost surely}, \qquad (B.1)$$
where $\phi(\cdot)$ is the probability density of the standard normal random variable $U \sim N(0, 1)$.

Proof of Lemma A.1. Recall that under $H_0$: $\hat{e}_t = y_t - x_t = y_t - y_{t-1} = e_t$. Thus the verification of Lemma A.1 follows trivially.

Proof of Lemma A.2. We only prove equation (A.4), as the proof of (A.3) follows similarly. Let
$$\hat{\varepsilon}_t(x) = \hat{e}_t\, K\!\left(\frac{x_t - x}{h}\right) \quad \text{and} \quad \hat{e}_t = e_t + \Delta_n(x_t) \qquad (B.2)$$
under $H_1$, where $\Delta_n(x) = \delta_n \Delta(x)$ is the same as defined in (2.17) and (2.18). Then we have under $H_1$:
$$\hat{T}_n = \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \hat{e}_s \hat{e}_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx$$
$$= \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} e_s e_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx + \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \Delta_n(x_t)\, e_s \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx$$
$$+ \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \Delta_n(x_s)\, e_t \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx + \frac{1}{\sqrt{n}\,h} \sum_{t=1}^{n} \sum_{s=1}^{n} \Delta_n(x_s)\, \Delta_n(x_t) \int_{-\infty}^{\infty} K\!\left(\frac{x_t - x}{h}\right) K\!\left(\frac{x_s - x}{h}\right) \pi(x)\,dx \equiv \sum_{j=1}^{4} \hat{T}_{jn}. \qquad (B.3)$$
We will show that under $H_1$:
$$\hat{T}_{kn} = o_P\!\left(\delta_n^2\, \sqrt{n}\,h\right) \quad \text{for } k = 2, 3. \qquad (B.4)$$
We need only prove (B.4) for either $k = 2$ or $k = 3$. Meanwhile, similarly to the derivations in (A.14), in order to deal with $\hat{T}_{3n}$ it suffices to show that as $n \to \infty$
$$E\left[\left(\delta_n \sum_{t=1}^{n} \left(\sum_{s=1}^{n} \Delta(x_s)\, \pi(x_s)\, L\!\left(\frac{x_t - x_s}{h}\right)\right) e_t\right)^2\right] = C(1 + o(1))\, \delta_n^2\, nh, \qquad (B.5)$$
which follows similarly from the proof of (A.14).
Equation (B.5) then implies that as n → ∞   q √  √ b T3n = OP δn nh = oP δn2 nh , (B.6) which, along with the derivations in (B.3)–(B.6), shows that under H1 , as n → ∞    xs − x xt − x K π(x)dx h h −∞     Z ∞ n n xs − x 1 XX xt − x √ K π(x)dx + oP (1), ≥ ∆n (xs )∆n (xt ) · K h h nh −∞ Tbn = Z ∞ n n 1 XX √ ebs ebt · K nh t=1 s=1  (B.7) t=1 s=1 which completes the proof of Lemma A.2. Proof of Lemma A.3. In view of existing results (such as, Theorem 2.1 of Wang and Phillips 2009), in order to prove Lemma A.3, it suffices to show that as n → ∞ n n n   1 X 1 X 1 X √ ψ(xt )e2t = √ ψ(xt )E[e2t ] + √ ψ(xt ) e2t − E e2t n t=1 n t=1 n t=1 n σ2 X ψ(xt ) + oP (1), = √e n t=1 (B.8) which follows from " n #2 n  2   2 1X  1 X 2 = ψ(xt ) et − E et E ψ(xt ) e2t − E e2t n n + = t=1 n X t−1 X 2 n 1 n t=2 s=1 n X t=1      2 et − E e2t E ψ(xt )ψ(xs ) e2s − E e2s   2 E ψ(xt ) e2t − E e2t t=1 n X n Z CX ∞ 2 ψ (x)pt (x)dx E ψ (xt ) = n t=1 t=1 −∞     Z ∞ n CX 1 x 1 2 √ = ψ (x)qt √ dx = O √ = o(1) n n t −∞ t t=1 C ≤ n  2  22 xt by Assumptions A.1 and A.2, where pt (·) and qt (·) denote the marginal densities of xt and √ , t   x 1 respectively, and we have used the relationship of pt (x) = √t qt √t . Equation (B.8) then completes the proof of Lemma A.3. Derivations of equations (2.19) and (2.20). Similarly to the proof of Lemma A.2, under H1 , we have as n → ∞     n X n X xt − xs xt − xs K ebs ebt = es et K h h t=1 s=1 t=1 s=1   n X n X xt − xs 2 +δn K ∆(xs ) ∆(xt ) + oP (1) h t=1 s=1   n n X X xt − xs 2 ∆(xs ) ∆(xt ) + oP (1) K ≥ δn · h n X n X t=1 s=1 ≡ where Qn (h) = Pn t=1 Pn δn2 · Qn (h) + oP (1), s=1 K xt −xs h  (B.9) ∆(xs ) ∆(xt ). Straightforward derivations imply that as n → ∞ E [Qn (h1 , h2 )] = C1 (1 + o(1)) nh (B.10) for some C1 > 0. Similarly, we may show that as n → ∞ " n n #   XX 3 2 2 xt − xs 2 2 K σ2n ≡ E es et = C2 (1 + o(1)) n 2 h. h (B.11) t=1 s=1 Equations (B.9)–(B.11) thus complete an outline of the derivation of (2.19). In view of the proof of Lemma A.2, in order to complete the derivation of (2.20), it suffices to show that as n → ∞ and for some C2 > 0     Z ∞  n n 1 XX xs − x xt − x √ K ∆n (xs )∆n (xt ) π(x)dx K E h1 h nh t=1 s=1 −∞ √ = C2 (1 + o(1)) nh, (B.12) which follows similarly to the derivations used elsewhere. Thus, we omit these details. Such details are available upon request, however. REFERENCES Awad, A. A., 1981. Conditional central limit theorems for martingales and reversed martingales. The INdian Journal of Statistics Series A 43, 100–106. Denker, M., Gordin, M., 2003. On conditional central limit theorems for stationary processes, In Lecture Notes–Monograph Series, Vol. 41, Probability, Statistics and Their Applications: Papers in Honor of Rabi Bhattacharya, pp. 133–152. 23 Gao, J., 2007. Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman & Hall, London. Gao, J., 2012. Identification, estimation and specification in a class of semi–linear time series models. Working paper available at http://ideas.repec.org/p/pra/mprapa/39256.html. Gao, J., King, M. L., 2011. A new test in linear models against nonparametric errors. Working paper available at http://www.buseco.monash.edu.au/ebs/pubs/wpapers/2011/wp20-11.pdf. Gao, J., King, M. L., Lu, D., Tjøstheim, D., 2009a. Specification testing in nonlinear and nonstationary time series autoregression. Annals of Statistics 37, 3893–3928. Gao, J., King, M. L., Lu, D., Tjøstheim, D., 2009b. Nonparametric specification testing for nonlinear time series with nonstationarity. Econometric Theory 25, 1869–1892. 
Gao, J., Tjøstheim, D., Yin, J., 2011. Estimation in threshold autoregressive models with a stationary and a unit root regime. Working paper available at http://ideas.repec.org/p/msh/ebswps/2011-21.html.

Härdle, W., Mammen, E., 1993. Comparing nonparametric versus parametric regression fits. Annals of Statistics 21, 1926-1947.

Li, D., Gao, J., Chen, J., Lin, Z., 2011. Nonparametric specification testing in nonlinear and nonstationary time series models. Working paper available at http://www.jitigao.com/page1006.aspx.

Sun, Y., Li, Q., 2012. Nonparametric and semiparametric estimation and hypothesis testing with nonstationary time series. Forthcoming in Handbook on Applied Nonparametric Econometrics and Statistics. Oxford University Press.

Wang, Q. Y., Phillips, P. C. B., 2009. Asymptotic theory for local time density estimation and nonparametric cointegrating regression. Econometric Theory 25, 710-738.

Wang, Q. Y., Phillips, P. C. B., 2012. Specification testing for nonlinear cointegrating regression. Annals of Statistics 40, 727-758.