The Annals of Statistics. DOI: 10.1214/11-AOS935. Institute of Mathematical Statistics.
Columbia University and Barclays Capital
X_t = Z_t - \theta_0 Z_{t-1}, \qquad Z_t = \sum_{j=0}^{\infty} \theta_0^j X_{t-j}.
Under this invertibility constraint, standard estimation procedures that produce asymptotically normal estimates are readily available. For example, if \hat\theta represents the maximum likelihood estimator, found by maximizing the Gaussian likelihood based on the data X_1, \ldots, X_n, then it is well known (see Brockwell and Davis [6]) that

(1.2)  \sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, 1 - \theta_0^2).

From the form of the limiting variance in (1.2), the asymptotic behavior of \hat\theta, let alone the scaling, is not immediately clear in the unit root case corresponding to \theta_0 = 1.
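In the invertible case, (1.2) is easy to check by simulation. The following sketch is illustrative only (it is not part of the paper): it estimates \theta_0 by minimizing a conditional sum of squares, a stand-in for the Gaussian MLE with the same first-order asymptotics when |\theta_0| < 1; the sample size, replication count and optimizer bounds are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.signal import lfilter

def css_theta(x):
    """Conditional-sum-of-squares estimate of theta in X_t = Z_t - theta*Z_{t-1}:
    reconstruct residuals z_t = x_t + theta*z_{t-1} (z_0 = 0), minimize sum z_t^2."""
    def css(theta):
        z = lfilter([1.0], [1.0, -theta], x)  # implements z_t = x_t + theta*z_{t-1}
        return float(np.sum(z ** 2))
    return minimize_scalar(css, bounds=(-0.99, 0.99), method="bounded").x

rng = np.random.default_rng(0)
theta0, n, reps = 0.5, 400, 200
scaled = np.empty(reps)
for r in range(reps):
    z = rng.standard_normal(n + 1)
    x = z[1:] - theta0 * z[:-1]              # X_t = Z_t - theta0*Z_{t-1}
    scaled[r] = np.sqrt(n) * (css_theta(x) - theta0)
v = scaled.var()
print(v)  # should be near 1 - theta0**2 = 0.75
```

The empirical variance of \sqrt{n}(\hat\theta - \theta_0) should sit near 1 - \theta_0^2; the agreement deteriorates, as the text explains, exactly as \theta_0 approaches 1.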
In the case that f_Z is Gaussian, the parameters \theta_0 and \sigma^2 are not identifiable without the constraint |\theta_0| \le 1. In particular, the profile Gaussian log-likelihood, obtained by concentrating out the variance parameter, satisfies

(1.3)  L_n(\theta) = L_n(1/\theta).
A motivating example arises in modeling environmental time series related to climate change [19]. After
detrending and fitting an ARMA model to the time series, Smith noticed
that the MA component appeared to have a unit root. One explanation for
this phenomenon is that detrending often involves the application of a high-pass filter to the time series. In particular, the filter diminishes or obliterates any power in the time series at low frequencies (including the zero frequency). Consequently, the detrended data will have a spectrum with zero power at frequency 0, which can only be fitted with an ARMA process that has a unit root
in the MA component. While we only consider unit roots in higher order
moving averages in this paper, we believe the techniques developed here will
be applicable in the more general framework of ARMA models. This will be
the subject of future investigation.
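The filtering argument above can be illustrated directly (an informal sketch, not from the paper): differencing a series with no trend produces lag-one autocorrelation -0.5, the boundary value attained only by an MA(1) with a unit root, and annihilates the periodogram at frequency zero.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(100_000)  # a trend-free series: pure white noise
x = np.diff(z)                    # differencing filter: X_t = Z_t - Z_{t-1}

# An MA(1) X_t = Z_t - theta*Z_{t-1} has rho(1) = -theta/(1 + theta^2);
# theta = 1 gives rho(1) = -0.5, the boundary of the invertible region.
rho1 = np.corrcoef(x[:-1], x[1:])[0, 1]

# Periodogram ordinate at frequency 0 is |sum(x)|^2 / n; the sum telescopes
# to z[-1] - z[0], so the filter has removed essentially all power at 0.
power0 = abs(x.sum()) ** 2 / len(x)
print(rho1, power0)  # rho1 near -0.5, power0 near 0
```

No invertible MA(1) can reproduce a lag-one autocorrelation of exactly -0.5, which is why the fitted MA component is pushed onto the unit root.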
In this paper, we will use the stochastic approaches described in [4]
and [13] to first study the case when there is a regression component in
the time series and the errors are generated from a noninvertible MA(1) process. A vital
issue in extending these results to higher order MA models is the scaling
required for the auxiliary variable. The scaling used for the regression problem in the MA(1) case provides insight into the way in which the auxiliary
variable should be scaled in the higher order case. Quite surprisingly, when
there is only one unit root in the MA(2) process, that is,

(1.4)  X_t = Z_t + c_1 Z_{t-1} + c_2 Z_{t-2}

with exactly one root of the MA polynomial 1 + c_1 z + c_2 z^2 on the unit circle, the maximum likelihood estimators of c_1 and c_2 remain jointly asymptotically normal:

(1.5)  \sqrt{n}\begin{pmatrix}\hat c_1 - c_1\\ \hat c_2 - c_2\end{pmatrix} \xrightarrow{d} N\Bigg(0,\ \begin{pmatrix}1-c_2^2 & c_1(1-c_2)\\ c_1(1-c_2) & 1-c_2^2\end{pmatrix}\Bigg).

One difference, however, is that \hat c_1 and \hat c_2 are now totally dependent asymptotically: the covariance matrix in (1.5) is singular, since c_1^2(1-c_2)^2 = (1-c_2^2)^2 when the MA polynomial has a root at \pm 1.
As seen from (1.3), the first derivative of the profile likelihood function is always 0 when \theta = 1. Therefore, the development of typical score or Wald tests is intractable in this case. Davis, Chen and Dunsmuir [9] used the asymptotic result from [10] to develop a test of H_0: \theta = 1 based on the MLE
and the generalized likelihood ratio. Interestingly, we will see that the estimator of the unit root in the MA(2) case has the same limit distribution
as the corresponding estimator in the MA(1) case. Thus, we can extend the
methods used in the MA(1) case to test for unit roots in the MA(2) case.
The paper is organized as follows. In Section 2, we demonstrate our
method of proof applied to the MA(1) model with regression. This case
plays a key role in the extension to higher order MAs. Section 3 contains
the results for the unit root problem in the MA(2) case. In Section 4, we
compare likelihood-based tests with Tanaka's locally best invariant and unbiased (LBIU) test [20] for testing the presence of a unit root. It is shown
that the likelihood ratio test performs quite well in comparison to the LBIU
test. In Section 5, numerical simulation results are presented to illustrate
the theory of Section 3. In Section 6, there is a brief discussion that connects the auxiliary variables in higher order MAs with terms in a regression
model with MA(1) errors. Finally, in Section 7, the procedure for handling
the MA(q) case with q \ge 3 is outlined. It is shown that the tools used in
the MA(1) and MA(2) cases are still applicable and are, in fact, sufficient
in dealing with higher order cases.
2. MA(1) with nonzero mean. In this section, we will extend the methods of Breidt et al. [4] and Davis and Song [13] to a regression model with
MA(1) errors. These results turn out to have connections with the asymptotics in the higher order unit root cases (see Section 6). First, consider the
model
(2.1)  X_t = \sum_{k=0}^{p} b_{k0} f_k(t/n) + Z_t - \theta_0 Z_{t-1},
Writing Y_i := X_i - \sum_{k=0}^{p} b_k f_k(i/n), the residuals in the unit root case \theta_0 = 1 satisfy

z_i = Y_i + \theta Y_{i-1} + \cdots + \theta^{i-1} Y_1 + \theta^i z_{init}
 = \Big(X_i - \sum_{k=0}^{p} b_k f_k(i/n)\Big) + \theta\Big(X_{i-1} - \sum_{k=0}^{p} b_k f_k((i-1)/n)\Big) + \cdots + \theta^{i-1}\Big(X_1 - \sum_{k=0}^{p} b_k f_k(1/n)\Big) + \theta^i z_{init}
 = \Big(Z_i - Z_{i-1} + \sum_{k=0}^{p}(b_{k0}-b_k) f_k(i/n)\Big) + \cdots + \theta^{i-1}\Big(Z_1 - Z_0 + \sum_{k=0}^{p}(b_{k0}-b_k) f_k(1/n)\Big) + \theta^i z_{init}
 = Z_i - (1-\theta)\sum_{j=1}^{i-1}\theta^{i-1-j} Z_j + \theta^{i-1}(\theta z_{init} - Z_0) + \sum_{k=0}^{p}(b_{k0}-b_k)\sum_{j=1}^{i}\theta^{i-j} f_k(j/n)
 := Z_i - y_i + \sum_{k=0}^{p}(b_{k0}-b_k)\sum_{j=1}^{i}\theta^{i-j} f_k(j/n)
 := Z_i - w_i,

where

(2.2)  \theta = 1 + \frac{\beta}{n}, \quad \beta \le 0, \qquad \text{and} \qquad z_{init} = Z_0 + \frac{\sigma_0\gamma}{\sqrt{n}}.
Further set

(2.3)  b_k = b_{k0} + \frac{\sigma_0 \tau_k}{n^{3/2}}.
Note that (2.3) essentially characterizes the convergence rate of the estimated b_k to its true value b_{k0}. At first glance, this parameterization may look odd, since it depends on the true parameter values, which are unavailable. This reparameterization is used only for deriving the asymptotic theory of the maximum likelihood estimators and not for estimation purposes. One notes that \beta = n(\theta - 1) and \tau_k = n^{3/2}(b_k - b_{k0})/\sigma_0, so that the asymptotics of the MLEs \hat\theta and \hat b_k are obtained from the limiting behavior of \hat\beta = n(\hat\theta - 1) and \hat\tau_k = n^{3/2}(\hat b_k - b_{k0})/\sigma_0. Hence, it is not necessary to know the true values in this analysis. The scaling n^{3/2} for the regression coefficients is an artifact of the assumption that the regressors take the form f_k(t/n). This also results in a clean expression for the limit.
Under the (\vec\tau, \beta, \gamma) parameterization, it is easily seen [13] that minimizing l_n(\vec b, \theta, z_{init}), the sum of squared residuals \sum_{i=0}^{n} z_i^2, with respect to (\vec b, \theta, z_{init}) is equivalent to minimizing the function

(2.4)  U_n(\vec\tau, \beta, \gamma) := \frac{1}{\sigma_0^2}\,[\,l_n(\vec b, \theta, z_{init}) - l_n(\vec b_0, 1, Z_0)\,]

with respect to (\vec\tau, \beta, \gamma). Then, using the weak convergence results in Davis and Song [13],

U_n(\vec\tau, \beta, \gamma) = \frac{1}{\sigma_0^2}\sum_{i=0}^{n}(z_i^2 - Z_i^2) = -2\sum_{i=0}^{n}\frac{w_i Z_i}{\sigma_0^2} + \sum_{i=0}^{n}\frac{w_i^2}{\sigma_0^2}
 \Rightarrow 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\gamma\int_0^1 e^{\beta s}\,dW(s) - 2\sum_{k=0}^{p}\tau_k\int_0^1\!\int_0^s e^{\beta(s-t)} f_k(t)\,dt\,dW(s)
 + \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \gamma e^{\beta s} - \sum_{k=0}^{p}\tau_k\int_0^s e^{\beta(s-t)} f_k(t)\,dt\Big)^2 ds
 := U(\vec\tau, \beta, \gamma),

where \Rightarrow indicates weak convergence on C(\mathbb{R}^{p+1}\times(-\infty,0]\times\mathbb{R}). Throughout this paper, when referring to convergence of stochastic processes on C(\mathbb{R}^k), we mean weak convergence with respect to the topology of uniform convergence on compact sets. For polynomial regressors f_k(t) = t^k, the limit specializes to

U(\vec\tau, \beta, \gamma) = 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\gamma\int_0^1 e^{\beta s}\,dW(s) - 2\sum_{k=0}^{p}\tau_k\int_0^1\!\int_0^s e^{\beta(s-t)} t^k\,dt\,dW(s) + \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \gamma e^{\beta s} - \sum_{k=0}^{p}\tau_k\int_0^s e^{\beta(s-t)} t^k\,dt\Big)^2 ds.
From now on, we consider the simple case of just a nonzero mean, that is, p = 0 and f_0(t) = 1. The formula further simplifies to

(2.5)  U(\tau_0, \beta, \gamma) = 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\gamma\int_0^1 e^{\beta s}\,dW(s) + 2\tau_0\int_0^1 \frac{1-e^{\beta s}}{\beta}\,dW(s) + \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \gamma e^{\beta s} + \tau_0\frac{1-e^{\beta s}}{\beta}\Big)^2 ds.
As shown in [13], one can recover the exact likelihood by integrating out the initial-parameter effects. More specifically,

f(x_n, z_{init}) = \prod_{t=0}^{n} f(z_t) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{(n+1)/2}\exp\Big\{-\frac{\sum_{t=0}^{n} z_t^2}{2\sigma^2}\Big\}
 = \Big(\frac{1}{2\pi\sigma^2}\Big)^{(n+1)/2}\exp\Big\{-\frac{l_n(\vec b,\theta,z_{init}) - l_n(\vec b_0,1,Z_0) + \sum_{t=0}^{n} Z_t^2}{2\sigma^2}\Big\}
 = \Big(\frac{1}{2\pi\sigma^2}\Big)^{(n+1)/2}\exp\Big\{-\frac{\sum_{t=0}^{n} Z_t^2}{2\sigma^2}\Big\}\exp\Big\{-\frac{U_n(\tau_0,\beta,\gamma)\,\sigma_0^2}{2\sigma^2}\Big\},

so that, integrating over the initial value (with dz_{init} = (\sigma_0/\sqrt{n})\,d\gamma),

(2.6)  f(x_n) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{(n+1)/2}\exp\Big\{-\frac{\sum_{t=0}^{n} Z_t^2}{2\sigma^2}\Big\}\,\frac{\sigma_0}{\sqrt{n}}\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U_n(\tau_0,\beta,\gamma)\,\sigma_0^2}{2\sigma^2}\Big\}\,d\gamma.
A similar argument as in [13] then shows that, by profiling out the variance parameter \sigma^2, the exact profile log-likelihood L_n(\tau_0, \beta) has the following property:

(2.7)  L_n(\tau_0,\beta) - L_n(\tau_0,0) \xrightarrow{d} L(\tau_0,\beta) := \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\tau_0,\beta,\gamma)}{2}\Big\}\,d\gamma - \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\tau_0,0,\gamma)}{2}\Big\}\,d\gamma.
The weak convergence results on C(\mathbb{R}^2) in (2.7) can be used to show convergence in distribution of a sequence of local maximizers of the objective functions L_n to the maximizer of the limit process L, provided the latter is unique almost surely. This is the content of Remark 1 (see also Lemma 2.2) of Davis, Knight and Liu [12], which, for ease of reference, we state in a version here.

Remark 2.1. Suppose \{L_n(\xi)\} is a sequence of stochastic processes which converge in distribution to L(\xi) on C(\mathbb{R}^k). If L has a unique maximizer \hat\xi a.s., then there exists a sequence of local maximizers \{\hat\xi_n\} of \{L_n\} that converge in distribution to \hat\xi. Note that this is consistent with many of the statements made in the classical theory for maximum likelihood (see, e.g., Theorem 7.1.1 of Lehmann [15]) and for inference in nonstandard time series models; see Theorems 8.2.1 and 8.6.1 in Rosenblatt [16], Breidt et al. [5], Andrews et al. [3] and Andrews et al. [2]. In some cases, for example, if the \{L_n\} have concave sample paths, this can be strengthened to convergence of the global maximizers of L_n. See also Davis, Chen and Dunsmuir [9], Davis and Dunsmuir [11] and Breidt et al. [5] for examples of other cases where the \{L_n\} are not concave.
Returning to our example, under the case \theta_0 = 1, that is, \beta = 0, the limit of the exact likelihood is L(\tau_0, \beta = 0). This corresponds to the situation of inference about the mean term when it is known that the driving noise is an MA(1) process with a unit root. Since the Gaussian likelihood is a quadratic function of the regression coefficients, L(\tau_0, \beta = 0) is a quadratic function in \tau_0. Applying Remark 2.1, we obtain that the MLE \hat\tau_0 converges in distribution to \tilde\tau_0, the global maximizer of L(\tau_0, \beta = 0). In particular, \tilde\tau_0 is the value that makes \partial L(\tau_0, \beta = 0)/\partial\tau_0 = 0. Since

\frac{\partial L(\tau_0,\beta=0)}{\partial\tau_0} = -\frac{\int_{-\infty}^{+\infty}\exp\{-U(\tau_0,\beta=0,\gamma)/2\}\,\big(\tfrac{1}{2}\,\partial U(\tau_0,\beta=0,\gamma)/\partial\tau_0\big)\,d\gamma}{\int_{-\infty}^{+\infty}\exp\{-U(\tau_0,\beta=0,\gamma)/2\}\,d\gamma},

where

U(\tau_0,\beta=0,\gamma) = 2\gamma W(1) - 2\tau_0\int_0^1 s\,dW(s) + \int_0^1(\gamma - s\tau_0)^2\,ds

and

\frac{\partial U(\tau_0,\beta=0,\gamma)}{\partial\tau_0} = -2\int_0^1 s\,dW(s) + 2\int_0^1 s(s\tau_0-\gamma)\,ds.

Solving \partial L(\tau_0,\beta=0)/\partial\tau_0 = 0, we find that

(2.8)  \tilde\tau_0 = 12\int_0^1 s\,dW(s) - 6W(1) \sim N(0, 12)

and hence

(2.9)  n^{3/2}(\hat b_{0,n} - b_0) = \sigma_0\,\hat\tau_{0,n} \xrightarrow{d} N(0, 12\sigma_0^2).
This counterintuitive result was also obtained earlier by Chen et al. [8]. It says that the MLE of the mean term in the process is asymptotically normal, but with the faster convergence rate n^{3/2}. Notice that, even if one does not know the true value of \theta, the MLE of the mean term would still behave very much like (2.9), due to the large pile-up effect in this case. However, the MLE is not asymptotically normal if both b_0 and \theta are estimated.
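The limit law in (2.9) is easy to verify numerically by discretizing the Brownian functional. The sketch below (illustrative, not from the paper; grid sizes are arbitrary) simulates \tilde\tau_0 = 12\int_0^1 s\,dW(s) - 6W(1) and checks that its variance is close to 12, consistent with N(0,12).

```python
import numpy as np

rng = np.random.default_rng(2)
m, reps = 500, 5000
s = (np.arange(m) + 0.5) / m                      # midpoint grid on [0, 1]
dW = rng.standard_normal((reps, m)) / np.sqrt(m)  # Brownian increments on the grid
tau0 = 12.0 * dW @ s - 6.0 * dW.sum(axis=1)       # 12 * int_0^1 s dW(s) - 6 * W(1)
v = tau0.var()
print(v)  # Var = int_0^1 (12s - 6)^2 ds = 12
```

Since 12\int s\,dW - 6W(1) = \int(12s-6)\,dW, its variance is \int_0^1(12s-6)^2\,ds = 12 by the Ito isometry, which the Monte Carlo estimate reproduces up to sampling error.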
3. MA(2) with unit roots. The above approach, which also works in the
invertible case, does not rely on detailed knowledge of the form of the eigenvectors and eigenvalues of the covariance matrix. Hence it has the potential
to work in higher order models, where the eigenvector and eigenvalue structure is more complicated.

Fig. 1. The parameter region of the MA(2) model.

The two roots are parameterized by

\theta = 1 + \frac{\beta}{n}, \quad \beta \le 0, \qquad \text{and} \qquad \alpha = \alpha_0 + \frac{\delta}{\sqrt{n}}, \quad \delta \in \mathbb{R}.
For convenience, define the intermediate process Y_t = (1-\alpha_0 B)Z_t and observe that

X_t = (1-\theta_0 B)(1-\alpha_0 B)Z_t = (1-\theta_0 B)Y_t.

In the MA(2) case, two augmented initial variables Z_{init} and Y_{init} are needed. These initial variables and the joint likelihood have a simple form, that is,

(3.1)  Z_{init} = Z_{-1}, \qquad Y_{init} = Y_0, \qquad f_{X,Y_{init},Z_{init}}(x_n, y_{init}, z_{init}) = \prod_{j=-1}^{n} f_Z(z_j).
As was shown in the MA(1) case, the key to our method is to calculate a formula for the residual r_i := Z_i - z_i, which can be obtained from

z_i = y_i + \alpha y_{i-1} + \cdots + \alpha^{i-1} y_1 + \alpha^i y_{init} + \alpha^{i+1} z_{init}
(3.2)  = \sum_{j=1}^{i}\frac{\theta^{i-j+1}-\alpha^{i-j+1}}{\theta-\alpha}\,X_j + \frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}\,y_{init} + \alpha^{i+1} z_{init}
 = Z_i - \frac{(\theta-\theta_0)(\theta-\alpha_0)}{\alpha-\theta}\sum_{j=1}^{i-1}\theta^{i-j-1}Z_j - \frac{(\alpha-\theta_0)(\alpha-\alpha_0)}{\theta-\alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j + \frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}(y_{init}-Y_0) + \alpha^{i+1}(z_{init}-Z_{-1}) + (\alpha-\alpha_0)\frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}Z_{-1}
(3.3)  = Z_i - r_i,

where the second equality in (3.2) comes from the fact that X_j = Z_j - (\theta_0+\alpha_0)Z_{j-1} + \theta_0\alpha_0 Z_{j-2} and Y_0 = Z_0 - \alpha_0 Z_{-1}. Therefore, the residuals r_i are given by

(3.4)  r_i = \frac{(\theta-\theta_0)(\theta-\alpha_0)}{\alpha-\theta}\sum_{j=1}^{i-1}\theta^{i-j-1}Z_j + \frac{(\alpha-\theta_0)(\alpha-\alpha_0)}{\theta-\alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j - \frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}(y_{init}-Y_0) - \alpha^{i+1}(z_{init}-Z_{-1}) - (\alpha-\alpha_0)\frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}Z_{-1}.

Notice that the residuals r_i no longer have a neat form as in the MA(1) case. This is what makes the MA(2) case more interesting, yet more complicated. In the following calculations, let

y_{init} = Y_0 + \frac{\sigma_0\gamma_1}{\sqrt{n}} \qquad \text{and} \qquad z_{init} = Z_{-1} + \frac{\sigma_0\gamma_2}{\sqrt{n}}.

Minimizing the joint likelihood is then equivalent to minimizing

(3.5)  U_n(\beta,\delta,\gamma_1,\gamma_2) = -2\sum_{i=1}^{n}\frac{r_i Z_i}{\sigma_0^2} + \sum_{i=1}^{n}\frac{r_i^2}{\sigma_0^2}.
For the computation of the weak limit, decompose r_i = A_i + B_i + C_i + D_i, where

A_i := \frac{(\theta-\theta_0)(\theta-\alpha_0)}{\alpha-\theta}\sum_{j=1}^{i-1}\theta^{i-j-1}Z_j - \frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}(y_{init}-Y_0),
B_i := \frac{(\alpha-\theta_0)(\alpha-\alpha_0)}{\theta-\alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j,
C_i := -\alpha^{i+1}(z_{init}-Z_{-1}),
D_i := -(\alpha-\alpha_0)\frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}Z_{-1}.

To determine the weak limit of -2\sum_{i=1}^{n} r_i Z_i/\sigma_0^2 in (3.5) in the continuous function space, note that

(3.6)  -2\sum_{i=1}^{n}\frac{A_i Z_i}{\sigma_0^2} = -2\frac{(\theta-\theta_0)(\theta-\alpha_0)}{\alpha-\theta}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\frac{\theta^{i-j-1}Z_j Z_i}{\sigma_0^2} + \frac{2\gamma_1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}\,\frac{Z_i}{\sigma_0}
 \Rightarrow 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + \frac{2\gamma_1}{1-\alpha_0}\int_0^1 e^{\beta s}\,dW(s),

where the \alpha^{i+1} part of the last sum disappears in the limit due to the fact that |\alpha_0| < 1.
Similarly, we have

-2\sum_{i=1}^{n}\frac{B_i Z_i}{\sigma_0^2} = -2\frac{(\alpha-\theta_0)(\alpha-\alpha_0)}{\theta-\alpha}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\frac{\alpha^{i-j-1}Z_j Z_i}{\sigma_0^2}
(3.7)  = \frac{2\delta}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\frac{\alpha_0^{i-j-1}Z_j Z_i}{\sigma_0^2} + o_p(1)
(3.8)  \Rightarrow 2\delta N,

where N \sim N(0, 1/(1-\alpha_0^2)). The second equality holds because |\alpha_0| is strictly smaller than 1, and the o_p(1) term is uniform in \delta on any compact set of \mathbb{R}. The weak convergence from (3.7) to (3.8) follows from a martingale central limit theorem; see Hall and Heyde [14]. It can also be shown that N and the W(t) process from (3.6) are independent; see Theorem 2.2 in Chan and Wei [7].
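The martingale-CLT step from (3.7) to (3.8) can be illustrated numerically (an informal sketch with arbitrary parameter choices, not from the paper): the scaled double sum n^{-1/2}\sum_i\sum_{j<i}\alpha_0^{i-j-1}Z_jZ_i/\sigma_0^2 should be approximately normal with variance 1/(1-\alpha_0^2).

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
alpha0, n, reps = 0.3, 2000, 2000
z = rng.standard_normal((reps, n))
# acc[:, i] = sum_{j <= i} alpha0**(i-j) * z[:, j]; shifting by one lag gives
# a[:, i] = sum_{j < i} alpha0**(i-1-j) * z[:, j], the inner sum of (3.7).
acc = lfilter([1.0], [1.0, -alpha0], z, axis=1)
a = np.concatenate([np.zeros((reps, 1)), acc[:, :-1]], axis=1)
S = (z * a).sum(axis=1) / np.sqrt(n)
v = S.var()
print(v)  # should be near 1/(1 - alpha0**2), about 1.099 for alpha0 = 0.3
```

The variance calculation behind the assertion is elementary: E[S^2] = n^{-1}\sum_i\sum_{j<i}\alpha_0^{2(i-1-j)} \to 1/(1-\alpha_0^2).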
Following similar arguments, it is easy to show that

-2\sum_{i=1}^{n}\frac{C_i Z_i}{\sigma_0^2} \xrightarrow{p} 0 \qquad \text{and} \qquad -2\sum_{i=1}^{n}\frac{D_i Z_i}{\sigma_0^2} \xrightarrow{p} 0.

For the quadratic term, write

\sum_{i=1}^{n}\frac{r_i^2}{\sigma_0^2} = \sum_{i=1}^{n}\frac{A_i^2+B_i^2+C_i^2+D_i^2}{\sigma_0^2} + \sum_{i=1}^{n}\frac{2A_iB_i+2A_iC_i+2A_iD_i+2B_iC_i+2B_iD_i+2C_iD_i}{\sigma_0^2}.

For the squared terms,

(3.9)  \sum_{i=1}^{n}\frac{A_i^2}{\sigma_0^2} \Rightarrow \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \frac{\gamma_1}{1-\alpha_0}e^{\beta s}\Big)^2 ds,

(3.10)  \sum_{i=1}^{n}\frac{B_i^2}{\sigma_0^2} \xrightarrow{p} \delta^2\,\mathrm{var}(N),

and

(3.11)  \sum_{i=1}^{n}\frac{C_i^2}{\sigma_0^2} \xrightarrow{p} 0 \qquad \text{and} \qquad \sum_{i=1}^{n}\frac{D_i^2}{\sigma_0^2} \xrightarrow{p} 0.
Next we show that all the cross-product terms also vanish in the limit, namely,

(3.12)  \sum_{i=1}^{n}\frac{2A_iB_i+2A_iC_i+2A_iD_i+2B_iC_i+2B_iD_i+2C_iD_i}{\sigma_0^2} \xrightarrow{p} 0.

Here we only give the details for showing \sum_{i=1}^{n} A_iB_i/\sigma_0^2 \xrightarrow{p} 0; the other cases can be proved in an analogous manner. Notice that for any fixed M > 0 and any \beta \in [-M, 0],
(3.13)  \sum_{i=1}^{n}\frac{A_iB_i}{\sigma_0^2} = \frac{(\beta/n)(1-\alpha_0+\beta/n)(\delta/\sqrt{n})(1-\alpha_0-\delta/\sqrt{n})}{(1-\alpha_0+\beta/n-\delta/\sqrt{n})^2}\sum_{i=1}^{n}\Big(\sum_{j=1}^{i-1}\theta^{i-j-1}\frac{Z_j}{\sigma_0}\Big)\Big(\sum_{j=1}^{i-1}\alpha^{i-j-1}\frac{Z_j}{\sigma_0}\Big)
 + \frac{\gamma_1(\delta/\sqrt{n})(1-\alpha_0-\delta/\sqrt{n})}{(1-\alpha_0+\beta/n-\delta/\sqrt{n})^2}\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big(\theta^{i+1}-\alpha^{i+1}\big)\sum_{j=1}^{i-1}\alpha^{i-j-1}\frac{Z_j}{\sigma_0}.

After expanding \theta = 1+\beta/n and \alpha = \alpha_0+\delta/\sqrt{n}, the right-hand side reduces, up to bounded constants and o_p(1) terms that are uniform in \beta on [-M,0], to multiples of the three quantities

\frac{1}{n^{3/2}}\sum_{i=1}^{n}\Big(\sum_{j=1}^{i-1}\Big(1+\frac{\beta}{n}\Big)^{i-j-1}\frac{Z_j}{\sigma_0}\Big)\Big(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\Big), \qquad \frac{1}{n}\sum_{i=1}^{n}\Big(1+\frac{\beta}{n}\Big)^{i+1}\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0} \qquad \text{and} \qquad \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha_0^{2i-j}\frac{Z_j}{\sigma_0}.
To handle these three terms, define R_i := \sum_{j=1}^{i}\alpha_0^{i-j}Z_j/\sigma_0 and

S_n(t) := \frac{1}{\sqrt{n}}\sum_{i=0}^{[nt]} R_i \xrightarrow{d} \kappa S(t),

where \kappa = \sum_{l=0}^{\infty}\alpha_0^{l} = \frac{1}{1-\alpha_0} and S(t) is a standard Brownian motion. Also, R_i is adapted to the \sigma-fields \mathcal{F}_i generated by Z_0,\ldots,Z_i. By Theorem 2.1 in [13], we obtain

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Big(1+\frac{\beta}{n}\Big)^{i+1} R_{i-1} \xrightarrow{d} \kappa\int_0^1 e^{\beta s}\,dS(s) \qquad \text{on } C[-M,0].

Therefore, after the additional division by \sqrt{n},

(3.14)  \frac{1}{n}\sum_{i=1}^{n}\Big(1+\frac{\beta}{n}\Big)^{i+1} R_{i-1} \xrightarrow{p} 0 \qquad \text{on } C[-M,0].
Similarly, with R_{i-1} = \sum_{j=1}^{i-1}\alpha_0^{i-1-j}Z_j/\sigma_0,

(3.15)  \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha_0^{2i-j}\frac{Z_j}{\sigma_0} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\alpha_0^{i+1} R_{i-1} \xrightarrow{p} 0,

since |\alpha_0| < 1. Finally, the double sum

(3.16)  \frac{1}{n}\sum_{i=1}^{n} R_{i-1}\sum_{j=1}^{i-1}\Big(1+\frac{\beta}{n}\Big)^{i-j-1}\frac{Z_j}{\sigma_0}

is in the form of the double sum in Theorem 2.8 in [13], except that \{R_i\} is no longer a martingale difference sequence. However, we can still follow the proof of Theorem 2.8 in [13] and show that (3.16) has a nondegenerate weak limit in C[-M,0]. It follows that

(3.17)  \frac{1}{n^{3/2}}\sum_{i=1}^{n}\Big(\sum_{j=1}^{i-1}\Big(1+\frac{\beta}{n}\Big)^{i-j-1}\frac{Z_j}{\sigma_0}\Big)\Big(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\Big) = \frac{1}{\sqrt{n}}\cdot\frac{1}{n}\sum_{i=1}^{n} R_{i-1}\sum_{j=1}^{i-1}\Big(1+\frac{\beta}{n}\Big)^{i-j-1}\frac{Z_j}{\sigma_0} \xrightarrow{p} 0.
Thus, combining (3.14), (3.15) and (3.17), we conclude that the terms in (3.13) go to 0 in probability on C[-M,0]. The convergence in probability of the other terms in (3.12) can also be proved in a similar way. To sum up, we have shown the key stochastic process convergence result, that is,
(3.18)  U_n(\beta,\delta,\gamma_1,\gamma_2) \xrightarrow{d} U(\beta,\delta,\gamma_1)
 = 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + \frac{2\gamma_1}{1-\alpha_0}\int_0^1 e^{\beta s}\,dW(s) + \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \frac{\gamma_1}{1-\alpha_0}e^{\beta s}\Big)^2 ds + 2\delta N + \delta^2\,\mathrm{var}(N).
Using (3.18), one can easily derive the asymptotics for the exact profile log-likelihood, denoted by L_n(\beta,\delta). In particular,

(3.19)  L_n(\beta,\delta) - L_n(0,0) \xrightarrow{d} \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\beta,\delta,\gamma_1)}{2}\Big\}\,d\gamma_1 - \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(0,0,\gamma_1)}{2}\Big\}\,d\gamma_1 := L_\infty(\beta,\delta)

(3.20)  = -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\beta,\gamma)}{2}\Big\}\,d\gamma - \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(0,\gamma)}{2}\Big\}\,d\gamma,

where \gamma = \gamma_1/(1-\alpha_0) and U(\beta,\gamma) is given by

(3.21)  U(\beta,\gamma) = 2\beta\int_0^1\!\int_0^s e^{\beta(s-t)}\,dW(t)\,dW(s) + 2\gamma\int_0^1 e^{\beta s}\,dW(s) + \int_0^1\Big(\beta\int_0^s e^{\beta(s-t)}\,dW(t) + \gamma e^{\beta s}\Big)^2 ds,

which is the limiting process of the joint likelihood obtained in the unit root MA(1) case; see also Davis and Song [13]. We state the key result of this paper in the following theorem.
Theorem 3.1. Consider the model given in (1.4) with two roots \theta and \alpha, which are parameterized by

\theta = 1 + \frac{\beta}{n} \qquad \text{and} \qquad \alpha = \alpha_0 + \frac{\delta}{\sqrt{n}}.

Then

L_n(\beta,\delta) - L_n(0,0) \Rightarrow L_\infty(\beta,\delta) \qquad \text{on } C((-\infty,0]\times\mathbb{R}),

where

(3.22)  L_\infty(\beta,\delta) = -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + U_\infty(\beta) \stackrel{d}{=} -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + Z_0(\beta),

(3.23)  U_\infty(\beta) = \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\beta,\gamma)}{2}\Big\}\,d\gamma - \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(0,\gamma)}{2}\Big\}\,d\gamma

and

(3.24)  Z_0(\beta) = \sum_{k=1}^{\infty}\log\frac{\pi^2k^2}{\pi^2k^2+\beta^2} + \sum_{k=1}^{\infty}\frac{2\beta^2 X_k^2}{\pi^2k^2+\beta^2},

with \{X_k\} an i.i.d. sequence of standard normal random variables. Furthermore, there exists a sequence of local maxima (\hat\beta_n, \hat\delta_n) of L_n(\beta,\delta) converging in distribution to (\beta_{MLE}, \delta_{MLE}), the global maximizer of the limiting process L_\infty(\beta,\delta). If model (1.4) has, at most, one unit root, then for the estimators \hat c_1 and \hat c_2, we have

(3.25)  \sqrt{n}\begin{pmatrix}\hat c_1 - c_1\\ \hat c_2 - c_2\end{pmatrix} \xrightarrow{d} N\Bigg(0,\ \begin{pmatrix}1-c_2^2 & c_1(1-c_2)\\ c_1(1-c_2) & 1-c_2^2\end{pmatrix}\Bigg).

In the unit root case, maximizing (3.22) in \delta gives

\sqrt{n}(\hat c_1 - c_1) = -\hat\delta_n + o_p(1) \xrightarrow{d} -\delta_{MLE} = \frac{N}{\mathrm{var}(N)} \sim N\Big(0, \frac{1}{\mathrm{var}(N)}\Big).

Here, we use the fact that |\beta_{MLE}| < \infty a.s., as stated in Theorem 4.3 in [13]. One can also calculate the limiting asymptotic covariance of \hat c_1 and \hat c_2 through

\mathrm{var}(\delta_{MLE}) = \frac{1}{\mathrm{var}(N)} = 1-\alpha_0^2 = (1+\alpha_0)(1-\alpha_0) = -c_1(1-c_2).
Remark 3.4. The above theorem says that when |\alpha_0| < 1 and \theta_0 = 1, we have a similar asymptotic result for \hat c_1 and \hat c_2 as in the invertible case. If we only consider the original parameters c_1 and c_2, the effect of the unit root is not visible in the first-order asymptotics. When \delta = 0, the limit in (3.22) reduces to

U_\infty(\beta) = \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(\beta,\gamma)}{2}\Big\}\,d\gamma - \log\int_{-\infty}^{+\infty}\exp\Big\{-\frac{U(0,\gamma)}{2}\Big\}\,d\gamma,

which is the limiting process of the exact profile log-likelihood in the MA(1) case. On the other hand, when \delta is given, \beta becomes the only parameter that needs to be estimated:
(3.26)  X_t = (1-\theta B)(1-\alpha_0 B)Z_t.

Because of the invertibility of the operator 1-\alpha_0 B, we can get an intermediate process Y_t by inverting the operator. Namely,

(3.27)  Y_t := \frac{1}{1-\alpha_0 B}X_t = \sum_{k=0}^{\infty}\alpha_0^k X_{t-k} = (1-\theta B)Z_t.

3.2. Two unit roots. When both roots of the MA polynomial lie on the unit circle, additional problems arise. In this subsection, we discuss some issues when there are two unit roots in the MA polynomial.
3.2.1. Case 2: c_2 = 1 and c_1 \ne -2. This corresponds to the case where the true parameters are on the boundary c_2 = 1, that is, the boundary AC in Figure 1, which means that the two roots lie on the unit circle and are not real valued. Denote the two generic complex-valued roots of the MA polynomial by \theta = re^{\vec{i}\omega} and \alpha = re^{-\vec{i}\omega}; to avoid confusion in notation, we use \vec{i} to denote the imaginary unit. The residuals become

(3.28)  r_i = \frac{(\theta-\theta_0)(\theta-\alpha_0)}{\alpha-\theta}\sum_{j=1}^{i-1}\theta^{i-j-1}Z_j + \frac{(\alpha-\theta_0)(\alpha-\alpha_0)}{\theta-\alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j - \frac{\theta^{i+1}-\alpha^{i+1}}{\theta-\alpha}(z_{init,0}-Z_0) - \frac{\theta\alpha(\theta^{i}-\alpha^{i})}{\theta-\alpha}(z_{init,1}-Z_{-1}) - \frac{(\theta\alpha-\theta_0\alpha_0)(\theta^{i+1}-\alpha^{i+1})}{\theta-\alpha}Z_{-1}.
We also adopt the parameterization for r, \omega and the two initial variables given by

r = 1 + \frac{\beta}{n}, \qquad \omega = \omega_0 + \frac{\delta}{n},

z_{init,0} = Z_0 + \frac{\sigma_0\gamma_1}{\sqrt{n}} \qquad \text{and} \qquad z_{init,1} = Z_{-1} + \frac{\sigma_0\gamma_2}{\sqrt{n}}.

Again, we study the limiting process of -2\sum_{i=1}^{n} r_iZ_i/\sigma_0^2 + \sum_{i=1}^{n} r_i^2/\sigma_0^2. Here we only present the first term -2\sum_{i=1}^{n} r_iZ_i/\sigma_0^2 for illustration; the limits of the other terms can be derived in a similar fashion.
By Theorem 2.8 in [13], the double sum

\frac{1}{n}\sum_{i=0}^{n}\sum_{j=1}^{i-1}\theta^{i-j}\frac{Z_jZ_i}{\sigma_0^2}, \qquad \theta^{i-j} = \Big(1+\frac{\beta}{n}\Big)^{i-j}e^{\vec{i}(i-j)\omega_0}\exp\Big\{\vec{i}\,\frac{(i-j)\delta}{n}\Big\},

has a weak limit that can be expressed in terms of the processes

W_{1,n}(t) := \sum_{k=0}^{[nt]}\frac{Z_k\cos(k\omega_0)}{\sqrt{n}\,\sigma_0} \qquad \text{and} \qquad W_{2,n}(t) := \sum_{k=0}^{[nt]}\frac{Z_k\sin(k\omega_0)}{\sqrt{n}\,\sigma_0}.

The weak convergence of W_{1,n}(t) and W_{2,n}(t) to two independent Brownian motions is guaranteed by Theorem 2.2 in Chan and Wei [7].
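The normalization behind W_{1,n} and W_{2,n} can be checked by simulation (an informal sketch, not from the paper; the frequency \omega_0 below is an arbitrary value in (0,\pi)): as defined above, each process at t = 1 has variance near 1/2 (the limits are Brownian motions of variance t/2 unless a \sqrt{2} factor is added), and the two are asymptotically uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(4)
omega0, n, reps = 0.7, 2000, 3000   # omega0: an arbitrary frequency in (0, pi)
k = np.arange(n)
z = rng.standard_normal((reps, n))
W1 = (z * np.cos(k * omega0)).sum(axis=1) / np.sqrt(n)  # W_{1,n}(1) with sigma0 = 1
W2 = (z * np.sin(k * omega0)).sum(axis=1) / np.sqrt(n)  # W_{2,n}(1)
v1, v2 = W1.var(), W2.var()
c12 = np.corrcoef(W1, W2)[0, 1]
print(v1, v2, c12)  # variances near 1/2, correlation near 0
```

The variance 1/2 comes from the Cesaro averages of \cos^2(k\omega_0) and \sin^2(k\omega_0), and the vanishing correlation from the average of \cos(k\omega_0)\sin(k\omega_0), whenever \omega_0 is not a multiple of \pi.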
By Theorem 2.1 in [13] we have
Z
n
1 X i Zi d 1 s+~is
e
dW(s).
0
n
0
i=0
(3.29)
+ 41
42
ei0
2~i sin 0
1
2~i sin 0
Z
Z
s+~is
dW(s)
s+~is
dW(s) ,
e
0
1
e
0
where
means the real part of a complex function. The weak limit of
Pn {}
2
2
manner using Coroli=1 ri /0 can also be computed in an analogous
P
lary 2.10 in [13]. However, the weak limit of ni=1 ri2 /02 has an even more
complicated form than (3.29).
By integrating out the auxiliary variables, the exact likelihood can be
recovered as well. However, the form of the joint likelihood function is
much more complicated than the one computed in the one unit root case.
The asymptotic properties and pile-up probabilities in this case remain unknown.
3.2.2. Case 3: c2 = 1 and c1 = 2. This corresponds to the vertex A in
the -region in Figure 1. It is convenient to first consider a special case of
local asymptotics when the approach to the corner is through the boundary
c1 c2 = 1. With this constraint, the dimension of the parameters has been
reduced from two to one. We parameterize the MA(2) in this case by
(3.30)
Xt = Zt ( + 1)Zt1 + Zt2
20
and define a Zinit and a Yinit as in (3.1), but with different normalization,
that is,
(3.31) = 1 +
,
n
Yinit = Y0 +
0 2
and Zinit = Z1 + .
n
0 1
n3/2
Then, with the help of the theorems in Davis and Song [13], it follows that
n
n
X
X
ri Zi
ri2
Un (, 1 , 2 ) = 2
+
02
2
i=1
i=1 0
Z 1Z s
d
e(st) dW (t) dW (s)
2
0
(3.32)
+ 22
1
0
es dW (s)
21
(1 es ) dW (s)
Z 1 Z s
1 es 2
(st)
s
ds.
e
dW (t) + 2 e 21
+
0
0
There is a connection between this limiting process and the one in (2.5)
derived for the limiting process for an MA(1) model with a nonzero mean.
Notice that in (2.5), U (0 , , ) is exactly the process we just derived with 1
and 2 replaced by and 0 . This leads us to an interesting connection of
the mean term in the lower order MA model and the initial value in the
higher order MA model, which we will discuss further in the Section 6.
Alternatively, if we do not impose the constraint c1 c2 = 1, there are
two possible ways to parameterize the roots. First, the vertex can be approached through the real region, where c1 = , c2 = and the roots
are parameterized further as
and = 1 +
,
n
+
c1 = 1 1 +
n
and c2 = 1 +
1
+
+o
.
n
n
=1+
which makes
and =
,
n
which implies
1
2
+o
c1 = 1 1 +
n
n
1
2
+o
and c2 = 1 +
.
n
n
21
Therefore, in either case, if we ignore the higher order terms, c1 and c2 can
be approximated as
and c2 = 1 + .
c1 = 1 1 +
n
n
This parameterization, however, is exactly the one we have seen in the conditional case, which suggests that one of the unit roots has pile-up with
probability one asymptotically while the other unit root behaves like the
unit root in the conditional case; see (3.30) and (3.32). This claim is also
supported by the simulation results; see Table 4 in Section 5.
4. Testing for a unit root in an MA(2) model. A direct application of
the results in the previous section is testing for the presence of a unit root
in the MA(2) model. For the testing problem, we extend the idea of a generalized likelihood ratio test proposed in Davis, Chen and Dunsmuir [9] to
the MA(2) case. Tests based on MLE are also considered in this section. We
will compare these tests with the score-type test of Tanaka [20].
To specify our hypothesis testing problem in the MA(2) case, the null
hypothesis is H0 : there is exactly one unit root in the MA polynomial, and
the alternative is HA : there are no unit roots. The asymptotic theory of the
previous section allows us to approximate the nominal power against local
alternatives. To set up the problem, for the model
Zt1 + 1 +
Zt2
Xt = Zt + 1 +
n
n
with || < 1. We want to test H0 : = 0 versus HA : < 0.
To describe the test based on the generalized likelihood ratio, let GLRn =
2(Ln (MLE , MLE ) Ln (0, MLE,0 )), where MLE,0 is the MLE of when
d
= 0. An application of Theorem 3.1 gives GLRn L (MLE , MLE )
L (0, MLE ) = U (MLE ), where L (, ) and U () are given in (3.22)
and (3.23) and MLE = N/ var(N ). Notice that the limit distribution of
GLRn only depends on MLE , and serves as a nuisance parameter, which
does not play a role in the limit. Define the (1 )th asymptotic quantile bGLR () and bMLE () as
P(U (MLE ) > bGLR ()) =
and
Since the limiting random variables U (MLE ) and MLE are the same as
in the MA(1) unit root case, the critical values of bGLR () and bMLE () are
the same as those provided in Table 3.2 of Davis, Chen and Dunsmuir [9].
There has been limited research on the testing for a unit root in the
MA(2) case. One approach, proposed by Tanaka, was based on a score type
of statistic, which is locally best invariant and unbiased (LBIU). However,
22
Fig. 2. Power curve with respect to local alternatives when = 0.3 (upper) and when
= 0.5 (lower). Sample size n = 50. The size of the test is set to be 0.05.
23
Fig. 3. Power curve with respect to local alternatives when = 0. Sample size n = 50.
The size of the test is set to be 0.05.
0.0119 and 0.0015 which are much smaller than the nominal size 0.05. GLR
seems to be the best among the three choices. This is due to the fact that
the GLR only considers the maximum value of the likelihood ratio instead
of the MLE of c1 and c2 . Therefore, even if c1 and c2 are in the complex
region, the GLR test can still be carried out whereas the test based on MLE
is not even well defined in this case. Although the size of the GLR test is
often slightly greater than the nominal size, GLR gives the best performance
under this situation.
Finally, we compare these tests when = 0; that is, the model is in fact
a unit root MA(1). The test developed for the MA(2) case is still applicable.
The results are summarized in Figure 3. Clearly, the power functions of
the tests designed for the MA(1) dominate the power functions of their
counterparts designed for the MA(2). However, it is surprising that for large
local alternatives (greater than 9 or so), the GLR for the MA(2) model
outperforms the LBIU for the MA(1) model.
5. Numerical simulations. In this section, we present simulation results
that illustrate the theory from Section 3. Realizations were simulated from
the MA(2) process given by
(5.1)
Xt = Zt (1 + )Zt1 + Zt2 ,
where takes the values 0.3, 0 and 0.3, respectively. The MA(2) model
was replicated 10,000 times for each choice of , and then the MLEs for
the MA(2) coefficients 1 and 2 were calculated for each replicate. The
empirical pile-up probability, the empirical variance and MSE of the MLEs
are reported in Tables 1 to 3. Notice that the numbers in the tables
for the
variance and the MSE are reported for the normalized estimates n(
ci ci ),
i = 1, 2.
24
Sample
size
25
50
100
400
1,000
Pile-up
probability
Variance
of c1
MSE
of c1
Variance
of c2
MSE
of c2
Correlation
of c1 and c2
0.5436
0.6041
0.6234
0.6398
0.6437
2.1701
1.4063
1.1108
0.9788
0.9290
2.1970
1.4118
1.1108
0.9788
0.9290
2.4455
1.4967
1.1490
0.9854
0.9327
2.6536
1.5553
1.1636
0.9890
0.9338
0.9347
0.9644
0.9815
0.9953
0.9981
Table 2
Summary of the case: = 0 [MA(1) with a unit root]
Sample
size
25
50
100
400
1,000
Pile-up
probability
Variance
of c1
MSE
of c1
Variance
of c2
MSE
of c2
Correlation
of c1 and c2
0.5870
0.6182
0.6220
0.6318
0.6334
2.1624
1.3661
1.1661
1.0440
1.0329
2.1629
1.3670
1.1670
1.0441
1.0330
2.5037
1.4690
1.2082
1.0544
1.0351
2.6355
1.5053
1.2224
1.0578
1.0384
0.8792
0.9378
0.9662
0.9918
0.9966
Table 3
Summary of the case: = 0.3
Sample
size
25
50
100
400
1,000
Pile-up
probability
Variance
of c1
MSE
of c1
Variance
of c2
MSE
of c2
Correlation
of c1 and c2
0.6171
0.6347
0.6447
0.6472
0.6511
1.8370
1.2820
1.0748
0.9245
0.9232
1.8806
1.3053
1.0853
0.9267
0.9242
2.1654
1.3647
1.1215
0.9316
0.9256
2.2287
1.3820
1.1299
0.9339
0.9263
0.7950
0.8938
0.9397
0.9822
0.9933
25
Table 4
Pile-up probabilities for the case: c1 = 2
Sample size
Pile-up probability
100
500
1,000
5,000
0.246
0.804
0.961
0.999
sample size is small and > 0, the MLEs of c1 and c2 are more likely to
be in the complex region than those when < 0. Thus the limiting process
would approximate the likelihood function poorly when > 0, which in turn
results in less pile-up in smaller sample sizes.
Table 4 summarizes the pile-up effects for the model considered in Section 3.2.2, where the two roots of the MA polynomial are both 1. In one
realization, the estimators are said to exhibit a pile-up if the MLEs of c1
and c2 are on the boundary c1 c2 = 1.
As seen in the table, the pile-up probability is increasing to 1 with sample
size. However, the claimed 100% probability of pile-up is not a good approximation for small sample sizes. Even when n = 500, the pile-up is only about
80%.
6. Unit roots and differencing. As pointed out in Section 3.2.2, there is
a link between the mean term in the lower order MA model and the initial
value in the higher order MA model. To illustrate this, consider the simple
case when
Yt = 0 + Zt ,
where {Zt } i.i.d. (0, 02 ). So Yt is an i.i.d. sequence with a common mean.
It is clear that
d
n(
0 ) N(0, 02 ),
where
is the MLE of obtained by maximizing the objective Gaussian
likelihood function. Now suppose we difference the time series to obtain
Xt = (1 B)Yt = Zt Zt1 ,
which becomes an MA(1) process with a unit root. The initial value as
defined before of this differenced process is
Zinit = Z0 = Y0 0 .
From the results in Theorem 4.2 in [13], if it is known that an MA(1) time
series has a unit root, that is, = 0, we have
U ( = 0, ) = 2W (1) + 2 .
26
Clearly,
= W (1) and with our parameterization of zinit , we have
n(zinit Z0 )
n(Y0
Y0 + 0 )
=
=
0
0
n(
0 ) d
d
Yt = 0 + b0 t + Zt ,
which, after differencing, delivers an MA(1) model with a unit root and
a nonzero mean given by
Xt = (1 B)Yt = b0 + Zt Zt1 .
d
From (2.9), we know n3/2 (b b0 ) N(0, 1202 ). But this can be obtained
much more easily by analyzing the model (6.1). This is just a simple application of linear regression, and we can get exactly the same asymptotic
result for b.
Now consider the model from Section 2,
where = 1 +
Yt = b0 + Zt Zt1 ,
Yinit = Y0 = b0 + Z0 Z1 ,
then yinit Yinit can be viewed as b b0 . Since b converges at the rate of n3/2 ,
so does yinit . This explains the parametrization given in (3.31) as well as the
resemblance of (2.5) and (3.32).
7. Going beyond second order. The techniques proposed in this paper
can be adapted to handle the unit root problem for MA(q) with q 3.
However, the complexity of the argument, mostly in terms of bookkeeping,
also increases with the order q. In this section, we outline the procedure
for the MA(3) case, from which extensions to larger orders are straightforward.
27
For simplicity, assume 0 6= 0 6= 0 . Now we form two intermediate processes Yt and Wt and consider three augmented initial variables defined
by Zinit = Z2 , Yinit = Z1 + 0 Zinit and Winit = Y0 + 0 Yinit . Similar arguments as in Section 3 show that the joint likelihood of (X, Winit , Yinit , Zinit )
has a simple form given by
fX,Winit,Yinit,Zinit (xn , winit , yinit, zinit ) =
n
Y
fZ (zj ).
j=2
As in the MA(1) and MA(2) cases, maximizing this joint likelihood is essentially equivalent to minimizing the objective function
Un =
n
1 X 2
(z Zi2 ).
02 i=2 i
The key to this analysis is to write out the explicit expression for zi which
is basically an estimator for Zi . The following equations are straightforward
to derive:
(7.2)
wk =
k
X
kl Xl + k winit ,
l=1
(7.3)
yj =
j
X
k=1
(7.4)
zi =
i
X
j=1
yj =
j
X
jk+1 jk+1
k=1
Xk +
j+1 j+1
winit + j+1 yinit ,
28
i
X
(( ij+1 ij+1 )
j=1
(( )( )( ))1 Xj
2 i
2 (i i )
( i )
i
+ winit
+
( )( ) ( )( )
i+2 i+2
yinit + i+2 zinit .
While this is a more complicated looking expression than the one encountered in the MA(2) case, the coefficient of Xj in the sum looks very similar
to (3.2), only with more terms. Now replacing Xj with (7.1), zi can be
written as
zi = Zi
(7.6)
i1
X
z
Ci,j
Zj
j=2
=1+
,
n
0,
= 0 +
n
0 y
yinit = Yinit +
n
n
n
X
X
ri2
ri Zi
+
02
2
i=2 0
i=2
and = 0 + ,
n
and
0 z
zinit = Zinit + .
n
29
= 2
+
n
X
i1
X
i=2
z Zj
Ci,j
0
j=2
n
X
i1
X
i=2
z Zj
Ci,j
0
j=2
+
+
+
n
n
n
Zi
0
!2
,
and
1+
1+
n
n
0 0
n
n0
i=2 j=2
i=2
that were used in the MA(1) and MA(2) cases. By using a martingale central limit theorem and theorems proved in Davis and Song [13], one can
establish the weak convergence of Un (, , , w , y , z ) to a random element U (, , , w , y , z ) in C(R6 ). Now arguing as in Section 3, the initial
variables can be integrated out, and the limiting process of the exact profile
log-likelihood can be established.
For general q > 3, the residual ri = zi Zi has the form
ri =
i1
X
j=q+1
z
Ci,j
Zj +
q
X
k=1
30
[3] Andrews, B., Davis, R. A. and Breidt, F. J. (2006). Maximum likelihood estimation for all-pass time series models. J. Multivariate Anal. 97 16381659.
MR2256234
[4] Breidt, F. J., Davis, R. A., Hsu, N.-J. and Rosenblatt, M. (2006). Pile-up probabilities for the Laplace likelihood estimator of a non-invertible first order moving
average. In Time Series and Related Topics. Institute of Mathematical Statistics
Lecture NotesMonograph Series 52 119. IMS, Beachwood, OH. MR2427836
[5] Breidt, F. J., Davis, R. A. and Trindade, A. A. (2001). Least absolute deviation
estimation for all-pass time series models. Ann. Statist. 29 919946. MR1869234
[6] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods.
Springer, New York.
[7] Chan, N. H. and Wei, C. Z. (1988). Limiting distributions of least squares estimates
of unstable autoregressive processes. Ann. Statist. 16 367401. MR0924877
[8] Chen, M. C., Davis, R. A. and Song, L. (2011). Inference for regression models
with errors from a non-invertible MA(1) process. J. Forecast. 30 630.
[9] Davis, R. A., Chen, M. and Dunsmuir, W. T. M. (1995). Inference for MA(1)
processes with a root on or near the unit circle. Probab. Math. Statist. 15 227
242. MR1369801
[10] Davis, R. A. and Dunsmuir, W. T. M. (1996). Maximum likelihood estimation for
MA(1) processes with a root on or near the unit circle. Econometric Theory 12
129. MR1396378
[11] Davis, R. A. and Dunsmuir, W. T. M. (1997). Least absolute deviation estimation
for regression with ARMA errors. J. Theoret. Probab. 10 481497. MR1455154
[12] Davis, R. A., Knight, K. and Liu, J. (1992). M -estimation for autoregressions with
infinite variance. Stochastic Process. Appl. 40 145180. MR1145464
[13] Davis, R. A. and Song, L. (2012). Functional convergence of stochastic integrals
with application to statistical inference. Stochastic Process. Appl. 122 725757.
[14] Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application.
Academic Press, New York. MR0624435
[15] Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York.
MR1663158
[16] Rosenblatt, M. (2000). Gaussian and Non-Gaussian Linear Time Series and Random Fields. Springer, New York. MR1742357
[17] Sargan, J. D. and Bhargava, A. (1983). Maximum likelihood estimation of regression models with first order moving average errors when the root lies on the unit
circle. Econometrica 51 799820. MR0712371
[18] Shephard, N. (1993). Maximum likelihood estimation of regression models with
stochastic trend components. J. Amer. Statist. Assoc. 88 590595. MR1224385
[19] Smith, R. L. (2008). Statistical trend analysis. In Weather and Climate Extremes in
a Changing Climate (Appendix A) 127132.
[20] Tanaka, K. (1990). Testing for a moving average unit root. Econometric Theory 6
433444. MR1094221
[21] Tanaka, K. (1996). Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. Wiley, New York. MR1397269
Department of Statistics
1255 Amsterdam Ave
Columbia University
New York, New York 10027
USA
Barclays Capital
745 7th Ave
New York, New York 10019
USA