
The Annals of Statistics
2011, Vol. 39, No. 6, 3062-3091
DOI: 10.1214/11-AOS935
© Institute of Mathematical Statistics, 2011
arXiv:1203.2496v1 [math.ST] 12 Mar 2012

UNIT ROOTS IN MOVING AVERAGES BEYOND FIRST ORDER¹

By Richard A. Davis and Li Song

Columbia University and Barclays Capital
The asymptotic theory of various estimators based on the Gaussian likelihood has been developed for the unit root and near unit root cases of a first-order moving average model. Previous studies of the MA(1) unit root problem rely on the special autocovariance structure of the MA(1) process, in which case the eigenvalues and eigenvectors of the covariance matrix of the data vector have known analytical forms. In this paper, we take a different approach: we first consider the joint likelihood by including an augmented initial value as a parameter and then recover the exact likelihood by integrating out the initial value. This approach bypasses the difficulty of computing an explicit decomposition of the covariance matrix and can be used to study unit root behavior in moving averages beyond first order. The asymptotics of the generalized likelihood ratio (GLR) statistic for testing unit roots are also studied. The GLR test has operating characteristics that are competitive with the locally best invariant unbiased (LBIU) test of Tanaka for some local alternatives and dominates for all other alternatives.

1. Introduction. In this paper we consider inference for moving average models that possess one or more unit roots in the moving average polynomial. To introduce the problem, let us first consider the MA(1) model given by

(1.1) $X_t = Z_t - \theta_0 Z_{t-1}$,

where $\theta_0 \in \mathbb{R}$ and $\{Z_t\}$ is a sequence of independent and identically distributed (i.i.d.) random variables with $EZ_t = 0$, $EZ_t^2 = \sigma_0^2$ and density function $f_Z$. The MA(1) model is invertible if and only if $|\theta_0| < 1$, since in this case $Z_t$ can be represented explicitly in terms of past values of $X_t$, that is,

$Z_t = \sum_{j=0}^{\infty} \theta_0^j X_{t-j}.$
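The geometric inversion above is easy to check numerically. In the sketch below (an illustration, not from the paper; the truncation level $J$ is an arbitrary choice), truncating the series after $J$ terms leaves an error of exactly $\theta_0^J Z_{t-J}$, which is negligible when $|\theta_0| < 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 2000, 0.5            # |theta0| < 1: invertible MA(1)
Z = rng.normal(size=n)
X = Z.copy()
X[1:] -= theta0 * Z[:-1]         # X_t = Z_t - theta0 * Z_{t-1}

# Truncated inversion Z_t ~ sum_{j < J} theta0^j X_{t-j}; the neglected
# tail equals theta0^J * Z_{t-J}, which decays geometrically.
t, J = 1500, 50
Z_hat = sum(theta0**j * X[t - j] for j in range(J))
err = abs(Z_hat - Z[t])
```

For $\theta_0 = 0.5$ and $J = 50$ the truncation error is far below machine-visible levels, which is the practical content of invertibility.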

Received June 2010; revised July 2011.
¹Supported in part by NSF Grants DMS-07-43459 and DMS-11-07031.
AMS 2000 subject classifications. 62M10.
Key words and phrases. Unit roots, moving average.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2011, Vol. 39, No. 6, 3062-3091. This reprint differs from the original in pagination and typographic detail.

Under this invertibility constraint, standard estimation procedures that produce asymptotically normal estimates are readily available. For example, if $\hat\theta$ represents the maximum likelihood estimator, found by maximizing the Gaussian likelihood based on the data $X_1, \ldots, X_n$, then it is well known (see Brockwell and Davis [6]) that

(1.2) $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0,\, 1 - \theta_0^2)$.

From the form of the limiting variance in (1.2), the asymptotic behavior of $\hat\theta$, let alone the scaling, is not immediately clear in the unit root case corresponding to $\theta_0 = 1$.
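In the invertible case, the normal limit in (1.2) can be reproduced by maximizing the exact Gaussian profile likelihood on a grid (an illustrative Monte Carlo, not the paper's experiment; all constants, grids and names are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta0, reps = 50, 0.5, 120
thetas = np.linspace(0.0, 1.0, 101)

def profile_loglik(x, theta):
    # Exact Gaussian MA(1) log-likelihood with sigma^2 profiled out; the
    # covariance has 1 + theta^2 on the diagonal, -theta off the diagonal.
    G = ((1 + theta**2) * np.eye(n)
         - theta * (np.eye(n, k=1) + np.eye(n, k=-1)))
    _, logdet = np.linalg.slogdet(G)
    return -0.5 * logdet - 0.5 * n * np.log(x @ np.linalg.solve(G, x) / n)

est = np.empty(reps)
for r in range(reps):
    Z = rng.normal(size=n + 1)
    x = Z[1:] - theta0 * Z[:-1]
    est[r] = thetas[np.argmax([profile_loglik(x, th) for th in thetas])]

# sqrt(n)*(est - theta0) should be roughly N(0, 1 - theta0^2) = N(0, 0.75)
sd_hat = np.sqrt(n) * (est - theta0).std()
```

The sample standard deviation of $\sqrt{n}(\hat\theta - \theta_0)$ should be near $\sqrt{1 - 0.5^2} \approx 0.87$, up to finite-sample and Monte Carlo error.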
In the case that $f_Z$ is Gaussian, the parameters $\theta_0$ and $\sigma^2$ are not identifiable without the constraint $|\theta_0| \le 1$. In particular, the profile Gaussian log-likelihood, obtained by concentrating out the variance parameter, satisfies

(1.3) $L_n(\theta) = L_n(1/\theta)$.

It follows that $\theta = 1$ is a critical value of the profile likelihood, and hence there is a positive probability that $\hat\theta = 1$ is indeed the maximum likelihood estimator. If $\theta_0 = 1$, then it turns out that this probability does not vanish asymptotically (see, e.g., Anderson and Takemura [1], Tanaka [21] and Davis and Dunsmuir [10]). This phenomenon is referred to as the pile-up effect.
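The pile-up effect is easy to reproduce by direct Monte Carlo. The sketch below (illustrative only; grid sizes and replication counts are arbitrary) maximizes the exact Gaussian profile likelihood of an MA(1) over a grid of $\theta$ values and records how often the maximizer lands exactly at $\theta = 1$ when the truth is $\theta_0 = 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100
thetas = np.linspace(0.0, 1.0, 101)   # profile over theta in [0, 1]

def profile_loglik(x, theta):
    # Exact Gaussian MA(1) likelihood (sigma^2 profiled out): the covariance
    # matrix has 1 + theta^2 on the diagonal and -theta off the diagonal.
    G = ((1 + theta**2) * np.eye(n)
         - theta * (np.eye(n, k=1) + np.eye(n, k=-1)))
    _, logdet = np.linalg.slogdet(G)
    return -0.5 * logdet - 0.5 * n * np.log(x @ np.linalg.solve(G, x) / n)

pileup = 0.0
for _ in range(reps):
    Z = rng.normal(size=n + 1)
    x = Z[1:] - Z[:-1]                # unit root: theta0 = 1
    ll = [profile_loglik(x, th) for th in thetas]
    pileup += (thetas[int(np.argmax(ll))] == 1.0)
pileup /= reps
```

A substantial fraction of the replications maximize exactly at the boundary value $\theta = 1$, which is the pile-up phenomenon described above.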
For the case that $\theta_0 = 1$ or is near one in the sense that $\theta_0 = 1 + \gamma/n$, it was shown in Davis and Dunsmuir [10] that

$n(\hat\theta - \theta_0) \xrightarrow{d} \xi$,

where $\xi$ is a random variable with a discrete component, corresponding to the asymptotic pile-up effect, and a continuous component. Most of the early work on this problem was based on explicit knowledge of the eigenvectors and eigenvalues of the covariance matrix for observations from an MA(1) process; see Anderson and Takemura [1]. Recently, Breidt et al. [4] and Davis and Song [13] looked at model (1.1) under the Laplace likelihood and the Gaussian likelihood without resorting to knowledge of the precise form of eigenvectors and eigenvalues of the covariance matrix. Instead they introduced an auxiliary variable, which acts like an initial value and can be integrated out to form the likelihood.
With a couple of exceptions, most of the previous work dealt exclusively with the zero-mean case. Sargan and Bhargava [17] and Shephard [18] showed that for the nonzero mean case, the so-called pile-up effect is more severe than in the zero mean case. Chen, Davis and Song [8] extended the results of Davis and Dunsmuir [10] to regression models with errors from a noninvertible MA(1) process. It is shown that, with a mean term present in the model, the pile-up probability goes up to more than 0.95.
The MA unit root problem can arise in many modeling contexts, especially if a time series exhibits trend and seasonality. For example, in personal communication, Richard Smith has mentioned the presence of a unit root in modeling some environmental time series related to climate change [19]. After detrending and fitting an ARMA model to the time series, Smith noticed that the MA component appeared to have a unit root. One explanation for this phenomenon is that detrending often involves the application of a high-pass filter to the time series. In particular, the filter diminishes or obliterates any power in the time series at low frequencies (including the 0 frequency). Consequently, the detrended data will have a spectrum with 0 power at frequency 0, which can only be fitted with an ARMA process that has a unit root in the MA component. While we only consider unit roots in higher order moving averages in this paper, we believe the techniques developed here will be applicable in the more general framework of an ARMA model. This will be the subject of future investigation.
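The spectral argument can be made concrete: the power transfer function of the differencing filter $1 - B$ vanishes at frequency zero, while an invertible ARMA spectral density is strictly positive everywhere, so matching zero power at frequency 0 forces an MA unit root. A two-line illustrative check:

```python
import numpy as np

# Power transfer of the (high-pass) differencing filter 1 - B at frequency w:
# |1 - e^{-iw}|^2, which is zero at w = 0 and maximal (= 4) at w = pi.
w = np.linspace(0.0, np.pi, 5)
power = np.abs(1.0 - np.exp(-1j * w))**2
# power[0] == 0: a differenced/detrended series has no power at frequency 0.
```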
In this paper, we will use the stochastic approaches described in [4] and [13] to first study the case when there is a regression component in the time series and the errors are generated from a noninvertible MA(1). A vital issue in extending these results to higher order MA models is the scaling required for the auxiliary variable. The scaling used for the regression problem in the MA(1) case provides insight into the way in which the auxiliary variable should be scaled in the higher order case. Quite surprisingly, when there is only one unit root in the MA(2) process, that is,

(1.4) $X_t = Z_t + c_1 Z_{t-1} + c_2 Z_{t-2}$,

where $c_1 + c_2 = -1$ and $\{Z_t\} \sim$ i.i.d. $(0, \sigma^2)$, the asymptotic distribution of the maximum likelihood estimator $(\hat c_1, \hat c_2)$ is exactly the same as in the invertible MA(2) case; see [6]. That is,

(1.5) $\sqrt{n}\begin{pmatrix}\hat c_1 - c_1\\ \hat c_2 - c_2\end{pmatrix} \xrightarrow{d} N\left(0,\, \begin{pmatrix}1 - c_2^2 & c_1(1 - c_2)\\ c_1(1 - c_2) & 1 - c_2^2\end{pmatrix}\right).$

One difference, however, is that $\hat c_1$ and $\hat c_2$ are now totally dependent asymptotically [$c_1^2(1 - c_2)^2 = (1 - c_2^2)^2$].
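The total asymptotic dependence can be verified directly from the covariance matrix in (1.5): on the one-unit-root boundary $c_1 + c_2 = -1$ (a root of $1 + c_1 z + c_2 z^2$ at $z = 1$) the matrix is singular and the limiting correlation of $\hat c_1$ and $\hat c_2$ equals $-1$. A small illustrative check:

```python
import numpy as np

# Limit covariance from (1.5) at a few points on the boundary c1 + c2 = -1:
# the determinant vanishes, so the two estimators are perfectly dependent.
for c2 in (-0.5, 0.0, 0.3, 0.8):
    c1 = -1.0 - c2
    Sigma = np.array([[1 - c2**2,     c1 * (1 - c2)],
                      [c1 * (1 - c2), 1 - c2**2    ]])
    det = np.linalg.det(Sigma)
    corr = Sigma[0, 1] / Sigma[0, 0]   # limiting correlation, should be -1
```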
As seen from (1.3), the first derivative of the profile likelihood function is always 0 when $\theta = 1$. Therefore, the development of typical score tests or Wald tests is intractable in this case. Davis, Chen and Dunsmuir [9] used the asymptotic result from [10] to develop a test of $H_0: \theta = 1$ based on the MLE and the generalized likelihood ratio. Interestingly, we will see that the estimator of the unit root in the MA(2) case has the same limit distribution as the corresponding estimator in the MA(1) case. Thus, we can extend the methods used in the MA(1) case to test for unit roots in the MA(2) case.

The paper is organized as follows. In Section 2, we demonstrate our method of proof applied to the MA(1) model with regression. This case plays a key role in the extension to higher order MAs. Section 3 contains the results for the unit root problem in the MA(2) case. In Section 4, we compare likelihood-based tests with Tanaka's locally best invariant and unbiased (LBIU) test [20] for testing the presence of a unit root. It is shown that the likelihood ratio test performs quite well in comparison to the LBIU test. In Section 5, numerical simulation results are presented to illustrate the theory of Section 3. In Section 6, there is a brief discussion that connects the auxiliary variables in higher order MAs with terms in a regression model with MA(1) errors. Finally, in Section 7, the procedure for handling the MA($q$) case with $q \ge 3$ is outlined. It is shown that the tools used in the MA(1) and MA(2) cases are still applicable and are, in fact, sufficient for dealing with higher order cases.
2. MA(1) with nonzero mean. In this section, we will extend the methods of Breidt et al. [4] and Davis and Song [13] to a regression model with MA(1) errors. These results turn out to have connections with the asymptotics in the higher order unit root cases (see Section 6). First, consider the model

(2.1) $X_t = \sum_{k=0}^{p} b_{k0} f_k(t/n) + Z_t - \theta_0 Z_{t-1}$,

where $\{Z_t\}$ is defined as in (1.1), $\theta_0 = 1$, $b_{k0}$, $k = 0, \ldots, p$, are regression coefficients and $f_k(t/n)$, $k = 0, \ldots, p$, are covariates at time $t$. Notice that the covariates $f_k(t/n)$ are also assumed to be functions on $[0, 1]$. Note that the detrended series $Y_t = X_t - \sum_{k=0}^{p} b_k f_k(t/n)$ has exactly the same likelihood as the one for the zero-mean case. As shown in [13], by concentrating out the scale parameter $\sigma$, maximizing the joint Gaussian likelihood is equivalent to minimizing the following objective function:

(2.2) $l_n(\vec b, \theta, z_{\mathrm{init}}) = \sum_{t=0}^{n} z_t^2 \qquad\text{for } |\theta| \le 1,$

where $\vec b = (b_0, \ldots, b_p)'$, $z_{\mathrm{init}}$ is the augmented initial value playing the role of $Z_0$, and $z_i$ is given by

$z_i = Y_i + \theta Y_{i-1} + \cdots + \theta^{i-1} Y_1 + \theta^i z_{\mathrm{init}}$
$= \left(X_i - \sum_{k=0}^{p} b_k f_k(i/n)\right) + \theta\left(X_{i-1} - \sum_{k=0}^{p} b_k f_k((i-1)/n)\right) + \cdots + \theta^{i-1}\left(X_1 - \sum_{k=0}^{p} b_k f_k(1/n)\right) + \theta^i z_{\mathrm{init}}$

$= \left(Z_i - Z_{i-1} + \sum_{k=0}^{p} (b_{k0} - b_k) f_k(i/n)\right) + \cdots + \theta^{i-1}\left(Z_1 - Z_0 + \sum_{k=0}^{p} (b_{k0} - b_k) f_k(1/n)\right) + \theta^i z_{\mathrm{init}}$

$= Z_i - (1-\theta)\sum_{j=0}^{i-1}\theta^{i-1-j} Z_j - \theta^i (Z_0 - z_{\mathrm{init}}) + \sum_{k=0}^{p} (b_{k0} - b_k)\sum_{j=1}^{i}\theta^{i-j} f_k(j/n)$

$:= Z_i - y_i + \sum_{k=0}^{p} (b_{k0} - b_k)\sum_{j=1}^{i}\theta^{i-j} f_k(j/n)$

$:= Z_i - w_i.$
As in [13], we adopt the parametrization for $\theta$ and $z_{\mathrm{init}}$ given by

$\theta = 1 + \frac{\alpha}{n} \quad\text{and}\quad z_{\mathrm{init}} = Z_0 + \frac{\sigma_0\beta}{\sqrt{n}}.$

Further set

(2.3) $b_k = b_{k0} + \frac{\sigma_0\tau_k}{n^{3/2}}.$

Note that (2.3) essentially characterizes the convergence rate of the estimated $b_k$ to its true value $b_{k0}$. At first glance, this parameterization may look odd since it depends on the true parameter values, which are unavailable. This form of reparameterization is used only for deriving the asymptotic theory of the maximum likelihood estimators and not for estimation purposes. One notes that $\alpha = n(\theta - 1)$ and $\tau_k = n^{3/2}(b_k - b_{k0})/\sigma_0$, so that the asymptotics of the MLEs $\hat\theta$ and $\hat b_k$ of the associated parameters are found from the limiting behavior of $\hat\alpha = n(\hat\theta - 1)$ and $\hat\tau_k = n^{3/2}(\hat b_k - b_{k0})/\sigma_0$. Hence, it is not necessary to know the true values in this analysis. The scaling $n^{3/2}$ for the regression coefficients is an artifact of the assumption that the regressors take the form $f_k(t/n)$ that is imposed on the problem. This also results in a clean expression for the limit.
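The recursion behind the objective (2.2) is easy to put into code. The sketch below is illustrative (the function name and the sanity check are ours, not the paper's): it computes the residuals via $z_t = y_t + \theta z_{t-1}$ with $z_0 = z_{\mathrm{init}}$ and confirms that, at the true parameters $(\vec b_0, \theta = 1, z_{\mathrm{init}} = Z_0)$, the residuals reduce to the noise sequence:

```python
import numpy as np

def objective(x, b, theta, z_init, f):
    # l_n(b, theta, z_init) = sum_{t=0}^n z_t^2, with the recursion
    # z_t = y_t + theta * z_{t-1}, z_0 = z_init, and detrended observation
    # y_t = x_t - sum_k b_k * f_k(t/n).
    n = len(x)
    z = np.empty(n + 1)
    z[0] = z_init
    for t in range(1, n + 1):
        y_t = x[t - 1] - sum(bk * fk(t / n) for bk, fk in zip(b, f))
        z[t] = y_t + theta * z[t - 1]
    return np.sum(z**2)

# Sanity check with a constant regressor f_0(t) = 1 and theta0 = 1:
rng = np.random.default_rng(3)
n, b0 = 200, 3.0
Z = rng.normal(size=n + 1)
x = b0 + Z[1:] - Z[:-1]                       # nonzero-mean unit-root MA(1)
at_truth = objective(x, [b0], 1.0, Z[0], [lambda t: 1.0])
perturbed = objective(x, [b0 + 1.0], 1.0, Z[0], [lambda t: 1.0])
```

At the truth the objective equals $\sum_{t=0}^{n} Z_t^2$, and misspecifying the mean inflates it dramatically because the errors are accumulated by the unit-root recursion.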
Under the $(\vec\tau, \alpha, \beta)$ parameterization, it is easily seen [13] that minimizing $l_n(\vec b, \theta, z_{\mathrm{init}})$ with respect to $\vec b, \theta, z_{\mathrm{init}}$ is equivalent to minimizing the function

(2.4) $U_n(\vec\tau, \alpha, \beta) := \frac{1}{\sigma_0^2}\big[l_n(\vec b, \theta, z_{\mathrm{init}}) - l_n(\vec b_0, 1, Z_0)\big]$

with respect to $\vec\tau$, $\alpha$ and $\beta$. Then, using the weak convergence results in Davis and Song [13],

$U_n(\vec\tau, \alpha, \beta) = \frac{1}{\sigma_0^2}\sum_{i=0}^{n}(z_i^2 - Z_i^2) = -2\sum_{i=0}^{n}\frac{w_i Z_i}{\sigma_0^2} + \sum_{i=0}^{n}\frac{w_i^2}{\sigma_0^2}$

$\Rightarrow -2\alpha\int_0^1\!\!\int_0^s e^{\alpha(s-t)}\,dW(t)\,dW(s) + 2\beta\int_0^1 e^{\alpha s}\,dW(s) - 2\sum_{k=0}^{p}\tau_k\int_0^1\!\!\int_0^s e^{\alpha(s-t)} f_k(t)\,dt\,dW(s)$
$\quad + \int_0^1\left(-\alpha\int_0^s e^{\alpha(s-t)}\,dW(t) + \beta e^{\alpha s} - \sum_{k=0}^{p}\tau_k\int_0^s e^{\alpha(s-t)} f_k(t)\,dt\right)^2 ds$
$:= U(\vec\tau, \alpha, \beta),$

where $\Rightarrow$ indicates weak convergence on $C(\mathbb{R}^{p+1}\times(-\infty, 0]\times\mathbb{R})$. Throughout this paper, when referring to convergence of stochastic processes on $C(\mathbb{R}^k)$, the notation $\xrightarrow{d}$ ($\xrightarrow{p}$) means convergence in distribution (probability) on $C(K)$, where $K$ is any compact set in $\mathbb{R}^k$.
As a special case of polynomial regressors, set $f_k(t) = t^k$. In this case, the limiting process $U(\vec\tau, \alpha, \beta)$ is

$U(\vec\tau, \alpha, \beta) = -2\alpha\int_0^1\!\!\int_0^s e^{\alpha(s-t)}\,dW(t)\,dW(s) + 2\beta\int_0^1 e^{\alpha s}\,dW(s) - 2\sum_{k=0}^{p}\tau_k\int_0^1\!\!\int_0^s e^{\alpha(s-t)} t^k\,dt\,dW(s)$
$\quad + \int_0^1\left(-\alpha\int_0^s e^{\alpha(s-t)}\,dW(t) + \beta e^{\alpha s} - \sum_{k=0}^{p}\tau_k\int_0^s e^{\alpha(s-t)} t^k\,dt\right)^2 ds.$

From now on we consider the simple case of just a nonzero mean, that is, $p = 0$ and $f_0(t) \equiv 1$. The formula further simplifies to

(2.5) $U(\tau_0, \alpha, \beta) = -2\alpha\int_0^1\!\!\int_0^s e^{\alpha(s-t)}\,dW(t)\,dW(s) + 2\beta\int_0^1 e^{\alpha s}\,dW(s) - 2\tau_0\int_0^1\frac{e^{\alpha s}-1}{\alpha}\,dW(s)$
$\quad + \int_0^1\left(-\alpha\int_0^s e^{\alpha(s-t)}\,dW(t) + \beta e^{\alpha s} - \tau_0\frac{e^{\alpha s}-1}{\alpha}\right)^2 ds.$
As shown in [13], one can recover the exact likelihood by integrating out the initial parameter effects. More specifically,

$f(x_n, z_{\mathrm{init}}) = \prod_{t=0}^{n} f(z_t) = \left(\frac{1}{2\pi\sigma^2}\right)^{(n+1)/2}\exp\left(-\frac{\sum_{t=0}^{n} z_t^2}{2\sigma^2}\right)$
$= \left(\frac{1}{2\pi\sigma^2}\right)^{(n+1)/2}\exp\left(-\frac{l_n(b_0, \theta, z_{\mathrm{init}}) - l_n(b_{00}, 1, Z_0)}{2\sigma^2}\right)\exp\left(-\frac{\sum_{t=0}^{n} Z_t^2}{2\sigma^2}\right)$
$= \left(\frac{1}{2\pi\sigma^2}\right)^{(n+1)/2}\exp\left(-\frac{U_n(\tau_0, \alpha, \beta)\sigma_0^2}{2\sigma^2}\right)\exp\left(-\frac{\sum_{t=0}^{n} Z_t^2}{2\sigma^2}\right),$

and integrating out the augmented variable $z_{\mathrm{init}}$ yields

(2.6) $f(x_n) = \int_{-\infty}^{+\infty} f(x_n, z_{\mathrm{init}})\,dz_{\mathrm{init}}$
$= \left(\frac{1}{2\pi\sigma^2}\right)^{(n+1)/2}\exp\left(-\frac{\sum_{t=0}^{n} Z_t^2}{2\sigma^2}\right)\frac{\sigma_0}{\sqrt{n}}\int_{-\infty}^{+\infty}\exp\left(-\frac{U_n(\tau_0, \alpha, \beta)\sigma_0^2}{2\sigma^2}\right)d\beta.$
A similar argument as in [13] then shows that, by profiling out the variance parameter $\sigma^2$, the exact profile log-likelihood $L_n(\tau_0, \alpha)$ has the following property:

(2.7) $L_n(\tau_0, \alpha) - L_n(\tau_0, 0) \xrightarrow{d} L(\tau_0, \alpha)$
$:= \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\tau_0, \alpha, \beta)}{2}\right)d\beta - \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\tau_0, 0, \beta)}{2}\right)d\beta.$
The weak convergence results on $C(\mathbb{R}^2)$ in (2.7) can be used to show convergence in distribution of a sequence of local maximizers of the objective functions $L_n$ to the maximizer of the limit process $L$, provided the latter is unique almost surely. This is the content of Remark 1 (see also Lemma 2.2) of Davis, Knight and Liu [12], which, for ease of reference, we state a version of here.

Remark 2.1. Suppose $\{L_n(\cdot)\}$ is a sequence of stochastic processes which converge in distribution to $L(\cdot)$ on $C(\mathbb{R}^k)$. If $L$ has a unique maximizer $\hat\xi$ a.s., then there exists a sequence of local maximizers $\{\hat\xi_n\}$ of $\{L_n\}$ that converge in distribution to $\hat\xi$. Note that this is consistent with many of the statements made in the classical theory for maximum likelihood (see, e.g., Theorem 7.1.1 of Lehmann [15]) and for inference in nonstandard time series models; see Theorems 8.2.1 and 8.6.1 in Rosenblatt [16], Breidt et al. [5], Andrews et al. [3] and Andrews et al. [2]. In some cases, for example, if the $\{L_n\}$ have concave sample paths, this can be strengthened to convergence of the global maximizers of $L_n$. See also Davis, Chen and Dunsmuir [9], Davis and Dunsmuir [11] and Breidt et al. [5] for examples of other cases when $\{L_n\}$ are not concave.
Returning to our example, under the case when $\theta_0 = 1$, that is, $\alpha = 0$, the limit of the exact likelihood is $L(\tau_0, \alpha = 0)$. This corresponds to the situation of inference about the mean term when it is known that the driving noise is an MA(1) process with a unit root. Since the Gaussian likelihood is a quadratic function of the regression coefficients, $L(\tau_0, \alpha = 0)$ is a quadratic function in $\tau_0$. Applying Remark 2.1, we obtain that the MLE $\hat\tau_0$ converges in distribution to $\tilde\tau_0$, the global maximizer of $L(\tau_0, \alpha = 0)$. In particular, $\tilde\tau_0$ is the value that makes $\frac{\partial}{\partial\tau_0}L(\tau_0, \alpha = 0) = 0$. Here

$\frac{\partial}{\partial\tau_0}L(\tau_0, \alpha = 0) = \frac{\int_{-\infty}^{+\infty}\exp\{-U(\tau_0, 0, \beta)/2\}\left(-\frac{1}{2}\,\frac{\partial U(\tau_0, 0, \beta)}{\partial\tau_0}\right)d\beta}{\int_{-\infty}^{+\infty}\exp\{-U(\tau_0, 0, \beta)/2\}\,d\beta},$

where

$U(\tau_0, 0, \beta) = 2\beta W(1) - 2\tau_0\int_0^1 s\,dW(s) + \int_0^1(\beta - s\tau_0)^2\,ds$

and

$\frac{\partial}{\partial\tau_0}U(\tau_0, 0, \beta) = -2\int_0^1 s\,dW(s) - 2\int_0^1 s(\beta - s\tau_0)\,ds.$

Solving $\frac{\partial}{\partial\tau_0}L(\tau_0, \alpha = 0) = 0$, we find that

(2.8) $\tilde\tau_0 = 12\int_0^1 s\,dW(s) - 6W(1) \sim N(0, 12)$

and hence

(2.9) $n^{3/2}(\hat b_{0,n} - b_0) = \sigma_0\hat\tau_{0,n} \xrightarrow{d} N(0, 12\sigma_0^2).$

This counterintuitive result was also obtained earlier by Chen et al. [8]. It says that the MLE of the mean term in the process is asymptotically normal, but with convergence rate $n^{3/2}$. Notice that, even if one does not know the true value of $\theta$, the MLE of the mean term would still behave very much like (2.9) due to the large pile-up effect in this case. However, the MLE is not asymptotically normal if both $b_0$ and $\theta$ are estimated.
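The distributional claim in (2.8) can be checked by simulating the stochastic integral on a grid (an illustrative Monte Carlo; grid size and replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
m, reps = 2000, 2000
s = (np.arange(m) + 0.5) / m                 # midpoint of each grid cell
# Brownian increments over the cells; int_0^1 s dW ~ sum s_i dW_i, W(1) = sum dW_i.
dW = rng.normal(scale=np.sqrt(1.0 / m), size=(reps, m))
tau0_tilde = 12.0 * (dW * s).sum(axis=1) - 6.0 * dW.sum(axis=1)
var_hat = tau0_tilde.var()                   # should be close to 12
```

Indeed, $\mathrm{Var}\big(\int_0^1(12s - 6)\,dW(s)\big) = \int_0^1(12s - 6)^2\,ds = 12$, which the simulation reproduces up to Monte Carlo error.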
3. MA(2) with unit roots. The above approach, which also works in the invertible case, does not rely on detailed knowledge of the form of the eigenvectors and eigenvalues of the covariance matrix. Hence it has the potential to work in higher order models where the eigenvector and eigenvalue structure is not known explicitly. We will concentrate on the MA(2) process in this section and further illustrate our methods.

Fig. 1. The region defined by $c_1 + c_2 \ge -1$, $c_1 - c_2 \le 1$, $|c_2| \le 1$.

In the following, we consider the model given in (1.4), where the parameters $(c_1, c_2)$ lie in the triangular shaped region depicted in Figure 1. The interior of this region corresponds to the invertibility region of the parameter space. Note that the triangular region is separated into complex roots and real roots of the MA polynomial $1 + c_1 z + c_2 z^2$ by the quadratic curve $c_1^2 - 4c_2 = 0$. If the parameters are on the boundary of the region, this indicates the presence of unit roots. Otherwise, the model is said to be invertible; see also Brockwell and Davis [6]. Model (1.4) can also be represented in terms of the roots of the MA polynomial by

$X_t = (1 + c_1 B + c_2 B^2)Z_t = (1 - \alpha_0 B)(1 - \beta_0 B)Z_t,$

where $c_1 = -(\alpha_0 + \beta_0)$ and $c_2 = \alpha_0\beta_0$.
3.1. Case 1: $|\alpha_0| < 1$ and $\beta_0 = 1$. This case corresponds to the situation of only one unit root in the MA polynomial, that is, the boundary AB in Figure 1. Let $L_n(\gamma, \delta)$ be the profile likelihood of an MA(2) process. Again, we adopt the parametrization

$\beta = 1 + \frac{\gamma}{n}, \quad \gamma \le 0, \qquad\text{and}\qquad \alpha = \alpha_0 + \frac{\delta}{\sqrt{n}}, \quad \delta \in \mathbb{R}.$

For convenience, define the intermediate process $Y_t = (1 - \beta_0 B)Z_t$ and observe that

$X_t = (1 - \alpha_0 B)(1 - \beta_0 B)Z_t = (1 - \alpha_0 B)Y_t.$

In the MA(2) case, two augmented initial variables $Z_{\mathrm{init}}$ and $Y_{\mathrm{init}}$ are needed. These initial variables and the joint likelihood have a simple form, that is,

(3.1) $Z_{\mathrm{init}} = Z_{-1} \quad\text{and}\quad Y_{\mathrm{init}} = Z_0 - \beta_0 Z_{\mathrm{init}},$

$f_{X, Y_{\mathrm{init}}, Z_{\mathrm{init}}}(x_n, y_{\mathrm{init}}, z_{\mathrm{init}}) = f_{Y, Y_{\mathrm{init}}, Z_{\mathrm{init}}}(y_n, y_{\mathrm{init}}, z_{\mathrm{init}}) = f_{Z, Z_{\mathrm{init}}}(z_n, z_{\mathrm{init}}) = \prod_{j=1}^{n} f_Z(z_j).$

As was shown in the MA(1) case, the key to our method is to calculate the formula for the residual $r_i := Z_i - z_i$, which can be obtained from

$z_i = y_i + \beta y_{i-1} + \cdots + \beta^{i-1} y_1 + \beta^i y_{\mathrm{init}} + \beta^{i+1} z_{\mathrm{init}}$

$= \left(\sum_{j=1}^{i}\alpha^{i-j}X_j + \alpha^i y_{\mathrm{init}}\right) + \beta\left(\sum_{j=1}^{i-1}\alpha^{i-1-j}X_j + \alpha^{i-1} y_{\mathrm{init}}\right) + \cdots + \beta^{i-1}(X_1 + \alpha y_{\mathrm{init}}) + \beta^i y_{\mathrm{init}} + \beta^{i+1} z_{\mathrm{init}}$

$= \sum_{j=1}^{i}\frac{\beta^{i-j+1} - \alpha^{i-j+1}}{\beta - \alpha}X_j + \frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}y_{\mathrm{init}} + \beta^{i+1}z_{\mathrm{init}}$

(3.2) $= Z_i - \frac{(\beta - \alpha_0)(\beta - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\beta^{i-j-1}Z_j - \frac{(\alpha - \alpha_0)(\alpha - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j + \frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}(y_{\mathrm{init}} - Y_0) + \beta^{i+1}(z_{\mathrm{init}} - Z_{-1}) + (\alpha - \alpha_0)\frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}Z_{-1}$

(3.3) $= Z_i - r_i,$

where the fourth equality (3.2) comes from the fact that $X_j = Z_j - (\alpha_0 + \beta_0)Z_{j-1} + \alpha_0\beta_0 Z_{j-2}$ and $Y_0 = Z_0 - \beta_0 Z_{-1}$. Therefore, the residuals $r_i$ are given by
(3.4) $r_i = \frac{(\beta - \alpha_0)(\beta - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\beta^{i-j-1}Z_j + \frac{(\alpha - \alpha_0)(\alpha - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j - \frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}(y_{\mathrm{init}} - Y_0) - \beta^{i+1}(z_{\mathrm{init}} - Z_{-1}) - (\alpha - \alpha_0)\frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}Z_{-1}.$

Notice that the residuals $r_i$ no longer have as neat a form as in the MA(1) case. This is what makes the MA(2) case more interesting yet more complicated. In the following calculations, let

$y_{\mathrm{init}} = Y_0 + \frac{\sigma_0\delta_1}{\sqrt{n}} \quad\text{and}\quad z_{\mathrm{init}} = Z_{-1} + \frac{\sigma_0\delta_2}{\sqrt{n}}.$

With a similar argument as in [13], we opt to minimize the objective function

(3.5) $U_n(\gamma, \delta, \delta_1, \delta_2) = -2\sum_{i=1}^{n}\frac{r_i Z_i}{\sigma_0^2} + \sum_{i=1}^{n}\frac{r_i^2}{\sigma_0^2}.$
First note that $r_i = A_i + B_i + C_i + D_i$, where

$A_i := \frac{(\beta - \alpha_0)(\beta - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\beta^{i-j-1}Z_j - \frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}(y_{\mathrm{init}} - Y_0),$

$B_i := \frac{(\alpha - \alpha_0)(\alpha - \beta_0)}{\beta - \alpha}\sum_{j=1}^{i-1}\alpha^{i-j-1}Z_j,$

$C_i := -\beta^{i+1}(z_{\mathrm{init}} - Z_{-1}),$

$D_i := -(\alpha - \alpha_0)\frac{\beta^{i+1} - \alpha^{i+1}}{\beta - \alpha}Z_{-1}.$

To determine the weak limit of $-2\sum_{i=1}^{n} r_i Z_i/\sigma_0^2$ in (3.5) in the continuous function space, note that
$-2\sum_{i=1}^{n}\frac{A_i Z_i}{\sigma_0^2} = -2\frac{(\beta - \alpha_0)(\beta - \beta_0)}{\beta - \alpha}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\beta^{i-j-1}\frac{Z_j Z_i}{\sigma_0^2} + \frac{2\delta_1}{\sqrt{n}(\beta - \alpha)}\sum_{i=1}^{n}(\beta^{i+1} - \alpha^{i+1})\frac{Z_i}{\sigma_0}$

$= -2\gamma\,\frac{1 - \alpha_0 + \gamma/n}{1 - \alpha_0 + \gamma/n - \delta/\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(1 + \frac{\gamma}{n}\right)^{i-j-1}\frac{Z_j Z_i}{n\sigma_0^2} + \frac{2\delta_1}{1 - \alpha_0 + \gamma/n - \delta/\sqrt{n}}\sum_{i=1}^{n}\left[\left(1 + \frac{\gamma}{n}\right)^{i+1} - \alpha^{i+1}\right]\frac{Z_i}{\sqrt{n}\sigma_0}$

(3.6) $\Rightarrow -2\gamma\int_0^1\!\!\int_0^s e^{\gamma(s-t)}\,dW(t)\,dW(s) + \frac{2\delta_1}{1 - \alpha_0}\int_0^1 e^{\gamma s}\,dW(s),$

where the $\alpha^{i+1}$ term disappears in the limit due to the fact that $|\alpha_0| < 1$.
Similarly, we have

$-2\sum_{i=1}^{n}\frac{B_i Z_i}{\sigma_0^2} = -2\frac{(\alpha - \alpha_0)(\alpha - \beta_0)}{\beta - \alpha}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha^{i-j-1}\frac{Z_j Z_i}{\sigma_0^2}$

(3.7) $= 2\delta\,\frac{1 - \alpha_0 - \delta/\sqrt{n}}{1 - \alpha_0 + \gamma/n - \delta/\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\left(\alpha_0 + \frac{\delta}{\sqrt{n}}\right)^{i-j-1}\frac{Z_j Z_i}{\sqrt{n}\sigma_0^2} = 2\delta\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j Z_i}{\sqrt{n}\sigma_0^2} + o_p(1)$

(3.8) $\Rightarrow 2\delta N,$

where $N \sim N(0, \frac{1}{1-\alpha_0^2})$. The third equality holds because $|\alpha_0|$ is strictly smaller than 1, and the $o_p(1)$ term is uniform in $\delta$ on any compact set of $\mathbb{R}$. The weak convergence from (3.7) to (3.8) follows from the martingale central limit theorem; see Hall and Heyde [14]. It can also be shown that $N$ and the $W(t)$ process from (3.6) are independent; see Theorem 2.2 in Chan and Wei [7]. Following similar arguments, it is easy to show that

$-2\sum_{i=1}^{n}\frac{C_i Z_i}{\sigma_0^2} \xrightarrow{p} 0 \quad\text{and}\quad -2\sum_{i=1}^{n}\frac{D_i Z_i}{\sigma_0^2} \xrightarrow{p} 0.$
For the second term in (3.5), write

$\sum_{i=1}^{n}\frac{r_i^2}{\sigma_0^2} = \sum_{i=1}^{n}\frac{A_i^2 + B_i^2 + C_i^2 + D_i^2}{\sigma_0^2} + \sum_{i=1}^{n}\frac{2A_iB_i + 2A_iC_i + 2A_iD_i + 2B_iC_i + 2B_iD_i + 2C_iD_i}{\sigma_0^2}.$

Using Corollary 2.10 in [13], we have

(3.9) $\sum_{i=1}^{n}\frac{A_i^2}{\sigma_0^2} \xrightarrow{d} \int_0^1\left(\gamma\int_0^s e^{\gamma(s-t)}\,dW(t) - \frac{\delta_1}{1-\alpha_0}e^{\gamma s}\right)^2 ds,$

(3.10) $\sum_{i=1}^{n}\frac{B_i^2}{\sigma_0^2} \xrightarrow{p} \delta^2\,\mathrm{var}(N).$

Moreover, it is relatively easy to show that

(3.11) $\sum_{i=1}^{n}\frac{C_i^2}{\sigma_0^2} \xrightarrow{p} 0 \quad\text{and}\quad \sum_{i=1}^{n}\frac{D_i^2}{\sigma_0^2} \xrightarrow{p} 0.$

Next we show that all the cross product terms vanish in the limit, namely,

(3.12) $\sum_{i=1}^{n}\frac{2A_iB_i + 2A_iC_i + 2A_iD_i + 2B_iC_i + 2B_iD_i + 2C_iD_i}{\sigma_0^2} \xrightarrow{p} 0.$

Here we only give the details for showing $\sum_{i=1}^{n}A_iB_i/\sigma_0^2 \xrightarrow{p} 0$; the other cases can be proved in an analogous manner. Notice that for any fixed $M > 0$ and any $\gamma \in [-M, 0]$,

$\sum_{i=1}^{n}\frac{A_iB_i}{\sigma_0^2} = -\frac{(\gamma/n)(1-\alpha_0+\gamma/n)(\delta/\sqrt{n})(1-\alpha_0-\delta/\sqrt{n})}{(1-\alpha_0+\gamma/n-\delta/\sqrt{n})^2}\sum_{i=1}^{n}\left(\sum_{j=1}^{i-1}\beta^{i-j-1}\frac{Z_j}{\sigma_0}\right)\left(\sum_{j=1}^{i-1}\alpha^{i-j-1}\frac{Z_j}{\sigma_0}\right)$
$\quad + \frac{(\delta_1/\sqrt{n})(\delta/\sqrt{n})(1-\alpha_0-\delta/\sqrt{n})}{(1-\alpha_0+\gamma/n-\delta/\sqrt{n})^2}\sum_{i=1}^{n}(\beta^{i+1}-\alpha^{i+1})\left(\sum_{j=1}^{i-1}\alpha^{i-j-1}\frac{Z_j}{\sigma_0}\right)$

(3.13) $= -\frac{\gamma\delta}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{i-1}\left(1+\frac{\gamma}{n}\right)^{i-j-1}\frac{Z_j}{\sqrt{n}\sigma_0}\right)\left(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\right) + \frac{\delta\delta_1}{(1-\alpha_0)n}\sum_{i=1}^{n}\left(1+\frac{\gamma}{n}\right)^{i+1}\left(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\right) - \frac{\delta\delta_1}{(1-\alpha_0)n}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha_0^{2i-j}\frac{Z_j}{\sigma_0} + o_p(1),$

where the $o_p(1)$ term is uniform in $\gamma$ and $\delta$ on any compact set in $\mathbb{R}\times\mathbb{R}$. Setting $R_i = \sum_{j=1}^{i}\alpha_0^{i-j}Z_j/\sigma_0$, it follows that $R_i$ is a stationary AR(1) process satisfying

$R_i = \alpha_0 R_{i-1} + Z_i/\sigma_0.$
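The partial sums of this AR(1) process scale like a Brownian motion with long-run standard deviation $\omega = 1/(1-\alpha_0)$, which is the constant appearing in the invariance principle applied next. A quick illustrative simulation (with $\sigma_0 = 1$; all constants arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
a0, n, reps = 0.5, 2000, 500
omega = 1.0 / (1.0 - a0)             # sum of the AR(1) weights a0^l
S1 = np.empty(reps)
for rep in range(reps):
    Z = rng.normal(size=n)           # sigma_0 = 1, so R_i = a0*R_{i-1} + Z_i
    R = np.empty(n)
    R[0] = Z[0]
    for i in range(1, n):
        R[i] = a0 * R[i - 1] + Z[i]
    S1[rep] = R.sum() / np.sqrt(n)   # S_n(1)
sd_hat = S1.std()                    # should be close to omega = 2
```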

Since $|\alpha_0| < 1$, we can apply Theorem 3.7 in Tanaka [21] to obtain

$S_n(t) := \frac{1}{\sqrt{n}}\sum_{i=0}^{[nt]} R_i \xrightarrow{d} \omega S(t),$

where $\omega = \frac{1}{1-\alpha_0} = \sum_{l=0}^{\infty}\alpha_0^l$ and $S(t)$ is a standard Brownian motion; note that $R_i$ is adapted to the $\sigma$-fields $\mathcal{F}_i$ generated by $Z_0, \ldots, Z_i$. By Theorem 2.1 in [13], we obtain

$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(1+\frac{\gamma}{n}\right)^{i+1}R_{i-1} \xrightarrow{d} \omega\int_0^1 e^{\gamma s}\,dS(s) \qquad\text{on } C[-M, 0].$

Therefore,

(3.14) $\frac{\delta\delta_1}{(1-\alpha_0)n}\sum_{i=1}^{n}\left(1+\frac{\gamma}{n}\right)^{i+1}\left(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\right) \xrightarrow{p} 0 \qquad\text{on } C[-M, 0].$

It is also easy to see that

(3.15) $\frac{\delta\delta_1}{(1-\alpha_0)n}\sum_{i=1}^{n}\sum_{j=1}^{i-1}\alpha_0^{2i-j}\frac{Z_j}{\sigma_0} = \frac{\delta\delta_1}{(1-\alpha_0)n}\sum_{i=1}^{n}\alpha_0^{i+1}R_{i-1} \xrightarrow{p} 0.$

Note that

(3.16) $\sum_{i=1}^{n}\left(\sum_{j=1}^{i-1}\left(1+\frac{\gamma}{n}\right)^{i-j-1}\frac{Z_j}{\sqrt{n}\sigma_0}\right)\frac{R_{i-1}}{\sqrt{n}}$

is of the form of the double sum in Theorem 2.8 in [13], except that $\{R_i\}$ is no longer a martingale difference sequence. However, we can still follow the proof of Theorem 2.8 in [13] and show that (3.16) has a nondegenerate weak limit in $C[-M, 0]$. It follows that

(3.17) $-\frac{\gamma\delta}{n}\sum_{i=1}^{n}\left(\sum_{j=1}^{i-1}\left(1+\frac{\gamma}{n}\right)^{i-j-1}\frac{Z_j}{\sqrt{n}\sigma_0}\right)\left(\sum_{j=1}^{i-1}\alpha_0^{i-j-1}\frac{Z_j}{\sigma_0}\right) = -\frac{\gamma\delta}{\sqrt{n}}\sum_{i=1}^{n}\left(\sum_{j=1}^{i-1}\left(1+\frac{\gamma}{n}\right)^{i-j-1}\frac{Z_j}{\sqrt{n}\sigma_0}\right)\frac{R_{i-1}}{\sqrt{n}} \xrightarrow{p} 0.$

Thus, combining (3.14), (3.15) and (3.17), we conclude that the terms in (3.13) go to 0 in probability on $C[-M, 0]$. The convergence in probability of the other terms in (3.12) can be proved in a similar way. To sum up, we have shown the key stochastic process convergence result, that is,

$U_n(\gamma, \delta, \delta_1, \delta_2) \xrightarrow{d} U(\gamma, \delta, \delta_1)$
(3.18) $U(\gamma, \delta, \delta_1) = -2\gamma\int_0^1\!\!\int_0^s e^{\gamma(s-t)}\,dW(t)\,dW(s) + 2\delta N + \frac{2\delta_1}{1-\alpha_0}\int_0^1 e^{\gamma s}\,dW(s)$
$\quad + \int_0^1\left(\gamma\int_0^s e^{\gamma(s-t)}\,dW(t) - \frac{\delta_1}{1-\alpha_0}e^{\gamma s}\right)^2 ds + \delta^2\,\mathrm{var}(N).$
Using (3.18), one can easily derive the asymptotics for the exact profile log-likelihood, denoted by $L_n(\gamma, \delta)$. In particular,

(3.19) $L_n(\gamma, \delta) - L_n(0, 0) \xrightarrow{d} \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\gamma, \delta, \delta_1)}{2}\right)d\delta_1 - \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(0, 0, \delta_1)}{2}\right)d\delta_1 := L(\gamma, \delta)$

(3.20) $= -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\gamma, \beta)}{2}\right)d\beta - \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(0, \beta)}{2}\right)d\beta,$

where $\beta = -\delta_1/(1-\alpha_0)$ and $U(\gamma, \beta)$ is given by

(3.21) $U(\gamma, \beta) = -2\int_0^1\left(\gamma\int_0^s e^{\gamma(s-t)}\,dW(t) + \beta e^{\gamma s}\right)dW(s) + \int_0^1\left(\gamma\int_0^s e^{\gamma(s-t)}\,dW(t) + \beta e^{\gamma s}\right)^2 ds,$

which is the limiting process of the joint likelihood obtained in the unit root MA(1) case; see also Davis and Song [13]. We state the key result of this paper in the following theorem.
Theorem 3.1. Consider the model given in (1.4) with the two roots $\alpha$ and $\beta$ parameterized by

$\beta = 1 + \frac{\gamma}{n} \quad\text{and}\quad \alpha = \alpha_0 + \frac{\delta}{\sqrt{n}}.$

Denote the profile log-likelihood based on a Gaussian likelihood by $L_n(\gamma, \delta)$. Then $L_n(\gamma, \delta)$ satisfies

$L_n(\gamma, \delta) - L_n(0, 0) \xrightarrow{d} L(\gamma, \delta) \qquad\text{on } C((-\infty, 0]\times\mathbb{R}),$

where

(3.22) $L(\gamma, \delta) = -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + U^*(\gamma) \stackrel{d}{=} -\delta N - \frac{\delta^2}{2}\,\mathrm{var}(N) + \frac{1}{2}Z_0(\gamma).$

The processes $U^*(\gamma)$ and $Z_0(\gamma)$ are defined by

(3.23) $U^*(\gamma) = \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\gamma, \beta)}{2}\right)d\beta - \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(0, \beta)}{2}\right)d\beta$

and

(3.24) $Z_0(\gamma) = \sum_{k=1}^{\infty}\log\frac{\pi^2k^2}{\pi^2k^2 + \gamma^2} + \sum_{k=1}^{\infty}\frac{\gamma^2\pi^2k^2 X_k^2}{(\pi^2k^2 + \gamma^2)\pi^2k^2}.$

Furthermore, there exists a sequence of local maxima $(\hat\gamma_n, \hat\delta_n)$ of $L_n(\gamma, \delta)$ converging in distribution to $(\hat\gamma_{\mathrm{MLE}}, \hat\delta_{\mathrm{MLE}})$, the global maximum of the limiting process $L(\gamma, \delta)$. If model (1.4) has, at most, one unit root, then for the estimators $\hat c_1$ and $\hat c_2$, we have

(3.25) $\sqrt{n}\begin{pmatrix}\hat c_1 - c_1\\ \hat c_2 - c_2\end{pmatrix} \xrightarrow{d} N\left(0,\, \begin{pmatrix}1 - c_2^2 & c_1(1 - c_2)\\ c_1(1 - c_2) & 1 - c_2^2\end{pmatrix}\right).$
Remark 3.2. The equivalence in distribution of the processes $U^*(\gamma)$ and $\frac{1}{2}Z_0(\gamma)$ is given in Theorem 4.3 in Davis and Song [13]. As mentioned in Davis and Dunsmuir [10], convergence on $C(-\infty, 0]$ does not necessarily imply convergence of the corresponding global maximizers. Additional arguments were required to show that the maximum likelihood estimator converged in distribution to the global maximizer of the limit process. We suspect that the same holds here for $\hat\gamma_{\mathrm{MLE},n}$ and $\hat\delta_{\mathrm{MLE},n}$, and simulation results, some of which are contained in Sections 4 and 5, bear this out.

Remark 3.3. To establish the convergence in (3.25), if there is exactly one unit root, then

$\sqrt{n}(\hat c_1 - c_1) = -\hat\delta_{\mathrm{MLE}} - \frac{\hat\gamma_{\mathrm{MLE}}}{\sqrt{n}} \xrightarrow{d} \frac{N}{\mathrm{var}(N)} \stackrel{d}{=} N(0, 1-\alpha_0^2) = N(0, 1-c_2^2),$

$\sqrt{n}(\hat c_2 - c_2) = \hat\delta_{\mathrm{MLE}} + \frac{\alpha_0\hat\gamma_{\mathrm{MLE}}}{\sqrt{n}} + \frac{\hat\delta_{\mathrm{MLE}}\hat\gamma_{\mathrm{MLE}}}{n} \xrightarrow{d} -\frac{N}{\mathrm{var}(N)} \stackrel{d}{=} N(0, 1-\alpha_0^2) = N(0, 1-c_2^2).$

Here, we use the facts that $\hat\delta_{\mathrm{MLE}} = -N/\mathrm{var}(N)$, obtained by maximizing (3.22) in $\delta$, and that $|\hat\gamma_{\mathrm{MLE}}| < \infty$ a.s., as stated in Theorem 4.3 in [13]. One can also calculate the limiting asymptotic covariance of $\hat c_1$ and $\hat c_2$ as

$-\mathrm{var}(\hat\delta_{\mathrm{MLE}}) = -(1-\alpha_0^2) = -(1+\alpha_0)(1-\alpha_0) = c_1(1-c_2).$

Remark 3.4. The above theorem says that when $|\alpha_0| < 1$ and $\beta_0 = 1$, we have a similar asymptotic result for $\hat c_1$ and $\hat c_2$ as in the invertible case. If we only consider the original parameters $c_1$ and $c_2$, the effect of the unit root disappears in the limit. But $\sqrt{n}(\hat c_1 - c_1)$ and $\sqrt{n}(\hat c_2 - c_2)$ are perfectly dependent in the limit, since $c_1(1-c_2) = -(1-c_2^2)$.
Remark 3.5. The estimated roots $\hat\alpha$ and $\hat\beta$ calculated from $\hat c_1$ and $\hat c_2$ are asymptotically independent. Interestingly, $\hat\beta_{\mathrm{MLE}}$, corresponding to the unit root in the MA(2), has exactly the same distribution as the MLE in the MA(1) case. So the pile-up and other properties of $\hat\beta_{\mathrm{MLE}}$ follow exactly from those in the MA(1) case. It may seem surprising that the unit root in the MA(2) model (when there is only one unit root) behaves asymptotically just like the unit root in the MA(1) case. To see this, consider the situation where we are given the parameter $\alpha = \alpha_0$. In this case, $\delta = 0$ and

$L_n(\gamma, 0) - L_n(0, 0) \xrightarrow{d} \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(\gamma, \beta)}{2}\right)d\beta - \log\int_{-\infty}^{+\infty}\exp\left(-\frac{U(0, \beta)}{2}\right)d\beta,$

which is the limiting process of the exact profile log-likelihood in the MA(1) case. On the other hand, when $\alpha$ is given, $\beta$ becomes the only parameter that needs to be estimated in

(3.26) $X_t = (1 - \alpha_0 B)(1 - \beta B)Z_t.$

Because of the invertibility of the operator $1 - \alpha_0 B$, we can get an intermediate process $Y_t$ by inverting the operator. Namely,

(3.27) $Y_t := \frac{1}{1 - \alpha_0 B}X_t = \sum_{k=0}^{\infty}\alpha_0^k X_{t-k} = (1 - \beta B)Z_t.$

Since we are dealing with asymptotics, inverting the operator $1 - \alpha_0 B$ is feasible. Therefore, the transformed process $Y_t$ is indeed an MA(1) process with the true parameter $\beta_0 = 1$. Then it follows naturally that the properties of the estimator of $\beta$ in this situation should be equivalent to those of $\theta$ in a unit root MA(1) process.
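The reduction in (3.27) is easy to illustrate numerically: inverting the stable factor of a one-unit-root MA(2) produces a series whose sample autocorrelations match those of the differenced noise $(1-B)Z_t$, namely $-1/2$ at lag 1 and $0$ at lag 2. An illustrative sketch (the truncation level is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
a0, n = 0.5, 200000
Z = rng.normal(size=n)
# X_t = (1 - a0*B)(1 - B)Z_t = Z_t - (1 + a0)Z_{t-1} + a0*Z_{t-2}
X = Z - (1 + a0) * np.roll(Z, 1) + a0 * np.roll(Z, 2)
X[:2] = 0.0                                  # drop wrapped-around values

K = 60                                       # truncated inversion of 1 - a0*B
Y = sum(a0**k * np.roll(X, k) for k in range(K))
Y = Y[K + 2:]                                # discard the burn-in segment

acf1 = np.corrcoef(Y[1:], Y[:-1])[0, 1]      # should be near -0.5
acf2 = np.corrcoef(Y[2:], Y[:-2])[0, 1]      # should be near 0
```

The truncated inversion satisfies $Y_t = (1-B)Z_t - a_0^K(1-B)Z_{t-K}$ exactly, so for $K = 60$ the transformed series is, up to negligible error, the unit-root MA(1) process.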
3.2. MA(2) with two unit roots. In moving from the unit root problem for the MA(1) model to the MA(2) model, several new and challenging problems arise. In this subsection, we discuss some issues that arise when there are two unit roots in the MA polynomial.
3.2.1. Case 2: $c_2 = 1$ and $c_1 \ne -2$. This corresponds to the case that the true parameters are on the boundary $c_2 = 1$, that is, the boundary AC in Figure 1, which means the two roots lie on the unit circle and are not real valued. Denote the two generic complex-valued roots of the MA polynomial by $\beta = re^{\tilde{i}\theta}$ and $\bar\beta = re^{-\tilde{i}\theta}$. To avoid confusion in notation, we use $\tilde{i}$ to represent $\sqrt{-1}$. A rather different representation of the residuals $r_i$ is used in this case, that is,

(3.28) $r_i = \frac{(\beta - \beta_0)(\beta - \bar\beta_0)}{\beta - \bar\beta}\sum_{j=1}^{i-1}\beta^{i-j-1}Z_j + \frac{(\bar\beta - \beta_0)(\bar\beta - \bar\beta_0)}{\bar\beta - \beta}\sum_{j=1}^{i-1}\bar\beta^{i-j-1}Z_j - \frac{\beta^{i+1} - \bar\beta^{i+1}}{\beta - \bar\beta}(z_{\mathrm{init},0} - Z_0) - \frac{\beta^{i} - \bar\beta^{i}}{\beta - \bar\beta}(z_{\mathrm{init},-1} - Z_{-1}) + \frac{(\beta + \bar\beta - \beta_0 - \bar\beta_0)(\beta^{i+1} - \bar\beta^{i+1})}{\beta - \bar\beta}Z_{-1}.$

We also adopt the parameterization for $r$, $\theta$ and the two initial variables given by

$r = 1 + \frac{\gamma}{n} \quad\text{and}\quad \theta = \theta_0 + \frac{\delta}{n},$

$z_{\mathrm{init},0} = Z_0 + \frac{\sigma_0\delta_1}{\sqrt{n}} \quad\text{and}\quad z_{\mathrm{init},-1} = Z_{-1} + \frac{\sigma_0\delta_2}{\sqrt{n}}.$

Again, we study the limiting process of $-2\sum_{i=1}^{n} r_iZ_i/\sigma_0^2 + \sum_{i=1}^{n} r_i^2/\sigma_0^2$. Here we only present the limit of the first term, $-2\sum_{i=1}^{n} r_iZ_i/\sigma_0^2$, for illustration; the limit of the other terms can be derived in a similar fashion. By Theorem 2.8 in [13], we obtain
$\sum_{i=0}^{n}\sum_{j=1}^{i-1}\beta^{i-j}\frac{Z_jZ_i}{n\sigma_0^2} = \sum_{i=0}^{n}\sum_{j=1}^{i-1}\left(1+\frac{\gamma}{n}\right)^{i-j}\exp\left(\tilde{i}\,\frac{\delta(i-j)}{n}\right)\left(\frac{e^{-\tilde{i}\theta_0 j}Z_j}{\sqrt{n}\sigma_0}\right)\left(\frac{e^{\tilde{i}\theta_0 i}Z_i}{\sqrt{n}\sigma_0}\right) \Rightarrow \int_0^1\!\!\int_0^s e^{(\gamma+\tilde{i}\delta)(s-t)}\,d\bar{\mathbb{W}}(t)\,d\mathbb{W}(s),$

where $\mathbb{W}(t) = W_1(t) + \tilde{i}W_2(t)$, $\bar{\mathbb{W}}(t) = W_1(t) - \tilde{i}W_2(t)$, and $W_1(t)$ and $W_2(t)$ are the respective weak limits of the partial sums

$W_{1,n}(t) = \sum_{k=0}^{[nt]}\cos(k\theta_0)\frac{Z_k}{\sqrt{n}\sigma_0} \quad\text{and}\quad W_{2,n}(t) = \sum_{k=0}^{[nt]}\sin(k\theta_0)\frac{Z_k}{\sqrt{n}\sigma_0}.$

The weak convergence of $W_{1,n}(t)$ and $W_{2,n}(t)$ to two independent Brownian motions is guaranteed by Theorem 2.2 in Chan and Wei [7]. By Theorem 2.1 in [13] we have

$\frac{1}{\sqrt{n}}\sum_{i=0}^{n}\beta^i\frac{Z_i}{\sigma_0} \xrightarrow{d} \int_0^1 e^{(\gamma+\tilde{i}\delta)s}\,d\mathbb{W}(s).$
Therefore, (3.28) leads to

(3.29) $-2\sum_{i=1}^{n}\frac{r_iZ_i}{\sigma_0^2} \xrightarrow{d} \Re\bigg\{4\big(\gamma\cos\theta_0 - \delta\sin\theta_0 + \tilde{i}(\delta\cos\theta_0 + \gamma\sin\theta_0)\big)e^{\tilde{i}\theta_0}\int_0^1\!\!\int_0^s e^{(\gamma+\tilde{i}\delta)(s-t)}\,d\bar{\mathbb{W}}(t)\,d\mathbb{W}(s)$
$\quad + 4\delta_1\,\frac{e^{\tilde{i}\theta_0}}{2\tilde{i}\sin\theta_0}\int_0^1 e^{(\gamma+\tilde{i}\delta)s}\,d\mathbb{W}(s) - 4\delta_2\,\frac{1}{2\tilde{i}\sin\theta_0}\int_0^1 e^{(\gamma+\tilde{i}\delta)s}\,d\mathbb{W}(s)\bigg\},$

where $\Re\{\cdot\}$ denotes the real part of a complex quantity. The weak limit of $\sum_{i=1}^{n} r_i^2/\sigma_0^2$ can also be computed in an analogous manner using Corollary 2.10 in [13]. However, the weak limit of $\sum_{i=1}^{n} r_i^2/\sigma_0^2$ has an even more complicated form than (3.29). By integrating out the auxiliary variables, the exact likelihood can be recovered as well. However, the form of the joint likelihood function is much more complicated than the one computed in the one unit root case. The asymptotic properties and pile-up probabilities in this case remain unknown.
3.2.2. Case 3: $c_2 = 1$ and $c_1 = -2$. This corresponds to the vertex A of the region in Figure 1. It is convenient to first consider a special case of local asymptotics in which the approach to the corner is through the boundary $c_1 + c_2 = -1$. With this constraint, the dimension of the parameter space has been reduced from two to one. We parameterize the MA(2) in this case by

(3.30) $X_t = Z_t - (\eta + 1)Z_{t-1} + \eta Z_{t-2}$

and define a $Z_{\mathrm{init}}$ and a $Y_{\mathrm{init}}$ as in (3.1), but with a different normalization, that is,

(3.31) $\eta = 1 + \frac{\gamma}{n}, \qquad Y_{\mathrm{init}} = Y_0 + \frac{\sigma_0\delta_1}{n^{3/2}} \quad\text{and}\quad Z_{\mathrm{init}} = Z_{-1} + \frac{\sigma_0\delta_2}{\sqrt{n}}.$

Then, with the help of the theorems in Davis and Song [13], it follows that

(3.32) $U_n(\gamma, \delta_1, \delta_2) = -2\sum_{i=1}^{n}\frac{r_iZ_i}{\sigma_0^2} + \sum_{i=1}^{n}\frac{r_i^2}{\sigma_0^2}$
$\xrightarrow{d} -2\gamma\int_0^1\!\!\int_0^s e^{\gamma(s-t)}\,dW(t)\,dW(s) + 2\delta_2\int_0^1 e^{\gamma s}\,dW(s) - 2\delta_1\int_0^1\frac{e^{\gamma s}-1}{\gamma}\,dW(s)$
$\quad + \int_0^1\left(-\gamma\int_0^s e^{\gamma(s-t)}\,dW(t) + \delta_2 e^{\gamma s} - \delta_1\frac{e^{\gamma s}-1}{\gamma}\right)^2 ds.$

There is a connection between this limiting process and the one in (2.5) derived for an MA(1) model with a nonzero mean. Notice that $U(\tau_0, \alpha, \beta)$ in (2.5) is exactly the process we just derived, with $\delta_1$ and $\delta_2$ replaced by $\tau_0$ and $\beta$. This leads us to an interesting connection between the mean term in the lower order MA model and the initial value in the higher order MA model, which we will discuss further in Section 6.
Alternatively, if we do not impose the constraint c_1 − c_2 = 1, there are two possible ways to parameterize the roots. First, the vertex can be approached through the real region, where c_1 = ξ + ζ, c_2 = ξζ and the roots are parameterized further as

ξ = 1 + γ_1/n   and   ζ = 1 + γ_2/n,

which makes

c_1 = 1 + (1 + (γ_1 + γ_2)/n) + o(1/n)   and   c_2 = 1 + (γ_1 + γ_2)/n + o(1/n).

The second parameterization is through the complex region, in which the roots are re^{iφ} and re^{−iφ} with c_1 = 2r cos(φ), c_2 = r^2. The radius and the angular part are further parameterized as

r = 1 + γ/n   and   φ = ω/n,

which implies

c_1 = 1 + (1 + 2γ/n) + o(1/n)   and   c_2 = 1 + 2γ/n + o(1/n).

UNIT ROOTS IN MOVING AVERAGES


Therefore, in either case, if we ignore the higher order terms, c_1 and c_2 can be approximated as

c_1 = 1 + (1 + γ̃/n)   and   c_2 = 1 + γ̃/n,

with γ̃ = γ_1 + γ_2 in the real case and γ̃ = 2γ in the complex case. This parameterization, however, is exactly the one we have seen in the conditional case, which suggests that one of the unit roots exhibits pile-up with probability one asymptotically, while the other behaves like the unit root in the conditional case; see (3.30) and (3.32). This claim is also supported by the simulation results; see Table 4 in Section 5.
4. Testing for a unit root in an MA(2) model. A direct application of
the results in the previous section is testing for the presence of a unit root
in the MA(2) model. For the testing problem, we extend the idea of a generalized likelihood ratio test proposed in Davis, Chen and Dunsmuir [9] to
the MA(2) case. Tests based on MLE are also considered in this section. We
will compare these tests with the score-type test of Tanaka [20].
To specify our hypothesis testing problem in the MA(2) case, the null
hypothesis is H0 : there is exactly one unit root in the MA polynomial, and
the alternative is HA : there are no unit roots. The asymptotic theory of the
previous section allows us to approximate the nominal power against local
alternatives. To set up the problem, consider the model

X_t = Z_t − (1 + γ/n + η)Z_{t−1} + η(1 + γ/n)Z_{t−2} = (1 − (1 + γ/n)B)(1 − ηB)Z_t

with |η| < 1. We want to test H_0: γ = 0 versus H_A: γ < 0.
To describe the test based on the generalized likelihood ratio, let GLR_n = 2(L_n(γ_MLE, η_MLE) − L_n(0, η_MLE,0)), where η_MLE,0 is the MLE of η when γ = 0. An application of Theorem 3.1 gives

GLR_n →_d L(γ_MLE, β_MLE) − L(0, β_MLE) = −U(γ_MLE),

where L(γ, β) and U(γ) are given in (3.22) and (3.23), and β_MLE = N/var(N). Notice that the limit distribution of GLR_n depends only on γ_MLE; η serves as a nuisance parameter which does not play a role in the limit. Define the (1 − α)th asymptotic quantiles b_GLR(α) and b_MLE(α) by

P(−U(γ_MLE) > b_GLR(α)) = α   and   P(γ_MLE > b_MLE(α)) = α.

Since the limiting random variables −U(γ_MLE) and γ_MLE are the same as in the MA(1) unit root case, the critical values b_GLR(α) and b_MLE(α) are the same as those provided in Table 3.2 of Davis, Chen and Dunsmuir [9].
There has been limited research on testing for a unit root in the MA(2) case. One approach, proposed by Tanaka, is based on a score-type statistic, which is locally best invariant and unbiased (LBIU). However, implementation of this test requires choosing a sequence l_n at a suitable rate. One choice is l_n = o(n^{1/4}), yet this may not always work well, especially if η > 0; see also [20]. Next we compare the power curves of the three tests for sample size n = 50.

Fig. 2. Power curves with respect to local alternatives when η = −0.3 (upper) and when η = −0.5 (lower). Sample size n = 50. The size of the test is set to 0.05.
Figure 2 shows the power curves based on the MLE, GLR and LBIU tests when the invertible root in the MA(2) model is −0.3 and −0.5, respectively. Since the score-type test of Tanaka is locally best invariant unbiased, it has a very small edge over the GLR test up to the local alternative 4 or so. Thereafter, the GLR test increasingly outperforms the LBIU test by a wide margin. When the sample size is 50, the local alternative 4 corresponds to θ = 1 − 4/50 = 0.92. Also, as seen in Figure 2, the power function based on the MLE dominates the power function of the LBIU test for local alternatives greater than 8 or 9.
In the case when η > 0, especially for small sample sizes like 50, the behavior of the tests based on the MLE and LBIU is very poor. This is because when η > 0 and there is one unit root, the two parameters c_1 and c_2 lie on the boundary c_1 − c_2 = 1, which is close to the complex region boundary c_1^2 − 4c_2 = 0. But our asymptotic results are derived under the assumption that the two roots approach the limit only through the real region. This holds asymptotically, but in finite samples, when we maximize the likelihood jointly over c_1 and c_2, the two maximizers are likely to fall into the complex region. As η gets closer to 1 this effect becomes more severe. Thus we do not recommend the test based on the MLE when the invertible root is likely to be positive; using it usually inflates the size of the test. The LBIU test is not good in this case either,
as pointed out in Tanaka [20]. The upper tail probabilities are greatly underestimated when η gets closer to 1, and hence H_0 tends to be accepted much more often. Simulation results show that when the sample size is 50 and the true η is 0.3 and 0.5, the corresponding sizes of the LBIU test are 0.0119 and 0.0015, much smaller than the nominal size 0.05. GLR seems to be the best among the three choices. This is because GLR only considers the maximum value of the likelihood ratio instead of the MLEs of c_1 and c_2. Therefore, even if the maximizers of c_1 and c_2 fall in the complex region, the GLR test can still be carried out, whereas the test based on the MLE is not even well defined in this case. Although the size of the GLR test is often slightly greater than the nominal size, GLR gives the best performance in this situation.

Fig. 3. Power curves with respect to local alternatives when η = 0. Sample size n = 50. The size of the test is set to 0.05.
Finally, we compare these tests when η = 0; that is, the model is in fact an MA(1) with a unit root. The tests developed for the MA(2) case are still applicable. The results are summarized in Figure 3. Clearly, the power functions of the tests designed for the MA(1) dominate the power functions of their counterparts designed for the MA(2). However, it is surprising that for large local alternatives (greater than 9 or so), the GLR test for the MA(2) model outperforms the LBIU test for the MA(1) model.
5. Numerical simulations. In this section, we present simulation results that illustrate the theory from Section 3. Realizations were simulated from the MA(2) process given by

(5.1)
X_t = Z_t − (1 + η)Z_{t−1} + ηZ_{t−2},

where η takes the values 0.3, 0 and −0.3, respectively. The MA(2) model was replicated 10,000 times for each choice of η, and the MLEs of the MA(2) coefficients θ_1 and θ_2 were calculated for each replicate. The empirical pile-up probability and the empirical variance and MSE of the MLEs are reported in Tables 1 to 3. Notice that the variances and MSEs in the tables are reported for the normalized estimates √n(ĉ_i − c_i), i = 1, 2.



Table 1
Summary of the case: η = 0.3

Sample   Pile-up       Variance   MSE       Variance   MSE       Correlation
size     probability   of ĉ1      of ĉ1     of ĉ2      of ĉ2     of ĉ1 and ĉ2
25       0.5436        2.1701     2.1970    2.4455     2.6536    0.9347
50       0.6041        1.4063     1.4118    1.4967     1.5553    0.9644
100      0.6234        1.1108     1.1108    1.1490     1.1636    0.9815
400      0.6398        0.9788     0.9788    0.9854     0.9890    0.9953
1,000    0.6437        0.9290     0.9290    0.9327     0.9338    0.9981

Table 2
Summary of the case: η = 0 [MA(1) with a unit root]

Sample   Pile-up       Variance   MSE       Variance   MSE       Correlation
size     probability   of ĉ1      of ĉ1     of ĉ2      of ĉ2     of ĉ1 and ĉ2
25       0.5870        2.1624     2.1629    2.5037     2.6355    0.8792
50       0.6182        1.3661     1.3670    1.4690     1.5053    0.9378
100      0.6220        1.1661     1.1670    1.2082     1.2224    0.9662
400      0.6318        1.0440     1.0441    1.0544     1.0578    0.9918
1,000    0.6334        1.0329     1.0330    1.0351     1.0384    0.9966

Table 3
Summary of the case: η = −0.3

Sample   Pile-up       Variance   MSE       Variance   MSE       Correlation
size     probability   of ĉ1      of ĉ1     of ĉ2      of ĉ2     of ĉ1 and ĉ2
25       0.6171        1.8370     1.8806    2.1654     2.2287    0.7950
50       0.6347        1.2820     1.3053    1.3647     1.3820    0.8938
100      0.6447        1.0748     1.0853    1.1215     1.1299    0.9397
400      0.6472        0.9245     0.9267    0.9316     0.9339    0.9822
1,000    0.6511        0.9232     0.9242    0.9256     0.9263    0.9933

As seen in the tables, the correlation of ĉ_1 and ĉ_2 increases to 1 with the sample size. The variances and the MSEs converge to the theoretical value 1 − c_2^2. As pointed out in [10] and [9], the asymptotic results work remarkably well even for small sample sizes in the MA(1) case. Here, although the limiting pile-up probability is still 0.6518, the rates vary depending on η: for η > 0 the rates are slow, while for η < 0 they are much faster. From the derivation of the asymptotic results, there are error terms in the likelihood that vanish asymptotically and contribute to a more lethargic rate of convergence. Again, the asymptotic results were derived assuming the roots are always in the real region, which only holds asymptotically. When the

Table 4
Pile-up probabilities for the case: c1 = 2

Sample size    Pile-up probability
100            0.246
500            0.804
1,000          0.961
5,000          0.999

sample size is small and η > 0, the MLEs of c_1 and c_2 are more likely to be in the complex region than when η < 0. Thus the limiting process approximates the likelihood function poorly when η > 0, which in turn results in less pile-up at smaller sample sizes.

Table 4 summarizes the pile-up effects for the model considered in Section 3.2.2, where the two roots of the MA polynomial are both 1. In a given realization, the estimators are said to exhibit pile-up if the MLEs of c_1 and c_2 lie on the boundary c_1 − c_2 = 1. As seen in the table, the pile-up probability increases to 1 with the sample size. However, the claimed 100% asymptotic pile-up probability is not a good approximation for small sample sizes: even when n = 500, the pile-up is only about 80%.
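The pile-up phenomenon itself is easy to reproduce by Monte Carlo in the simplest setting, the unit-root MA(1), whose limiting pile-up probability is the 0.6518 quoted above (cf. [10]). The following sketch is our own illustration; the grid resolution, replication count and use of the exact Gaussian likelihood over a Toeplitz covariance are all arbitrary choices, not the authors' code:

```python
import numpy as np

def ma1_neg2ll(x, theta):
    # -2 * profile Gaussian log-likelihood of MA(1): X_t = Z_t - theta*Z_{t-1},
    # with sigma^2 profiled out and constants dropped
    n = len(x)
    gam = np.zeros(n)
    gam[0], gam[1] = 1 + theta ** 2, -theta
    R = gam[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
    _, logdet = np.linalg.slogdet(R)
    s2 = x @ np.linalg.solve(R, x) / n
    return n * np.log(s2) + logdet

def pileup_frequency(n=50, reps=200, seed=1):
    # fraction of replicates whose grid MLE over theta in [0.5, 1] sits
    # exactly on the boundary theta = 1 (the pile-up event)
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.5, 1.0, 26)          # includes the boundary 1.0
    hits = 0
    for _ in range(reps):
        z = rng.standard_normal(n + 1)
        x = z[1:] - z[:-1]                    # unit-root MA(1)
        ll = [ma1_neg2ll(x, th) for th in grid]
        if grid[int(np.argmin(ll))] == 1.0:
            hits += 1
    return hits / reps
```

With n = 50 the empirical frequency is close to the limiting value, in line with the MA(1) results cited above.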
6. Unit roots and differencing. As pointed out in Section 3.2.2, there is a link between the mean term in the lower order MA model and the initial value in the higher order MA model. To illustrate this, consider the simple case when

Y_t = μ_0 + Z_t,

where {Z_t} ~ i.i.d. (0, σ_0^2), so that {Y_t} is an i.i.d. sequence with a common mean. It is clear that

√n(μ̂ − μ_0) →_d N(0, σ_0^2),

where μ̂ is the MLE of μ obtained by maximizing the Gaussian likelihood function. Now suppose we difference the time series to obtain

X_t = (1 − B)Y_t = Z_t − Z_{t−1},

which becomes an MA(1) process with a unit root. The initial value, as defined before, of this differenced process is

Z_init = Z_0 = Y_0 − μ_0.
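A quick numerical check of this point (our own illustration, not from the paper): differencing an i.i.d.-plus-mean series produces the noninvertible MA(1) Z_t − Z_{t−1}, whose lag-one autocorrelation is −θ/(1 + θ^2) = −1/2 at θ = 1, and the sample autocorrelation of the differenced series matches this value:

```python
import numpy as np

# Differencing Y_t = mu0 + Z_t gives X_t = Z_t - Z_{t-1}, an MA(1)
# with a unit root; its lag-one autocorrelation is -1/2.
rng = np.random.default_rng(0)
mu0 = 3.0
y = mu0 + rng.standard_normal(100_000)
x = np.diff(y)
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]   # sample lag-one autocorrelation
```

The mean mu0 drops out of the differenced series entirely, while the MA(1) unit-root structure (and the −1/2 autocorrelation signature) remains.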

From the results in Theorem 4.2 in [13], if it is known that an MA(1) time series has a unit root, that is, γ = 0, we have

U(γ = 0, β) = −2βW(1) + β^2.


Clearly, β̂ = W(1), and with our parameterization of z_init, we have

√n(ẑ_init − Z_0)/σ_0 = √n(Y_0 − μ̂ − Y_0 + μ_0)/σ_0 = −√n(μ̂ − μ_0)/σ_0 →_d N(0, 1) =_d W(1),
which is consistent with the classical result. Therefore we can conclude that whenever we have an MA model with a unit root, the information stored in the initial value comes from the mean term of the undifferenced series. Differencing the series does not get rid of the mean parameter; instead, it creates a new parameter Z_init which behaves like the mean in the undifferenced series, and its effect persists even asymptotically. With this, we can now easily explain the result in (2.9). Turning to a slightly more complicated model consisting of i.i.d. noise and a linear trend, that is,

(6.1)
Y_t = μ_0 + b_0 t + Z_t,

differencing delivers an MA(1) model with a unit root and a nonzero mean given by

X_t = (1 − B)Y_t = b_0 + Z_t − Z_{t−1}.

From (2.9), we know n^{3/2}(b̂ − b_0) →_d N(0, 12σ_0^2). But this can be obtained much more easily by analyzing the model (6.1) directly: it is a simple application of linear regression, and we obtain exactly the same asymptotic result for b̂.
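The regression route can be checked directly (a sketch under our own normalization): for Y_t = μ_0 + b_0 t + Z_t the OLS slope satisfies Var(b̂) = σ_0^2/S_tt with S_tt = Σ(t − t̄)^2 = n(n^2 − 1)/12, so n^3 Var(b̂) → 12σ_0^2, matching n^{3/2}(b̂ − b_0) →_d N(0, 12σ_0^2):

```python
import numpy as np

# Exact variance of the OLS slope in Y_t = mu0 + b0*t + Z_t,
# scaled by n^3: n^3 * sigma0^2 / S_tt -> 12 * sigma0^2,
# since S_tt = sum_t (t - tbar)^2 = n*(n^2 - 1)/12.
def scaled_slope_variance(n, sigma2=1.0):
    t = np.arange(1, n + 1)
    s_tt = np.sum((t - t.mean()) ** 2)   # equals n*(n**2 - 1)/12
    return n ** 3 * sigma2 / s_tt
```

For moderate n the scaled variance is already essentially 12σ_0^2, the same constant that appears in (2.9).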
Now consider the model from Section 2,

Y_t = b_0 + Z_t − θZ_{t−1},   where θ = 1 + γ/n

is near or on the unit circle. By differencing we obtain

X_t = (1 − B)Y_t = Z_t − (1 + θ)Z_{t−1} + θZ_{t−2}.

If we define Z_init as before and

Y_init = Y_0 = b_0 + Z_0 − θZ_{−1},

then ŷ_init − Y_init can be viewed as b̂ − b_0. Since b̂ converges at the rate n^{3/2}, so does ŷ_init. This explains the parametrization given in (3.31) as well as the resemblance of (2.5) and (3.32).
7. Going beyond second order. The techniques proposed in this paper can be adapted to handle the unit root problem for MA(q) with q ≥ 3. However, the complexity of the argument, mostly in terms of bookkeeping, also increases with the order q. In this section, we outline the procedure for the MA(3) case, from which extensions to larger orders are straightforward.


Suppose {X_t} follows an MA(3) model, which is parameterized in terms of the reciprocals of the zeros of the MA polynomial, that is,

(7.1)
X_t = Z_t − (α_0 + β_0 + η_0)Z_{t−1} + (α_0β_0 + α_0η_0 + β_0η_0)Z_{t−2} − α_0β_0η_0Z_{t−3}
    = (1 − α_0B)(1 − β_0B)(1 − η_0B)Z_t
    = (1 − α_0B)(1 − β_0B)Y_t
    = (1 − α_0B)W_t.

For simplicity, assume α_0, β_0 and η_0 are distinct. Now we form the two intermediate processes Y_t and W_t and consider three augmented initial variables defined by Z_init = Z_{−2}, Y_init = Z_{−1} + η_0Z_init and W_init = Y_0 + β_0Y_init. Similar arguments as in Section 3 show that the joint likelihood of (X, W_init, Y_init, Z_init) has a simple form given by

f_{X,W_init,Y_init,Z_init}(x_n, w_init, y_init, z_init) = Π_{j=−2}^n f_Z(z_j).

As in the MA(1) and MA(2) cases, maximizing this joint likelihood is essentially equivalent to minimizing the objective function

U_n = (1/σ_0^2) Σ_{i=−2}^n (z_i^2 − Z_i^2).

The key to this analysis is to write out the explicit expression for z_i, which is essentially an estimator of Z_i. The following equations are straightforward to derive:

(7.2)
w_k = Σ_{l=1}^k α^{k−l} X_l + α^k w_init,

(7.3)
y_j = Σ_{k=1}^j β^{j−k} w_k + β^j w_init − β^{j+1} y_init,

(7.4)
z_i = Σ_{j=1}^i η^{i−j} y_j + η^i w_init + η^i(η − β) y_init − η^{i+2} z_init.
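The recursions behind (7.2)-(7.4) are just the forward inversion of the three MA factors. A simplified sketch (our own illustration, using the true initial values W_0, Y_0, Z_0 directly rather than the paper's augmented variables W_init, Y_init, Z_init) recovers the noise sequence exactly:

```python
import numpy as np

# With X = (1 - a*B)W, W = (1 - b*B)Y, Y = (1 - h*B)Z, the inversions
#   W_t = X_t + a*W_{t-1},  Y_t = W_t + b*Y_{t-1},  Z_t = Y_t + h*Z_{t-1}
# recover Z_1, ..., Z_n exactly when started from the true initial values.
def invert_ma3(x, a, b, h, w0, y0, z0):
    z = np.empty(len(x))
    wp, yp, zp = w0, y0, z0
    for t in range(len(x)):
        wp = x[t] + a * wp       # reconstruct W_t
        yp = wp + b * yp         # reconstruct Y_t
        zp = yp + h * zp         # reconstruct Z_t
        z[t] = zp
    return z
```

In practice the initial values are unknown, which is exactly why the paper treats them as augmented parameters and later integrates them out.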

Plugging (7.2) into (7.3), we obtain

(7.5)
y_j = Σ_{k=1}^j [(α^{j−k+1} − β^{j−k+1})/(α − β)] X_k + [(α^{j+1} − β^{j+1})/(α − β)] w_init − β^{j+1} y_init,


and plugging this into (7.4), we obtain

z_i = Σ_{j=1}^i [α^{i−j+2}(β − η) − β^{i−j+2}(α − η) + η^{i−j+2}(α − β)] / [(α − β)(α − η)(β − η)] X_j
      + [α^2(α^i − η^i)/((α − η)(α − β)) − β^2(β^i − η^i)/((β − η)(α − β)) + η^i] w_init
      − [(β^{i+2} − 2βη^{i+1} + η^{i+2})/(β − η)] y_init − η^{i+2} z_init.

While this is a more complicated looking expression than the one encountered in the MA(2) case, the coefficient of X_j in the sum looks very similar to (3.2), only with more terms. Now replacing X_j with (7.1), z_i can be written as

(7.6)
z_i = Z_i − Σ_{j=−2}^{i−1} C^z_{i,j}Z_j − C^w_i(w_init − W_init) − C^y_i(y_init − Y_init) − C^z_i(z_init − Z_init)
    = Z_i − r_i,
where C^z_{i,j}, the coefficient of Z_j in z_i, is a combination of α^{i−j}, β^{i−j} and η^{i−j}, and C^w_i, C^y_i and C^z_i are the coefficients of w_init − W_init, y_init − Y_init and z_init − Z_init; they are linear combinations of α^i, β^i and η^i. For illustration, assume the MA(3) model has only one unit root, with |α_0| < 1, |β_0| < 1 and η_0 = 1. We can then reparameterize the parameters as

η = 1 + γ/n,   α = α_0 + δ_1/√n   and   β = β_0 + δ_2/√n,

and the initial values as

w_init = W_init + σ_0β_w/√n,   y_init = Y_init + σ_0β_y/√n   and   z_init = Z_init + σ_0β_z/√n.

Then the objective function U_n becomes

(7.7)
U_n(γ, δ_1, δ_2, β_w, β_y, β_z) = −2 Σ_{i=−2}^n r_iZ_i/σ_0^2 + Σ_{i=−2}^n r_i^2/σ_0^2


= −2 Σ_{i=−2}^n ( Σ_{j=−2}^{i−1} C^z_{i,j}Z_j/σ_0 + C^w_iβ_w/√n + C^y_iβ_y/√n + C^z_iβ_z/√n ) (Z_i/σ_0)
  + Σ_{i=−2}^n ( Σ_{j=−2}^{i−1} C^z_{i,j}Z_j/σ_0 + C^w_iβ_w/√n + C^y_iβ_y/√n + C^z_iβ_z/√n )^2.

Because of the special structure of C^z_{i,j}, C^w_i, C^y_i and C^z_i, the sum in (7.7) consists of terms that have a structure similar to quantities like

(1/n) Σ_{i=−2}^n Σ_{j=−2}^{i−1} (1 + γ/n)^{i−j} (Z_j/σ_0)(Z_i/σ_0)   and   (1/√n) Σ_{i=−2}^n (1 + γ/n)^i Z_i/σ_0

that were used in the MA(1) and MA(2) cases. By using a martingale central limit theorem and the theorems proved in Davis and Song [13], one can establish the weak convergence of U_n(γ, δ_1, δ_2, β_w, β_y, β_z) to a random element U(γ, δ_1, δ_2, β_w, β_y, β_z) in C(R^6). Now arguing as in Section 3, the initial variables can be integrated out, and the limiting process of the exact profile log-likelihood can be established.
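The second displayed quantity is the basic building block of this functional limit theory: (1/(√n σ_0)) Σ (1 + γ/n)^i Z_i converges weakly to ∫_0^1 e^{γs} dW(s), a normal limit with variance (e^{2γ} − 1)/(2γ). A small numerical sketch (our own check, comparing the exact finite-n variance to the limit rather than simulating):

```python
import numpy as np

# Variance of (1/sqrt(n)) * sum_{i=1}^n (1+gamma/n)^i Z_i with Var(Z_i)=1,
# which converges to the variance of int_0^1 e^{gamma*s} dW(s).
def finite_n_variance(gamma, n):
    i = np.arange(1, n + 1)
    return np.sum((1 + gamma / n) ** (2 * i)) / n

def limit_variance(gamma):
    # Ito isometry: Var(int_0^1 e^{gamma*s} dW(s)) = (e^{2*gamma}-1)/(2*gamma)
    return (np.exp(2 * gamma) - 1) / (2 * gamma)
```

For moderate n the two variances agree to a few decimal places, for γ on either side of zero.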
For general q > 3, the residual r_i = Z_i − z_i has the form

r_i = Σ_{j=−q+1}^{i−1} C^z_{i,j}Z_j + Σ_{k=1}^q C^k_i (init_k − INIT_k),

where {INIT_1, ..., INIT_q} are q augmented initial variables, defined either through the i.i.d. random variables Z_t or through intermediate processes like Y_t in the above example. Furthermore, C^z_{i,j} is simply a linear combination of (α_1^{i−j}, ..., α_q^{i−j}), where (α_1, ..., α_q) are the reciprocals of the roots of the MA(q) polynomial, and the coefficients C^k_i, k = 1, ..., q, are linear combinations of (α_1^i, ..., α_q^i). This special structure of r_i allows us to apply the weak convergence theorems proved in Davis and Song [13] to find the limiting process of

U_n = −2 Σ_{i=−q+1}^n r_iZ_i/σ_0^2 + Σ_{i=−q+1}^n r_i^2/σ_0^2,

from which the limiting behavior of the maximum likelihood estimators of the α_i's can be derived.
Acknowledgments. We would like to thank the referees and the Associate
Editor for their insightful comments, which were incorporated into the final
version of this paper.
REFERENCES
[1] Anderson, T. W. and Takemura, A. (1986). Why do noninvertible estimated moving averages occur? J. Time Series Anal. 7 235–254. MR0883008
[2] Andrews, B., Calder, M. and Davis, R. A. (2009). Maximum likelihood estimation for α-stable autoregressive processes. Ann. Statist. 37 1946–1982. MR2533476
[3] Andrews, B., Davis, R. A. and Breidt, F. J. (2006). Maximum likelihood estimation for all-pass time series models. J. Multivariate Anal. 97 1638–1659. MR2256234
[4] Breidt, F. J., Davis, R. A., Hsu, N.-J. and Rosenblatt, M. (2006). Pile-up probabilities for the Laplace likelihood estimator of a non-invertible first order moving average. In Time Series and Related Topics. IMS Lecture Notes–Monograph Series 52 1–19. IMS, Beachwood, OH. MR2427836
[5] Breidt, F. J., Davis, R. A. and Trindade, A. A. (2001). Least absolute deviation estimation for all-pass time series models. Ann. Statist. 29 919–946. MR1869234
[6] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer, New York.
[7] Chan, N. H. and Wei, C. Z. (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Ann. Statist. 16 367–401. MR0924877
[8] Chen, M. C., Davis, R. A. and Song, L. (2011). Inference for regression models with errors from a non-invertible MA(1) process. J. Forecast. 30 6–30.
[9] Davis, R. A., Chen, M. and Dunsmuir, W. T. M. (1995). Inference for MA(1) processes with a root on or near the unit circle. Probab. Math. Statist. 15 227–242. MR1369801
[10] Davis, R. A. and Dunsmuir, W. T. M. (1996). Maximum likelihood estimation for MA(1) processes with a root on or near the unit circle. Econometric Theory 12 1–29. MR1396378
[11] Davis, R. A. and Dunsmuir, W. T. M. (1997). Least absolute deviation estimation for regression with ARMA errors. J. Theoret. Probab. 10 481–497. MR1455154
[12] Davis, R. A., Knight, K. and Liu, J. (1992). M-estimation for autoregressions with infinite variance. Stochastic Process. Appl. 40 145–180. MR1145464
[13] Davis, R. A. and Song, L. (2012). Functional convergence of stochastic integrals with application to statistical inference. Stochastic Process. Appl. 122 725–757.
[14] Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York. MR0624435
[15] Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York. MR1663158
[16] Rosenblatt, M. (2000). Gaussian and Non-Gaussian Linear Time Series and Random Fields. Springer, New York. MR1742357
[17] Sargan, J. D. and Bhargava, A. (1983). Maximum likelihood estimation of regression models with first order moving average errors when the root lies on the unit circle. Econometrica 51 799–820. MR0712371
[18] Shephard, N. (1993). Maximum likelihood estimation of regression models with stochastic trend components. J. Amer. Statist. Assoc. 88 590–595. MR1224385
[19] Smith, R. L. (2008). Statistical trend analysis. In Weather and Climate Extremes in a Changing Climate (Appendix A) 127–132.
[20] Tanaka, K. (1990). Testing for a moving average unit root. Econometric Theory 6 433–444. MR1094221
[21] Tanaka, K. (1996). Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. Wiley, New York. MR1397269
Department of Statistics
Columbia University
1255 Amsterdam Ave
New York, New York 10027
USA

Barclays Capital
745 7th Ave
New York, New York 10019
USA
