
Method of Moments Estimation Maximum Likelihood Estimation Comparison of MOM and MLE

Unit 13: ARMA Estimation

Taylor Brown

Department of Statistics, University of Virginia

Fall 2020


Readings for Unit 13

Textbook chapter 3.5 (pages 115 to 121).


Last Unit

1 ARMA forecasting.
2 Prediction error.
3 Prediction interval.


This Unit

1 Method of Moments Estimation.


2 Maximum Likelihood Estimation.


Motivation

In this unit, we explore two ways to estimate the parameters of ARMA models: Method of Moments (MOM) estimation and Maximum Likelihood (ML) estimation.


1 Method of Moments Estimation

2 Maximum Likelihood Estimation

3 Comparison of MOM and MLE


Method of Moments

Let’s start with method of moments (MOM) estimation. The idea behind this is to equate population moments to sample moments and then solve for the parameters in terms of the sample moments.

We re-use a lot of the same equations from the previous section!


AR Estimation

Let’s first assume that we have a causal AR(p) model
$$\phi(B)(x_t - \mu) = w_t,$$
where the white noise $w_t$ has variance $\sigma_w^2$ and
$$\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p.$$

Given $n$ observations $x_1, x_2, \ldots, x_n$, we are interested in estimating the parameters $\phi_1, \ldots, \phi_p$ and $\sigma_w^2$. Initially we assume that the order $p$ is known.


AR Estimation

We’ll assume again without loss of generality (WLOG) that $\mu = 0$. Why?

$E[x_t] = \mu$ can always be estimated with the first sample moment $\bar{x}$.

If $\mu \neq 0$, then transform your data before estimating as follows:
$$\tilde{x}_t = x_t - \bar{x}.$$


Yule-Walker Estimation for AR(p)

The method of moments works well when estimating causal AR(p) models. We consider the causal AR(p) model
$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t. \tag{1}$$

For $h = 1, \ldots, p$, multiply both sides of (1) by $x_{t-h}$ and take expectations:
$$\gamma(h) = \phi_1 \gamma(h - 1) + \cdots + \phi_p \gamma(h - p). \tag{2}$$

When $h = 0$, we do the same thing and get
$$\sigma_w^2 = \gamma(0) - \phi_1 \gamma(1) - \cdots - \phi_p \gamma(p). \tag{3}$$


Yule-Walker Estimation for AR(p)

We call these the Yule-Walker equations:
$$\gamma(h) = \phi_1 \gamma(h - 1) + \phi_2 \gamma(h - 2) + \cdots + \phi_p \gamma(h - p), \qquad h = 1, \ldots, p,$$
$$\sigma_w^2 = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - \cdots - \phi_p \gamma(p).$$

We can also write them in matrix notation that should look familiar:
$$\Gamma_p \phi = \gamma_p, \qquad \sigma_w^2 = \gamma(0) - \phi' \gamma_p,$$
where $\Gamma_p = \{\gamma(k - j)\}_{j,k=1}^p$ is a $p \times p$ matrix, $\phi = (\phi_1, \ldots, \phi_p)'$ is a $p \times 1$ vector, and $\gamma_p = (\gamma(1), \ldots, \gamma(p))'$ is a $p \times 1$ vector.


Yule-Walker Estimation for AR(p)

Using the method of moments, put hats on everything and then solve for the desired parameters:
$$\hat{\Gamma}_p \hat{\phi} = \hat{\gamma}_p, \qquad \hat{\sigma}_w^2 = \hat{\gamma}(0) - \hat{\phi}' \hat{\gamma}_p$$
yields
$$\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_p$$
and
$$\hat{\sigma}_w^2 = \hat{\gamma}(0) - \hat{\gamma}_p' \hat{\Gamma}_p^{-1} \hat{\gamma}_p.$$


Yule-Walker Estimation for AR(p)


One more small move: divide through by $\hat{\gamma}(0)$ before solving, so that we have formulas in terms of ACFs. Scaling the Yule-Walker equations,
$$\frac{1}{\gamma(0)} \Gamma_p \phi = \frac{1}{\gamma(0)} \gamma_p, \qquad \sigma_w^2 = \gamma(0) - \phi' \gamma_p,$$
gives us
$$\hat{\phi} = \hat{R}_p^{-1} \hat{\rho}_p \tag{4}$$
and
$$\hat{\sigma}_w^2 = \hat{\gamma}(0) \left[ 1 - \hat{\rho}_p' \hat{R}_p^{-1} \hat{\rho}_p \right],$$
where $\hat{R}_p = \{\hat{\rho}(k - j)\}_{j,k=1}^p$ is a $p \times p$ matrix and $\hat{\rho}_p = (\hat{\rho}(1), \ldots, \hat{\rho}(p))'$ is a $p \times 1$ vector.
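Equation (4) translates directly into code. Below is a minimal Python sketch (the helper name `yule_walker` is ours, not from the textbook; NumPy assumed) that demeans the data, computes the sample autocovariances, builds $\hat{R}_p$ and $\hat{\rho}_p$, and solves for $\hat{\phi}$ and $\hat{\sigma}_w^2$:

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for a causal AR(p)."""
    x = np.asarray(x, dtype=float) - np.mean(x)   # demean: x_tilde = x - xbar
    n = len(x)
    # Biased sample autocovariances gamma_hat(0), ..., gamma_hat(p).
    gamma = np.array([x[: n - h] @ x[h:] / n for h in range(p + 1)])
    rho = gamma[1:] / gamma[0]                    # rho_hat(1), ..., rho_hat(p)
    acf = np.concatenate(([1.0], rho))
    # R_hat_p is the Toeplitz matrix of sample autocorrelations.
    R = np.array([[acf[abs(k - j)] for k in range(p)] for j in range(p)])
    phi = np.linalg.solve(R, rho)                 # phi_hat = R_p^{-1} rho_p
    sigma2 = gamma[0] * (1.0 - rho @ phi)         # gamma_hat(0)[1 - rho' R^{-1} rho]
    return phi, sigma2
```

On a long simulated AR(2) path the estimates land close to the true coefficients, consistent with the asymptotics on the next slides.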

Yule-Walker Estimation for AR(p)

The asymptotic behavior of the Yule-Walker estimators for causal AR(p) processes is
$$\sqrt{n}\left(\hat{\phi} - \phi\right) \xrightarrow{d} N\left(0,\, \sigma_w^2 \Gamma_p^{-1}\right) \tag{5}$$
and
$$\hat{\sigma}_w^2 \xrightarrow{p} \sigma_w^2.$$


Yule-Walker Estimation for AR(p)

The asymptotic variance-covariance matrix for $\hat{\phi}$ is
$$\operatorname{Var}(\hat{\phi}) = \frac{\sigma_w^2}{n} \Gamma_p^{-1} = \frac{\sigma_w^2}{n\, \gamma(0)} R_p^{-1}. \tag{6}$$
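To see Eq. (6) in action, here is a sketch (NumPy assumed) that plugs in the sample quantities from the AR(2) simulation example later in this unit ($\hat{\rho}(1) = .849$, $\hat{\gamma}(0) = 8.903$, $\hat{\sigma}_w^2 = 1.187$, $n = 1000$) to get approximate standard errors for $\hat{\phi}$:

```python
import numpy as np

# Plug-in version of Eq. (6) with hats on everything, using the sample
# quantities from the AR(2) simulation example in this unit (n = 1000).
n, gamma0, sigma2, rho1 = 1000, 8.903, 1.187, 0.849
R = np.array([[1.0, rho1],
              [rho1, 1.0]])
var_phi = sigma2 / (n * gamma0) * np.linalg.inv(R)
se = np.sqrt(np.diag(var_phi))   # approximate standard errors of phi_hat
```

Both standard errors come out around 0.022; by the symmetry of $R_p^{-1}$, the two diagonal entries are equal, which is why fitted AR(2) output often reports identical standard errors for both coefficients.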


Simulation Example

Using the arima.sim() function in R, I simulated n = 1000 observations from the AR(2) process
$$x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t,$$
where $\sigma_w^2 = 1$. For this sample, $\hat{\gamma}(0) = 8.903$, $\hat{\rho}(1) = 0.849$, and $\hat{\rho}(2) = 0.519$.


Simulation Example
The data had $\hat{\rho}(1) = .849$, $\hat{\rho}(2) = .519$, and $\hat{\gamma}(0) = 8.903$, so
$$\hat{\phi} = \hat{R}_p^{-1} \hat{\rho}_p = \begin{pmatrix} 1 & .849 \\ .849 & 1 \end{pmatrix}^{-1} \begin{pmatrix} .849 \\ .519 \end{pmatrix} = \begin{pmatrix} 1.463 \\ -.723 \end{pmatrix}.$$
Also,
$$\hat{\sigma}_w^2 = \hat{\gamma}(0)\left[1 - \hat{\rho}_p' \hat{R}_p^{-1} \hat{\rho}_p\right] = \hat{\gamma}(0)\left[1 - \hat{\rho}_p' \hat{\phi}\right] = 8.903 \times \left[1 - \begin{pmatrix} .849 & .519 \end{pmatrix} \begin{pmatrix} 1.463 \\ -.723 \end{pmatrix}\right] = 1.187.$$
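As a sanity check, the arithmetic above can be reproduced in a few lines (a sketch; NumPy assumed):

```python
import numpy as np

# Sample quantities from the simulation example.
rho1, rho2, gamma0 = 0.849, 0.519, 8.903
R = np.array([[1.0, rho1],
              [rho1, 1.0]])
rho_p = np.array([rho1, rho2])

phi_hat = np.linalg.solve(R, rho_p)           # phi_hat = R_p^{-1} rho_p
sigma2_hat = gamma0 * (1.0 - rho_p @ phi_hat)

print(np.round(phi_hat, 3))    # -> [ 1.463 -0.723]
print(round(sigma2_hat, 3))    # -> 1.187
```

Both estimates are close to the true values (1.5, −0.75, and 1), as the asymptotics in (5) lead us to expect.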

Fish Population Example

In Unit 11, we looked at the ACF and PACF of the time series
from “recruit.dat”, which contains data on fish population in the
central Pacific Ocean. The numbers represent the number of new
fish for each month in the years 1950-1987.


Fish Population Example


[Figure: sample ACF (top) and PACF (bottom) of the Recruit Data, plotted against LAG up to 1.5; both panels range from −0.5 to 1.0.]



Fish Population Example

Let’s check the results of fitting an AR(2) model using Yule-Walker estimation in R.

> rec.yw <- ar.yw(rec, order=2)
> rec.yw$x.mean
[1] 62.26278
> rec.yw$ar
[1] 1.3315874 -0.4445447
> sqrt(diag(rec.yw$asy.var.coef))
[1] 0.04222637 0.04222637


Fish Population Example


[Figure: “Recruit Data with 24 Month Predictions” — the monthly series (Time axis 1950–1990, vertical axis 0 to 100) with 24 months of forecasts and prediction bands appended.]

Fish Population Example

rec.pred <- predict(rec.yw, n.ahead=24)
ts.plot(rec, rec.pred$pred, col=1:2)
lines(rec.pred$pred - rec.pred$se, col=4, lty=2)
lines(rec.pred$pred + rec.pred$se, col=4, lty=2)


Method of Moments Estimation for MA(q)

Consider an invertible MA(1) process $x_t = w_t + \theta w_{t-1}$ with $|\theta| < 1$. We know that
$$\rho(1) = \frac{\theta}{1 + \theta^2}.$$
Using the method of moments, we equate $\hat{\rho}(1)$ to $\rho(1)$ and solve a quadratic equation in $\theta$.


Method of Moments Estimation for MA(q)

The candidate solutions are
$$\hat{\theta} = \frac{1 \pm \sqrt{1 - 4\hat{\rho}(1)^2}}{2\hat{\rho}(1)}.$$

• If $|\hat{\rho}(1)| < 0.5$, two real solutions exist, so we pick the invertible one.
• If $\hat{\rho}(1) = \pm 0.5$, then $\hat{\theta} = \pm 1$: there is no invertible solution.
• If $|\hat{\rho}(1)| > 0.5$, no real solutions exist: the method of moments fails to yield an estimator of $\theta$.

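The three cases above can be wrapped in a small helper (a sketch; the function name `ma1_mom` is ours, not from the textbook):

```python
import math

def ma1_mom(rho1):
    """MOM estimate of theta in an MA(1), following the three cases above.

    Returns None when |rho1| > 0.5 (no real solution exists)."""
    if abs(rho1) > 0.5:
        return None                        # method of moments fails
    if rho1 == 0.0:
        return 0.0                         # rho(1) = 0 corresponds to theta = 0
    if abs(rho1) == 0.5:
        return 1.0 if rho1 > 0 else -1.0   # boundary case: theta = +-1, not invertible
    # Two real roots of rho1 * theta^2 - theta + rho1 = 0; they are reciprocals
    # of each other, so exactly one satisfies |theta| < 1.
    disc = math.sqrt(1.0 - 4.0 * rho1 ** 2)
    roots = ((1.0 + disc) / (2.0 * rho1), (1.0 - disc) / (2.0 * rho1))
    return next(t for t in roots if abs(t) < 1.0)
```

For example, $\theta = 0.5$ implies $\rho(1) = 0.5/1.25 = 0.4$, and `ma1_mom(0.4)` recovers $0.5$ rather than the non-invertible root $2$.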

Method of Moments Estimation for MA(q)

For higher-order MA(q) models, the method of moments quickly gets complicated. The moment equations are non-linear in $\theta_1, \ldots, \theta_q$, so numerical methods must be used.

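For instance, an MA(2) has $\rho(1) = (\theta_1 + \theta_1\theta_2)/(1 + \theta_1^2 + \theta_2^2)$ and $\rho(2) = \theta_2/(1 + \theta_1^2 + \theta_2^2)$, and a root-finder can solve the two moment equations numerically. A sketch using SciPy's `fsolve` (the target values below are illustrative, chosen to match $\theta = (0.5, 0.3)$; convergence to an invertible root is not guaranteed in general):

```python
import numpy as np
from scipy.optimize import fsolve

def ma2_rho(theta):
    """Theoretical (rho(1), rho(2)) of an MA(2) with coefficients theta = (t1, t2)."""
    t1, t2 = theta
    d = 1.0 + t1 ** 2 + t2 ** 2
    return np.array([(t1 + t1 * t2) / d, t2 / d])

# Pretend these are the sample autocorrelations (here generated from
# theta = (0.5, 0.3) so we know a solution exists).
target = ma2_rho((0.5, 0.3))

# Solve the non-linear moment equations rho(theta) = rho_hat numerically.
theta_hat = fsolve(lambda th: ma2_rho(th) - target, x0=[0.1, 0.1])
```

With real data, `target` would be replaced by $(\hat{\rho}(1), \hat{\rho}(2))$; the returned root should be checked for invertibility, since the moment equations have multiple solutions.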

1 Method of Moments Estimation

2 Maximum Likelihood Estimation

3 Comparison of MOM and MLE


Maximum Likelihood Estimation

To illustrate the main concept of maximum likelihood estimation, we consider the AR(1) model with nonzero mean
$$x_t = \mu + \phi(x_{t-1} - \mu) + w_t, \tag{7}$$
where $|\phi| < 1$ and $w_t \overset{iid}{\sim} N(0, \sigma_w^2)$.


Maximum Likelihood Estimation

We seek the likelihood
$$L(\mu, \phi, \sigma_w^2) = f_{\mu, \phi, \sigma_w^2}(x_1, x_2, \ldots, x_n). \tag{8}$$

The likelihood function (8) is functionally equivalent to the joint probability distribution of the observed data $x_1, x_2, \ldots, x_n$.


Maximum Likelihood Estimation

For a given data set, think of the likelihood as a function of the parameters (not the data). Since we’ve already observed the data $x_1, x_2, \ldots, x_n$, we can find the parameters $(\mu, \phi, \sigma_w^2)$ that maximize the likelihood $L(\mu, \phi, \sigma_w^2)$. This is the basic idea behind maximum likelihood estimation.


Likelihood Function

We will use the following factorization:
$$
\begin{aligned}
L(\mu, \phi, \sigma_w^2) &= f(x_1, \ldots, x_n) \\
&= f(x_1) f(x_2 \mid x_1) f(x_3 \mid x_2, x_1) \cdots f(x_n \mid x_{n-1}, x_{n-2}, \ldots, x_1) \\
&= f(x_1) f(x_2 \mid x_1) f(x_3 \mid x_2) \cdots f(x_n \mid x_{n-1}).
\end{aligned}
$$
The last equality holds because the AR(1) model is Markov: given $x_{t-1}$, earlier values carry no additional information about $x_t$.


Likelihood Function

These are all equivalent statements of the model:
$$x_t = \mu + \phi(x_{t-1} - \mu) + w_t, \qquad w_t \sim \text{Normal}(0, \sigma_w^2),$$
$$x_t \mid x_{t-1} \sim \text{Normal}\left(\mu + \phi(x_{t-1} - \mu),\, \sigma_w^2\right),$$
and
$$f_{x_t \mid x_{t-1}}(x_t \mid x_{t-1}) = \frac{1}{\sqrt{2\pi\sigma_w^2}} \exp\left\{-\frac{[x_t - \mu - \phi(x_{t-1} - \mu)]^2}{2\sigma_w^2}\right\}.$$


Likelihood Function

We have
$$
\begin{aligned}
L(\mu, \phi, \sigma_w^2) &= f_{x_1}(x_1) \times f_{x_2 \mid x_1}(x_2 \mid x_1) \times \cdots \times f_{x_n \mid x_{n-1}}(x_n \mid x_{n-1}) \\
&= f_{x_1}(x_1)\, (2\pi\sigma_w^2)^{-(n-1)/2} \exp\left\{-\frac{\sum_{t=2}^n [x_t - \mu - \phi(x_{t-1} - \mu)]^2}{2\sigma_w^2}\right\}.
\end{aligned}
$$


Likelihood Function: what is $f_{x_1}(x_1)$?

In midterm 1, we assumed
$$x_1 \sim \text{Normal}\left(\mu, \frac{\sigma_w^2}{1 - \phi^2}\right)$$
because that would allow all other time points to have the same marginal distribution.

Here’s another rationalization: assume the model is causal ($|\phi| < 1$), and pretend you have an infinite history of data (impossible in practice). The causal representation is
$$x_1 = \mu + \sum_{j=0}^{\infty} \phi^j w_{1-j}.$$
Take the expectation and the variance of both sides. Since the $w_t$ are iid normal, $x_1$ is normal with mean $\mu$ and variance $\sigma_w^2 / (1 - \phi^2)$.

Likelihood Function

The likelihood function is
$$
\begin{aligned}
L(\mu, \phi, \sigma_w^2) &= f_{x_1}(x_1)\, (2\pi\sigma_w^2)^{-(n-1)/2} \exp\left\{-\frac{\sum_{t=2}^n [x_t - \mu - \phi(x_{t-1} - \mu)]^2}{2\sigma_w^2}\right\} \\
&= (2\pi\sigma_w^2)^{-n/2} (1 - \phi^2)^{1/2} \exp\left\{-\frac{S(\mu, \phi)}{2\sigma_w^2}\right\},
\end{aligned}
$$
where
$$S(\mu, \phi) = (1 - \phi^2)(x_1 - \mu)^2 + \sum_{t=2}^n [x_t - \mu - \phi(x_{t-1} - \mu)]^2.$$


The Log-likelihood Function

It is worth pointing out that it is more common to work with the log-likelihood
$$
\begin{aligned}
\ell(\mu, \phi, \sigma_w^2) &= \log L(\mu, \phi, \sigma_w^2) \\
&= \log\left[(2\pi\sigma_w^2)^{-n/2} (1 - \phi^2)^{1/2} \exp\left\{-\frac{S(\mu, \phi)}{2\sigma_w^2}\right\}\right] \\
&= -\frac{n}{2} \log(2\pi\sigma_w^2) + \frac{1}{2} \log(1 - \phi^2) - \frac{S(\mu, \phi)}{2\sigma_w^2}.
\end{aligned}
$$
It is numerically more stable, and its derivatives are easier to calculate.

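The final expression above translates directly into code. A minimal sketch (the function name `ar1_loglik` is ours; NumPy assumed):

```python
import numpy as np

def ar1_loglik(x, mu, phi, sigma2):
    """Exact (unconditional) AR(1) log-likelihood, assuming |phi| < 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # One-step-ahead residuals x_t - mu - phi*(x_{t-1} - mu), t = 2, ..., n.
    resid = x[1:] - mu - phi * (x[:-1] - mu)
    # S(mu, phi) = (1 - phi^2)(x_1 - mu)^2 + sum of squared residuals.
    S = (1.0 - phi ** 2) * (x[0] - mu) ** 2 + np.sum(resid ** 2)
    return (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            + 0.5 * np.log(1.0 - phi ** 2)
            - S / (2.0 * sigma2))
```

A useful check: this must equal $\log f_{x_1}(x_1)$ plus the sum of the conditional normal log-densities from the factorization two slides back, and it does.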

The variance estimator

The variance estimator can be obtained after you get the other estimators. Differentiating the log-likelihood with respect to $\sigma_w^2$,
$$
\frac{d}{d\sigma_w^2}\, \ell(\mu, \phi, \sigma_w^2) = \frac{d}{d\sigma_w^2}\left[-\frac{n}{2} \log(2\pi\sigma_w^2) + \frac{1}{2} \log(1 - \phi^2) - \frac{S(\mu, \phi)}{2\sigma_w^2}\right] = -\frac{n}{2\sigma_w^2} + \frac{S(\mu, \phi)}{2(\sigma_w^2)^2}.
$$
Setting this equal to 0, replacing $\mu$ and $\phi$ with their estimators, and solving for $\hat{\sigma}_w^2$ gives us
$$\hat{\sigma}_w^2 = \frac{S(\hat{\mu}, \hat{\phi})}{n}.$$

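A quick numerical check of this closed form, using illustrative values for $n$, $S$, and $\phi$ (not taken from any data set): with $\mu$ and $\phi$ held fixed, the log-likelihood as a function of $\sigma_w^2$ should peak at $S/n$.

```python
import numpy as np

# With mu and phi fixed, l(sigma2) = -n/2*log(2*pi*sigma2)
# + 1/2*log(1 - phi^2) - S/(2*sigma2) should be maximized at sigma2 = S/n.
n, S, phi = 50, 60.0, 0.5

def ell(sigma2):
    return (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            + 0.5 * np.log(1.0 - phi ** 2)
            - S / (2.0 * sigma2))

grid = np.linspace(0.5, 3.0, 2501)   # candidate sigma2 values, step 0.001
best = grid[np.argmax(ell(grid))]    # grid maximizer; should sit at S/n = 1.2
```

The grid search lands on $S/n = 60/50 = 1.2$, matching the calculus above.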

Estimating $\mu$ and $\phi$

The estimators for $\mu$ and $\phi$ are more complicated: taking derivatives with respect to these parameters and setting the equations equal to 0 yields a system that cannot be solved analytically. Estimation is usually accomplished with a numerical procedure (e.g., Newton-Raphson or Fisher scoring).


1 Method of Moments Estimation

2 Maximum Likelihood Estimation

3 Comparison of MOM and MLE


Properties of ML Estimators

