
Chapter 7: Parameter Estimation in Time Series Models

- In Chapter 6, we learned how to specify our time series model (that is, decide which specific model to use).
- The general model we have considered is the ARIMA(p, d, q) model.
- The simpler AR, MA, and ARMA models are special cases of this general ARIMA(p, d, q) model.
- Now assume we have chosen appropriate values of p, d, and q (possibly based on evidence from the ACF, PACF, and/or EACF plots).
- Assume that our observed time series data Y1, . . . , Yn follow a stationary ARMA(p, q) model.
- In the case of nonstationary original data, we assume that taking d differences has produced differenced data that can be modeled as stationary.
- We now must estimate the unknown parameters of that stationary ARMA(p, q) model.
Method of Moments Estimation
- One of the easiest methods of parameter estimation is the method of moments (MOM).
- The basic idea is to find expressions for the sample moments and for the corresponding population moments and equate them:

    (1/n) ∑_{i=1}^n X_i^r = E(X^r)

- The E(X^r) expression will be a function of one or more unknown parameters.
- If there are, say, 2 unknown parameters, we set up MOM equations for r = 1, 2 and solve these 2 equations simultaneously for the two unknown parameters.
- In the simplest case, if there is only 1 unknown parameter to estimate, we equate the sample mean to the true mean of the process and solve for the unknown parameter.
MOM with AR models

- First, we consider autoregressive models.
- In the simplest case, the AR(1) model, given by Yt = φYt−1 + et, the true lag-1 autocorrelation is ρ1 = φ.
- For this model, a method-of-moments estimator simply equates the true lag-1 autocorrelation to the sample lag-1 autocorrelation r1.
- So our MOM estimator of the unknown parameter φ is φ̂ = r1, as in the sketch below.
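A minimal R sketch of this estimator (using a simulated series; the seed and true φ = 0.6 are illustrative choices, not the course's own code):

    ## Minimal sketch: MOM estimation of phi in an AR(1) model.
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200)
    r1 <- acf(y, plot = FALSE)$acf[2]  # lag-1 sample autocorrelation
    phi.hat <- r1                      # MOM estimator: phi-hat = r1
    phi.hat                            # should be near the true phi = 0.6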



MOM with an AR(2) model

- In the AR(2) model, we have unknown parameters φ1 and φ2.
- From the Yule-Walker equations,

    ρ1 = φ1 + ρ1 φ2   and   ρ2 = ρ1 φ1 + φ2

- In the method of moments, we replace the true lag-1 and lag-2 autocorrelations, ρ1 and ρ2, by the sample autocorrelations r1 and r2, respectively.



MOM with an AR(2) model, continued

- That gives the equations

    r1 = φ1 + r1 φ2   and   r2 = r1 φ1 + φ2

  which are then solved for φ1 and φ2 to obtain

    φ̂1 = r1(1 − r2)/(1 − r1²)   and   φ̂2 = (r2 − r1²)/(1 − r1²)

- The general AR(p) model is estimated in a similar way, with the Yule-Walker equations being used to obtain the Yule-Walker estimates φ̂1, φ̂2, . . . , φ̂p; a sketch follows below.
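A minimal R sketch of these Yule-Walker estimates on a simulated AR(2) series (the true coefficients 1.1 and −0.5 are illustrative; R's ar.yw() solves the same equations internally):

    ## Minimal sketch: Yule-Walker (MOM) estimates for an AR(2) model.
    set.seed(520)
    y <- arima.sim(model = list(ar = c(1.1, -0.5)), n = 200)
    r <- acf(y, plot = FALSE)$acf            # r[1] is lag 0, r[2] is lag 1, ...
    r1 <- r[2]; r2 <- r[3]
    phi1.hat <- r1 * (1 - r2) / (1 - r1^2)
    phi2.hat <- (r2 - r1^2) / (1 - r1^2)
    c(phi1.hat, phi2.hat)
    ar.yw(y, order.max = 2, aic = FALSE)$ar  # built-in Yule-Walker fit, for comparison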



MOM with MA Models

- We run into problems when trying to use the method of moments to estimate the parameters of moving average models.
- Consider the simple MA(1) model, Yt = et − θet−1.
- The true lag-1 autocorrelation in this model is ρ1 = −θ/(1 + θ²).
- If we equate ρ1 to r1, we get a quadratic equation in θ.
- If |r1| < 0.5, only one of the two real solutions satisfies the invertibility condition |θ| < 1.
- That solution is θ̂ = (−1 + √(1 − 4r1²))/(2r1).
- But if |r1| = 0.5, no invertible solution exists, and if |r1| > 0.5, no real solution exists at all, so the method of moments fails to give any estimator of θ.
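A minimal R sketch of this estimator, including the failure cases (the function name mom.ma1 is ours, for illustration):

    ## Minimal sketch: MOM for the MA(1) model; fails when |r1| >= 0.5.
    mom.ma1 <- function(r1) {
      if (abs(r1) >= 0.5) stop("no invertible MOM solution when |r1| >= 0.5")
      (-1 + sqrt(1 - 4 * r1^2)) / (2 * r1)  # the invertible root of the quadratic
    }
    mom.ma1(-0.4)    # gives theta-hat = 0.5, which satisfies |theta| < 1
    ## mom.ma1(0.6)  # would stop(): |r1| > 0.5, so no real solution exists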



More MOM Problems with MA Models

- With higher-order MA(q) models, the set of equations for estimating θ1, . . . , θq is highly nonlinear and can only be solved numerically.
- There would be many solutions, only one of which is invertible.
- In any case, for MA(q) models the method of moments usually produces poor estimates, so MOM is not recommended for estimating MA models.



MOM Estimation of Mixed ARMA Models

- Consider only the simplest mixed model, the ARMA(1, 1) model.
- Since ρ2/ρ1 = φ, a MOM estimator of φ is φ̂ = r2/r1.
- Then the equation

    r1 = (1 − θφ̂)(φ̂ − θ)/(1 − 2θφ̂ + θ²)

  can be used to solve for an estimate of θ.
- This is a quadratic equation in θ, and so we again keep only the invertible solution (if any exists) as our θ̂; a sketch follows below.
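A minimal R sketch (the helper mom.arma11 is ours, for illustration): substituting φ̂ into the displayed equation and rearranging gives the quadratic (r1 − φ̂)θ² + (1 + φ̂² − 2r1φ̂)θ + (r1 − φ̂) = 0, whose roots come in reciprocal pairs, so at most one is invertible:

    ## Minimal sketch: MOM for the ARMA(1,1) model.
    mom.arma11 <- function(r1, r2) {
      phi <- r2 / r1
      ## polyroot() takes ascending coefficients: c + b*theta + a*theta^2
      roots <- polyroot(c(r1 - phi, 1 + phi^2 - 2 * r1 * phi, r1 - phi))
      real <- Re(roots)[abs(Im(roots)) < 1e-8]              # keep real solutions
      list(phi.hat = phi, theta.hat = real[abs(real) < 1])  # invertible root, if any
    }
    mom.arma11(r1 = 0.337, r2 = 0.202)  # roughly recovers phi = 0.6, theta = 0.3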



MOM Estimation of the Noise Variance

- We still must estimate the variance σe² of the error component.
- For any model, we first estimate the variance of the time series process itself, γ0 = var(Yt), by the sample variance

    s² = (1/(n − 1)) ∑_{t=1}^n (Yt − Ȳ)²

- Then we can take advantage of known relationships among the parameters in our specified model to obtain a formula for σ̂e².



Formulas for MOM Noise Variance Estimators in Common
Models

- For AR(p) models, σ̂e² = (1 − φ̂1 r1 − φ̂2 r2 − · · · − φ̂p rp) s².
- For the AR(1) model, this reduces to σ̂e² = (1 − r1²) s².
- For MA(q) models,

    σ̂e² = s²/(1 + θ̂1² + θ̂2² + · · · + θ̂q²)

- For ARMA(1, 1) models,

    σ̂e² = [(1 − φ̂²)/(1 − 2φ̂θ̂ + θ̂²)] s²
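A minimal R sketch of the AR(1) case (simulated series for illustration; arima.sim() uses unit error variance, so the true σe² here is 1):

    ## Minimal sketch: MOM noise-variance estimate for an AR(1) model.
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200)
    s2 <- var(y)                        # sample variance estimates gamma_0
    r1 <- acf(y, plot = FALSE)$acf[2]
    sigma2.hat <- (1 - r1^2) * s2       # AR(1) formula above; true value is 1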



MOM Estimation in Some Simulated Time Series

- The course web page has R code to estimate the parameters in several simulated AR, MA, and ARMA models.
- The estimates of the AR parameters are good, but the estimates of the MA parameters are poor.
- In general, MOM estimators for models with MA terms are inefficient.



MOM Estimation in Some Real Time Series (Hare data)
- On the course web page, we see some estimation of parameters for real time series data.
- For the Canadian hare data, we employ a square-root transformation and select an AR(2) model:

    (√Yt − µ) = φ1(√Yt−1 − µ) + φ2(√Yt−2 − µ) + et

- Note that because the mean of the process is not zero, we initially subtract off µ = E(√Yt) throughout.
- Using the method of moments, we estimate the unknown parameters µ, φ1, and φ2 (see R example).
- The final estimated model is

    (√Yt − 5.82) = 1.1178(√Yt−1 − 5.82) − 0.519(√Yt−2 − 5.82) + et

  with estimated noise variance 1.97.


MOM Estimation in Real Time Series (Oil price data)
- For the oil price data, we select an MA(1) model for the differences of the logged oil prices:

    (∇ log Yt − µ) = et − θet−1

- We again subtract off µ = E(∇ log Yt) throughout to account for the fact that the real data may not have mean zero.
- Using the method of moments, we estimate the unknown parameters µ and θ (see R example).
- The final estimated model is

    (∇ log Yt − 0.004) = et + 0.222et−1

  with estimated noise variance 0.00686.
- Based on the standard error of the estimate of µ (see formula on page 28), it could be argued that 0.004 is not significantly different from 0, so we could drop this term from the final model.
Least Squares Estimation

- Since the method of moments performs poorly for some models, we examine another method of parameter estimation: least squares.
- We first consider autoregressive models.
- We assume our time series is stationary (or that it has been transformed so that the transformed data can be modeled as stationary).
- To account for the possibility that the mean is nonzero, we subtract µ from each observation and treat µ as a parameter to be estimated.



LS Estimation for the AR(1) Model

- Consider the mean-centered AR(1) model:

    Yt − µ = φ(Yt−1 − µ) + et

- The least squares method seeks the parameter values that minimize the sum of squared differences:

    Sc(φ, µ) = ∑_{t=2}^n [(Yt − µ) − φ(Yt−1 − µ)]²

- This criterion is called the conditional sum-of-squares (CSS) function.



LS Estimation of µ for the AR(1) Model

- Taking the derivative of the CSS with respect to µ, setting it equal to 0, and solving for µ, we obtain the LS estimator of µ:

    µ̂ = [∑_{t=2}^n Yt − φ ∑_{t=2}^n Yt−1] / [(n − 1)(1 − φ)]

- For large n, µ̂ ≈ Ȳ, regardless of the value of φ.



LS Estimation of φ for the AR(1) Model

- Taking the derivative of the CSS with respect to φ, setting it equal to 0, and solving for φ, we obtain the LS estimator of φ:

    φ̂ = [∑_{t=2}^n (Yt − Ȳ)(Yt−1 − Ȳ)] / [∑_{t=2}^n (Yt−1 − Ȳ)²]

- This estimator is almost identical to r1: its denominator is just missing one term, (Yn − Ȳ)².
- So, especially for large n, the LS and MOM estimators are nearly identical in the AR(1) model; the sketch below checks this numerically.
- In the general AR(p) model, the LS estimators of µ and of φ1, . . . , φp are approximately equal to the MOM estimators, especially for large samples.
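A minimal R sketch comparing the two estimators on a simulated series (illustrative, not the course's own code):

    ## Minimal sketch: conditional LS estimate of phi in an AR(1) model,
    ## computed directly from the formula above, next to the MOM estimate r1.
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200)
    ybar <- mean(y)
    n <- length(y)
    phi.ls <- sum((y[-1] - ybar) * (y[-n] - ybar)) / sum((y[-n] - ybar)^2)
    r1 <- acf(y, plot = FALSE)$acf[2]
    c(LS = phi.ls, MOM = r1)            # nearly identical for n this large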



LS Estimation for Moving Average Models

- Consider now the MA(1) model:

    Yt = et − θet−1

- Recall that this can be written as

    Yt = −θYt−1 − θ²Yt−2 − θ³Yt−3 − · · · + et

- So a least squares estimator of θ can be obtained by finding the value of θ that minimizes

    Sc(θ) = ∑ [Yt + θYt−1 + θ²Yt−2 + θ³Yt−3 + · · ·]²

- But this is nonlinear in θ, and the infinite series causes technical problems.



LS Estimation for Moving Average Models
- Instead, we proceed by conditioning on one previous value of et. Note that

    et = Yt + θet−1

- If we set e0 = 0, then we have the set of recursive equations e1 = Y1, e2 = Y2 + θe1, . . . , en = Yn + θen−1.
- Since we know the observed data values Y1, Y2, . . . , Yn and can calculate e1, e2, . . . , en recursively, the only unknown quantity here is θ.
- We can do a numerical search for the value of θ (within the invertible range between −1 and 1) that minimizes ∑ et², conditional on e0 = 0; see the sketch below.
- A similar approach works for higher-order MA(q) models, except that we assume e0 = e−1 = · · · = e1−q = 0 and the numerical search is multidimensional, since we are estimating θ1, . . . , θq.
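A minimal R sketch of this one-dimensional search (the helper css.ma1 is ours; note that arima.sim() uses R's sign convention Yt = et + θet−1, so ma = −0.7 simulates θ = 0.7 in this chapter's notation):

    ## Minimal sketch: conditional least squares for an MA(1) model.
    css.ma1 <- function(theta, y) {
      e <- numeric(length(y))
      e[1] <- y[1]                                     # conditioning on e_0 = 0
      for (t in 2:length(y)) e[t] <- y[t] + theta * e[t - 1]
      sum(e^2)                                         # Sc(theta)
    }
    set.seed(520)
    y <- arima.sim(model = list(ma = -0.7), n = 200)   # theta = 0.7 in our notation
    optimize(css.ma1, interval = c(-0.99, 0.99), y = y)$minimum  # near 0.7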
LS Estimation for ARMA Models

- With the ARMA(1, 1) model:

    Yt = φYt−1 + et − θet−1,

  we note that

    et = Yt − φYt−1 + θet−1

  and minimize Sc(φ, θ) = ∑_{t=2}^n et²; the sum starts at t = 2 to avoid having to choose an "initial" value Y0.
- With the general ARMA(p, q) model, the procedure is similar, except that we assume ep = ep−1 = · · · = ep+1−q = 0, and we estimate φ1, . . . , φp, θ1, . . . , θq.
- For large samples, when the parameter sets yield invertible models, the initial values ep, ep−1, . . . , ep+1−q have little effect on the final parameter estimates.
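In R, arima() with method = "CSS" carries out this kind of conditional least squares fit; a minimal sketch (remember R's MA sign convention is opposite the one used here):

    ## Minimal sketch: conditional least squares for an ARMA(1,1) via arima().
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6, ma = -0.3), n = 200)
    arima(y, order = c(1, 0, 1), method = "CSS", include.mean = FALSE)
    ## R reports ma1 near -0.3, i.e. theta near +0.3 in this chapter's notation.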



Maximum Likelihood Estimation

- On the other hand, for small to moderate sample sizes (and for stochastic seasonal models), assuming ep = ep−1 = · · · = ep+1−q = 0 can greatly affect the final parameter estimates, which is undesirable.
- In those cases, rather than using least squares, it may be advantageous to use maximum likelihood (ML) estimation.
- An advantage of ML estimation is that it uses all of the information in the data (not just the first few moments, as in MOM).
- Also, many large-sample results are known about the sampling distribution of ML estimators.
- A disadvantage of ML estimation is that we must assume the form of the joint probability distribution of the time series process.



Maximum Likelihood in Time Series Models

- The likelihood function is the joint density function of the data, but treated as a function of the unknown parameters, given the observed data Y1, . . . , Yn.
- For the models we have studied, the likelihood L is a function of the φ's, θ's, µ, and σe², given the observed Y1, . . . , Yn.
- The maximum likelihood estimates (MLEs) are the values of the parameters that maximize this likelihood function, i.e., the "most likely" parameter values given the data we actually observed.



Maximum Likelihood in the AR(1) Model

- In the AR(1) model with an unknown but constant mean, the parameters we must estimate are φ, µ, and σe².
- To perform ML estimation in the AR(1) model, we must assume a distribution for our data.
- The typical assumption is that the {et} in the AR(1) model are iid N(0, σe²) random variables.
- Under this assumption, the likelihood function (details are given on page 159) is:

    L(φ, µ, σe²) = (2πσe²)^(−n/2) (1 − φ²)^(1/2) exp[−S(φ, µ)/(2σe²)]

  where S(φ, µ) = ∑_{t=2}^n [(Yt − µ) − φ(Yt−1 − µ)]² + (1 − φ²)(Y1 − µ)².
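A minimal R sketch that codes this likelihood directly and maximizes it numerically (the parameter transformation and starting values are our own illustrative choices):

    ## Minimal sketch: numerical ML for the AR(1) model via the likelihood above.
    ## par = c(phi, mu, log.sigma2); the log keeps the variance positive.
    negloglik.ar1 <- function(par, y) {
      phi <- par[1]; mu <- par[2]; sigma2 <- exp(par[3])
      if (abs(phi) >= 1) return(Inf)               # enforce stationarity
      n <- length(y)
      S <- sum(((y[-1] - mu) - phi * (y[-n] - mu))^2) + (1 - phi^2) * (y[1] - mu)^2
      0.5 * (n * log(2 * pi * sigma2) - log(1 - phi^2) + S / sigma2)
    }
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200) + 10   # true mu = 10
    optim(c(0, mean(y), log(var(y))), negloglik.ar1, y = y)$par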



MLE’s in the AR(1) Model

- This S(φ, µ) is called the unconditional sum-of-squares function.
- We must find estimates φ̂, µ̂, and σ̂e² that maximize the likelihood function (in practice, we typically maximize the log-likelihood, which produces equivalent estimates).
- The estimator of the noise variance σe², in terms of the other estimates, is

    σ̂e² = S(φ̂, µ̂)/n

- Note that dividing by n − 2 rather than n produces a less biased estimator, but for large sample sizes this makes little practical difference.



MLE’s in the AR(1) Model

- We still need to estimate φ and µ.
- Comparing the unconditional sum-of-squares function to the conditional sum-of-squares function we saw earlier, note that S(φ, µ) = Sc(φ, µ) + (1 − φ²)(Y1 − µ)², so for large sample sizes, S(φ, µ) ≈ Sc(φ, µ).
- This implies that our ML estimates of φ and µ will be very similar to the LS estimates, at least for large sample sizes.
- The likelihood function for general ARMA models is more complicated, but ML estimates can usually still be found in these models.
- In practice, for AR models, MA models, or general ARMA or ARIMA models, we can often find either the LS estimates or the ML estimates easily using R, as sketched below.
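A minimal sketch of the fitting criteria in base R; arima()'s method argument selects conditional least squares, full maximum likelihood, or the default hybrid:

    ## Minimal sketch: one simulated AR(1) series fit three ways with arima().
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200)
    arima(y, order = c(1, 0, 0), method = "CSS")     # conditional least squares
    arima(y, order = c(1, 0, 0), method = "ML")      # full maximum likelihood
    arima(y, order = c(1, 0, 0), method = "CSS-ML")  # default: CSS start, ML finish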



Properties of the Estimators

- Recall that LS estimators and ML estimators become approximately equal for large samples.
- So the large-sample properties of LS and ML estimators are identical for basic ARMA-type models.
- For large n, these estimators are approximately unbiased and normally distributed.
- Note: for AR models, MOM estimators have the same large-sample properties as LS and ML estimators.
- But for models with MA terms, MOM estimators perform poorly and should not be used!
- For some common models, variance and correlation results for the estimators are given on page 161.



Properties of the Estimators in AR(1) and MA(1) models

- For example, for the AR(1) model, var(φ̂) ≈ (1 − φ²)/n, and for the MA(1) model, var(θ̂) ≈ (1 − θ²)/n.
- Clearly, the variance of each estimator decreases (i.e., the precision improves) as n increases.
- For the AR(1) model, the variance of the estimator φ̂ will be low when the true φ is near ±1.
- For the MA(1) model, the variance of the estimator θ̂ will be low when the true θ is near ±1.



Parameter Estimation with Some Simulated Time Series

- See the course web page for R examples of parameter estimation for two different simulated AR(1) series, each with n = 60, using the MOM, LS, and ML methods.
- See the course web page for R examples of parameter estimation for a simulated AR(2) series, with n = 120, using the MOM, LS, and ML methods.
- See the course web page for R examples of parameter estimation for a simulated ARMA(1,1) series, with n = 100, using the LS and ML methods (why not MOM here?).
- For these sample sizes, the various methods perform similarly in terms of estimation accuracy.
- With smaller sample sizes, the methods may produce noticeably different results.



Parameter Estimation with the Color Property Time Series

- For the color property time series, we had specified an AR(1) model.
- The R examples show the estimation of φ using the MOM, LS, and ML methods (note n = 35 here).
- From the ML estimate, the estimated AR(1) model would be

    Yt = 0.57Yt−1 + et

  where the noise variance is estimated to be 24.83.
- Since ρk = φ^k for an AR(1) process, we see that the autocorrelations will be positive at every lag, but will die off as the lag k increases.



Parameter Estimation with the Hare Abundance Time
Series
- For the Canadian hare abundance data, recall that we take the square root of the original abundance values.
- In the previous MOM example, we modeled the data with an AR(2) model, but here we choose an AR(3) model, which may be more appropriate based on the PACF.
- The R examples show the estimation of φ1, φ2, φ3, and µ (as well as σe²) using the MOM, LS, and ML methods (note n = 31 here).
- The final estimated model (from the ML estimates) is:

    (√Yt − 5.69) = 1.052(√Yt−1 − 5.69) − 0.229(√Yt−2 − 5.69) − 0.393(√Yt−3 − 5.69) + et

  with estimated noise variance 1.066.


Parameter Estimation with the Hare Abundance Time
Series (Continued)

- From the standard errors of the estimates, the lag-2 coefficient does not appear significantly different from zero.
- So we could optionally drop the lag-2 term and refit the AR model with only the lag-1 and lag-3 terms.



Parameter Estimation with the Oil Price Time Series

- Our earlier analysis specified an MA(1) model for the differences of the logged oil prices.
- The R example shows the estimation of θ using several methods.
- Again, the method of moments is not recommended for the MA(1) model.



Parameter Estimation with Other Time Series

- See the R examples on parameter estimation for several other data sets:
- We estimate the parameters of an AR(2) model for the recruitment data.
- We estimate the parameters of an MA(1) model for the differenced logged varve data.
- Either an AR(1) model or an MA(2) model seems to fit the differences of the logged GNP data well.



Large-sample Inference about the Model Parameters
- When the model parameters are estimated by the ML method, the ML estimators are approximately normally distributed when n is large.
- So we can use normal-based inference to get, say, confidence intervals for the true values of the parameters.
- For example, it may be of interest to know whether 0 is a plausible value of some parameter.
- For large samples, a (1 − α)100% CI for a parameter takes the form:

    estimate ± (z_{α/2})(estimated standard error)

- For example, in an AR(1) model, a 95% CI for φ is:

    φ̂ ± 1.96 √[(1 − φ̂²)/n]

- Similarly, in an MA(1) model, a 90% CI for θ is:

    θ̂ ± 1.645 √[(1 − θ̂²)/n]
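A minimal R sketch of the AR(1) interval (simulated data for illustration; arima() also reports its own standard error, shown for comparison):

    ## Minimal sketch: large-sample 95% CI for phi in an AR(1) model.
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 200)
    fit <- arima(y, order = c(1, 0, 0), method = "ML")
    phi.hat <- fit$coef["ar1"]
    se <- sqrt((1 - phi.hat^2) / length(y))   # the approximate SE given above
    phi.hat + c(-1.96, 1.96) * se             # 95% CI
    sqrt(fit$var.coef["ar1", "ar1"])          # arima()'s own SE, for comparison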
Small-sample Inference about the Model Parameters

- The ML estimators are not necessarily approximately normally distributed when n is small.
- So when n is small, we can use a more general approach, bootstrap-based inference, to get confidence intervals for the true values of the parameters.
- Section 7.6 gives details about bootstrap intervals.
- Some R examples give code for calculating 95% bootstrap CIs for ARIMA-type model parameters using four different methods; note that Method IV makes the fewest assumptions about the error distribution.
- The bootstrap method also makes it possible to construct CIs for relevant functions of the model parameters.
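As one illustration of the general idea (a residual-resampling sketch of our own, not necessarily one of the book's Methods I-IV):

    ## Minimal sketch: a residual-resampling bootstrap CI for phi in an AR(1).
    set.seed(520)
    y <- arima.sim(model = list(ar = 0.6), n = 100)
    fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
    phi.hat <- fit$coef["ar1"]
    res <- residuals(fit)
    phi.boot <- replicate(500, {
      e.star <- sample(res, replace = TRUE)                    # resample residuals
      y.star <- filter(e.star, phi.hat, method = "recursive")  # rebuild an AR(1) path
      arima(y.star, order = c(1, 0, 0), include.mean = FALSE)$coef["ar1"]
    })
    quantile(phi.boot, c(0.025, 0.975))   # percentile 95% bootstrap CI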

