Applying this operator to the $X_t$ sequence, we obtain
$$\phi(B)X_t = X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = X_t - \sum_{i=1}^{p} \phi_i X_{t-i}.$$
This equation is often used in time series analysis to describe the dynamic dependence
of $X_t$ on its past values.
The equation
$$\phi(B)X_t = c, \qquad (1)$$
where $c$ is a constant, is called a “difference equation” of order $p$. If $c = 0$, the
equation is homogeneous. A sequence $X_t$ that satisfies the difference equation
in (1) is a solution of that equation. In practice, different $\phi(B)$ can give rise to
different dynamic behavior of $X_t$. We shall use such a difference equation to describe
the dynamic pattern of a linear time series.
4. First-order Difference Equations
A difference equation is a deterministic relationship between the current value $X_t$
and its past values $X_{t-i}$ with $i > 0$. In some cases, it may also contain the current
and past values of a “forcing” or “driving” variable. A first-order difference equation
involves only one lagged variable:
$$X_t = \phi X_{t-1} + b a_t + c.$$
The term $\gamma\phi^t$ is included because this is the only function $f_t$ such that $(1 - \phi B)f_t = 0$,
where $\gamma$ is an unknown parameter whose value is determined by some initial condition.
Note that $\gamma\phi^t$ is the solution to the homogeneous equation $(1 - \phi B)X_t = 0$. As in the
theory of differential equations, the solution of a difference equation consists of the
solution to the homogeneous part plus a particular solution to the inhomogeneous equation.
If $|\phi| < 1$ and $a_t$ is bounded with mean zero, then $E(X_t)$ approaches $c/(1 - \phi)$ from
any starting point.
Obviously, the value of $\phi$ determines the qualitative behavior of the homogeneous solution
$X_t = \phi^t X_0$. For the first-order equation, there are three types of solution. If $0 < \phi < 1$,
then $X_t$ damps smoothly to zero. If $-1 < \phi < 0$, then $X_t$ damps to zero while oscillating
in sign. If $|\phi| > 1$, then $X_t$ is explosive. [The case of $|\phi| = 1$ is obvious.]
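The three regimes can be checked numerically. The following is a minimal sketch that iterates the first-order equation without a forcing term; the function name `solve_first_order` and the parameter values are illustrative assumptions, not part of the notes.

```python
def solve_first_order(phi, x0, n_steps, c=0.0):
    """Iterate X_t = phi * X_{t-1} + c (no forcing term) and return [X_0, ..., X_n]."""
    path = [x0]
    for _ in range(n_steps):
        path.append(phi * path[-1] + c)
    return path


# Illustrative (assumed) parameter values, one per regime.
smooth = solve_first_order(0.5, 1.0, 20)        # 0 < phi < 1: smooth damping
oscillating = solve_first_order(-0.5, 1.0, 20)  # -1 < phi < 0: damped sign changes
explosive = solve_first_order(1.5, 1.0, 20)     # |phi| > 1: explosive

# With a constant c and |phi| < 1 the path settles near c / (1 - phi).
shifted = solve_first_order(0.5, 10.0, 60, c=1.0)

print(smooth[:4])
print(oscillating[:4])
print(explosive[-1])
print(shifted[-1])
```

With $\phi = 0.5$ and $c = 1$ the last printed value should be close to $c/(1 - \phi) = 2$, whatever the starting value.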
Consider the general second-order difference equation:
$$(1 - \phi_1 B - \phi_2 B^2)X_t = \delta + b a_t.$$
Solutions of this equation can be computed by factoring the backshift polynomial as
$(1 - \phi_1 B - \phi_2 B^2) = (1 - \lambda_1 B)(1 - \lambda_2 B)$. Considering just the homogeneous equation,
we find a solution of the form
$$X_t = c_1 \lambda_1^t + c_2 \lambda_2^t,$$
where $c_1$ and $c_2$ are unknown parameters depending on the initial conditions. Note
that $1/\lambda_1$, $1/\lambda_2$ are the zeros of the polynomial $1 - \phi_1 B - \phi_2 B^2$. If the homogeneous
solution is to remain bounded, we would require $|\lambda_i| < 1$ for $i = 1, 2$, or equivalently
that the zeros of the polynomial $1 - \phi_1 B - \phi_2 B^2$ lie outside the unit circle (modulus
$> 1$). [Note: Zeros of $1 - \phi_1 B - \phi_2 B^2$ are roots of the equation $1 - \phi_1 B - \phi_2 B^2 = 0$.]
For a second-order equation, we have three possibilities: (a) distinct real roots, (b)
equal real roots, and (c) complex roots. The quadratic formula gives the roots as
$$\lambda_i = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2}.$$
We have complex roots if $\phi_1^2 + 4\phi_2 < 0$. The roots are $a \pm bi$ where $i = \sqrt{-1}$. Note
that the complex roots come in a conjugate pair.
If we write the roots in polar form, we can see how oscillatory solutions are possible:
$$a \pm bi = r(\cos\theta \pm i\sin\theta),$$
where $r = \sqrt{a^2 + b^2} = \sqrt{-\phi_2}$ and $\cos\theta = a/r = \phi_1/(2\sqrt{-\phi_2})$, or $\theta = \cos^{-1}(\phi_1/(2\sqrt{-\phi_2}))$.
Using Euler's formula, namely $\cos\theta + i\sin\theta = e^{i\theta}$, we can write
$$X_t = c_1 (re^{i\theta})^t + c_2 (re^{-i\theta})^t$$
$$= r^t (c_1 e^{it\theta} + c_2 e^{-it\theta})$$
$$= r^t [(c_1 + c_2)\cos(t\theta) + i(c_1 - c_2)\sin(t\theta)].$$
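As a hedged numerical check of the complex-root case, the sketch below computes the roots, the modulus $r$ and the angle $\theta$, and compares the closed-form solution with direct iteration of the homogeneous equation. The coefficients $\phi_1 = 1$, $\phi_2 = -0.5$ and the initial values are assumptions chosen only so that $\phi_1^2 + 4\phi_2 < 0$.

```python
import cmath
import math


def characteristic_roots(phi1, phi2):
    """Roots of lambda^2 - phi1*lambda - phi2 = 0, from the quadratic formula."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


phi1, phi2 = 1.0, -0.5            # assumed values with phi1^2 + 4*phi2 < 0
lam1, lam2 = characteristic_roots(phi1, phi2)
r = abs(lam1)                      # modulus of the conjugate pair
theta = math.acos(phi1 / (2 * math.sqrt(-phi2)))

# Direct iteration of X_t = phi1*X_{t-1} + phi2*X_{t-2} from assumed initial values.
x = [1.0, 0.5]
for t in range(2, 12):
    x.append(phi1 * x[t - 1] + phi2 * x[t - 2])

# Closed form X_t = c1*lam1^t + c2*lam2^t, with c1, c2 fixed by X_0 and X_1.
c1 = (x[1] - lam2 * x[0]) / (lam1 - lam2)
c2 = x[0] - c1
closed = [(c1 * lam1 ** t + c2 * lam2 ** t).real for t in range(12)]

print(r, theta)
print(max(abs(a - b) for a, b in zip(x, closed)))
```

Because $c_1$ and $c_2$ come out as complex conjugates when the initial values are real, the imaginary parts cancel and the solution is a real damped cosine with period $2\pi/\theta$.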
Equal roots case:
$$(1 - \lambda B)^2 X_t = 0,$$
the solution of which is
$$X_t = c_1 \lambda^t + c_2 t \lambda^t.$$
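A quick way to convince oneself of the equal-roots solution is to substitute it back into the expanded equation $X_t - 2\lambda X_{t-1} + \lambda^2 X_{t-2} = 0$. The sketch below does this numerically; the values of $\lambda$, $c_1$ and $c_2$ are arbitrary assumptions.

```python
lam, c1, c2 = 0.8, 1.0, 2.0   # assumed illustrative values


def x(t):
    """Candidate solution X_t = c1*lam^t + c2*t*lam^t for the repeated root lam."""
    return c1 * lam ** t + c2 * t * lam ** t


# (1 - lam*B)^2 X_t = X_t - 2*lam*X_{t-1} + lam^2*X_{t-2} should vanish for all t.
residuals = [x(t) - 2 * lam * x(t - 1) + lam ** 2 * x(t - 2) for t in range(2, 30)]
max_residual = max(abs(res) for res in residuals)
print(max_residual)
```

The printed residual should be zero up to floating-point rounding.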
6. General Case:
The above results can readily be extended to the general higher-order difference equation.
7. Stochastic Difference Equations: When the “forcing” factor is stochastic, we have
a general difference equation. In particular, the case in which the forcing variable
is a sequence of independent and identically distributed normal random variables
$\{a_t\}$ plays an important role in time series analysis. Here the solution $X_t$ is usually
serially correlated and follows a certain statistical distribution.
8. Let $a_t$ and $X_t$ be input and output at time $t$, respectively. Consider the linear system
$$X_t = \psi_0 a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = \psi(B)a_t,$$
where $\psi(B) = \psi_0 + \psi_1 B + \psi_2 B^2 + \cdots$ and $\psi_0 = 1$. Consider the relationship between
$\psi_i$ and the coefficients of $\pi(B)$ discussed earlier.
In other words, $X_t$ is strictly stationary if (a) the distributions of $X_t$ and $X_s$ are the same
for all $t$ and $s$, (b) the joint distribution of $(X_t, X_{t+s})$ is the same as that of $(X_{t+r}, X_{t+r+s})$
for all $r$ and $s$, (c) the joint distribution of $(X_t, X_{t+s}, X_{t+s+u})$ is identical to that of
$(X_{t+r}, X_{t+r+s}, X_{t+r+s+u})$ for all $r$, $s$ and $u$, and so on.
In practice, we often relax the requirement of stationarity by considering only weak
stationarity. A time series $X_t$ is weakly stationary if
$$E(X_t) = \mu, \quad \text{a constant},$$
$$\mathrm{Cov}(X_t, X_{t+k}) = \gamma_k, \quad \text{a function depending only on } k.$$
In other words, $X_t$ is weakly stationary if its first two moments are time invariant. Of
course, here we assume that the first two moments of $X_t$ exist. Some people refer to weakly
stationary processes as covariance stationary processes.
Clearly, strict stationarity implies weak stationarity provided that the first two moments
of the series exist. On the other hand, a weakly stationary series may not be strictly
stationary.
In many applications, we assume that the time series $X_t$ is Gaussian, that is, jointly
normal. This is mainly for statistical convenience. Since a normal distribution is
determined by its first two moments, weak stationarity is equivalent to
strict stationarity for a Gaussian time series.
We shall discuss non-Gaussian time series later.
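$$\Sigma = \begin{bmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-2} & \gamma_{n-1} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{n-3} & \gamma_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{n-1} & \gamma_{n-2} & \gamma_{n-3} & \cdots & \gamma_1 & \gamma_0
\end{bmatrix}$$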
where $\gamma_k$ is the lag-$k$ autocovariance of $X_t$. For the lag-$\ell$ autocorrelation
$\rho_\ell = \gamma_\ell/\gamma_0$, it is easy to see that (i) $\rho_0 = 1$, (ii) $|\rho_\ell| \le 1$, and (iii) $\rho_\ell = \rho_{-\ell}$.
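The structure of $\Sigma$ (entry $(i, j)$ equals $\gamma_{|i-j|}$) is easy to build in code. The following is a small sketch; the function name `autocovariance_matrix` and the numerical autocovariances are hypothetical, chosen only to show the layout.

```python
def autocovariance_matrix(gammas):
    """Build the n x n matrix whose (i, j) entry is gamma_{|i-j|}."""
    n = len(gammas)
    return [[gammas[abs(i - j)] for j in range(n)] for i in range(n)]


gammas = [2.0, 1.0, 0.5, 0.25]        # hypothetical gamma_0, ..., gamma_3
sigma = autocovariance_matrix(gammas)
rhos = [g / gammas[0] for g in gammas]  # autocorrelations rho_k = gamma_k / gamma_0

for row in sigma:
    print(row)
print(rhos)
```

The matrix is symmetric with a constant value along each diagonal, which reflects the fact that the covariance depends only on the lag.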
Ergodicity: Since we often have a single realization from the time series under study, we
must estimate the parameters of a particular time series model using observations of this
realization. The basic reason that we can do so is the theory of ergodicity. This is another
time invariant property we shall use. A simple way to discuss ergodicity is as follows:
Consider the random variable $X_t$. The traditional way to estimate the mean of this
random variable is to have a random sample of $m$ observations drawn from the distribution
of $X_t$. Denote the sample by $X_{t,1}, \cdots, X_{t,m}$. Then, the mean $\mu$ of $X_t$ is estimated by
$$\tilde{\mu} = \frac{1}{m} \sum_{i=1}^{m} X_{t,i}.$$
where $\bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t$. If a time series satisfies the requirement that the “time” averages
Note that the method of moments in time series analysis depends on ergodicity. What
is the method of moments?
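To make the contrast between the two kinds of averages concrete, here is a simulation sketch. It assumes a stationary first-order model $X_t = \phi X_{t-1} + c + a_t$ with standard normal $a_t$; the model, the seed and the sample sizes are illustrative assumptions, not something prescribed by the notes.

```python
import random


def simulate_ar1(phi, c, n_steps, rng, x0=0.0):
    """Simulate X_t = phi*X_{t-1} + c + a_t with a_t ~ N(0, 1); return the path."""
    x = x0
    path = []
    for _ in range(n_steps):
        x = phi * x + c + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


rng = random.Random(12345)     # fixed seed so the run is reproducible
phi, c = 0.5, 1.0              # assumed parameters
mu = c / (1 - phi)             # theoretical mean of the assumed model

# Ensemble average: m independent realizations, all observed at the same time t.
m, t = 2000, 50
ensemble = [simulate_ar1(phi, c, t, rng)[-1] for _ in range(m)]
mu_tilde = sum(ensemble) / m

# Time average: one long realization, averaged over time.
n = 20000
path = simulate_ar1(phi, c, n, rng)
x_bar = sum(path) / n

print(mu, mu_tilde, x_bar)
```

For this assumed model both averages should land near $c/(1 - \phi) = 2$, which is the behavior an ergodic series is expected to show.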
Linear time series and Wold decomposition: The simplest time series is a sequence of
iid $N(0, \sigma^2)$ random variables:
$$\cdots, a_{-2}, a_{-1}, a_0, a_1, a_2, \cdots \quad \text{or simply} \quad \{a_t\}_{t=-\infty}^{\infty}.$$
If $\{a_t\}$ are iid, but not Gaussian, then we have a strictly stationary series.
where $\{a_t\}$ is an iid sequence. The requirement that the $a_t$ are iid is rather strong. In
practice, $X_t$ may contain some deterministic trend component or be subject to the effect of
exogenous variables. We shall discuss this definition further when we introduce nonlinear
time series later.
White Noise. A white noise series $\{a_t\}$ is defined as follows: (1) $E(a_t) = 0$ for all $t$, (2)
$E(a_t^2) = \sigma^2$ for all $t$, and (3) $\mathrm{Cov}(a_t, a_s) = 0$ for $t \ne s$. That is, a white noise series is a
sequence of uncorrelated random variables with mean zero and variance $\sigma^2$. Note that a
white noise series is serially uncorrelated, but not necessarily serially independent.
If we want to generate a series that is serially dependent, we can take linear
combinations of white noise terms:
$$X_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \cdots = (1 + \psi_1 B + \psi_2 B^2 + \cdots)a_t.$$
This is a one-sided linear filter of the $a_t$ series; we take a weighted sum of the current and
past values of the $a_t$ to generate the observations $X_t$. A process generated in this way is
called a linear process or, more specifically, a moving average process.
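The one-sided filter can be sketched in a few lines. This version assumes a finite set of $\psi$ weights and treats shocks before the start of the sample as zero; both are simplifying assumptions made only for illustration, and `linear_filter` is a hypothetical helper name.

```python
def linear_filter(shocks, psi):
    """Return X_t = sum_i psi[i] * a_{t-i}, using only shocks inside the sample.

    psi[0] is the weight on the current shock (1 in the notes); shocks before
    the first observation are treated as zero.
    """
    out = []
    for t in range(len(shocks)):
        n_terms = min(t + 1, len(psi))
        out.append(sum(psi[i] * shocks[t - i] for i in range(n_terms)))
    return out


psi = [1.0, 0.6, 0.3]                     # assumed weights psi_0, psi_1, psi_2

# A unit impulse traces out the psi weights themselves.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = linear_filter(impulse, psi)
print(response)

# A general input is a weighted sum of current and past shocks.
filtered = linear_filter([1.0, 2.0, 3.0], psi)
print(filtered)
```

Feeding in a unit impulse is a convenient way to read the $\psi$ weights off any linear filter.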
Note that the $X_t$ process considered above has zero mean; we can simply add a constant
term $\mu$ to the right-hand side so that $X_t$ has a non-zero mean.
If $X_t$ is weakly stationary, we require that its variance exist, which in turn requires that
$$\sum_{i=0}^{\infty} \psi_i^2 < \infty,$$
where $\psi_0 = 1$.
This generating function just serves to store the sequence of autocovariances in a convenient
form, with the device that the coefficient of $z^k$ is $\gamma_k$. Generating functions of this sort are
useful in many ways; for instance, they are convenient for bookkeeping. By substituting
the formula for the autocovariance into the generating function, we can obtain
$$\Gamma(z) = \sigma^2 \psi(z)\psi(z^{-1}).$$
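The identity can be verified for a finite set of weights by expanding the product and comparing coefficients. The sketch below assumes the standard linear-process autocovariance $\gamma_k = \sigma^2 \sum_i \psi_i \psi_{i+k}$ (not restated in this part of the notes) and uses hypothetical weights and variance.

```python
def autocovariances(psi, sigma2):
    """gamma_k = sigma2 * sum_i psi[i]*psi[i+k] for k = 0..q (assumed standard formula)."""
    q = len(psi) - 1
    return [sigma2 * sum(psi[i] * psi[i + k] for i in range(q + 1 - k))
            for k in range(q + 1)]


def acgf_coefficients(psi, sigma2):
    """Coefficients of sigma2 * psi(z) * psi(1/z), keyed by the exponent of z."""
    coeffs = {}
    for i, p_i in enumerate(psi):          # term psi_i * z^i from psi(z)
        for j, p_j in enumerate(psi):      # term psi_j * z^(-j) from psi(1/z)
            coeffs[i - j] = coeffs.get(i - j, 0.0) + sigma2 * p_i * p_j
    return coeffs


psi = [1.0, 0.5, 0.25]   # hypothetical weights with psi_0 = 1
sigma2 = 2.0             # hypothetical white noise variance

gammas = autocovariances(psi, sigma2)
coeffs = acgf_coefficients(psi, sigma2)
print(gammas)
print(sorted(coeffs.items()))
```

The coefficient of $z^k$ and of $z^{-k}$ in the expanded product should both equal $\gamma_k$, which is the symmetry $\gamma_k = \gamma_{-k}$.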
All linear time series models have an infinite moving average representation. That is,
any linear time series model can be written in the form of a moving average model of
infinite order. Models differ only in the restrictions they place on the $\psi$
weights. The general linear model considered here has even greater applicability than one
might think. We shall gradually see the flexibility of the model in this course. Here we
simply rely on a very important theorem due to Wold, which states that any stationary
process can be decomposed into two parts:
$$Y_t = D_t + X_t$$
For those who are interested in a formal proof of Wold decomposition, see Brockwell
and Davis (1991, page 187).
2. Quarterly U.S. GDP in billions of dollars, seasonally adjusted annual rate, 1947-2008.
3. Monthly Producer Price Index: all commodities, index 1982=100, not seasonally
adjusted, 1921-2008.
5. Monthly U.S. Unemployment rate: civilian 16 years and older; seasonally adjusted,
6. Monthly U.S. Total Nonfarm Payrolls: All employees, seasonally adjusted, 1939-2008.
[Figure: Daily US−EU exchange rate: 1999.1−2008.9]
[Figure: Quarterly US GDP: 1947.I−2008.II, seasonally adjusted]
[Figure: Monthly US Producer Price Index: 1921−2008]
[Figure: Monthly US M2 Money Stock: 1959−2008 (NS)]
[Figure: Monthly US unemployment rate: 1948−2008 (vertical axis: unemp. rate)]
[Figure: Monthly US Nonfarm Payrolls: all employees]