Chapter 2 - Lecture Notes
The mean function is defined by
µ(t) = E[Xt], t ∈ T.
The covariance function is defined by
γ(t, s) = cov(Xt, Xs) = E[(Xt − µ(t))(Xs − µ(s))], for all t, s ∈ T.
The variance function is defined by
var(Xt) = γ(t, t), t ∈ T.
Thus µ(t), var(Xt) and γ(t, s) are just real functions of t, t and (t, s), respectively.
EXAMPLE 1 . Consider the Gaussian process (Xt , t ∈ [0, 1]) of i.i.d. N(0, 1) random
variables Xt .
Its expectation and covariance functions are given by
µ(t) = 0 and γ(t, s) = 1 if t = s, and γ(t, s) = 0 otherwise.
DEFINITION 2 A time series is said to be strictly stationary if, for any n ∈ Z+ and all integers h, (X1, . . . , Xn) and (X1+h, . . . , Xn+h) have the same distribution.
DEFINITION 3 A time series (Xt) with E[Xt²] < ∞ is said to be weakly stationary if µ(t) ≡ µ does not depend on t and γ(t + h, t) does not depend on t for each h.
Weak stationarity is often called stationarity for short, since it is the type of stationarity we are mainly interested in. The relationship between strict and weak stationarity is as follows. Strict stationarity together with finiteness of the second moment ensures weak stationarity. However, generally speaking, weak stationarity does not imply strict stationarity. An exceptional case is the Gaussian one: when (Xt) is a Gaussian process, strict stationarity is the same as weak stationarity. Note that some distributions, such as the Cauchy distribution, have infinite second moment, so a strictly stationary series need not be weakly stationary.
REMARK 1. Usually, we first check whether
µ(t) = c0 and var(Xt) = c1 (1.1)
for some constants c0, c1. If so, we then check whether
γ(t, s) = h(t − s) (1.2)
for some function h. If (1.1) does not hold, we conclude that Xt is not weakly stationary, and verification of (1.2) is not required.
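As a quick numerical companion to this remark, the sketch below (assuming only numpy; rough_stationarity_check is a hypothetical helper, not from these notes) splits a series into blocks and compares block means and variances; pronounced drift across blocks suggests that (1.1) fails.

```python
import numpy as np

def rough_stationarity_check(x, n_blocks=4):
    """Informal check of (1.1): compare means and variances across blocks.
    Large drift across blocks suggests a non-constant mean or variance."""
    blocks = np.array_split(np.asarray(x, dtype=float), n_blocks)
    means = [b.mean() for b in blocks]
    variances = [b.var() for b in blocks]
    return means, variances

# A series with a linear trend clearly violates the constant-mean condition.
t = np.arange(200)
x = 2.0 + 0.5 * t + np.random.normal(size=200)
print(rough_stationarity_check(x))   # block means drift upward
```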
EXAMPLE 2 Consider
Xt = β0 + β1 t + et ,
where the et are i.i.d. with mean zero and variance one, and β1 ≠ 0. Is Xt stationary?
Solution: It is not stationary, because
EXt = β0 + β1 t,
which depends on time t, so (1.1) fails.
EXAMPLE 3 Consider a random walk of the form
Xt = Xt−1 + et , t = 1, 2, . . . ,
where (et) is a sequence of i.i.d. random variables with E[et] = 0 and E[et²] = 1.
Let X0 = 0.
◃ Justify whether Xt is stationary.
◃ Is Zt = Xt − Xt−1 stationary?
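A brief justification: Xt = e1 + · · · + et, so EXt = 0 but var(Xt) = t, which depends on t; hence Xt is not stationary. On the other hand, Zt = Xt − Xt−1 = et is i.i.d. noise, which is stationary. The simulation below (a sketch assuming numpy; the path count and horizon are arbitrary illustrative choices) confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many paths of the random walk X_t = X_{t-1} + e_t with X_0 = 0.
n_paths, T = 5000, 100
e = rng.normal(size=(n_paths, T))
X = np.cumsum(e, axis=1)                 # X[:, t-1] holds X_t

# var(X_t) = t grows with t, so X_t is not (weakly) stationary ...
print(X[:, [9, 49, 99]].var(axis=0))     # roughly 10, 50, 100

# ... while Z_t = X_t - X_{t-1} = e_t is i.i.d., hence stationary.
Z = np.diff(X, axis=1)
print(Z[:, [9, 49, 98]].var(axis=0))     # all roughly 1
```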
For a stationary series, write γk = cov(Xt, Xt+k). The autocorrelation function (ACF) is defined by
ρk = cov(Xt, Xt+k)/√(var(Xt) var(Xt+k)) = γk/γ0, k ∈ Z.
◃ γ0 = var(Xt ); ρ0 = 1.
◃ γk = γ−k ; ρk = ρ−k .
Therefore, the ACFs are often plotted only for the nonnegative lags.
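The sample analogue of ρk can be computed directly from the definition. Below is a minimal sketch assuming numpy; sample_acf is a hypothetical helper name, not a library routine.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_k = gamma_k / gamma_0, k = 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(max_lag + 1)])
    return gamma / gamma[0]

# For i.i.d. noise, rho_0 = 1 and rho_k is near 0 for all k >= 1.
print(sample_acf(np.random.normal(size=500), max_lag=5))
```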
EXAMPLE 4 (White noise). A sequence (Zt) is called white noise, written Zt ∼ WN(0, σ²), if EZt = 0, var(Zt) = σ², and Cov(Zt, Zs) = 0 for all t ≠ s. White noise is weakly stationary.
EXAMPLE 5 We can build time series from a white noise sequence. Suppose Zt ∼ WN(0, σ²). Let
Xt = Zt + θZt−1 (MA(1) model).
Then
E(Xt ) = E(Zt + θZt−1 ) = EZt + θEZt−1 = 0.
and
γ0 = var(Xt) = (1 + θ²)σ², γ1 = cov(Xt, Xt+1) = θσ², γk = 0 for |k| ≥ 2.
Hence Xt is weakly stationary, with ρ1 = θ/(1 + θ²) and ρk = 0 for |k| ≥ 2.
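These ACF values can be checked by simulation. The sketch below assumes numpy; θ = 0.6 and the sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 0.6, 1.0, 10_000   # illustrative parameters

# Simulate X_t = Z_t + theta * Z_{t-1} from Gaussian white noise Z.
Z = rng.normal(scale=sigma, size=n + 1)
X = Z[1:] + theta * Z[:-1]

# Sample ACF at lags 1 and 2 vs. theory: rho_1 = theta/(1+theta^2), rho_2 = 0.
Xc = X - X.mean()
rho1 = np.sum(Xc[:-1] * Xc[1:]) / np.sum(Xc**2)
rho2 = np.sum(Xc[:-2] * Xc[2:]) / np.sum(Xc**2)
print(rho1, theta / (1 + theta**2))  # close to each other
print(rho2)                          # close to 0
```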
MH4500 TIME SERIES ANALYSIS
Chapter 2 (part 2): Time series regression
The first step in the analysis of any time series is to plot the data. If there
are apparent discontinuities in the series, such as a sudden change of level, it
may be advisable to analyze the series by first breaking it into homogeneous
segments. If there are outlying observations, they should be studied carefully
to check whether there is any justification for discarding them. Inspection of a
graph may also suggest the possibility of representing the data as a realization of
the process,
Xt = Tt + St + et , (0.1)
where Tt is a slowly changing function known as a trend component, St is a
function with known period d referred to as a seasonal component, and et is a
random noise component. The error term et represents random fluctuations that
cause the Xt values to deviate from the average level EXt .
If the seasonal and noise fluctuations appear to increase with the level of the process, then a preliminary transformation of the data is often used to make the transformed data compatible with model (0.1).
Our aim is to estimate and extract the deterministic components Tt and St in
the hope that the residual or noise component et will turn out to be a stationary
random process. We can then use the theory of such processes to find a satisfac-
tory probabilistic model for the process et , to analyze its properties, and to use
it in conjunction with Tt and St for purposes of prediction and control of Xt .
An alternative approach is to apply difference operators repeatedly to the data
set Xt until the differenced observations resemble a realization of some stationary
process Zt . We can then use the theory of stationary processes for the modelling,
analysis and prediction of Zt and hence of the original process.
The two approaches to trend and seasonality removal, (a) by estimation of Tt
and St in (0.1) and (b) by differencing the data {Xt } , will be discussed in some
detail.
DEFINITION 1 A trend model is
Xt = Tt + et
where Xt is the time series in period t, Tt is the trend in period t, and et is the error term in period t.
[Figure: daily SP500 index over time.]
[Figure: world population (in millions) by year.]
Tt = β0 (constant trend),
Tt = β0 + β1 t (linear trend),
Tt = β0 + β1 t + β2 t² (quadratic trend).
[Figure: world birth rate by year, 1950–2010.]
The above three are the most commonly used. This reflects the fact that many functions can be well approximated, on an interval of finite length, by a polynomial of reasonably low degree. However, more complicated trends exist, such as the p-th order polynomial trend
Tt = a0 + a1 t + · · · + ap t^p. (1.1)
This method (Method 1: estimating the trend by regression) needs to assume that the error term et satisfies the constant-variance and independence assumptions.
Method 2 (Differencing to generate stationary data). As an alternative, we now
attempt to eliminate the trend term by differencing.
DEFINITION 2 We define the first difference operator ∇ by
∇Xt = Xt − Xt−1 = (1 − B)Xt,
where B is the backward shift operator, BXt = Xt−1. Similarly, higher-order differences are defined by
∇^i Xt = ∇(∇^{i−1} Xt), with ∇^0 Xt = Xt.
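In code, ∇ corresponds to numpy's diff, and ∇^i to diff with n=i. A minimal illustration (assuming numpy) that the first difference of a linear trend is constant and the second difference vanishes:

```python
import numpy as np

t = np.arange(10, dtype=float)
x = 3.0 + 2.0 * t        # pure linear trend T_t = 3 + 2t, no noise, for clarity

print(np.diff(x))        # first difference: constant 2, the trend is removed
print(np.diff(x, n=2))   # second difference: all zeros
```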
DEFINITION 3 We say that the time series exhibits constant seasonal variation if
the magnitude of the seasonal swing does not depend on the level of the time
series.
We say that the time series exhibits varying (increasing) seasonal variation
if the magnitude of the seasonal swing depends on the level of the time series.
A standard choice is the Box-Cox power transformation
zt = (xt^λ − 1)/λ, if λ > 0; zt = log(xt), if λ = 0.
In fact, the log transformation is a common way to make data display constant seasonal variation. In addition, one may also take the square-root transformation (λ = 1/2).
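A minimal sketch of this transformation family (assuming numpy; box_cox is a hypothetical helper, and the sample values are illustrative):

```python
import numpy as np

def box_cox(x, lam):
    """z_t = (x_t**lam - 1)/lam for lam > 0, and z_t = log(x_t) for lam = 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam

x = np.array([878.0, 1005.0, 1173.0, 883.0, 972.0])  # illustrative values
print(box_cox(x, 0))     # log transform
print(box_cox(x, 0.5))   # square-root-type transform
```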
The seasonal component can be described using dummy variables:
Dk,t = 1, if time period t is season k; Dk,t = 0, otherwise.
For example, (normal) seasons Spring, Summer, Autumn, Winter can be de-
scribed by
seasons D1,t D2,t D3,t
Spring 1 0 0
Summer 0 1 0
Autumn 0 0 1
Winter 0 0 0
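Such dummies can be generated mechanically. The sketch below (assuming numpy; seasonal_dummies is a hypothetical helper) builds the columns D1,t, . . . , D(period−1),t with the last season as the all-zero baseline, matching the table above.

```python
import numpy as np

def seasonal_dummies(n, period):
    """D[t, k] = 1 if (0-based) time t falls in season k+1, k = 0..period-2;
    the last season is the baseline and is encoded as a row of zeros."""
    D = np.zeros((n, period - 1))
    for t in range(n):
        k = t % period               # season index of period t
        if k < period - 1:
            D[t, k] = 1.0
    return D

print(seasonal_dummies(8, 4))        # two full cycles of four seasons
```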
Consider the following monthly numbers of passengers xt:
878, 1005, 1173, 883, 972, 1125, 1336, 988, 1020, 1146, 1400, 1006, 1108, 1288, 1570, 1174, 1227, 1468, 1736, 1283.
[Figure: no. of passengers by month, t = 1, . . . , 20.]
Take the log transformation
yt = log(xt).
The plot suggests that taking logarithms of the data equalizes the seasonal variation reasonably well.
Fit the following model
yt = Tt + St + et
[Figure: log(no. of passengers) by month.]
where Tt = β0 + β1 t.
Using dummy variables for the seasonality gives the following regression model
yt = β0 + β1 t + β2 D1,t + β3 D2,t + β4 D3,t + et,
where
Dk,t = 1, if time period t is season k; Dk,t = 0, otherwise.
Below we assume that the constant variance and independence regarding the
error et are satisfied.
The data are listed below; each row of the design matrix has the form (1, t, D1,t, D2,t, D3,t), with response yt. For instance, the final row is (1, 20, 0, 0, 0) with y20 = 7.16.
We have the following calculations:
(X^T X)^{-1} =
[  0.4250  −0.0187  −0.2562  −0.2375  −0.2187
  −0.0187   0.0016   0.0047   0.0031   0.0016
  −0.2562   0.0047   0.4141   0.2094   0.2047
  −0.2375   0.0031   0.2094   0.4063   0.2031
  −0.2187   0.0016   0.2047   0.2031   0.4016 ]
and
X^T Y = (141.29, 1498.36, 34.71, 35.43, 36.33)^T
(with n = 20 and np = 5)
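These quantities can be reproduced mechanically from the normal equations β̂ = (X^T X)^{-1} X^T Y. The sketch below (assuming numpy) uses a simulated response in place of the original data table, which is not fully reproduced here, so the printed numbers will not match the values above; the formulas for β̂, SSE, R², F, and the point prediction are the ones used in this section.

```python
import numpy as np

# Stand-in data: 20 quarterly observations with a linear trend and
# three seasonal dummies (last season is the baseline). The response y
# is simulated purely for illustration.
n, period = 20, 4
t = np.arange(1, n + 1)
D = np.zeros((n, 3))
for i, ti in enumerate(t):
    k = (ti - 1) % period
    if k < 3:
        D[i, k] = 1.0
X = np.column_stack([np.ones(n), t, D])       # columns: 1, t, D1, D2, D3

rng = np.random.default_rng(2)
y = 6.5 + 0.03 * t + D @ np.array([-0.1, 0.05, 0.2]) \
    + rng.normal(scale=0.02, size=n)

# OLS estimates, SSE, R^2 and the F statistic, as in the text.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
SSE = resid @ resid
Syy = np.sum((y - y.mean())**2)
R2 = 1 - SSE / Syy
n_p = X.shape[1]
F = ((Syy - SSE) / (n_p - 1)) / (SSE / (n - n_p))
print(beta_hat, R2, F)

# Point prediction for t = 21 (first quarter of a new year): x = (1, 21, 1, 0, 0).
x_new = np.array([1.0, 21.0, 1.0, 0.0, 0.0])
print(x_new @ beta_hat)
```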
The Durbin-Watson statistic is
DW = Σ_{t=2}^{n} (êt − êt−1)² / Σ_{t=1}^{n} êt² = 0.8346.
The standard errors s√ckk of the estimators, where ckk is the k-th diagonal entry of (X^T X)^{-1} and s² = SSE/(n − np), follow from these quantities.
The sum of squared prediction errors is
SSE = Σ_{t=1}^{n} êt² = 0.0101.
Thus, we have
R² = 1 − SSE/Syy = 0.9846.
The F statistic is
F = ((Syy − SSE)/(np − 1)) / (SSE/(n − np)) = 240.0942.
(i) Point predictions ŷt = Xt β̂: for the first quarter in year 61, Xt = (1, 21, 1, 0, 0). The resulting predictions are shown below.
[Figure: observations and predictions of log(no. of passengers), by quarter.]
[Figure: observations and predictions of no. of passengers, by quarter.]
DEFINITION 7 The Durbin-Watson statistic is
DW = Σ_{t=2}^{n} (et − et−1)² / Σ_{t=1}^{n} et²,
where the et are the residuals from the fitted model. Values of DW near 2 indicate little first-order autocorrelation in the errors, while values well below 2 indicate positive autocorrelation.
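A direct implementation of this formula (a sketch assuming numpy; durbin_watson is a hypothetical helper):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum_{t=2}^n (e_t - e_{t-1})^2 / sum_{t=1}^n e_t^2.
    Values near 2 suggest uncorrelated errors; values well below 2
    suggest positive first-order autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid)**2) / np.sum(resid**2)

print(durbin_watson(np.random.normal(size=100)))  # typically close to 2
```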
[Figure: a data series over time, t = 1, . . . , 20.]
[Figure: prediction errors over time.]
2.2 Modelling seasonality using trigonometric functions
Sometimes regression models involving trigonometric terms can be used to fore-
cast time series exhibiting either constant or increasing seasonal variation. Such
models are as follows.
yt = Tt + β1 sin(2πt/L) + β2 cos(2πt/L) + et,
or, with an additional harmonic,
yt = Tt + β1 sin(2πt/L) + β2 cos(2πt/L) + β3 sin(4πt/L) + β4 cos(4πt/L) + et,
where L is the length of the seasonal period and Tt is a trend term.
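Assuming the model forms above, the corresponding design matrix can be built as follows (a sketch assuming numpy; trig_design is a hypothetical helper, with L the seasonal period and harmonics the number of sin/cos pairs). Note that with one harmonic the model needs only 4 parameters, versus 5 for the quarterly dummy-variable model, and the saving grows with the seasonal period.

```python
import numpy as np

def trig_design(t, L, harmonics=1):
    """Columns: 1, t, then sin(2*pi*j*t/L), cos(2*pi*j*t/L) for j = 1..harmonics."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t]
    for j in range(1, harmonics + 1):
        cols.append(np.sin(2 * np.pi * j * t / L))
        cols.append(np.cos(2 * np.pi * j * t / L))
    return np.column_stack(cols)

X = trig_design(np.arange(1, 21), L=4, harmonics=1)
print(X.shape)   # (20, 4)
```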
[Figure: observations and predictions of log(no. of passengers), by quarter.]
[Figure: observations and predictions of no. of passengers, by quarter.]
Consider again the model
Xt = Tt + St + et, (3.1)
where St+L = St and L is the length of the seasonal period; that is, the model exhibits constant seasonal variation.
Method 2 (Differencing at lag L). The technique of differencing, which we applied earlier to non-seasonal data, can be adapted to deal with seasonality of period L by introducing the lag-L difference operator ∇L, defined by
∇L Xt = Xt − Xt−L = (1 − B^L)Xt.
(This operator should not be confused with the operator ∇^L = (1 − B)^L defined earlier.)
Applying the operator ∇L to the model
Xt = Tt + St + et
gives
∇L Xt = Tt − Tt−L + et − et−L,
since St − St−L = 0: the seasonal component is removed exactly.
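A minimal sketch of lag-L differencing (assuming numpy; seasonal_diff is a hypothetical helper) showing that a purely periodic component is removed exactly:

```python
import numpy as np

def seasonal_diff(x, L):
    """Lag-L difference: (nabla_L x)_t = x_t - x_{t-L}."""
    x = np.asarray(x, dtype=float)
    return x[L:] - x[:-L]

# A purely seasonal signal with period 4 (S_{t+4} = S_t) differences to zero.
s = np.tile([10.0, -3.0, 5.0, -12.0], 6)
print(seasonal_diff(s, L=4))   # all zeros
```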
EXAMPLE 3 Consider a simple random walk of the form
Xt = Xt−1 + et, t = 1, 2, . . . , T,
and, separately, the deterministic-trend model
Xt = Tt + et, t = 0, 1, . . . , with Tt = c0 + c1 t + c2 t², t = 0, 1, . . . .
For the random walk, ∇Xt = et is already stationary; for the quadratic trend, ∇²Xt = 2c2 + ∇²et, so differencing twice removes the trend.