ECMT3150: The Econometrics of Financial Markets

1c. Linear Time Series Analysis

Simon Kwok
University of Sydney

Semester 1, 2022

1. Regression with Time Series Errors

1.1 When would a regression break down?
1.2 Spurious Regression
1.3 Cointegration
2. Unit Root Models
2.1 Stochastic vs Deterministic Trends
2.2 Unit Root Tests
3. Seasonal Models
4. Long Memory Models

Regression Model with Time Series Errors
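Let $\{x_t\}$ and $\{y_t\}$ be two time series. Suppose we run the regression
$$y_t = x_t'\beta + \varepsilon_t, \tag{1}$$
where $x_t'$ is $1 \times k$ and $\beta$ is $k \times 1$.

The OLS estimate of $\beta$ is
$$\hat\beta = \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \sum_{t=1}^T x_t y_t. \tag{2}$$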

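Suppose the conditional covariance of $\hat\beta$ is estimated by
$$\widehat{\mathrm{Cov}}(\hat\beta \mid x) = \hat\sigma_\varepsilon^2 \left(\sum_{t=1}^T x_t x_t'\right)^{-1},$$
where $\hat\sigma_\varepsilon^2$ is the OLS variance estimator based on the residuals $\{\hat\varepsilon_t\}$, given by
$$\hat\sigma_\varepsilon^2 = \frac{1}{T-k}\sum_{t=1}^T \hat\varepsilon_t^2.$$
Q: Upon diagnostic checks, we may find that $\{\hat\varepsilon_t\}$ are heteroskedastic and/or serially correlated. Is $\hat\beta$ consistent for $\beta$, and is $\widehat{\mathrm{Cov}}(\hat\beta \mid x)$ consistent for $\mathrm{Cov}(\hat\beta \mid x)$?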
A: It depends on the true data generating process (DGP).
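As a minimal sketch of the OLS estimate (2) and the covariance estimator above, the following NumPy code may help. The function name `ols_classical`, the simulated data, and the parameter values are illustrative assumptions, not part of the slides. The code stores the regressors as a $T \times k$ array whose rows are $x_t'$.

```python
import numpy as np

def ols_classical(X, y):
    """OLS estimate (2) and the classical covariance estimate.

    X is a T x k array whose row t is x_t'; y has length T.
    """
    T, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)        # (sum_t x_t x_t')^{-1}
    beta_hat = XtX_inv @ (X.T @ y)          # equation (2)
    resid = y - X @ beta_hat                # OLS residuals
    sigma2_hat = resid @ resid / (T - k)    # residual variance with T - k d.o.f.
    cov_hat = sigma2_hat * XtX_inv
    return beta_hat, cov_hat, resid

# Hypothetical data: white-noise errors drawn independently of the regressors.
rng = np.random.default_rng(0)
T = 5000
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.standard_normal(T)

beta_hat, cov_hat, resid = ols_classical(X, y)
print(beta_hat, np.sqrt(np.diag(cov_hat)))
```

In this well-behaved setting the estimates should sit close to `beta_true`; the scenarios that follow describe when that stops being true.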
Regression Model with Time Series Errors
Scenario 1: Let $x = [x_1, x_2, \ldots, x_T]$, a $k \times T$ matrix of full row rank. The true DGP is (1), where $\{x_t\}$ is a covariance stationary process with $E(\|x_t\|^2) < \infty$ and $E[x_t x_t']$ positive definite, $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$, and the two processes $\{x_t\}$ and $\{\varepsilon_t\}$ are independent.
Consequence: $\hat\beta$ is consistent, and $\widehat{\mathrm{Cov}}(\hat\beta \mid x)$ is consistent.
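Sketch of proof: Substitute (1) into (2):
$$\hat\beta = \beta + \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \sum_{t=1}^T x_t \varepsilon_t.$$

Rewrite the equation as
$$\hat\beta - \beta = \left(\frac{1}{T}\sum_{t=1}^T x_t x_t'\right)^{-1} \left(\frac{1}{T}\sum_{t=1}^T x_t \varepsilon_t\right).$$

Since $E(\|x_t\|^2) < \infty$ and $E(\varepsilon_t^2) < \infty$, which imply that $E(\|x_t \varepsilon_t\|) < \infty$ by the Cauchy-Schwarz inequality, we apply the strong law of large numbers and obtain
$$\frac{1}{T}\sum_{t=1}^T x_t x_t' \xrightarrow{a.s.} E(x_t x_t'), \qquad \frac{1}{T}\sum_{t=1}^T x_t \varepsilon_t \xrightarrow{a.s.} E(x_t \varepsilon_t) = 0,$$
as $T \to \infty$, so that $\hat\beta - \beta \xrightarrow{a.s.} 0$ as $T \to \infty$.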
Regression Model with Time Series Errors
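Conditional on $x = [x_1, x_2, \ldots, x_T]$, the covariance matrix of $\hat\beta$ is
$$\mathrm{Cov}(\hat\beta \mid x) = E[(\hat\beta - \beta)(\hat\beta - \beta)' \mid x] = \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \mathrm{Cov}\left(\sum_{t=1}^T x_t \varepsilon_t \,\Big|\, x\right) \left(\sum_{t=1}^T x_t x_t'\right)^{-1}.$$

Now let us compute
$$\begin{aligned}
\mathrm{Cov}\left(\sum_{t=1}^T x_t \varepsilon_t \,\Big|\, x\right)
&= E\left[\left(\sum_{t=1}^T x_t \varepsilon_t\right)\left(\sum_{t=1}^T x_t' \varepsilon_t\right) \Big|\, x\right] \\
&= E\left[\sum_{t=1}^T x_t x_t' \varepsilon_t^2 + \sum_{s=1}^{T-1}\sum_{t=s+1}^T (x_s x_t' + x_t x_s')\varepsilon_s \varepsilon_t \,\Big|\, x\right] \\
&= \sum_{t=1}^T x_t x_t' E(\varepsilon_t^2 \mid x) + \sum_{s=1}^{T-1}\sum_{t=s+1}^T (x_s x_t' + x_t x_s') E(\varepsilon_s \varepsilon_t \mid x).
\end{aligned}$$

Since $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$ and $E(x_t \varepsilon_t) = 0$, we have $E(\varepsilon_t^2 \mid x) = E(\varepsilon_t^2) = \sigma_\varepsilon^2$ and $E(\varepsilon_s \varepsilon_t \mid x) = E(\varepsilon_s \varepsilon_t) = 0$ for all $s \neq t$, and so $\mathrm{Cov}\left(\sum_{t=1}^T x_t \varepsilon_t \mid x\right) = \sigma_\varepsilon^2 \sum_{t=1}^T x_t x_t'$. It follows that
$$\mathrm{Cov}(\hat\beta \mid x) = \sigma_\varepsilon^2 \left(\sum_{t=1}^T x_t x_t'\right)^{-1}.$$

Since $E(\varepsilon_t^2) < \infty$, we have, by the strong law of large numbers, $\hat\sigma_\varepsilon^2 \xrightarrow{a.s.} \sigma_\varepsilon^2$, and hence
$$\widehat{\mathrm{Cov}}(\hat\beta \mid x) = \hat\sigma_\varepsilon^2 \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \xrightarrow{a.s.} \sigma_\varepsilon^2 \left(\sum_{t=1}^T x_t x_t'\right)^{-1} = \mathrm{Cov}(\hat\beta \mid x) \quad \text{as } T \to \infty.$$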
Regression Model with Time Series Errors

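Scenario 2: The true DGP is (1) with serially uncorrelated but heteroskedastic $\varepsilon_t$.

Consequence: $\hat\beta$ is consistent, but $\widehat{\mathrm{Cov}}(\hat\beta \mid x)$ is inconsistent.

Solution: Use the White (1980) heteroskedasticity consistent (HC) estimator for $\mathrm{Cov}(\hat\beta \mid x)$:
$$\widehat{\mathrm{Cov}}(\hat\beta \mid x)_{HC} = \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \left(\sum_{t=1}^T \hat\varepsilon_t^2 x_t x_t'\right) \left(\sum_{t=1}^T x_t x_t'\right)^{-1}.$$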
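The HC sandwich formula above could be coded roughly as follows. This is a sketch under assumptions: the function name `white_hc_cov` and the simulated heteroskedastic design (error scale proportional to $|x_t|$) are hypothetical choices made only to illustrate the formula, and `X` stores $x_t'$ as rows.

```python
import numpy as np

def white_hc_cov(X, resid):
    """White (1980) HC covariance: bread * meat * bread."""
    XtX_inv = np.linalg.inv(X.T @ X)              # (sum_t x_t x_t')^{-1}
    meat = (X * resid[:, None] ** 2).T @ X        # sum_t e_t^2 x_t x_t'
    return XtX_inv @ meat @ XtX_inv

# Hypothetical DGP: serially uncorrelated errors whose variance depends on x_t.
rng = np.random.default_rng(1)
T = 5000
x = rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])
eps = np.abs(x) * rng.standard_normal(T)
y = X @ np.array([1.0, 2.0]) + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
cov_hc = white_hc_cov(X, resid)
cov_classical = resid @ resid / (T - X.shape[1]) * np.linalg.inv(X.T @ X)
se_hc = np.sqrt(np.diag(cov_hc))
se_classical = np.sqrt(np.diag(cov_classical))
print(se_classical, se_hc)
```

In this particular design the classical standard error of the slope understates the sampling variability, so the HC standard error should come out noticeably larger.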

Regression Model with Time Series Errors
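Scenario 3: The true DGP is (1) with serially correlated and heteroskedastic $\{\varepsilon_t\}$.

Consequence: $\hat\beta$ is consistent, but $\widehat{\mathrm{Cov}}(\hat\beta \mid x)$ and $\widehat{\mathrm{Cov}}(\hat\beta \mid x)_{HC}$ are inconsistent.

Solution: Use the Newey and West (1987) heteroskedasticity and autocorrelation consistent (HAC) estimator:
$$\widehat{\mathrm{Cov}}(\hat\beta \mid x)_{HAC} = \left(\sum_{t=1}^T x_t x_t'\right)^{-1} \hat C_{HAC} \left(\sum_{t=1}^T x_t x_t'\right)^{-1},$$
$$\hat C_{HAC} = \sum_{t=1}^T \hat\varepsilon_t^2 x_t x_t' + \sum_{j=1}^{\ell} w_j \sum_{t=j+1}^T \hat\varepsilon_t \hat\varepsilon_{t-j} (x_t x_{t-j}' + x_{t-j} x_t').$$

One needs to pick the truncation parameter $\ell$ and the weights $\{w_j\}_{j=1}^{\ell}$ (e.g., $\ell = \left\lfloor 4 (T/100)^{2/9} \right\rfloor$ and $w_j = 1 - \frac{j}{\ell+1}$).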
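A minimal sketch of the HAC estimator follows, assuming Bartlett weights $w_j = 1 - j/(\ell+1)$ and the rule-of-thumb truncation $\ell = \lfloor 4(T/100)^{2/9} \rfloor$ as the default. The names `newey_west_cov` and `ar1`, and the simulated AR(1) regressor and error, are hypothetical and only illustrate the formula.

```python
import numpy as np

def newey_west_cov(X, resid, lags=None):
    """Newey-West HAC covariance with Bartlett weights (rows of X are x_t')."""
    T = X.shape[0]
    if lags is None:
        lags = int(np.floor(4 * (T / 100) ** (2 / 9)))   # rule-of-thumb truncation
    XtX_inv = np.linalg.inv(X.T @ X)
    g = X * resid[:, None]               # row t is e_t * x_t'
    C = g.T @ g                          # lag-0 term: sum_t e_t^2 x_t x_t'
    for j in range(1, lags + 1):
        w = 1 - j / (lags + 1)           # Bartlett weight w_j
        G = g[j:].T @ g[:-j]             # sum_{t=j+1}^T e_t e_{t-j} x_t x_{t-j}'
        C += w * (G + G.T)
    return XtX_inv @ C @ XtX_inv

def ar1(phi, shocks):
    """Simulate z_t = phi * z_{t-1} + shock_t, starting from z_0 = shock_0."""
    out = np.empty_like(shocks)
    out[0] = shocks[0]
    for t in range(1, len(shocks)):
        out[t] = phi * out[t - 1] + shocks[t]
    return out

# Hypothetical DGP: persistent regressor and persistent errors.
rng = np.random.default_rng(2)
T = 2000
x = ar1(0.8, rng.standard_normal(T))
eps = ar1(0.8, rng.standard_normal(T))
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
se_white = np.sqrt(np.diag(newey_west_cov(X, resid, lags=0)))  # lags=0 is White's HC
se_hac = np.sqrt(np.diag(newey_west_cov(X, resid)))
print(se_white, se_hac)
```

Setting `lags=0` drops the autocovariance terms and reduces the estimator to the HC form, which gives a simple sanity check on the implementation.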

Regression Model with Time Series Errors
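Scenario 4: The true DGP contains lagged values of $y_t$, and $\{\varepsilon_t\}$ is serially correlated. E.g.,
$$y_t = \beta y_{t-1} + \varepsilon_t,$$
$$\varepsilon_t = \theta \varepsilon_{t-1} + u_t,$$
where $\{u_t\} \sim \mathrm{wn}(0, \sigma_u^2)$.

Consequence: $\hat\beta$ is inconsistent.
To see this, first note that this model is the same as regression (1) with one regressor $x_t = y_{t-1}$ and an error $\varepsilon_t$ with $\{\varepsilon_t\} \sim AR(1)$. Now, let us compute
$$\begin{aligned}
E(x_t \varepsilon_t) &= E[y_{t-1} \varepsilon_t] \\
&= E[y_{t-1}(\theta \varepsilon_{t-1} + u_t)] \\
&= \theta E(y_{t-1} \varepsilon_{t-1}) + E[y_{t-1} u_t].
\end{aligned}$$

However, $E(y_{t-1} \varepsilon_{t-1}) = E[(\beta y_{t-2} + \varepsilon_{t-1}) \varepsilon_{t-1}] \neq 0$, and $E[y_{t-1} u_t] = 0$, so that $E(x_t \varepsilon_t) \neq 0$ whenever $\theta \neq 0$. As a result, the proof of the consistency of $\hat\beta$ under Scenario 1 breaks down.

Solution: Use MLE instead of OLS.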
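The inconsistency can be illustrated with a small simulation. This is a sketch with hypothetical parameter values ($\beta = \theta = 0.5$). Under $|\beta| < 1$ and $|\theta| < 1$, $y_t$ is a stationary AR(2) with coefficients $\beta + \theta$ and $-\beta\theta$, so the OLS slope of $y_t$ on $y_{t-1}$ should settle near the lag-1 autocorrelation $(\beta + \theta)/(1 + \beta\theta)$ rather than near $\beta$.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, theta, T = 0.5, 0.5, 200_000     # hypothetical parameter values
u = rng.standard_normal(T)

eps = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    eps[t] = theta * eps[t - 1] + u[t]     # AR(1) error
    y[t] = beta * y[t - 1] + eps[t]        # lagged dependent variable

# OLS of y_t on y_{t-1} (single regressor, no intercept)
beta_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
plim = (beta + theta) / (1 + beta * theta)  # lag-1 autocorrelation of the implied AR(2)
print(beta_hat, plim, beta)
```

Even with a very long sample, `beta_hat` stays away from the true `beta`, which is the practical meaning of inconsistency here.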

Unit Root Nonstationarity
A process is nonstationary if some of its unconditional moments vary with time. Some examples are:
- random walk (unit root / stochastic trend process): $y_t = y_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$.
- random walk with drift: $y_t = c + y_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$.
- trend-stationary time series: $y_t = a + bt + u_t$, where $u_t$ is stationary.
- ARIMA($p, d, q$): $\{\Delta^d u_t\} \sim ARMA(p, q)$, where $\Delta^d$ is the $d$th-order difference operator [1], i.e., $[1 - \phi(L)](\Delta^d u_t) = [1 + \theta(L)]\varepsilon_t$, where $\phi(\cdot)$ and $\theta(\cdot)$ are polynomial functions of orders $p$ and $q$, and $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$.

[1] For any positive integer $d \geq 1$, the $d$th-order difference is defined as $\Delta^d y_t = (1 - L)^d y_t$, where $L$ is the lag operator. E.g., $\Delta^1 y_t = \Delta y_t = (1 - L) y_t = y_t - y_{t-1}$, and $\Delta^2 y_t = (1 - L)^2 y_t = y_t - 2y_{t-1} + y_{t-2}$.
Random Walk (RW)
For $t \geq 1$, $y_t = y_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$ and $y_0$ = initial value.
Write $y_t$ in terms of the noises: $y_t = y_0 + \varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1$.
Interpretation: any previous shock $\varepsilon_{t-j}$ has a permanent effect on $y_t$.
Conditional on $\mathcal{F}_0$, the mean is $E(y_t \mid \mathcal{F}_0) = y_0$ and the variance is $\mathrm{Var}(y_t \mid \mathcal{F}_0) = t\sigma_\varepsilon^2$. The variance grows linearly with time.

Ex: Show that $\hat y_t(\ell) = E[y_{t+\ell} \mid \mathcal{F}_t] = y_t$ for all $\ell > 0$.
This shows that $\{y_t\}$ is a martingale. Interpretation: the best point forecast of a RW is given by its current value.

Ex: Show that the forecast error $\hat e_t(\ell) = y_{t+\ell} - \hat y_t(\ell)$ has variance $\ell\sigma_\varepsilon^2$, which diverges as $\ell \to \infty$ (hopeless to forecast RW in the distant future).

Ex: Show that the ACF is $\rho_j = 1$ for all integers $j$ (long memory).
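The linear growth of the variance and of the forecast-error variance can be checked numerically. The sketch below simulates many random-walk paths with $y_0 = 0$ and Gaussian white noise; the number of paths, the horizon, and the forecast origin are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, T, sigma = 20_000, 100, 1.0
eps = sigma * rng.standard_normal((n_paths, T))
y = np.cumsum(eps, axis=1)              # y_0 = 0; column t-1 holds y_t

# Cross-path variance of y_t for t = 1, ..., T; theory says t * sigma^2
var_t = y.var(axis=0)
t_grid = np.arange(1, T + 1)
print(var_t[[0, 49, 99]])

# l-step-ahead forecast error from origin t = 50: y_{t+l} - y_t
origin, ell = 50, 20
err = y[:, origin + ell - 1] - y[:, origin - 1]
print(err.var())                        # theory says ell * sigma^2
```

Because the forecast $\hat y_t(\ell) = y_t$ ignores all future shocks, the forecast error is just the sum of $\ell$ future noises, which is why its variance scales with $\ell$.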
Random Walk with Drift

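For $t \geq 1$, $y_t = c + y_{t-1} + \varepsilon_t$, where $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$ and $y_0$ = initial value.
Write $y_t$ in terms of the noises:
$$y_t = \underbrace{y_0}_{\text{initial point}} + \underbrace{ct}_{\text{deterministic trend}} + \underbrace{\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1}_{\text{stochastic trend}}.$$

$E(y_t \mid \mathcal{F}_0) = y_0 + ct$ (so $c$ = average rate of change in $y_t$ over time).

Conditional on $\mathcal{F}_0$, $y_0 + ct$ is deterministic, so $\mathrm{Var}(y_t \mid \mathcal{F}_0) = t\sigma_\varepsilon^2$ (same as RW without drift).

Ex: Show that $\hat y_t(\ell) = E[y_{t+\ell} \mid \mathcal{F}_t] = c\ell + y_t$ for all $\ell > 0$.

Ex: Show that $\mathrm{Var}[\hat e_t(\ell)] = \ell\sigma_\varepsilon^2$.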

Trend-Stationary Time Series

For $t \geq 1$, $y_t = a + bt + u_t$, where $\{u_t\}$ is stationary (e.g., an ARMA model) with mean zero and variance $\sigma_u^2$. $\{y_t\}$ has a deterministic linear trend $a + bt$ but no stochastic trend.
$E(y_t) = a + bt$ (so $b$ = average rate of change in $y_t$ over time).
As $a + bt$ is deterministic, $\mathrm{Var}(y_t) = \sigma_u^2$, which is time-invariant if it exists.

Ex: Show that $\hat y_t(\ell) = E[y_{t+\ell} \mid \mathcal{F}_t] = a + b(t + \ell)$ for all $\ell > 0$.

Ex: Show that $\mathrm{Var}[\hat e_t(\ell)] = \sigma_u^2$.

Dickey-Fuller (DF) Test
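Consider the regression
$$y_t = \phi_1 y_{t-1} + \varepsilon_t.$$

We want to test: $H_0: \phi_1 = 1$ vs $H_a: \phi_1 < 1$.

Run the regression, get the OLS estimate
$$\hat\phi_1 = \frac{\sum_{t=1}^T y_t y_{t-1}}{\sum_{t=1}^T y_{t-1}^2},$$
and obtain the residual variance
$$\hat\sigma_\varepsilon^2 = \frac{1}{T-1}\sum_{t=1}^T (y_t - \hat\phi_1 y_{t-1})^2.$$
The standard error of $\hat\phi_1$ is
$$\mathrm{s.e.}(\hat\phi_1) = \sqrt{\frac{\hat\sigma_\varepsilon^2}{\sum_{t=1}^T y_{t-1}^2}}.$$

The DF test statistic is the $t$ ratio of $\hat\phi_1$ under $H_0$:
$$DF = \frac{\hat\phi_1 - 1}{\mathrm{s.e.}(\hat\phi_1)} = \frac{\sum_{t=1}^T \varepsilon_t y_{t-1}}{\hat\sigma_\varepsilon \sqrt{\sum_{t=1}^T y_{t-1}^2}}.$$

Under $H_0$, DF converges to a nonstandard distribution (a functional of standard Brownian motion) as $T \to \infty$, so critical values must be obtained by simulation.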
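A rough sketch of how the DF statistic and its simulated critical values might be computed is given below. The function name `df_stat`, the sample size, and the number of replications are hypothetical; the regression has no intercept, and each simulated path is a Gaussian random walk started at $y_0 = 0$.

```python
import numpy as np

def df_stat(y):
    """DF t-ratio for y_t = phi_1 * y_{t-1} + eps_t (no intercept).

    y holds y_0, y_1, ..., y_T.
    """
    y_lag, y_cur = y[:-1], y[1:]
    T = len(y_cur)
    phi_hat = (y_cur @ y_lag) / (y_lag @ y_lag)
    resid = y_cur - phi_hat * y_lag
    sigma2_hat = resid @ resid / (T - 1)
    se = np.sqrt(sigma2_hat / (y_lag @ y_lag))
    return (phi_hat - 1.0) / se

# Monte Carlo under H0: simulate random walks and collect the statistic.
rng = np.random.default_rng(5)
n_rep, T = 5000, 200
stats = np.empty(n_rep)
for r in range(n_rep):
    y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(T))])
    stats[r] = df_stat(y)

crit = np.quantile(stats, [0.01, 0.05, 0.10])   # left-tail critical values
print(crit)
```

The simulated 5% quantile should land well to the left of the standard normal value of -1.645, which is why normal critical values cannot be used for this test.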
Dickey-Fuller Test

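$$DF = \frac{\sum_{t=1}^T \varepsilon_t y_{t-1}}{\hat\sigma_\varepsilon \sqrt{\sum_{t=1}^T y_{t-1}^2}}.$$
Q: What is the asymptotic distribution of DF under $H_0$?
Sketch of proof: Let $W(t)$ be the standard Brownian motion (in continuous time).
As $T \to \infty$, by the strong law of large numbers,
- $\hat\sigma_\varepsilon^2 \xrightarrow{a.s.} \sigma_\varepsilon^2$.
Also, applying the functional central limit theorem,
- $\frac{1}{T}\sum_{t=1}^T y_{t-1} \varepsilon_t \xrightarrow{d} \frac{\sigma_\varepsilon^2}{2}[W(1)^2 - 1]$,
- $\frac{1}{T^2}\sum_{t=1}^T y_{t-1}^2 \xrightarrow{d} \sigma_\varepsilon^2 \int_0^1 W(s)^2 \, ds$.

Combining the limits using Slutsky's theorem, we obtain
$$DF \xrightarrow{d} \frac{\frac{1}{2}[W(1)^2 - 1]}{\sqrt{\int_0^1 W(s)^2 \, ds}}.$$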

Augmented Dickey-Fuller (ADF) Test
We augment the regression model with a deterministic intercept $c_t$ and $p - 1$ lagged differenced series $\Delta y_{t-1}, \ldots, \Delta y_{t-p+1}$:
$$y_t = c_t + \beta y_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta y_{t-i} + \varepsilon_t.$$

We want to test: $H_0: \beta = 1$ vs $H_a: \beta < 1$.

The ADF test statistic is the $t$ ratio of $\hat\beta$ (OLS estimate of $\beta$):
$$ADF = \frac{\hat\beta - 1}{\mathrm{s.e.}(\hat\beta)}.$$
Under $H_0$, ADF converges to a different nonstandard distribution as $T \to \infty$ (critical values must be obtained by simulation).
Equivalently, we may run the error-correction regression of $\Delta y_t$:
$$\Delta y_t = c_t + \beta_c y_{t-1} + \sum_{i=1}^{p-1} \phi_i \Delta y_{t-i} + \varepsilon_t.$$

Note that $\beta_c = \beta - 1$.
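The error-correction form of the ADF regression could be set up as in the sketch below. It assumes the deterministic term is a constant ($c_t = c$), which is only one possible choice. The function name `adf_stat` and the test series are hypothetical. The returned statistic must be compared with simulated ADF critical values, not normal ones.

```python
import numpy as np

def adf_stat(y, p=2):
    """t-ratio of beta_c in
    Delta y_t = c + beta_c * y_{t-1} + sum_{i=1}^{p-1} phi_i * Delta y_{t-i} + eps_t.
    Returns (t-ratio, beta_c estimate).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                               # dy[m] = y[m+1] - y[m]
    n_lags = p - 1
    rows = len(dy) - n_lags
    target = dy[n_lags:]                          # Delta y_t
    cols = [np.ones(rows), y[n_lags:-1]]          # constant and y_{t-1}
    for i in range(1, n_lags + 1):
        cols.append(dy[n_lags - i:len(dy) - i])   # Delta y_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (rows - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1]), coef[1]

# Hypothetical check on a stationary AR(1) with phi = 0.5, so beta_c = -0.5.
rng = np.random.default_rng(6)
T = 1000
shocks = rng.standard_normal(T)
y_stat = np.zeros(T)
for t in range(1, T):
    y_stat[t] = 0.5 * y_stat[t - 1] + shocks[t]

t_ratio, beta_c_hat = adf_stat(y_stat, p=2)
print(t_ratio, beta_c_hat)
```

Working with $\beta_c$ means the null hypothesis becomes $\beta_c = 0$, so the reported $t$ ratio of the $y_{t-1}$ coefficient is the ADF statistic directly.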
Spurious Regression
Suppose $\{y_t\}$ and $\{x_t\}$ each contain a unit root.

Q: After running regression (1), we may detect a unit root in the residuals (e.g., as revealed by the ADF test), and find a statistically significant $\hat\beta$. Is the inference on $\hat\beta$ reliable?

A: $\hat\beta$ can be spuriously significant, and $R^2$ spuriously high. This is known as spurious regression. Possible remedies:
- Take the first-order difference of $\{y_t\}$ and $\{x_t\}$, and run the regression $\Delta y_t = \alpha + \beta \Delta x_t + \varepsilon_t$. Check for serial correlation in the residuals by looking at their ACF. Add lags of $\Delta y_t$, $\Delta x_t$ and $\varepsilon_t$ if necessary. The OLS estimates $\hat\alpha$ and $\hat\beta$ are $\sqrt{T}$-consistent and asymptotically normal as $T \to \infty$.
- Add lagged values of $y_t$ and $x_t$ to regression (1). However, the asymptotic distributions of $\hat\alpha$ and $\hat\beta$ are non-standard.
- Apply the Cochrane-Orcutt adjustment.
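Spurious regression is easy to reproduce by simulation. The sketch below regresses one random walk on another, independent random walk and records how often a nominal 5% $t$ test on the slope rejects, first in levels and then in first differences. The helper name `slope_tstat` and the simulation sizes are hypothetical.

```python
import numpy as np

def slope_tstat(x, y):
    """Classical OLS t-ratio of the slope in y = a + b * x + e."""
    X = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(7)
n_rep, T = 2000, 200
rej_levels = 0
rej_diffs = 0
for _ in range(n_rep):
    x = np.cumsum(rng.standard_normal(T))     # two independent random walks
    y = np.cumsum(rng.standard_normal(T))
    rej_levels += int(abs(slope_tstat(x, y)) > 1.96)
    rej_diffs += int(abs(slope_tstat(np.diff(x), np.diff(y))) > 1.96)

rate_levels = rej_levels / n_rep
rate_diffs = rej_diffs / n_rep
print(rate_levels, rate_diffs)
```

The levels regression should reject far more often than 5% even though the two series are unrelated, while the regression in first differences should reject at roughly the nominal rate.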

Cointegration
Suppose $\{y_t\}$ and $\{x_t\}$ each contain a unit root.

Q: After running regression (1), we find that the residuals are stationary. What does $\hat\beta$ represent?

A: In this case, $\{y_t\}$ and $\{x_t\}$ are cointegrated. We say that the $\{(y_t, x_t)\}$ pair displays a cointegrating relationship given by (1) with cointegrating vector $(1, \beta)$. As for inference, the OLS estimate $\hat\beta$ is super-consistent ($T$-consistent), and the asymptotic distribution is non-standard.
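Super-consistency can be seen in a small experiment. The sketch below assumes a hypothetical cointegrated pair, $y_t = 2x_t + \varepsilon_t$ with $x_t$ a random walk and $\varepsilon_t$ white noise independent of $x_t$, and compares the average absolute estimation error at two sample sizes. Under $T$-consistency, multiplying $T$ by 16 should shrink the error by a factor of about 16, whereas $\sqrt{T}$-consistency would give a factor of about 4.

```python
import numpy as np

rng = np.random.default_rng(8)
beta = 2.0   # hypothetical cointegrating coefficient

def slope_error(T):
    """Error of the OLS slope in regression (1) with k = 1 and no intercept."""
    x = np.cumsum(rng.standard_normal(T))      # unit-root regressor
    y = beta * x + rng.standard_normal(T)      # stationary deviation from beta * x
    return (x @ y) / (x @ x) - beta

n_rep = 1000
mean_abs_err = {
    T: np.mean([abs(slope_error(T)) for _ in range(n_rep)])
    for T in (100, 1600)
}
ratio = mean_abs_err[100] / mean_abs_err[1600]
print(mean_abs_err, ratio)
```

The design is deliberately simple; with serially correlated or endogenous errors the estimator remains super-consistent but its limiting distribution is harder to work with.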

Seasonal Models
Time series may exhibit cyclical patterns (e.g., weekly pattern for daily series, monthly pattern for weekly series).

Q: Suppose $\{y_t\}$ has a cyclical pattern of periodicity $s$. How do we carry out the analysis?

A: If $\{y_t\}$ is stationary, we may remove the cyclicity by applying the seasonal adjustment:
$$\Delta_s y_t = (1 - L^s) y_t = y_t - y_{t-s}.$$

If $\{y_t\}$ has a unit root, we may apply the seasonal adjustment to the first-differenced series:
$$\begin{aligned}
\Delta_s(\Delta y_t) = (1 - L^s)(1 - L) y_t &= (y_t - y_{t-1}) - (y_{t-s} - y_{t-s-1}) \\
&= y_t - y_{t-1} - y_{t-s} + y_{t-s-1}.
\end{aligned}$$
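The two differencing operators translate directly into array slicing. In the sketch below, the function name `seasonal_diff` and the example series (a linear trend plus a fixed pattern with period $s = 4$) are hypothetical.

```python
import numpy as np

def seasonal_diff(y, s):
    """Apply (1 - L^s): returns y_t - y_{t-s} for every t with a valid lag."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]

s = 4
t = np.arange(40)
pattern = np.array([3.0, -1.0, 0.5, -2.5])
y = 0.2 * t + pattern[t % s]            # linear trend plus a repeating pattern

d_s = seasonal_diff(y, s)               # (1 - L^s) y_t
w = seasonal_diff(np.diff(y), s)        # (1 - L^s)(1 - L) y_t

# Same result from the expanded form y_t - y_{t-1} - y_{t-s} + y_{t-s-1}
w_expanded = y[s + 1:] - y[s:-1] - y[1:-s] + y[:-s - 1]
print(d_s[:3], w[:3])
```

For this purely deterministic example, seasonal differencing removes the repeating pattern and leaves the constant trend increment over $s$ periods, and the additional first difference removes that constant as well.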

Seasonal Models

Multiplicative seasonal model:
$$w_t := (1 - L^s)(1 - L) y_t = (1 - \theta L)(1 - \lambda L^s) u_t. \tag{3}$$

Ex: What is the ACF of $\{w_t\}$?

If $\lambda = 1$, then the seasonal factor $(1 - L^s)$ appears on both sides of (3). This suggests that the seasonal pattern is deterministic. Exact-likelihood estimation can reveal this and is recommended.
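One way to explore the ACF exercise numerically is to expand the MA polynomial on the right-hand side of (3) and compute autocorrelations from its coefficients. The sketch assumes $\{u_t\}$ is white noise and $s > 2$; the helper name `ma_acf` and the values of $\theta$, $\lambda$, and $s$ are hypothetical.

```python
import numpy as np

def ma_acf(coeffs, max_lag):
    """ACF of w_t = sum_k coeffs[k] * u_{t-k}, with u_t white noise."""
    c = np.asarray(coeffs, dtype=float)
    gamma = np.array([
        c[:len(c) - j] @ c[j:] if j < len(c) else 0.0   # autocovariance / sigma_u^2
        for j in range(max_lag + 1)
    ])
    return gamma / gamma[0]

theta, lam, s = 0.4, 0.6, 12
ma_regular = np.array([1.0, -theta])          # 1 - theta * L
ma_seasonal = np.zeros(s + 1)
ma_seasonal[0], ma_seasonal[s] = 1.0, -lam    # 1 - lam * L^s
coeffs = np.convolve(ma_regular, ma_seasonal) # coefficients of the product polynomial

rho = ma_acf(coeffs, s + 2)
print({j: round(rho[j], 4) for j in (1, s - 1, s, s + 1)})
```

Only lags $1$, $s-1$, $s$, and $s+1$ are nonzero: $\rho_1 = -\theta/(1+\theta^2)$, $\rho_s = -\lambda/(1+\lambda^2)$, and $\rho_{s-1} = \rho_{s+1} = \rho_1 \rho_s$.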

Long Memory / Fractionally Differenced Model
Let $\{\varepsilon_t\} \sim \mathrm{wn}(0, \sigma_\varepsilon^2)$, and suppose that $\phi(\cdot)$ and $\theta(\cdot)$ are polynomial functions of orders $p$ and $q$. We say that $y_t$ follows an autoregressive fractionally integrated moving-average model, ARFIMA($p, d, q$), if
$$[1 - \phi(L)]\Delta^d y_t = [1 + \theta(L)]\varepsilon_t.$$

- $d \in (-0.5, 0)$: long-range negative dependence, with ACF $\rho_j \propto j^{2d-1}$ (hyperbolic decay) as $j \to \infty$. [2]
- $d = 0$: e.g., for AR(1), $\rho_j = \phi_1^{|j|}$ (exponential decay).
- $d \in (0, 0.5)$: long-range positive dependence, $\rho_j \propto j^{2d-1}$ (hyperbolic decay) as $j \to \infty$.
- $d \in [0.5, 1)$: mean-reverting, non-stationary process.
- $d = 1$: martingale, unit root process, $\rho_j = 1$ for all integers $j$.

[2] "$\propto$" means "is proportional to".
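For non-integer $d$, the operator $\Delta^d = (1 - L)^d$ can be expanded as $\sum_{k \geq 0} \pi_k L^k$ with $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k - 1 - d)/k$, which follows from the binomial series. The sketch below computes these weights and applies a truncated filter. The function names are hypothetical, and pre-sample values are treated as zero, which is a simplifying assumption.

```python
import numpy as np

def frac_diff_weights(d, n):
    """First n coefficients pi_k of (1 - L)^d = sum_k pi_k L^k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k     # binomial-series recursion
    return w

def frac_diff(y, d):
    """Apply the truncated filter: out_t = sum_{k=0}^{t} pi_k * y_{t-k}."""
    y = np.asarray(y, dtype=float)
    w = frac_diff_weights(d, len(y))
    return np.convolve(y, w)[:len(y)]         # pre-sample values treated as zero

print(frac_diff_weights(1.0, 5))    # ordinary first difference: 1, -1, then zeros
print(frac_diff_weights(0.4, 5))    # slowly decaying weights for fractional d

rng = np.random.default_rng(9)
y = np.cumsum(rng.standard_normal(50))
dy = frac_diff(y, 1.0)
```

With $d = 1$ the weights collapse to the usual first difference, while for fractional $d$ they decay slowly, which is the source of the long memory in the model.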

