
Lecture Notes (2)

Univariate time series modelling and forecasting

Introduction: Univariate Time Series Models

• Univariate time series models are a class of specifications where one attempts to
model and to predict financial variables using only information contained in
their own past values and possibly current and past values of an error term.

• This practice can be contrasted with structural models, which are multivariate in
nature, and attempt to explain changes in a variable by reference to the
movements in the current or past values of other (explanatory) variables. Time
series models may be useful when a structural model is inappropriate.

• An important class of time series models is the family of AutoRegressive
Integrated Moving Average (ARIMA) models, usually associated with Box and
Jenkins (1976).

• In order to define, estimate and use ARIMA models, we first need to specify the
notation and to define several important concepts.

2
Some Notation and Concepts

• A Strictly Stationary Process


A strictly stationary process is one where, for any t1, t2, ..., tn and any m,

P(yt1 ≤ b1, ..., ytn ≤ bn) = P(yt1+m ≤ b1, ..., ytn+m ≤ bn)

i.e. the probability measure for the sequence {yt} is the same as that for {yt+m}
∀ m.

• A Weakly Stationary Process


If a series satisfies the next three equations, it is said to be weakly or covariance
stationary
1. E(yt) = μ ,  t = 1, 2, ..., ∞
2. E(yt - μ)(yt - μ) = σ² < ∞
3. E(yt1 - μ)(yt2 - μ) = γ(t2 - t1) ,  ∀ t1, t2

Some Notation and Concepts(cont’d)

• So if the process is covariance stationary, all the variances are the same and all
the covariances depend only on the difference between t1 and t2. The moments
E(yt - E(yt))(yt-s - E(yt-s)) = γs ,  s = 0, 1, 2, ...
are known as the covariance function.
• The covariances, γs, are known as autocovariances since they are the
covariances of y with its own previous values.
• However, the value of the autocovariances depends on the units of measurement
of yt. It is thus more convenient to use the autocorrelations, which are the
autocovariances normalised by dividing by the variance:
τs = γs / γ0 ,  s = 0, 1, 2, ...

• If we plot τs against s = 0, 1, 2, ... then we obtain the autocorrelation function
(ACF) or correlogram.

4
Correlogram for UK HP Data

A White Noise Process

• A white noise process is one with (virtually) no discernible structure. A
definition of a white noise process is

E(yt) = μ
Var(yt) = σ²
γ(t-r) = σ² if t = r, and 0 otherwise

• Thus the autocorrelation function will be zero apart from a single peak of 1
at s = 0. For a white noise series, the sample autocorrelation coefficient τ̂s is
approximately N(0, 1/T) where T = sample size.

• We can use this to do significance tests for the autocorrelation coefficients
by constructing a confidence interval.

• For example, a 95% confidence interval would be given by ±1.96 × 1/√T.

If the sample autocorrelation coefficient, τ̂s, falls outside this region for
any value of s, then we reject the null hypothesis that the true value of the
coefficient at lag s is zero.

6
Joint Hypothesis Tests

• We can also test the joint hypothesis that all m of the τk correlation coefficients
are simultaneously equal to zero using the Q-statistic developed by Box and
Pierce:

Q = T Σ(k=1..m) τ̂k²

where T = sample size, m = maximum lag length

• The Q-statistic is asymptotically distributed as a χ²m under the null hypothesis.

• However, the Box-Pierce test has poor small sample properties, so a variant
has been developed, called the Ljung-Box statistic:

Q* = T(T + 2) Σ(k=1..m) τ̂k² / (T - k)  ~  χ²m
• This statistic is very useful as a portmanteau (general) test of linear dependence
in time series.

An ACF Example

• Question:
Suppose that a researcher had estimated the first 5 autocorrelation coefficients
using a series of length 100 observations, and found them to be (from 1 to 5):
0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficients for significance, and use both the Box-
Pierce and Ljung-Box tests to establish whether they are jointly significant.

• Solution:
A coefficient would be significant if it lies outside (-0.196, +0.196) at the 5%
level (since ±1.96 × 1/√100 = ±0.196), so only the first autocorrelation
coefficient is significant.
Q = 5.09 and Q* = 5.26
Compared with a tabulated χ²(5) = 11.1 at the 5% level, the 5 coefficients
are jointly insignificant.
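
The calculation above can be reproduced numerically. A minimal sketch in Python (assuming only numpy and scipy are available; the autocorrelations and sample size are those given in the question):

```python
import numpy as np
from scipy.stats import chi2

T = 100                                                  # sample size
tau = np.array([0.207, -0.013, 0.086, 0.005, -0.022])    # estimated acf, lags 1 to 5
m = len(tau)

# Individual significance: 95% band under the white noise null
band = 1.96 / np.sqrt(T)
print("95% band: +/-", round(band, 3))                   # +/- 0.196
print("significant lags:", np.where(np.abs(tau) > band)[0] + 1)

# Box-Pierce and Ljung-Box joint tests
Q = T * np.sum(tau**2)
Q_star = T * (T + 2) * np.sum(tau**2 / (T - np.arange(1, m + 1)))
crit = chi2.ppf(0.95, df=m)
print(round(Q, 2), round(Q_star, 2), round(crit, 1))     # 5.09, 5.26, 11.1
```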

8
Moving Average Processes

• The simplest class of time series model is that of the moving average process
• Let ut (t = 1, 2, 3, ...) be a sequence of independently and identically distributed
(iid) random variables with E(ut) = 0 and Var(ut) = σ², then

yt = μ + ut + θ1ut-1 + θ2ut-2 + ... + θqut-q

is a qth order moving average model, MA(q).

• Its properties are

E(yt) = μ;  Var(yt) = γ0 = (1 + θ1² + θ2² + ... + θq²)σ²

Covariances: γs = (θs + θs+1θ1 + θs+2θ2 + ... + θqθq-s)σ² for s = 1, 2, ..., q
             γs = 0 for s > q

Lag Operator Notation for MA(q)

• Using the lag operator (backshift operator) notation:


Lyt = yt-1   and   L^i yt = yt-i

• yt = μ + ut + θ1ut-1 + θ2ut-2 + ... + θqut-q = μ + θ(L)ut

where θ(L) = 1 + θ1L + θ2L² + ... + θqL^q

10
Example of an MA Problem

Consider the following MA(2) process:

Xt = ut + θ1ut-1 + θ2ut-2

where ut is a zero mean white noise process with variance σ².

(i) Calculate the mean and variance of Xt
(ii) Derive the autocorrelation function for this process (i.e. express the
autocorrelations, τ1, τ2, ... as functions of the parameters θ1 and θ2).
(iii) If θ1 = -0.5 and θ2 = 0.25, sketch the acf of Xt.

11

Solution

(i) If E(ut) = 0, then E(ut-i) = 0 ∀ i.

So

E(Xt) = E(ut + θ1ut-1 + θ2ut-2) = E(ut) + θ1E(ut-1) + θ2E(ut-2) = 0

Var(Xt) = E[Xt - E(Xt)][Xt - E(Xt)]
= E[(Xt)(Xt)]   (since E(Xt) = 0)
= E[(ut + θ1ut-1 + θ2ut-2)(ut + θ1ut-1 + θ2ut-2)]
= E[ut² + θ1²ut-1² + θ2²ut-2² + cross-products]

E[cross-products] = 0 since Cov(ut, ut-s) = 0 for s ≠ 0.

12
Solution (cont’d)

So Var(Xt) = 0= E [ ]
=
=

(ii) The acf of Xt.

γ1 = E[Xt - E(Xt)][Xt-1 - E(Xt-1)]
= E[Xt][Xt-1]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-1 + θ1ut-2 + θ2ut-3)]
= E[θ1ut-1² + θ1θ2ut-2²]
= θ1σ² + θ1θ2σ²
= (θ1 + θ1θ2)σ²

13

Solution (cont’d)

γ2 = E[Xt - E(Xt)][Xt-2 - E(Xt-2)]
= E[Xt][Xt-2]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-2 + θ1ut-3 + θ2ut-4)]
= E[θ2ut-2²]
= θ2σ²

γ3 = E[Xt - E(Xt)][Xt-3 - E(Xt-3)]
= E[Xt][Xt-3]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-3 + θ1ut-4 + θ2ut-5)]
= 0

So γs = 0 for s > 2.

14
Solution (cont’d)

We have the autocovariances, now calculate the autocorrelations:

τ0 = γ0/γ0 = 1
τ1 = γ1/γ0 = (θ1 + θ1θ2) / (1 + θ1² + θ2²)
τ2 = γ2/γ0 = θ2 / (1 + θ1² + θ2²)
τs = 0 for s > 2

(iii) For θ1 = -0.5 and θ2 = 0.25, substituting these into the formulae above
gives τ1 = -0.476, τ2 = 0.190.

15

ACF Plot

Thus the ACF plot will appear as follows:

16
Autoregressive Processes

• An autoregressive model of order p, an AR(p), can be expressed as

yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + ut

• Or using the lag operator notation:

Lyt = yt-1   and   L^i yt = yt-i

• or  yt = μ + Σ(i=1..p) φi L^i yt + ut

or  φ(L)yt = μ + ut , where φ(L) = 1 - φ1L - φ2L² - ... - φpL^p .

17

The Stationary Condition for an AR Model

• The condition for stationarity of a general AR(p) model is that the
roots of 1 - φ1z - φ2z² - ... - φpz^p = 0 all lie outside the unit circle (refer to
Box 6.1 on page 260).

• Example 1: Is yt = yt-1 + ut stationary?

The characteristic equation is 1 - z = 0, so the characteristic root is 1, i.e. it is
a unit root process (so non-stationary).

• Example 2: Is yt = 3yt-1 - 2.75yt-2 + 0.75yt-3 + ut stationary?

The characteristic equation is 1 - 3z + 2.75z² - 0.75z³ = 0, with roots 1, 2/3,
and 2. Since not all of these lie outside the unit circle (only one does), the
process is non-stationary.

(Refer to Page 261)
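
The roots in Example 2 can be verified numerically. A minimal sketch (numpy only; note that numpy.roots expects the coefficients ordered from the highest power down):

```python
import numpy as np

# Characteristic equation for y_t = 3y_{t-1} - 2.75y_{t-2} + 0.75y_{t-3} + u_t:
#   1 - 3z + 2.75z^2 - 0.75z^3 = 0
coeffs = [-0.75, 2.75, -3.0, 1.0]          # highest power first
roots = np.roots(coeffs)
print(np.sort(np.abs(roots)))              # ≈ [0.667, 1.0, 2.0]

# Stationarity requires ALL roots to lie strictly outside the unit circle
print("stationary:", bool(np.all(np.abs(roots) > 1)))   # False
```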

18
Wold’s Decomposition Theorem

• States that any stationary series can be decomposed into the sum of two
unrelated processes, a purely deterministic part and a purely stochastic
part, which will be an MA(∞).

• For the AR(p) model, φ(L)yt = ut, ignoring the intercept, the Wold
decomposition is

yt = ψ(L)ut

where ψ(L) = φ(L)⁻¹ = (1 - φ1L - φ2L² - ... - φpL^p)⁻¹

• Refer to pages 262 ~ 263

19

The Moments of an Autoregressive Process

• The moments of an autoregressive process are as follows. The mean is
given by

E(yt) = μ / (1 - φ1 - φ2 - ... - φp)

• The autocovariances and autocorrelation functions can be obtained by
solving what are known as the Yule-Walker equations:

τs = φ1τs-1 + φ2τs-2 + ... + φpτs-p ,  s = 1, 2, ...

• If the AR model is stationary, the autocorrelation function will decay
exponentially to zero.

20
Sample AR Problem

• Consider the following simple AR(1) model

yt = μ + φ1yt-1 + ut

(i) Calculate the (unconditional) mean of yt.

For the remainder of the question, set μ = 0 for simplicity.

(ii) Calculate the (unconditional) variance of yt.

(iii) Derive the autocorrelation function for yt.

21

Solution

(i) Unconditional mean:


E(yt) = E(+1yt-1)
= +1E(yt-1)
But also

So E(yt)=  +1 ( +1E(yt-2))


=  +1  +12 E(yt-2))

E(yt) =  +1  +12 E(yt-2))


=  +1  +12 ( +1E(yt-3))
=  +1  +12  +13 E(yt-3)

22
Solution (cont’d)

An infinite number of such substitutions would give


E(yt) =  (1+1+12 +...) + 1y0
So long as the model is stationary, i.e. , then 1 = 0.

So E(yt) =  (1+1+12 +...) =

(ii) Calculating the variance of yt:

From Wold’s decomposition theorem:

23

Solution (cont’d)

So long as |φ1| < 1, this will converge.

Var(yt) = E[yt - E(yt)][yt - E(yt)]
but E(yt) = 0, since we are setting μ = 0.

Var(yt) = E[(yt)(yt)]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut + φ1ut-1 + φ1²ut-2 + ...)]
= E[ut² + φ1²ut-1² + φ1⁴ut-2² + ... + cross-products]
= σ² + φ1²σ² + φ1⁴σ² + ...
= σ²(1 + φ1² + φ1⁴ + ...)
= σ² / (1 - φ1²)

24
Solution (cont’d)

(iii) Turning now to calculating the acf, first calculate the autocovariances:
γ1 = Cov(yt, yt-1) = E[yt - E(yt)][yt-1 - E(yt-1)]
Since the intercept has been set to zero, E(yt) = 0 and E(yt-1) = 0, so
γ1 = E[ytyt-1]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-1 + φ1ut-2 + φ1²ut-3 + ...)]
= E[φ1ut-1² + φ1³ut-2² + ... + cross-products]
= φ1σ² + φ1³σ² + ...
= φ1σ² / (1 - φ1²)

25

Solution (cont’d)

For the second autocorrelation coefficient,


γ2 = Cov(yt, yt-2) = E[yt - E(yt)][yt-2 - E(yt-2)]
Using the same rules as applied above for the lag 1 covariance
γ2 = E[ytyt-2]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-2 + φ1ut-3 + φ1²ut-4 + ...)]
= E[φ1²ut-2² + φ1⁴ut-3² + ... + cross-products]
= φ1²σ²(1 + φ1² + φ1⁴ + ...)
= φ1²σ² / (1 - φ1²)

26
Solution (cont’d)

• If these steps were repeated for 3, the following expression would be
obtained

3 =

and for any lag s, the autocovariance would be given by

s =

The acf can now be obtained by dividing the covariances by the


variance:

27

Solution (cont’d)

0 =

1 = 2 =

3 =

s =

• Note that use of the Yule-Walker Equations would have given the same answer.
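
The result τs = φ1^s can be illustrated by simulation. A minimal sketch (numpy only; the value φ1 = 0.8 is an arbitrary illustrative choice, not taken from the notes):

```python
import numpy as np

phi1, T = 0.8, 100_000
rng = np.random.default_rng(1)
u = rng.standard_normal(T)

# Simulate y_t = phi1 * y_{t-1} + u_t  (intercept set to zero, as in the example)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi1 * y[t - 1] + u[t]

# Compare sample autocorrelations with the theoretical values phi1**s
for s in range(1, 6):
    sample_acf = np.corrcoef(y[s:], y[:-s])[0, 1]
    print(s, round(sample_acf, 3), round(phi1**s, 3))
```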

28
The Partial Autocorrelation Function (denoted kk)

• Measures the correlation between an observation k periods ago and the
current observation, after controlling for observations at intermediate lags
(i.e. all lags < k).

• So kk measures the correlation between yt and yt-k after removing the effects
of yt-k+1 , yt-k+2 , …, yt-1 .

• At lag 1, the acf = pacf always

• At lag 2, 22 = (2-12) / (1-12)

• For lags 3+, the formulae are more complex.
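
A minimal numerical sketch of the lag-2 formula (numpy not strictly needed; the acf values used are those of an AR(1) with an illustrative φ1 = 0.5, so τ22 should come out as zero):

```python
phi1 = 0.5
tau1, tau2 = phi1, phi1**2                 # theoretical acf of an AR(1) at lags 1 and 2

# Partial autocorrelation at lag 2
tau22 = (tau2 - tau1**2) / (1 - tau1**2)
print(tau22)                               # 0.0: no direct connection beyond lag 1 for an AR(1)
```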

29

The Partial Autocorrelation Function (denoted kk)


(cont’d)

• The pacf is useful for telling the difference between an AR process and an
ARMA process.

• In the case of an AR(p), there are direct connections between yt and yt-s only
for s ≤ p.

• So for an AR(p), the theoretical pacf will be zero after lag p.

• In the case of an MA(q), this can be written as an AR(), if MA(q) process is


invertible (refer to page 267 “The invertibility condition”). So there are direct
connections between yt and all its previous values.

• For an MA(q), the theoretical pacf will be geometrically declining.

30
ARMA Processes

• By combining the AR(p) and MA(q) models, we can obtain an ARMA(p, q)
model:

φ(L)yt = μ + θ(L)ut

where φ(L) = 1 - φ1L - φ2L² - ... - φpL^p

and θ(L) = 1 + θ1L + θ2L² + ... + θqL^q

or yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + θ1ut-1 + θ2ut-2 + ... + θqut-q + ut

with E(ut) = 0; E(ut²) = σ²; E(ut us) = 0, t ≠ s

31

Summary of the Behaviour of the acf and pacf
for AR, MA and ARMA Processes
• The mean of an ARMA(p, q) series is given by E(yt) = μ / (1 - φ1 - φ2 - ... - φp)

• An autoregressive process AR(p) has


– a geometrically decaying acf
– number of spikes of pacf = p =AR order

• A moving average process MA(q) has


– Number of spikes of acf = q = MA order
– a geometrically decaying pacf

• A combination autoregressive moving average process ARMA(p, q) has


– a geometrically decaying acf
– a geometrically decaying pacf

32
Some sample acf and pacf plots
for standard processes
The acf and pacf are not produced analytically from the relevant formulae for a model of that
type, but rather are estimated using 100,000 simulated observations with disturbances drawn
from a normal distribution.
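
Plots of this kind can be reproduced with simulated data. A hedged sketch using statsmodels (assuming statsmodels and matplotlib are installed; the MA(1) parameter matches the first caption below, and 100,000 observations are drawn as stated above):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Simulate 100,000 observations from the MA(1) model y_t = -0.5*u_{t-1} + u_t
rng = np.random.default_rng(42)
u = rng.standard_normal(100_001)
y = u[1:] - 0.5 * u[:-1]

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=20, ax=axes[0])        # acf: single negative spike at lag 1
plot_pacf(y, lags=20, ax=axes[1])       # pacf: geometrically declining
plt.tight_layout()
plt.show()
```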
ACF and PACF for an MA(1) Model: yt = – 0.5ut-1 + ut

33

ACF and PACF for an MA(2) Model:
yt = 0.5ut-1 - 0.25ut-2 + ut

34
ACF and PACF for a slowly decaying AR(1) Model:
yt = 0.9yt-1 + ut

35

ACF and PACF for a more rapidly decaying AR(1)
Model: yt = 0.5yt-1 + ut

36
ACF and PACF for a more rapidly decaying AR(1)
Model with Negative Coefficient: yt = -0.5yt-1 + ut

37

ACF and PACF for a Non-stationary Model
(i.e. a unit coefficient): yt = yt-1 + ut

38
ACF and PACF for an ARMA(1,1):
yt = 0.5yt-1 + 0.5ut-1 + ut

39

Building ARMA Models
- The Box Jenkins Approach

• Box and Jenkins (1970) were the first to approach the task of estimating an
ARMA model in a systematic manner. Their approach was a practical and
pragmatic one, involving 3 steps:
Step 1. Identification

Step 2. Estimation

Step 3. Model diagnostic checking

Step 1:
- Involves determining the order of the model.
- Use of graphical procedures (e.g. plotting the acf and pacf)

40
Building ARMA Models
- The Box Jenkins Approach (cont’d)

Step 2:
- Estimation of the parameters of the model specified in step 1.
- Can be done using least squares or maximum likelihood depending
on the model.

Step 3:
- Model checking – i.e. determining whether the model specified and
estimated is adequate.

Box and Jenkins suggest 2 methods:


- deliberate overfitting
- residual diagnostics
(Refer to page 274)

41

Some More Recent Developments in ARMA Modelling

• Identification would typically not be done using graphical plots of the acf
and pacf.

• We want to form a parsimonious model.

• Reasons:
- variance of estimators is inversely proportional to the number of degrees of
freedom.
- models which are profligate might be inclined to fit to data-specific (sample-specific) features

• This gives motivation for using information criteria, which embody 2 factors
- a term which is a function of the RSS
- some penalty for adding extra parameters

• The object is to choose the number of parameters which minimises the
information criterion.

42
Information Criteria for Model Selection

• The information criteria vary according to how stiff the penalty term is.
• The three most popular criteria are Akaike’s (1974) information criterion
(AIC), Schwarz’s (1978) Bayesian information criterion (SBIC), and the
Hannan-Quinn criterion (HQIC).

AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + (k/T) ln T
HQIC = ln(σ̂²) + (2k/T) ln(ln T)

where k = p + q + 1 is the number of parameters estimated, σ̂² is the residual
variance, and T = sample size. So we minimise the IC subject to upper bounds
on the lag lengths, p ≤ p̄ and q ≤ q̄ (a small computational sketch follows below).


SBIC embodies a stiffer penalty term than AIC.
• Which IC should be preferred if they suggest different model orders?
– SBIC is strongly consistent (but inefficient).
– AIC is not consistent, and will typically pick “bigger” models.
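
A minimal sketch of the three criteria as defined above (numpy only; `resid` is assumed to be the vector of residuals from a fitted ARMA(p, q) model):

```python
import numpy as np

def information_criteria(resid, p, q):
    """AIC, SBIC and HQIC as defined above, for an ARMA(p, q) fit."""
    T = len(resid)
    k = p + q + 1                       # number of estimated parameters
    sigma2 = np.sum(resid**2) / T       # residual variance
    aic = np.log(sigma2) + 2 * k / T
    sbic = np.log(sigma2) + k * np.log(T) / T
    hqic = np.log(sigma2) + 2 * k * np.log(np.log(T)) / T
    return aic, sbic, hqic
```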

43

Constructing ARIMA Models in EViews

• Refer to pages 276~281.

• Using the monthly UK house price series.

• The objective of this exercise is to build an ARMA model for the house
price changes. We use the Box-Jenkins approach, i.e. we follow the three
steps:
– Step 1: Plot acf and pacf - suggests possible ARMA(p, q) orders
– Step 2: Estimate parameters for all models – ARMA(0, 0) ~ ARMA(5, 5)
– Step 3: Model checking/selecting - AIC or SBIC (summary table on page 280):
ARMA(4, 5) is chosen by AIC and ARMA(2, 0) = AR(2) by SBIC (a sketch of such
an order search appears below).
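
The same kind of search can be sketched outside EViews. A hedged example using statsmodels (assuming `dhp` is the house price percentage change series as a pandas Series or numpy array; `ARIMA(..., order=(p, 0, q))` fits an ARMA(p, q)):

```python
from statsmodels.tsa.arima.model import ARIMA

def select_arma_order(y, max_p=5, max_q=5):
    """Fit ARMA(p, q) for all p, q up to the maxima and report the best AIC/BIC orders."""
    results = {}
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                fit = ARIMA(y, order=(p, 0, q)).fit()
                results[(p, q)] = (fit.aic, fit.bic)
            except Exception:
                continue                     # skip orders that fail to converge
    best_aic = min(results, key=lambda k: results[k][0])
    best_bic = min(results, key=lambda k: results[k][1])
    return best_aic, best_bic

# best_by_aic, best_by_sbic = select_arma_order(dhp)   # dhp: house price changes series
```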

44
Forecasting/ Prediction

• Why forecast?
Forecasts are made essentially because they are useful! Financial decisions
often involve a long-term commitment of resources, the returns to which will
depend upon what happens in the future. In this context, the decisions made
today will reflect forecasts of the future state of the world, and the more
accurate those forecasts are, the more utility (or money!) is likely to be
gained from acting on them.

• Forecasting also provides an important test of the adequacy of a model, e.g.


- Forecasting tomorrow’s return on a particular share
- Forecasting the price of a house given its characteristics
- Forecasting the riskiness of a portfolio over the next year
- Forecasting the volatility of bond returns

• We can distinguish two approaches:


- Econometric (structural) forecasting
- Time series forecasting

45

In-Sample Versus Out-of-Sample

• Expect the “forecast” of the model to be good in-sample.

• Say we have some data - e.g. monthly FTSE returns for 120 months:
1990M1 – 1999M12. We could use all of it to build the model, or keep
some observations back: e.g. estimate over 1990M1 – 1998M12 (the
in-sample period) and hold back 1999M1 – 1999M12 as the out-of-sample
(hold-out) period.

• The out-of-sample period provides a good test of the model, since we have
not used the information from 1999M1 onwards when we estimated the
model parameters.

46
How to produce forecasts

• Multi-step ahead versus single-step ahead forecasts

• Recursive versus rolling windows

• To understand how to construct forecasts, we need the idea of conditional
expectations: E(yt+s | Ωt), the expectation of yt+s given the information set Ωt
available at time t.

• We cannot forecast a white noise process: E(ut+s | Ωt) = 0 ∀ s > 0.

• The two simplest naïve forecasting "methods" (Box 6.3, on page 288), sketched
in code below:
1. Assume no change: ft,s = yt
2. Forecasts are the long-term average: ft,s = ȳ
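
A minimal sketch of the two naïve rules (numpy only; `y` is assumed to be the observed series up to time t):

```python
import numpy as np

def naive_forecasts(y, horizon):
    """Forecasts for horizons 1..horizon using the two naive rules in Box 6.3."""
    no_change = np.full(horizon, y[-1])            # f_{t,s} = y_t for every s
    long_run_mean = np.full(horizon, np.mean(y))   # f_{t,s} = y-bar for every s
    return no_change, long_run_mean
```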

47

Models for Forecasting

• Structural models
e.g. yt = β1 + β2x2t + ... + βkxkt + ut   (y = Xβ + u)

To forecast y, we require the conditional expectation of its future
value:

E(yt | Ωt-1) = E(β1 + β2x2t + ... + βkxkt)
= β1 + β2E(x2t) + ... + βkE(xkt)

But what are E(x2t) etc.? We could use the unconditional means x̄2, ..., x̄k, so

E(yt) = β1 + β2x̄2 + ... + βkx̄k = ȳ !!

48
Models for Forecasting (cont’d)

• Time Series Models


The current value of a series, yt, is modelled as a function only of its previous
values and the current value of an error term (and possibly previous values of
the error term).

• Models include:
• simple unweighted averages
• ARMA models
• Non-linear models – e.g. threshold models, GARCH, bilinear models, etc.

49

Forecasting with ARMA Models

The forecasting model typically used is of the form:

ft,s = μ + Σ(i=1..p) φi ft,s-i + Σ(j=1..q) θj ut+s-j

where ft,s = yt+s for s ≤ 0; ut+s = 0 for s > 0;
and ut+s = ut+s for s ≤ 0.

50
Forecasting with MA Models

• An MA(q) only has a memory of q periods.

e.g. say we have estimated an MA(3) model:

yt = μ + θ1ut-1 + θ2ut-2 + θ3ut-3 + ut

yt+1 = μ + θ1ut + θ2ut-1 + θ3ut-2 + ut+1
yt+2 = μ + θ1ut+1 + θ2ut + θ3ut-1 + ut+2
yt+3 = μ + θ1ut+2 + θ2ut+1 + θ3ut + ut+3

• We are at time t and we want to forecast 1,2,..., s steps ahead.

• We know yt , yt-1, ..., and ut , ut-1

51

Forecasting with MA Models (cont’d)

ft, 1 = E(yt+1  t ) = E( +  1ut +  2ut-1 +  3ut-2 + ut+1)


=  +  1ut +  2ut-1 +  3ut-2

ft, 2 = E(yt+2  t ) = E( +  1ut+1 +  2ut +  3ut-1 + ut+2)


=  +  2ut +  3ut-1

ft, 3 = E(yt+3  t ) = E( +  1ut+2 +  2ut+1 +  3ut + ut+3)


=  +  3u t

ft, 4 = E(yt+4  t ) = 

ft, s = E(yt+s  t ) =  s4
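
A sketch of these MA(3) forecasts in Python (numpy not required; the parameter values and the last three residuals are hypothetical placeholders for estimated quantities):

```python
mu = 0.1                                    # placeholder estimate of mu
theta = [0.3, 0.2, 0.1]                     # placeholder estimates of theta_1..theta_3
u = {0: 0.05, -1: -0.02, -2: 0.01}          # u_t, u_{t-1}, u_{t-2} (estimated residuals)

def ma3_forecast(s):
    """E(y_{t+s} | Omega_t) for the MA(3) model above; future u's are set to zero."""
    f = mu
    for j in range(1, 4):                   # theta_j multiplies u_{t+s-j}
        lag = s - j                         # time index relative to t
        if lag <= 0 and lag in u:           # only known (current or past) residuals survive
            f += theta[j - 1] * u[lag]
    return f

for s in range(1, 6):
    print(s, round(ma3_forecast(s), 4))     # forecasts settle at mu for s >= 4
```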

52
Forecasting with AR Models

• Say we have estimated an AR(2)


yt =  + 1yt-1 +  2yt-2 + ut
yt+1 =  +  1yt +  2yt-1 + ut+1
yt+2 =  +  1yt+1 +  2yt + ut+2
yt+3 =  +  1yt+2 +  2yt+1 + ut+3

ft, 1 = E(yt+1  t ) = E( +  1yt +  2yt-1 + ut+1)


=  +  1E(yt) +  2E(yt-1)
=  +  1yt +  2yt-1

ft, 2 = E(yt+2  t ) = E( +  1yt+1 +  2yt + ut+2)


=  +  1E(yt+1) +  2E(yt)
=  +  1 ft, 1 +  2yt

53

Forecasting with AR Models (cont’d)

ft, 3 = E(yt+3  t ) = E( +  1yt+2 +  2yt+1 + ut+3)


=  +  1E(yt+2) +  2E(yt+1)
=  +  1 ft, 2 +  2 ft, 1

• We can see immediately that

ft, 4 =  +  1 ft, 3 +  2 ft, 2 etc., so

ft, s =  +  1 ft, s-1 +  2 ft, s-2

• Can easily generate ARMA(p,q) forecasts in the same way.
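
A sketch of the AR(2) recursion in Python (numpy only; μ, φ1, φ2 and the last two observations are hypothetical placeholders for estimated/observed quantities):

```python
import numpy as np

mu, phi1, phi2 = 0.2, 0.5, 0.3               # placeholder parameter estimates
y_t, y_tm1 = 1.4, 1.1                        # y_t and y_{t-1} (last two observations)

def ar2_forecasts(horizon):
    """Recursive AR(2) forecasts f_{t,1}, ..., f_{t,horizon}."""
    prev2, prev1 = y_tm1, y_t                # y_{t-1}, y_t
    forecasts = []
    for _ in range(horizon):
        f = mu + phi1 * prev1 + phi2 * prev2
        forecasts.append(f)
        prev2, prev1 = prev1, f              # earlier forecasts feed the later ones
    return np.array(forecasts)

print(ar2_forecasts(5))                      # converges towards mu / (1 - phi1 - phi2)
```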

54
Forecasting in EViews

Forecasting using ARMA models in EViews

• Suppose that the AR(2) model selected for the house price percentage
changes series was estimated using observations Feb 1991 ~ Dec 2010,
leaving 29 remaining observations with which to construct forecasts and to
test forecast accuracy (i.e. for the period Jan 2011 ~ May 2013).

(Details refer to pages 296 ~ 299)

55
