Time Series Analysis CEN-531 Notes SKJ
CEN 531
Time Series Analysis
TIME SERIES
A time series is a set of observations generated sequentially in time, e.g., river flow, precipitation,
temperature, ...
Time series analysis is useful for many applications, such as forecasting, detecting trends in
records, filling-in missing data, and generation of synthetic data.
A time series whose values have been observed at regular intervals, such as each day or each hour,
is termed a regularly spaced time series.
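Observations in a discrete series, made at equidistant times $\tau_0 + h, \tau_0 + 2h, \ldots, \tau_0 + th, \ldots, \tau_0 + Nh$ (where $\tau_0$ is the time origin and h the sampling interval), may be denoted by $z(1), z(2), \ldots, z(t), \ldots, z(N)$.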
Two time series $x_t$ and $y_t$ that show linear dependence between $x_t$ and $y_{t-k}$ for some lag $k = 0, 1, \ldots$ are said to be cross-correlated.
Two series xt and yt may not have auto-correlation but may have cross-correlation.
Two series xt and yt may have auto-correlation but may not have cross-correlation.
TS Analysis - SKJ
Let $u_t$, $t = \ldots, -1, 0, 1, \ldots$, be the random variables describing successive terms of a
time series.
Further, let the joint distribution of any set of n consecutive u's, say $u_{t+1}, u_{t+2}, \ldots, u_{t+n}$, be
$F(u_{t+1}, u_{t+2}, \ldots, u_{t+n})$.
If F is independent of t for all integers n > 0, the time series is stationary.
The joint distribution of any set of n consecutive variables is the same, regardless
of their location.
Trends
Trends are introduced in a time series due to gradual or sudden changes in the major processes that
generate the time series.
Jumps are introduced in a time series due to sudden changes in major processes that generate the
time series or due to external influence.
A water quality time series may show trends if a new factory upstream begins to discharge its
untreated effluent into the river.
Closure of a diversion dam will lead to a sudden change in the series because flow will be reduced
due to diversion.
Trend Analysis
Identification of trends in hydrologic data is helpful in modeling and predicting future behaviour
of the data.
Trends can be spatial or temporal; only temporal trends are discussed here.
A temporal trend is a general increase or decrease in the observed values of a variable over time.
It describes the long-term smooth movement of the variable, ignoring short-term changes.
Trend analysis is performed to determine the significance of a trend (if present) and to estimate its
magnitude.
Magnitude of Trend
Magnitude of trend in a time series is determined either by using regression analysis (parametric
test) or by using Sen’s estimator method (non-parametric method).
Both these methods assume a linear trend in time series.
Regression analysis was discussed earlier; for a linear trend, the slope of the fitted regression line is the rate of rise or fall of the variable.
Sen’s estimator
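Slopes ($T_i$) of all data pairs are calculated by
$$T_i = \frac{x_j - x_k}{j - k}, \quad i = 1, 2, \ldots, N$$
where $x_j$ and $x_k$ are the data values at times j and k (j > k), and N is the number of data pairs considered. Sen's estimator of the slope is the median of these N values of $T_i$.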
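A minimal Python sketch of Sen's estimator, assuming equally spaced observations indexed 0, 1, 2, ...; the function name `sen_slope` and the sample values are illustrative, not from the notes. It forms $T_i$ for every pair (j > k) and returns their median.

```python
from itertools import combinations
from statistics import median


def sen_slope(x):
    """Sen's slope: median of T_i = (x_j - x_k) / (j - k) over all pairs j > k.

    Assumes equally spaced observations with time index 0, 1, 2, ...
    """
    slopes = [(x[j] - x[k]) / (j - k) for k, j in combinations(range(len(x)), 2)]
    return median(slopes)


# Hypothetical annual values, for illustration only
series = [3.1, 3.4, 3.3, 3.9, 4.0, 4.2, 4.1, 4.6]
print(sen_slope(series))  # slope in units of x per time step
```

Because the estimate is a median, a single outlying value shifts it far less than it would shift an ordinary least-squares slope.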
The Mann-Kendall (MK) test checks the null hypothesis of no trend against the alternative hypothesis of an
increasing or decreasing trend.
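The MK test statistic is
$$S = \sum_{k=1}^{N-1} \sum_{j=k+1}^{N} \operatorname{sgn}(x_j - x_k), \qquad \operatorname{sgn}(\theta) = \begin{cases} 1 & \text{if } \theta > 0 \\ 0 & \text{if } \theta = 0 \\ -1 & \text{if } \theta < 0 \end{cases}$$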
This statistic represents the number of positive differences minus the number of negative differences over
all the differences considered.
For large samples (N>10), the test is conducted using a normal distribution with mean and variance:
$E[S] = 0$
$$\operatorname{Var}(S) = \frac{N(N-1)(2N+5) - \sum_{k=1}^{n} t_k (t_k - 1)(2t_k + 5)}{18}$$
where n is the number of tied (zero difference between compared values) groups, and tk is the
number of data points in the kth tied group.
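The standardized test statistic is
$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S < 0 \end{cases}$$
If the computed value of $|Z| > z_{\alpha/2}$, the null hypothesis ($H_0$) is rejected at significance level α in a
two-sided test.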
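A Python sketch of the MK test as described, assuming a complete series with no missing values; `mann_kendall` is an illustrative name. It counts positive minus negative differences, applies the tie correction to Var(S), uses the usual continuity-corrected Z, and compares |Z| with $z_{\alpha/2}$ obtained from `statistics.NormalDist` (Python 3.8+).

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall(x, alpha=0.05):
    """Two-sided Mann-Kendall test with the normal approximation (intended for N > 10).

    Returns (S, Var(S), Z, reject_H0).
    """
    n = len(x)
    # S = sum over all pairs j > k of sgn(x_j - x_k)
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)
    # Tie correction: t_k is the size of each group of equal values
    tie_sizes = [t for t in Counter(x).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    return s, var_s, z, abs(z) > z_crit
```

Ties are detected by exact equality, so values should be recorded at a common resolution before testing.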
Seasonality / cyclicity
Time series of water resources (WR) variables measured or accumulated at sub-annual time intervals (monthly, 10-daily, etc.)
normally have seasonal (periodic) patterns.
The revolution of the earth around the sun produces annual cycles in most hydrologic variables.
Seasonal patterns can be seen in time series such as monthly rainfall, daily runoff, and daily urban water demand;
these series are said to have seasonal or periodic patterns.
Many series used in hydrologic studies, such as urban water use and hydropower demand, may
also show weekly patterns.
A stochastic process is called stationary if its properties are unaffected by a change of time origin.
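The covariance between $z_t$ and $z_{t+k}$ is called the autocovariance at lag k and is estimated by
$$c_k = \frac{1}{N} \sum_{t=1}^{N-k} (z_t - \bar{z})(z_{t+k} - \bar{z}), \quad k = 0, 1, 2, \ldots$$
The lag-k autocorrelation is
$$r_k = c_k / c_0$$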
If the correlogram (the plot of $r_k$ against lag k) falls rapidly after a few lags, it indicates weak persistence or short memory.
MATLAB function:
autocorr(x) draws the correlogram.
Excel: =CORREL(array1, array2), where array2 is array1 shifted by k time steps, gives an estimate of the lag-k autocorrelation.
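The estimator $c_k$ (with divisor N, assumed here as in the formula above) and $r_k = c_k / c_0$ can also be sketched in Python with NumPy; `autocorrelation` is a hypothetical helper name.

```python
import numpy as np


def autocorrelation(z, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with r_k = c_k / c_0 and
    c_k = (1/N) * sum_{t=1}^{N-k} (z_t - zbar)(z_{t+k} - zbar)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z - z.mean()                          # deviations from the sample mean
    c = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(max_lag + 1)])
    return c / c[0]                           # r_0 is always 1


r = autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
print(r)
```

A correlogram is then a stem or bar plot of `r` against lag, e.g. `matplotlib.pyplot.stem(range(len(r)), r)`.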
Difference operator
$\nabla z_t = z_t - z_{t-1} = z_t - B z_t = (1 - B) z_t$, where B is the backward shift operator ($B z_t = z_{t-1}$).
In an autoregressive (AR) model, the current value of the process is given by a weighted sum of a pre-assigned number of
past values plus a random term.
These are linear models: the current value is an additive (weighted) combination of past values, not of their squares, logarithms, etc.
River flow arises from time-dependent components such as surface flow, groundwater (GW) flow, and evapotranspiration (ET).
Precipitation depends on atmospheric circulation and ocean temperatures, which give it long-term persistence.
Let the values of a process at equally spaced times t, t-1, t-2, … be Yt, Yt-1, Yt-2, …
Let zt, zt-1, zt-2 … be the deviations from the mean µ; for example, zt = Yt - µ.
In a pth-order linear AR model, AR(p), the current value of the process is expressed as a finite, linear aggregate
of the previous p values of the process and a shock $a_t$:
𝑧𝑡 = 𝜑1 𝑧𝑡−1 + 𝜑2 𝑧𝑡−2 + ⋯ + 𝜑𝑝 𝑧𝑡−𝑝 + 𝑎𝑡
The $a_t$'s are a series of independent random variables, assumed to be normally distributed with mean 0 and variance $\sigma_a^2$.
The $a_t$'s are also called white noise.
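A minimal NumPy sketch that generates synthetic values from an AR(p) model driven by normal white noise. `simulate_ar`, its zero start-up values, and the 100-step burn-in (discarded so the arbitrary start does not bias the output) are illustrative choices, not part of the notes.

```python
import numpy as np


def simulate_ar(phi, sigma_a, n, burn_in=100, seed=None):
    """Generate n values of z_t = phi_1 z_{t-1} + ... + phi_p z_{t-p} + a_t,
    where the a_t are independent N(0, sigma_a^2) shocks (white noise)."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    total = n + burn_in
    a = rng.normal(0.0, sigma_a, total)
    z = np.zeros(total + p)               # first p entries are zero start-up values
    for t in range(p, total + p):
        past = z[t - p:t][::-1]           # z_{t-1}, z_{t-2}, ..., z_{t-p}
        z[t] = np.dot(phi, past) + a[t - p]
    return z[p + burn_in:]                # drop start-up values and burn-in


z = simulate_ar([0.5], sigma_a=1.0, n=5000, seed=1)
```

The deviations $z_t$ can be shifted back to the original scale by adding the mean μ.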
Stationarity Requirement:
A time series is called stationary if its statistical properties remain constant over time.
The order of stationarity is the highest-order moment that remains constant over time.
First-order stationarity indicates a time-invariant mean.
A series is second-order stationary if both its mean and variance remain constant over time.
An AR process is stable only when its model parameters lie within a certain range.
Otherwise, past effects (the influence of previous data points) would accumulate, successive
values of the variable $z_t$ would move towards infinity, and the time series would not be stationary.
If there is more than one AR model parameter, similar restrictions on parameter values can be
defined.
AR(1) model:
$z_t = \varphi_1 z_{t-1} + a_t$
For AR(1), $\rho_1 = \varphi_1$, and the stationarity condition is $-1 < \rho_1 < 1$.
Example: The mean and SD of the annual flows of a river are 4.7 and 0.958; $r_1 = 0.324$.
Generate three values for t, t+1, and t+2 using the AR(1) model. Let the $a_t$'s be 0.87, -0.65, and 1.15.
Solution:
If $a_t$ is an independent standard normal variate, multiply it by $\sigma_a = (1 - r_1^2)^{0.5}$.
The model is:
𝑧𝑡 = 𝜑1 𝑧𝑡−1 + 𝜎𝑎 𝑎𝑡
Here, $\sigma_a = (1 - 0.324^2)^{0.5} = 0.946$
𝑧𝑡 = 0.324 𝑧𝑡−1 + 0.946 × 𝑎𝑡
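A short Python sketch that carries the arithmetic of this example through. The notes do not state the starting value, so the code ASSUMES $z_{t-1} = 0$ (the previous flow equals the mean) and converts standardized values back to flows with $Y_t = \bar{Y} + s\,z_t$ (also an assumption about the intended back-transformation); $\varphi_1$ is taken as $r_1$.

```python
import math

mean, sd, r1 = 4.7, 0.958, 0.324          # data from the example
a = [0.87, -0.65, 1.15]                   # given standard normal variates
sigma_a = math.sqrt(1 - r1**2)            # (1 - r1^2)^0.5, about 0.946

z_prev = 0.0  # ASSUMPTION: z_{t-1} = 0, i.e. the previous flow equals the mean
flows = []
for a_t in a:
    z_t = r1 * z_prev + sigma_a * a_t     # standardized AR(1) with phi_1 = r1
    flows.append(mean + sd * z_t)         # ASSUMPTION: back-transform Y = mean + SD * z
    z_prev = z_t
print([round(q, 3) for q in flows])
```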
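For the AR(2) model, $z_t = \varphi_1 z_{t-1} + \varphi_2 z_{t-2} + a_t$, stationarity requires $\rho_1^2 < (\rho_2 + 1)/2$ and $-1 < \rho_1, \rho_2 < 1$.
For the standardized series, $\sigma_a^2 = 1 - R^2$, where $R^2 = \varphi_1 \rho_1 + \varphi_2 \rho_2$ is the coefficient of determination.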
Example: The mean and SD of the annual flows of a river are 1.0 and 0.182. An AR(2) model is found to fit
the flow data well, with $r_1 = 0.458$ and $r_2 = -0.004$.
Generate data for t and t+1 using the AR(2) model. Let the $a_t$'s be 1.352 and -0.532.
Solution:
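A Python sketch of one way to work this example. It estimates $\varphi_1, \varphi_2$ from $r_1, r_2$ with the k = 2 Yule-Walker solution, checks the stationarity conditions, and scales the shocks by $\sigma_a = (1 - R^2)^{0.5}$. As in the AR(1) case, the starting values are not given, so the code ASSUMES $z_{t-1} = z_{t-2} = 0$ and back-transforms with $Y_t = \bar{Y} + s\,z_t$.

```python
import math

mean, sd = 1.0, 0.182
r1, r2 = 0.458, -0.004
a = [1.352, -0.532]                        # given standard normal variates

# Yule-Walker (method of moments) estimates for AR(2)
phi1 = r1 * (1 - r2) / (1 - r1**2)
phi2 = (r2 - r1**2) / (1 - r1**2)
assert r1**2 < (r2 + 1) / 2 and -1 < r1 < 1 and -1 < r2 < 1  # stationarity conditions
r_squared = phi1 * r1 + phi2 * r2          # coefficient of determination R^2
sigma_a = math.sqrt(1 - r_squared)         # noise SD for the standardized series

z1 = z2 = 0.0  # ASSUMPTION: z_{t-1} = z_{t-2} = 0 (previous flows equal the mean)
flows = []
for a_t in a:
    z_t = phi1 * z1 + phi2 * z2 + sigma_a * a_t
    flows.append(mean + sd * z_t)          # ASSUMPTION: Y = mean + SD * z
    z1, z2 = z_t, z1                       # shift: new z_{t-1}, z_{t-2}
print(round(phi1, 4), round(phi2, 4), [round(q, 4) for q in flows])
```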
Nov. 06
The first-order moving-average model, MA(1), is
$z_t = a_t - \theta_1 a_{t-1}$
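A NumPy sketch that simulates an MA(1) series and compares its sample lag-1 autocorrelation with the theoretical MA(1) value $\rho_1 = -\theta_1 / (1 + \theta_1^2)$ (autocorrelations beyond lag 1 are zero). `simulate_ma1` and the value $\theta_1 = 0.6$ are illustrative assumptions.

```python
import numpy as np


def simulate_ma1(theta1, n, sigma_a=1.0, seed=None):
    """Generate n values of z_t = a_t - theta_1 * a_{t-1}, a_t ~ N(0, sigma_a^2)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma_a, n + 1)   # one extra shock supplies a_{t-1} for the first value
    return a[1:] - theta1 * a[:-1]


theta1 = 0.6
z = simulate_ma1(theta1, 20000, seed=42)
r1_sample = np.corrcoef(z[:-1], z[1:])[0, 1]
r1_theory = -theta1 / (1 + theta1**2)     # lag-1 autocorrelation of MA(1)
print(r1_sample, r1_theory)
```

The one-lag cutoff in the correlogram is what distinguishes an MA(1) series from an AR(1) series, whose autocorrelations decay gradually.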
Greater flexibility in fitting TS models is achieved by including both AR and MA terms in model.
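This leads to the mixed autoregressive-moving average ARMA(p, q) model:
$$z_t = \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$$
which employs p + q + 2 unknown parameters $\mu, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma_a^2$, estimated from the data.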
Example: flow in a stream results from a number of causes, such as precipitation and catchment
storage.
In an ARMA model, $z_t$ and $a_t$ may represent the time-dependent discharge (output) and rainfall (input).
This mixed behavior can be modelled by ARMA models.
In practice, an adequate representation of actually occurring stationary time series can frequently be
obtained with AR, MA, or ARMA models in which p and q are not greater than 2, and often less
than 2.
ARMA(1,1) model
The simplest member of the ARMA(p, q) family is the ARMA(1, 1) model, which can be written as
$$z_t = \phi_1 z_{t-1} + a_t - \theta_1 a_{t-1}$$
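A NumPy sketch of a general ARMA(p, q) generator, checked on ARMA(1, 1) against the standard lag-1 autocorrelation $\rho_1 = (1 - \phi_1\theta_1)(\phi_1 - \theta_1)/(1 + \theta_1^2 - 2\phi_1\theta_1)$. `simulate_arma`, the zero start-up values, the burn-in length, and the parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_arma(phi, theta, n, sigma_a=1.0, mu=0.0, burn_in=200, seed=None):
    """Generate n values of mu + z_t, where
    z_t = sum_i phi_i z_{t-i} + a_t - sum_j theta_j a_{t-j},  a_t ~ N(0, sigma_a^2)."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    m = max(p, q)                         # number of zero start-up values
    total = n + burn_in + m
    a = rng.normal(0.0, sigma_a, total)
    z = np.zeros(total)
    for t in range(m, total):
        ar = sum(phi[i] * z[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * a[t - 1 - j] for j in range(q))
        z[t] = ar + a[t] - ma
    return mu + z[m + burn_in:]           # drop start-up values and burn-in


phi1, theta1 = 0.5, 0.3
z = simulate_arma([phi1], [theta1], 50000, seed=7)
r1 = np.corrcoef(z[:-1], z[1:])[0, 1]
rho1 = (1 - phi1 * theta1) * (phi1 - theta1) / (1 + theta1**2 - 2 * phi1 * theta1)
print(r1, rho1)
```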
Consider a time series that is homogeneous except in level, i.e., the various segments of the series
look identical except for the difference in the level about which they vary.
Thus, ARIMA(p, d, q) is an ARMA model that is fitted to the data after taking the dth difference
of the series:
$$\phi(B)\,\nabla^d z_t = \theta(B)\,a_t$$
where $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$, $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and d indicates that the series is differenced d times. The notation $\nabla_n = 1 - B^n$ indicates
differencing with a lag of n. First-order differencing, $\nabla z_t = z_t - z_{t-1}$, is helpful in removing the trend
of a series, i.e., non-stationarity in the mean. Two consecutive differencing operations are necessary
to remove non-stationarity in both the mean and the slope. However, it may not always be possible to
remove non-stationarity by differencing alone; other transformations may also be needed.
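A NumPy sketch of the difference operators $\nabla^d$ and $\nabla_n = 1 - B^n$; `difference` is a hypothetical helper name. Each pass shortens the series by `lag` values.

```python
import numpy as np


def difference(z, d=1, lag=1):
    """Apply (1 - B^lag)^d to z, i.e. take d successive lag-`lag` differences.

    lag=1 removes a trend in the mean (d=2 also removes a change in slope);
    lag=12 on monthly data removes a repeating annual pattern.
    """
    z = np.asarray(z, dtype=float)
    for _ in range(d):
        z = z[lag:] - z[:-lag]
    return z


print(difference([0, 1, 4, 9, 16], d=2))  # a quadratic trend becomes a constant
```

An ARMA model would then be fitted to the differenced series; forecasts are returned to the original scale by summing the differences back.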
The term harmonic originally came from acoustics, where musical instruments are identified by
harmonics whose frequencies are integer multiples of the fundamental frequency produced.
The French mathematician Fourier showed that a continuous periodic function $\{X(t), t \in \mathcal{T}\}$ (where $\mathcal{T}$ is an
index set) with period T can, in general, be expressed as the sum of an infinite number of harmonics with frequencies
1/T, 2/T, 3/T, ....
Hence, this type of representation is called a Fourier series.
If a Fourier series model is fit
Figure: Harmonic decomposition of a periodic signal over a time span. The L harmonics (the first
three are shown) have frequencies 1/T, 2/T, 3/T, 4/T, ..., L/T and wavelengths T, T/2,
T/3, T/4, ..., T/L, where $T/L \ge 2\Delta t$ and $\Delta t$ is the sampling interval.
The ordinates of the harmonics are summed algebraically to give the periodic component $x_t^{(p)}$ of the sequence $x_t$.
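A NumPy sketch of fitting the periodic component $x_t^{(p)}$ with L harmonics to a seasonal series (e.g. monthly data with period ω = 12). Assumptions: the series covers whole periods; the coefficients are computed from the seasonal means $m_\tau$ with the usual discrete Fourier formulas $A_j = \frac{2}{\omega}\sum_\tau m_\tau \cos(2\pi j\tau/\omega)$ and $B_j = \frac{2}{\omega}\sum_\tau m_\tau \sin(2\pi j\tau/\omega)$; and L stays below ω/2 so every wavelength exceeds two sampling intervals. `periodic_component` is a hypothetical name.

```python
import numpy as np


def periodic_component(x, period, num_harmonics):
    """Fitted x_t^(p) = mean + sum_j [A_j cos(2 pi j tau / period) + B_j sin(2 pi j tau / period)].

    x must cover whole periods; num_harmonics (L) must satisfy 1 <= L < period / 2.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % period != 0 or not 1 <= num_harmonics < period / 2:
        raise ValueError("need whole periods and 1 <= num_harmonics < period / 2")
    means = x.reshape(-1, period).mean(axis=0)   # seasonal means m_tau (e.g. monthly)
    tau = np.arange(1, period + 1)
    fitted = np.full(period, means.mean())
    for j in range(1, num_harmonics + 1):
        arg = 2 * np.pi * j * tau / period
        A = 2 / period * np.sum(means * np.cos(arg))
        B = 2 / period * np.sum(means * np.sin(arg))
        fitted += A * np.cos(arg) + B * np.sin(arg)  # add the jth harmonic's ordinates
    return np.tile(fitted, len(x) // period)       # periodic component at every time step
```

Subtracting this component from $x_t$ leaves a residual series that can be examined for stationarity and modelled with the ARMA family.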
• Model identification, which involves using the data and any information on how the series
was generated to identify a subclass of parsimonious models worth considering.
• Parameter estimation, which involves efficient use of the data to make inferences about the model parameters.
Parsimony principle
If two models are equally good, choose the one with fewer parameters.
Multiplying the AR(p) model in turn by $z_{t-1}, z_{t-2}, z_{t-3}, \ldots$ and taking expectations gives a set of
equations called the Yule-Walker equations:
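$$\begin{bmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \varphi_{k,1} \\ \varphi_{k,2} \\ \vdots \\ \varphi_{k,k} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}$$
or
$$P_k \boldsymbol{\varphi}_k = \boldsymbol{\rho}_k$$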
Solving these equations for k = 1, 2, 3, ... successively, the values of $\varphi_{11}, \varphi_{22}, \ldots$ are obtained as
functions of the autocorrelations $\rho_j$.
The quantity $\varphi_{kk}$, regarded as a function of lag k, is called the partial autocorrelation function (PACF).
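A NumPy sketch that obtains $\varphi_{kk}$ by solving $P_k \boldsymbol{\varphi}_k = \boldsymbol{\rho}_k$ directly for k = 1, 2, ... and keeping the last coefficient each time; `pacf_yule_walker` is a hypothetical name. In practice the sample values $r_k$ replace $\rho_k$.

```python
import numpy as np


def pacf_yule_walker(rho, max_lag):
    """Return [phi_11, phi_22, ..., phi_kk] for k = 1..max_lag.

    rho: autocorrelations [rho_1, rho_2, ..., rho_max_lag]; rho_0 = 1 is implied.
    """
    r = np.concatenate(([1.0], np.asarray(rho, dtype=float)))
    pacf = []
    for k in range(1, max_lag + 1):
        # Toeplitz matrix P_k with entries rho_|i-j|
        P = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi_k = np.linalg.solve(P, r[1:k + 1])  # phi_{k,1}, ..., phi_{k,k}
        pacf.append(phi_k[-1])                  # the last element is phi_kk
    return np.array(pacf)


print(pacf_yule_walker([0.458, -0.004], 2))
```

For an AR(p) process, $\varphi_{kk}$ is zero for k > p, so a PACF that cuts off after lag p points to an AR model of order p.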
For most monthly hydrological series, it is often helpful to first standardize the series by subtracting the
mean and dividing by the standard deviation of the corresponding month.
First-order differencing of the resulting series (if required) is often adequate to yield a stationary
series that can be modelled by the ARMA class of models.
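A NumPy sketch of month-by-month standardization, assuming the series starts in the first month and covers whole years; `standardize_by_month` is a hypothetical name. The monthly means and SDs are returned too, since generated data must be back-transformed with them.

```python
import numpy as np


def standardize_by_month(x, period=12):
    """Return ((x - monthly mean) / monthly SD, monthly means, monthly SDs)."""
    x = np.asarray(x, dtype=float)
    if len(x) % period != 0:
        raise ValueError("series must contain whole years")
    table = x.reshape(-1, period)           # rows = years, columns = months
    mean = table.mean(axis=0)
    sd = table.std(axis=0, ddof=1)          # sample SD of each calendar month
    return ((table - mean) / sd).ravel(), mean, sd


standardized, monthly_mean, monthly_sd = standardize_by_month(np.arange(36.0))
```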
Common techniques to estimate the parameters of a time series model are the method of moments,
the method of least squares, and the method of maximum likelihood.
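As one illustration of the least-squares method, an AR(p) model can be fitted by regressing $z_t$ on $z_{t-1}, \ldots, z_{t-p}$ with `numpy.linalg.lstsq`. `fit_ar_least_squares` is a hypothetical helper; estimating $\sigma_a^2$ from the residuals with divisor (number of equations minus p) is one common convention, assumed here.

```python
import numpy as np


def fit_ar_least_squares(z, p):
    """Least-squares estimates of phi_1..phi_p and sigma_a^2 for
    z_t = phi_1 z_{t-1} + ... + phi_p z_{t-p} + a_t (z taken as deviations from its mean)."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()
    n = len(z)
    # Column i holds z_{t-i} for t = p..n-1; the target vector holds z_t
    X = np.column_stack([z[p - i:n - i] for i in range(1, p + 1)])
    y = z[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ phi
    sigma_a2 = residuals @ residuals / (len(y) - p)
    return phi, sigma_a2
```

For long series these estimates are close to the method-of-moments (Yule-Walker) values.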
Model Testing
After an ARMA model has been fitted, it is necessary to apply statistical tests to check its adequacy
and suitability.
Tests for this purpose include the portmanteau lack-of-fit test, the Akaike Information
Criterion (AIC), and the test of the correlogram.
The portmanteau lack-of-fit test checks whether the residuals of a model are independent.
The test statistic Q is computed from $r_k$, the autocorrelation of the residuals at lag k, for k = 1, ..., L, where L is the number of lags considered. Q
approximately follows a chi-square distribution with (L - p - q) degrees of freedom.
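A NumPy sketch of the portmanteau statistic. The notes do not show the formula for Q, so this ASSUMES the Box-Pierce form $Q = N \sum_{k=1}^{L} r_k^2$; the Ljung-Box variant $N(N+2)\sum_{k=1}^{L} r_k^2/(N-k)$ is a common alternative. `portmanteau_q` is a hypothetical name.

```python
import numpy as np


def portmanteau_q(residuals, num_lags, p, q):
    """Box-Pierce form (assumed): Q = N * sum_{k=1}^{L} r_k^2 of residual autocorrelations.

    Returns (Q, degrees of freedom L - p - q).
    """
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    d = e - e.mean()
    c0 = np.sum(d * d) / n
    r = np.array([np.sum(d[:n - k] * d[k:]) / n / c0 for k in range(1, num_lags + 1)])
    q_stat = n * np.sum(r**2)
    return q_stat, num_lags - p - q
```

If Q exceeds the chi-square critical value for the returned degrees of freedom (for example `scipy.stats.chi2.ppf(0.95, dof)` where SciPy is available), the residuals are judged not independent and the model is inadequate.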
The technique of overfitting, in which a more elaborate model is fitted to the data and the results
are compared with those of the simpler model, has also been recommended.
Applications of TS Models
ARMA models are frequently used in rainfall-runoff modelling.
A number of well-known hydrologic models are special cases of the ARMA model.
For example, the Muskingum model of flood routing is obtained by setting certain parameters of
the ARMA model to zero.