Class Notes
13.03.2020
There are two kinds of models in time series: 1. Causal Model & 2. Non-Causal Model
Causal Model: In a causal model there is a set of independent variables impacting the dependent
variable. (e.g.) Linear Regression, Logistic Regression, etc.
Non-Causal Model: In a non-causal model (also known as a time series model), we don’t rely on
independent variables; the forecast is built only from the past values of the series itself.
(e.g.) Stock Forecasting, etc.
1. The first major requirement in time series is that the data should be evenly spaced (the interval
should be consistent).
2. The data should be collected at a regular time interval, which can be Daily / Weekly / Monthly /
Quarterly / Half-yearly / Yearly.
- The frequency of collecting data should depend on the required forecast frequency.
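As a minimal sketch of what evenly spaced data looks like in R, the block below stores a monthly series as a ts object. The sales figures, the variable names and the start date are made up for illustration; only the idea of a fixed frequency comes from the notes.

```r
# Hypothetical monthly sales figures for two years (made-up values).
sales <- c(20, 22, 25, 24, 27, 30, 33, 32, 29, 26, 23, 28,
           24, 26, 29, 28, 31, 35, 38, 37, 33, 30, 27, 32)

# frequency = 12 marks the data as monthly (12 observations per year);
# start = c(2018, 1) is an assumed start of January 2018.
sales_ts <- ts(sales, start = c(2018, 1), frequency = 12)

print(class(sales_ts))      # class of the object
print(frequency(sales_ts))  # observations per year
print(sales_ts)             # printed as a year-by-month table
```

For quarterly data the same call would use frequency = 4, and for yearly data frequency = 1.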
The time series models covered are:
1. ARIMA (Auto Regressive Integrated Moving Average).
2. SES (Simple Exponential Smoothing).
3. HTM (Holt’s Trend Model).
4. HWSM (Holt-Winters Seasonal Model).
The task is to figure out which of these models will give the best forecast.
In Time series we don’t break the data set into Training and Testing. The model is built on the entire
data set.
There are four components in a time series plot: Trend, Cyclical, Seasonal and Irregular.
Trend: This is the long-term component. A persistent, overall upward or downward pattern.
Cyclical: This is a repeating component whose repetition takes more than a year.
Seasonal: This is a short-term component whose pattern repeats within a year (some things occur
only once in a year).
Irregular: There won’t be any pattern of repetition.
To model a time series, it is necessary to find out which of these components the series exhibits.
If a data set needs to be fit with an ARIMA model, then it should be stationary, so the stationarity of the
data set has to be checked first.
Stationarity in time series plays the same role as normality in statistics (i.e. the residuals should be
normally distributed; if the errors are not normally distributed, please don’t use linear regression). If the
variance & mean of the data are constant over time, then the data is stationary (e.g. an ECG graph).
A non-stationary data set can’t be forecasted with ARIMA until we make it stationary.
To make the series stationary, some transformation functions need to be used. The transformation
functions that can be used are log, exponential, sine, cos, tan, sine-hyperbolic, cos-h, tan-h. The type
of transformation function to be used depends on the kind of series. The whole objective is, before
using ARIMA, to make the series stationary.
Use transformations to make the variance constant and use differencing to make the mean constant.
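The two steps can be sketched in R as follows. This is only an illustration: the built-in AirPassengers series, the choice of log as the transformation and the helper half_sd() are assumptions made here, not something fixed by the notes.

```r
x <- AirPassengers       # built-in monthly series with a rising trend and growing swings

x_log  <- log(x)         # transformation: log, to make the variance more constant
x_diff <- diff(x_log)    # differencing once, to make the mean more constant

d <- 1                   # number of times the series was differenced

# Hypothetical helper: spread of the first and second half of a series,
# as a rough check of whether the variance looks constant.
half_sd <- function(s) {
  n <- length(s)
  h <- n %/% 2
  c(first_half = sd(s[1:h]), second_half = sd(s[(h + 1):n]))
}

print(half_sd(x))        # raw series
print(half_sd(x_diff))   # after log + differencing
print(mean(x_diff))      # mean of the differenced series
```

If one round of differencing is not enough, diff() can be applied again, and the number of rounds is what ARIMA later takes as d.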
The statistical test to check stationarity is the KPSS test. For a stationary series the p-value should be greater
than 0.05.
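The notes do not say which R function runs the KPSS test. One option, assumed here, is kpss.test() from the add-on package tseries. The sketch below applies it to two simulated series (white noise and a random walk, both invented for illustration) and skips the test if the package is not installed.

```r
set.seed(42)
noise <- rnorm(200)           # white noise: constant mean and variance
walk  <- cumsum(rnorm(200))   # random walk: the mean wanders over time

if (requireNamespace("tseries", quietly = TRUE)) {
  # kpss.test() warns when the p-value lies outside its table (0.01 to 0.1),
  # so the warnings are suppressed here.
  p_noise <- suppressWarnings(tseries::kpss.test(noise)$p.value)
  p_walk  <- suppressWarnings(tseries::kpss.test(walk)$p.value)
  cat("KPSS p-value, white noise:", p_noise, "\n")
  cat("KPSS p-value, random walk:", p_walk, "\n")
} else {
  p_noise <- NA
  p_walk  <- NA
  message("Package 'tseries' is not installed; skipping the KPSS test.")
}
```

Following the rule above, a p-value greater than 0.05 is read as stationary and a smaller one as a sign that the series still needs a transformation or differencing.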
The ARIMA model has three components, the AR-component, I-component & MA-component, to take
care of the time series.
AR (Auto Regressive):
o In the AR-component we are regressing on the previous values of ‘Y’. So the new
equation will be Y_t = β0 + β1·Y_{t-1} + β2·Y_{t-2} + … + βn·Y_{t-n}. Since we are regressing on
the dependent variable itself, it is called auto regression.
o To solve for the AR component, we need to know how many previous periods are to be
considered to forecast the next period. This is the p-coordinate in ARIMA. To get the
p-coordinate, the PACF (Partial Auto Correlation Function) plot is drawn.
I (Integration):
o Integration means the number of times you did the differencing to make the series stationary. This
is the d-coordinate in ARIMA.
MA (Moving Average):
o The MA-component handles the residual errors. This is written as Ɛ_t = β0 + β1·Ɛ_{t-1}
+ β2·Ɛ_{t-2} + … + βn·Ɛ_{t-n}. The MA-component considers how many previous errors need
to be taken into account. This is the q-coordinate in ARIMA and it can be obtained from the
ACF (Auto Correlation Function) plot.
o ACF Plot:
ACF is the auto correlation between the original series and the lagged series.
ACF(1) is the correlation between the original series and the lag(1) series. ACF(2) is the
correlation between the original series and the lag(2) series.
The ACF is calculated by using acf() in R and then it is put in the ACF plot
to get the value of q.
o PACF Plot:
PACF is the partial auto correlation of the original series, which means the correlation
between the original and the lagged series after removing the effect of the intervening lags. So,
PACF(2) is the partial correlation between the original and lag(2) series removing
the effect of lag(1). Similarly, PACF(3) is equal to the correlation between the original
and lag(3) series removing the effect of lag(1) and lag(2).
The PACF is calculated by using pacf() in R and then it is put in the PACF
plot to get the value of p.
Get the value of p from the PACF plot & the value of q from the ACF plot.
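A minimal sketch of this step in R is given below. The series is simulated with arima.sim() using made-up AR coefficients, so p = 2 is known in advance. The chosen order c(2, 0, 0) and the use of arima() from base R are assumptions for illustration, since the notes do not name the fitting function.

```r
set.seed(1)
# Simulated AR(2) series; the coefficients 0.6 and 0.25 are illustrative only.
y <- arima.sim(model = list(ar = c(0.6, 0.25)), n = 500)

# plot = FALSE returns the values only; use plot = TRUE to draw the plots.
acf_vals  <- acf(y,  lag.max = 10, plot = FALSE)   # starts at lag 0
pacf_vals <- pacf(y, lag.max = 10, plot = FALSE)   # starts at lag 1

# Approximate 95% bound, the dashed lines drawn in the ACF / PACF plots.
bound <- 1.96 / sqrt(length(y))

print(round(acf_vals$acf[2:6], 3))    # ACF(1) to ACF(5)
print(round(pacf_vals$acf[1:5], 3))   # PACF(1) to PACF(5)
print(round(bound, 3))

# Fit ARIMA with p = 2, d = 0, q = 0 (order = c(p, d, q)).
fit <- arima(y, order = c(2, 0, 0))
print(coef(fit))
```

In the plots, the lags whose bars cross the bound are the ones counted: PACF lags for p and ACF lags for q.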
15.03.2020
- For working on time series in R we need to change the class of the data set into ts (the time series class).
- The Moving Average component in ARIMA takes care of the correlation between the residuals
and this is what makes ARIMA a powerful model.
- The function for SES, HTM & HWSM will be the same in R; the differences will only be in the
arguments.
- If the data set is already stationary, then the SES model will give good results; there is no need to go
through making it stationary and doing ARIMA.
- If the series basically exhibits an increasing / decreasing trend with minor ups and downs, then
Holt’s Trend Model can be used.
- If the plot exhibits ups and downs with no major trend, then the Holt-Winters Seasonal Model can be
used.
- The Holt-Winters family of models has three parameters: α, β, γ and they are called smoothing coefficients.
- The combination of α, β, γ tells the R function whether it is SES or HTM or HWSM (True means the
coefficient is used, False means it is switched off).
- If it is SES, then α = True, β = False, γ = False.
- If it is HTM, then α = True, β = True, γ = False.
- If it is HWSM, then α = True, β = True, γ = True.
- The syntax of the function is HoltWinters(Data_Set_with_TS_class, beta = FALSE, gamma = FALSE); the
arguments for α, β, γ are named alpha, beta and gamma. A coefficient that is not specified is estimated
by R, so in the HoltWinters() function it is enough to specify only which smoothing coefficients
are FALSE.
- To check the correlation between the errors / residuals, there are two tests available in R:
o Durbin Watson Test: This is used for causal models.
o Ljung Box Test: This is used for non-causal models. If the p-value is greater than 0.05
then the residuals are not correlated and the model is stable.
- If the p-value of the Ljung Box test is less than 0.05 then there is enough evidence to say the
model is not stable and it is better to switch to an alternative model irrespective of MAPE
(Mean Absolute Percentage Error).
- If the data is not stationary and a trend is predominant in it, then go for Holt’s Trend Model.
- If the data has ups and downs with no major trend, then the Holt-Winters Seasonal Model can be
used.
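The points above on HoltWinters() and the Ljung Box test can be put together in a short sketch. It assumes the built-in AirPassengers series as the data set, Box.test() from base R as the Ljung Box implementation, and a lag of 20 for the test; none of these choices comes from the notes.

```r
x <- AirPassengers   # built-in monthly ts object, used only for illustration

# Same function, different arguments: switch off what the model does not use.
ses  <- HoltWinters(x, beta = FALSE, gamma = FALSE)  # SES: alpha only
htm  <- HoltWinters(x, gamma = FALSE)                # HTM: alpha and beta
hwsm <- HoltWinters(x)                               # HWSM: alpha, beta and gamma

fits <- list(SES = ses, HTM = htm, HWSM = hwsm)

# Ljung Box test on the residuals of each model (lag = 20 is an assumed choice).
p_values <- sapply(fits, function(fit) {
  Box.test(residuals(fit), lag = 20, type = "Ljung-Box")$p.value
})

print(p_values)
print(p_values > 0.05)   # TRUE would mean the residuals look uncorrelated
```

A model whose p-value falls below 0.05 would, by the rule in the notes, be set aside in favour of an alternative even if its MAPE looks good.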
Decomposition of Time series: The TS can be broken into three components: Trend, Seasonal / Cyclical,
Irregular / Random. To do this in R, the function syntax is: plot(decompose(Data_Set))
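As a small usage sketch, the decomposition can also be stored and inspected before plotting. The built-in AirPassengers series stands in for Data_Set here, which is an assumption for illustration.

```r
# decompose() needs a ts object with frequency > 1; the default type is additive.
parts <- decompose(AirPassengers)

print(head(parts$trend, 12))     # trend (NA at the ends, where the moving average is undefined)
print(round(parts$figure, 1))    # one seasonal effect per month
print(head(parts$random, 12))    # irregular / random part

# Draw the observed, trend, seasonal and random panels when run interactively.
if (interactive()) plot(parts)
```

For a series whose seasonal swings grow with the level, decompose(Data_Set, type = "multiplicative") is the alternative form of the same call.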