Class Notes

The document discusses time series forecasting models. There are two types of models: causal models which relate a dependent variable to independent variables, and non-causal or time series models which do not assume dependencies. Common time series models discussed include ARIMA, SES, HTM, and HWSM. The document provides details on identifying components in a time series, checking for and achieving stationarity, and selecting appropriate models and evaluating their accuracy.

Uploaded by

Nirmal Samuel

Time Series Forecasting

13.03.2020
There are two kinds of models in time series forecasting: 1. Causal Model & 2. Non-Causal Model

Causal Model: In a causal model, a set of independent variables impacts the dependent
variable (e.g. Linear Regression, Logistic Regression).

Non-Causal Model: In a non-causal model (also known as a time series model), we do not assume any
dependency on independent variables; the forecast is built from the series' own past values
(e.g. stock price forecasting).

- Time series forecasting is a core topic in the domain of econometrics.


- To draw the best-fit line in R, call abline(Model) on an existing scatter plot, where Model is the
fitted lm() object.
- For a model, two parameters need to be reported:
o The value of R-squared (how much of the variation in the dependent variable is explained by the
independent variables).
o Forecast accuracy (how much the predicted value (causal model) or forecast value (non-causal
model) deviates from the actual value).
- The predicted values of a linear regression are stored in Model$fitted.values, and the
differences between the actual and the predicted values are stored in Model$residuals.
- As a rule of thumb, the model accuracy should be at least 85%.
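
The points above can be sketched in a few lines of R. This is a minimal illustration, assuming R's built-in `cars` data set as a stand-in for a real causal data set; the variable names are hypothetical and not from the class.

```r
# Fit a simple linear regression on the built-in `cars` data
# (stopping distance explained by speed). The data set is an assumption
# chosen only because it ships with R.
model <- lm(dist ~ speed, data = cars)

# Scatter plot with the best-fit line drawn on top of it.
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(model)

# Predicted values and residuals (actual minus predicted).
predicted <- model$fitted.values
errors    <- model$residuals

# R-squared: share of the variation in the dependent variable
# that the independent variable explains.
r_squared <- summary(model)$r.squared
print(r_squared)
```

Note that R is case-sensitive, so the element names must be written in lower case exactly as shown.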

Parameters to Measure Forecast Accuracy:

1. Mean Forecast Error (MFE): E_t = A_t - F_t, where A_t is the actual value and F_t is the
predicted / forecast value. MFE is the mean of E_t over all periods.
o MFE cannot tell which of two models forecasts better. Positive and negative errors cancel
out, so two models with very different errors can give the same MFE value.
2. Mean Absolute Deviation (MAD): mean of |A_t - F_t|
o When comparing two models, MAD tells which model is better, but not how much better.
3. Mean Squared Error (MSE): mean of (A_t - F_t)^2
o MSE also tells which model is better, but not how much better.
4. Mean Absolute Percentage Error (MAPE): mean of |A_t - F_t| / A_t, expressed as a percentage
o MAPE tells how much better one model is than the other, because it is a relative measure.
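
A small worked example may make the difference between the four measures concrete. The numbers below are invented for illustration, and the helper name `accuracy` is hypothetical; model_a misses by 10 units each period and model_b by 1 unit, with alternating signs.

```r
# Invented illustration data: actual values and two competing forecasts.
actual  <- c(100, 100, 100, 100)
model_a <- c(90, 110, 90, 110)   # errors: +10, -10, +10, -10
model_b <- c(99, 101, 99, 101)   # errors:  +1,  -1,  +1,  -1

# Hypothetical helper that returns all four accuracy measures.
accuracy <- function(actual, forecast) {
  e <- actual - forecast               # E_t = A_t - F_t
  c(MFE  = mean(e),                    # signed errors cancel out
    MAD  = mean(abs(e)),
    MSE  = mean(e^2),
    MAPE = mean(abs(e) / actual) * 100)
}

result_a <- accuracy(actual, model_a)
result_b <- accuracy(actual, model_b)
print(rbind(model_a = result_a, model_b = result_b))
```

Both models have an MFE of 0, so MFE cannot separate them, while MAD, MSE and MAPE all rank model_b ahead of model_a.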
14.03.2020
- When we take a data set, the first thing we need to check is whether the variables of the data
set are "normally distributed" or not.

Requirements of Time Series Forecasting:

1. The first major requirement in time series is that the data should be evenly spaced (the interval
should be consistent).

2. The data should be collected at a regular time interval, which can be Daily / Weekly / Monthly /
Quarterly / Half-yearly / Yearly.

- The frequency of data collection should depend on the required forecast frequency.

Types of Time Series Model:

1. ARIMA.
2. SES (Simple Exponential Smoothing).
3. HTM (Holt's Trend Model).
4. HWSM (Holt-Winters Seasonal Model).

The goal is to figure out which model gives the best forecast.

R-Libraries used for TSF: “tseries”, “forecast”

- A library in R is a "collection of related functions and compiled data sets".
- Libraries are installed from CRAN (Comprehensive R Archive Network).

In Time series we don’t break the data set into Training and Testing. The model is built on the entire
data set.

There are four components in a time series plot: Trend, Cyclical, Seasonal and Irregular.

- Trend: This is the long-term component. A persistent, overall upward or downward pattern.
- Cyclical: This is a repeating component whose repetition period is more than a year.
- Seasonal: This is a short-term component whose repetition period is a year or less (some things
occur only once in a year).
- Irregular: There is no pattern of repetition.

To model a time series, it is necessary to find out which components the series exhibits.

If a data set is to be fitted with an ARIMA model, then it should be stationary. To check the
stationarity of the data set, first inspect the plot and then confirm with a statistical test.

Stationarity plays the same role in time series that normality plays in statistics (i.e. residuals should
be normally distributed; if the errors are not normally distributed, please don't use linear regression).
If the variance and mean of the data are constant over time, then the data is stationary (e.g. an ECG graph).
A non-stationary data set can't be forecasted with ARIMA until we make it stationary.

To make the series stationary, some transformation functions need to be used. The transformation
functions that can be used are log, exponential, sine, cos, tan, sine-hyperbolic, cos-h and tan-h. The type
of transformation function to use depends on the kind of series. The whole objective is to make the
series stationary before using ARIMA.

Scale the data before running clustering or a neural network, to avoid biased results.

Use transformations to make the variance constant and use differencing to make the mean constant.

The statistical test to check stationarity is the KPSS test. Its null hypothesis is that the series is
stationary, so for a stationary series the p-value should be greater than 0.05.
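
The transform, difference and test sequence could look like the sketch below. It assumes the built-in `AirPassengers` series as example data and assumes a log transform suits it; the KPSS step uses kpss.test() from the "tseries" library and is skipped if that library is not installed.

```r
# Built-in monthly series used only as example data (an assumption).
x <- AirPassengers

# Log transform to make the variance roughly constant.
log_x <- log(x)

# First difference to make the mean roughly constant.
stationary_x <- diff(log_x)

# KPSS test: the null hypothesis is that the series is stationary,
# so a p-value above 0.05 is what we want to see after the changes.
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::kpss.test(x))             # original series
  print(tseries::kpss.test(stationary_x))  # transformed and differenced
}
```

Differencing shortens the series by one observation, which matters later when the forecast is converted back to the original scale.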

ARIMA (Auto Regressive Integrated Moving Average) model:

- This model has three components, the AR-component, I-component & MA-component, to take
care of the time series.
- AR (Auto Regressive):
o In the AR-component we regress on the previous values of Y. So the equation is
Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + … + β_p Y_{t-p}. Since we are regressing the
dependent variable on its own past values, it is called auto regression.
o To solve for the AR-component, we need to know how many previous periods to
consider when forecasting the next period. This is the p-coordinate in ARIMA. To get the
p-coordinate, the PACF (Partial Auto Correlation Function) plot is drawn.
- I (Integration):
o Integration means the number of times the series was differenced to make it stationary. This
is the d-coordinate in ARIMA.
- MA (Moving Average):
o The MA-component handles the residual errors. It is written as
Y_t = β_0 + ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + … + β_q ε_{t-q}. The MA-component considers how
many previous errors need to be taken into account. This is the q-coordinate in ARIMA, and it
can be obtained from the ACF (Auto Correlation Function) plot.
o ACF Plot:
- ACF is the auto correlation between the original series and the lagged series.
ACF(1) is the correlation between the original series and the lag(1) series. ACF(2) is the
correlation between the original series and the lag(2) series.
- The ACF is calculated using acf() in R, and the resulting ACF plot is read
to get the value of q.
o PACF Plot:
- PACF is the partial auto correlation of the original series, which means the correlation
between the original and the lagged series after removing the effect of the intervening lags. So,
PACF(2) is the partial correlation between the original and lag(2) series after removing
the effect of lag(1). Similarly, PACF(3) is the correlation between the original
and lag(3) series after removing the effect of lag(1) and lag(2).
- The PACF is calculated using pacf() in R, and the resulting PACF
plot is read to get the value of p.
- Get the value of p from the PACF plot and the value of q from the ACF plot.
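
As a rough check on what acf() and pacf() return, the sketch below computes ACF(2) by hand and compares it with the function output. It assumes the log-differenced built-in `AirPassengers` series as example data; the variable names are hypothetical.

```r
# Example data (an assumption): log-differenced built-in series as plain numbers.
x <- as.numeric(diff(log(AirPassengers)))
n <- length(x)
m <- mean(x)

# acf() returns lag 0 first; pacf() starts at lag 1.
acf_values  <- acf(x, lag.max = 12, plot = FALSE)$acf[, 1, 1]
pacf_values <- pacf(x, lag.max = 12, plot = FALSE)$acf[, 1, 1]

# ACF(2) by hand: the original series against the series shifted by two periods.
acf2_manual <- sum((x[1:(n - 2)] - m) * (x[3:n] - m)) / sum((x - m)^2)
print(c(from_acf = acf_values[3], by_hand = acf2_manual))

# With no intervening lags to remove, PACF(1) equals ACF(1).
print(c(pacf_1 = pacf_values[1], acf_1 = acf_values[2]))
```

Calling acf(x) or pacf(x) without plot = FALSE draws the plot, with dashed lines marking the significance bounds.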

Steps to approach a Time series Data Set:

1. Import the data set.
2. Get the structure of the data set.
3. Observe the data set before EDA and see whether you are able to get any insights.
4. Use plots to express your observations from the previous step.
5. Draw the best-fit line.
6. Find the components present in the data set.
7. Check the stationarity of the data from the plot.
8. If the data is not stationary, then use a transformation function to make it stationary
(variance & mean should be constant). Use transformations to make the variance constant and use
differencing to make the mean constant.
9. Check the stationarity of the data statistically. Use the KPSS test.
10. Calculate the q-coordinate by using the ACF graph to solve the MA-component.
11. Calculate the p-coordinate by using the PACF graph to solve the AR-component.
12. Run the ARIMA model using the coordinates.
13. Predict the value, then do the anti-transformation.
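
Steps 12 and 13 could be sketched with base R's arima() as below. The orders p = 1, d = 1, q = 1 are placeholder assumptions chosen only to make the example run; in practice they would be read from the PACF plot, the differencing count and the ACF plot. The built-in `AirPassengers` series and the log transform are also assumptions.

```r
# Example data and transformation (assumptions for illustration).
x <- AirPassengers
log_x <- log(x)

# Placeholder orders; not derived from the plots in this sketch.
p <- 1
d <- 1
q <- 1

# Passing d = 1 lets arima() do the differencing internally,
# so the model is fitted on the log series rather than on diff(log_x).
fit <- arima(log_x, order = c(p, d, q))

# Forecast the next 12 months on the log scale.
log_forecast <- predict(fit, n.ahead = 12)$pred

# Anti-transformation: undo the log to get back to the original units.
forecast_original_scale <- exp(log_forecast)
print(forecast_original_scale)
```

Because differencing is handled inside arima(), only the log needs to be undone by hand at the end.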

15.03.2020
- To work on a time series, we need to change the class of the data set to ts (using the ts() function).
- The Moving Average component in ARIMA takes care of the correlation between the residuals,
and this is what makes ARIMA a powerful model.
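
Converting a plain vector to the ts class might look like this. The sales figures and the start date are invented for illustration.

```r
# Invented monthly figures, assumed to start in January 2018.
sales <- c(120, 135, 128, 140, 152, 149, 160, 158, 170, 165, 180, 190)

# frequency = 12 marks the data as monthly; start = c(year, period).
sales_ts <- ts(sales, start = c(2018, 1), frequency = 12)

print(class(sales_ts))
print(sales_ts)
```

For quarterly data the frequency would be 4, and for yearly data 1.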

Other Models (SES, HTM & HWSM):

- The function for SES, HTM & HWSM is the same in R; the only difference is in the
arguments.
- If the data set is already stationary, then the SES model will give better results than making it
stationary and running ARIMA.
- If the series basically exhibits an increasing / decreasing trend with minor ups and downs, then
Holt's Trend Model can be used.
- If the plot exhibits repeating seasonal ups and downs with no major trend, then the Holt-Winters
Seasonal Model can be used.
- The function has three parameters: α, β, γ (alpha, beta and gamma in R), and they are called
smoothing coefficients.
- The combination of α, β, γ tells the R function whether it is SES, HTM or HWSM. A coefficient that
is left unspecified is estimated from the data; a coefficient set to FALSE is switched off.
- If it is SES, then α is estimated, beta = FALSE, gamma = FALSE.
- If it is HTM, then α and β are estimated, gamma = FALSE.
- If it is HWSM, then α, β and γ are all estimated.
- The syntax of the function is, for SES, HoltWinters(Data_Set_with_TS_class, beta = FALSE,
gamma = FALSE). In the arguments of the HoltWinters() function, it is enough to specify only
which smoothing coefficients are FALSE.
- To check the correlation between the errors / residuals, there are two tests available in R:
o Durbin-Watson Test: This is used for causal models.
o Ljung-Box Test: This is used for non-causal models. If the p-value is greater than 0.05,
then the residuals are not correlated and the model is stable.
- If the p-value of the Ljung-Box test is less than 0.05, then there is enough evidence to say the
model is not stable, and it is better to switch to an alternative model irrespective of MAPE.
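
To see how the Ljung-Box p-value behaves, the sketch below runs Box.test() on two simulated series that stand in for residuals: one independent and one deliberately autocorrelated. The simulated data, the seed and the choice of lag = 10 are all assumptions made for illustration.

```r
set.seed(1)

# Stand-ins for model residuals (simulated, not from a fitted model).
independent <- rnorm(200)                                        # no correlation built in
correlated  <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))  # strong correlation

test_independent <- Box.test(independent, lag = 10, type = "Ljung-Box")
test_correlated  <- Box.test(correlated,  lag = 10, type = "Ljung-Box")

# A p-value below 0.05 signals correlated residuals (an unstable model).
print(c(independent = test_independent$p.value,
        correlated  = test_correlated$p.value))
```

The autocorrelated series produces a p-value far below 0.05, which is the pattern that would argue for switching models.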

Simple Exponential Smoothening (SES) Model:

- When the data is stationary then use SES.


- The residuals are stored in the forecast object and can be accessed as Forecast_Name$residuals.

Steps to approach SES models data set:

1. Import the data and convert the data type to TS.
2. Plot the data set.
3. If the plot shows that there is not much trend and the ups and downs are more or less constant, then
the data appears to be stationary; go for the SES model.
4. Run the HoltWinters(Data_Set_with_TS_class, beta = FALSE, gamma = FALSE) function.
5. Plot the model to check the pattern between the original data and the forecasted data.
6. Calculate the forecast with forecast(Model_Name, h = Time_Period).
7. Run the Ljung-Box test, Box.test(Forecast_Name$residuals, type = "Ljung-Box"), to check the
correlation between residuals (i.e. the stability of the model).
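
The SES steps can be sketched with base R only. This assumes the built-in yearly `Nile` series as a convenient example without a strong trend or seasonality, and it uses predict() and residuals() from base R in place of forecast() from the "forecast" library, so the object names differ from the syntax above.

```r
# Built-in yearly series, already of class "ts" (example data, an assumption).
x <- Nile

# SES: only alpha is estimated; trend and seasonality are switched off.
ses_model <- HoltWinters(x, beta = FALSE, gamma = FALSE)

# Forecast the next 5 periods (base R alternative to forecast()).
ses_forecast <- predict(ses_model, n.ahead = 5)

# Residuals of the fitted model, then the Ljung-Box test on them.
ses_residuals <- residuals(ses_model)
ljung_box <- Box.test(ses_residuals, lag = 10, type = "Ljung-Box")

print(ses_model$alpha)
print(ses_forecast)
print(ljung_box$p.value)
```

An SES forecast is flat: every future period gets the last smoothed level, which is why the model only suits series without trend or seasonality.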

Holt's Trend (HT) Model:

- If the data is not stationary and the trend is predominant, we can use Holt's Trend model.

Steps to approach HT models data set:

1. Import the data and convert the data type to TS.
2. Plot the data set.
3. If the plot shows that the data is not stationary and there is a predominant trend and less seasonality,
then use the HT model.
4. Run the HoltWinters(Data_Set_with_TS_class, gamma = FALSE) function.
5. Plot the model to check the pattern between the original data and the forecasted data.
6. Calculate the forecast with forecast(Model_Name, h = Time_Period).
7. Run the Ljung-Box test, Box.test(Forecast_Name$residuals, type = "Ljung-Box"), to check the
correlation between residuals (i.e. the stability of the model).
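
A minimal HT sketch, following the pattern of the non-seasonal example in R's own HoltWinters help page: the built-in `uspop` series with a little random noise added. The data set, the noise and the seed are assumptions, and predict() from base R stands in for forecast().

```r
set.seed(1)

# Built-in trending series plus small noise (example data, an assumption).
x <- uspop + rnorm(length(uspop), sd = 5)

# HT: alpha and beta are estimated; seasonality is switched off.
ht_model <- HoltWinters(x, gamma = FALSE)

# Forecast the next 3 periods.
ht_forecast <- predict(ht_model, n.ahead = 3)

print(c(alpha = unname(ht_model$alpha), beta = unname(ht_model$beta)))
print(ht_forecast)
```

Unlike SES, the HT forecast continues along a straight line, adding the last estimated trend at every step ahead.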

Holt-Winters Seasonal (HWS) Model:

- If the data has repeating seasonal ups and downs with no major trend, then the Holt-Winters
Seasonal Model can be used.

Steps to approach HWS models data set:

1. Import the data and convert the data type to TS.
2. Plot the data set.
3. If the plot shows that the data is not stationary and there is predominant seasonality with less trend,
then use the HWS model.
4. Run the HoltWinters(Data_Set_with_TS_class) function.
5. Plot the model to check the pattern between the original data and the forecasted data.
6. Calculate the forecast with forecast(Model_Name, h = Time_Period).
7. Run the Ljung-Box test, Box.test(Forecast_Name$residuals, type = "Ljung-Box"), to check the
correlation between residuals (i.e. the stability of the model).
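
A minimal HWS sketch, assuming the built-in monthly `co2` series as example data (the same series used in R's HoltWinters help page). As in the other sketches, predict() and residuals() from base R stand in for the "forecast" library.

```r
# Built-in monthly series with clear seasonality (example data, an assumption).
x <- co2

# HWS: all three smoothing coefficients are estimated.
hws_model <- HoltWinters(x)

# Forecast one full seasonal cycle (12 months) ahead.
hws_forecast <- predict(hws_model, n.ahead = 12)

# Ljung-Box test on the residuals to check model stability.
hws_residuals <- residuals(hws_model)
ljung_box <- Box.test(hws_residuals, lag = 10, type = "Ljung-Box")

print(c(alpha = unname(hws_model$alpha),
        beta  = unname(hws_model$beta),
        gamma = unname(hws_model$gamma)))
print(ljung_box$p.value)
```

The series must have a frequency greater than 1, otherwise HoltWinters() has no seasonal cycle to estimate.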

Decomposition of Time Series: The time series can be broken into three components: Trend, Seasonal / Cyclical,
and Irregular / Random. To do this in R, use plot(decompose(Data_Set)), where Data_Set is a ts
object with a frequency greater than 1.
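
The pieces returned by decompose() can also be inspected directly. The sketch below assumes the built-in `AirPassengers` series and the default additive decomposition, and checks that the three components add back up to the original series.

```r
# Example data (an assumption): built-in monthly series.
x <- AirPassengers

# Default is an additive decomposition: x = trend + seasonal + random.
parts <- decompose(x)

# Rebuild the series from its components. The trend is NA at both ends
# because it is estimated with a centred moving average.
rebuilt <- parts$trend + parts$seasonal + parts$random
complete <- !is.na(rebuilt)

print(head(parts$seasonal, 12))   # one full seasonal cycle
print(sum(!complete))             # number of periods without a trend estimate
```

For a series whose seasonal swings grow with the level, decompose(x, type = "multiplicative") may be the more suitable choice.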
