Time Series Forecast - A Basic Introduction Using Python
Time series data is an important source of information and strategy for many businesses. From the conventional finance industry to the education industry, it plays a major role in understanding how specific factors change with respect to time. I recently learnt the importance of time series data in the telecommunication industry and wanted to brush up on my time series analysis and forecasting knowledge. So I decided to work through a simple example using Python, and I have explained all the details in this blog.
Time series forecasting is basically machine learning modeling of time series data (years, days, hours, etc.) to predict future values. It is most helpful when your data is serially correlated, i.e. each value depends on the values before it.
I will be using Python in a Jupyter notebook. Pandas has functionality specific to handling time series objects; you can check out the pandas time series documentation for more details.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
To load the data: I have provided a link to my GitHub, where the dataset and the code are available. I will be using the AirPassengers dataset. Are you ready?
https://medium.com/@stallonejacob/time-series-forecast-a-basic-introduction-using-python-414fcb963000 1/18
2/21/2020 Time Series Forecast : A basic introduction using Python.
data = pd.read_csv('AirPassengers.csv')
print(data.head())
print('\n Data Types:')
print(data.dtypes)
The data contains a month and the number of passengers travelling in that month. The data type of the Month column is object (plain strings). Let’s convert it into a time series object and use the Month column as our index.
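A minimal sketch of this conversion, assuming the CSV has the columns Month (strings such as 1949-01) and #Passengers. The three rows below are a hypothetical stand-in, read through io.StringIO so the snippet runs without the file; with the real data you would pass 'AirPassengers.csv' instead.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt in the assumed AirPassengers.csv layout
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
"""

# parse_dates turns the 'Month' strings (dtype object) into datetimes,
# and index_col makes that column the index of the DataFrame
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")

print(data.index.dtype)  # a datetime64 dtype instead of object
print(data.head())
```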
You can see that the data type is now ‘datetime64[ns]’. Now let’s turn it into a series rather than a data frame (this makes the explanation in this blog easier).
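A minimal sketch of the DataFrame-to-Series step, reusing the same hypothetical three-row stand-in and the assumed column name #Passengers. Once the index is a DatetimeIndex, pandas lets you select values by date, including whole years by a partial date string.

```python
import io

import pandas as pd

# Same hypothetical stand-in for AirPassengers.csv as before
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
"""
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")

# Selecting the single value column gives a Series indexed by date
ts = data["#Passengers"]

# Lookup of one month by timestamp, and of a whole year by partial-string indexing
print(ts.loc[pd.Timestamp("1949-02-01")])
print(ts.loc["1949"])
```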
STATIONARITY
This is a very important concept in time series analysis. In order to apply a time series model, the time series needs to be stationary; in other words, its statistical properties (mean, variance) remain constant over time. This matters because if we model a certain behavior over time, that behavior has to stay the same in the future for us to forecast the series. Also, there is far more statistical theory for stationary series than for non-stationary ones. (Thus we can bring the fight to our home ground!) In short, a stationary series has a:
• constant mean
• constant variance
The simplest way to get a feel for stationarity in a time series is by eyeballing the plot:
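A minimal plotting sketch, assuming a synthetic monthly series as a stand-in for the passenger data (exponential growth plus a yearly seasonal swing and noise; not the real numbers). The Agg backend line only makes the snippet run outside a notebook; in Jupyter, drop it and rely on %matplotlib inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data):
# 144 months of exponential growth with a yearly seasonal swing and some noise
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)

fig, ax = plt.subplots(figsize=(10, 4))
ts.plot(ax=ax)
ax.set_title("Monthly passengers (synthetic stand-in)")
ax.set_ylabel("Passengers")
fig.savefig("passengers.png")  # write the figure to disk since Agg cannot display it
```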
It’s clear from the plot that there is an overall increasing trend, with some seasonality in it.
I have written a function for this, as I will be using it quite often in this time series explanation. But before we get to that, let me explain the concepts in the function.
Plotting Rolling Statistics: The function plots the moving mean and the moving standard deviation. This is still a visual method.
NOTE: moving mean and moving standard deviation: at any instant ‘t’, we take the mean/std of the last year, which in this case is 12 months.
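A minimal sketch of these rolling statistics with pandas, using a synthetic monthly series as a stand-in for the passenger data (an assumption; the real data would come from the CSV). With window=12, each value is computed from the current month and the 11 before it, so the first 11 entries are NaN.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data)
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)

# 12-month moving mean and moving standard deviation
rolmean = ts.rolling(window=12).mean()
rolstd = ts.rolling(window=12).std()

print(rolmean.head(13))  # first 11 values are NaN, the 12th is the mean of months 1-12
```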
Dickey-Fuller Test: This is one of the statistical tests for checking stationarity. Here the null hypothesis is that the time series is non-stationary. The result of the test contains the test statistic and critical values for different significance levels (1%, 5% and 10%). If the test statistic is less than the critical value, we can reject the null hypothesis and say that this time series is indeed stationary (the force is strong with this one!!)
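Function details: the function determines the rolling statistics (mean and std), plots the original series together with the rolling mean and the rolling std, and then runs the Dickey-Fuller test.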
Now let’s pass our time series data into this function:
• Note: the signed values are compared, not the absolute values.
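There are two major factors that make a time series non-stationary. They are:
• Trend: the mean changes over time.
• Seasonality: patterns that repeat at a fixed period (here, every 12 months).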
The basic idea is to model the trend and seasonality in this series, so we can remove them and make the series stationary. Then we can go ahead and apply statistical forecasting to the stationary series. Finally, we convert the forecasted values back to the original scale by adding back the trend and seasonality components that we previously separated.
Trend
The first step is to reduce the trend using a transformation, since we can see a strong positive trend here. The transformation can be log, square root, cube root, etc. Basically, it penalizes larger values more than smaller ones. In this case we will use the logarithmic transformation.
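A minimal sketch of the log transform with NumPy, on a synthetic stand-in series (assumption: the real passenger data is not loaded here). The transform is invertible with np.exp, which is what allows converting forecasts back to passenger counts later.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data)
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)

# Log transform: compresses large values more than small ones
ts_log = np.log(ts)

print(f"range before: {ts.min():.1f} to {ts.max():.1f}")
print(f"range after:  {ts_log.min():.3f} to {ts_log.max():.3f}")
```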
The forward trend is still obscured by some noise here. There are several methods to model this trend and then remove it from the series. Some of the common ones are:
Smoothing:
In smoothing we usually take the past few instances (rolling estimates). We will discuss two methods under smoothing: moving average and exponentially weighted moving average.
Moving average -
First take x consecutive values, where x depends on the frequency: for a period of 1 year on monthly data, we take 12 values. Lucky for us, Pandas has a function for rolling estimates (“alright alright alright” - Matthew McConaughey!)
There are null values because each average needs the previous 12 values, so the first 11 values are null. We can also see that in the visual representation. These null values are therefore dropped before further analysis. Now let’s pass it to the function to check for stationarity.
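A minimal sketch of the moving-average step on a synthetic stand-in for the log series. One assumption here: the trend estimate is removed by subtracting the 12-month rolling mean from the log series, which is a common way to use the rolling mean for detrending. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data)
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)
ts_log = np.log(ts)

# 12-month rolling mean of the log series: the first 11 values are NaN
moving_avg = ts_log.rolling(window=12).mean()

# Remove the trend estimate, then drop the 11 NaN rows
ts_log_moving_avg_diff = (ts_log - moving_avg).dropna()

print(moving_avg.isna().sum())       # 11
print(len(ts_log_moving_avg_diff))   # 133
```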
• The rolling values vary slightly, but there is no specific trend.
• The test statistic is smaller than the 5% critical value, so we can reject the null hypothesis at the 5% significance level; in other words, we are 95% confident that this series is stationary.
In this example we can easily pick a time period (12 months for a year), but in situations like stock prices the right period is much harder to define. So we use the exponentially weighted moving average (there are other weighted moving averages, but for starters, let’s use this one), where past values are assigned exponentially decaying weights. Pandas again comes to the rescue with some awesome functions for it.
The halflife parameter is set to 12 here, but the right value really depends on the domain. Let’s check stationarity now:
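A minimal sketch of the exponentially weighted moving average in current pandas, where it is Series.ewm(...).mean() (older pandas releases had a pd.ewma function that has since been removed). The series is a synthetic stand-in. With halflife=12, an observation 12 months old gets half the weight of the current one; unlike the simple rolling mean, no values are lost at the start.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data)
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)
ts_log = np.log(ts)

# Exponentially weighted mean: weights decay by half every 12 months
expweighted_avg = ts_log.ewm(halflife=12).mean()

# Subtract it to remove the trend; nothing needs to be dropped here
ts_log_ewma_diff = ts_log - expweighted_avg

print(ts_log_ewma_diff.head())
```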
It is stationary because:
• The rolling mean and standard deviation show smaller variations in magnitude.
• The test statistic is smaller than the 1% critical value, so we can say we are almost 99% confident that this is stationary.
Previously we looked at just the trend part of the time series; now we will look at both trend and seasonality. Most time series have trends along with seasonality. There are two common methods to remove trend and seasonality:
Differencing:
Here we take the difference between the value at a particular time and the value at the previous time. Now let’s do it in Pandas.
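A minimal sketch of first-order differencing in pandas on a synthetic stand-in for the log series. Subtracting the shifted series and calling Series.diff() give the same result; the first value has no predecessor and becomes NaN.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (assumption, not the real data)
rng = np.random.default_rng(0)
t = np.arange(144)
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(
    np.exp(4.7 + 0.01 * t + 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.03, 144)),
    index=idx,
)
ts_log = np.log(ts)

# Value at time t minus value at time t-1 (a lag of 1)
ts_log_diff = ts_log - ts_log.shift()

# The first month has nothing to subtract from, so drop the NaN
ts_log_diff = ts_log_diff.dropna()

print(ts_log_diff.head())
```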
It is stationary because:
• the mean and std show only small variations with time.
• the test statistic is less than the 10% critical value, so we can be 90% confident that this is stationary.
Decomposing:
Here we model both the trend and the seasonality, and the remaining part of the time series is returned. Guess what? Yup, there is an awesome function for it. Let’s check it out:
We remove the trend and seasonality from the time series and can now use the residual values. Let’s check stationarity:
• the mean and std show only small variations with time.
Now that we have made the time series stationary, let’s build models on it. We will use the differenced series, because it is easy to add the error, trend and seasonality back into the predicted values.
We will use a statistical modelling method called ARIMA (AutoRegressive Integrated Moving Average) to forecast data where there are dependencies between the values.
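For reference, the standard ARIMA(p, d, q) model with one order of differencing (d = 1, matching the differenced log series used here) can be written as below. The symbols are introduced only for this sketch: y_t is the log passenger count, c a constant, φ_i the p autoregressive (AR) coefficients, θ_j the q moving-average (MA) coefficients, and ε_t white-noise errors. The p and q chosen next are the number of terms in each sum.

```latex
y'_t = y_t - y_{t-1}

y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```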
Now let’s figure out which values of p and q to use. We use two popular plotting techniques:
• Autocorrelation Function (ACF)
• Partial Autocorrelation Function (PACF)
The two dotted lines on either side of 0 are the confidence intervals. These can be used to determine the ‘p’ and ‘q’ values as follows:
• p: the lag at which the PACF first crosses the upper confidence interval; here it is close to 2, hence p = 2.
• q: the lag at which the ACF first crosses the upper confidence interval; here it is close to 2, hence q = 2.
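The reading-off rule can also be sketched in code. This is one simple interpretation, under stated assumptions: last_significant_lag is a hypothetical helper that counts how many consecutive lags, starting at lag 1, stay above the upper bound 1.96/sqrt(N), i.e. the last lag before the curve first crosses below the upper confidence interval. The example PACF values are made up for illustration.

```python
import numpy as np


def last_significant_lag(values, n_obs):
    """Hypothetical helper: values[k] is the (P)ACF at lag k, values[0] == 1.
    Returns the last lag before the curve first drops below the upper
    95% bound 1.96 / sqrt(n_obs)."""
    upper = 1.96 / np.sqrt(n_obs)
    for lag in range(1, len(values)):
        if values[lag] < upper:
            return lag - 1
    return len(values) - 1


# Made-up PACF values (index = lag), for illustration only
example_pacf = np.array([1.0, 0.52, 0.31, 0.05, -0.02])
print(last_significant_lag(example_pacf, n_obs=143))  # 2
```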
Now, using these values, build 3 different ARIMA models, considering the individual (AR, MA) as well as the combined effects. I will also print the RSS (residual sum of squares) for each. Please note that here the RSS is computed on the differenced (stationary) values and not on the actual series.
AR Model
MA Model
ARIMA MODEL
RSS values:
• AR = 1.5023
• MA = 1.472
• ARIMA = 1.0292
Steps involved:
• First, get the predicted values and store them as a series. You will notice the first month is missing because we took a lag of 1 (shift).
• Now convert the differences back to the log scale: take the cumulative sum and add it to a new series with a base value (here the first-month value of the log series).
• Next, take the exponent of the series from above (anti-log), which gives the predicted values on the original scale: the time series forecast model.
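A minimal sketch of these three steps in pandas, on six illustrative monthly values. As an assumption that makes the round trip checkable, the "predictions" on the differenced log scale are simply the true differences; with real model output you would use the fitted values instead, and the result would only approximate the series.

```python
import numpy as np
import pandas as pd

# Six illustrative monthly values (stand-in data)
idx = pd.date_range("1949-01-01", periods=6, freq="MS")
ts = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0], index=idx)
ts_log = np.log(ts)

# Step 1: predictions on the differenced log scale (hypothetical: the true
# differences). The first month is missing because of the lag of 1.
predictions_diff = ts_log.diff().iloc[1:]

# Step 2: cumulative sum, added to a base series holding the first log value
predictions_diff_cumsum = predictions_diff.cumsum()
predictions_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_log = predictions_log.add(predictions_diff_cumsum, fill_value=0)

# Step 3: anti-log back to passenger counts
predictions = np.exp(predictions_log)

rmse = np.sqrt(((predictions - ts) ** 2).mean())
print(predictions.round(1))
print(f"RMSE: {rmse:.4f}")
```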
The result can be further refined to get a better model. The scope of this blog was to quickly introduce time series forecasting. Hope you guys enjoyed the blog; there are a lot more details with respect to time series analysis and forecasting. This is a good place to start understanding the theory behind the practical techniques.