Time Series Forecast - A Basic Introduction Using Python

This document provides a summary of time series forecasting using Python. It introduces time series data and forecasting, and demonstrates how to load time series data into Pandas, check for stationarity, and make adjustments to make the data stationary. Specifically, it shows how to convert date strings to datetime indexes, check for trends and seasonality visually and using statistical tests, take the logarithm or moving average to remove trends, and subtract the smoothed data from the original to remove trends and make the time series stationary for forecasting. The goal is to prepare time series data for statistical forecasting models by removing non-stationarities like trends and seasonality.

Uploaded by

khongbich
Copyright
© © All Rights Reserved
2/21/2020 Time Series Forecast : A basic introduction using Python.

Time Series Forecast: A basic introduction using Python
Jacob_s
Nov 9, 2017 · 10 min read

Time series data is an important source of information and strategy in many
businesses. From conventional finance to education, it plays a major role in
understanding how specific factors behave over time. I recently learnt the importance
of time series data in the telecommunication industry and wanted to brush up on my
time series analysis and forecasting knowledge. So I decided to work through a simple
example using Python, and I have explained all the details in this blog.

Time series forecasting is essentially machine learning modeling applied to time series
data (years, days, hours, etc.) in order to predict future values. It helps when your
data is serially correlated.

Loading and Handling Time Series in Pandas.

I will be using Python in a Jupyter notebook. Pandas has functionality dedicated to
handling time series objects. You can check out the Pandas documentation for more
details.

Let’s start with the Preliminaries

import pandas as pd
import numpy as np
import matplotlib.pylab as plt
%matplotlib inline

To load the data: I have provided the link to my GitHub, where the dataset and the code
are available. I will be using the AirPassengers dataset. Are you ready?

https://medium.com/@stallonejacob/time-series-forecast-a-basic-introduction-using-python-414fcb963000 1/18

data = pd.read_csv('AirPassengers.csv')
print(data.head())
print('\n Data Types:')
print(data.dtypes)

Looking at the output, the data contains a month and the number of passengers
travelling in that month. The Month column has the data type object, so let’s convert
it into a time series object and use the Month column as our index.

# Parse the Month strings into datetimes and use them as the index
data['Month'] = pd.to_datetime(data['Month'])
data.set_index('Month', inplace=True)
# Check the datatype of the index
data.index

You can see that the data type is now ‘datetime64[ns]’. Now let’s turn it into a
series rather than a data frame (this makes the explanation in this blog easier).


# Convert to a time series:
ts = data['#Passengers']
ts.head(10)

Let’s explore the various properties of the date-time based index:
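The original outputs for this step did not survive, so here is a minimal sketch of what a date-time index allows. It assumes a synthetic monthly series (`ts` with made-up values 0 to 143) instead of the real AirPassengers data; only the indexing calls matter.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data):
# 144 monthly values starting in January 1949.
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
ts = pd.Series(np.arange(144, dtype=float), index=idx, name="#Passengers")

# 1. Select a single value by its date string.
first_value = ts.loc["1949-01-01"]

# 2. Select a range of dates. Unlike integer slicing, both end points are included.
first_five_months = ts.loc["1949-01-01":"1949-05-01"]

# 3. Select every observation of one year (partial string indexing).
year_1949 = ts.loc["1949"]

print(first_value)
print(first_five_months)
print(len(year_1949))
```

Note that the date-string slice includes its end point, which is different from slicing a list by position.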


STATIONARITY

This is a very important concept in time series analysis. In order to apply a time
series model, it is important for the time series to be stationary; in other words, its
statistical properties (mean, variance) remain constant over time. The reason is that
if the series shows a certain behavior over time, that behavior must stay the same in
the future for us to forecast the series. There is also far more statistical theory
for stationary series than for non-stationary series. (Thus we can bring the fight to
our home ground!)

In practice we can assume the series to be stationary if it has constant statistical
properties over time, and these properties can be:

• constant mean

• constant variance

• an auto co-variance that does not depend on time.

These details can be easily retrieved using stat commands in python.

The easiest first check for stationarity in a time series is eyeballing the plot:


It’s clear from the plot that there is an overall increasing trend, with some
seasonality in it.

I have written a function for it, as I will be using it quite often in this time series
explanation. But before we get to that, let me explain the concepts used in the function.

Plotting Rolling Statistics: The function will plot the moving mean and the moving
standard deviation. This is still a visual method.

NOTE: moving mean and moving standard deviation: at any instant ‘t’, we take the
mean/std of the last year (which in this case is 12 months).

Dickey-Fuller Test: This is one of the statistical tests for checking stationarity. We
start from the null hypothesis that the time series is non-stationary. The result of
the test contains the test statistic and critical values for different significance
levels (1%, 5% and 10%). If the test statistic is less than the critical value, we can
reject the null hypothesis and say that this time series is indeed stationary (the
force is strong with this one !!)

More details for Dickey fuller Test.

Function details:

mean

Standard deviation (instead of variance)

Plot original series

Plot mean

Plot std

Print Dickey-Fuller test results

from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Determine rolling statistics over a 12-month window
    # (pd.rolling_mean / pd.rolling_std were removed from pandas; use .rolling())
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()

    # Plot rolling statistics
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()

    # Perform Dickey-Fuller test (null hypothesis: the series is non-stationary)
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value',
                         '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

Now let’s pass our time series data into this function:

This is not stationary because:

• The mean is increasing, even though the std is small.

• The test statistic is greater than the critical value.

• Note: the signed values are compared, not the absolute values.
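To make the signed comparison concrete, here is a small sketch. The helper `rejects_non_stationarity` and the numbers passed to it are hypothetical and chosen only for illustration; they are not the output of the test above.

```python
def rejects_non_stationarity(test_statistic, critical_values, level="5%"):
    """Return True when the signed test statistic lies below the critical value."""
    return test_statistic < critical_values[level]


# Hypothetical critical values, keyed the same way adfuller reports them.
critical_values = {"1%": -3.48, "5%": -2.88, "10%": -2.58}

# A positive statistic is larger than every (negative) critical value,
# so the null hypothesis of non-stationarity cannot be rejected.
print(rejects_non_stationarity(0.82, critical_values))

# A statistic of -3.16 is below the 5% critical value but not below the 1% one.
print(rejects_non_stationarity(-3.16, critical_values))
print(rejects_non_stationarity(-3.16, critical_values, level="1%"))
```

Comparing absolute values would get the first case wrong whenever the statistic is a large positive number, which is why the sign matters.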

MAKING THE TIME SERIES STATIONARY



There are two major factors that make a time series non-stationary. They are:

• Trend: non-constant mean

• Seasonality: Variation at specific time-frames

The basic idea is to model the trend and seasonality in the series, so we can remove
them and make the series stationary. Then we can go ahead and apply statistical
forecasting to the stationary series. Finally, we can convert the forecasted values
back to the original scale by re-applying the trend and seasonality components that we
separated out earlier.

Let’s start by working on the trend piece.

Trend

The first step is to reduce the trend using a transformation, as we can see here that
there is a strong positive trend. The transformation can be log, square root, cube
root, etc. Basically, it penalizes larger values more than smaller ones. In this case
we will use the logarithmic transformation.
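A minimal sketch of the log transformation, assuming a synthetic series `ts` (trend plus a yearly cycle) in place of the real passenger counts. The name `ts_log` is used in the later sketches as well.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data).
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
ts = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)

# Log transform: large values are compressed much more than small ones.
ts_log = np.log(ts)

# Compare the spread before and after the transformation.
print(ts.max() - ts.min())
print(ts_log.max() - ts_log.min())
```

The transformation is reversible with `np.exp`, which is what the final step of the forecast relies on.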

There is some noise in the data that makes the forward trend harder to see. There are
some methods to model these trends and then remove them from the series. Some of the
common ones are:

• Smoothing: using rolling/moving average

• Aggregation: by taking the mean for a certain time period (year/month)

I will be using Smoothing here.

Smoothing:

In smoothing we usually take the past few instances (rolling estimates). We will
discuss two methods under smoothing: moving average and exponentially weighted moving
average.

Moving average -

First take x consecutive values, where x depends on the frequency: for monthly data
with a yearly pattern we take 12 values. Lucky for us, Pandas has a function for
rolling estimates (“alright alright alright” -Matthew McConaughey!)

Now subtract the rolling mean from the original series.
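A sketch of these two steps with the current Pandas API, assuming the same kind of synthetic series as a stand-in for the real data. The variable names (`moving_avg`, `ts_log_moving_avg_diff`) are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data).
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
ts = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)
ts_log = np.log(ts)

# 12-month rolling mean of the log series.
moving_avg = ts_log.rolling(window=12).mean()

# Subtract the rolling mean from the log series.
ts_log_moving_avg_diff = ts_log - moving_avg
print(ts_log_moving_avg_diff.head(12))

# Count the missing values, then drop them before testing for stationarity.
n_missing = int(ts_log_moving_avg_diff.isna().sum())
ts_log_moving_avg_diff = ts_log_moving_avg_diff.dropna()
print(n_missing, len(ts_log_moving_avg_diff))
```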


The first 11 values are null because the rolling mean needs 12 observations before it
produces a value. We can also see that in the visual representation. These values are
dropped for further analysis. Now let’s pass the result to the function to check for
stationarity.

We notice two things:

• The rolling values are varying slightly but there is no specific trend.

• The test statistic is smaller than the 5% critical value. That tells us that we are
95% confident that this series is stationary.

In this example we can easily pick a time period (12 months for a year), but there are
situations where the time period is harder to define, like stock prices. So we use the
exponentially weighted moving average (there are other weighted moving averages, but
for starters let’s use this one). Previous values are weighted by a decay factor, so
older observations count for less. Pandas again comes to the rescue with some awesome
functions for it, like:
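The original snippet is missing here, so this is a sketch using `Series.ewm`, the current Pandas interface for exponentially weighted statistics (older tutorials used `pd.ewma`, which has since been removed). The synthetic series is an assumption standing in for the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data).
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
ts = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)
ts_log = np.log(ts)

# Exponentially weighted moving average: a weight halves every 12 observations.
expweighted_avg = ts_log.ewm(halflife=12).mean()

# Remove the smoothed trend from the log series.
ts_log_ewma_diff = ts_log - expweighted_avg

# Every point has a value, because all observations from the start get a weight.
print(int(ts_log_ewma_diff.isna().sum()))
```

Unlike the rolling mean, this leaves no missing values at the start of the series, so nothing has to be dropped.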

The parameter (halflife) is assumed to be 12 here, but it really depends on the domain.
Let’s check stationarity now:


It is stationary because:

• The rolling mean and standard deviation vary less in magnitude.

• The test statistic is smaller than the 1% critical value. So we can say we are almost
99% confident that this is stationary.

Seasonality (along with Trend)

Previously we looked only at the trend part of the time series; now we will handle
both trend and seasonality. Most time series have trends along with seasonality. There
are two common methods to remove trend and seasonality:

• Differencing: by taking difference using time lag

• Decomposition: model both trend and seasonality, then remove them

Differencing:

Here we take the difference between the value at a particular time and the value at
the previous time. Now let’s do it in Pandas.
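A minimal sketch of first-order differencing, again on a synthetic stand-in series. The name `ts_log_diff` is illustrative and is reused in the later sketches.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data).
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
ts = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)
ts_log = np.log(ts)

# Difference each value with the previous one (a time lag of 1).
ts_log_diff = ts_log - ts_log.shift()

# The first value has no predecessor, so it is NaN and gets dropped.
ts_log_diff = ts_log_diff.dropna()
print(ts_log_diff.head())
```

`ts_log.diff()` is an equivalent shortcut for `ts_log - ts_log.shift()`.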

Looks ok to me, but let’s pass it to our stationarity testing function.


It is stationary because:

• The mean and std show only small variations with time.

• The test statistic is less than the 10% critical value, so we can be 90% confident
that this is stationary.

Decomposing:

Here we model both the trend and the seasonality, and the remaining part of the time
series is returned. Guess what? Yup, we have an awesome function for it. Let’s check
it out:


After removing the trend and seasonality from the time series, we can use the residual
values. Let’s check stationarity.

This is stationary because:

• The test statistic is lower than the 1% critical value.

• The mean and std show only small variations with time.

Forecasting a Time Series

Now that we have made the time series stationary, let’s build models on the
differenced series, because it is easy to add the error, trend and seasonality back
into the predicted values.

We will use a statistical modelling method called ARIMA to forecast data where there
are dependencies between the values.


Auto Regressive Integrated Moving Average (ARIMA): It is like a linear regression
equation where the predictors depend on the parameters (p,d,q) of the ARIMA model.

Let me explain these parameters:

• p: This is the number of AR (Auto-Regressive) terms. Example: if p is 3, the
predictors for y(t) will be y(t-1), y(t-2), y(t-3).

• q: This is the number of MA (Moving-Average) terms, which are lagged forecast errors.
Example: if q is 3, the predictors for y(t) will be e(t-1), e(t-2), e(t-3), where e(i)
is the forecast error at instant i.

• d: This is the number of non-seasonal differences.

Now let’s check out how we can figure out which values of p and q to use. We use two
popular plotting techniques:

• Autocorrelation Function (ACF): It measures the correlation between the time series
and a lagged version of itself. For example, at lag 4, ACF compares the series at time
instants t1…t2 with the series at instants t1-4…t2-4.

• Partial Autocorrelation Function (PACF): It measures the degree of association
between y(t) and y(t-p) after removing the effect of the lags in between.


The two dotted lines on either side of 0 are the confidence intervals. These can be
used to determine the ‘p’ and ‘q’ values as follows:

• p: The lag at which the PACF first crosses the upper confidence interval. Here it is
close to 2, hence p = 2.

• q: The lag at which the ACF first crosses the upper confidence interval. Here it is
close to 2, hence q = 2.

Now, using these values, let’s make 3 different ARIMA models, considering the
individual as well as the combined effects. I will also print the RSS for each. Please
note that the RSS here is computed on the residuals and not on the actual series.

AR Model

MA Model

ARIMA MODEL

RSS values:

• AR=1.5023

• MA=1.472

• ARIMA =1.0292

ARIMA has the lowest (best) RSS value.

FINAL STEP: BRINGING THIS BACK TO THE ORIGINAL SCALE

Steps involved:

• First get the predicted values and store them as a series. You will notice the first
month is missing because we took a lag of 1 (shift).

• Now convert the differences back to the log scale: take the cumulative sum and add it
to a new series filled with a base value (here the first-month value of the log series).

• Next, take the exponent of the series from above (anti-log), which gives the
predicted values: the time series forecast.

Now plot the predicted values with the original.

• Find the RMSE
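The steps above can be sketched as follows. Since no fitted model is available here, `predictions_diff` is a hypothetical stand-in for the model output: the true differences of a synthetic log series plus a little noise.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger series (an assumption, not the real data).
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
t = np.arange(144)
ts = pd.Series(100 + 2.5 * t + 20 * np.sin(2 * np.pi * t / 12), index=idx)
ts_log = np.log(ts)

# Hypothetical predictions on the differenced log scale (first month is missing).
rng = np.random.default_rng(0)
predictions_diff = ts_log.diff().dropna() + rng.normal(0, 0.01, len(ts) - 1)

# Step 1: cumulative sum turns differences into offsets from the first value.
predictions_diff_cumsum = predictions_diff.cumsum()

# Step 2: start from the first log value and add the offsets (first month gets 0).
predictions_log = pd.Series(ts_log.iloc[0], index=ts_log.index)
predictions_log = predictions_log.add(predictions_diff_cumsum, fill_value=0)

# Step 3: anti-log to get back to the original scale.
predictions = np.exp(predictions_log)

# Root mean squared error against the original series.
rmse = float(np.sqrt(((predictions - ts) ** 2).mean()))
print(rmse)
```

With exact differences instead of noisy ones, the same steps reproduce the original series, which is a handy sanity check for the back-transformation.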


The result can be further refined to get a better model. The scope of this blog was to
quickly introduce time series forecasting. Hope you guys enjoyed the blog; there are a
lot more details with respect to time series analysis and forecasting. This is a good
place to understand the theory behind the practical techniques.
