An End-To-End Project On Time Series Analysis and Forecasting With Python
Time series analysis comprises methods for analyzing time series data
in order to extract meaningful statistics and other characteristics of the
data. Time series forecasting is the use of a model to predict future
values based on previously observed values.
Time series are widely used for non-stationary data such as economic
indicators, weather, stock prices, and retail sales. In this post we
demonstrate different approaches to forecasting a retail sales time
series. Let’s get started!
https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b 1/23
27/04/2019 An End-to-End Project on Time Series Analysis and Forecasting with Python
The Data
We are using Superstore sales data that can be downloaded from here.
import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'
The Superstore sales data contains several categories. We start with
time series analysis and forecasting for furniture sales.
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
Data Preprocessing
This step includes removing columns we do not need, checking for
missing values, aggregating sales by date and so on.
'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
Figure 1
Figure 2
Our current datetime data can be tricky to work with. Therefore, we
will use the average daily sales value for each month instead, with the
start of each month as the timestamp.
y = furniture['Sales'].resample('MS').mean()
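The aggregation and indexing steps that lead up to this resample call do not appear in the text above, so here is a minimal sketch of the whole chain on made-up data. The `orders` frame and its values are hypothetical; only the column names 'Order Date' and 'Sales' and the 'MS' frequency come from this post.

```python
import pandas as pd

# Hypothetical order lines: several sales per day, on irregular dates.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-03", "2017-01-03", "2017-01-20", "2017-02-05", "2017-02-11"]
    ),
    "Sales": [100.0, 50.0, 250.0, 80.0, 120.0],
})

# 1. Sum all order lines that share a date, giving one row per day.
daily = orders.groupby("Order Date")["Sales"].sum().reset_index()

# 2. resample() needs a DatetimeIndex, so index the frame by date.
daily = daily.set_index("Order Date")

# 3. Average the daily totals within each month (only days that had
#    at least one order are counted); 'MS' labels each bucket with the
#    first day of its month.
monthly = daily["Sales"].resample("MS").mean()
print(monthly)
```

Note that the monthly value is the mean over days that had sales, not over every calendar day, which is worth keeping in mind when reading the RMSE later on.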
y['2017':]
Figure 3
y.plot(figsize=(15, 6))
plt.show()
Figure 4
Some distinguishable patterns appear when we plot the data. The time
series has a seasonal pattern: sales are always low at the beginning of
the year and high at the end of the year. There is always an upward
trend within any single year, with a couple of low months in the middle
of the year.
decomposition = sm.tsa.seasonal_decompose(y,
model='additive')
fig = decomposition.plot()
plt.show()
Figure 5
The plot above clearly shows that furniture sales are unstable and
have an obvious seasonality.
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in
list(itertools.product(p, d, q))]
Figure 6
This step is parameter selection for our furniture sales ARIMA time
series model. Our goal here is to use a “grid search” to find the set
of parameters that yields the best performance for our model.
seasonal_order=param_seasonal,
enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
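The fragment above is only the tail of the search loop. Below is a minimal sketch of how such a grid search could be written in full. It assumes that "best performance" means the lowest AIC, which this text does not state, and the helper names `sarimax_aic` and `grid_search` are hypothetical. The SARIMAX arguments mirror the ones used in this post.

```python
import itertools
import math


def sarimax_aic(y, order, seasonal_order):
    """Fit one SARIMAX candidate and return its AIC (lower is better)."""
    import statsmodels.api as sm  # imported lazily; only needed for real fits

    model = sm.tsa.statespace.SARIMAX(
        y,
        order=order,
        seasonal_order=seasonal_order,
        enforce_stationarity=False,
        enforce_invertibility=False,
    )
    return model.fit(disp=False).aic


def grid_search(y, score=sarimax_aic, period=12):
    """Return (best_score, order, seasonal_order) over all 0/1 combinations."""
    p = d = q = range(0, 2)
    pdq = list(itertools.product(p, d, q))
    seasonal_pdq = [(a, b, c, period) for a, b, c in pdq]

    best = None
    for order, seasonal_order in itertools.product(pdq, seasonal_pdq):
        try:
            value = score(y, order, seasonal_order)
        except Exception:
            # Some combinations fail to fit; skip them rather than abort.
            continue
        if math.isnan(value):
            # A NaN score cannot be ranked, so ignore that candidate.
            continue
        if best is None or value < best[0]:
            best = (value, order, seasonal_order)
    return best
```

With the monthly series from earlier, a call such as `best_aic, order, seasonal_order = grid_search(y)` would try all 64 combinations. The `score` argument is there so a different criterion can be swapped in without touching the loop.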
Figure 7
mod = sm.tsa.statespace.SARIMAX(y,
order=(1, 1, 1),
seasonal_order=(1, 1, 0,
12),
enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])
Figure 8
results.plot_diagnostics(figsize=(16, 8))
plt.show()
Figure 9
Validating forecasts
To help us understand the accuracy of our forecasts, we compare
predicted sales to real sales of the time series, and we set forecasts to
start at 2017-01-01, as the start argument in the code shows.
pred = results.get_prediction(start=pd.to_datetime('2017-01-01'),
                              dynamic=False)
pred_ci = pred.conf_int()
ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast',
                         alpha=.7, figsize=(14, 7))
ax.fill_between(pred_ci.index,
pred_ci.iloc[:, 0],
pred_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()
plt.show()
Figure 10
The line plot shows the observed values compared to the rolling
forecast predictions. Overall, our forecasts align with the true values
very well, showing an upward trend from the beginning of the year and
capturing the seasonality toward the end of the year.
y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]
Root Mean Square Error (RMSE) tells us that our model was able to
forecast the average daily furniture sales in the test set within 151.64 of
the real sales. Our furniture daily sales range from around 400 to over
1200. In my opinion, this is a pretty good model so far.
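The code that computes the error from y_forecasted and y_truth does not appear in the text above. As a rough sketch, RMSE is the square root of the mean of the squared differences between forecast and truth. The function name `rmse` and the numbers below are made up for illustration and are not the Superstore results.

```python
import math


def rmse(y_forecasted, y_truth):
    """Root mean square error between two equal-length sequences."""
    if len(y_forecasted) != len(y_truth):
        raise ValueError("forecast and truth must have the same length")
    if len(y_truth) == 0:
        raise ValueError("need at least one observation")
    # Mean of the squared forecast errors, then the square root.
    mse = sum((f - t) ** 2 for f, t in zip(y_forecasted, y_truth)) / len(y_truth)
    return math.sqrt(mse)


# Made-up monthly values: the errors are 10, 20 and -30.
print(round(rmse([410.0, 520.0, 610.0], [400.0, 500.0, 640.0]), 2))  # 21.6
```

Because the error is squared before averaging, one badly missed month weighs more than several small misses, and the result is in the same units as the sales figures.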
pred_uc = results.get_forecast(steps=100)
pred_ci = pred_uc.conf_int()
plt.legend()
plt.show()
Figure 11
The above time series analysis for furniture makes me curious about
other categories and how they compare with each other over time.
Therefore, we are going to compare the time series of furniture and
office supplies.
Data Exploration
We are going to compare the two categories’ sales over the same time
period. This means combining the two data frames into one and plotting
both categories’ time series in one plot.
y_furniture = furniture['Sales'].resample('MS').mean()
y_office = office['Sales'].resample('MS').mean()
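The step that builds the combined `store` frame used in the plotting code is not shown in this text. A minimal sketch of one way to do it, assuming an inner merge on the date column, could look like this. The sample values are made up; the column names 'Order Date', 'furniture_sales' and 'office_sales' are the ones the plotting code expects.

```python
import pandas as pd

# Hypothetical monthly series standing in for y_furniture and y_office.
idx = pd.date_range("2014-01-01", periods=3, freq="MS")
y_furniture = pd.Series([480.0, 367.0, 857.0], index=idx)
y_office = pd.Series([285.0, 63.0, 391.0], index=idx)

# Turn each series back into a two-column frame with a distinct name
# for its sales column.
furniture = pd.DataFrame({"Order Date": y_furniture.index,
                          "furniture_sales": y_furniture.values})
office = pd.DataFrame({"Order Date": y_office.index,
                       "office_sales": y_office.values})

# An inner merge keeps only the months present in both categories.
store = furniture.merge(office, how="inner", on="Order Date")
print(store)
```

Renaming the sales columns before merging avoids the automatic `_x` and `_y` suffixes pandas would otherwise add to the two clashing 'Sales' columns.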
Figure 12
plt.figure(figsize=(20, 8))
plt.plot(store['Order Date'], store['furniture_sales'], 'b-',
         label = 'furniture')
plt.plot(store['Order Date'], store['office_sales'], 'r-',
         label = 'office supplies')
plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Sales of Furniture and Office Supplies')
plt.legend();
Figure 13
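# .ix was removed from pandas; select the first date on which office
# supplies outsold furniture with a boolean mask instead.
first_date = store.loc[store['office_sales'] >
                       store['furniture_sales'], 'Order Date'].iloc[0]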
furniture_forecast = furniture_model.make_future_dataframe(
    periods=36, freq='MS')
furniture_forecast = furniture_model.predict(furniture_forecast)
office_forecast = office_model.make_future_dataframe(
    periods=36, freq='MS')
office_forecast = office_model.predict(office_forecast)
plt.figure(figsize=(18, 6))
furniture_model.plot(furniture_forecast, xlabel = 'Date',
ylabel = 'Sales')
plt.title('Furniture Sales');
Figure 14
plt.figure(figsize=(18, 6))
office_model.plot(office_forecast, xlabel = 'Date', ylabel =
'Sales')
plt.title('Office Supplies Sales');
Figure 15
Compare Forecasts
We already have forecasts for these two categories three years into
the future. We will now join them together to compare their future
forecasts.
merge_furniture_forecast = furniture_forecast.copy()
merge_office_forecast = office_forecast.copy()
merge_furniture_forecast.columns = furniture_names
merge_office_forecast.columns = office_names
forecast = pd.merge(merge_furniture_forecast,
merge_office_forecast, how = 'inner', left_on =
'furniture_ds', right_on = 'office_ds')
forecast = forecast.rename(columns={'furniture_ds':
'Date'}).drop('office_ds', axis=1)
forecast.head()
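The lists furniture_names and office_names are used in the code above, but their definitions do not appear in this text. One plausible way to build them, assuming each name is simply the original column name with a category prefix (which matches 'furniture_ds', 'furniture_trend' and 'office_yhat' as used in this post), is sketched here on two small stand-in frames with made-up values.

```python
import pandas as pd

# Hypothetical stand-ins for the two forecast frames; real ones have
# many more columns, but the naming idea is the same.
furniture_forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2018-01-01", "2018-02-01"]),
    "trend": [700.0, 705.0],
    "yhat": [650.0, 690.0],
})
office_forecast = pd.DataFrame({
    "ds": pd.to_datetime(["2018-01-01", "2018-02-01"]),
    "trend": [500.0, 510.0],
    "yhat": [480.0, 520.0],
})

# Prefix every column so the two categories stay distinguishable
# once the frames are merged side by side.
furniture_names = ["furniture_%s" % column for column in furniture_forecast.columns]
office_names = ["office_%s" % column for column in office_forecast.columns]
print(furniture_names)  # ['furniture_ds', 'furniture_trend', 'furniture_yhat']
```

Assigning these lists to the `columns` attribute of the copied frames, as the code above does, then gives the merge distinct key columns to join on.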
Figure 16
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_trend'], 'b-',
         label='furniture')
plt.plot(forecast['Date'], forecast['office_trend'], 'r-',
         label='office supplies')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Sales Trend');
Figure 17
plt.figure(figsize=(10, 7))
plt.plot(forecast['Date'], forecast['furniture_yhat'], 'b-',
         label='furniture')
plt.plot(forecast['Date'], forecast['office_yhat'], 'r-',
         label='office supplies')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Estimate');
Figure 18
furniture_model.plot_components(furniture_forecast);
Figure 19
office_model.plot_components(office_forecast);
Figure 20
It is good to see that sales for both furniture and office supplies
have been increasing linearly over time and are forecast to keep
growing, although office supplies’ growth seems slightly stronger.
The worst month for furniture is April, and the worst month for office
supplies is February. The best month for furniture is December, and the
best month for office supplies is October.
There are many more time series analyses we can explore from here,
such as forecasting with uncertainty bounds, change point and anomaly
detection, and forecasting time series with external data sources. We
have only just started.