
Time Series Using Python

Time series

Uploaded by

graduation
Time Series using Python

What is a time series analysis and what are the benefits?
• A time series analysis focuses on a series of data points ordered in
time. This is one of the most widely used data science analyses and is
applied in a variety of industries.
• The dataset contains two years of historical daily sales data for a global
retail widget company, covering the date range from 2017 to 2019.
• Using the ‘pandas’ package, I took some preparation steps with the
dummy dataset so that it’s slightly cleaner than most real-life datasets.
• I aggregated the daily data into weeks before starting the analysis.
• Index your data with time so that your rows are identified by a date
rather than just a standard integer. Since the data is weekly, the values
in the first column are in YYYY-MM-DD date format, one per week.
With the resample('W') call used here, pandas labels each Monday-to-Sunday
week with the Sunday that ends it.
Data Import and Aggregation

import pandas as pd

# Import the data
df = pd.read_csv("Blog_Orders.csv")
df['Date'] = pd.to_datetime(df['Date'])

# Set the date as index (sorted, so that date slicing is well defined)
df = df.set_index('Date').sort_index()

# Select the proper time period for weekly aggregation
# ('W' bins run Monday to Sunday and are labelled with the Sunday)
df = df.loc['2017-01-02':'2019-12-29'].resample('W').sum()
df.head()
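The weekly aggregation can be tried without the CSV file. The following is a minimal sketch on synthetic data: the `daily` series and its values are made up for illustration and are not taken from Blog_Orders.csv. It shows how pandas labels weekly bins. The default 'W' rule labels each Monday-to-Sunday week with its Sunday, while 'W-MON' combined with closed='left' and label='left' labels the same weeks with their Monday.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: two full weeks starting Monday 2017-01-02
dates = pd.date_range("2017-01-02", "2017-01-15", freq="D")
daily = pd.Series(np.arange(14), index=dates, name="Orders")

# Default weekly rule: bins run Monday..Sunday and carry the Sunday label
weekly_sun = daily.resample("W").sum()

# Same bins, labelled by the Monday that starts each week
weekly_mon = daily.resample("W-MON", closed="left", label="left").sum()

print(weekly_sun)
print(weekly_mon)
```

Both results hold the same weekly totals; only the index labels differ, which matters when the weekly series is later joined to other data by date.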
Examine and Prepare Your Dataset for Modeling
Check the Data for Common Time Series Patterns:
• It’s important to check any time series data for patterns that can affect the results, and can
inform which forecasting model to use. Some common time series data patterns are:

Pattern                           Description
Level                             The average value in the series
Trend                             Increases, decreases, or stays the same over time
Seasonal or Periodic              Pattern repeats periodically over time
Cyclical                          Pattern that increases and decreases but usually related to non-seasonal activity, like business cycles
Random or Irregular Variations    Increases and decreases that don’t have any apparent pattern

• Most time-series data will contain one or more, but probably not all of these patterns. It’s
still a good idea to check for them since they can affect the performance of the model and
may even require different modeling approaches.
• Two great methods for finding these data patterns are visualization and decomposition.
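To make these patterns concrete, here is a small sketch that builds a synthetic weekly series as the sum of a level, a trend, a seasonal pattern, and random variation. Every number in it (baseline 100, slope 0.5, amplitude 10, 52-week season, noise scale 2) is an arbitrary assumption chosen for illustration, not a property of the sales dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
index = pd.date_range("2017-01-08", periods=156, freq="W")  # three years of weeks
t = np.arange(len(index))

level = np.full(len(index), 100.0)             # constant baseline
trend = 0.5 * t                                # steady increase per week
seasonal = 10.0 * np.sin(2 * np.pi * t / 52)   # pattern repeating every 52 weeks
noise = rng.normal(0.0, 2.0, size=len(index))  # irregular variation

y_demo = pd.Series(level + trend + seasonal + noise, index=index, name="Orders")
print(y_demo.head())
```

A series built this way is handy for checking that a plotting or decomposition step recovers components whose true shape is known.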
Visualize the Data
• The first step is simply to plot the dataset. In the example, the matplotlib package is used. Since it’s
easier to see a general trend using the mean, I plot both the original data (blue line) and the
monthly average of the resampled data (orange line).
• By changing the 'M' (month) within y.resample('M'), you can plot the mean for different
aggregation periods. For example, if you have a very long history of data, you might plot the yearly
average by changing 'M' to 'Y'. (pandas 2.2 and later prefer the aliases 'ME' and 'YE' for
month-end and year-end.)

import matplotlib.pyplot as plt

y = df['Orders']
fig, ax = plt.subplots(figsize=(20, 6))
# Weekly data plus the monthly mean of the resampled series
ax.plot(y, marker='.', linestyle='-', linewidth=0.5, label='Weekly')
ax.plot(y.resample('M').mean(), marker='o', markersize=8, linestyle='-', label='Monthly Mean Resample')
ax.set_ylabel('Orders')
ax.legend()
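The effect of the aggregation period can be checked on a small made-up series before plotting. The sketch below is an alternative to y.resample(...).mean(), not the method used in the slides: it groups by calendar period with to_period, whose 'M' and 'Y' period aliases do not depend on the offset-alias renames in newer pandas. The series `s` is hypothetical, with each day's value set to its month number so the monthly means are easy to verify.

```python
import pandas as pd

# Hypothetical daily series for 2018; each value is the month number
dates = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s = pd.Series(dates.month.to_numpy(dtype=float), index=dates)

# Mean per calendar month and per calendar year
monthly = s.groupby(s.index.to_period("M")).mean()
yearly = s.groupby(s.index.to_period("Y")).mean()

print(monthly)
print(yearly)
```

The coarser the period, the fewer points remain: twelve monthly means here, but a single yearly mean.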
Decompose the Data
• By looking at the graph of sales data above, we can see a general
increasing trend with no clear pattern of seasonal or cyclical changes.
• The next step is to decompose the data to view more of the complexity
behind the linear visualization.
• A useful Python function called seasonal_decompose within the
'statsmodels' package can help us split the observed series into its
components. The resulting plot shows four panels:
• Observed
• Trend
• Seasonal
• Residual

import statsmodels.api as sm

# Graphs to show seasonal_decompose
def seasonal_decompose(y):
    decomposition = sm.tsa.seasonal_decompose(y, model='additive', extrapolate_trend='freq')
    fig = decomposition.plot()
    fig.set_size_inches(14, 7)
    plt.show()
seasonal_decompose(y)

After looking at the four panels of the decomposition plot, we can tell that our
sales dataset has an overall increasing trend as well as a yearly seasonality.
Depending on the components of your dataset, like trend, seasonality, or
cycles, your choice of model will be different.
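To show what an additive decomposition computes, here is a hand-rolled sketch using only pandas. It is a simplified illustration, not the statsmodels implementation: the trend is a centred moving average over one season, the seasonal component is the average detrended value at each position in the season, and the residual is what remains. The series `y_demo`, the season length of 7, and the pattern values are all hypothetical.

```python
import numpy as np
import pandas as pd

period = 7  # hypothetical season length (odd, so a simple centred window works)
n = 6 * period
t = np.arange(n)
pattern = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 0.0, 3.0])  # sums to zero
y_demo = pd.Series(50.0 + 2.0 * t + np.tile(pattern, n // period))

# Trend: centred moving average over one full season
trend = y_demo.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position within the season
detrended = y_demo - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Residual: whatever trend and seasonal do not explain
resid = y_demo - trend - seasonal

print(pd.DataFrame({"observed": y_demo, "trend": trend,
                    "seasonal": seasonal, "resid": resid}).iloc[3:10])
```

Because the synthetic series has no noise, the residual is zero wherever the trend is defined; on real data the residual panel is where unexplained variation shows up.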
Check for Stationarity
• Next, we need to check whether the dataset is stationary or not. A dataset
is stationary if its statistical properties like mean, variance, and
autocorrelation do not change over time.
• Most time series datasets related to business activity are not stationary
since there are usually all sorts of non-stationary elements like trends and
economic cycles.
• But, since most time series forecasting models rely on stationarity (and on
mathematical transformations related to it) to make predictions, we need
to ‘stationarize’ the time series as part of the process of fitting a model.
• Two common methods to check for stationarity are Visualization and the
Augmented Dickey-Fuller (ADF) Test. Python makes both approaches
easy:
Visualization - Check for Stationarity
This method graphs the rolling statistics (mean and standard deviation) to show at a glance whether they
change substantially over time:

### Plot rolling statistics for testing stationarity
def test_stationarity(timeseries, title):
    # Determine rolling statistics over a 12-period window
    rolmean = pd.Series(timeseries).rolling(window=12).mean()
    rolstd = pd.Series(timeseries).rolling(window=12).std()

    fig, ax = plt.subplots(figsize=(16, 4))
    ax.plot(timeseries, label=title)
    ax.plot(rolmean, label='rolling mean')
    # The standard deviation is plotted unscaled, so the label says just that
    ax.plot(rolstd, label='rolling std')
    ax.legend()

pd.options.display.float_format = '{:.8f}'.format
test_stationarity(y, 'raw data')

Both the mean and standard deviation of stationary data do not change much
over time. But in this case, since the y-axis has such a large scale, we cannot
confidently conclude whether our data is stationary by simply viewing the above
graph. Therefore, we should do another test of stationarity.
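For readers unfamiliar with rolling statistics, the following tiny sketch shows what rolling(window=...) returns. The five-value series and the window of 3 are made up for illustration; the same mechanics apply to the 12-period window used for the sales data.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each output value summarises the current point and the two before it
rolmean = s.rolling(window=3).mean()
rolstd = s.rolling(window=3).std()  # sample standard deviation (ddof=1)

print(pd.DataFrame({"value": s, "rolling mean": rolmean, "rolling std": rolstd}))
```

The first window - 1 entries are NaN because a full window is not yet available, which is why the rolling lines in the plot start later than the raw series.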
Augmented Dickey-Fuller Test
• The ADF approach is essentially a statistical significance test: it compares the
test statistic with the critical values (equivalently, the p-value with a chosen
significance level) to test the null hypothesis that the series has a unit root,
i.e. is non-stationary.
• Using this test, we can determine whether the processed data is stationary or not
with different levels of confidence.
# Augmented Dickey-Fuller Test
from statsmodels.tsa.stattools import adfuller

def ADF_test(timeseries, dataDesc):
    print(' > Is the {} stationary ?'.format(dataDesc))
    dftest = adfuller(timeseries.dropna(), autolag='AIC')
    print('Test statistic = {:.3f}'.format(dftest[0]))
    print('P-value = {:.3f}'.format(dftest[1]))
    print('Critical values :')
    # dftest[4] maps '1%', '5%', '10%' to critical values; the series is judged
    # stationary at a level when the test statistic is below that value
    for k, v in dftest[4].items():
        print('\t{}: {} - The data is {}stationary with {}% confidence'.format(
            k, v, 'not ' if v < dftest[0] else '', 100 - int(k[:-1])))
ADF_test(y,'raw data')

Looking at both the visualization and ADF test, we can tell that our
sample sales data is non-stationary.
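To see what kind of statistic the test is built on, here is a simplified sketch of the plain (non-augmented) Dickey-Fuller regression with a constant, written with NumPy only. It is not what adfuller does internally: adfuller also adds lagged differences (chosen here by AIC) and reports p-values. The helper name `dickey_fuller_stat` and both synthetic series are assumptions made for illustration. For large samples, the 5% critical value in the constant-only case is roughly -2.86.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy[t] = alpha + gamma * y[t-1] + e[t]."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # y[t] - y[t-1]
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # OLS coefficients
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of beta
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
shocks = rng.normal(size=500)
white_noise = shocks             # stationary by construction
random_walk = np.cumsum(shocks)  # has a unit root by construction

stat_noise = dickey_fuller_stat(white_noise)
stat_walk = dickey_fuller_stat(random_walk)
print(stat_noise, stat_walk)
```

A strongly negative statistic, as for the white noise, is evidence against a unit root; a statistic near zero, as is typical for a random walk, gives no grounds to reject non-stationarity.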
Make the Data Stationary - Detrending
• To proceed with our time series analysis, we need to stationarize the dataset. There
are many approaches to stationarize data, but we’ll use de-trending, differencing,
and then a combination of the two.

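• This detrending method removes the underlying trend in the time series by subtracting the
12-period rolling mean and dividing by the 12-period rolling standard deviation: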
# Detrending
y_detrend = (y - y.rolling(window=12).mean()) / y.rolling(window=12).std()

test_stationarity(y_detrend,'de-trended data')
ADF_test(y_detrend,'de-trended data')

The results show that the data is now stationary: the rolling mean and rolling standard
deviation are relatively smooth, and running the ADF test again confirms it.
Differencing
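This method removes the underlying seasonal or cyclical patterns in the time
series. Since the sample dataset is described as having a 12-month seasonality,
a 12-lag difference is used. Note that on weekly data y.shift(12) compares each
value with the one 12 weeks earlier, so a yearly pattern would correspond to a
lag of about 52 instead: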
# Differencing
y_12lag = y - y.shift(12)
test_stationarity(y_12lag,'12 lag differenced data')
ADF_test(y_12lag,'12 lag differenced data')
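The following sketch illustrates why lag differencing removes a repeating pattern. The series `y_demo` is hypothetical: a linear trend (slope 1.5) plus a pattern that repeats exactly every 12 observations. Subtracting the value 12 steps earlier cancels the pattern and leaves only the trend's change over 12 steps.

```python
import numpy as np
import pandas as pd

lag = 12
# Hypothetical seasonal pattern that repeats every 12 observations
pattern = np.array([5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6], dtype=float)
t = np.arange(4 * lag)
y_demo = pd.Series(200.0 + 1.5 * t + np.tile(pattern, 4))

# Lag-12 difference: the repeating pattern cancels, leaving 12 * slope
y_lagged = y_demo - y_demo.shift(lag)
print(y_lagged.dropna().unique())
```

The cancellation only works when the lag matches the length of the pattern, which is why the choice of lag should follow the seasonality actually present in the data.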
