Time Series Analysis With Python

This document discusses analyzing time series data from Germany's electricity consumption, wind power, and solar power production from 2006-2017 using pandas. It provides an overview of pandas time series data structures like DatetimeIndex and how to create a time series DataFrame from the data. It also demonstrates various time-based indexing and visualization techniques for exploring patterns in the data like seasonality and trends over time.

Uploaded by

Chit Surela

Time Series

This tutorial works with a daily time series from Open Power System Data (OPSD) for Germany, which has been
rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals
of electricity consumption, wind power production, and solar power production for 2006-2017. You can
download the data here.

Electricity production and consumption are reported as daily totals in gigawatt-hours (GWh). The columns of
the data file are:

Date — The date (yyyy-mm-dd format)

Consumption — Electricity consumption in GWh

Wind — Wind power production in GWh

Solar — Solar power production in GWh

Wind+Solar — Sum of wind and solar power production in GWh

We will explore how electricity consumption and production in Germany have varied over time, using pandas
time series tools to answer questions such as:

When is electricity consumption typically highest and lowest?

How do wind and solar power production vary with seasons of the year?

What are the long-term trends in electricity consumption, solar power, and wind power?

How do wind and solar power production compare with electricity consumption, and how has this ratio
changed over time?

Time series data structures

In pandas, a single point in time is represented as a Timestamp. We can use the to_datetime() function to
create Timestamps from strings in a wide variety of date/time formats.

import pandas as pd

pd.to_datetime('2018-01-15 3:45pm') #Timestamp('2018-01-15 15:45:00')

pd.to_datetime('7/8/1952') #Timestamp('1952-07-08 00:00:00')

to_datetime() automatically infers a date/time format based on the input. In the example above, the
ambiguous date '7/8/1952' is assumed to be month/day/year and is interpreted as July 8, 1952.
Alternatively, we can use the dayfirst parameter to tell pandas to interpret the date as August 7, 1952.

pd.to_datetime('7/8/1952', dayfirst=True) #Timestamp('1952-08-07 00:00:00')
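To make the effect of dayfirst concrete, here is a minimal sketch that parses the same ambiguous string both ways and compares the resulting month and day. The variable names are illustrative only.

```python
import pandas as pd

# Default behaviour: the ambiguous string is read as month/day/year
month_first = pd.to_datetime('7/8/1952')

# dayfirst=True: the same string is read as day/month/year
day_first = pd.to_datetime('7/8/1952', dayfirst=True)

print(month_first.month, month_first.day)  # 7 8
print(day_first.month, day_first.day)      # 8 7
```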

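If we supply a list or array of strings as input to to_datetime(), it returns a sequence of date/time values in a
DatetimeIndex object, which is the core data structure that powers much of pandas time series functionality.
The strings in the example below use three different formats. pandas 2.0 and later expects one format per
call unless format='mixed' is passed, so add that argument if the call raises a ValueError.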

pd.to_datetime(['2018-01-05', '7/8/1952', 'Oct 10, 1995'])

OUTPUT : DatetimeIndex(['2018-01-05', '1952-07-08', '1995-10-10'], dtype='datetime64[ns]', freq=None)

In the DatetimeIndex above, the data type datetime64[ns] indicates that the underlying data is stored as
64-bit integers, in units of nanoseconds (ns). This data structure allows pandas to compactly store large
sequences of date/time values and efficiently perform vectorized operations using NumPy datetime64
arrays.

If we're dealing with a sequence of strings all in the same date/time format, we can explicitly specify it
with the format parameter.

pd.to_datetime(['2/25/10', '8/6/17', '12/15/12'], format='%m/%d/%y')

OUTPUT : DatetimeIndex(['2010-02-25', '2017-08-06', '2012-12-15'], dtype='datetime64[ns]', freq=None)
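An explicit format also makes malformed entries easier to catch. As a small sketch (the third string is a deliberately invalid, made-up value), passing errors='coerce' turns anything that does not match the format into NaT instead of raising an error.

```python
import pandas as pd

# The last entry is intentionally invalid, to show how it is handled
raw = ['2/25/10', '8/6/17', 'not a date']

# Entries that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw, format='%m/%d/%y', errors='coerce')

print(parsed)
print(parsed.isna().tolist())  # [False, False, True]
```

Counting the NaT values afterwards is a quick way to see how many rows failed to parse.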

Creating a time series DataFrame

To work with time series data in pandas, we use a DatetimeIndex as the index for our DataFrame (or Series).
Let's see how to do this with our OPSD data set. First, we use the read_csv() function to read the data into a
DataFrame, and then display its shape.

opsd_daily = pd.read_csv('opsd_germany_daily.csv')

opsd_daily.shape

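Let's check the data with head() and tail() to see how it looks, and check the data type of each column.

opsd_daily.dtypes

read_csv() loads the Date column as plain strings (dtype object), so we first convert it to datetime values.

opsd_daily['Date'] = pd.to_datetime(opsd_daily['Date'])

Now that the Date column is the correct data type, let's set it as the DataFrame's index.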

opsd_daily = opsd_daily.set_index('Date')

opsd_daily.index

Alternatively, we can consolidate the above steps into a single line, using the index_col and parse_dates
parameters of the read_csv() function. This is often a useful shortcut.

opsd_daily = pd.read_csv('opsd_germany_daily.csv', index_col=0, parse_dates=True)
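The two routes to a DatetimeIndex can be compared without the real file. The sketch below reads a tiny in-memory CSV whose rows are made-up placeholders in the same column layout (they are not OPSD values), then builds the index both ways.

```python
import io
import pandas as pd

# Made-up rows in the same layout as the OPSD file (placeholder values, not real data)
csv_text = """Date,Consumption,Wind,Solar,Wind+Solar
2017-01-01,1000.0,300.0,10.0,310.0
2017-01-02,1200.0,250.0,12.0,262.0
2017-01-03,1300.0,200.0,8.0,208.0
"""

# Route 1: read, convert the Date column, then set it as the index
two_step = pd.read_csv(io.StringIO(csv_text))
two_step['Date'] = pd.to_datetime(two_step['Date'])
two_step = two_step.set_index('Date')

# Route 2: let read_csv() parse the first column into a date index
one_step = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(one_step.index).__name__)                 # DatetimeIndex
print(list(two_step.index) == list(one_step.index))  # True
```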

Now that our DataFrame's index is a DatetimeIndex, we can use all of pandas' powerful time-based indexing
to wrangle and analyze our data, as we shall see in the following sections.

Another useful aspect of the DatetimeIndex is that the individual date/time components are all available as
attributes such as year, month, day, and so on. Let's add a few more columns to opsd_daily, containing the
year, month, and weekday name.

# Add columns with year, month, and weekday name

opsd_daily['Year'] = opsd_daily.index.year

opsd_daily['Month'] = opsd_daily.index.month

# weekday_name was removed in pandas 1.0; day_name() is its replacement
opsd_daily['Weekday Name'] = opsd_daily.index.day_name()

# Display a random sampling of 5 rows

opsd_daily.sample(5, random_state=0)
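As a quick check of these index attributes, the sketch below builds a three-day frame with placeholder values and derives the same three columns. It uses day_name(), which current pandas provides in place of the older weekday_name attribute.

```python
import pandas as pd

# Three consecutive days with placeholder values (not real OPSD data)
idx = pd.date_range('2017-08-10', periods=3, freq='D')
demo = pd.DataFrame({'Consumption': [1.0, 2.0, 3.0]}, index=idx)

# Date components are exposed directly on the DatetimeIndex
demo['Year'] = demo.index.year
demo['Month'] = demo.index.month
demo['Weekday Name'] = demo.index.day_name()

print(demo['Weekday Name'].tolist())  # ['Thursday', 'Friday', 'Saturday']
```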

Time-based indexing

One of the most powerful and convenient features of pandas time series is time-based indexing: using
dates and times to intuitively organize and access our data. With time-based indexing, we can use date/time
formatted strings to select data in our DataFrame with the loc accessor. The indexing works similarly to
standard label-based indexing with loc, but with a few additional features.

For example, we can select data for a single day using a string such as '2017-08-10'.

opsd_daily.loc['2017-08-10']

We can also select a slice of days, such as '2014-01-20':'2014-01-22'. As with regular label-based indexing
with loc, the slice is inclusive of both endpoints.

opsd_daily.loc['2014-01-20':'2014-01-22']

Another very handy feature of pandas time series is partial-string indexing, where we can select all
date/times which partially match a given string. For example, we can select the entire year 2006 with
opsd_daily.loc['2006'], or the entire month of February 2012 with opsd_daily.loc['2012-02'].

opsd_daily.loc['2006']

opsd_daily.loc['2012-02']
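The three selection styles can be tried on a synthetic daily series, as in the sketch below. The values are just a running counter, chosen so that the row counts are easy to verify; the name demo is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: the values are a simple counter, not real data
idx = pd.date_range('2006-01-01', '2014-12-31', freq='D')
demo = pd.Series(np.arange(len(idx), dtype=float), index=idx, name='Consumption')

one_day = demo.loc['2006-01-02']                   # a single day
three_days = demo.loc['2014-01-20':'2014-01-22']   # slice, both endpoints included
year_2006 = demo.loc['2006']                       # partial string: a whole year
feb_2012 = demo.loc['2012-02']                     # partial string: a whole month

print(one_day)                                           # 1.0
print(len(three_days), len(year_2006), len(feb_2012))    # 3 365 29
```

February 2012 returns 29 rows because 2012 is a leap year.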

Visualizing time series data

With pandas and matplotlib, we can easily visualize our time series data. In this section, we'll cover a few examples
and some useful customizations for our time series plots. First, let's import matplotlib and seaborn.

import matplotlib.pyplot as plt

import seaborn as sns

# Use seaborn style defaults and set the default figure size to 11 inches wide by 4 inches high

sns.set(rc={'figure.figsize':(11, 4)})

Let's create a line plot of the full time series of Germany's daily electricity consumption, using the plot()
method of the Consumption column.

opsd_daily['Consumption'].plot(linewidth=0.5);

We can see that the plot() method has chosen pretty good tick locations (every two years) and labels (the
years) for the x-axis, which is helpful. However, with so many data points, the line plot is crowded and hard to
read. Let's plot the data as dots instead, and also look at the Solar and Wind time series.

cols_plot = ['Consumption', 'Solar', 'Wind']

axes = opsd_daily[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)

for ax in axes:
    ax.set_ylabel('Daily Totals (GWh)')

Seasonality can also occur on other time scales. The plot above suggests there may be some weekly
seasonality in Germany's electricity consumption, corresponding with weekdays and weekends. Let's plot the
time series in a single year to investigate further.

ax = opsd_daily.loc['2017', 'Consumption'].plot()

ax.set_ylabel('Daily Consumption (GWh)');

Now we can clearly see the weekly oscillations. Another interesting feature that becomes apparent at this
level of granularity is the drastic decrease in electricity consumption in early January and late December,
during the holidays.

Let's zoom in further and look at just January and February.

ax = opsd_daily.loc['2017-01':'2017-02', 'Consumption'].plot(marker='o', linestyle='-')

ax.set_ylabel('Daily Consumption (GWh)');

Customizing time series plots

To better visualize the weekly seasonality in electricity consumption in the plot above, it would be nice to
have vertical gridlines on a weekly time scale (instead of on the first day of each month). We can customize
our plot with matplotlib.dates, so let's import that module.

import matplotlib.dates as mdates

Because date/time ticks are handled a bit differently in matplotlib.dates compared with the DataFrame's
plot() method, let's create the plot directly in matplotlib. Then we use mdates.WeekdayLocator() and
mdates.MONDAY to set the x-axis ticks to the Monday of each week. We also use
mdates.DateFormatter() to improve the formatting of the tick labels, using the format codes we saw earlier.

fig, ax = plt.subplots()

ax.plot(opsd_daily.loc['2017-01':'2017-02', 'Consumption'], marker='o', linestyle='-')

ax.set_ylabel('Daily Consumption (GWh)')

ax.set_title('Jan-Feb 2017 Electricity Consumption')

# Set x-axis major ticks to weekly interval, on Mondays

ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MONDAY))

# Format x-tick labels as 3-letter month name and day number

ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'));

Now we have vertical gridlines and nicely formatted tick labels on each Monday, so we can easily tell which
days are weekdays and weekends.
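Which dates those weekly ticks fall on can be checked without drawing anything. The sketch below is not matplotlib code: it assumes pandas' 'W-MON' frequency alias as a stand-in for WeekdayLocator(byweekday=mdates.MONDAY) and simply lists every Monday in the plotted range.

```python
import pandas as pd

# Every Monday between the start of January and the end of February 2017
mondays = pd.date_range('2017-01-01', '2017-02-28', freq='W-MON')

print(len(mondays))  # 9
# Same label style as DateFormatter('%b %d')
print(mondays.strftime('%b %d').tolist())
```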

Date Formatter

We can use the DateFormatter class to customize the formatting of the tick labels on the x-axis. Here are
a few examples:

%b displays the abbreviated month name (e.g., Jan, Feb)

%Y displays the full year (e.g., 2021)

%m displays the zero-padded month number (e.g., 01, 02, ..., 12)

%d displays the zero-padded day number (e.g., 01, 02, ..., 31)

%H displays the hour as a zero-padded decimal number (e.g., 00, 01, ..., 23)

%M displays the minute as a zero-padded decimal number (e.g., 00, 01, ..., 59)

%S displays the second as a zero-padded decimal number (e.g., 00, 01, ..., 59)
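These are the standard strftime codes, so they can be tried on a single Timestamp before being used in a plot. A minimal sketch with an arbitrary example date (the month abbreviation assumes an English locale):

```python
import pandas as pd

# Arbitrary example timestamp
ts = pd.Timestamp('2017-01-02 15:45:09')

print(ts.strftime('%b %d'))     # Jan 02 (in an English locale)
print(ts.strftime('%Y-%m-%d'))  # 2017-01-02
print(ts.strftime('%H:%M:%S'))  # 15:45:09
```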

Seasonality

Next, let's further explore the seasonality of our data with box plots, using seaborn's boxplot() function to
group the data by different time periods and display the distributions for each group. We'll first group the
data by month, to visualize yearly seasonality.

fig, axes = plt.subplots(3, 1, figsize=(11, 10))

for name, ax in zip(['Consumption', 'Solar', 'Wind'], axes):
    sns.boxplot(data=opsd_daily, x='Month', y=name, ax=ax)
    ax.set_ylabel('GWh')
    ax.set_title(name)
    # Remove the automatic x-axis label from all but the bottom subplot
    if ax != axes[-1]:
        ax.set_xlabel('')

Next, let's group the electricity consumption time series by day of the week, to explore weekly seasonality.

sns.boxplot(data=opsd_daily, x='Weekday Name', y='Consumption')

As expected, electricity consumption is significantly higher on weekdays than on weekends. The low outliers
on weekdays are presumably during holidays.

This section has provided a brief introduction to time series seasonality. As we will see later, applying a rolling
window to the data can also help to visualize seasonality on different time scales.
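The grouping that the box plots draw can also be summarized numerically with groupby(). The sketch below uses synthetic data, assumed purely for illustration: weekdays are set to 1500 and weekends to 1200, so the expected medians are known in advance.

```python
import numpy as np
import pandas as pd

# Synthetic year of daily data: 1500 on weekdays, 1200 on weekends (not real OPSD values)
idx = pd.date_range('2017-01-01', '2017-12-31', freq='D')
is_weekend = idx.dayofweek >= 5
demo = pd.DataFrame({'Consumption': np.where(is_weekend, 1200.0, 1500.0)}, index=idx)

# Median per weekday name and per calendar month, mirroring the two box plots
by_weekday = demo.groupby(demo.index.day_name())['Consumption'].median()
by_month = demo.groupby(demo.index.month)['Consumption'].median()

print(by_weekday['Monday'], by_weekday['Saturday'])  # 1500.0 1200.0
print(len(by_month))                                 # 12
```

On the real data, the same two groupby() calls would give the center line of each box.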

Frequencies

When the data points of a time series are uniformly spaced in time (e.g., hourly, daily, monthly, etc.), the time
series can be associated with a frequency in pandas. For example, let's use the date_range() function to create
a sequence of uniformly spaced dates from 1998-03-10 through 1998-03-15 at daily frequency.

pd.date_range('1998-03-10', '1998-03-15', freq='D')

OUTPUT :

DatetimeIndex(['1998-03-10', '1998-03-11', '1998-03-12', '1998-03-13',

'1998-03-14', '1998-03-15'],

dtype='datetime64[ns]', freq='D')

The resulting DatetimeIndex has an attribute freq with a value of 'D', indicating daily frequency. Available
frequencies in pandas include hourly ('H'), calendar daily ('D'), business daily ('B'), weekly ('W'), monthly
('M'), quarterly ('Q'), annual ('A'), and many others. Frequencies can also be specified as multiples of any of
the base frequencies, for example '5D' for every five days. Note that pandas 2.2 and later deprecates
several of these aliases in favor of 'h', 'ME', 'QE', and 'YE', so recent versions may emit a warning for the
older spellings.
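For instance, a multiple such as '5D' can be passed to date_range() like any other frequency. A minimal sketch, reusing the start date from the example above:

```python
import pandas as pd

# Three dates spaced five days apart
every_five_days = pd.date_range('1998-03-10', periods=3, freq='5D')

print(every_five_days.strftime('%Y-%m-%d').tolist())  # ['1998-03-10', '1998-03-15', '1998-03-20']
print(every_five_days.freqstr)                        # 5D
```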

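As another example, let's create a date range at hourly frequency, specifying the start date and the number
of periods instead of the start and end dates.

pd.date_range('2004-09-20', periods=8, freq='H')

Now let's take another look at the DatetimeIndex of our opsd_daily time series.

opsd_daily.index

We can see that it has no frequency (freq=None). This makes sense, since the index was created from a
sequence of dates in our CSV file, without explicitly specifying any frequency for the time series.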

If we know that our data should be at a specific frequency, we can use the DataFrame's asfreq() method to
assign a frequency. If any date/times are missing in the data, new rows will be added for those date/times,
which are either empty (NaN), or filled according to a specified data filling method such as forward filling or
interpolation.

To see how this works, let's create a new DataFrame which contains only the Consumption data for Feb 3, 6,
and 8, 2013.

times_sample = pd.to_datetime(['2013-02-03', '2013-02-06', '2013-02-08'])

# Select the specified dates and just the Consumption column

consum_sample = opsd_daily.loc[times_sample, ['Consumption']].copy()

consum_sample

Now we use the asfreq() method to convert the DataFrame to daily frequency, with a column for unfilled
data, and a column for forward filled data.

# Convert the data to daily frequency, without filling any missings

consum_freq = consum_sample.asfreq('D')

# Create a column with missings forward filled

consum_freq['Consumption - Forward Fill'] = consum_sample.asfreq('D', method='ffill')

consum_freq
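The same steps can be run without the OPSD file. The sketch below substitutes placeholder values (1.0, 2.0, 3.0) for the three consumption readings, so only the shape of the result is meaningful, not the numbers themselves.

```python
import pandas as pd

# Same three dates as above, with placeholder values instead of real consumption data
times_sample = pd.to_datetime(['2013-02-03', '2013-02-06', '2013-02-08'])
demo = pd.DataFrame({'Consumption': [1.0, 2.0, 3.0]}, index=times_sample)

# Daily frequency: the missing days (Feb 4, 5 and 7) appear as NaN rows
daily = demo.asfreq('D')

# Forward fill: each missing day repeats the last observed value
daily['Consumption - Forward Fill'] = demo['Consumption'].asfreq('D', method='ffill')

print(len(daily))                                    # 6
print(daily['Consumption'].isna().sum())             # 3
print(daily['Consumption - Forward Fill'].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
```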
