
Finance Fundamentals in Python

Learn the finance and Python fundamentals you need to make data-driven financial decisions. There’s no prior coding experience needed. In this track, you’ll learn about data types, lists, arrays, and the time value of money, before discovering how to work with time series data to evaluate index performance. Throughout the track, you’ll work with popular Python packages, including pandas, NumPy, statsmodels, and pyfolio, as you learn to import and manage financial data from different sources, including Excel files and from the web. Hands-on exercises will reinforce your new skills, as you work with real-world data, including NASDAQ stock data, AMEX, investment portfolios, and data from the S&P 100. By the end of the track, you'll be ready to navigate the world of finance using Python—having learned how to work with investment portfolios, calculate measures of risk, and calculate an optimal portfolio based on risk and return. https://ebooks-tech.sellfy.store/p/finance-fundamentals-in-python/

Uploaded by

jcmayac
Copyright
© All Rights Reserved

Introduction to

Python for Finance


INTRODUCTION TO PYTHON FOR FINANCE

Adina Howe
Instructor
Why Python for Finance?
Easy to Learn and Flexible
General purpose

Dynamic

High-level language

Integrates with other languages

Open source
Accessible to anyone



Python Shell
In [1]:

Calculations in IPython

In [1]: 1 + 1

Common mathematical operators
Operator   Meaning
+          Add
-          Subtract
*          Multiply
/          Divide
%          Modulus (remainder of division)
**         Exponent



Common mathematical operators
In [1]: 8 + 4

Out[1]: 12

In [2]: 8 / 4

Out[2]: 2.0
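
The remaining operators from the table can be tried the same way. The following is a minimal sketch; the share count, price, and budget are hypothetical values chosen only for illustration.

```python
# Hypothetical figures, chosen only for illustration
shares = 7
price = 12.5
budget = 100

position_value = shares * price       # multiply
cash_left = budget - position_value   # subtract
remainder = 17 % 5                    # modulus: what is left after 17 is divided by 5
growth = 1.05 ** 2                    # exponent: two periods of 5% growth
true_division = 8 / 4                 # division returns a float in Python 3

print(position_value, cash_left, remainder, growth, true_division)
```

Note that `/` returns a float even when the result is a whole number.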



Let's practice!
Comments and
variables

Name Surname
Instructor
Any comments?
# Example, do not modify!
print(8 / 2)
print(2**2)

# Put code below here


print(1.0 + 0.10)



Outputs in IPython vs. script.py
IPython Shell              script.py

In [1]: 1 + 1              1 + 1

Out[1]: 2                  # No output

In [1]: print(1 + 1)       print(1 + 1)

2                          <script.py> output:
                               2



Variables
Variable names

Names can contain upper or lower case letters, digits, and underscores

Variable names cannot start with a digit

Some names are reserved keywords in Python (e.g., class) and cannot be used as variable names; names of built-ins (e.g., type) are allowed but should be avoided



Variable example
# Correct
day_2 = 5

# Incorrect, variable name starts with a digit


2_day = 5
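
The naming rules can also be checked from code. This is a small sketch that uses the standard library's `keyword.iskeyword` and the string method `isidentifier`; the candidate names are taken from the slides, and the helper variable names are made up for the example.

```python
import keyword

candidates = ["day_2", "2_day", "class", "type"]

results = {}
for name in candidates:
    # A usable variable name must be a valid identifier and not a reserved keyword
    usable = name.isidentifier() and not keyword.iskeyword(name)
    results[name] = usable
    print(name, usable)
```

Here `type` passes the check because it is a built-in name rather than a keyword; assigning to it would still hide the built-in `type()` function.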



Using variables to evaluate stock trends
Price to earnings ratio = Market price / Earnings per share

price = 200
earnings = 5
pe_ratio = price / earnings
print(pe_ratio)

40.0



Let's practice!
Variable Data Types

Adina Howe
Instructor
Python Data Types
Variable Types   Example
Strings          'hello world'
Integers         40
Floats           3.1417
Booleans         True or False



Variable Types
Variable Types   Example          Abbreviations

Strings          'Tuesday'        str

Integers         40               int

Floats           3.1417           float

Booleans         True or False    bool



What data type is a variable: type()
To identify the type of a variable, we can use the function type():

type(variable_name)

pe_ratio = 40
print(type(pe_ratio))

<class 'int'>



Booleans
Operator   Description

==         equal

!=         does not equal

>          greater than

<          less than



Boolean Example
print(1 == 1)

True

print(type(1 == 1))

<class 'bool'>



Variable manipulations
x = 5                    y = 'stock'
print(x * 3)             print(y * 3)

15                       stockstockstock

print(x + 3)             print(y + 3)

8                        TypeError: can only concatenate str (not "int") to str



Changing variable types
pi = 3.14159
print(type(pi))

<class 'float'>

pi_string = str(pi)
print(type(pi_string))

<class 'str'>

print('I love to eat ' + pi_string + '!')

I love to eat 3.14159!
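
Conversion also works in the other direction, from text to numbers, using `float()` and `int()`. This is a minimal sketch, assuming a price and a share count that arrived as strings; both values are hypothetical.

```python
# Hypothetical values that arrived as text, e.g. typed in by a user
price_text = "238.11"
shares_text = "3"

price = float(price_text)   # str -> float
shares = int(shares_text)   # str -> int

total = price * shares

# Convert back to str before joining with other text
message = "Total cost: " + str(round(total, 2))
print(message)
```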



Let's practice!
Lists in Python

Adina Howe
Instructor
Lists - square brackets [ ]
months = ['January', 'February', 'March', 'April', 'May', 'June']



Python is zero-indexed



Subset lists
months = ['January', 'February', 'March', 'April', 'May', 'June']

months[0]

'January'

months[2]

'March'



Negative indexing of lists
months = ['January', 'February', 'March', 'April', 'May', 'June']

months[-1]

'June'

months[-2]

'May'



Subsetting multiple list elements with slicing
Slicing syntax

# Includes the start and up to (but not including) the end


mylist[startAt:endBefore]

Example

months = ['January', 'February', 'March', 'April', 'May', 'June']

months[2:5]

['March', 'April', 'May']

months[-4:-1]

['March', 'April', 'May']



Extended slicing with lists
months = ['January', 'February', 'March', 'April', 'May', 'June']

months[3:]

['April', 'May', 'June']

months[:3]

['January', 'February', 'March']



Slicing with Steps
# Includes the start and up to (but not including) the end
mylist[startAt:endBefore:step]

months = ['January', 'February', 'March', 'April', 'May', 'June']

months[0:6:2]

['January', 'March', 'May']

months[0:6:3]

['January', 'April']
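
The slicing forms above can be combined. The sketch below reuses the `months` list from the slides; the variable names and the negative step (which walks the list backwards) are additions for illustration.

```python
months = ['January', 'February', 'March', 'April', 'May', 'June']

first_quarter = months[:3]     # start omitted: begin at index 0
second_quarter = months[3:6]   # indices 3, 4, 5
last_two = months[-2:]         # end omitted: run to the end of the list
every_other = months[::2]      # start and end omitted, step of 2
reversed_months = months[::-1] # negative step walks backwards

print(first_quarter)
print(second_quarter)
print(last_two)
print(every_other)
print(reversed_months)
```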



Let's practice!
Lists in Lists

Adina Howe
Instructor
Lists in Lists
Lists can contain various data types, including lists themselves.

Example: a nested list describing the month and its associated consumer price index

cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]



Subsetting Nested Lists
months = ['Jan', 'Feb', 'Mar']
print(months[1])

Feb

cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]


print(cpi[1])

[238.11, 237.81, 238.91]



More on Subsetting Nested Lists
How would one subset out a specific price index?

cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]


print(cpi[1])

[238.11, 237.81, 238.91]

print(cpi[1][0])

238.11
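
The two inner lists line up by position, so a month name can be used to find its index value. A minimal sketch using the `cpi` list from the slide; the `zip` pairing at the end is an addition not covered in the slides.

```python
cpi = [['Jan', 'Feb', 'Mar'], [238.11, 237.81, 238.91]]

month_names = cpi[0]   # first inner list
values = cpi[1]        # second inner list

# Find where 'Mar' sits, then read the value at the same position
position = month_names.index('Mar')
mar_cpi = cpi[1][position]
print(mar_cpi)

# Pair each month with its value
pairs = list(zip(month_names, values))
print(pairs)
```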



Let's practice!
Methods and
functions

Adina Howe
Instructor
Methods vs. Functions
Methods                                  Functions
All methods are functions                Not all functions are methods

List methods are a subset of built-in
functions in Python

Used on an object                        Requires an input of an object

prices.sort()                            type(prices)



List Methods - sort
Lists have several built-in methods that can help retrieve and manipulate data

Methods can be accessed as list.method()

list.sort() sorts list elements in ascending order

prices = [238.11, 237.81, 238.91]


prices.sort()
print(prices)

[237.81, 238.11, 238.91]



Adding to a list with append and extend
list.append() adds a single element to a list

months = ['January', 'February', 'March']


months.append('April')
print(months)

['January', 'February', 'March', 'April']

list.extend() adds each element of another list to a list

months.extend(['May', 'June', 'July'])


print(months)

['January', 'February', 'March', 'April', 'May', 'June', 'July']



Useful list methods - index
list.index(x) returns the lowest index where the element x appears

months = ['January', 'February', 'March']


prices = [238.11, 237.81, 238.91]

months.index('February')

1

print(prices[1])

237.81



More functions ...
min(list) : returns the smallest element

max(list) : returns the largest element



Find the month with smallest CPI
months = ['January', 'February', 'March']
prices = [238.11, 237.81, 238.91]

# Identify min price


min_price = min(prices)
# Identify min price index
min_index = prices.index(min_price)
# Identify the month with min price
min_month = months[min_index]
print(min_month)

February



Let's practice!
Arrays

Adina Howe
Instructor
Installing packages
pip3 install package_name_here

pip3 install numpy



Importing packages
import numpy



NumPy and Arrays
import numpy
my_array = numpy.array([0, 1, 2, 3, 4])
print(my_array)

[0 1 2 3 4]

print(type(my_array))

<class 'numpy.ndarray'>



Using an alias
import package_name
package_name.function_name(...)

import numpy as np
my_array = np.array([0, 1, 2, 3, 4])
print(my_array)

[0 1 2 3 4]



Why use an array for financial analysis?
Arrays can handle very large datasets efficiently
Computationally and memory efficient

Faster calculations and analysis than lists

Diverse functionality (many functions in Python packages)



What's the difference?
NumPy arrays                              Lists

my_array = np.array([3, 'is', True])      my_list = [3, 'is', True]

print(my_array)                           print(my_list)

['3' 'is' 'True']                         [3, 'is', True]



Array operations
Arrays                                   Lists

import numpy as np                       list_A = [1, 2, 3]
                                         list_B = [4, 5, 6]
array_A = np.array([1, 2, 3])
array_B = np.array([4, 5, 6])            print(list_A + list_B)

print(array_A + array_B)                 [1, 2, 3, 4, 5, 6]

[5 7 9]



Array indexing
import numpy as np

months_array = np.array(['Jan', 'Feb', 'March', 'Apr', 'May'])


print(months_array[3])

Apr

print(months_array[2:5])

['March' 'Apr' 'May']



Array slicing with steps
import numpy as np

months_array = np.array(['Jan', 'Feb', 'March', 'Apr', 'May'])

print(months_array[0:5:2])

['Jan' 'March' 'May']



Let's practice!
Two Dimensional
Arrays

Adina Howe
Instructor
Two-dimensional arrays
import numpy as np

months = [1, 2, 3]
prices = [238.11, 237.81, 238.91]

cpi_array = np.array([months, prices])

print(cpi_array)

[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]



Array Methods
print(cpi_array)

[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]

.shape gives you the dimensions of the array

print(cpi_array.shape)

(2, 3)

.size gives you the total number of elements in the array

print(cpi_array.size)

6



Array Functions
import numpy as np

prices = [238.11, 237.81, 238.91]


prices_array = np.array(prices)

np.mean() calculates the mean of an input

print(np.mean(prices_array))

238.27666666666667

np.std() calculates the standard deviation of an input

print(np.std(prices_array))

0.46427960923946671
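
By default `np.std()` computes the population standard deviation (it divides by the number of elements). When the data is treated as a sample, the `ddof=1` argument divides by one less than the number of elements instead. The sketch below compares the two on the same prices; which one is appropriate depends on the analysis and is not specified in the slides.

```python
import numpy as np

prices_array = np.array([238.11, 237.81, 238.91])

# Default: population standard deviation (ddof=0)
population_std = np.std(prices_array)

# Sample standard deviation (ddof=1), slightly larger for the same data
sample_std = np.std(prices_array, ddof=1)

print(population_std)
print(sample_std)
```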



The `arange()` function
numpy.arange() creates an array of evenly spaced values from start up to (but not including) end, in increments of step

import numpy as np

months = np.arange(1, 13)


print(months)

[ 1 2 3 4 5 6 7 8 9 10 11 12]

months_odd = np.arange(1, 13, 2)


print(months_odd)

[ 1 3 5 7 9 11]



The `transpose()` function
numpy.transpose() switches rows and columns of a numpy array

print(cpi_array)

[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]

cpi_transposed = np.transpose(cpi_array)

print(cpi_transposed)

[[ 1. 238.11]
[ 2. 237.81]
[ 3. 238.91]]



Array Indexing for 2D arrays
print(cpi_array)

[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]

# row index 1, column index 2


cpi_array[1, 2]

238.91

# all row slice, third column


print(cpi_array[:, 2])

[ 3. 238.91]



Let's practice!
Using Arrays for
Analyses

Adina Howe
Instructor
Indexing Arrays
import numpy as np

months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])


indexing_array = np.array([1, 3, 5])

months_subset = months_array[indexing_array]
print(months_subset)

['Feb' 'Apr' 'Jun']



More on indexing arrays
import numpy as np

months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])

negative_index = np.array([-1, -2])

print(months_array[negative_index])

['Jun' 'May']



Boolean arrays
import numpy as np

months_array = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'])

boolean_array = np.array([True, True, True, False, False, False])


print(months_array[boolean_array])

['Jan' 'Feb' 'Mar']



More on Boolean arrays
prices_array = np.array([238.11, 237.81, 238.91])
# Create a Boolean array
boolean_array = (prices_array > 238)

print(boolean_array)

[ True False True]

print(prices_array[boolean_array])

[ 238.11 238.91]



Let's practice!
Visualization in
Python

Adina Howe
Instructor
Matplotlib: A visualization package
See more examples in the Matplotlib gallery.



matplotlib.pyplot - diverse plotting functions
import matplotlib.pyplot as plt



matplotlib.pyplot - diverse plotting functions
plt.plot()
takes arguments that describe the data to be plotted

plt.show()
displays plot to screen



Plotting with pyplot
import matplotlib.pyplot as plt
plt.plot(months, prices)
plt.show()



Plot result



Red solid line
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red')
plt.show()



Plot result



Dashed line
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')
plt.show()



Plot result



Colors and linestyles
color               linestyle
'green'   green     '-'    solid line
'red'     red       '--'   dashed line
'cyan'    cyan      '-.'   dash-dot line
'blue'    blue      ':'    dotted line

More documentation on colors and line styles can be found in the Matplotlib documentation.



Adding Labels and Titles
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')

# Add labels
plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')

# Show plot
plt.show()



Plot result



Adding additional lines
import matplotlib.pyplot as plt
plt.plot(months, prices, color = 'red', linestyle = '--')

# adding an additional line


plt.plot(months, prices_new, color = 'green', linestyle = '--')

plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')
plt.show()



Plot result



Scatterplots
import matplotlib.pyplot as plt
plt.scatter(x = months, y = prices, color = 'red')
plt.show()



Scatterplot result



Let's practice!
Histograms

Adina Howe
Instructor
Why histograms for financial analysis?



Histograms and Data
Is your data skewed?

Is your data centered around the average?

Do you have any abnormal data points (outliers) in your data?



Histograms and matplotlib.pyplot
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=3)
plt.show()



Changing the number of bins
import matplotlib.pyplot as plt
plt.hist(prices, bins=6)
plt.show()



Normalizing histogram data
import matplotlib.pyplot as plt
plt.hist(prices, bins=6, density=True)
plt.show()
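
Normalizing rescales the bar heights so that the total area of the bars equals 1, which makes datasets of different sizes comparable. The numbers behind this can be inspected without drawing anything. The sketch below uses NumPy's `np.histogram` (not `plt.hist`) on a hypothetical list of prices to compare raw counts with normalized densities.

```python
import numpy as np

# Hypothetical prices, chosen only for illustration
prices = [10, 11, 11, 12, 12, 12, 13, 13, 14, 18]

# Raw counts per bin, plus the bin edges
counts, edges = np.histogram(prices, bins=4)

# Normalized heights: each count divided by (number of values * bin width)
density, _ = np.histogram(prices, bins=4, density=True)

# Area of all bars = sum of height * width
widths = np.diff(edges)
total_area = float(np.sum(density * widths))

print(counts)
print(density)
print(total_area)
```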



Layering histograms on a plot
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True)
plt.hist(x=prices_new, bins=6, density=True)
plt.show()



Histogram result



Alpha: Changing transparency of histograms
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True, alpha=0.5)
plt.hist(x=prices_new, bins=6, density=True, alpha=0.5)
plt.show()



Histogram result



Adding a legend
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True, alpha=0.5, label="Prices 1")
plt.hist(x=prices_new, bins=6, density=True, alpha=0.5, label="Prices New")
plt.legend()
plt.show()



Histogram result



Let's practice!
Introducing the
dataset

Adina Howe
Instructor
Overall Review
Python shell and scripts

Variables and data types

Lists

Arrays

Methods and functions

Indexing and subsetting

Matplotlib



S&P 100 Companies
Standard and Poor's S&P 100:

made up of major companies that span multiple industry groups

used to measure stock performance of large companies



S&P 100 Case Study
Sectors of Companies within the S&P 100 in 2017



The data



Price to Earnings Ratio
Price to earnings ratio = Market price / Earnings per share

The ratio for valuing a company that measures its current share price relative to its per-share earnings

In general, a higher P/E ratio indicates higher growth expectations



Your mission
GIVEN
Lists of data describing the S&P 100: names, prices, earnings, sectors

OBJECTIVE PART I
Explore and analyze the S&P 100 data, specifically the P/E ratios of S&P 100 companies



Step 1: examine the lists
In [1]: my_list = [1, 2, 3, 4, 5]

# first element
In [2]: print(my_list[0])

# last element
In [3]: print(my_list[-1])

# range of elements
In [4]: print(my_list[0:3])

[1, 2, 3]



Step 2: Convert lists to arrays
# Convert lists to arrays
import numpy as np
my_array = np.array(my_list)



Step 3: Elementwise array operations
# Elementwise array operations
array_ratio = array1 / array2
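
Applied to the case study, elementwise division gives one P/E ratio per company in a single step. A minimal sketch with hypothetical prices and earnings for three companies; the real S&P 100 lists are not reproduced here.

```python
import numpy as np

# Hypothetical share prices and earnings per share for three companies
prices = np.array([200.0, 90.0, 45.0])
earnings = np.array([5.0, 4.5, 3.0])

# Each price is divided by the earnings value at the same position
pe_ratios = prices / earnings
print(pe_ratios)
```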



Let's analyze!
A closer look at the
sectors

Adina Howe
Instructor
Your mission
GIVEN
NumPy arrays of data describing the S&P 100: names, prices, earnings, sectors

OBJECTIVE PART II
Explore and analyze sector-specific P/E ratios within companies of the S&P 100



Step 1: Create a boolean filtering array
stock_prices = np.array([100, 200, 300])
filter_array = (stock_prices >= 150)
print(filter_array)

[ False True True]



Step 2: Apply filtering array to subset another array
stock_prices = np.array([100, 200, 300])
filter_array = (stock_prices >= 150)
print(stock_prices[filter_array])

[200 300]



Step 3: Summarize P/E ratios
Calculate the average and standard deviation of these sector-specific P/E ratios

import numpy as np
average_value = np.mean(my_array)
std_value = np.std(my_array)
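
The three steps can be chained: build a filter from the sector array, apply it to the P/E array, then summarize. The sketch below assumes four hypothetical companies and sector labels; the names `sectors`, `pe`, and `is_it` are made up for the example.

```python
import numpy as np

# Hypothetical data: sector labels and P/E ratios in matching order
sectors = np.array(['IT', 'Energy', 'IT', 'IT'])
pe = np.array([30.0, 12.0, 20.0, 40.0])

# Step 1: Boolean filter, True where the company is in the IT sector
is_it = (sectors == 'IT')

# Step 2: apply the filter to a different array of the same length
it_pe = pe[is_it]

# Step 3: summarize the sector-specific ratios
it_mean = np.mean(it_pe)
it_std = np.std(it_pe)

print(it_pe, it_mean, it_std)
```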



Let's practice!
Visualizing trends

Adina Howe
Instructor
Your mission - outlier?



Step 1: Make a histogram
import matplotlib.pyplot as plt
plt.hist(hist_data, bins = 8)
plt.show()



Step 2: Identify the Outlier
Identify the outlier P/E ratio

Create a boolean array filter to subset this company

Filter out this company information from the provided datasets
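
One way these steps might look in code is sketched below. The company names, P/E values, and the cutoff of 100 are all hypothetical; in practice the cutoff would be chosen after looking at the histogram.

```python
import numpy as np

# Hypothetical data in matching order
names = np.array(['A', 'B', 'C', 'D'])
pe = np.array([18.0, 22.0, 250.0, 20.0])

# Boolean filter marking the outlier (assumed cutoff, for illustration only)
is_outlier = (pe > 100)

# Subset the company that was flagged
outlier_name = names[is_outlier]

# Invert the filter with ~ to keep everything else
keep = ~is_outlier
names_clean = names[keep]
pe_clean = pe[keep]

print(outlier_name)
print(names_clean, pe_clean)
```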



Let's practice!
Representing time
with datetimes
INTERMEDIATE PYTHON FOR FINANCE

Kennedy Behrman
Data Engineer, Author, Founder
Datetimes



Datetimes



Datetimes
from datetime import datetime

black_monday = datetime(1987, 10, 19)


print(black_monday)

1987-10-19 00:00:00



Datetime now
datetime.now()

datetime.datetime(2019, 11, 6, 3, 48, 30, 886713)



Datetime from string
black_monday_str = "Monday, October 19, 1987. 9:30 am"
format_str = "%A, %B %d, %Y. %I:%M %p"
datetime.strptime(black_monday_str, format_str)

datetime.datetime(1987, 10, 19, 9, 30)



Datetime from string
Year

%y Without century (00, 01, ..., 98, 99)

%Y With century (0001, 0002, ..., 1998, 1999, ..., 9999)

Month

%b Abbreviated names (Jan, Feb, ..., Nov, Dec)

%B Full names (January, February, ... November, December)

%m As numbers (01, 02, ..., 11, 12)

Day of Month

%d (01, 02, ..., 30, 31)



Datetime from string
Weekday

%a Abbreviated name (Sun, ... Sat)

%A Full name (Sunday, ... Saturday)

%w Number (0, ..., 6)

Hour

%H 24 hour (00, 01, ... 23)

%I 12 hour (01, 02, ... 12)

Minute

%M (00, 01, ..., 59)



Datetime from string
Seconds

%S (00, 01, ... 59)

Micro-seconds

%f (000000, 000001, ... 999999)

AM/PM

%p (AM, PM)



Datetime from string
%m Months

%M Minutes



Datetime from string
"1837-05-10"

%Y

%m

%d

"%Y-%m-%d"



Datetime from string
"Friday, 17 May 01"

%A

%d

%B

%y

"%A, %d %B %y"



String from datetime
dt.strftime(format_string)



String from datetime
great_depression_crash = datetime(1929, 10, 29)
great_depression_crash

datetime.datetime(1929, 10, 29, 0, 0)

great_depression_crash.strftime("%a, %b %d, %Y")

'Tue, Oct 29, 1929'
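
Parsing and formatting are mirror images: `strptime` turns a string into a datetime and `strftime` turns a datetime back into a string. A small sketch using format codes from the tables above; the date strings other than "1837-05-10" are made up for the example.

```python
from datetime import datetime

# String -> datetime with an ISO-style format
parsed = datetime.strptime("1837-05-10", "%Y-%m-%d")

# %y reads a two-digit year; Python maps 01 to 2001
short_year = datetime.strptime("17 May 01", "%d %B %y")

# datetime -> string, choosing a different layout
black_monday = datetime(1987, 10, 19)
formatted = black_monday.strftime("%m/%d/%Y")
weekday_name = black_monday.strftime("%A")

print(parsed)
print(short_year)
print(formatted, weekday_name)
```

Note the difference between `%m` (month) and `%M` (minute) when writing format strings.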



Let's practice!
Working with
datetimes

Kennedy Behrman
Data Engineer, Author, Founder
Datetime attributes
now.year       now.hour
now.month      now.minute
now.day        now.second

2019           22
11             34
13             56



Comparing datetimes
equals ==

less than <

more than >



Comparing datetimes
from datetime import datetime
asian_crisis = datetime(1997, 7, 2)
world_mini_crash = datetime(1997, 10, 27)

asian_crisis > world_mini_crash

False

asian_crisis < world_mini_crash

True



Comparing datetimes
asian_crisis = datetime(1997, 7, 2)
world_mini_crash = datetime(1997, 10, 27)

text = "10/27/1997"
format_str = "%m/%d/%Y"
sell_date = datetime.strptime(text, format_str)

sell_date == world_mini_crash

True



Difference between datetimes
Compare with < , > , or == .

Subtraction returns a timedelta object.

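timedelta constructor arguments include weeks, days, hours, minutes, seconds, and microseconds; a timedelta object stores only the attributes days, seconds, and microseconds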



Difference between datetimes
delta = world_mini_crash - asian_crisis

type(delta)

datetime.timedelta

delta.days

117



Creating relative datetimes
dt

datetime.datetime(2019, 1, 14, 0, 0)

datetime(dt.year, dt.month, dt.day - 7)

datetime.datetime(2019, 1, 7, 0, 0)

datetime(dt.year, dt.month, dt.day - 15)

ValueError Traceback (most recent call last)


<ipython-input-28-804001f45cdb> in <module>()
-> 1 datetime(dt.year, dt.month, dt.day - 15)
ValueError: day is out of range for month



Creating relative datetimes
delta = world_mini_crash - asian_crisis
type(delta)

datetime.timedelta



Creating relative datetimes
from datetime import timedelta

offset = timedelta(weeks = 1)
offset

datetime.timedelta(days=7)

dt - offset

datetime.datetime(2019, 1, 7, 0, 0)



Creating relative datetimes
offset = timedelta(days=16)
dt - offset

datetime.datetime(2018, 12, 29, 0, 0)

cur_week = last_week + timedelta(weeks=1)


# Do some work with date
# set last week variable to cur week and repeat
last_week = cur_week

source_dt = event_dt - timedelta(weeks=4)


# Use source datetime to look up market factors
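
Because a timedelta handles month and year boundaries, it avoids the ValueError that comes from subtracting directly from the day number. A minimal sketch; `dt` matches the earlier slide, while `event_dt` is a hypothetical event date.

```python
from datetime import datetime, timedelta

dt = datetime(2019, 1, 14)

# 15 days back crosses into the previous year without any error
back_15 = dt - timedelta(days=15)

# Hypothetical event date; look four weeks before it
event_dt = datetime(2019, 3, 1)
source_dt = event_dt - timedelta(weeks=4)

# The gap between two datetimes is itself a timedelta
gap = event_dt - source_dt

print(back_15)
print(source_dt)
print(gap.days)
```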



Let's practice!
Dictionaries

Kennedy Behrman
Data Engineer, Author, Founder
Lookup by index
my_list = ['a','b','c','d']

  0   1   2   3
['a','b','c','d']

my_list[0]

'a'

my_list.index('c')

2



Lookup by key
Dictionaries



Representation
{ 'key-1':'value-1', 'key-2':'value-2', 'key-3':'value-3'}



Creating dictionaries
my_dict = {}
my_dict

{}

my_dict = dict()
my_dict

{}



Creating dictionaries
ticker_symbols = {'AAPL':'Apple', 'F':'Ford', 'LUV':'Southwest'}
print(ticker_symbols)

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}

ticker_symbols = dict([['AAPL','Apple'],['F','Ford'],['LUV','Southwest']])
print(ticker_symbols)

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}



Adding to dictionaries
ticker_symbols['XON'] = 'Exxon'
ticker_symbols

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon'}

ticker_symbols['XON'] = 'Exxon OLD'

ticker_symbols

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon OLD'}



Accessing values
ticker_symbols['F']

'Ford'



Accessing values
ticker_symbols['XOM']

KeyError Traceback (most recent call last)


<ipython-input-6-782fbf617bf7> in <module>()
-> 1 ticker_symbols['XOM']

KeyError: 'XOM'



Accessing values
company = ticker_symbols.get('LUV')
print(company)

Southwest

company = ticker_symbols.get('XOM')
print(company)

None

company = ticker_symbols.get('XOM', 'MISSING')


print(company)

MISSING



Deleting from dictionaries
ticker_symbols

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest', 'XON': 'Exxon OLD'}

del(ticker_symbols['XON'])

ticker_symbols

{'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}
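
The dictionary operations above fit together in a short session. The sketch below also uses the `in` operator and the `pop()` method, which are standard dictionary features not covered in the slides; the 'TSLA' lookup is a hypothetical missing key.

```python
ticker_symbols = {'AAPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}

# Add a new key-value pair
ticker_symbols['XOM'] = 'Exxon'

# Check for a key before looking it up
has_ford = 'F' in ticker_symbols

# Safe lookup with a default for a missing key
missing = ticker_symbols.get('TSLA', 'MISSING')

# Remove a key and keep the value that was stored under it
removed = ticker_symbols.pop('XOM')

print(has_ford, missing, removed)
print(ticker_symbols)
```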



Let's practice!
Comparison
operators

Kennedy Behrman
Data Engineer, Author, Founder
Python comparison operators
Equality: == , !=

Order: < , > , <= , >=



Equality operator vs assignment
Test equality: ==

Assign value: =



Equality operator vs assignment
13 == 13

True

count = 13
print(count)

13



Equality comparisons
datetimes

numbers (floats, ints)

dictionaries

strings

almost anything else



Comparing datetimes
date_close_high = datetime(2019, 11, 27)
date_intra_high = datetime(2019, 11, 27)
print(date_close_high == date_intra_high)

True



Comparing dictionaries
d1 = {'high':56.88, 'low':33.22, 'closing':56.88}
d2 = {'high':56.88, 'low':33.22, 'closing':56.88}
print(d1 == d2)

True

d1 = {'high':56.88, 'low':33.22, 'closing':56.88}


d2 = {'high':56.88, 'low':33.22, 'closing':12.89}
print(d1 == d2)

False



Comparing different types
print(3 == 3.0)

True

print(3 == '3')

False



Not equal operator
print(3 != 4)

True

print(3 != 3)

False



Order operators
Less than <

Less than or equal <=

Greater than >

Greater than or equal >=



Less than operator
print(3 < 4)

True

print(3 < 3.6)

True

print('a' < 'b')

True



Less than operator
date_close_high = datetime(2019, 11, 27)
date_intra_high = datetime(2019, 11, 27)
print(date_close_high < date_intra_high)

False



Less than or equal operator
print(1 <= 4)

True

print(1.0 <= 1)

True

print('e' <= 'a')

False



Greater than operator
print(6 > 5)
print(4 > 4)

True

False



Greater than or equal operator
print(6 >= 5)
print(4 >= 4)

True

True



Order comparison across types
print(3.45454 < 90)

True

print('a' < 23)

----------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: '<' not supported between instances of 'str' and 'int'



Let's practice!
Boolean operators

Kennedy Behrman
Data Engineer, Author, Founder
Boolean logic



What are Boolean operations?
1. and

2. or

3. not



Object evaluation
Evaluates as False           Evaluates as True
Constants:                   Almost everything else
  False
  None

Numeric zero:
  0
  0.0

Length of zero:
  ""
  []
  {}



The AND operator
True and True

True

True and False

False



The OR operator
False or True

True

True or True

True

False or False

False



Short circuit.
is_current() and is_investment()

False

is_current() or is_investment()

True
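
Short-circuiting means Python stops evaluating as soon as the result is known, so the second function may never be called. The slides do not define `is_current()` or `is_investment()`; the versions below are hypothetical stand-ins that record each call so the effect is visible.

```python
calls = []

def is_current():
    # Hypothetical stand-in: records the call and returns False
    calls.append('is_current')
    return False

def is_investment():
    # Hypothetical stand-in: records the call and returns True
    calls.append('is_investment')
    return True

# 'and' stops at the first falsy value, so is_investment() is skipped
result_and = is_current() and is_investment()
calls_after_and = list(calls)

calls.clear()

# 'or' stops at the first truthy value, so is_current() is skipped
result_or = is_investment() or is_current()
calls_after_or = list(calls)

print(result_and, calls_after_and)
print(result_or, calls_after_or)
```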



The NOT operator
not True

False

not False

True



Order of operations with NOT
True == False

False

not True == False

True



Object evaluation
"CUSIP" and True

True



Object evaluation
[] or False

False



Object evaluation
not {}

True

INTERMEDIATE PYTHON FOR FINANCE


Returning objects
"Federal" and "State"

"State"

[] and "State"

[]

INTERMEDIATE PYTHON FOR FINANCE


Returning objects.
13 or "account number"

13

0.0 or {"balance": 2200}

{"balance": 2200}

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
If statements

Kennedy Behrman
Data Engineer, Author, Founder
Printing sales only
trns = { 'symbol': 'TSLA', 'type':'BUY', 'amount': 300}

print(trns['amount'])

300

INTERMEDIATE PYTHON FOR FINANCE


Compound statements
control statement
    statement 1
    statement 2
    statement 3

INTERMEDIATE PYTHON FOR FINANCE


Control Statement
if <expression> :

if x < y:

if x in y:

if x and y:

if x:

INTERMEDIATE PYTHON FOR FINANCE


Code blocks
if <expression>:
    statement
    statement
    statement

if <expression>: statement; statement; statement

INTERMEDIATE PYTHON FOR FINANCE


Printing sales only
trns = { 'symbol': 'TSLA', 'type':'BUY', 'amount': 300}

if trns['type'] == 'SELL':
    print(trns['amount'])

trns['type'] == 'SELL'

False

INTERMEDIATE PYTHON FOR FINANCE


Printing sales only.
trns = { 'symbol': 'AAPL', 'type':'SELL', 'amount': 200}

if trns['type'] == 'SELL':
    print(trns['amount'])

200

INTERMEDIATE PYTHON FOR FINANCE


Else
if x in y:
    print("I found x in y")
else:
    print("No x in y")

INTERMEDIATE PYTHON FOR FINANCE


Elif
if x == y:
    print("equals")
elif x < y:
    print("less")

INTERMEDIATE PYTHON FOR FINANCE


Elif
if x == y:
    print("equals")
elif x < y:
    print("less")
elif x > y:
    print("more")
elif x == 0:
    print("zero")

INTERMEDIATE PYTHON FOR FINANCE


Else with elif
if x == y:
    print("equals")
elif x < y:
    print("less")
elif x > y:
    print("more")
elif x == 0:
    print("zero")
else:
    print("None of the above")

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
For and while loops

Kennedy Behrman
Data Engineer, Author, Founder
Repeating a code block
CUSIP SYMBOL

037833100 AAPL

17275R102 CSCO

68389X105 ORCL

INTERMEDIATE PYTHON FOR FINANCE


Loops.
For loop While loop

INTERMEDIATE PYTHON FOR FINANCE


Statement components
<Control Statement>
<Code Block>

execution 1

execution 2

execution 3

INTERMEDIATE PYTHON FOR FINANCE


For loops
for <variable> in <sequence>:

for x in [0, 1, 2]:

d = {'key': 'value1'}
for x in d:

for x in "ORACLE":

INTERMEDIATE PYTHON FOR FINANCE


List example
for x in [0, 1, 2]:
    print(x)

0
1
2

INTERMEDIATE PYTHON FOR FINANCE


Dictionary example
symbols = {'037833100': 'AAPL',
           '17275R102': 'CSCO',
           '68389X105': 'ORCL'}
for k in symbols:
    print(symbols[k])

AAPL
CSCO
ORCL

INTERMEDIATE PYTHON FOR FINANCE


String example
for x in "ORACLE":
print(x)

O
R
A
C
L
E

INTERMEDIATE PYTHON FOR FINANCE


While control statements
while <expression>:

INTERMEDIATE PYTHON FOR FINANCE


While example
x = 0

while x < 5:
    print(x)
    x = (x + 1)

0
1
2
3
4

INTERMEDIATE PYTHON FOR FINANCE


Infinite loops
x = 0

while x <= 5:
    print(x)

INTERMEDIATE PYTHON FOR FINANCE


Skipping with continue
for x in [0, 1, 2, 3]:
    if x == 2:
        continue
    print(x)

0
1
3

INTERMEDIATE PYTHON FOR FINANCE


Stopping with break.
while True:
    transaction = get_transaction()
    if transaction['symbol'] == 'ORCL':
        print('The current symbol is ORCL, break now')
        break
    print('Not ORCL')

Not ORCL
Not ORCL
Not ORCL
The current symbol is ORCL, break now
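
The slide does not show `get_transaction`. A runnable sketch, assuming a hypothetical stand-in that yields three non-ORCL transactions before an ORCL one, could look like this:

```python
# Hypothetical feed: three non-ORCL transactions, then ORCL.
_feed = iter([{'symbol': 'AAPL'}, {'symbol': 'CSCO'},
              {'symbol': 'TSLA'}, {'symbol': 'ORCL'}])

def get_transaction():
    """Return the next transaction from the stand-in feed."""
    return next(_feed)

seen = []
while True:
    transaction = get_transaction()
    if transaction['symbol'] == 'ORCL':
        print('The current symbol is ORCL, break now')
        break  # leave the loop immediately
    seen.append(transaction['symbol'])
    print('Not ORCL')
```

Without the `break`, `while True` would never terminate on its own, which is the infinite-loop risk described earlier.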

INTERMEDIATE PYTHON FOR FINANCE


Let's practice 'for' and 'while' loops!
Creating a DataFrame

Kennedy Behrman
Data Engineer, Author, Founder
Pandas
import pandas as pd

print(pd)

<module 'pandas' from '.../pandas/__init__.py'>

INTERMEDIATE PYTHON FOR FINANCE


Pandas DataFrame
pd.DataFrame()

INTERMEDIATE PYTHON FOR FINANCE


Pandas DataFrame
Col 1 Col 2 Col 3
0 v1 a 00
1 v2 b 01
2 v3 c 13.02

INTERMEDIATE PYTHON FOR FINANCE


From dict
data = {'Bank Code': ['BA', 'AAD', 'BA'],
'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
'Balance':[1222.00, 390789.11, 13.02]}

df = pd.DataFrame(data=data)

INTERMEDIATE PYTHON FOR FINANCE


From dict
data = {'Bank Code': ['BA', 'AAD', 'BA'],
'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
'Balance':[1222.00, 390789.11, 13.02]}

df = pd.DataFrame(data=data)

Bank Code Account# Balance


0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


From list of dicts
data = [{'Bank Code': 'BA', 'Account#': 'ajfdk2', 'Balance': 1222.00},
{'Bank Code': 'AAD', 'Account#': '1234nmk', 'Balance': 390789.11},
{'Bank Code': 'BA', 'Account#': 'mm3d90', 'Balance': 13.02}]
df = pd.DataFrame(data=data)

INTERMEDIATE PYTHON FOR FINANCE


From list of dicts
data = [{'Bank Code': 'BA', 'Account#': 'ajfdk2', 'Balance': 1222.00},
{'Bank Code': 'AAD', 'Account#': '1234nmk', 'Balance': 390789.11},
{'Bank Code': 'BA', 'Account#': 'mm3d90', 'Balance': 13.02}]
df = pd.DataFrame(data=data)

Bank Code Account# Balance


0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


From list of lists
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
df = pd.DataFrame(data=data)

INTERMEDIATE PYTHON FOR FINANCE


From list of lists
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
df = pd.DataFrame(data=data)

0 1 2
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


From list of lists with column names
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
columns = ['Bank Code', 'Account#', 'Balance']
df = pd.DataFrame(data=data, columns=columns)

Bank Code Account# Balance


0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


From list of lists with column names
data = [['BA', 'ajfdk2', 1222.00],
['AAD', '1234nmk', 390789.11],
['BA', 'mm3d90', 13.02]]
columns = ['Bank Code', 'Account#', 'Balance']
df = pd.DataFrame(data=data, columns=columns)

Bank Code Account# Balance


0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


Reading data
Excel pd.read_excel

JSON pd.read_json

HTML pd.read_html

Pickle pd.read_pickle

Sql pd.read_sql

Csv pd.read_csv

INTERMEDIATE PYTHON FOR FINANCE


CSV
Comma separated values

client id,trans type, amount


14343,buy,23.0
0574,sell,2000
7093,dividend,2234

INTERMEDIATE PYTHON FOR FINANCE


Reading a csv file
df = pd.read_csv('/data/daily/transactions.csv')

INTERMEDIATE PYTHON FOR FINANCE


Reading a csv file
df = pd.read_csv('/data/daily/transactions.csv')

client id trans type amount


14343 buy 23.0
0574 sell 2000
7093 dividend 2234

INTERMEDIATE PYTHON FOR FINANCE


Non-comma csv
client id|trans type| amount
14343|buy|23.0
0574|sell|2000
7093|dividend|2234

INTERMEDIATE PYTHON FOR FINANCE


Non-comma csv
df = pd.read_csv('/data/daily/transactions.csv', sep='|')

INTERMEDIATE PYTHON FOR FINANCE


Non-comma csv
df = pd.read_csv('/data/daily/transactions.csv', sep='|')

client id trans type amount


14343 buy 23.0
0574 sell 2000
7093 dividend 2234
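
A hedged sketch of reading the pipe-delimited data, with the file path from the slides replaced by an in-memory string so the example is self-contained. By default `read_csv` would infer the client ids as integers and drop the leading zero of `0574`. Passing `dtype` for that column is one way to keep it.

```python
import io
import pandas as pd

# Stand-in for the pipe-delimited file shown above.
raw = """client id|trans type|amount
14343|buy|23.0
0574|sell|2000
7093|dividend|2234
"""

# dtype=str on the id column keeps the leading zero in "0574".
df = pd.read_csv(io.StringIO(raw), sep='|', dtype={'client id': str})
print(df)
print(df.dtypes)
```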

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Accessing Data

Kennedy Behrman
Data Engineer, Author, Founder
Account Balance

INTERMEDIATE PYTHON FOR FINANCE


Introducing lesson data
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02

accounts

INTERMEDIATE PYTHON FOR FINANCE


Access column using brackets
accounts['Balance']

INTERMEDIATE PYTHON FOR FINANCE


Access column using brackets
accounts['Balance']

a 1222.00
b 390789.11
c 13.02

Name: Balance, dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Access column using dot-syntax
accounts.Balance

a 1222.00
b 390789.11
c 13.02
Name: Balance, dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Access multiple columns
accounts[['Bank Code', 'Account#']]

INTERMEDIATE PYTHON FOR FINANCE


Access multiple columns
accounts[['Bank Code', 'Account#']]

Bank Code Account#


a BA ajfdk2
b AAD 1234nmk
c BA mm3d90

INTERMEDIATE PYTHON FOR FINANCE


Access rows using brackets
accounts[0:2]

INTERMEDIATE PYTHON FOR FINANCE


Access rows using brackets
accounts[0:2]

Bank Code Account# Balance


a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11

INTERMEDIATE PYTHON FOR FINANCE


Access rows using brackets
accounts[[True, False, True]]

INTERMEDIATE PYTHON FOR FINANCE


Access rows using brackets
accounts[[True, False, True]]

Bank Code Account# Balance


a BA ajfdk2 1222.00
c BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


loc and iloc
loc access by name

iloc access by position

INTERMEDIATE PYTHON FOR FINANCE


loc
accounts.loc['b']

Bank Code AAD


Account# 1234nmk
Balance 390789

Name: b, dtype: object

INTERMEDIATE PYTHON FOR FINANCE


loc
accounts.loc[['a','c']]

Bank Code Account# Balance
a BA ajfdk2 1222.00
c BA mm3d90 13.02

accounts.loc[[True, False, True]]

Bank Code Account# Balance
a BA ajfdk2 1222.00
c BA mm3d90 13.02

INTERMEDIATE PYTHON FOR FINANCE


Columns with loc
accounts.loc['a':'c','Balance']

accounts.loc['a':'c', ['Balance','Account#']]

accounts.loc['a':'c',[True,False,True]]

accounts.loc['a':'c','Bank Code':'Balance']

INTERMEDIATE PYTHON FOR FINANCE


Columns with loc
accounts.loc['a':'c',['Bank Code', 'Balance']]

INTERMEDIATE PYTHON FOR FINANCE


Columns with loc
accounts.loc['a':'c',['Bank Code', 'Balance']]

Bank Code Balance


a BA 1222.00
b AAD 390789.11
c BA 13.02

INTERMEDIATE PYTHON FOR FINANCE


iloc
accounts.iloc[0:2, [0,2]]

INTERMEDIATE PYTHON FOR FINANCE


iloc
accounts.iloc[0:2, [0,2]]

INTERMEDIATE PYTHON FOR FINANCE


iloc
accounts.iloc[0:2, [0,2]]

Bank Code Balance


a BA 1222.00
b AAD 390789.11

INTERMEDIATE PYTHON FOR FINANCE


Setting a single value
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02

accounts.loc['a', 'Balance'] = 0

INTERMEDIATE PYTHON FOR FINANCE


Setting a single value
Bank Code Account# Balance
a BA ajfdk2 0.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02

accounts.loc['a', 'Balance'] = 0

INTERMEDIATE PYTHON FOR FINANCE


Setting multiple values
Bank Code Account# Balance
a BA ajfdk2 1222.00
b AAD 1234nmk 390789.11
c BA mm3d90 13.02

accounts.iloc[:2, 1:] = 'NA'

INTERMEDIATE PYTHON FOR FINANCE


Setting multiple columns
Bank Code Account# Balance
a BA NA NA
b AAD NA NA
c BA mm3d90 13.02

accounts.iloc[:2, 1:] = 'NA'
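
One difference between `loc` and `iloc` that is easy to miss is how slices end: label slices include the end label, while position slices exclude the end position. A small sketch, rebuilding the lesson's `accounts` frame with an assumed `a`, `b`, `c` index:

```python
import pandas as pd

accounts = pd.DataFrame(
    {'Bank Code': ['BA', 'AAD', 'BA'],
     'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
     'Balance': [1222.00, 390789.11, 13.02]},
    index=['a', 'b', 'c'])

by_label = accounts.loc['a':'b', 'Balance']   # label slice includes 'b'
by_position = accounts.iloc[0:2, 2]           # position slice excludes row 2

label_rows = list(by_label.index)
same = by_label.equals(by_position)           # both select rows a and b

# Setting a single value through loc changes the frame in place.
accounts.loc['a', 'Balance'] = 0.0
print(label_rows, same)
print(accounts)
```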

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Aggregating and summarizing

Kennedy Behrman
Data Engineer, Author, Founder
DataFrame methods
.count() .sum()

.min() .prod()

.max() .mean()

.first() .median()

.last() .std()

.var()

INTERMEDIATE PYTHON FOR FINANCE


Axis
Rows: default, axis=0, axis='rows'

Columns: axis=1, axis='columns'

INTERMEDIATE PYTHON FOR FINANCE


Count
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.count()

AAD 4
GDDL 4
IMA 4
dtype: int64

INTERMEDIATE PYTHON FOR FINANCE


Sum
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.sum(axis=1)

2020-10-03 415.44
2020-10-04 426.47
2020-10-05 425.33
2020-10-07 434.82
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Product
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.prod(axis='columns')

2020-10-03 9.022416e+05
2020-10-04 1.084987e+06
2020-10-05 1.087920e+06
2020-10-07 1.230707e+06
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Mean
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.mean()

AAD 301.1525
GDDL 79.5575
IMA 44.8050
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Median
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.median()

AAD 300.855
GDDL 79.995
IMA 45.160
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Standard deviation
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.std()

AAD 1.337345
GDDL 3.143548
IMA 3.740183
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Variance
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.var()

AAD 1.788492
GDDL 9.881892
IMA 13.988967
dtype: float64

INTERMEDIATE PYTHON FOR FINANCE


Columns and rows
AAD GDDL IMA
2020-10-03 300.22 75.32 39.90
2020-10-04 301.49 79.99 44.99
2020-10-05 300.00 80.00 45.33
2020-10-07 302.90 82.92 49.00

df.loc[:,'AAD'].max()

302.9

df.iloc[0].min()

39.9
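
To make the `axis` argument concrete, here is a small sketch that rebuilds the price table from these slides (dates kept as plain strings for simplicity) and aggregates in both directions.

```python
import pandas as pd

df = pd.DataFrame(
    {'AAD': [300.22, 301.49, 300.00, 302.90],
     'GDDL': [75.32, 79.99, 80.00, 82.92],
     'IMA': [39.90, 44.99, 45.33, 49.00]},
    index=['2020-10-03', '2020-10-04', '2020-10-05', '2020-10-07'])

per_column = df.mean()        # default axis=0: one value per column
per_row = df.sum(axis=1)      # axis=1: one value per row (date)

print(per_column.round(4))
print(per_row.round(2))
```

With the default axis the result is indexed by column name. With `axis=1` it is indexed by date.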

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Extending and manipulating data

Kennedy Behrman
Data Engineer, Author, Founder
PCE
Personal consumption expenditures (PCE)

PCE =

INTERMEDIATE PYTHON FOR FINANCE


PCE
Personal consumption expenditures (PCE)

PCE = PCDG

Durable goods

1 By cactus cowboy - Open Clipart, CC0, https://commons.wikimedia.org/w/index.php?curid=64953673

INTERMEDIATE PYTHON FOR FINANCE


PCE
Personal consumption expenditures (PCE)

PCE = PCDG + PCNDG

Non-durable goods

1 By Smart Servier - https://smart.servier.com/, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=74765623

INTERMEDIATE PYTHON FOR FINANCE


PCE
Personal consumption expenditures (PCE)

PCE = PCDG + PCNDG + PCESV

Services

1 By Clip Art by Vector Toons - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=65937611

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
DATE PCDGA
1929-01-01 9.829
1930-01-01 7.661
1931-01-01 5.911
1932-01-01 3.959

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce['PCND'] = [33.941,
               30.503,
               25.798000000000002,
               20.169]

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce

DATE PCDG PCND


1929-01-01 9.829 33.941
1930-01-01 7.661 30.503
1931-01-01 5.911 25.798
1932-01-01 3.959 20.169

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce

DATE PCDG PCND
1929-01-01 9.829 33.941
1930-01-01 7.661 30.503
1931-01-01 5.911 25.798
1932-01-01 3.959 20.169

pcesv

PCESV
0 33.613
1 31.972
2 28.963
3 24.587

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce['PCESV'] = pcesv
pce

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce['PCESV'] = pcesv
pce

DATE PCDG PCND PCESV


1929-01-01 9.829 33.941 33.613
1930-01-01 7.661 30.503 31.972
1931-01-01 5.911 25.798 28.963
1932-01-01 3.959 20.169 24.587

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce['PCE'] = pce['PCDG'] + pce['PCND'] + pce['PCESV']

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce['PCE'] = pce['PCDG'] + pce['PCND'] + pce['PCESV']

DATE PCDG PCND PCESV PCE


1929-01-01 9.829 33.941 33.613 77.383
1930-01-01 7.661 30.503 31.972 70.136
1931-01-01 5.911 25.798 28.963 60.672
1932-01-01 3.959 20.169 24.587 48.715

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce.drop(columns=['PCDG', 'PCND', 'PCESV'],
axis=1,
inplace=True)

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing columns
pce.drop(columns=['PCDG', 'PCND', 'PCESV'],
axis=1,
inplace=True)

DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
new_row

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
new_row

DATE PCE
1933-01-01 45.945

pce.append(new_row)

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
new_row

DATE PCE
1933-01-01 45.945

pce.append(new_row)

DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
Adding multiple rows

new_rows = [row1, row2, row3]
for row in new_rows:
    pce = pce.append(row)

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
Adding multiple rows

for row in new_rows:
    pce = pce.append(row)

DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
1934-01-01 51.461
1935-01-01 55.933
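
`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. A sketch of the same result using `pd.concat`, assuming the new rows are single-row DataFrames (the frames below are hypothetical stand-ins for the slide's rows):

```python
import pandas as pd

pce = pd.DataFrame({'PCE': [77.383, 70.136, 60.672, 48.715]},
                   index=['1929-01-01', '1930-01-01',
                          '1931-01-01', '1932-01-01'])

# Hypothetical single-row frames standing in for the rows to add.
new_rows = [
    pd.DataFrame({'PCE': [45.945]}, index=['1933-01-01']),
    pd.DataFrame({'PCE': [51.461]}, index=['1934-01-01']),
    pd.DataFrame({'PCE': [55.933]}, index=['1935-01-01']),
]

# One concat call replaces the append loop; row order follows the list.
pce = pd.concat([pce] + new_rows)
print(pce)
```

Building the list once and concatenating once also avoids copying the growing frame on every iteration.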

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
pce.drop(['1934-01-01',
'1935-01-01',
'1936-01-01',
'1937-01-01',
'1938-01-01',
'1939-01-01'],
inplace=True)

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
pce.drop(['1934-01-01',
          '1935-01-01',
          '1936-01-01',
          '1937-01-01',
          '1938-01-01',
          '1939-01-01'],
         inplace=True)

DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
all_rows = [row1, row2, row3, pce]

pd.concat(all_rows)

INTERMEDIATE PYTHON FOR FINANCE


PCE - adding and removing rows
all_rows = [row1, row2, row3, pce]

pd.concat(all_rows)

DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
1933-01-01 45.945
1934-01-01 51.461
1935-01-01 55.933

INTERMEDIATE PYTHON FOR FINANCE


PCE - operations on DataFrames
ec = 0.88
pce * ec

INTERMEDIATE PYTHON FOR FINANCE


PCE - operations on DataFrames
ec = 0.88
pce * ec

DATE PCE
1934-01-01 45.28568
1935-01-01 49.22104
1936-01-01 54.72544
1937-01-01 58.81832

INTERMEDIATE PYTHON FOR FINANCE


PCE - map
def convert_to_euro(x):
return x * 0.88

pce['EURO'] = pce['PCE'].map(convert_to_euro)

INTERMEDIATE PYTHON FOR FINANCE


PCE - map
def convert_to_euro(x):
return x * 0.88

pce['EURO'] = pce['PCE'].map(convert_to_euro)

DATE PCE EURO


1934-01-01 51.461 45.28568
1935-01-01 55.933 49.22104
1936-01-01 62.188 54.72544

INTERMEDIATE PYTHON FOR FINANCE


Gross Domestic Product (GDP)
GDP = PCE + GE + GPDI + NE

PCE: Personal Consumption Expenditures

GE: Government Expenditures

GPDI: Gross Private Domestic Investment

NE: Net Exports

INTERMEDIATE PYTHON FOR FINANCE


GDP - apply
map - Elements in a column (series)

apply - Across rows or columns

INTERMEDIATE PYTHON FOR FINANCE


GDP - apply
GCE GPDI NE PCE
DATE
1929-01-01 9.622 17.170 0.383 77.383
1930-01-01 10.273 11.428 0.323 70.136
1931-01-01 10.169 6.549 0.001 60.672
1932-01-01 8.946 1.819 0.043 48.715

INTERMEDIATE PYTHON FOR FINANCE


GDP - apply
gdp.apply(np.sum, axis=1)

INTERMEDIATE PYTHON FOR FINANCE


GDP - apply
gdp['GDP'] = gdp.apply(np.sum, axis=1)

GCE GPDI NE PCE GDP


DATE
1929-01-01 9.622 17.170 0.383 77.383 104.558
1930-01-01 10.273 11.428 0.323 70.136 92.160
1931-01-01 10.169 6.549 0.001 60.672 77.391
1932-01-01 8.946 1.819 0.043 48.715 59.523

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Peeking at data with head, tail, and describe

Kennedy Behrman
Data Engineer, Author, Founder
Understanding your data
Data is loaded correctly

Understand the data's shape

INTERMEDIATE PYTHON FOR FINANCE


First look at data
aapl

INTERMEDIATE PYTHON FOR FINANCE


First look at data
aapl

Date
03/27/2020
03/26/2020
03/25/2020
03/24/2020

INTERMEDIATE PYTHON FOR FINANCE


First look at data
aapl

Price
Date
03/27/2020 247.74
03/26/2020 258.44
03/25/2020 245.52
03/24/2020 246.88

INTERMEDIATE PYTHON FOR FINANCE


First look at data
aapl

Price Volume
Date
03/27/2020 247.74 51054150
03/26/2020 258.44 63140170
03/25/2020 245.52 75900510
03/24/2020 246.88 71882770

INTERMEDIATE PYTHON FOR FINANCE


First look at data
aapl

Price Volume Trend


Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
03/24/2020 246.88 71882770 Up

INTERMEDIATE PYTHON FOR FINANCE


Head
aapl.head()

Price Volume Trend


Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
03/24/2020 246.88 71882770 Up
03/23/2020 224.37 84188210 Down

INTERMEDIATE PYTHON FOR FINANCE


Head
aapl.head()

INTERMEDIATE PYTHON FOR FINANCE


Head
aapl.head(3)

Price Volume Trend
Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down

INTERMEDIATE PYTHON FOR FINANCE


Tail
aapl.tail()

Price Volume Trend


Date
03/05/2020 292.92 46893220 Down
03/04/2020 302.74 54794570 Up
03/03/2020 289.32 79868850 Down
03/02/2020 298.81 85349340 Up
02/28/2020 273.36 106721200 Down

INTERMEDIATE PYTHON FOR FINANCE


Describe
aapl.describe()

Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
25% 246.670000 6.409497e+07
50% 258.440000 7.505841e+07
75% 285.340000 8.418821e+07
max 302.740000 1.067212e+08

INTERMEDIATE PYTHON FOR FINANCE


Include
aapl.describe(include='object')

Trend
count 21
unique 2
top Down
freq 14

INTERMEDIATE PYTHON FOR FINANCE


Include
aapl.describe(include='all')

Price Volume Trend


count 21.000000 2.100000e+01 21
unique NaN NaN 2
top NaN NaN Down
freq NaN NaN 14
mean 263.715714 7.551468e+07 NaN
std 23.360598 1.669757e+07 NaN
min 224.370000 4.689322e+07 NaN
25% 246.670000 6.409497e+07 NaN

INTERMEDIATE PYTHON FOR FINANCE


aapl.describe(include=['float', 'object'])

Price Trend
count 21.000000 21
unique NaN 2
top NaN Down
freq NaN 14
mean 263.715714 NaN
std 23.360598 NaN
min 224.370000 NaN
25% 246.670000 NaN
50% 258.440000 NaN
75% 285.340000 NaN
max 302.740000 NaN

INTERMEDIATE PYTHON FOR FINANCE


Percentiles
aapl.describe(percentiles=[.1, .5, .9])

Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
10% 242.210000 5.479457e+07
50% 258.440000 7.505841e+07
90% 292.920000 1.004233e+08
max 302.740000 1.067212e+08

INTERMEDIATE PYTHON FOR FINANCE


Exclude
aapl.describe(exclude='float')

Volume Trend
count 2.100000e+01 21
unique NaN 2
top NaN Down
freq NaN 14
mean 7.551468e+07 NaN
std 1.669757e+07 NaN
min 4.689322e+07 NaN
25% 6.409497e+07 NaN

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Filtering data

Kennedy Behrman
Data Engineer, Author, Founder
Introducing the data
prices.head()

INTERMEDIATE PYTHON FOR FINANCE


Introducing the data
prices.head()

Date Symbol High


0 2020-04-03 AAPL 245.70
1 2020-04-02 AAPL 245.15
2 2020-04-01 AAPL 248.72
3 2020-03-31 AAPL 262.49
4 2020-03-30 AAPL 255.52

INTERMEDIATE PYTHON FOR FINANCE


Introducing the data
prices.describe()

INTERMEDIATE PYTHON FOR FINANCE


Introducing the data
prices.describe()

High
count 378.000000
mean 881.593138
std 720.771922
min 227.490000
max 2185.950000

INTERMEDIATE PYTHON FOR FINANCE


Introducing the data
prices.describe(include='object')

Symbol
count 378
unique 3
top AMZN
freq 126

INTERMEDIATE PYTHON FOR FINANCE


Comparison operators
< <= > >= == !=

INTERMEDIATE PYTHON FOR FINANCE


Column comparison
prices.High > 2160

INTERMEDIATE PYTHON FOR FINANCE


Column comparison
prices.High > 2160

0 False
1 False
2 False
3 False
4 False
...
374 False
375 False
376 False
377 False

INTERMEDIATE PYTHON FOR FINANCE


Column comparison
prices.Symbol == 'AAPL'

INTERMEDIATE PYTHON FOR FINANCE


Column comparison
prices.Symbol == 'AAPL'

0 True
1 True
2 True
3 True
4 True
...
374 False
375 False
376 False
377 False

INTERMEDIATE PYTHON FOR FINANCE


Masking by symbol
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]

INTERMEDIATE PYTHON FOR FINANCE


Masking by symbol
mask_symbol = prices.Symbol == 'AAPL'
aapl = prices.loc[mask_symbol]
aapl.describe(include='object')

Symbol
count 126
unique 1
top AAPL
freq 126

INTERMEDIATE PYTHON FOR FINANCE


Masking by price
mask_high = prices.High > 2160
big_price = prices.loc[mask_high]

INTERMEDIATE PYTHON FOR FINANCE


Masking by price
big_price.describe()

High
count 6.000000
mean 2177.406567
std 7.999334
min 2166.070000
max 2185.95000

INTERMEDIATE PYTHON FOR FINANCE


Pandas Boolean operators
And &

Or |

Not ~

INTERMEDIATE PYTHON FOR FINANCE


Combining conditions
mask_prices = prices['Symbol'] != 'AMZN'

mask_date = prices['Date'] > datetime(2020, 4, 1)

mask_amzn = mask_prices & mask_date

prices.loc[mask_amzn]

INTERMEDIATE PYTHON FOR FINANCE


Combining conditions
Date Symbol High
0 2020-04-03 AAPL 245.7000
1 2020-04-02 AAPL 245.1500
252 2020-04-03 TSLA 515.4900
253 2020-04-02 TSLA 494.2599
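
A self-contained sketch of combining masks. The rows below are a small made-up sample (only the column layout follows the slides). When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, because `&` and `|` bind more tightly than `>` or `!=`.

```python
from datetime import datetime
import pandas as pd

# Made-up sample rows; only the column names follow the slides.
prices = pd.DataFrame({
    'Date': pd.to_datetime(['2020-04-03', '2020-04-01',
                            '2020-04-03', '2020-04-02']),
    'Symbol': ['AAPL', 'AAPL', 'AMZN', 'TSLA'],
    'High': [10.0, 11.0, 20.0, 30.0],
})

# Parenthesize each comparison before combining with &.
mask = (prices['Symbol'] != 'AMZN') & (prices['Date'] > datetime(2020, 4, 1))
selected = prices.loc[mask]

# ~ negates a mask, giving every row the first selection left out.
rest = prices.loc[~mask]
print(selected)
print(rest)
```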

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Plotting data

Kennedy Behrman
Data Engineer, Author, Founder
Look at your data

INTERMEDIATE PYTHON FOR FINANCE


exxon.head()

INTERMEDIATE PYTHON FOR FINANCE


Introducing the data
exxon.head()

Date High Volume Month


0 2015-05-01 90.089996 198924100 May
1 2015-06-01 85.970001 238808600 Jun
2 2015-07-01 83.529999 274029000 Jul
3 2015-08-01 79.290001 387523600 Aug
4 2015-09-01 75.470001 316644500 Sep

INTERMEDIATE PYTHON FOR FINANCE


Matplotlib
my_dataframe.plot()

INTERMEDIATE PYTHON FOR FINANCE


Line plot
exxon.plot(x='Date',
y='High' )

INTERMEDIATE PYTHON FOR FINANCE


Rotate
exxon.plot(x='Date',
y='High',
rot=90 )

INTERMEDIATE PYTHON FOR FINANCE


Title
exxon.plot(x='Date',
y='High',
rot=90,
title='Exxon Stock Price')

INTERMEDIATE PYTHON FOR FINANCE


Index
exxon.set_index('Date', inplace=True)
exxon.plot(y='High',
rot=90,
title='Exxon Stock Price')

INTERMEDIATE PYTHON FOR FINANCE


Plot types
line density

bar area

barh pie

hist scatter

box hexbin

kde

INTERMEDIATE PYTHON FOR FINANCE


Bar
exxon2018.plot(x='Month',
y='Volume',
kind='bar',
title='Exxon 2018')

INTERMEDIATE PYTHON FOR FINANCE


Hist
exxon.plot(y='High',kind='hist')

INTERMEDIATE PYTHON FOR FINANCE


Let's practice!
Wrapping up

Kennedy Behrman
Data Engineer, Author, Founder
Chapter 1
Representing time

datetime

Mapping data

dict()

INTERMEDIATE PYTHON FOR FINANCE


Chapter 2
Comparison operators

< <= > >=

Equality operators

== !=

Boolean operators

and or not

If statements

if a < b:
    print(a)

Loops

while a < b:
    a = a + 1

for a in c:
    print(a)

INTERMEDIATE PYTHON FOR FINANCE


Chapter 3
Creating a DataFrame

DataFrame(data=data)
pd.read_csv('/data.csv')

Accessing data

stocks.loc['a', 'Values']
stocks.iloc[2:22, 12]

Aggregating, summarizing

stocks.mean()
stocks.median()

Extending, manipulating

pce['PCESV'] = pcesv
gdp.apply(np.sum, axis=1)

INTERMEDIATE PYTHON FOR FINANCE


Chapter 4
Peeking

aapl.head()
aapl.tail()
aapl.describe()

Filtering

mask = prices.High > 216
prices.loc[mask]

Plotting

exxon.plot(x='Date',
           y='High')

INTERMEDIATE PYTHON FOR FINANCE


Congratulations!
Fundamental financial concepts
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON

Dakota Wixom
Quantitative Finance Analyst
Course objectives
The Time Value of Money

Compound Interest

Discounting and Projecting Cash Flows

Making Rational Economic Decisions

Mortgage Structures

Interest and Equity

The Cost of Capital

Wealth Accumulation

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Calculating Return on Investment (% Gain)
$$\text{Return (\% Gain)} = \frac{v_{t_2} - v_{t_1}}{v_{t_1}} = r$$

$v_{t_1}$: The initial value of the investment at time $t_1$
$v_{t_2}$: The final value of the investment at time $t_2$

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Example
You invest $10,000 at time = year 1

At time = 2, your investment is worth $11,000


$$\frac{\$11{,}000 - \$10{,}000}{\$10{,}000} \times 100 = 10\% \text{ annual return (gain) on your investment}$$

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Calculating Return on Investment (Dollar Value)
$$v_{t_2} = v_{t_1} \cdot (1 + r)$$

$v_{t_1}$: The initial value of the investment at time $t_1$

$v_{t_2}$: The final value of the investment at time $t_2$

$r$: The rate of return of the investment per period $t$

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Example
Annual rate of return = 10% = 10/100

You invest $10,000 at time = year 1


$$\$10{,}000 \cdot \left(1 + \frac{10}{100}\right) = \$11{,}000$$
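
The two return formulas above translate directly into Python. This is a minimal sketch; the function names are illustrative and not taken from the course.

```python
def percent_return(v_t1, v_t2):
    """Percentage gain from an initial value v_t1 to a final value v_t2."""
    return (v_t2 - v_t1) / v_t1 * 100

def final_value(v_t1, r):
    """Final value after one period at rate r (given as a decimal)."""
    return v_t1 * (1 + r)

print(percent_return(10_000, 11_000))
print(final_value(10_000, 0.10))
```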

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Cumulative growth (or depreciation)
r: The investment's expected rate of return (growth rate)

t: The lifespan of the investment (time)

vt0 : The initial value of the investment at time 0


$$\text{Investment Value} = v_{t_0} \cdot (1 + r)^t$$

If the growth rate r is negative, the investment's value will depreciate (shrink) over time.

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Discount factors
$$df = \frac{1}{(1 + r)^t}$$

$$v = fv \cdot df$$

$df$: Discount factor
$r$: The rate of depreciation per period $t$
$t$: Time periods
$v$: Initial value of the investment
$fv$: Future value of the investment
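
As a quick numeric check of the discount-factor definition, the sketch below discounts $100 received 3 periods from now at a 1% rate per period. The variable names mirror the symbols above; the function name is illustrative.

```python
def discount_factor(r, t):
    """Discount factor for rate r per period over t periods."""
    return 1 / (1 + r) ** t

fv = 100.0                       # future value
df = discount_factor(0.01, 3)    # slightly below 1 for a positive rate
v = fv * df                      # value today

print(round(df, 6), round(v, 3))
```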

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Compound interest
$$\text{Investment Value} = v_{t_0} \cdot \left(1 + \frac{r}{c}\right)^{t \cdot c}$$

r: The investment's annual expected rate of return (growth rate)

t: The lifespan of the investment

vt0 : The initial value of the investment at time 0


c: The number of compounding periods per year

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


The power of compounding returns
Consider a $1,000 investment with a 10% annual return,
compounded quarterly (every 3 months, 4 times per year):

$$\$1{,}000 \cdot \left(1 + \frac{0.10}{4}\right)^{1 \cdot 4} = \$1{,}103.81$$

Compare this with no compounding:

$$\$1{,}000 \cdot \left(1 + \frac{0.10}{1}\right)^{1 \cdot 1} = \$1{,}100.00$$

Notice the extra $3.81 due to the quarterly compounding?

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Exponential growth
Compounded Quarterly Over 30 Years:

$$\$1{,}000 \cdot \left(1 + \frac{0.10}{4}\right)^{30 \cdot 4} = \$19{,}358.15$$

Compounded Annually Over 30 Years:

$$\$1{,}000 \cdot \left(1 + \frac{0.10}{1}\right)^{30 \cdot 1} = \$17{,}449.40$$

Compounding quarterly generates an extra $1,908.75 over 30 years
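
The compounding figures above can be reproduced with a short function. This is a sketch of the formula as stated on the slides; the name `compound_value` is an assumption.

```python
def compound_value(v_t0, r, t, c=1):
    """Value of v_t0 after t years at annual rate r, compounded c times per year."""
    return v_t0 * (1 + r / c) ** (t * c)

one_year_quarterly = compound_value(1_000, 0.10, 1, c=4)
thirty_quarterly = compound_value(1_000, 0.10, 30, c=4)
thirty_annual = compound_value(1_000, 0.10, 30, c=1)

print(round(one_year_quarterly, 2))
print(round(thirty_quarterly, 2), round(thirty_annual, 2))
print(round(thirty_quarterly - thirty_annual, 2))
```

Setting `c=1` reduces the expression to the cumulative growth formula from the earlier slide.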

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Let's practice!
Present and future value

Dakota Wixom
Quantitative Finance Analyst
The non-static value of money
Situation 1

Option A: $100 in your pocket today

Option B: $100 in your pocket tomorrow

Situation 2

Option A: $10,000 dollars in your pocket today

Option B: $10,500 dollars in your pocket one year from now

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Time is money
Your Options

A: Take the $10,000, stash it in the bank at 1% interest per year, risk free

B: Invest the $10,000 in the stock market and earn an average 8% per year

C: Wait 1 year, take the $10,500 instead

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Comparing future values
A: 10,000 * (1 + 0.01) = 10,100 future dollars

B: 10,000 * (1 + 0.08) = 10,800 future dollars

C: 10,500 future dollars

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Present value in Python
Calculate the present value of $100 received 3 years from now
at a 1.0% inflation rate.

import numpy as np
np.pv(rate=0.01, nper=3, pmt=0, fv=100)

-97.05

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Future value in Python
Calculate the future value of $100 invested for 3 years at a
5.0% average annual rate of return.

import numpy as np
np.fv(rate=0.05, nper=3, pmt=0, pv=-100)

115.76
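
NumPy releases from 1.20 onward no longer ship `np.pv` and `np.fv`; the financial functions moved to the separate `numpy_financial` package. As a dependency-free sketch, the lump-sum case (`pmt=0`) reduces to the two formulas below. The function names are illustrative, and the sign flip mirrors the cash-flow convention visible in the slide outputs (money received and money paid out carry opposite signs).

```python
def present_value(rate, nper, fv):
    """Lump-sum present value; negative for a positive fv (cash-flow sign convention)."""
    return -fv / (1 + rate) ** nper

def future_value(rate, nper, pv):
    """Lump-sum future value; positive for a negative pv (an amount paid in today)."""
    return -pv * (1 + rate) ** nper

print(present_value(0.01, 3, 100))
print(future_value(0.05, 3, -100))
```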

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Let's practice!
Net present value and cash flows

Dakota Wixom
Quantitative Finance Analyst
Cash flows
Cash flows are a series of gains or losses from an investment over time.

Year Project 1 Cash Flows Project 2 Cash Flows


0 -$100 $100
1 $100 $100
2 $125 -$100
3 $150 $200
4 $175 $300

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Assume a 3% discount rate

Year  Cash Flows  Formula                                 Present Value
0     -$100       pv(rate=0.03, nper=0, pmt=0, fv=-100)   -100
1     $100        pv(rate=0.03, nper=1, pmt=0, fv=100)    97.09
2     $125        pv(rate=0.03, nper=2, pmt=0, fv=125)    117.82
3     $150        pv(rate=0.03, nper=3, pmt=0, fv=150)    137.27
4     $175        pv(rate=0.03, nper=4, pmt=0, fv=175)    155.49

Sum of all present values = 407.67

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Arrays in NumPy
Example:

import numpy as np
array_1 = np.array([100,200,300])
print(array_1*2)

[200 400 600]

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Net Present Value
Project 1

import numpy as np
np.npv(rate=0.03, values=np.array([-100, 100, 125, 150, 175]))

407.67

Project 2

import numpy as np
np.npv(rate=0.03, values=np.array([100, 100, -100, 200, 300]))

552.40
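The NPV figures above are the sum of each cash flow discounted by its position in the array, with the first cash flow (year 0) left undiscounted. This is a minimal sketch in plain Python that avoids any dependency on the NumPy financial functions; the helper name `npv` is an assumption.

```python
def npv(rate, cash_flows):
    """Sum of cash flows, each discounted by its index (index 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

project_1 = [-100, 100, 125, 150, 175]
project_2 = [100, 100, -100, 200, 300]

npv_1 = npv(0.03, project_1)
npv_2 = npv(0.03, project_2)
print(round(npv_1, 2), round(npv_2, 2))
```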

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


A tale of two project proposals

Dakota Wixom
Quantitative Finance Analyst
Common profitability analysis methods
Net Present Value (NPV)

Internal Rate of Return (IRR)

Equivalent Annual Annuity (EAA)

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Net Present Value (NPV)
NPV is equal to the sum of all discounted cash flows:

\[ NPV = \sum_{t=1}^{T} \frac{C_t}{(1+r)^t} - C_0 \]

C_t: Cash flow C at time t

r: Discount rate

NPV is a simple cash flow valuation measure that does not allow
for the comparison of projects of different sizes or lengths.

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Internal Rate of Return (IRR)
The internal rate of return must be computed by solving for IRR
in the NPV equation when set equal to 0.

\[ NPV = \sum_{t=1}^{T} \frac{C_t}{(1+IRR)^t} - C_0 = 0 \]

C_t: Cash flow C at time t

IRR: Internal Rate of Return

IRR can be used to compare projects of different sizes and
lengths but requires an algorithmic solution and does not
measure total value.

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


IRR in NumPy
You can use the NumPy function .irr(values) to compute the
internal rate of return of an array of values.

Example:

import numpy as np
project_1 = np.array([-100,150,200])
np.irr(project_1)

1.35

Project 1 has an IRR of 135%
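To illustrate what "an algorithmic solution" means, the sketch below finds the IRR by bisection: it repeatedly halves an interval of candidate rates until the NPV at the midpoint is close to zero. This is one possible root-finding approach, not necessarily the one NumPy used, and it assumes the NPV changes sign exactly once between `low` and `high`.

```python
def npv(rate, cash_flows):
    """Sum of cash flows, each discounted by its index."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr_bisection(cash_flows, low=0.0, high=10.0, tol=1e-9):
    """Bisection search for the rate at which NPV is zero."""
    for _ in range(200):
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid    # NPV still positive: the rate is too low
        else:
            high = mid   # NPV negative or zero: the rate is too high
        if high - low < tol:
            break
    return (low + high) / 2

project_1 = [-100, 150, 200]
rate = irr_bisection(project_1)
print(round(rate, 2))
```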

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


The Weighted Average Cost of Capital (WACC)

Dakota Wixom
Quantitative Finance Analyst
What is WACC?
\[ WACC = F_{Equity} \times C_{Equity} + F_{Debt} \times C_{Debt} \times (1 - TR) \]

F_Equity: The proportion (%) of a company's financing via equity

F_Debt: The proportion (%) of a company's financing via debt

C_Equity: The cost of a company's equity

C_Debt: The cost of a company's debt

TR: The corporate tax rate

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Proportion of financing
The proportion (%) of financing can be calculated as follows:

\[ F_{Equity} = \frac{M_{Equity}}{M_{Total}} \]

\[ F_{Debt} = \frac{M_{Debt}}{M_{Total}} \]

\[ M_{Total} = M_{Debt} + M_{Equity} \]

M_Debt: Market value of a company's debt

M_Equity: Market value of a company's equity

M_Total: Total value of a company's financing

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Example:

Calculate the WACC of a company with a 12% cost of debt, 14%
cost of equity, 20% debt financing and 80% equity financing.
Assume a 35% effective corporate tax rate.

percent_equity = 0.80
percent_debt = 0.20
cost_equity = 0.14
cost_debt = 0.12
tax_rate = 0.35
wacc = (percent_equity*cost_equity) + (percent_debt*cost_debt) * (1 - tax_rate)
print(wacc)

0.1276
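The financing proportions and the WACC formula can be combined into one helper that starts from market values. This is a sketch under stated assumptions: the function name and the market values of 80 and 20 are hypothetical, chosen so the proportions match the 80%/20% split in the example.

```python
def wacc(market_equity, market_debt, cost_equity, cost_debt, tax_rate):
    """WACC from market values; debt cost is reduced by the tax shield."""
    total = market_equity + market_debt
    f_equity = market_equity / total
    f_debt = market_debt / total
    return f_equity * cost_equity + f_debt * cost_debt * (1 - tax_rate)

# Hypothetical market values giving 80% equity and 20% debt financing
example = wacc(market_equity=80.0, market_debt=20.0,
               cost_equity=0.14, cost_debt=0.12, tax_rate=0.35)
print(round(example, 4))
```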

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Discounting using WACC
Example:

Calculate the NPV of a project that produces $100 in cash flow
every year for 5 years. Assume a WACC of 13%.

import numpy as np
cf_project1 = np.repeat(100, 5)
npv_project1 = np.npv(0.13, cf_project1)
print(npv_project1)

397.45

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Comparing two projects of different life spans

Dakota Wixom
Quantitative Finance Analyst
Different NPVs and IRRs
Year  Project 1  Project 2
1     -$100      -$125
2     $200       $100
3     $300       $100
4     N/A        $100
5     N/A        $100
6     N/A        $100
7     N/A        $100
8     N/A        $100

Project comparison

Project  NPV     IRR     Length
#1       362.58  200%    3
#2       453.64  78.62%  8

Assume a 5% discount rate for both projects.

Notice how you could undertake multiple Project 1's over 8 years? Are the NPVs fair to compare?

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Equivalent Annual Annuity (EAA) can be used to compare two
projects of different lifespans in present value terms.

Apply the EAA method to the previous two projects using the
computed NPVs * -1:

import numpy as np
npv_project1 = 362.58
npv_project2 = 453.64
np.pmt(rate=0.05, nper=3, pv=-1*npv_project1, fv=0)

133.14

np.pmt(rate=0.05, nper=8, pv=-1*npv_project2, fv=0)

70.18
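The EAA spreads each project's NPV into equal annual payments over the project's life, which is the standard annuity payment formula. A minimal sketch in plain Python follows; the function name is an assumption, and results may differ from the slide values in the last decimal because of rounding.

```python
def equivalent_annual_annuity(npv, rate, nper):
    """Level annual payment whose present value equals npv."""
    return npv * rate / (1 - (1 + rate) ** -nper)

eaa_1 = equivalent_annual_annuity(362.58, rate=0.05, nper=3)
eaa_2 = equivalent_annual_annuity(453.64, rate=0.05, nper=8)
print(round(eaa_1, 2), round(eaa_2, 2))
```

Project 2 has the larger NPV, but Project 1 delivers more value per year of its life.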

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Mortgage basics

Dakota Wixom
Quantitative Finance Analyst
Taking out a mortgage
A mortgage is a loan that covers the remaining cost of a home
after paying a percentage of the home value as a down
payment.

A typical down payment in the US is at least 20% of the home value

A typical US mortgage loan is paid off over 30 years

Example:

$500,000 house

20% down ($100,000)

$400,000 remaining as a 30 year mortgage loan

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Converting from an annual rate
To convert from an annual rate to a periodic rate:

\[ R_{Periodic} = (1 + R_{Annual})^{\frac{1}{N}} - 1 \]

R: Rate of Return (or Interest Rate)

N: Number of Payment Periods Per Year

Example:

Convert a 12% annual interest rate to the equivalent monthly rate.

\[ (1 + 0.12)^{\frac{1}{12}} - 1 = 0.949\% \text{ monthly} \]

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Mortgage loan payments
You can use the NumPy function .pmt(rate, nper, pv) to
compute the periodic mortgage loan payment.

Example:

Calculate the monthly mortgage payment of a $400,000 30-year loan at 3.8% interest:

import numpy as np
monthly_rate = ((1+0.038)**(1/12) - 1)
np.pmt(rate=monthly_rate, nper=12*30, pv=400000)

-1849.15

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Amortization, interest and principal

Dakota Wixom
Quantitative Finance Analyst
Amortization
Principal (Equity): The amount of your mortgage paid that
counts towards the value of the house itself

Interest Payment:

\[ IP_{Periodic} = RMB \times R_{Periodic} \]

Principal Payment:

\[ PP_{Periodic} = MP_{Periodic} - IP_{Periodic} \]

PP: Principal Payment
MP: Mortgage Payment
IP: Interest Payment
R: Mortgage Interest Rate (Periodic)
RMB: Remaining Mortgage Balance
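Applying the two formulas period by period gives an amortization schedule. The sketch below is plain Python under stated assumptions: the function name and tuple layout are illustrative, and it reuses the $400,000, 30-year, 3.8% loan from the mortgage example.

```python
def amortization_schedule(loan, annual_rate, years, periods_per_year=12):
    """Return the level payment and (period, interest, principal, balance) rows."""
    periodic_rate = (1 + annual_rate) ** (1 / periods_per_year) - 1
    n = years * periods_per_year
    payment = loan * periodic_rate / (1 - (1 + periodic_rate) ** -n)
    balance = loan
    rows = []
    for period in range(1, n + 1):
        interest = balance * periodic_rate   # IP = RMB * R
        principal = payment - interest       # PP = MP - IP
        balance -= principal                 # update remaining balance
        rows.append((period, interest, principal, balance))
    return payment, rows

payment, schedule = amortization_schedule(400_000, 0.038, 30)
print(round(payment, 2), round(schedule[0][1], 2), round(schedule[0][2], 2))
```

Early payments are mostly interest; the principal share grows as the remaining balance falls.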

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Accumulating values via for loops in Python
Example:

accumulator = 0
for i in range(3):
    if i == 0:
        accumulator = accumulator + 3
    else:
        accumulator = accumulator + 1
    print(str(i)+": Loop value: "+str(accumulator))

0: Loop value: 3
1: Loop value: 4
2: Loop value: 5

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Home ownership, equity and forecasting

Dakota Wixom
Quantitative Finance Analyst
Ownership
To calculate the percentage of the home you actually own
(home equity):

\[ \text{Percent Equity Owned}_t = \frac{P_{Down} + E_{Cumulative,t}}{V_{Home}} \]

\[ E_{Cumulative,t} = \sum_{t=1}^{T} P_{Principal,t} \]

E_Cumulative,t: Cumulative home equity at time t

P_Principal,t: Principal payment at time t

V_Home: Total home value

P_Down: Initial down payment

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Underwater mortgage
An underwater mortgage is when the remaining amount you
owe on your mortgage is actually higher than the value of the
house itself.

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Cumulative operations in NumPy
Cumulative Sum

import numpy as np
np.cumsum(np.array([1, 2, 3]))

array([1, 3, 6])

Cumulative Product

import numpy as np
np.cumprod(np.array([1, 2, 3]))

array([1, 2, 6])

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Forecasting cumulative growth
Example:

What is the cumulative value at each point in time of a $100


investment that grows by 3% in period 1, then 3% again in
period 2, and then by 5% in period 3?

import numpy as np
np.cumprod(1 + np.array([0.03, 0.03, 0.05]))

array([ 1.03, 1.0609, 1.113945])

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Budgeting project proposal

Dakota Wixom
Quantitative Finance Analyst
Project proposal
Your budget will have to take into account the following:

Rent

Food expenses

Entertainment expenses

Emergency fund

You will have to adjust for the following:

Taxes

Salary growth

Inflation (for all expenses)

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Constant cumulative growth forecast
What is the cumulative growth of an investment that grows by
3% per year for 3 years?

import numpy as np
np.cumprod(1 + np.repeat(0.03, 3)) - 1

array([ 0.03, 0.0609, 0.0927])

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Forecasting values from growth rates
Compute the value at each point in time of an initial $100
investment that grows by 3% per year for 3 years.

import numpy as np
100*np.cumprod(1 + np.repeat(0.03, 3))

array([ 103, 106.09, 109.27])

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Net worth and valuation in your personal financial life

Dakota Wixom
Quantitative Finance Analyst
Net Worth
Net Worth = Assets - Liabilities = Equity

This is the basis of modern accounting

A point in time measurement

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Valuation
NPV(discount rate, cash flows)

Take into account future cash flows, salary and expenses

Adjust for inflation

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Reaching financial goals
Saving will only earn you a low rate of return

Inflation will destroy most of your savings over time if you let it

The best way to combat inflation is to invest

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


The basics of investing
Investing is a risk-reward tradeoff

Diversify

Plan for the worst

Invest as early as possible

Invest continuously over time

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


The power of time and compound interest

Dakota Wixom
Quantitative Finance Analyst
The power of time
Goal: Save $1.0 million over 40 years. Assume an average 7%
rate of return per year.

import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)

-404.61

What if your investments only returned 5% on average?

import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)

-674.53

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


The power of time
Goal: Save $1.0 million over 25 years. Assume an average 7%
rate of return per year.

import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)

-1277.07

What if your investments only returned 5% on average?

import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)

-1707.26
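The four scenarios above can be compared side by side with a small helper. This sketch assumes deposits are made at the end of each month and uses the standard future-value-of-annuity formula; the function name is an assumption, and it returns a positive amount to save rather than a negative cash outflow.

```python
def monthly_saving(goal, annual_return, years):
    """Level end-of-month deposit needed to reach goal."""
    r = (1 + annual_return) ** (1 / 12) - 1   # equivalent monthly rate
    n = 12 * years
    return goal * r / ((1 + r) ** n - 1)

scenarios = {
    (40, 0.07): monthly_saving(1_000_000, 0.07, 40),
    (40, 0.05): monthly_saving(1_000_000, 0.05, 40),
    (25, 0.07): monthly_saving(1_000_000, 0.07, 25),
    (25, 0.05): monthly_saving(1_000_000, 0.05, 25),
}
for (years, rate), amount in scenarios.items():
    print(years, rate, round(amount, 2))
```

Note the parentheses around `1 / 12`: without them Python evaluates the power first and then divides by 12, which produces a negative rate.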

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Inflation adjusting
Assume an average rate of inflation of 3% per year

import numpy as np
np.fv(rate=-0.03, nper=25, pv=-1000000, pmt=0)

466974.70

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Financial concepts in your daily life

Dakota Wixom
Quantitative Finance Analyst
Congratulations
The Time Value of Money

Compound Interest

Discounting and Projecting Cash Flows

Making Rational Economic Decisions

Mortgage Structures

Interest and Equity

The Cost of Capital

Wealth Accumulation

INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON


Congratulations!
INTRODUCTION TO FINANCIAL CONCEPTS IN PYTHON
How to use dates & times with pandas
MANIPULATING TIME SERIES DATA IN PYTHON

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Date & time series functionality
At the root: data types for date & time information
Objects for points in time and periods

Attributes & methods reflect time-related details

Sequences of dates & periods:

Series or DataFrame columns

Index: convert object into Time Series

Many Series/DataFrame methods rely on time information in
the index to provide time-series functionality

MANIPULATING TIME SERIES DATA IN PYTHON


Basic building block: pd.Timestamp
import pandas as pd # assumed imported going forward
from datetime import datetime # To manually create dates
time_stamp = pd.Timestamp(datetime(2017, 1, 1))
pd.Timestamp('2017-01-01') == time_stamp

True # Understands dates as strings

time_stamp # type: pandas.tslib.Timestamp

Timestamp('2017-01-01 00:00:00')

MANIPULATING TIME SERIES DATA IN PYTHON


Basic building block: pd.Timestamp
Timestamp object has many attributes to store time-specific
information

time_stamp.year

2017

time_stamp.day_name()

'Sunday'

MANIPULATING TIME SERIES DATA IN PYTHON


More building blocks: pd.Period & freq
period = pd.Period('2017-01')
period # default: month-end

Period('2017-01', 'M')

Period object has freq attribute to store frequency info

period.asfreq('D') # convert to daily

Period('2017-01-31', 'D')

Convert pd.Period() to pd.Timestamp() and back

period.to_timestamp().to_period('M')

Period('2017-01', 'M')

MANIPULATING TIME SERIES DATA IN PYTHON


More building blocks: pd.Period & freq
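Frequency info enables basic date arithmetic

period + 2

Period('2017-03', 'M')

# Recent pandas versions no longer accept pd.Timestamp('2017-01-31', 'M') + 1;
# add a date offset instead
pd.Timestamp('2017-01-31') + pd.offsets.MonthEnd(1)

Timestamp('2017-02-28 00:00:00')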
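The building blocks on these slides can be exercised together in a short script. This is a minimal sketch; the variable names are assumptions, and the timestamp arithmetic uses `pd.offsets.MonthEnd` because integer addition on a `Timestamp` is not supported in current pandas.

```python
import pandas as pd

stamp = pd.Timestamp("2017-01-31")
period = pd.Period("2017-01")            # monthly frequency is inferred

next_month_end = stamp + pd.offsets.MonthEnd(1)   # roll to the next month end
two_months_on = period + 2                        # period arithmetic uses its freq
last_day = period.asfreq("D")                     # end of the period by default
round_trip = period.to_timestamp().to_period("M")

print(next_month_end, two_months_on, last_day, round_trip)
```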

MANIPULATING TIME SERIES DATA IN PYTHON


Sequences of dates & times
pd.date_range : start , end , periods , freq

index = pd.date_range(start='2017-1-1', periods=12, freq='M')

index

DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', ...,


'2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
dtype='datetime64[ns]', freq='M')

pd.DatetimeIndex : sequence of Timestamp objects with frequency info

MANIPULATING TIME SERIES DATA IN PYTHON


Sequences of dates & times
index[0]

Timestamp('2017-01-31 00:00:00', freq='M')

index.to_period()

PeriodIndex(['2017-01', '2017-02', '2017-03', '2017-04', ...,


'2017-11', '2017-12'], dtype='period[M]', freq='M')

MANIPULATING TIME SERIES DATA IN PYTHON


Create a time series: pd.DatetimeIndex
pd.DataFrame({'data': index}).info()

RangeIndex: 12 entries, 0 to 11
Data columns (total 1 columns):
data 12 non-null datetime64[ns]
dtypes: datetime64[ns](1)

MANIPULATING TIME SERIES DATA IN PYTHON


Create a time series: pd.DatetimeIndex
np.random.random :
Random numbers: [0, 1)

12 rows, 2 columns

import numpy as np
data = np.random.random(size=(12, 2))
pd.DataFrame(data=data, index=index).info()

DatetimeIndex: 12 entries, 2017-01-31 to 2017-12-31


Freq: M
Data columns (total 2 columns):
0 12 non-null float64
1 12 non-null float64
dtypes: float64(2)

MANIPULATING TIME SERIES DATA IN PYTHON


Frequency aliases & time info

MANIPULATING TIME SERIES DATA IN PYTHON


Indexing & resampling time series

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Time series transformation
Basic time series transformations include:

Parsing string dates and converting to datetime64

Selecting & slicing for specific subperiods

Setting & changing DatetimeIndex frequency

Upsampling vs Downsampling

MANIPULATING TIME SERIES DATA IN PYTHON


Getting GOOG stock prices
google = pd.read_csv('google.csv') # import pandas as pd
google.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null object
price 504 non-null float64
dtypes: float64(1), object(1)

google.head()

date price
0 2015-01-02 524.81
1 2015-01-05 513.87
2 2015-01-06 501.96
3 2015-01-07 501.10
4 2015-01-08 502.68

MANIPULATING TIME SERIES DATA IN PYTHON


Converting string dates to datetime64
pd.to_datetime() :
Parse date string

Convert to datetime64

google.date = pd.to_datetime(google.date)
google.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null datetime64[ns]
price 504 non-null float64
dtypes: datetime64[ns](1), float64(1)

MANIPULATING TIME SERIES DATA IN PYTHON


Converting string dates to datetime64
.set_index() :
Date into index

inplace :
don't create copy

google.set_index('date', inplace=True)
google.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
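The same two steps, parsing the date strings and moving them into the index, can be tried without a CSV file. This sketch builds a small in-memory DataFrame from the first three rows shown in `google.head()`; the variable names are assumptions.

```python
import pandas as pd

raw = pd.DataFrame({
    "date": ["2015-01-02", "2015-01-05", "2015-01-06"],
    "price": [524.81, 513.87, 501.96],
})

raw["date"] = pd.to_datetime(raw["date"])   # strings -> datetime64
series = raw.set_index("date")              # dates become the index

jan_5 = series.loc["2015-01-05", "price"]   # full date selects one value
january = series.loc["2015-01"]             # partial string selects the month
print(type(series.index).__name__, jan_5, len(january))
```

Assigning the result of `set_index` to a new name, rather than using `inplace=True`, leaves the original frame untouched.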

MANIPULATING TIME SERIES DATA IN PYTHON


Plotting the Google stock time series
import matplotlib.pyplot as plt
google.price.plot(title='Google Stock Price')
plt.tight_layout(); plt.show()

MANIPULATING TIME SERIES DATA IN PYTHON


Partial string indexing
Selecting/indexing using strings that parse to dates

google.loc['2015'].info() # Pass string for part of date (row selection by string needs .loc in current pandas)

DatetimeIndex: 252 entries, 2015-01-02 to 2015-12-31


Data columns (total 1 columns):
price 252 non-null float64
dtypes: float64(1)

google['2015-3': '2016-2'].info() # Slice includes last month

DatetimeIndex: 252 entries, 2015-03-02 to 2016-02-29


Data columns (total 1 columns):
price 252 non-null float64
dtypes: float64(1)
memory usage: 3.9 KB

MANIPULATING TIME SERIES DATA IN PYTHON


Partial string indexing
google.loc['2016-6-1', 'price'] # Use full date with .loc[]

734.15

MANIPULATING TIME SERIES DATA IN PYTHON


.asfreq(): set frequency
.asfreq('D') :
Convert DatetimeIndex to calendar day frequency

google.asfreq('D').info() # set calendar day frequency

DatetimeIndex: 729 entries, 2015-01-02 to 2016-12-30


Freq: D
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)

MANIPULATING TIME SERIES DATA IN PYTHON


.asfreq(): set frequency
Upsampling:
Higher frequency implies new dates => missing data

google.asfreq('D').head()

price
date
2015-01-02 524.81
2015-01-03 NaN
2015-01-04 NaN
2015-01-05 513.87
2015-01-06 501.96

MANIPULATING TIME SERIES DATA IN PYTHON


.asfreq(): reset frequency
.asfreq('B') :
Convert DatetimeIndex to business day frequency

google = google.asfreq('B') # Change to business day frequency


google.info()

DatetimeIndex: 521 entries, 2015-01-02 to 2016-12-30


Freq: B
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)

MANIPULATING TIME SERIES DATA IN PYTHON


.asfreq(): reset frequency
google[google.price.isnull()] # Select missing 'price' values

price
date
2015-01-19 NaN
2015-02-16 NaN
...
2016-11-24 NaN
2016-12-26 NaN

Business days that were not trading days
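The effect of the two frequencies is easy to see on the first three trading days from the Google series, which span a weekend. This is a minimal sketch with assumed variable names.

```python
import pandas as pd

prices = pd.Series(
    [524.81, 513.87, 501.96],
    index=pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"]),
    name="price",
)

daily = prices.asfreq("D")       # calendar days: the weekend appears as NaN
business = prices.asfreq("B")    # business days: the weekend is skipped

print(daily)
print(business)
```

Calendar-day frequency adds Saturday and Sunday as missing values, while business-day frequency adds a row only for a weekday with no trade, such as a holiday.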

MANIPULATING TIME SERIES DATA IN PYTHON


Lags, changes, and returns for stock price series

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Basic time series calculations
Typical Time Series manipulations include:
Shift or lag values back or forward in time

Get the difference in value for a given time period

Compute the percent change over any number of periods

pandas built-in methods rely on pd.DatetimeIndex

MANIPULATING TIME SERIES DATA IN PYTHON


Getting GOOG stock prices
Let pd.read_csv() do the parsing for you!

google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')

google.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)

MANIPULATING TIME SERIES DATA IN PYTHON


Getting GOOG stock prices
google.head()

price
date
2015-01-02 524.81
2015-01-05 513.87
2015-01-06 501.96
2015-01-07 501.10
2015-01-08 502.68

MANIPULATING TIME SERIES DATA IN PYTHON


.shift(): Moving data between past & future
.shift() :
defaults to periods=1

1 period into future

google['shifted'] = google.price.shift() # default: periods=1


google.head(3)

price shifted
date
2015-01-02 524.81 NaN
2015-01-05 513.87 524.81
2015-01-06 501.96 513.87

MANIPULATING TIME SERIES DATA IN PYTHON


.shift(): Moving data between past & future
.shift(periods=-1) :
lagged data

1 period back in time

google['lagged'] = google.price.shift(periods=-1)
google[['price', 'lagged', 'shifted']].tail(3)

price lagged shifted


date
2016-12-28 785.05 782.79 791.55
2016-12-29 782.79 771.82 785.05
2016-12-30 771.82 NaN 782.79

MANIPULATING TIME SERIES DATA IN PYTHON


Calculate one-period percent change
\[ x_t / x_{t-1} \]
google['change'] = google.price.div(google.shifted)
google[['price', 'shifted', 'change']].head(3)

price shifted change


Date
2017-01-03 786.14 NaN NaN
2017-01-04 786.90 786.14 1.000967
2017-01-05 794.02 786.90 1.009048

MANIPULATING TIME SERIES DATA IN PYTHON


Calculate one-period percent change
google['return'] = google.change.sub(1).mul(100)
google[['price', 'shifted', 'change', 'return']].head(3)

price shifted change return


date
2015-01-02 524.81 NaN NaN NaN
2015-01-05 513.87 524.81 0.98 -2.08
2015-01-06 501.96 513.87 0.98 -2.32

MANIPULATING TIME SERIES DATA IN PYTHON


.diff(): built-in time-series change
Difference in value for two adjacent periods

\[ x_t - x_{t-1} \]
google['diff'] = google.price.diff()
google[['price', 'diff']].head(3)

price diff
date
2015-01-02 524.81 NaN
2015-01-05 513.87 -10.94
2015-01-06 501.96 -11.91

MANIPULATING TIME SERIES DATA IN PYTHON


.pct_change(): built-in time-series % change
Percent change for two adjacent periods
\[ \frac{x_t}{x_{t-1}} - 1 \]

google['pct_change'] = google.price.pct_change().mul(100)
google[['price', 'return', 'pct_change']].head(3)

price return pct_change


date
2015-01-02 524.81 NaN NaN
2015-01-05 513.87 -2.08 -2.08
2015-01-06 501.96 -2.32 -2.32

MANIPULATING TIME SERIES DATA IN PYTHON


Looking ahead: Get multi-period returns
google['return_3d'] = google.price.pct_change(periods=3).mul(100)
google[['price', 'return_3d']].head()

price return_3d
date
2015-01-02 524.81 NaN
2015-01-05 513.87 NaN
2015-01-06 501.96 NaN
2015-01-07 501.10 -4.517825
2015-01-08 502.68 -2.177594

Percent change for two periods, 3 trading days apart
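The manual and built-in calculations from this section can be checked against each other on the five prices shown in `google.head()`. This is a minimal sketch with assumed variable names.

```python
import pandas as pd

price = pd.Series(
    [524.81, 513.87, 501.96, 501.10, 502.68],
    index=pd.to_datetime(
        ["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07", "2015-01-08"]
    ),
)

# Manual one-period return: divide by the shifted series, subtract 1, scale to %
manual_return = price.div(price.shift()).sub(1).mul(100)
builtin_return = price.pct_change().mul(100)

manual_diff = price - price.shift()      # same result as price.diff()
return_3d = price.pct_change(periods=3).mul(100)

print(builtin_return.round(2))
print(return_3d)
```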

MANIPULATING TIME SERIES DATA IN PYTHON


Compare time series growth rates

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Comparing stock performance
Stock price series: hard to compare at different levels

Simple solution: normalize price series to start at 100

Divide all prices by first in series, multiply by 100

Same starting point

All prices relative to starting point

Difference to starting point in percentage points

MANIPULATING TIME SERIES DATA IN PYTHON


Normalizing a single series (1)
google = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')
google.head(3)

price
date
2010-01-04 313.06
2010-01-05 311.68
2010-01-06 303.83

first_price = google.price.iloc[0] # int-based selection


first_price

313.06

first_price == google.loc['2010-01-04', 'price']

True

MANIPULATING TIME SERIES DATA IN PYTHON


Normalizing a single series (2)
normalized = google.price.div(first_price).mul(100)
normalized.plot(title='Google Normalized Series')

MANIPULATING TIME SERIES DATA IN PYTHON


Normalizing multiple series (1)
prices = pd.read_csv('stock_prices.csv',
parse_dates=['date'],
index_col='date')
prices.info()

DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30


Data columns (total 3 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
dtypes: float64(3)

prices.head(2)

AAPL GOOG YHOO


Date
2010-01-04 30.57 313.06 17.10
2010-01-05 30.63 311.68 17.23

MANIPULATING TIME SERIES DATA IN PYTHON


Normalizing multiple series (2)
prices.iloc[0]

AAPL 30.57
GOOG 313.06
YHOO 17.10
Name: 2010-01-04 00:00:00, dtype: float64

normalized = prices.div(prices.iloc[0])
normalized.head(3)

AAPL GOOG YHOO


Date
2010-01-04 1.000000 1.000000 1.000000
2010-01-05 1.001963 0.995592 1.007602
2010-01-06 0.985934 0.970517 1.004094

.div() : automatic alignment of Series index & DataFrame columns

MANIPULATING TIME SERIES DATA IN PYTHON


Comparing with a benchmark (1)
index = pd.read_csv('benchmark.csv', parse_dates=['date'], index_col='date')
index.info()

DatetimeIndex: 1826 entries, 2010-01-01 to 2016-12-30


Data columns (total 1 columns):
SP500 1762 non-null float64
dtypes: float64(1)

prices = pd.concat([prices, index], axis=1).dropna()


prices.info()

DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30


Data columns (total 4 columns):
AAPL 1761 non-null float64
GOOG 1761 non-null float64
YHOO 1761 non-null float64
SP500 1761 non-null float64
dtypes: float64(4)

MANIPULATING TIME SERIES DATA IN PYTHON


Comparing with a benchmark (2)
prices.head(1)

AAPL GOOG YHOO SP500


2010-01-04 30.57 313.06 17.10 1132.99

normalized = prices.div(prices.iloc[0]).mul(100)
normalized.plot()

MANIPULATING TIME SERIES DATA IN PYTHON


Plotting performance difference
tickers = ['GOOG', 'YHOO', 'AAPL'] # assumed definition, matching the output columns
diff = normalized[tickers].sub(normalized['SP500'], axis=0)

GOOG YHOO AAPL


2010-01-04 0.000000 0.000000 0.000000
2010-01-05 -0.752375 0.448669 -0.115294
2010-01-06 -3.314604 0.043069 -1.772895

.sub(..., axis=0) : Subtract a Series from each DataFrame column by aligning indexes
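Normalizing and subtracting a benchmark can be tried on a tiny example. The tickers and prices below are hypothetical, made up so the arithmetic is easy to follow; they are not taken from the course data.

```python
import pandas as pd

# Hypothetical prices, made up for illustration only
prices = pd.DataFrame(
    {
        "AAA": [50.0, 55.0, 60.0],
        "BBB": [200.0, 190.0, 210.0],
        "BENCH": [1000.0, 1010.0, 1050.0],
    },
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

# Divide every column by its first value, then scale so each series starts at 100
normalized = prices.div(prices.iloc[0]).mul(100)

# Subtract the benchmark column from each stock column, row by row
tickers = ["AAA", "BBB"]
diff = normalized[tickers].sub(normalized["BENCH"], axis=0)
print(diff)
```

With `axis=0` the benchmark Series is matched on the date index, so each row is compared with the benchmark value for the same date.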

MANIPULATING TIME SERIES DATA IN PYTHON


Plotting performance difference
diff.plot()

MANIPULATING TIME SERIES DATA IN PYTHON


Changing the time series frequency: resampling

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Changing the frequency: resampling
DatetimeIndex : set & change freq using .asfreq()

But frequency conversion affects the data

Upsampling: fill or interpolate missing data

Downsampling: aggregate existing data

pandas API:
.asfreq() , .reindex()

.resample() + transformation method

MANIPULATING TIME SERIES DATA IN PYTHON


Getting started: quarterly data
dates = pd.date_range(start='2016', periods=4, freq='Q')
data = range(1, 5)
quarterly = pd.Series(data=data, index=dates)
quarterly

2016-03-31 1
2016-06-30 2
2016-09-30 3
2016-12-31 4
Freq: Q-DEC, dtype: int64 # Default: year-end quarters

MANIPULATING TIME SERIES DATA IN PYTHON


Upsampling: quarter => month
monthly = quarterly.asfreq('M') # to month-end frequency

2016-03-31 1.0
2016-04-30 NaN
2016-05-31 NaN
2016-06-30 2.0
2016-07-31 NaN
2016-08-31 NaN
2016-09-30 3.0
2016-10-31 NaN
2016-11-30 NaN
2016-12-31 4.0
Freq: M, dtype: float64

Upsampling creates missing values

monthly = monthly.to_frame('baseline') # to DataFrame

MANIPULATING TIME SERIES DATA IN PYTHON


Upsampling: fill methods
monthly['ffill'] = quarterly.asfreq('M', method='ffill')
monthly['bfill'] = quarterly.asfreq('M', method='bfill')
monthly['value'] = quarterly.asfreq('M', fill_value=0)

MANIPULATING TIME SERIES DATA IN PYTHON


Upsampling: fill methods
bfill : backfill

ffill : forward fill

baseline ffill bfill value


2016-03-31 1.0 1 1 1
2016-04-30 NaN 1 2 0
2016-05-31 NaN 1 2 0
2016-06-30 2.0 2 2 2
2016-07-31 NaN 2 3 0
2016-08-31 NaN 2 3 0
2016-09-30 3.0 3 3 3
2016-10-31 NaN 3 4 0
2016-11-30 NaN 3 4 0
2016-12-31 4.0 4 4 4
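The three fill options can be compared in one DataFrame. This sketch deliberately uses quarter-start (`QS`) and month-start (`MS`) frequencies instead of the month-end aliases on the slides, an assumption made here because the spelling of the end-of-period aliases differs between pandas versions.

```python
import pandas as pd

dates = pd.date_range(start="2016-01-01", periods=4, freq="QS")  # quarter starts
quarterly = pd.Series([1, 2, 3, 4], index=dates)

monthly = pd.DataFrame({
    "baseline": quarterly.asfreq("MS"),                  # new months are NaN
    "ffill": quarterly.asfreq("MS", method="ffill"),     # carry last value forward
    "bfill": quarterly.asfreq("MS", method="bfill"),     # pull next value back
    "value": quarterly.asfreq("MS", fill_value=0),       # constant for new months
})
print(monthly)
```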

MANIPULATING TIME SERIES DATA IN PYTHON


Add missing months: .reindex()
dates = pd.date_range(start='2016',
                      periods=12,
                      freq='M')

DatetimeIndex(['2016-01-31',
               '2016-02-29',
               ...,
               '2016-11-30',
               '2016-12-31'],
              dtype='datetime64[ns]', freq='M')

quarterly.reindex(dates)

2016-01-31 NaN
2016-02-29 NaN
2016-03-31 1.0
2016-04-30 NaN
2016-05-31 NaN
2016-06-30 2.0
2016-07-31 NaN
2016-08-31 NaN
2016-09-30 3.0
2016-10-31 NaN
2016-11-30 NaN
2016-12-31 4.0

.reindex() :
conform DataFrame to new index

same filling logic as .asfreq()

MANIPULATING TIME SERIES DATA IN PYTHON


Upsampling & interpolation with .resample()

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence
Frequency conversion & transformation methods
.resample() : similar to .groupby()

Groups data within resampling period and applies one or
several methods to each group

New date determined by offset - start, end, etc

Upsampling: fill from existing or interpolate values

Downsampling: apply aggregation to existing data

MANIPULATING TIME SERIES DATA IN PYTHON


Getting started: monthly unemployment rate
unrate = pd.read_csv('unrate.csv', parse_dates=['Date'], index_col='Date')
unrate.info()

DatetimeIndex: 208 entries, 2000-01-01 to 2017-04-01


Data columns (total 1 columns):
UNRATE 208 non-null float64 # no frequency information
dtypes: float64(1)

unrate.head()

UNRATE
DATE
2000-01-01 4.0
2000-02-01 4.1
2000-03-01 4.0
2000-04-01 3.8
2000-05-01 4.0

Reporting date: 1st day of month

MANIPULATING TIME SERIES DATA IN PYTHON


Resampling Period & Frequency Offsets
Resample creates new date for frequency offset

Several alternatives to calendar month end

Frequency Alias Sample Date


Calendar Month End M 2017-04-30
Calendar Month Start MS 2017-04-01
Business Month End BM 2017-04-28
Business Month Start BMS 2017-04-03

MANIPULATING TIME SERIES DATA IN PYTHON


Resampling logic

MANIPULATING TIME SERIES DATA IN PYTHON




Assign frequency with .resample()
unrate.asfreq('MS').info()

DatetimeIndex: 208 entries, 2000-01-01 to 2017-04-01


Freq: MS
Data columns (total 1 columns):
UNRATE 208 non-null float64
dtypes: float64(1)

unrate.resample('MS') # creates Resampler object

DatetimeIndexResampler [freq=<MonthBegin>, axis=0, closed=left,


label=left, convention=start, base=0]

MANIPULATING TIME SERIES DATA IN PYTHON


Assign frequency with .resample()
unrate.asfreq('MS').equals(unrate.resample('MS').asfreq())

True

.resample() : returns data only when calling another method

MANIPULATING TIME SERIES DATA IN PYTHON


Quarterly real GDP growth
gdp = pd.read_csv('gdp.csv', parse_dates=['DATE'], index_col='DATE')
gdp.info()

DatetimeIndex: 69 entries, 2000-01-01 to 2017-01-01


Data columns (total 1 columns):
gpd 69 non-null float64 # no frequency info
dtypes: float64(1)

gdp.head(2)

gpd
DATE
2000-01-01 1.2
2000-04-01 7.8

MANIPULATING TIME SERIES DATA IN PYTHON


Interpolate monthly real GDP growth
gdp_1 = gdp.resample('MS').ffill().add_suffix('_ffill')

gpd_ffill
DATE
2000-01-01 1.2
2000-02-01 1.2
2000-03-01 1.2
2000-04-01 7.8

MANIPULATING TIME SERIES DATA IN PYTHON


Interpolate monthly real GDP growth
gdp_2 = gdp.resample('MS').interpolate().add_suffix('_inter')

gpd_inter
DATE
2000-01-01 1.200000
2000-02-01 3.400000
2000-03-01 5.600000
2000-04-01 7.800000

.interpolate() : finds points on straight line between existing data
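The forward-fill and interpolation results above can be reproduced from just the two quarterly values shown in `gdp.head(2)`. This is a minimal sketch; the Series name and variable names are assumptions.

```python
import pandas as pd

gdp = pd.Series(
    [1.2, 7.8],
    index=pd.to_datetime(["2000-01-01", "2000-04-01"]),
    name="gdp",
)

ffilled = gdp.resample("MS").ffill()               # repeat the last known value
interpolated = gdp.resample("MS").interpolate()    # linear between known points

print(ffilled)
print(interpolated)
```

Forward fill holds 1.2 until the next observation arrives, while linear interpolation moves in equal steps of 2.2 per month.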

MANIPULATING TIME SERIES DATA IN PYTHON


Concatenating two DataFrames
df1 = pd.DataFrame([1, 2, 3], columns=['df1'])
df2 = pd.DataFrame([4, 5, 6], columns=['df2'])
pd.concat([df1, df2])

df1 df2
0 1.0 NaN
1 2.0 NaN
2 3.0 NaN
0 NaN 4.0
1 NaN 5.0
2 NaN 6.0

MANIPULATING TIME SERIES DATA IN PYTHON


Concatenating two DataFrames
pd.concat([df1, df2], axis=1)

df1 df2
0 1 4
1 2 5
2 3 6

axis=1 : concatenate horizontally

MANIPULATING TIME SERIES DATA IN PYTHON


Plot interpolated real GDP growth
pd.concat([gdp_1, gdp_2], axis=1).loc['2015':].plot()

MANIPULATING TIME SERIES DATA IN PYTHON


Combine GDP growth & unemployment
pd.concat([unrate, gdp_2], axis=1).plot(); # gdp_2: the interpolated GDP series created above

MANIPULATING TIME SERIES DATA IN PYTHON


Let's practice!
Downsampling & aggregation

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Downsampling & aggregation methods
So far: upsampling, fill logic & interpolation

Now: downsampling
hour to day

day to month, etc

How to represent the existing values at the new date?


Mean, median, last value?



Air quality: daily ozone levels
ozone = pd.read_csv('ozone.csv',
parse_dates=['date'],
index_col='date')
ozone.info()

DatetimeIndex: 6291 entries, 2000-01-01 to 2017-03-31


Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)

ozone = ozone.resample('D').asfreq()
ozone.info()

DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31


Freq: D
Data columns (total 1 columns):
Ozone 6167 non-null float64
dtypes: float64(1)



Creating monthly ozone data
ozone.resample('M').mean().head()

Ozone
date
2000-01-31 0.010443
2000-02-29 0.011817
2000-03-31 0.016810
2000-04-30 0.019413
2000-05-31 0.026535

ozone.resample('M').median().head()

Ozone
date
2000-01-31 0.009486
2000-02-29 0.010726
2000-03-31 0.017004
2000-04-30 0.019866
2000-05-31 0.026018

.resample().mean() : Monthly average, assigned to end of calendar month



Creating monthly ozone data
ozone.resample('M').agg(['mean', 'std']).head()

Ozone
mean std
date
2000-01-31 0.010443 0.004755
2000-02-29 0.011817 0.004072
2000-03-31 0.016810 0.004977
2000-04-30 0.019413 0.006574
2000-05-31 0.026535 0.008409

.resample().agg() : List of aggregation functions like groupby
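The downsampling idea can be tried on synthetic data. The sketch below assumes a made-up daily series (`reading` is a hypothetical column) and uses the month-start alias `'MS'` rather than the month-end alias shown on the slides, since recent pandas releases prefer a different spelling for month end. It applies several aggregations at once and shows how the choice of aggregation decides what represents each month.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings: the values 0..59 over 60 days from 2000-01-01
idx = pd.date_range('2000-01-01', periods=60, freq='D')
daily = pd.DataFrame({'reading': np.arange(60, dtype=float)}, index=idx)

# One row per month, labelled by the first day of the month
monthly = daily.resample('MS').agg(['mean', 'median', 'last'])
print(monthly)
```

The result has a two-level column index (original column, aggregation name), which is the same shape a `groupby(...).agg([...])` call would produce.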



Plotting resampled ozone data
ozone = ozone.loc['2016':]
ax = ozone.plot()
monthly = ozone.resample('M').mean()
monthly.add_suffix('_monthly').plot(ax=ax)



Resampling multiple time series
data = pd.read_csv('ozone_pm25.csv',
parse_dates=['date'],
index_col='date')
data = data.resample('D').asfreq()
data.info()

DatetimeIndex: 6300 entries, 2000-01-01 to 2017-03-31


Freq: D
Data columns (total 2 columns):
Ozone 6167 non-null float64
PM25 6167 non-null float64
dtypes: float64(2)



Resampling multiple time series
data = data.resample('BM').mean()
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 207 entries, 2000-01-31 to 2017-03-31
Freq: BM
Data columns (total 2 columns):
Ozone 207 non-null float64
PM25 207 non-null float64
dtypes: float64(2)



Resampling multiple time series
df.resample('M').first().head(4)

Ozone PM25
date
2000-01-31 0.005545 20.800000
2000-02-29 0.016139 6.500000
2000-03-31 0.017004 8.493333
2000-04-30 0.031354 6.889474

df.resample('MS').first().head()

Ozone PM25
date
2000-01-01 0.004032 37.320000
2000-02-01 0.010583 24.800000
2000-03-01 0.007418 11.106667
2000-04-01 0.017631 11.700000
2000-05-01 0.022628 9.700000



Let's practice!
Rolling window functions with pandas

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Window functions in pandas
Windows identify sub periods of your time series

Calculate metrics for sub periods inside the window

Create a new time series of metrics

Two types of windows:


Rolling: same size, sliding (this video)

Expanding: contain all prior values (next video)



Calculating a rolling average
data = pd.read_csv('google.csv', parse_dates=['date'], index_col='date')

DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30


Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)



Calculating a rolling average
# Integer-based window size
data.rolling(window=30).mean() # fixed # observations

DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30


Data columns (total 1 columns):
price 1732 non-null float64
dtypes: float64(1)

window=30 : # business days

min_periods : choose value < 30 to get results for first days
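The effect of `min_periods` is easiest to see on a tiny series. This sketch assumes five made-up prices and a window of 3 instead of 30 so the output stays readable.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])  # hypothetical prices

# Default: no result until the window holds 3 observations
strict = prices.rolling(window=3).mean()
# min_periods=1: average whatever is available at the start of the series
lenient = prices.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'strict': strict, 'lenient': lenient}))
```

With the default setting the first two rows are missing, which matches the drop in non-null count seen in the `.info()` output for the 30-observation window.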



Calculating a rolling average
# Offset-based window size
data.rolling(window='30D').mean() # fixed period length

DatetimeIndex: 1761 entries, 2010-01-04 to 2016-12-30


Data columns (total 1 columns):
price 1761 non-null float64
dtypes: float64(1)

30D : # calendar days
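A short sketch contrasting the two window types on made-up prices that skip a weekend (dates and values are hypothetical). An integer window always looks at a fixed number of rows, while an offset window looks back a fixed span of calendar time, so the number of rows it covers can vary.

```python
import pandas as pd

# Hypothetical prices on business days only: Friday, then Monday to Wednesday
idx = pd.to_datetime(['2010-01-01', '2010-01-04', '2010-01-05', '2010-01-06'])
prices = pd.Series([100.0, 102.0, 104.0, 106.0], index=idx)

by_count = prices.rolling(window=2).mean()      # always the last 2 observations
by_period = prices.rolling(window='2D').mean()  # observations in the last 2 calendar days

print(pd.DataFrame({'by_count': by_count, 'by_period': by_period}))
```

On the Monday the offset window contains only that day's price, because the Friday observation lies more than two calendar days back.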



90 day rolling mean
r90 = data.rolling(window='90D').mean()
data.join(r90.add_suffix('_mean_90')).plot()



90 & 360 day rolling means
data['mean90'] = r90
r360 = data['price'].rolling(window='360D').mean()
data['mean360'] = r360
data.plot()



Multiple rolling metrics (1)
r = data.price.rolling('90D').agg(['mean', 'std'])
r.plot(subplots = True)



Multiple rolling metrics (2)
rolling = data.price.rolling('360D')
q10 = rolling.quantile(0.1).to_frame('q10')
median = rolling.median().to_frame('median')
q90 = rolling.quantile(0.9).to_frame('q90')
pd.concat([q10, median, q90], axis=1).plot()



Let's practice!
Expanding window functions with pandas

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Expanding windows in pandas
From rolling to expanding windows

Calculate metrics for periods up to current date

New time series reflects all historical values

Useful for running rate of return, running min/max

Two options with pandas:


.expanding() - just like .rolling()

.cumsum(), .cumprod(), .cummin() / .cummax()



The basic idea
df = pd.DataFrame({'data': range(5)})
df['expanding sum'] = df.data.expanding().sum()
df['cumulative sum'] = df.data.cumsum()
df

data expanding sum cumulative sum


0 0 0.0 0
1 1 1.0 1
2 2 3.0 3
3 3 6.0 6
4 4 10.0 10



Get data for the S&P 500
data = pd.read_csv('sp500.csv', parse_dates=['date'], index_col='date')

DatetimeIndex: 2519 entries, 2007-05-24 to 2017-05-24


Data columns (total 1 columns):
SP500 2519 non-null float64



How to calculate a running return
Single period return $r_t$: current price over last price, minus 1:

$$r_t = \frac{P_t}{P_{t-1}} - 1$$

Multi-period return: product of $(1 + r_t)$ for all periods, minus 1:

$$R_T = (1 + r_1)(1 + r_2) \cdots (1 + r_T) - 1$$

For the period return: .pct_change()

For basic math .add() , .sub() , .mul() , .div()

For cumulative product: .cumprod()
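The three methods listed above can be chained on a handful of made-up prices to check the multi-period formula. This is a sketch with hypothetical numbers; it also compares the compounded result with the direct calculation of each price relative to the first price.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9])  # hypothetical price path

period_return = prices.pct_change()                     # r_t = P_t / P_{t-1} - 1
running_return = period_return.add(1).cumprod().sub(1)  # (1 + r_1)...(1 + r_T) - 1
direct = prices.div(prices.iloc[0]).sub(1)              # P_T / P_0 - 1

print(pd.DataFrame({'running': running_return, 'direct': direct}))
```

The first running return is missing because `.pct_change()` has no previous price to compare with; from the second row on, both columns agree up to floating-point rounding.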



Running rate of return in practice
pr = data.SP500.pct_change() # period return
pr_plus_one = pr.add(1)
cumulative_return = pr_plus_one.cumprod().sub(1)
cumulative_return.mul(100).plot()



Getting the running min & max
data['running_min'] = data.SP500.expanding().min()
data['running_max'] = data.SP500.expanding().max()
data.plot()



Rolling annual rate of return
import numpy as np

def multi_period_return(period_returns):
    # Compound the period returns inside the window
    return np.prod(period_returns + 1) - 1

pr = data.SP500.pct_change() # period return
r = pr.rolling('360D').apply(multi_period_return)
data['Rolling 1yr Return'] = r.mul(100)
data.plot(subplots=True)




Let's practice!
Case study: S&P500 price simulation

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Random walks & simulations
Daily stock returns are hard to predict

Models often assume they are random in nature

NumPy allows you to generate random numbers

From random returns to prices: use .cumprod()

Two examples:
Generate random returns

Randomly selected actual SP500 returns



Generate random numbers
import seaborn as sns
from numpy.random import normal, seed
from scipy.stats import norm
seed(42)
random_returns = normal(loc=0, scale=0.01, size=1000)
# Note: distplot() is deprecated in newer seaborn releases
sns.distplot(random_returns, fit=norm, kde=False)



Create a random price path
return_series = pd.Series(random_returns)
random_prices = return_series.add(1).cumprod().sub(1)
random_prices.mul(100).plot()
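The series plotted above is a cumulative return in percent. As a variation, the sketch below scales the compounded returns by a starting price to obtain a price level instead. The starting price of 100 and the use of NumPy's `default_rng` generator are assumptions made for this illustration, not part of the slides.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # assumed generator, chosen for reproducibility
random_returns = pd.Series(rng.normal(loc=0, scale=0.01, size=1000))

start_price = 100.0  # hypothetical starting level
# Compound the returns, then scale by the starting price to get a price path
random_prices = random_returns.add(1).cumprod().mul(start_price)

print(random_prices.head())
```

Because each step multiplies the previous level by one plus a small return, the simulated path stays positive for returns of this size.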



S&P 500 prices & returns
data = pd.read_csv('sp500.csv', parse_dates=['date'], index_col='date')
data['returns'] = data.SP500.pct_change()
data.plot(subplots=True)



S&P return distribution
sns.distplot(data.returns.dropna().mul(100), fit=norm)



Generate random S&P 500 returns
from numpy.random import choice
sample = data.returns.dropna()
n_obs = data.returns.count()
random_walk = choice(sample, size=n_obs)
random_walk = pd.Series(random_walk, index=sample.index)
random_walk.head()

DATE
2007-05-29 -0.008357
2007-05-30 0.003702
2007-05-31 -0.013990
2007-06-01 0.008096
2007-06-04 0.013120



Random S&P 500 prices (1)
start = data.SP500.first('D')

DATE
2007-05-25 1515.73
Name: SP500, dtype: float64

sp500_random = pd.concat([start, random_walk.add(1)]) # Series.append() was removed in pandas 2.0
sp500_random.head()

DATE
2007-05-25 1515.730000
2007-05-29 0.998290
2007-05-30 0.995190
2007-05-31 0.997787
2007-06-01 0.983853
dtype: float64



Random S&P 500 prices (2)
data['SP500_random'] = sp500_random.cumprod()
data[['SP500', 'SP500_random']].plot()



Let's practice!
Relationships between time series: correlation

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Correlation & relations between series
So far, focus on characteristics of individual variables

Now: characteristic of relations between variables

Correlation: measures linear relationships

Financial markets: important for prediction and risk management

pandas & seaborn have tools to compute & visualize



Correlation & linear relationships
Correlation coefficient: how similar is the pairwise movement of two variables around their averages?

Varies between -1 and +1:

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$$
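The correlation coefficient can be checked by hand on a few made-up numbers. This sketch assumes that $s_x$ and $s_y$ stand for the square roots of the summed squared deviations, which is the scaling under which the ratio above lies between -1 and +1; the slide does not spell this out.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical observations
y = pd.Series([2.0, 1.0, 4.0, 3.0, 5.0])

dx, dy = x - x.mean(), y - y.mean()       # deviations from the averages
s_x = np.sqrt((dx ** 2).sum())            # assumed definition of s_x
s_y = np.sqrt((dy ** 2).sum())            # assumed definition of s_y

r_manual = (dx * dy).sum() / (s_x * s_y)
r_pandas = x.corr(y)                      # Pearson correlation by default

print(r_manual, r_pandas)
```

Both numbers agree, which suggests the manual calculation matches what `.corr()` computes for a pair of series.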



Importing five price time series
data = pd.read_csv('assets.csv', parse_dates=['date'],
                   index_col='date')
data = data.dropna()
data.info()

DatetimeIndex: 2469 entries, 2007-05-25 to 2017-05-22


Data columns (total 5 columns):
sp500 2469 non-null float64
nasdaq 2469 non-null float64
bonds 2469 non-null float64
gold 2469 non-null float64
oil 2469 non-null float64



Visualize pairwise linear relationships
daily_returns = data.pct_change()
sns.jointplot(x='sp500', y='nasdaq', data=daily_returns);



Calculate all correlations
correlations = daily_returns.corr()
correlations

bonds oil gold sp500 nasdaq


bonds 1.000000 -0.183755 0.003167 -0.300877 -0.306437
oil -0.183755 1.000000 0.105930 0.335578 0.289590
gold 0.003167 0.105930 1.000000 -0.007786 -0.002544
sp500 -0.300877 0.335578 -0.007786 1.000000 0.959990
nasdaq -0.306437 0.289590 -0.002544 0.959990 1.000000



Visualize all correlations
sns.heatmap(correlations, annot=True)



Let's practice!
Select index components & import data

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Market value-weighted index
Composite performance of various stocks

Components weighted by market capitalization


Share Price x Number of Shares => Market Value

Larger components get higher percentage weightings

Key market indexes are value-weighted:


S&P 500 , NASDAQ , Wilshire 5000 , Hang Seng



Build a cap-weighted Index
Apply new skills to construct value-weighted index
Select components from exchange listing data

Get component number of shares and stock prices

Calculate component weights

Calculate index

Evaluate performance of components and index



Load stock listing data
nyse = pd.read_excel('listings.xlsx', sheet_name='nyse',
na_values='n/a')
nyse.info()

RangeIndex: 3147 entries, 0 to 3146


Data columns (total 7 columns):
Stock Symbol 3147 non-null object # Stock Ticker
Company Name 3147 non-null object
Last Sale 3079 non-null float64 # Latest Stock Price
Market Capitalization 3147 non-null float64
IPO Year 1361 non-null float64 # Year of listing
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(4)



Load & prepare listing data
nyse.set_index('Stock Symbol', inplace=True)
nyse.dropna(subset=['Sector'], inplace=True)
nyse['Market Capitalization'] /= 1e6 # in Million USD

Index: 2177 entries, DDD to ZTO


Data columns (total 6 columns):
Company Name 2177 non-null object
Last Sale 2175 non-null float64
Market Capitalization 2177 non-null float64
IPO Year 967 non-null float64
Sector 2177 non-null object
Industry 2177 non-null object
dtypes: float64(3), object(3)



Select index components
components = nyse.groupby(['Sector'])['Market Capitalization'].nlargest(1)
components.sort_values(ascending=False)

Sector Stock Symbol


Health Care JNJ 338834.390080
Energy XOM 338728.713874
Finance JPM 300283.250479
Miscellaneous BABA 275525.000000
Public Utilities T 247339.517272
Basic Industries PG 230159.644117
Consumer Services WMT 221864.614129
Consumer Non-Durables KO 183655.305119
Technology ORCL 181046.096000
Capital Goods TM 155660.252483
Transportation UPS 90180.886756
Consumer Durables ABB 48398.935676
Name: Market Capitalization, dtype: float64



Import & prepare listing data
tickers = components.index.get_level_values('Stock Symbol')
tickers

Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
       'ORCL', 'UPS'], dtype='object', name='Stock Symbol')

tickers.tolist()

['PG',
'TM',
'ABB',
'KO',
'WMT',
...
'T',
'ORCL',
'UPS']
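The selection step can be reproduced on a toy listing. The tickers, sectors, and values below are made up for illustration. The sketch shows that picking the largest company per sector yields a Series with a two-level index (sector, ticker), from which the ticker level is then extracted.

```python
import pandas as pd

# Hypothetical listing data indexed by ticker
listings = pd.DataFrame(
    {'Sector': ['Energy', 'Energy', 'Finance', 'Finance'],
     'Market Capitalization': [300.0, 120.0, 250.0, 280.0]},
    index=pd.Index(['AAA', 'BBB', 'CCC', 'DDD'], name='Stock Symbol'),
)

# Largest company by market cap within each sector
largest = listings.groupby('Sector')['Market Capitalization'].nlargest(1)
# Keep only the ticker level of the resulting two-level index
tickers = largest.index.get_level_values('Stock Symbol').tolist()

print(largest)
print(tickers)
```

The plain list of tickers is convenient for selecting rows with `.loc` or columns from a price table afterwards.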



Stock index components
columns = ['Company Name', 'Market Capitalization', 'Last Sale']
component_info = nyse.loc[tickers, columns]
pd.options.display.float_format = '{:,.2f}'.format

Company Name Market Capitalization Last Sale


Stock Symbol
PG Procter & Gamble Company (The) 230,159.64 90.03
TM Toyota Motor Corp Ltd Ord 155,660.25 104.18
ABB ABB Ltd 48,398.94 22.63
KO Coca-Cola Company (The) 183,655.31 42.79
WMT Wal-Mart Stores, Inc. 221,864.61 73.15
XOM Exxon Mobil Corporation 338,728.71 81.69
JPM J P Morgan Chase & Co 300,283.25 84.40
JNJ Johnson & Johnson 338,834.39 124.99
BABA Alibaba Group Holding Limited 275,525.00 110.21
T AT&T Inc. 247,339.52 40.28
ORCL Oracle Corporation 181,046.10 44.00
UPS United Parcel Service, Inc. 90,180.89 103.74



Import & prepare listing data
data = pd.read_csv('stocks.csv', parse_dates=['Date'],
index_col='Date').loc[:, tickers.tolist()]
data.info()

DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30


Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
KO 252 non-null float64
ORCL 252 non-null float64
PG 252 non-null float64
T 252 non-null float64
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
dtypes: float64(12)



Let's practice!
Build a market-cap weighted index

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Build your value-weighted index
Key inputs:
number of shares

stock price series

Normalize index to start at 100



Stock index components
components

Company Name Market Capitalization Last Sale


Stock Symbol
PG Procter & Gamble Company (The) 230,159.64 90.03
TM Toyota Motor Corp Ltd Ord 155,660.25 104.18
ABB ABB Ltd 48,398.94 22.63
KO Coca-Cola Company (The) 183,655.31 42.79
WMT Wal-Mart Stores, Inc. 221,864.61 73.15
XOM Exxon Mobil Corporation 338,728.71 81.69
JPM J P Morgan Chase & Co 300,283.25 84.40
JNJ Johnson & Johnson 338,834.39 124.99
BABA Alibaba Group Holding Limited 275,525.00 110.21
T AT&T Inc. 247,339.52 40.28
ORCL Oracle Corporation 181,046.10 44.00
UPS United Parcel Service, Inc. 90,180.89 103.74



Number of shares outstanding
shares = components['Market Capitalization'].div(components['Last Sale'])

Stock Symbol
PG 2,556.48 # Outstanding shares in million
TM 1,494.15
ABB 2,138.71
KO 4,292.01
WMT 3,033.01
XOM 4,146.51
JPM 3,557.86
JNJ 2,710.89
BABA 2,500.00
T 6,140.50
ORCL 4,114.68
UPS 869.30
dtype: float64

Market Capitalization = Number of Shares x Share Price



Historical stock prices
data = pd.read_csv('stocks.csv', parse_dates=['Date'],
index_col='Date').loc[:, tickers.tolist()]
market_cap_series = data.mul(shares)
market_cap_series.info()

DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30


Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
...
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64
dtypes: float64(12)



From stock prices to market value
pd.concat([market_cap_series.first('D'), market_cap_series.last('D')])

ABB BABA JNJ JPM KO ORCL \\


Date
2016-01-04 37,470.14 191,725.00 272,390.43 226,350.95 181,981.42 147,099.95
2016-12-30 45,062.55 219,525.00 312,321.87 307,007.60 177,946.93 158,209.60
PG T TM UPS WMT XOM
Date
2016-01-04 200,351.12 210,926.33 181,479.12 82,444.14 186,408.74 321,188.96
2016-12-30 214,948.60 261,155.65 175,114.05 99,656.23 209,641.59 374,264.34



Aggregate market value per period
agg_mcap = market_cap_series.sum(axis=1) # Total market cap
agg_mcap.plot(title='Aggregate Market Cap')



Value-based index
index = agg_mcap.div(agg_mcap.iloc[0]).mul(100) # Divide by 1st value
index.plot(title='Market-Cap Weighted Index')
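The whole construction can be followed on a two-stock toy example. All tickers, prices, and share counts here are hypothetical. The sketch multiplies prices by shares outstanding, sums across stocks, and rescales so the series starts at 100.

```python
import pandas as pd

# Hypothetical price history for two stocks
prices = pd.DataFrame(
    {'AAA': [10.0, 11.0, 12.0], 'BBB': [50.0, 49.0, 52.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06']),
)
shares = pd.Series({'AAA': 100.0, 'BBB': 10.0})  # hypothetical shares outstanding

market_cap = prices.mul(shares)                  # shares align with the price columns
agg_mcap = market_cap.sum(axis=1)                # total market value per day
index = agg_mcap.div(agg_mcap.iloc[0]).mul(100)  # normalize to start at 100

print(index)
```

Because the share counts are held fixed, every move in the index comes from price changes, with larger companies moving it more.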



Let's practice!
Evaluate index performance

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Evaluate your value-weighted index
Index return:
Total index return

Contribution by component

Performance vs Benchmark
Total period return

Rolling returns for sub periods



Value-based index - recap
agg_market_cap = market_cap_series.sum(axis=1)
index = agg_market_cap.div(agg_market_cap.iloc[0]).mul(100)
index.plot(title='Market-Cap Weighted Index')



Value contribution by stock
agg_market_cap.iloc[-1] - agg_market_cap.iloc[0]

315,037.71



Value contribution by stock
change = pd.concat([market_cap_series.first('D'), market_cap_series.last('D')])
change.diff().iloc[-1].sort_values() # or: .loc['2016-12-30']

TM -6,365.07
KO -4,034.49
ABB 7,592.41
ORCL 11,109.65
PG 14,597.48
UPS 17,212.08
WMT 23,232.85
BABA 27,800.00
JNJ 39,931.44
T 50,229.33
XOM 53,075.38
JPM 80,656.65
Name: 2016-12-30 00:00:00, dtype: float64



Market-cap based weights
market_cap = components['Market Capitalization']
weights = market_cap.div(market_cap.sum())
weights.sort_values().mul(100)

Stock Symbol
ABB 1.85
UPS 3.45
TM 5.96
ORCL 6.93
KO 7.03
WMT 8.50
PG 8.81
T 9.47
BABA 10.55
JPM 11.50
XOM 12.97
JNJ 12.97
Name: Market Capitalization, dtype: float64



Value-weighted component returns
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100

14.06

weighted_returns = weights.mul(index_return)
weighted_returns.sort_values().plot(kind='barh')
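A small sketch of the weighting step, with three made-up market caps and the 14.06 percent index return taken from the slide. It illustrates that the weights sum to one, so splitting the index return by weight produces pieces that add back up to the total. Note that this allocates the return in proportion to size; it is not each stock's own return.

```python
import pandas as pd

# Hypothetical market capitalizations
market_cap = pd.Series({'AAA': 600.0, 'BBB': 300.0, 'CCC': 100.0})
weights = market_cap.div(market_cap.sum())    # share of total market value

index_return = 14.06                          # total index return in percent, from the slide
weighted_returns = weights.mul(index_return)  # return allocated by weight

print(weighted_returns.sort_values())
```

A bar chart of `weighted_returns` therefore shows how the total return is distributed across components under this size-based attribution.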



Performance vs benchmark
data = index.to_frame('Index') # Convert pd.Series to pd.DataFrame
data['SP500'] = pd.read_csv('sp500.csv', parse_dates=['Date'],
index_col='Date')
data.SP500 = data.SP500.div(data.SP500.iloc[0], axis=0).mul(100)



Performance vs benchmark: 30D rolling return
import numpy as np

def multi_period_return(r):
    return (np.prod(r + 1) - 1) * 100
data.pct_change().rolling('30D').apply(multi_period_return).plot()



Let's practice!
Index correlation & exporting to Excel

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Some additional analysis of your index
Daily return correlations:

Calculate among all components

Visualize the result as heatmap

Write results to Excel using .xls and .xlsx formats:

Single worksheet

Multiple worksheets



Index components - price data
data = DataReader(tickers, 'google', start='2016', end='2017')['Close']
data.info()

DatetimeIndex: 252 entries, 2016-01-04 to 2016-12-30


Data columns (total 12 columns):
ABB 252 non-null float64
BABA 252 non-null float64
JNJ 252 non-null float64
JPM 252 non-null float64
KO 252 non-null float64
ORCL 252 non-null float64
PG 252 non-null float64
T 252 non-null float64
TM 252 non-null float64
UPS 252 non-null float64
WMT 252 non-null float64
XOM 252 non-null float64



Index components: return correlations
daily_returns = data.pct_change()
correlations = daily_returns.corr()

ABB BABA JNJ JPM KO ORCL PG T TM UPS WMT XOM


ABB 1.00 0.40 0.33 0.56 0.31 0.53 0.34 0.29 0.48 0.50 0.15 0.48
BABA 0.40 1.00 0.27 0.27 0.25 0.38 0.21 0.17 0.34 0.35 0.13 0.21
JNJ 0.33 0.27 1.00 0.34 0.30 0.37 0.42 0.35 0.29 0.45 0.24 0.41
JPM 0.56 0.27 0.34 1.00 0.22 0.57 0.27 0.13 0.49 0.56 0.14 0.48
KO 0.31 0.25 0.30 0.22 1.00 0.31 0.62 0.47 0.33 0.50 0.25 0.29
ORCL 0.53 0.38 0.37 0.57 0.31 1.00 0.41 0.32 0.48 0.54 0.21 0.42
PG 0.34 0.21 0.42 0.27 0.62 0.41 1.00 0.43 0.32 0.47 0.33 0.34
T 0.29 0.17 0.35 0.13 0.47 0.32 0.43 1.00 0.28 0.41 0.31 0.33
TM 0.48 0.34 0.29 0.49 0.33 0.48 0.32 0.28 1.00 0.52 0.20 0.30
UPS 0.50 0.35 0.45 0.56 0.50 0.54 0.47 0.41 0.52 1.00 0.33 0.45
WMT 0.15 0.13 0.24 0.14 0.25 0.21 0.33 0.31 0.20 0.33 1.00 0.21
XOM 0.48 0.21 0.41 0.48 0.29 0.42 0.34 0.33 0.30 0.45 0.21 1.00



Index components: return correlations
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(correlations, annot=True)
plt.xticks(rotation=45)
plt.title('Daily Return Correlations')



Saving to a single Excel worksheet
correlations.to_excel(excel_writer='correlations.xls',
                      sheet_name='correlations',
                      startrow=1,
                      startcol=1)



Saving to multiple Excel worksheets
data.index = data.index.date # Keep only date component
with pd.ExcelWriter('stock_data.xlsx') as writer:
    correlations.to_excel(excel_writer=writer, sheet_name='correlations')
    data.to_excel(excel_writer=writer, sheet_name='prices')
    data.pct_change().to_excel(writer, sheet_name='returns')



Let's practice!
Congratulations!

Stefan Jansen
Founder & Lead Data Scientist at Applied Artificial Intelligence

Reading, inspecting, and cleaning data from CSV
IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON

Stefan Jansen
Instructor
Import and clean data
Ensure that the pd.DataFrame matches the CSV source file
Stock exchange listings: amex-listings.csv

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


How pandas stores data
Each column has its own data format ( dtype )

dtype affects your calculation and visualization

pandas dtype    Column characteristics
object          Text, or a mix of text and numeric data
int64           Numeric: whole numbers, 64 bits (-2^63 to 2^63 - 1)
float64         Numeric: decimals, or whole numbers with missing values
datetime64      Date and time information
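The dtype rules in the table can be observed on a tiny in-memory CSV. The column names and values are invented for this sketch. It shows in particular that a whole-number column with a missing entry is stored as `float64` rather than `int64`.

```python
import io
import pandas as pd

# Hypothetical CSV content; the second row has no ipo_year
csv_text = """symbol,shares,ipo_year,price
AAA,100,1999,1.5
BBB,200,,2.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```

Checking `df.dtypes` (or `.info()`) right after import is a quick way to spot columns that were not read in the expected format.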



Import & inspect
import pandas as pd
amex = pd.read_csv('amex-listings.csv')
amex.info() # To inspect table structure & data types

RangeIndex: 360 entries, 0 to 359


Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)



Dealing with missing values
# Replace 'n/a' with np.nan
amex = pd.read_csv('amex-listings.csv', na_values='n/a')
amex.info()

RangeIndex: 360 entries, 0 to 359


Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null object
dtypes: float64(3), object(5)



Properly parsing dates
amex = pd.read_csv('amex-listings.csv',
na_values='n/a',
parse_dates=['Last Update'])
amex.info()

RangeIndex: 360 entries, 0 to 359


Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
7 Last Update 360 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(4)
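The two import options can be tried without the course file. This sketch reads a small in-memory CSV twice. The missing-value marker `unknown` and the ISO-formatted dates are assumptions chosen for the illustration; the course file uses its own marker and date format.

```python
import io
import pandas as pd

# Hypothetical CSV content with a custom missing-value marker
csv_text = """Stock Symbol,Last Sale,Last Update
AAA,1.33,2017-04-26
BBB,unknown,2017-04-25
"""

raw = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text),
                    na_values='unknown',          # treat the marker as missing
                    parse_dates=['Last Update'])  # convert text to datetime

print(raw.dtypes)
print(clean.dtypes)
```

In the raw import the price column is read as text because of the marker, while the cleaned import yields a numeric column with one missing value and a proper datetime column.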



Showing off the result
amex.head(2) # Show first n rows (default: 5)

Stock Symbol Company Name


0 XXII 22nd Century Group, Inc
1 FAX Aberdeen Asia-Pacific Income Fund Inc

Last Sale Market Capitalization IPO Year


0 1.3300 1.206285e+08 NaN
1 5.0000 1.266333e+09 1986.0

Sector Industry Last Update


0 Non-Durables Farming/Seeds/Milling 2017-04-26
1 NaN NaN 2017-04-25



Let's practice!
Read data from Excel worksheets

Stefan Jansen
Instructor
Import data from Excel

pd.read_excel(file, sheet_name=0)
Select first sheet by default with sheet_name=0

Select by name with sheet_name='amex'


Import several sheets with list such as sheet_name=['amex', 'nasdaq']



Import data from one sheet
amex = pd.read_excel('listings.xlsx',
sheet_name='amex',
na_values='n/a')
amex.info()

RangeIndex: 360 entries, 0 to 359


Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64



Import data from two sheets
listings = pd.read_excel('listings.xlsx',
sheet_name=['amex', 'nasdaq'], # keys = sheet name
na_values='n/a') # values = DataFrame
listings['nasdaq'].info()

# Column Non-Null Count Dtype


-- ------ -------------- -----
0 Stock Symbol 3167 non-null object
1 Company Name 3167 non-null object
2 Last Sale 3165 non-null float64
3 Market Capitalization 3167 non-null float64
4 IPO Year 1386 non-null float64
...



Get sheet names
xls = pd.ExcelFile('listings.xlsx') # pd.ExcelFile object
exchanges = xls.sheet_names
exchanges

['amex', 'nasdaq', 'nyse']

nyse = pd.read_excel(xls,
sheet_name=exchanges[2],
na_values='n/a')



Get sheet names
nyse.info()

RangeIndex: 3147 entries, 0 to 3146


Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3147 non-null object
1 Company Name 3147 non-null object
... ...
6 Industry 2177 non-null object
dtypes: float64(3), object(4)
memory usage: 172.2+ KB



Let's practice!
Combine data from multiple worksheets

Stefan Jansen
Instructor
Combine DataFrames
Concatenate or "stack" a list of pd.DataFrames
Syntax: pd.concat([amex, nasdaq, nyse])



Concatenate two DataFrames
amex = pd.read_excel('listings.xlsx',
sheet_name='amex',
na_values=['n/a'])
nyse = pd.read_excel('listings.xlsx',
sheet_name='nyse',
na_values=['n/a'])
pd.concat([amex, nyse]).info()

Int64Index: 3507 entries, 0 to 3146


Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3507 non-null object
...



Add a reference column
amex['Exchange'] = 'AMEX' # Add column to reference source
nyse['Exchange'] = 'NYSE'
listings = pd.concat([amex, nyse])
listings.head(2)

Stock Symbol ... Exchange


0 XXII ... AMEX
1 FAX ... AMEX



Combine three DataFrames
xls = pd.ExcelFile('listings.xlsx')
exchanges = xls.sheet_names
# Create empty list to collect DataFrames
listings = []
for exchange in exchanges:
    listing = pd.read_excel(xls, sheet_name=exchange)
    # Add reference col
    listing['Exchange'] = exchange
    # Add DataFrame to list
    listings.append(listing)
# List of DataFrames
combined_listings = pd.concat(listings)
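The same collect-and-concatenate pattern can be tried without an Excel file. In this sketch a dictionary of small DataFrames stands in for the worksheets; the sheet names and tickers are hypothetical. It also shows the optional `ignore_index=True` argument, which the slides do not use, for obtaining a fresh running index.

```python
import pandas as pd

# Hypothetical stand-in for the worksheets of an Excel file
sheets = {
    'amex': pd.DataFrame({'Stock Symbol': ['AAA', 'BBB']}),
    'nyse': pd.DataFrame({'Stock Symbol': ['CCC']}),
}

listings = []
for exchange, listing in sheets.items():
    listing = listing.copy()        # avoid modifying the source table
    listing['Exchange'] = exchange  # reference column naming the source
    listings.append(listing)

combined = pd.concat(listings)                           # keeps each table's own index
combined_reset = pd.concat(listings, ignore_index=True)  # renumbers rows from 0

print(combined_reset)
```

Without `ignore_index=True` the row labels restart for every table, which explains why the combined `.info()` output reports more entries than its last index label suggests.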



Combine three DataFrames
combined_listings.info()

Int64Index: 6674 entries, 0 to 3146


Data columns (total 8 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 6674 non-null object
1 Company Name 6674 non-null object
2 Last Sale 6590 non-null float64
3 Market Capitalization 6674 non-null float64
4 IPO Year 2852 non-null float64
5 Sector 5182 non-null object
6 Industry 5182 non-null object
7 Exchange 6674 non-null object
dtypes: float64(3), object(5)
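The index range "0 to 3146" in the output above shows that each worksheet keeps its own row labels after concatenation. A minimal sketch of the same pattern on made-up data (the two small frames stand in for worksheets of listings.xlsx; their values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for two worksheets; the values are made up
sheets = {
    "amex": pd.DataFrame({"Stock Symbol": ["XXII", "FAX"], "Last Sale": [1.0, 5.0]}),
    "nyse": pd.DataFrame({"Stock Symbol": ["JNJ", "XOM", "JPM"], "Last Sale": [120.0, 80.0, 85.0]}),
}

frames = []
for exchange, listing in sheets.items():
    # assign() returns a copy, so the original sheet is left untouched
    frames.append(listing.assign(Exchange=exchange))

# ignore_index=True replaces the repeated per-sheet labels (0, 1, 0, 1, 2) with 0..n-1
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Whether to reset the index is a design choice: keeping the original labels preserves each row's position within its source sheet, while a fresh index makes row labels unique.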

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
The DataReader:
Access financial
data online
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
pandas_datareader
Easy access to various financial internet data sources
Little code needed to import into a pandas DataFrame

Available sources include:


IEX and Yahoo! Finance (including derivatives)

Federal Reserve

World Bank, OECD, Eurostat

OANDA

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Stock prices: Yahoo! Finance
from pandas_datareader.data import DataReader
from datetime import date # Date & time functionality

start = date(2015, 1, 1) # Default: Jan 1, 2010


end = date(2016, 12, 31) # Default: today
ticker = 'GOOG'
data_source = 'yahoo'
stock_data = DataReader(ticker, data_source, start, end)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Stock prices: Yahoo! Finance
stock_data.info()

DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30


Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 High 504 non-null float64 # Highest price
1 Low 504 non-null float64 # Lowest price
2 Open 504 non-null float64 # First price
3 Close 504 non-null float64 # Last price
4 Volume 504 non-null float64 # Number of shares traded
5 Adj Close 504 non-null float64 # Adj. price
dtypes: float64(6)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Stock prices: Yahoo! Finance
pd.concat([stock_data.head(3), stock_data.tail(3)])

High Low Open Close Volume Adj Close


Date
2015-01-02 26.49 26.13 26.38 26.17 28951268 26.17
2015-01-05 26.14 25.58 26.09 25.62 41196796 25.62
2015-01-06 25.74 24.98 25.68 25.03 57998800 25.03
2016-12-28 39.71 39.16 39.69 39.25 23076000 39.25
2016-12-29 39.30 38.95 39.17 39.14 14886000 39.14
2016-12-30 39.14 38.52 39.14 38.59 35400000 38.59

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Stock prices: Visualization
import matplotlib.pyplot as plt
stock_data['Close'].plot(title=ticker)
plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Economic data from
the Federal Reserve
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Economic data from FRED

Federal Reserve Economic Data

500,000 series covering a range of categories:


Economic growth & employment

Monetary & fiscal policy

Demographics, industries, commodity prices


Daily, monthly, annual frequencies

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Get data from FRED

1 https://fred.stlouisfed.org/

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON




Interest rates
from pandas_datareader.data import DataReader
from datetime import date
series_code = 'DGS10' # 10-year Treasury Rate
data_source = 'fred' # FED Economic Data Service
start = date(1962, 1, 1)
data = DataReader(series_code, data_source, start)
data.info()

DatetimeIndex: 15754 entries, 1962-01-02 to 2022-05-20


Data columns (total 1 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 DGS10 15083 non-null float64
dtypes: float64(1)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Interest rates: Visualization
.rename(columns={old_name: new_name})

series_name = '10-year Treasury'


data = data.rename(columns={series_code: series_name})
data.plot(title=series_name); plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Combine stock and economic data
start = date(2000, 1, 1)
series = 'DCOILWTICO' # West Texas Intermediate Oil Price
oil = DataReader(series, 'fred', start)
ticker = 'XOM' # Exxon Mobil Corporation
stock = DataReader(ticker, 'yahoo', start)
data = pd.concat([stock[['Close']], oil], axis=1)
data.info()

DatetimeIndex: 5841 entries, 2000-01-03 to 2022-05-23


Data columns (total 2 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Close 5634 non-null float64
1 DCOILWTICO 5615 non-null float64
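The differing non-null counts above arise because the two series do not cover exactly the same dates. A small sketch with made-up values (not real prices) of how pd.concat with axis=1 aligns on the index:

```python
import pandas as pd

# Made-up values on partly different dates
stock = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
oil = pd.DataFrame(
    {"DCOILWTICO": [60.0, 61.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02"]),
)

# axis=1 places the frames side by side; the default outer join keeps every date
data = pd.concat([stock, oil], axis=1)
print(data)

# Keep only the dates on which both series have a value
both = data.dropna()
print(both)
```

Passing join='inner' to pd.concat would be another way to keep only the shared dates.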

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Combine stock and economic data
data.columns = ['Exxon', 'Oil Price']
data.plot()
plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Select stocks and
get data from
Yahoo! Finance
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Select stocks based on criteria
Use the listing information to select specific stocks
As criteria:
Stock Exchange

Sector or Industry

IPO Year

Market Capitalization

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Get ticker for largest company
nyse = pd.read_excel('listings.xlsx',sheet_name='nyse', na_values='n/a')
nyse = nyse.sort_values('Market Capitalization', ascending=False)
nyse[['Stock Symbol', 'Company Name']].head(3)

Stock Symbol Company Name


1586 JNJ Johnson & Johnson
1125 XOM Exxon Mobil Corporation
1548 JPM J P Morgan Chase & Co

largest_by_market_cap = nyse.iloc[0] # 1st row


largest_by_market_cap['Stock Symbol'] # Select row label

'JNJ'

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Get ticker for largest company
nyse = nyse.set_index('Stock Symbol') # Stock ticker as index
nyse.info()

Index: 3147 entries, JNJ to EAE


Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Company Name 3147 non-null object
1 Last Sale 3079 non-null float64
2 Market Capitalization 3147 non-null float64
...

nyse['Market Capitalization'].idxmax() # Index of max value

'JNJ'
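Both routes shown above (sort, then take the first row; or set the index, then idxmax) should give the same ticker. A sketch on a hypothetical three-row listing with invented market caps:

```python
import pandas as pd

# Illustrative listing; tickers and market caps are made up
listing = pd.DataFrame({
    "Stock Symbol": ["AAA", "BBB", "CCC"],
    "Market Capitalization": [5e9, 9e9, 2e9],
})

# Route 1: sort descending, take the first row by position
by_sort = listing.sort_values("Market Capitalization", ascending=False).iloc[0]["Stock Symbol"]

# Route 2: make the ticker the index, then ask for the label of the maximum
by_idxmax = listing.set_index("Stock Symbol")["Market Capitalization"].idxmax()

print(by_sort, by_idxmax)
```

idxmax avoids sorting the whole table, which is probably the simpler choice when only the single largest entry is needed.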

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Get ticker for largest tech company
nyse['Sector'].unique() # Unique values as numpy array

array(['Technology', 'Health Care', ...], dtype=object)

tech = nyse.loc[nyse.Sector == 'Technology']


tech['Company Name'].head(2)

Stock Symbol Company Name


ORCL Oracle Corporation
TSM Taiwan Semiconductor Manufacturing

nyse.loc[nyse.Sector=='Technology', 'Market Capitalization'].idxmax()

'ORCL'

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Get data for largest tech company with 2017 IPO
ticker = nyse.loc[(nyse.Sector=='Technology') & (nyse['IPO Year']==2017),
'Market Capitalization'].idxmax()
data = DataReader(ticker, 'yahoo') # Start: 2010/1/1
data = data.loc[:, ['Close', 'Volume']]
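The selection above combines two conditions with &. A minimal sketch of that step alone, on a made-up listing indexed by hypothetical tickers:

```python
import pandas as pd

listing = pd.DataFrame(
    {
        "Sector": ["Technology", "Technology", "Energy", "Technology"],
        "IPO Year": [2017.0, 2017.0, 2017.0, 2010.0],
        "Market Capitalization": [3e9, 7e9, 9e9, 8e9],
    },
    index=pd.Index(["T1", "T2", "E1", "T3"], name="Stock Symbol"),
)

# Each comparison yields a boolean Series; & combines them element-wise
is_tech = listing["Sector"] == "Technology"
ipo_2017 = listing["IPO Year"] == 2017

# Rows passing both tests, restricted to one column, then the label of the maximum
ticker = listing.loc[is_tech & ipo_2017, "Market Capitalization"].idxmax()
print(ticker)
```

When the comparisons are written inline, as on the slide, each one needs its own parentheses because & binds more tightly than ==.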

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Visualize price and volume on two axes
import matplotlib.pyplot as plt
data.plot(title=ticker, secondary_y='Volume')
plt.tight_layout(); plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Get several stocks &
manage a
MultiIndex
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Get data for several stocks
Use the listing information to select multiple stocks
E.g. largest 3 stocks per sector

Use Yahoo! Finance to retrieve data for several stocks

Learn how to manage a pandas MultiIndex, a powerful tool to deal with more complex data sets

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Load prices for top 5 companies
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a')
nasdaq.set_index('Stock Symbol', inplace=True)
top_5 = nasdaq['Market Capitalization'].nlargest(n=5) # Top 5
top_5.div(1000000) # Market Cap in million USD

AAPL 740024.467000
GOOG 569426.124504
... ...
Name: Market Capitalization, dtype: float64

tickers = top_5.index.tolist() # Convert index to list

['AAPL', 'GOOG', 'MSFT', 'AMZN', 'FB']

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Load prices for top 5 companies
df = DataReader(tickers, 'yahoo', start=date(2020, 1, 1))

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 712 entries, 2020-01-02 to 2022-10-27
Data columns (total 30 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 (Adj Close, AAPL) 712 non-null float64
1 (Adj Close, GOOG) 712 non-null float64
2 (Adj Close, MSFT) 712 non-null float64
...
28 (Volume, AMZN) 712 non-null float64
29 (Volume, FB) 253 non-null float64
dtypes: float64(30)
memory usage: 172.4 KB

df = df.stack()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Load prices for top 5 companies
df.info()

MultiIndex: 3101 entries, (Timestamp('2020-01-02 00:00:00'), 'AAPL') to (Timestamp('


Data columns (total 6 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Adj Close 3101 non-null float64
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Reshape your data: .unstack()
unstacked = df['Close'].unstack()
unstacked.info()

DatetimeIndex: 712 entries, 2020-01-02 to 2022-10-27


Data columns (total 5 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 AAPL 712 non-null float64
1 GOOG 712 non-null float64
2 MSFT 712 non-null float64
3 AMZN 712 non-null float64
4 FB 253 non-null float64
dtypes: float64(5)
memory usage: 33.4 KB
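The reshaping steps can be tried without a download. This sketch builds a small frame with (attribute, ticker) columns, which is assumed here to mimic the multi-ticker layout shown above; all numbers and ticker names are made up:

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])

# Wide layout: one column per (attribute, ticker) pair
wide = pd.DataFrame(
    [[10.0, 20.0, 100.0, 200.0],
     [11.0, 21.0, 110.0, 210.0]],
    index=dates, columns=columns,
)

# stack() moves the innermost column level (ticker) into the row index: long format
long = wide.stack()

# Selecting one column and unstacking moves the ticker level back into columns
close = long["Close"].unstack()
print(close)
```

Recent pandas versions may print a FutureWarning for stack(); the result for this gap-free example should be the same either way.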

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


From long to wide format
unstacked = df['Close'].unstack() # Results in DataFrame

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Stock prices: Visualization
unstacked.plot(subplots=True)
plt.tight_layout(); plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Summarize your
data with
descriptive stats
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Be on top of your data
Goal: Capture key quantitative characteristics
Important angles to look at:
Central tendency: Which values are "typical"?

Dispersion: Are there outliers?

Overall distribution of individual variables

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Central tendency
Mean (average): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median: 50% of values smaller/larger

Mode: most frequent value

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON




Calculate summary statistics
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq', na_values='n/a')
market_cap = nasdaq['Market Capitalization'].div(10**6)

market_cap.mean()

3180.7126214953805

market_cap.median()

225.9684285

market_cap.mode()

0.0

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Calculate summary statistics

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Dispersion
Variance: Sum all of the squared differences from mean and divide by n − 1
$\mathrm{var} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard deviation: Square root of variance
$\mathrm{sd} = \sqrt{\mathrm{var}}$

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Calculate variance and standard deviation
variance = market_cap.var()
print(variance)

648773812.8182

np.sqrt(variance)

25471.0387

market_cap.std()

25471.0387
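The n − 1 divisor can be checked by hand. A sketch on eight made-up numbers; it also shows that NumPy's np.var divides by n unless ddof=1 is passed, which is a common source of small mismatches:

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(values)
mean = values.sum() / n

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var = ((values - mean) ** 2).sum() / (n - 1)

# pandas divides by n - 1 by default (ddof=1); NumPy divides by n (ddof=0)
print(manual_var, values.var(), np.var(values.to_numpy()))
print(np.sqrt(manual_var), values.std())
```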

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Describe the
distribution of your
data with quantiles
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Describe data distributions
First glance: Central tendency and standard deviation
How to get a more granular view of the distribution?

Calculate and plot quantiles

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


More on dispersion: quantiles
Quantiles: Groups with equal share of observations
Quartiles: 4 groups, 25% of data each

Deciles: 10 groups, 10% of data each

Interquartile range: 3rd quartile - 1st quartile

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Quantiles with pandas
market_cap = nasdaq['Market Capitalization'].div(10**6)
median = market_cap.quantile(.5)
median == market_cap.median()

True

quantiles = market_cap.quantile([.25, .75])

0.25 43.375930
0.75 969.905207

quantiles[.75] - quantiles[.25] # Interquartile Range

926.5292771575
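A tiny series makes the quantile mechanics easy to verify by hand. A sketch using the values 1 to 9 (illustrative only; pandas interpolates linearly between observations by default):

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1, 10, dtype=float))  # 1.0, 2.0, ..., 9.0

# A list of probabilities returns a Series indexed by those probabilities
quartiles = values.quantile([0.25, 0.5, 0.75])

# Interquartile range: 3rd quartile minus 1st quartile
iqr = quartiles[0.75] - quartiles[0.25]
print(quartiles)
print(iqr)
```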

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Quantiles with pandas & numpy
deciles = np.arange(start=.1, stop=.91, step=.1)
deciles

array([ 0.1, 0.2, 0.3, 0.4, ..., 0.7, 0.8, 0.9])

market_cap.quantile(deciles)

0.1 4.884565
0.2 26.993382
0.3 65.714547
0.4 124.320644
0.5 225.968428
0.6 402.469678
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Visualize quantiles with bar chart
title = 'NASDAQ Market Capitalization (million USD)'
market_cap.quantile(deciles).plot(kind='bar', title=title)
plt.tight_layout(); plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


All statistics in one go
market_cap.describe()

count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
25% 43.375930 # 1st quartile
50% 225.968428 # Median
75% 969.905207 # 3rd quartile
max 740024.467000
Name: Market Capitalization

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


All statistics in one go
market_cap.describe(percentiles=np.arange(.1, .91, .1))

count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
10% 4.884565
20% 26.993382
30% 65.714547
40% 124.320644
50% 225.968428
60% 402.469678
70% 723.163197
80% 1441.071134
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Visualize the
distribution of your
data
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Always look at your data!
Identical metrics can represent very different data

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Introducing seaborn plots
Many attractive and insightful statistical plots
Based on matplotlib

Swiss Army knife: seaborn.distplot()


Histogram

Kernel Density Estimation (KDE)

Rugplot

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


10 year treasury: trend and distribution
ty10 = DataReader('DGS10', 'fred', date(1962, 1, 1))
ty10.info()

DatetimeIndex: 15754 entries, 1962-01-02 to 2022-05-20


Data columns (total 1 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 DGS10 15083 non-null float64

ty10.describe()

DGS10
mean 6.291073
std 2.851161
min 1.370000
25% 4.190000
50% 6.040000
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


10 year treasury: time series trend
ty10.dropna(inplace=True) # Avoid creation of copy
ty10.plot(title='10-year Treasury'); plt.tight_layout()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


10 year treasury: historical distribution
import seaborn as sns
sns.distplot(ty10)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


10 year treasury: trend and distribution
ax = sns.distplot(ty10)
ax.axvline(ty10['DGS10'].median(), color='black', ls='--')

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Summarize
categorical
variables
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
From quantitative to categorical variables
So far, we have analyzed quantitative variables
Categorical variables require a different approach

Concepts like average don't make much sense

Instead, we'll rely on their frequency distribution

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Categorical listing information
amex = pd.read_excel('listings.xlsx', sheet_name='amex',
na_values=['n/a'])
amex.info()

RangeIndex: 360 entries, 0 to 359


Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 360 non-null object
1 Company Name 360 non-null object
2 Last Sale 346 non-null float64
3 Market Capitalization 360 non-null float64
4 IPO Year 105 non-null float64
5 Sector 238 non-null object
6 Industry 238 non-null object
dtypes: float64(3), object(4)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Categorical listing information
amex['Sector'].nunique()

12

apply() : call function on each column

lambda : "anonymous function", receives each column as argument x

amex.apply(lambda x: x.nunique())

Stock Symbol 360


Company Name 326
Last Sale 323
Market Capitalization 317
...
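A small check of the apply pattern on a made-up frame: apply() hands each column to the lambda, and the built-in DataFrame.nunique() should return the same counts.

```python
import pandas as pd

# Illustrative data with some missing entries
df = pd.DataFrame({
    "Sector": ["Energy", "Energy", None, "Finance"],
    "IPO Year": [2002.0, 2015.0, 2002.0, None],
})

# apply() passes each column (a Series) to the function
per_column = df.apply(lambda col: col.nunique())

# DataFrame.nunique() gives the same counts directly; missing values are not counted
print(per_column)
print(df.nunique())
```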

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


How many observations per sector?
amex['Sector'].value_counts()

Health Care 49 # Mode


Basic Industries 44
Energy 28
Consumer Services 27
Capital Goods 24
Technology 20
Consumer Non-Durables 13
Finance 12
Public Utilities 11
Miscellaneous 5
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


How many IPOs per year?
amex['IPO Year'].value_counts()

2002.0 19 # Mode
2015.0 11
1999.0 9
1993.0 7
2014.0 6
2013.0 5
2017.0 5
...
2009.0 1
1990.0 1
1991.0 1
Name: IPO Year, dtype: int64

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Convert IPO Year to int
ipo_by_yr = amex['IPO Year'].dropna().astype(int).value_counts()
ipo_by_yr

2002 19
2015 11
1999 9
1993 7
2014 6
2004 5
2003 5
2017 5
...
1987 1
Name: IPO Year, dtype: int64
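value_counts() orders by frequency, so the years above are not chronological. A sketch on made-up years, adding sort_index() as one possible way to order by year before plotting:

```python
import pandas as pd

ipo_year = pd.Series([2002.0, 2015.0, None, 2002.0, 1999.0, 2015.0, 2002.0])

# Missing values cannot be stored in a plain integer column, so drop them before casting
counts = ipo_year.dropna().astype(int).value_counts()

# value_counts() orders by frequency; sort_index() orders by year instead
by_year = counts.sort_index()
print(by_year)
```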

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Convert IPO Year to int
ipo_by_yr.plot(kind='bar', title='IPOs per Year')
plt.xticks(rotation=45)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Aggregate your
data by category
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Summarize numeric data by category
So far: Summarize individual variables
Compute descriptive statistics like mean, quantiles

Split data into groups, then summarize groups

Examples:
Largest company by exchange

Median market capitalization per IPO year

Average market capitalization per sector

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Group your data by sector
nasdaq.info()

RangeIndex: 3167 entries, 0 to 3166


Data columns (total 7 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 Stock Symbol 3167 non-null object
1 Company Name 3167 non-null object
2 Last Sale 3165 non-null float64
3 Market Capitalization 3167 non-null float64
4 IPO Year 1386 non-null float64
5 Sector 2767 non-null object
6 Industry 2767 non-null object
dtypes: float64(3), object(4)
memory usage: 173.3+ KB

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Group your data by sector
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
nasdaq = nasdaq.drop('Market Capitalization', axis=1) # Drop column
nasdaq_by_sector = nasdaq.groupby('Sector') # Create groupby object
for sector, data in nasdaq_by_sector:
print(sector, data.market_cap_m.mean())

Basic Industries 724.899933858


Capital Goods 1511.23737278
Consumer Durables 839.802606627
Consumer Non-Durables 3104.05120552
...
Public Utilities 2357.86531507
Technology 10883.4342135
Transportation 2869.66000673

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Keep it simple and skip the loop
mcap_by_sector = nasdaq_by_sector.market_cap_m.mean()
mcap_by_sector

Sector
Basic Industries 724.899934
Capital Goods 1511.237373
Consumer Durables 839.802607
Consumer Non-Durables 3104.051206
Consumer Services 5582.344175
Energy 826.607608
Finance 1044.090205
Health Care 1758.709197
...
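The loop and the one-line version compute the same numbers. A sketch on made-up data that checks this directly:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Energy", "Finance", "Energy", "Finance", "Finance"],
    "market_cap_m": [100.0, 50.0, 300.0, 150.0, 700.0],
})
grouped = df.groupby("Sector")

# Iterating over a groupby object yields (group label, sub-DataFrame) pairs
loop_means = {sector: data["market_cap_m"].mean() for sector, data in grouped}

# The same result in one call, returned as a Series indexed by sector
direct_means = grouped["market_cap_m"].mean()
print(direct_means)
```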

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Visualize category summaries
title = 'NASDAQ = Avg. Market Cap by Sector'
mcap_by_sector.plot(kind='barh', title=title)
plt.xlabel('USD mn')

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Aggregate summary for all numeric columns
nasdaq_by_sector.mean(numeric_only=True) # Skip text columns

Last Sale IPO Year market_cap_m


Sector
Basic Industries 21.597679 2000.766667 724.899934
Capital Goods 26.188681 2001.324675 1511.237373
Consumer Durables 24.363391 2003.222222 839.802607
Consumer Non-Durables 25.749565 2000.609756 3104.051206
Consumer Services 34.917318 2004.104575 5582.344175
Energy 15.496834 2008.034483 826.607608
Finance 29.644242 2010.321101 1044.090205
Health Care 19.462531 2009.240409 1758.709197
Miscellaneous 46.094369 2004.333333 3445.655935
Public Utilities 18.643705 2006.040000 2357.865315
...

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
More ways to
aggregate your
data
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Many ways to aggregate
Last segment: Group by one variable and aggregate

More detailed ways to summarize your data:


Group by two or more variables

Apply multiple aggregations

Examples
Median market cap by sector and IPO year

Mean & standard deviation of stock price by year

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Several aggregations by category
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
by_sector = nasdaq.groupby('Sector')
by_sector.market_cap_m.agg(['size', 'mean']).sort_values('size')

Sector size mean


Transportation 52 2869.660007
Energy 66 826.607608
Public Utilities 66 2357.865315
Basic Industries 78 724.899934
...
Consumer Services 348 5582.344175
Technology 433 10883.434214
Finance 627 1044.090205
Health Care 645 1758.709197

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Several aggregations plus new labels
(by_sector.market_cap_m.agg(['size', 'mean'])
 .rename(columns={'size': '#Obs', 'mean': 'Average'}))

Sector #Obs Average


Basic Industries 78 724.899934
Capital Goods 172 1511.237373
Consumer Durables 88 839.802607
Consumer Non-Durables 103 3104.051206
Consumer Services 348 5582.344175
...
Health Care 645 1758.709197
Miscellaneous 89 3445.655935
Public Utilities 66 2357.865315
Technology 433 10883.434214
Transportation 52 2869.660007

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Different statistics by column
by_sector.agg({'market_cap_m': 'size', 'IPO Year':'median'})

Sector market_cap_m IPO Year


Basic Industries 78 1972.0
Capital Goods 172 1972.0
Consumer Durables 88 1983.0
Consumer Non-Durables 103 1972.0
Consumer Services 348 1981.0
...
Health Care 645 1981.0
Miscellaneous 89 1987.0
Public Utilities 66 1981.0
Technology 433 1972.0
Transportation 52 1986.0
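As an alternative to agg followed by rename, pandas also supports "named aggregation", where each output column is given as name=(column, function). A sketch on made-up data; the output names n_obs, avg_cap and median_ipo are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Finance", "Finance", "Finance"],
    "market_cap_m": [100.0, 300.0, 50.0, 150.0, 700.0],
    "IPO Year": [2001.0, 2005.0, 1999.0, None, 2011.0],
})

summary = df.groupby("Sector").agg(
    n_obs=("market_cap_m", "size"),     # rows per group, missing values included
    avg_cap=("market_cap_m", "mean"),
    median_ipo=("IPO Year", "median"),  # missing years are skipped
)
print(summary)
```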

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Aggregate by two categories
by_sector_year = nasdaq.groupby(['Sector', 'IPO Year'])
by_sector_year.market_cap_m.mean()

Sector IPO Year


Basic Industries 1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
...
Transportation 1986.0 1176.179710
1991.0 6646.778622
1992.0 56.074572
...
2009.0 552.445919
2011.0 3711.638317
2013.0 125.740421

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Select from MultiIndex
mcap_sect_year = by_sector_year.market_cap_m.mean()
mcap_sect_year.loc['Basic Industries']

IPO Year
1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
1988.0 24.847526
...
2012.0 381.796074
2013.0 22.661533
2015.0 260.075564
2016.0 81.288336
Name: market_cap_m, dtype: float64

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Select from MultiIndex
mcap_sect_year.loc[['Basic Industries', 'Transportation']]

Sector IPO Year


Basic Industries 1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
...
Transportation 1986.0 1176.179710
1991.0 6646.778622
1992.0 56.074572
...
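A few more ways to select from a two-level index, sketched on a made-up Series (sector names and values are illustrative):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Energy", 2001), ("Energy", 2005), ("Finance", 1999), ("Finance", 2005)],
    names=["Sector", "IPO Year"],
)
mcap = pd.Series([100.0, 300.0, 50.0, 150.0], index=index, name="market_cap_m")

energy = mcap.loc["Energy"]                   # one outer label: the inner level remains
one_cell = mcap.loc[("Finance", 2005)]        # a tuple addresses both levels
year_2005 = mcap.xs(2005, level="IPO Year")   # cross-section on the inner level
print(energy, one_cell, year_2005, sep="\n")
```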

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Summary statistics
by category with
seaborn
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Categorical plots with seaborn
Specialized ways to plot combinations of categorical and numerical variables
Visualize estimates of summary statistics per category

Understand how categories impact numerical variables

Compare using key metrics of distributional characteristics

Example: Mean Market Cap per Sector or IPO Year with indication of dispersion

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


The basics: countplot
sns.countplot(x='Sector', data=nasdaq)
plt.xticks(rotation=45)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


countplot, sorted
sector_size = nasdaq.groupby('Sector').size()
order = sector_size.sort_values(ascending=False)
order.head()

Sector
Health Care 645
Finance 627
Technology 433
...

order = order.index.tolist()

['Health Care', 'Finance', ..., 'Energy', 'Transportation']

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


countplot, sorted
sns.countplot(x='Sector', data=nasdaq, order=order)
plt.xticks(rotation=45)
plt.title('# Observations per Sector')

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


countplot, multiple categories
recent_ipos = nasdaq[nasdaq['IPO Year'] > 2014].copy() # Copy avoids SettingWithCopyWarning
recent_ipos['IPO Year'] = recent_ipos['IPO Year'].astype(int)
sns.countplot(x='Sector', hue='IPO Year', data=recent_ipos)

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Compare stats with PointPlot
nasdaq['IPO'] = nasdaq['IPO Year'].apply(lambda x: 'After 2000' if x > 2000 else 'Before 2000')
sns.pointplot(x='Sector', y='market_cap_m', hue='IPO', data=nasdaq)
plt.xticks(rotation=45); plt.title('Mean Market Cap')

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Distributions by
category with
seaborn
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
Distributions by category
Last segment: Summary statistics
Number of observations, mean per category

Now: Visualize distribution of a variable by levels of a categorical variable to facilitate comparison

Example: Distribution of Market Cap by Sector or IPO Year

More detail than summary stats

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Clean data: removing outliers
nasdaq = pd.read_excel('listings.xlsx', sheet_name='nasdaq',
na_values='n/a')
nasdaq['market_cap_m'] = nasdaq['Market Capitalization'].div(1e6)
nasdaq = nasdaq[nasdaq.market_cap_m > 0] # Active companies only
outliers = nasdaq.market_cap_m.quantile(.9) # Outlier threshold
nasdaq = nasdaq[nasdaq.market_cap_m < outliers] # Remove outliers
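The same three filtering steps on eleven made-up numbers, small enough to follow by hand:

```python
import pandas as pd

caps = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0])

active = caps[caps > 0]                 # drop zero market caps
threshold = active.quantile(0.9)        # 90th percentile of the remaining values
trimmed = active[active < threshold]    # keep everything strictly below it
print(threshold, len(active), len(trimmed))
```

A quantile cutoff removes roughly the top 10% of observations whether or not they are unusual, so it is a pragmatic trimming rule rather than a test for outliers.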

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Boxplot: quartiles and outliers
import seaborn as sns
sns.boxplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75);

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


A variation: SwarmPlot
sns.swarmplot(x='Sector', y='market_cap_m', data=nasdaq)
plt.xticks(rotation=75)
plt.show()

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Let's practice!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Congratulations!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N

Stefan Jansen
Instructor
What you learned
Import data from Excel and online sources

Combine datasets

Summarize and aggregate data

IMPORTING AND MANAGING FINANCIAL DATA IN PYTHON


Keep learning!
I M P O R T I N G A N D M A N A G I N G F I N A N C I A L D ATA I N P Y T H O N
Welcome to Portfolio
Analysis!
I N T R O D U C T I O N T O P O R T F O L I O A N A LY S I S I N P Y T H O N

Charlotte Werger
Data Scientist
Hi! My name is Charlotte

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


What is a portfolio

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Why do we need portfolio analysis

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Portfolio versus fund versus index
Portfolio: a collection of investments (stocks, bonds, commodities, other funds) often owned by an individual

Fund: a pool of investments that is managed by a professional fund manager. Individual investors buy "units" of the fund and the manager invests the money

Index: A smaller sample of the market that is representative of the whole, e.g. S&P500, Nasdaq, Russell 2000, MSCI World Index

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Active versus passive investing
Passive investing: following a benchmark as closely as possible

Active investing: taking active "bets" that are different from a benchmark

Long only strategies: small deviations from a benchmark

Hedge funds: no benchmark but 'total return strategies'

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Diversification
1. Single stock investments expose you to: a sudden change in management, disappointing financial performance, weak economy, an industry slump, etc.

2. Good diversification means combining stocks that are different: risk, cyclical, counter-cyclical, industry, country

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Typical portfolio strategies
Equal weighted portfolios

Market-cap weighted portfolios

Risk-return optimized portfolios

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Let's practice!
I N T R O D U C T I O N T O P O R T F O L I O A N A LY S I S I N P Y T H O N
Portfolio returns
I N T R O D U C T I O N T O P O R T F O L I O A N A LY S I S I N P Y T H O N

Charlotte Werger
Data Scientist
What are portfolio weights?
Weight is the percentage composition of a particular asset in a portfolio

All weights together have to sum up to 100%

Weights and diversification (few large investments versus many small investments)

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Calculating portfolio weights

Calculate by dividing the value of a security by total value of the portfolio

Equal weighted portfolio, or market cap weighted portfolio

Weights determine your investment strategy, and can be set to optimize risk and expected
return
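The weight calculation described above, sketched with hypothetical position values (the USD amounts are made up):

```python
import numpy as np

# Hypothetical position values in USD
values = np.array([2500.0, 5000.0, 2500.0])

# Weight = value of each position / total portfolio value
weights = values / values.sum()

# An equal-weighted alternative over the same number of assets
equal_weights = np.full(len(values), 1 / len(values))
print(weights, equal_weights)
```

For a market-cap weighted portfolio, the same division would be applied to the companies' market capitalizations instead of position values.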

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Portfolio returns
Changes in value over time
$\mathrm{Return}_t = \frac{V_t - V_{t-1}}{V_{t-1}}$

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Portfolio returns

$\mathrm{Return}_t = \frac{V_t - V_{t-1}}{V_{t-1}}$
Historic average returns are often used to calculate expected return

Do not confuse average return, cumulative return, active return, and annualized return

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Calculating returns from pricing data
df.head(2)
AAPL AMZN TSLA
date
2018-03-25 13.88 114.74 92.48
2018-03-26 13.35 109.95 89.79

# Calculate returns over each day


returns = df.pct_change()

returns.head(2)
AAPL AMZN TSLA
date
2018-03-25 NaN NaN NaN
2018-03-26 -0.013772 0.030838 0.075705
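pct_change() applies the return formula row by row. A sketch on made-up prices for two hypothetical tickers that writes the formula out with shift() and compares the two results:

```python
import pandas as pd

prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 45.0, 54.0]},
    index=pd.to_datetime(["2018-03-23", "2018-03-26", "2018-03-27"]),
)

# (V_t - V_{t-1}) / V_{t-1}, written out with shift()
manual = (prices - prices.shift(1)) / prices.shift(1)

# pct_change() computes the same thing; the first row has no predecessor and is NaN
returns = prices.pct_change()
print(returns)
```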

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Calculating returns from pricing data
weights = np.array([0.25, 0.50, 0.25])

# Calculate average return for each stock


meanDailyReturns = returns.mean()

# Calculate portfolio return


portReturn = np.sum(meanDailyReturns*weights)
print (portReturn)

0.05752375881537723

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Calculating cumulative returns
# Calculate daily portfolio returns
returns['Portfolio']= returns.dot(weights)

# Let's see what it looks like


returns.head(3)

AAPL AMZN TSLA Portfolio


date
2018-03-23 -0.020974 -0.026739 -0.029068 -0.025880
2018-03-26 -0.013772 0.030838 0.075705 0.030902

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Calculating cumulative returns
# Compound the percentage returns over time
daily_cum_ret=(1+returns).cumprod()

# Plot your cumulative return


daily_cum_ret.Portfolio.plot()
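A two-day example of the weighting and compounding steps, small enough to check by hand. The returns and the 50/50 weights are made up, and applying fixed weights to every row implicitly assumes the portfolio is rebalanced each day:

```python
import numpy as np
import pandas as pd

returns = pd.DataFrame({"AAA": [0.10, -0.10], "BBB": [-0.10, 0.20]})
weights = np.array([0.5, 0.5])

# Daily portfolio return: weighted sum of the asset returns in each row
portfolio = returns.dot(weights)

# Compounding: growth of 1 unit invested, the running product of (1 + r_t)
growth = (1 + portfolio).cumprod()
cumulative_return = growth.iloc[-1] - 1
print(portfolio.tolist(), cumulative_return)
```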

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Cumulative return plot

INTRODUCTION TO PORTFOLIO ANALYSIS IN PYTHON


Let's practice!
I N T R O D U C T I O N T O P O R T F O L I O A N A LY S I S I N P Y T H O N
Measuring risk of a
portfolio
I N T R O D U C T I O N T O P O R T F O L I O A N A LY S I S I N P Y T H O N

Charlotte Werger
Data Scientist
Risk of a portfolio
Investing is risky: individual assets will go up or down

Expected return is a random variable

The spread of returns around the mean is measured by the variance $\sigma^2$, a common measure of volatility
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Variance

Variance of an individual asset varies: some have more or less spread around the mean

Variance of the portfolio is not simply the weighted sum of the variances of the underlying assets

Because returns of assets are correlated, it becomes complex

How do variance and correlation relate to portfolio
risk?

The correlation between assets 1 and 2 is denoted by $\rho_{1,2}$ and tells us to what extent the assets move together

The portfolio variance takes into account the individual assets' variances ($\sigma_1^2$, $\sigma_2^2$, etc.), the weights of the assets in the portfolio ($w_1$, $w_2$), as well as their correlation to each other

The standard deviation ($\sigma$) is equal to the square root of the variance ($\sigma^2$); both are measures of volatility

Calculating portfolio variance

$\rho_{1,2}\,\sigma_1 \sigma_2$ is called the covariance between assets 1 and 2

The covariance can also be written as $\sigma_{1,2}$

This lets us write:
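
For two assets, the portfolio variance in the notation above takes the standard textbook form below. It is stated here as an assumption about the intended formula; $\sigma_{pf}^2$ is a symbol introduced for this sketch.

```latex
\sigma_{pf}^2
  = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2\, w_1 w_2\, \rho_{1,2}\, \sigma_1 \sigma_2
  = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2\, w_1 w_2\, \sigma_{1,2}
```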

Re-writing the portfolio variance shorter

This can be re-written in matrix notation, which you can use more easily in code:

In words, what we need to calculate in Python is: Portfolio variance = Weights transposed x (Covariance matrix x Weights)
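
As a quick check of the matrix form, the sketch below compares it with the explicit two-asset expression. All inputs (weights, standard deviations, correlation) are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical two-asset inputs, for illustration only
weights = np.array([0.6, 0.4])
sigma_1, sigma_2 = 0.20, 0.10   # standard deviations of asset 1 and 2
rho = 0.3                       # correlation between the two assets

# Covariance = correlation x sigma_1 x sigma_2
cov_12 = rho * sigma_1 * sigma_2
cov_matrix = np.array([[sigma_1**2, cov_12],
                       [cov_12, sigma_2**2]])

# Matrix form: weights transposed x (covariance matrix x weights)
var_matrix = weights.T @ cov_matrix @ weights

# Explicit two-asset form
var_explicit = (weights[0]**2 * sigma_1**2
                + weights[1]**2 * sigma_2**2
                + 2 * weights[0] * weights[1] * cov_12)

print(var_matrix, var_explicit)
```

Both expressions give the same number; the matrix form simply scales to any number of assets without writing out every pairwise term.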

Portfolio variance in python
price_data.head(2)

ticker AAPL FB GE GM WMT


date
2018-03-21 171.270 169.39 13.88 37.58 88.18
2018-03-22 168.845 164.89 13.35 36.35 87.14

# Calculate daily returns from prices
daily_returns = price_data.pct_change()

# Construct a covariance matrix for the daily returns data
cov_matrix_d = daily_returns.cov()

Portfolio variance in python
# Construct an annualized covariance matrix from the daily_returns,
# assuming 250 trading days per year
cov_matrix_a = (daily_returns.cov())*250
print (cov_matrix_a)

AAPL FB GE GM WMT
AAPL 0.053569 0.026822 0.013466 0.018119 0.010798
FB 0.026822 0.062351 0.015298 0.017250 0.008765
GE 0.013466 0.015298 0.045987 0.021315 0.009513
GM 0.018119 0.017250 0.021315 0.058651 0.011894
WMT 0.010798 0.008765 0.009513 0.011894 0.041520

weights = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

Portfolio variance in python
# Calculate the variance with the formula
port_variance = np.dot(weights.T, np.dot(cov_matrix_a, weights))
print (port_variance)

0.022742232726360567

# Just converting the variance float into a percentage
print(f"{port_variance:.1%}")

2.3%

# The standard deviation is the square root of the variance
port_stddev = np.sqrt(np.dot(weights.T, np.dot(cov_matrix_a, weights)))
print(f"{port_stddev:.1%}")

15.1%

Let's practice!

Annualized returns

Charlotte Werger
Data Scientist

Comparing returns
1. Annual Return: Total return earned over a period of one calendar year

2. Annualized return: Yearly rate of return inferred from any time period

3. Average Return: Total return realized over a longer period, spread out evenly over the
(shorter) periods.

4. Cumulative (compounding) return: A return that includes the compounded results of re-investing interest, dividends, and capital gains.

Why annualize returns?

Average return = (100 - 50) / 2 = 25% (a 100% gain followed by a 50% loss)

Actual return = 0%, so average return is not a good measure of performance!

How to compare portfolios with different time lengths?

How to account for compounding effects over time?

Calculating annualized returns

Convert any time length to an annual rate:

N in years: $\text{rate} = (1 + \text{Return})^{1/N} - 1$

N in months: $\text{rate} = (1 + \text{Return})^{12/N} - 1$

Return is the total return you want to annualize.

N is the number of periods so far.
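
Both formulas are the same expression with a different number of periods per year. A minimal sketch, with a hypothetical helper name and made-up inputs:

```python
def annualize_return(total_return, n_periods, periods_per_year=1):
    """Convert a total return earned over n_periods into a yearly rate.

    Hypothetical helper: periods_per_year=1 means N is in years,
    periods_per_year=12 means N is in months.
    """
    return (1 + total_return) ** (periods_per_year / n_periods) - 1


# 21% earned over 2 years is roughly 10% per year
two_year = annualize_return(0.21, 2)

# 5% earned over 6 months compounds to roughly 10.25% per year
half_year = annualize_return(0.05, 6, periods_per_year=12)

print(two_year, half_year)
```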

Annualized returns in python
# Check the start and end of timeseries
apple_price.head(1)

date
2015-01-06 105.05
Name: AAPL, dtype: float64

apple_price.tail(1)

date
2018-03-29 99.75
Name: AAPL, dtype: float64

# Assign the number of months


months = 38

Annualized returns in python
# Calculate the total return (first and last price, accessed by position)
total_return = (apple_price.iloc[-1] - apple_price.iloc[0]) / apple_price.iloc[0]

print (total_return)

0.5397420653068692

Annualized returns in python
# Calculate the annualized returns over months
annualized_return=((1 + total_return)**(12/months))-1
print (annualized_return)

0.14602501482708763

# Select three year period


apple_price = apple_price.loc['2015-01-01':'2017-12-31']
apple_price.tail(3)

date
2017-12-27 170.60
2017-12-28 171.08
2017-12-29 169.23
Name: AAPL, dtype: float64

Annualized return in Python
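# Recalculate the total return over the selected three-year period
total_return = (apple_price.iloc[-1] - apple_price.iloc[0]) / apple_price.iloc[0]

# Calculate annualized return over 3 years
annualized_return = ((1 + total_return)**(1/3))-1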

print (annualized_return)

0.1567672968419047

Let's practice!

Risk adjusted returns

Charlotte Werger
Data Scientist

Choose a portfolio

Portfolio 1: annual return of 14%, volatility (standard deviation) of 8%

Portfolio 2: annual return of 6%, volatility (standard deviation) of 3%

Risk adjusted return

It defines an investment's return by measuring how much risk is involved in producing that return

It's usually a ratio

Allows you to objectively compare across different investment options

Tells you whether the return justifies the underlying risk

Sharpe ratio
Sharpe ratio is the most commonly used risk adjusted return ratio

It's calculated as follows:

$$\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p}$$

Where: $R_p$ is the portfolio return, $R_f$ is the risk-free rate and $\sigma_p$ is the portfolio standard deviation

Remember the formula for the portfolio $\sigma_p$?

$$\sigma_p = \sqrt{\text{Weights transposed} \times (\text{Covariance matrix} \times \text{Weights})}$$

Annualizing volatility

Annualized standard deviation is calculated as follows: $\sigma_a = \sigma_m \cdot \sqrt{T}$

$\sigma_m$ is the measured standard deviation
$\sigma_a$ is the annualized standard deviation
$T$ is the number of data points per year

Alternatively, when using variance instead of standard deviation: $\sigma_a^2 = \sigma_m^2 \cdot T$
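
The two scaling rules are consistent with each other, as the sketch below shows. The daily volatility is a made-up value, and 250 data points per year is an assumption matching the trading-day count used in the course code.

```python
import math

daily_vol = 0.01      # hypothetical measured daily standard deviation
trading_days = 250    # assumed number of data points per year

# Standard deviation scales with the square root of T
annual_vol = daily_vol * math.sqrt(trading_days)

# Variance scales linearly with T
annual_var = daily_vol**2 * trading_days

print(annual_vol, annual_var)
```

Squaring the annualized standard deviation recovers the annualized variance, so either route gives the same risk figure.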

Calculating the Sharpe Ratio
# Calculate the annualized standard deviation
annualized_vol = apple_returns.std()*np.sqrt(250)
print (annualized_vol)

0.2286248397870068

# Define the risk free rate


risk_free = 0.01

# Calculate the Sharpe ratio
sharpe_ratio = (annualized_return - risk_free) / annualized_vol
print (sharpe_ratio)

0.6419569149994251

Which portfolio did you choose?

Portfolio 1: annual return of 14%, volatility (standard deviation) of 8%, Sharpe ratio of 1.75

Portfolio 2: annual return of 6%, volatility (standard deviation) of 3%, Sharpe ratio of 2
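
The two Sharpe ratios can be reproduced from the stated returns and volatilities. The sketch assumes a risk-free rate of 0, which is what the quoted ratios imply; the function name is hypothetical.

```python
def sharpe_ratio(annual_return, annual_vol, risk_free=0.0):
    """Excess return per unit of volatility (risk_free=0 is an assumption)."""
    return (annual_return - risk_free) / annual_vol


sharpe_1 = sharpe_ratio(0.14, 0.08)   # Portfolio 1
sharpe_2 = sharpe_ratio(0.06, 0.03)   # Portfolio 2

print(sharpe_1, sharpe_2)
```

Portfolio 2 earns less in absolute terms but more per unit of risk, which is why it has the higher ratio.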

Let's practice!

Non-normal distribution of returns

Charlotte Werger
Data Scientist

In a perfect world returns are distributed normally

1 Source: Distribution of monthly returns from the S&P500 from evestment.com

But using mean and standard deviations can be deceiving

1 Source: "An Introduction to Omega", Con Keating and William Shadwick, The Finance Development Center, 2002

Skewness: leaning towards the negative

Pearson’s Coefficient of Skewness
$$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{\sigma}$$

Rule of thumb:

Skewness < −1 or Skewness > 1 ⇒ Highly skewed distribution
−1 < Skewness < −0.5 or 0.5 < Skewness < 1 ⇒ Moderately skewed distribution
−0.5 < Skewness < 0.5 ⇒ Approximately symmetric distribution
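
Pearson's coefficient can be computed directly from the mean, median and standard deviation. The sketch below uses made-up returns with one large loss; the use of the population standard deviation is an assumption (the sample standard deviation is another common choice).

```python
import statistics


def pearson_skewness(values):
    """Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)  # population std dev (an assumption)
    return 3 * (mean - median) / stdev


# Hypothetical returns: mostly small gains plus one large loss
sample_returns = [-0.08, 0.01, 0.02, 0.02, 0.03]
skew = pearson_skewness(sample_returns)

print(skew)
```

The single large loss pulls the mean below the median, so the coefficient comes out below −1, which the rule of thumb classes as highly (negatively) skewed.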

1 Source: https://brownmath.com/stat/shape.htm

Kurtosis: Fat tailed distribution

1 Source: Pimco

Interpreting kurtosis
“Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as
opposed to frequent modestly sized deviations.”

A normal distribution has a kurtosis of exactly 3 and is called mesokurtic

A distribution with kurtosis < 3 is called platykurtic: tails are shorter and thinner, and the central peak is lower and broader

A distribution with kurtosis > 3 is called leptokurtic: tails are longer and fatter, and the central peak is higher and sharper (fat tailed)

1 Source: https://brownmath.com/stat/shape.htm

Calculating skewness and kurtosis
apple_returns=apple_price.pct_change()
apple_returns.head(3)

date
2015-01-02 NaN
2015-01-05 -0.028172
2015-01-06 0.000094
Name: AAPL, dtype: float64

apple_returns.hist()

Calculating skewness and kurtosis
print("mean : ", apple_returns.mean())
print("vol : ", apple_returns.std())
print("skew : ", apple_returns.skew())
print("kurt : ", apple_returns.kurtosis())

mean : 0.0006855391415724799
vol : 0.014459504468360529
skew : -0.012440851735057878
kurt : 3.197244607586669
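
One point worth checking when reading such output: pandas' `kurtosis()` uses Fisher's definition, so it reports excess kurtosis, for which a normal distribution scores about 0 rather than 3. The sketch below illustrates this on a synthetic normal sample (the seed and sample size are arbitrary choices).

```python
import numpy as np
import pandas as pd

# Synthetic sample from a standard normal distribution
rng = np.random.default_rng(0)
normal_sample = pd.Series(rng.standard_normal(100_000))

# pandas reports excess kurtosis: close to 0 for normal data
excess_kurt = normal_sample.kurtosis()

# Adding 3 gives the scale on which a normal distribution has kurtosis 3
kurt = excess_kurt + 3

print(excess_kurt, kurt)
```

On that reading, a pandas value above 0 already indicates fatter tails than a normal distribution.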

Let's practice!

Alternative measures of risk

Charlotte Werger
Data Scientist

Looking at downside risk

A good risk measure should focus on potential losses i.e. downside risk

Sortino ratio

Similar to the Sharpe ratio, just with a different standard deviation

$$\text{Sortino ratio} = \frac{R_p - R_f}{\sigma_d}$$

$\sigma_d$ is the standard deviation of the downside.

Sortino ratio in python
# Define risk free rate and target return of 0
rfr = 0
target_return = 0

# Calculate the daily returns from price data
apple_returns=pd.DataFrame(apple_price.pct_change())

# Select the returns below the target only
negative_returns = apple_returns.loc[apple_returns['AAPL'] < target_return]

# Calculate expected return and std dev of downside returns
expected_return = apple_returns['AAPL'].mean()
down_stdev = negative_returns['AAPL'].std()

# Calculate the sortino ratio


sortino_ratio = (expected_return - rfr)/down_stdev
print(sortino_ratio)

0.07887683763760528

Maximum draw-down
The largest percentage loss from a market peak to trough

Dependent on the chosen time window

The recovery time: time it takes to get back to break-even
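
The peak-to-trough idea can be sketched in a few lines. This version tracks the running peak over the whole (made-up) price history rather than a fixed rolling window, which is a simplifying assumption; the function name is hypothetical.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                  # highest price seen so far
        worst = min(worst, price / peak - 1.0)   # deepest drop from that peak
    return worst


# Hypothetical prices: peak of 120, later trough of 80
prices = [100, 120, 90, 110, 80, 130]
mdd = max_drawdown(prices)

print(mdd)
```

Here the worst decline runs from 120 down to 80, a draw-down of one third, even though the series ends at a new high.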

Maximum daily draw-down in Python
# Calculate the rolling maximum of the price over a 250-day window
roll_max = apple_price.rolling(min_periods=1,window=250).max()
# Calculate daily draw-down from rolling max
daily_drawdown = apple_price/roll_max - 1.0
# Calculate maximum daily draw-down
max_daily_drawdown = daily_drawdown.rolling(min_periods=1,window=250).min()
# Plot the results
daily_drawdown.plot()
max_daily_drawdown.plot()
plt.show()

Maximum draw-down of Apple

Let's practice!

Comparing against a benchmark

Charlotte Werger
Data Scientist

Active investing against a benchmark

Active return for an actively managed portfolio

Active return is the performance of an (active) investment, relative to the investment's benchmark.

Calculated as the difference between the portfolio's actual return and the benchmark return.

Active return is achieved by "active" investing, i.e. taking overweight and underweight positions relative to the benchmark.
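
A small worked example may help connect active weights to active return. All numbers below are hypothetical.

```python
# Hypothetical mean returns and weights for three assets
mean_returns = [0.10, 0.05, 0.08]
portfolio_weights = [0.5, 0.2, 0.3]
benchmark_weights = [0.3, 0.4, 0.3]

# Weighted average return of portfolio and benchmark
portfolio_return = sum(w * r for w, r in zip(portfolio_weights, mean_returns))
benchmark_return = sum(w * r for w, r in zip(benchmark_weights, mean_returns))

# Active return: portfolio minus benchmark
active_return = portfolio_return - benchmark_return

# Active weights: overweight (positive) and underweight (negative) positions
active_weights = [p - b for p, b in zip(portfolio_weights, benchmark_weights)]

print(active_return, active_weights)
```

Because both sets of weights sum to 1, the active weights sum to 0: every overweight position is funded by an underweight elsewhere.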

Tracking error for an index tracker

Passive investment funds, or index trackers, don't use active return as a measure for
performance.

Tracking error is the name used for the difference between portfolio and benchmark returns for a passive investment fund.

Active weights

1 Source: Schwab Center for Financial Research.

Active return in Python
# Inspect the data
portfolio_data.head()

mean_ret var pf_w bm_w GICS Sector


Ticker
A 0.146 0.035 0.002 0.005 Health Care
AAL 0.444 0.094 0.214 0.189 Industrials
AAP 0.242 0.029 0.000 0.000 Consumer Discretionary
AAPL 0.225 0.027 0.324 0.459 Information Technology
ABBV 0.182 0.029 0.026 0.010 Health Care

1 Global Industry Classification Standard (GICS)

Active return in Python
# Calculate mean portfolio return
total_return_pf = (portfolio_data['pf_w']*portfolio_data['mean_ret']).sum()

# Calculate mean benchmark return
total_return_bm = (portfolio_data['bm_w']*portfolio_data['mean_ret']).sum()

# Calculate active return


active_return = total_return_pf - total_return_bm
print ("Simple active return: ", active_return)

Simple active return: 6.5764

Active weights in Python
# Group dataframe by GICS sectors
grouped_df=portfolio_data.groupby('GICS Sector').sum()

# Calculate active weights of portfolio
grouped_df['active_weight']=grouped_df['pf_w']-grouped_df['bm_w']

print (grouped_df['active_weight'])

GICS Sector
Consumer Discretionary 20.257
Financials -2.116
...etc

Let's practice!

Risk factors

Charlotte Werger
Data Scientist

What is a factor?
Factors in portfolios are like nutrients in food

Factors in portfolios
Different types of factors:

Macro factors: interest rates, currency, country, industry

Style factors: momentum, volatility, value and quality

Using factor models to determine risk exposure

1 Source: https://invesco.eu/investment-campus/educational-papers/factor-investing

Factor exposures
df.head()

date portfolio volatility quality


2015-01-05 -1.827811 1.02 -1.76
2015-01-06 -0.889347 0.41 -0.82
2015-01-07 1.162984 1.07 1.39
2015-01-08 1.788828 0.31 1.93
2015-01-09 -0.840381 0.28 -0.77

Factor exposures
df.corr()

portfolio volatility quality


portfolio 1.000000 0.056596 0.983416
volatility 0.056596 1.000000 0.092852
quality 0.983416 0.092852 1.000000

Correlations change over time
# Rolling correlation
df['corr']=df['portfolio'].rolling(30).corr(df['quality'])

# Plot results
df['corr'].plot()

Rolling correlation with quality

Let's practice!

Factor models

Charlotte Werger
Data Scientist

Using factors to explain performance

Factors are used for risk management.

Factors are used to help explain performance.

Factor models help you relate factors to portfolio returns

Empirical factor models exist that have been tested on historic data.

Fama French 3 factor model is a well-known factor model.

Fama French Multi Factor model

$$R_{pf} = \alpha + \beta_m \, MKT + \beta_s \, SMB + \beta_h \, HML$$

MKT is the excess return of the market, i.e. $R_m - R_f$

SMB (Small Minus Big) is a size factor

HML (High Minus Low) is a value factor
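
Estimating alpha and the betas is an ordinary least-squares problem. The sketch below generates synthetic factor and portfolio returns with known coefficients (all values are assumptions for illustration, not Fama-French data) and recovers them with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 500

# Synthetic daily factor returns; columns stand in for MKT, SMB and HML
factors = rng.normal(0.0, 0.01, size=(n_days, 3))

# Hypothetical "true" coefficients used to generate the portfolio returns
true_alpha = 0.0002
true_betas = np.array([1.1, 0.4, -0.3])
noise = rng.normal(0.0, 0.001, size=n_days)
portfolio = true_alpha + factors @ true_betas + noise

# Add a column of ones so the regression estimates alpha as the intercept
X = np.column_stack([np.ones(n_days), factors])
coef, *_ = np.linalg.lstsq(X, portfolio, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print(alpha_hat, betas_hat)
```

The estimated betas land close to the values used to generate the data, which is the sense in which a factor model "explains" portfolio returns.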

Regression model refresher

Difference between beta and correlation

Regression model in Python
import statsmodels.api as sm

# Define the model


model = sm.OLS(factor_data['sp500'],
factor_data[['momentum','value']]).fit()

# Get the model predictions


predictions = model.predict(factor_data[['momentum','value']])

b1, b2 = model.params

The regression summary output
# Print out the summary statistics
model.summary()

Obtaining betas quickly
import statsmodels.api as sm

# Get just beta coefficients from linear regression model
b1, b2 = sm.OLS(df['returns'], df[['F1', 'F2']]).fit().params

# Print the coefficients
print('Sensitivities of active returns to factors:\nF1: %.4f\nF2: %.4f' % (b1, b2))

Sensitivities of active returns to factors:


F1: -0.0381
F2: 0.9858

Let's practice!

Portfolio analysis tools

Charlotte Werger
Data Scientist

Professional portfolio analysis tools

Back-testing your strategy
Back-testing: run your strategy on historic data and see how it would have performed

A strategy that works on historic data is not guaranteed to work well on future data, because markets change

Quantopian's pyfolio tool

1 GitHub: https://github.com/quantopian/pyfolio

Performance and risk analysis in Pyfolio
# Install the package
!pip install pyfolio
# Import the package
import pyfolio as pf

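# Read the data as a pandas Series
# (assumes the first CSV column holds the dates and the second the returns)
import pandas as pd
returns = pd.read_csv('pf_returns.csv', index_col=0).squeeze('columns')
returns.index=pd.to_datetime(returns.index)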

# Create a tear sheet on returns


pf.create_returns_tear_sheet(returns)

# If you have backtest and live data


pf.create_returns_tear_sheet(returns, live_start_date='2018-03-01')

Pyfolio's tear sheet

Holdings and exposures in Pyfolio
# define our sector mappings
sect_map = {'COST': 'Consumer Goods',
'INTC': 'Technology',
'CERN': 'Healthcare',
'GPS': 'Technology',
'MMM': 'Construction',
'DELL': 'Technology',
'AMD': 'Technology'}

pf.create_position_tear_sheet(returns, positions,
sector_mappings=sect_map)

Exposure tear sheet results

Let's practice!

Modern portfolio theory

Charlotte Werger
Data Scientist

Creating optimal portfolios

What is Portfolio Optimization?
Meet Harry Markowitz

The optimization problem: finding optimal weights
In words:

Minimize the portfolio variance, subject to:

The expected mean return is at least some target return

The weights sum up to 100%

At least some weights are positive
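
In symbols, the first two conditions give the usual mean-variance problem. The notation is an assumption of this sketch: $w$ is the weight vector, $\Sigma$ the covariance matrix, $\mu$ the vector of expected returns and $r_{\text{target}}$ the target return.

```latex
\min_{w} \; w^{\top} \Sigma \, w
\quad \text{subject to} \quad
w^{\top} \mu \ge r_{\text{target}},
\qquad
\sum_{i} w_i = 1
```

A long-only constraint $w_i \ge 0$ for every asset is a common optional addition; it is stricter than requiring only that some weights are positive.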

Varying target returns leads to the Efficient Frontier

PyPortfolioOpt for portfolio optimization
from pypfopt.efficient_frontier import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns

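import pandas as pd

# Load prices with the dates as index (assumes the CSV has a 'date' column)
df=pd.read_csv('portfolio.csv', index_col='date', parse_dates=True)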
df.head(2)
XOM RRC BBY MA PFE
date
2010-01-04 54.068794 51.300568 32.524055 22.062426 13.940202
2010-01-05 54.279907 51.993038 33.349487 21.997149 13.741367

# Calculate expected annualized returns and sample covariance


mu = expected_returns.mean_historical_return(df)
Sigma = risk_models.sample_cov(df)

Get the Efficient Frontier and portfolio weights
# Calculate expected annualized returns and risk
mu = expected_returns.mean_historical_return(df)
Sigma = risk_models.sample_cov(df)

# Obtain the EfficientFrontier


ef = EfficientFrontier(mu, Sigma)

# Select a chosen optimal portfolio


ef.max_sharpe()

Different optimizations
# Select the maximum Sharpe portfolio
ef.max_sharpe()

# Select an optimal return for a target risk


ef.efficient_risk(2.3)

# Select a minimal risk for a target return


ef.efficient_return(1.5)

Calculate portfolio risk and performance
# Obtain the performance numbers
ef.portfolio_performance(verbose=True, risk_free_rate = 0.01)

Expected annual return: 21.3%


Annual volatility: 19.5%
Sharpe Ratio: 0.98

Let's optimize a portfolio!

Maximum Sharpe vs. minimum volatility

Charlotte Werger
Data Scientist

Remember the Efficient Frontier?

Efficient frontier: all portfolios with an optimal risk and return trade-off

Maximum Sharpe portfolio: the highest Sharpe ratio on the EF

Minimum volatility portfolio: the lowest level of risk on the EF

Adjusting PyPortfolioOpt optimization

Maximum Sharpe portfolio
Maximum Sharpe portfolio: the highest Sharpe ratio on the EF

from pypfopt.efficient_frontier import EfficientFrontier

# Calculate the Efficient Frontier with mu and Sigma


ef = EfficientFrontier(mu, Sigma)
raw_weights = ef.max_sharpe()

# Get interpretable weights


cleaned_weights = ef.clean_weights()

{'GOOG': 0.01269,'AAPL': 0.09202,'FB': 0.19856,
'BABA': 0.09642,'AMZN': 0.07158,'GE': 0.02456,...}

Maximum Sharpe portfolio
# Get performance numbers
ef.portfolio_performance(verbose=True)

Expected annual return: 33.0%


Annual volatility: 21.7%
Sharpe Ratio: 1.43

Minimum Volatility Portfolio
Minimum volatility portfolio: the lowest level of risk on the EF

# Calculate the Efficient Frontier with mu and Sigma


ef = EfficientFrontier(mu, Sigma)

raw_weights = ef.min_volatility()

# Get interpretable weights and performance numbers


cleaned_weights = ef.clean_weights()

{'GOOG': 0.05664, 'AAPL': 0.087, 'FB': 0.1591,
'BABA': 0.09784, 'AMZN': 0.06986, 'GE': 0.0123,...}

Minimum Volatility Portfolio
ef.portfolio_performance(verbose=True)

Expected annual return: 17.4%


Annual volatility: 13.2%
Sharpe Ratio: 1.28

Let's have another look at the Efficient Frontier

Maximum Sharpe versus Minimum Volatility

Let's practice!

Alternative portfolio optimization

Charlotte Werger
Data Scientist

Expected risk and return based on historic data

Mean historic returns and the historic portfolio variance are not perfect estimates of mu and Sigma

Weights from portfolio optimization are therefore not guaranteed to work well on future data

Historic data

Exponentially weighted returns

Need better measures for risk and return

Exponentially weighted risk and return assigns more importance to the most recent data

Exponential moving average in the graph: most weight on the t-1 observation
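
The weighting scheme itself is easy to inspect. The sketch below computes normalized exponential weights using the span convention alpha = 2 / (span + 1), which is the one pandas uses for `ewm(span=...)`; the helper name and the small span are illustrative assumptions.

```python
def ewm_weights(span, n_obs):
    """Normalized exponential weights, most recent observation first."""
    alpha = 2 / (span + 1)                       # span convention (as in pandas)
    raw = [(1 - alpha) ** k for k in range(n_obs)]
    total = sum(raw)
    return [w / total for w in raw]


# With span=3, alpha is 0.5, so each older observation gets half the weight
weights = ewm_weights(span=3, n_obs=4)

print(weights)
```

The most recent observation carries the largest weight and each older one decays geometrically, whereas a plain historical mean would give every observation the same weight.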

Exponentially weighted covariance

The exponential covariance matrix: gives more weight to recent data

In the graph: exponentially weighted volatility in black follows real volatility better than standard volatility in blue

1 Source: https://systematicinvestor.github.io/Exponentially-Weighted-Volatility-RCPP

Exponentially weighted returns
from pypfopt import expected_returns

# Exponentially weighted moving average


mu_ema = expected_returns.ema_historical_return(df,
span=252, frequency=252)
print(mu_ema)

symbol
XOM 0.103030
BBY 0.394629
PFE 0.186058

Exponentially weighted covariance
from pypfopt import risk_models

# Exponentially weighted covariance


Sigma_ew = risk_models.exp_cov(df, span=180, frequency=252)

Using downside risk in the optimization

Remember the Sortino ratio: it uses the standard deviation of negative returns only

PyPortfolioOpt allows you to use semicovariance in the optimization; this is a measure of downside risk:

Semicovariance in PyPortfolioOpt
Sigma_semi = risk_models.semicovariance(df,
benchmark=0, frequency=252)

print(Sigma_semi)

XOM BBY MA PFE


XOM 0.018939 0.008505 0.006568 0.004058
BBY 0.008505 0.016797 0.009133 0.004404
MA 0.006568 0.009133 0.018711 0.005373
PFE 0.004058 0.004404 0.005373 0.008349

Let's practice!

Recap

Charlotte Werger
Data Scientist

Chapter 1: Calculating risk and return

A portfolio as a collection of weights and assets

Diversification

Mean returns versus cumulative returns

Variance, standard deviation, correlations and the covariance matrix

Calculating portfolio variance

Chapter 2: Diving deep into risk measures

Annualizing returns and risk to compare over different periods

Sharpe ratio as a measure of risk adjusted returns

Skewness and Kurtosis: looking beyond mean and variance of a distribution

Maximum draw-down, downside risk and the Sortino ratio

Chapter 3: Breaking down performance

Compare to benchmark with active weights and active returns

Investment factors: explain returns and sources of risk

Fama French 3 factor model to break down performance into explainable factors and alpha

Pyfolio as a portfolio analysis tool

Chapter 4: Finding the optimal portfolio

Markowitz' portfolio optimization: efficient frontier, maximum Sharpe and minimum volatility portfolios

Exponentially weighted risk and return, semicovariance

Continued learning

DataCamp course on Portfolio Risk Management in Python

Quantopian's lecture series: https://www.quantopian.com/lectures

Learning by doing: Pyfolio and PyPortfolioOpt

End of this course