Finance Fundamentals in Python
Adina Howe
Instructor
Why Python for Finance?
Easy to Learn and Flexible
General purpose
Dynamic
High-level language
Open source
Accessible to anyone
Calculations in IPython
In [1]: 1 + 1
Out[1]: 2

In [2]: 8 / 4
Out[2]: 2.0
Any comments?
# Example, do not modify!
print(8 / 2)
print(2**2)
In [1]: 1 + 1
Out[1]: 2

1 + 1  # In a script: no output unless you print it

<script.py> output:
2
Some variable names are reserved in Python (e.g., class or type) and should be avoided
price = 200
earnings = 5
pe_ratio = price / earnings
print(pe_ratio)
40.0
Python Data Types
Variable Types Example
Strings 'hello world'
Integers 40
Floats 3.1417
Booleans True or False
type(variable_name)
pe_ratio = 40
print(type(pe_ratio))
<class 'int'>
== equal
True
print(type(1 == 1))
<class 'bool'>
x = 12
y = 'stock'
print(x + 3)  # 15
print(y * 3)  # stockstockstock
<class 'float'>
pi_string = str(pi)
print(type(pi_string))
<class 'str'>
Lists - square brackets [ ]
months = ['January', 'February', 'March', 'April', 'May', 'June']
months[0]
'January'
months[2]
'March'
months[-1]
'June'
months[-2]
'May'
Example
months[2:5]
months[-4:-1]
months[3:]
months[:3]
months[0:6:2]
months[0:6:3]
['January', 'April']
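The slices listed above can be checked in a short sketch:

```python
months = ['January', 'February', 'March', 'April', 'May', 'June']

print(months[2:5])    # ['March', 'April', 'May']
print(months[-4:-1])  # ['March', 'April', 'May']
print(months[3:])     # ['April', 'May', 'June']
print(months[:3])     # ['January', 'February', 'March']
print(months[0:6:2])  # ['January', 'March', 'May']
print(months[0:6:3])  # ['January', 'April']
```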
Lists in Lists
Lists can contain various data types, including lists themselves.
Example: a nested list describing the month and its associated consumer price index
'Feb'
print(cpi[1][0])
238.11
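A minimal sketch of a nested cpi list consistent with the outputs shown (the exact month labels are assumed):

```python
# One inner list of month labels, one inner list of consumer price index values
cpi = [['Jan', 'Feb', 'Mar'],
       [238.11, 237.81, 238.91]]

print(cpi[0][1])  # 'Feb'  -- second element of the first inner list
print(cpi[1][0])  # 238.11 -- first element of the second inner list
```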
Methods vs. Functions
Methods Functions
All methods are functions Not all functions are methods
months.index('February')
1
print(prices[1])
237.81
print(months[1])
February
Installing packages
pip3 install package_name_here
[0, 1, 2, 3, 4]
print(type(my_array))
<class 'numpy.ndarray'>
import numpy as np
my_array = np.array([0, 1, 2, 3, 4])
print(my_array)
[0, 1, 2, 3, 4]
[5 7 9]
Apr
print(months_array[2:5])
print(months_array[0:5:2])
Two-dimensional arrays
import numpy as np
months = [1, 2, 3]
prices = [238.11, 237.81, 238.91]
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
print(cpi_array.shape)
(2, 3)
print(cpi_array.size)
print(np.mean(prices_array))
238.27666666666667
print(np.std(prices_array))
0.46427960923946671
import numpy as np
[ 1 2 3 4 5 6 7 8 9 10 11 12]
[ 1 3 5 7 9 11]
print(cpi_array)
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
cpi_transposed = np.transpose(cpi_array)
print(cpi_transposed)
[[ 1. 238.11]
[ 2. 237.81]
[ 3. 238.91]]
[[ 1. 2. 3. ]
[ 238.11 237.81 238.91]]
238.91
[ 3. 238.91]
Indexing Arrays
import numpy as np
months_subset = months_array[indexing_array]
print(months_subset)
print(months_array[negative_index])
['Jun' 'May']
print(boolean_array)
print(prices_array[boolean_array])
[ 238.11 238.91]
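A sketch that reproduces the boolean indexing output above, assuming the three-price array from earlier:

```python
import numpy as np

prices_array = np.array([238.11, 237.81, 238.91])

# Boolean mask: True where the price exceeds 238
boolean_array = prices_array > 238
print(boolean_array)                # [ True False  True]
print(prices_array[boolean_array])  # [238.11 238.91]
```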
Matplotlib: A visualization package
See more of the Matplotlib gallery by clicking this link.
plt.show()
displays plot to screen
# Add labels
plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')
# Show plot
plt.show()
plt.xlabel('Months')
plt.ylabel('Consumer Price Indexes, $')
plt.title('Average Monthly Consumer Price Indexes')
plt.show()
Why histograms for financial analysis?
Overall Review
Python shell and scripts
Lists
Arrays
Matplotlib
OBJECTIVE PART I
Explore and analyze the S&P 100 data, specifically the P/E ratios of S&P 100 companies
# first element
In [2]: print(my_list[0])
# last element
In [3]: print(my_list[-1])
# range of elements
In [4]: print(my_list[0:3])
[1, 2, 3]
Your mission
GIVEN
NumPy arrays of data describing the S&P 100: names, prices, earnings, sectors
OBJECTIVE PART II
Explore and analyze sector-specific P/E ratios within companies of the S&P 100
[200 300]
import numpy as np
average_value = np.mean(my_array)
std_value = np.std(my_array)
Your mission - outlier?
Kennedy Behrman
Data Engineer, Author, Founder
Datetimes
Datetime components: year, month, day of month, hour, minutes, micro-seconds, AM/PM

Common format codes:
%Y  Four-digit year
%y  Two-digit year
%m  Month as a number
%B  Full month name
%d  Day of month
%A  Full weekday name
%M  Minutes
%p  AM or PM

Example format strings:
"%Y-%m-%d"
"%A, %d %B %y"
Datetime attributes
now.year    # 2019
now.month   # 11
now.day     # 13
now.hour    # 22
now.minute  # 34
now.second  # 56
False
True
text = "10/27/1997"
format_str = "%m/%d/%Y"
sell_date = datetime.strptime(text, format_str)
sell_date == world_mini_crash
True
type(delta)
datetime.timedelta
delta.days
117
datetime.datetime(2019, 1, 14, 0, 0)
datetime.datetime(2019, 1, 7, 0, 0)
datetime.timedelta
offset = timedelta(weeks = 1)
offset
datetime.timedelta(7)
dt - offset
datetime.datetime(2019, 1, 7, 0, 0)
Lookup by index
my_list = ['a','b','c','d']
0 1 2 3
['a','b','c','d']
my_list[0]
'a'
my_list.index('c')
{}
my_dict = dict()
my_dict
{}
ticker_symbols = dict([['APPL','Apple'],['F','Ford'],['LUV','Southwest']])
print(ticker_symbols)
'Ford'
KeyError: 'XOM'
'Southwest'
company = ticker_symbols.get('XOM')
print(company)
None
'MISSING'
del(ticker_symbols['XON'])
ticker_symbols
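A sketch of safe lookups and deletion on the dictionary built above (deleting 'F' here is illustrative, not from the course):

```python
ticker_symbols = {'APPL': 'Apple', 'F': 'Ford', 'LUV': 'Southwest'}

# .get() returns None (or a supplied default) instead of raising KeyError
print(ticker_symbols.get('XOM'))             # None
print(ticker_symbols.get('XOM', 'MISSING'))  # MISSING

# del removes an existing key; deleting a missing key raises KeyError
del ticker_symbols['F']
print(ticker_symbols)  # {'APPL': 'Apple', 'LUV': 'Southwest'}
```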
Python comparison operators
Equality: == , !=
Assign value: =
True
count = 13
print(count)
13
dictionaries
strings
True
True
False
True
print(3 == '3')
False
True
print(3 != 3)
False
True
True
True
False
True
print(1.0 <= 1)
True
False
True
False
True
True
True
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: '<' not supported between instances of 'str' and 'int'
Boolean logic
2. or
3. not
None
Numeric zero:
0
0.0
Length of zero
""
[]
{}
True
False
True
True or True
True
False or False
False
False
is_current() or is_investment()
True
False
not False
True
False
True
True
False
True
"State"
[] and "State"
[]
13
{"balance": 2200}
Printing sales only
trns = { 'symbol': 'TSLA', 'type':'BUY', 'amount': 300}
print(trns['amount'])
300
if x < y:
if x in y:
if x and y:
if x:
if <expression>: statement;statement;statement
if trns['type'] == 'SELL':
print(trns['amount'])
trns['type'] == 'SELL'
False
if trns['type'] == 'SELL':
print(trns['amount'])
200
Repeating a code block
CUSIP SYMBOL
037833100 AAPL
17275R102 CSCO
68389X105 ORCL
execution 1
execution 2
execution 3
d = {'key': 'value1'}
for x in d:
for x in "ORACLE":
0
1
2
AAPL
CSCO
ORCL
O
R
A
C
L
E
while x < 5:
print(x)
x = x + 1
0
1
2
3
4
while x <= 5:
print(x)
0
1
2
3
4
5
Not ORCL
Not ORCL
Not ORCL
The current symbol is ORCL, break now
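A sketch of the loop behind the output above; the symbol list is assumed except for 'ORCL':

```python
symbols = ['AAPL', 'CSCO', 'GOOG', 'ORCL', 'IBM']

for symbol in symbols:
    if symbol == 'ORCL':
        # break stops the loop entirely; 'IBM' is never reached
        print(f"The current symbol is {symbol}, break now")
        break
    print("Not ORCL")
```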
Pandas
import pandas as pd
print(pd)
df = pd.DataFrame(data=data)
0 1 2
0 BA ajfdk2 1222.00
1 AAD 1234nmk 390789.11
2 BA mm3d90 13.02
JSON pd.read_json
HTML pd.read_html
Pickle pd.read_pickle
SQL pd.read_sql
CSV pd.read_csv
Account Balance
accounts
a 1222.00
b 390789.11
c 13.02
Balance
a 1222.00
b 390789.11
c 13.02
accounts.loc['a':'c', ['Balance','Account#']]
accounts.loc['a':'c',[True,False,True]]
accounts.loc['a':'c','Bank Code':'Balance']
accounts.loc['a', 'Balance'] = 0
DataFrame methods
.count() .sum()
.min() .prod()
.max() .mean()
.first() .median()
.last() .std()
.var()
axis=0 (rows, the default): aggregate down each column
axis='columns': aggregate across each row
39.9
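The axis argument can be sketched as follows (illustrative data, not the course's accounts table):

```python
import pandas as pd

df = pd.DataFrame({'Price': [10.0, 20.0, 30.0],
                   'Volume': [100, 200, 300]})

# axis=0 (down the rows): one result per column
print(df.mean(axis=0))

# axis='columns' (across the columns): one result per row
print(df.sum(axis='columns'))
```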
PCE
Personal consumption expenditures (PCE)
PCE = Durable goods (PCDG) + Non-durable goods + Services
1 By Clip Art by Vector Toons, Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=65937611
DATE PCE
1929-01-01 77.383
1930-01-01 70.136
1931-01-01 60.672
1932-01-01 48.715
DATE PCE
1933-01-01 45.945
pd.concat(all_rows)
DATE PCE
1934-01-01 45.28568
1935-01-01 49.22104
1936-01-01 54.72544
1937-01-01 58.81832
pce['EURO'] = pce['PCE'].map(convert_to_euro)
Understanding your data
Data is loaded correctly
Date
03/27/2020
03/26/2020
03/25/2020
03/24/2020
Price
Date
03/27/2020 247.74
03/26/2020 258.44
03/25/2020 245.52
03/24/2020 246.88
Price Volume
Date
03/27/2020 247.74 51054150
03/26/2020 258.44 63140170
03/25/2020 245.52 75900510
03/24/2020 246.88 71882770
Price Volume Trend
Date
03/27/2020 247.74 51054150 Down
03/26/2020 258.44 63140170 Up
03/25/2020 245.52 75900510 Down
Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
25% 246.670000 6.409497e+07
50% 258.440000 7.505841e+07
75% 285.340000 8.418821e+07
max 302.740000 1.067212e+08
Trend
count 21
unique 2
top Down
freq 14
Price Trend
count 21.000000 21
unique NaN 2
top NaN Down
freq NaN 14
mean 263.715714 NaN
std 23.360598 NaN
min 224.370000 NaN
25% 246.670000 NaN
50% 258.440000 NaN
75% 285.340000 NaN
max 302.740000 NaN
Price Volume
count 21.000000 2.100000e+01
mean 263.715714 7.551468e+07
std 23.360598 1.669757e+07
min 224.370000 4.689322e+07
10% 242.210000 5.479457e+07
50% 258.440000 7.505841e+07
90% 292.920000 1.004233e+08
max 302.740000 1.067212e+08
Volume Trend
count 2.100000e+01 21
unique NaN 2
top NaN Down
freq NaN 14
mean 7.551468e+07 NaN
std 1.669757e+07 NaN
min 4.689322e+07 NaN
25% 6.409497e+07 NaN
Introducing the data
prices.head()
High
count 378.000000
mean 881.593138
std 720.771922
min 227.490000
max 2185.950000
Symbol
count 378
unique 3
top AMZN
freq 126
0 False
1 False
2 False
3 False
4 False
...
374 False
375 False
376 False
377 False
0 True
1 True
2 True
3 True
4 True
...
374 False
375 False
376 False
377 False
Symbol
count 126
unique 1
top AAPL
freq 126
High
count 6.000000
mean 2177.406567
std 7.999334
min 2166.070000
max 2185.95000
Or |
Not ~
prices.loc[mask_amzn]
Look at your data
bar area
barh pie
hist scatter
box hexbin
kde
Chapter 1
Representing time Mapping data
datetime dict()
== != Loops
for a in c:
print(a)
DataFrame(data=data) stocks.mean()
pd.read_csv('/data.csv') stocks.median()
aapl.head() exxon.plot(x='Date',
aapl.tail() y='High' )
aapl.describe()
Filtering
Dakota Wixom
Quantitative Finance Analyst
Course objectives
The Time Value of Money
Compound Interest
Mortgage Structures
Wealth Accumulation
df : Discount factor
r: The rate of depreciation per period t
t: Time periods
v : Initial value of the investment
f v : Future value of the investment
$1,000 * (1 + 0.10/4)^(1*4) = $1,103.81

Compare this with no compounding:

$1,000 * (1 + 0.10/1)^(1*1) = $1,100.00

Notice the extra $3.81 due to the quarterly compounding?

Compounded Quarterly Over 30 Years:

$1,000 * (1 + 0.10/4)^(30*4) = $19,358.15

Compounded Annually Over 30 Years:

$1,000 * (1 + 0.10/1)^(30*1) = $17,449.40

Compounding quarterly generates an extra $1,908.75 over 30 years
The non-static value of money
Situation 1
Situation 2
import numpy as np
np.pv(rate=0.01, nper=3, pmt=0, fv=100)
-97.05
import numpy as np
np.fv(rate=0.05, nper=3, pmt=0, pv=-100)
115.76
Cash flows
Cash flows are a series of gains or losses from an investment over time.
Year  Cash Flow  Formula                                  Present Value
0     -$100      pv(rate=0.03, nper=0, pmt=0, fv=-100)    -100
1      $100      pv(rate=0.03, nper=1, pmt=0, fv=100)      97.09
2      $125      pv(rate=0.03, nper=2, pmt=0, fv=125)     117.82
3      $150      pv(rate=0.03, nper=3, pmt=0, fv=150)     137.27
4      $175      pv(rate=0.03, nper=4, pmt=0, fv=175)     155.49
import numpy as np
array_1 = np.array([100,200,300])
print(array_1*2)
import numpy as np
np.npv(rate=0.03, values=np.array([-100, 100, 125, 150, 175]))
407.67
Project 2
import numpy as np
np.npv(rate=0.03, values=np.array([100, 100, -100, 200, 300]))
552.40
Common profitability analysis methods
Net Present Value (NPV)
NPV = Σ_{t=1}^{T} C_t / (1 + r)^t − C_0

C_t: Cash flow C at time t
r: Discount rate

Internal Rate of Return (IRR): the discount rate that makes NPV equal zero:

NPV = Σ_{t=1}^{T} C_t / (1 + IRR)^t − C_0 = 0

C_t: Cash flow C at time t
Ct : Cash ow C at time t
Example:
import numpy as np
project_1 = np.array([-100,150,200])
np.irr(project_1)
1.35
What is WACC?
WACC = F_Equity * C_Equity + F_Debt * C_Debt * (1 − TR)

F_Equity = M_Equity / M_Total
F_Debt = M_Debt / M_Total
percent_equity = 0.80
percent_debt = 0.20
cost_equity = 0.14
cost_debt = 0.12
tax_rate = 0.35
wacc = (percent_equity * cost_equity) + (percent_debt * cost_debt) * (1 - tax_rate)
print(wacc)
0.1276
cf_project1 = np.repeat(100, 5)
npv_project1 = np.npv(0.13, cf_project1)
print(npv_project1)
397.45
Different NPVs and IRRs
Year  Project 1  Project 2
1     -$100      -$125
2      $200       $100
3      $300       $100
4      N/A        $100
5      N/A        $100
6      N/A        $100
7      N/A        $100
8      N/A        $100

Project comparison:

      NPV     IRR     Length
#1    362.58  200%    3
#2    453.64  78.62%  8

Notice how you could undertake multiple Project 1's over 8 years? Are the NPVs fair to compare?
Apply the EAA method to the previous two projects using the computed NPVs * -1:
import numpy as np
npv_project1 = 362.58
npv_project2 = 453.64
np.pmt(rate=0.05, nper=3, pv=-1*npv_project1, fv=0)
133.14
70.18
Taking out a mortgage
A mortgage is a loan that covers the remaining cost of a home after paying a percentage of the home value as a down payment.
Example:
$500,000 house
N: Number of Payment
Periods Per Year
Example:
import numpy as np
monthly_rate = ((1+0.038)**(1/12) - 1)
np.pmt(rate=monthly_rate, nper=12*30, pv=400000)
-1849.15
Amortization
Principal (Equity): The amount of your mortgage payment that counts towards the value of the house itself

PP: Principal Payment
MP: Mortgage Payment
IP: Interest Payment
R: Mortgage Interest Rate (Periodic)

PP_Periodic = MP_Periodic − IP_Periodic
accumulator = 0
for i in range(3):
if i == 0:
accumulator = accumulator + 3
else:
accumulator = accumulator + 1
print(str(i)+": Loop value: "+str(accumulator))
0: Loop value: 3
1: Loop value: 4
2: Loop value: 5
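The amortization idea can be sketched in a loop, using the mortgage example above ($400,000 at 3.8% for 30 years); the variable names and the annuity-formula payment calculation are illustrative, not from the course:

```python
# Amortization sketch: each period, interest is the remaining balance times the
# periodic rate, and the rest of the fixed payment reduces the principal.
principal_remaining = 400000.0
periodic_rate = (1 + 0.038) ** (1 / 12) - 1   # monthly rate from the annual rate
n_periods = 12 * 30

# Fixed monthly payment from the standard annuity formula (what np.pmt computes)
payment = principal_remaining * periodic_rate / (1 - (1 + periodic_rate) ** -n_periods)
print(round(payment, 2))  # roughly 1849.15, matching the np.pmt result above

for period in range(3):
    interest_payment = principal_remaining * periodic_rate   # IP = balance * R
    principal_payment = payment - interest_payment           # PP = MP - IP
    principal_remaining -= principal_payment
    print(f"Period {period + 1}: interest {interest_payment:.2f}, "
          f"principal {principal_payment:.2f}")
```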
Ownership
To calculate the percentage of the home you actually own
(home equity):
Percent Equity Owned_t = P_Down + E_Cumulative,t / V_Home
import numpy as np
np.cumsum(np.array([1, 2, 3]))
array([1, 3, 6])
Cumulative Product
import numpy as np
np.cumprod(np.array([1, 2, 3]))
array([1, 2, 6])
import numpy as np
np.cumprod(1 + np.array([0.03, 0.03, 0.05]))
Project proposal
Your budget will have to take into account the following:
Rent
Food expenses
Entertainment expenses
Emergency fund
Taxes
Salary growth
import numpy as np
np.cumprod(1 + np.repeat(0.03, 3)) - 1
import numpy as np
100*np.cumprod(1 + np.repeat(0.03, 3))
Net Worth
Net Worth = Assets - Liabilities = Equity
Inflation will destroy most of your savings over time if you let it
Diversify
The power of time
Goal: Save $1.0 million over 40 years. Assume an average 7%
rate of return per year.
import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)
-404.61
import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*40, pv=0, fv=1000000)
-674.53
import numpy as np
np.pmt(rate=((1+0.07)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)
-1277.07
import numpy as np
np.pmt(rate=((1+0.05)**(1/12) - 1), nper=12*25, pv=0, fv=1000000)
-1707.26
import numpy as np
np.fv(rate=-0.03, nper=25, pv=-1000000, pmt=0)
466974.70
Congratulations
The Time Value of Money
Compound Interest
Mortgage Structures
Wealth Accumulation
Stefan Jansen
Founder & Lead Data Scientist at
Applied Artificial Intelligence
Date & time series functionality
At the root: data types for date & time information
Objects for points in time and periods
Timestamp('2017-01-01 00:00:00')
time_stamp.year
2017
time_stamp.day_name()
'Sunday'
Period('2017-01-31', 'D')
Convert pd.Period() to
period.to_timestamp().to_period('M') pd.Timestamp() and back
Period('2017-01', 'M')
pd.Timestamp('2017-01-31', 'M') + 1
index
index.to_period()
RangeIndex: 12 entries, 0 to 11
Data columns (total 1 columns):
data 12 non-null datetime64[ns]
dtypes: datetime64[ns](1)
12 rows, 2 columns
data = np.random.random(size=(12, 2))
pd.DataFrame(data=data, index=index).info()
Time series transformation
Basic time series transformations include:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null object
price 504 non-null float64
dtypes: float64(1), object(1)
google.head()
date price
0 2015-01-02 524.81
1 2015-01-05 513.87
2 2015-01-06 501.96
3 2015-01-07 501.10
4 2015-01-08 502.68
Convert to datetime64
google.date = pd.to_datetime(google.date)
google.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 504 entries, 0 to 503
Data columns (total 2 columns):
date 504 non-null datetime64[ns]
price 504 non-null float64
dtypes: datetime64[ns](1), float64(1)
inplace :
don't create copy
google.set_index('date', inplace=True)
google.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
734.15
google.asfreq('D').head()
price
date
2015-01-02 524.81
2015-01-03 NaN
2015-01-04 NaN
2015-01-05 513.87
2015-01-06 501.96
price
date
2015-01-19 NaN
2015-02-16 NaN
...
2016-11-24 NaN
2016-12-26 NaN
Basic time series calculations
Typical Time Series manipulations include:
Shift or lag values forward or backward in time
google.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2015-01-02 to 2016-12-30
Data columns (total 1 columns):
price 504 non-null float64
dtypes: float64(1)
price
date
2015-01-02 524.81
2015-01-05 513.87
2015-01-06 501.96
2015-01-07 501.10
2015-01-08 502.68
price shifted
date
2015-01-02 524.81 NaN
2015-01-05 513.87 524.81
2015-01-06 501.96 513.87
google['lagged'] = google.price.shift(periods=-1)
google[['price', 'lagged', 'shifted']].tail(3)
x_t − x_(t−1)
google['diff'] = google.price.diff()
google[['price', 'diff']].head(3)
price diff
date
2015-01-02 524.81 NaN
2015-01-05 513.87 -10.94
2015-01-06 501.96 -11.91
google['pct_change'] = google.price.pct_change().mul(100)
google[['price', 'return', 'pct_change']].head(3)
price return_3d
date
2015-01-02 524.81 NaN
2015-01-05 513.87 NaN
2015-01-06 501.96 NaN
2015-01-07 501.10 -4.517825
2015-01-08 502.68 -2.177594
Comparing stock performance
Stock price series: hard to compare at different levels
price
date
2010-01-04 313.06
2010-01-05 311.68
2010-01-06 303.83
313.06
True
prices.head(2)
AAPL 30.57
GOOG 313.06
YHOO 17.10
Name: 2010-01-04 00:00:00, dtype: float64
normalized = prices.div(prices.iloc[0])
normalized.head(3)
normalized = prices.div(prices.iloc[0]).mul(100)
normalized.plot()
Changing the frequency: resampling
DateTimeIndex : set & change freq using .asfreq()
pandas API:
.asfreq() , .reindex()
2016-03-31 1
2016-06-30 2
2016-09-30 3
2016-12-31 4
Freq: Q-DEC, dtype: int64 # Default: year-end quarters
2016-03-31 1.0
2016-04-30 NaN
2016-05-31 NaN
2016-06-30 2.0
2016-07-31 NaN
2016-08-31 NaN
2016-09-30 3.0
2016-10-31 NaN
2016-11-30 NaN
2016-12-31 4.0
Freq: M, dtype: float64
ffill: forward fill
.reindex(): conform the data to a new index
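A sketch that reproduces the quarterly-to-monthly output above (the series construction is assumed):

```python
import pandas as pd

# Quarterly series with year-end quarters, matching the output above
quarterly = pd.Series([1, 2, 3, 4],
                      index=pd.date_range('2016', periods=4, freq='Q'))

# Upsampling to month-end introduces NaN for the new rows...
print(quarterly.asfreq('M'))

# ...unless a fill method such as forward fill is used
print(quarterly.asfreq('M', method='ffill'))
```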
Frequency conversion & transformation methods
.resample() : similar to .groupby()
unrate.head()
UNRATE
DATE
2000-01-01 4.0
2000-02-01 4.1
2000-03-01 4.0
2000-04-01 3.8
2000-05-01 4.0
True
gdp.head(2)
gdp
DATE
2000-01-01 1.2
2000-04-01 7.8
gdp_ffill
DATE
2000-01-01 1.2
2000-02-01 1.2
2000-03-01 1.2
2000-04-01 7.8
gdp_inter
DATE
2000-01-01 1.200000
2000-02-01 3.400000
2000-03-01 5.600000
2000-04-01 7.800000
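A sketch of the two fill strategies shown above, assuming just the two quarterly GDP observations:

```python
import pandas as pd

gdp = pd.Series([1.2, 7.8],
                index=pd.to_datetime(['2000-01-01', '2000-04-01']))

# Forward fill repeats the last known quarterly value for each new month
print(gdp.resample('MS').ffill())

# Interpolation fills the gap linearly instead: 1.2, 3.4, 5.6, 7.8
print(gdp.resample('MS').interpolate())
```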
df1 df2
0 1.0 NaN
1 2.0 NaN
2 3.0 NaN
0 NaN 4.0
1 NaN 5.0
2 NaN 6.0
df1 df2
0 1 4
1 2 5
2 3 6
Downsampling & aggregation methods
So far: upsampling, fill logic & interpolation
Now: downsampling
hour to day
ozone = ozone.resample('D').asfreq()
ozone.info()
Ozone Ozone
date date
2000-01-31 0.010443 2000-01-31 0.009486
2000-02-29 0.011817 2000-02-29 0.010726
2000-03-31 0.016810 2000-03-31 0.017004
2000-04-30 0.019413 2000-04-30 0.019866
2000-05-31 0.026535 2000-05-31 0.026018
.resample().mean() : Monthly
average, assigned to end of
calendar month
Ozone
mean std
date
2000-01-31 0.010443 0.004755
2000-02-29 0.011817 0.004072
2000-03-31 0.016810 0.004977
2000-04-30 0.019413 0.006574
2000-05-31 0.026535 0.008409
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 207 entries, 2000-01-31 to 2017-03-31
Freq: BM
Data columns (total 2 columns):
ozone 207 non-null float64
pm25 207 non-null float64
dtypes: float64(2)
Ozone PM25
date
2000-01-31 0.005545 20.800000
2000-02-29 0.016139 6.500000
2000-03-31 0.017004 8.493333
2000-04-30 0.031354 6.889474
df.resample('MS').first().head()
Ozone PM25
date
2000-01-01 0.004032 37.320000
2000-02-01 0.010583 24.800000
2000-03-01 0.007418 11.106667
2000-04-01 0.017631 11.700000
2000-05-01 0.022628 9.700000
Window functions in pandas
Windows identify sub periods of your time series
Expanding windows in pandas
From rolling to expanding windows
RT = (1 + r1 )(1 + r2 )...(1 + rT ) − 1
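The cumulative return formula above can be sketched with an expanding window (illustrative returns):

```python
import pandas as pd

returns = pd.Series([0.01, -0.02, 0.03])

# Running multi-period return R_T = (1 + r_1)(1 + r_2)...(1 + r_T) - 1,
# computed over an expanding window of the growth factors
running_return = (returns + 1).expanding().apply(lambda x: x.prod()) - 1
print(running_return)

# A cumulative product gives the same result and is faster
print((returns + 1).cumprod() - 1)
```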
Random walks & simulations
Daily stock returns are hard to predict
Two examples:
Generate random returns
DATE
2007-05-29 -0.008357
2007-05-30 0.003702
2007-05-31 -0.013990
2007-06-01 0.008096
2007-06-04 0.013120
DATE
2007-05-25 1515.73
Name: SP500, dtype: float64
sp500_random = start.append(random_walk.add(1))
sp500_random.head()
DATE
2007-05-25 1515.730000
2007-05-29 0.998290
2007-05-30 0.995190
2007-05-31 0.997787
2007-06-01 0.983853
dtype: float64
Correlation & relations between series
So far, focus on characteristics of individual variables
Market value-weighted index
Composite performance of various stocks
Calculate index
Index(['PG', 'TM', 'ABB', 'KO', 'WMT', 'XOM', 'JPM', 'JNJ', 'BABA', 'T',
       'ORCL', 'UPS'], dtype='object', name='Stock Symbol')
tickers.tolist()
['PG',
'TM',
'ABB',
'KO',
'WMT',
...
'T',
'ORCL',
'UPS']
Build your value-weighted index
Key inputs:
number of shares
Stock Symbol
PG 2,556.48 # Outstanding shares in million
TM 1,494.15
ABB 2,138.71
KO 4,292.01
WMT 3,033.01
XOM 4,146.51
JPM 3,557.86
JNJ 2,710.89
BABA 2,500.00
T 6,140.50
ORCL 4,114.68
UPS 869.30
dtype: float64
Evaluate your value-weighted index
Index return:
Total index return
Contribution by component
Performance vs Benchmark
Total period return
315,037.71
TM -6,365.07
KO -4,034.49
ABB 7,592.41
ORCL 11,109.65
PG 14,597.48
UPS 17,212.08
WMT 23,232.85
BABA 27,800.00
JNJ 39,931.44
T 50,229.33
XOM 53,075.38
JPM 80,656.65
Name: 2016-12-30 00:00:00, dtype: float64
Stock Symbol
ABB 1.85
UPS 3.45
TM 5.96
ORCL 6.93
KO 7.03
WMT 8.50
PG 8.81
T 9.47
BABA 10.55
JPM 11.50
XOM 12.97
JNJ 12.97
Name: Market Capitalization, dtype: float64
14.06
weighted_returns = weights.mul(index_return)
weighted_returns.sort_values().plot(kind='barh')
Some additional analysis of your index
Daily return correlations:
Single worksheet
Multiple worksheets
Congratulations!
Manipulating Time Series Data in Python
Reading, inspecting,
and cleaning data
from CSV
Importing and Managing Financial Data in Python
Stefan Jansen
Instructor
Import and clean data
Ensure that the pd.DataFrame() matches the CSV source file
Stock exchange listings: amex-listings.csv
Import data from Excel
pd.read_excel(file, sheet_name=0)
Select first sheet by default with sheet_name=0
nyse = pd.read_excel(xls,
sheet_name=exchanges[2],
na_values='n/a')
Combine DataFrames
Concatenate or "stack" a list of pd.DataFrame s
Syntax: pd.concat([amex, nasdaq, nyse])
pandas_datareader
Easy access to various financial internet data sources
Little code needed to import into a pandas DataFrame
Federal Reserve
OANDA
Economic data from FRED
1 https://fred.stlouisfed.org/
Select stocks based on criteria
Use the listing information to select specific stocks
As criteria:
Stock Exchange
Sector or Industry
IPO Year
Market Capitalization
'JNJ'
'JNJ'
'ORCL'
Get data for several stocks
Use the listing information to select multiple stocks
E.g. largest 3 stocks per sector
Learn how to manage a pandas MultiIndex , a powerful tool to deal with more complex
data sets
AAPL 740024.467000
GOOG 569426.124504
... ...
Name: Market Capitalization, dtype: float64
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 712 entries, 2020-01-02 to 2022-10-27
Data columns (total 30 columns):
# Column Non-Null Count Dtype
-- ------ -------------- -----
0 (Adj Close, AAPL) 712 non-null float64
1 (Adj Close, GOOG) 712 non-null float64
2 (Adj Close, MSFT) 712 non-null float64
...
28 (Volume, AMZN) 712 non-null float64
29 (Volume, FB) 253 non-null float64
dtypes: float64(30)
memory usage: 172.4 KB
df = df.stack()
Be on top of your data
Goal: Capture key quantitative characteristics
Important angles to look at:
Central tendency: Which values are "typical"?
market_cap.mean()
3180.7126214953805
market_cap.median()
225.9684285
market_cap.mode()
0.0
market_cap.var()
648773812.8182
np.sqrt(variance)
25471.0387
market_cap.std()
25471.0387
Describe data distributions
First glance: Central tendency and standard deviation
How to get a more granular view of the distribution?
True
0.25 43.375930
0.75 969.905207
926.5292771575
market_cap.quantile(deciles)
0.1 4.884565
0.2 26.993382
0.3 65.714547
0.4 124.320644
0.5 225.968428
0.6 402.469678
...
count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
25% 43.375930 # 1st quantile
50% 225.968428 # Median
75% 969.905207 # 3rd quantile
max 740024.467000
Name: Market Capitalization
count 3167.000000
mean 3180.712621
std 25471.038707
min 0.000000
10% 4.884565
20% 26.993382
30% 65.714547
40% 124.320644
50% 225.968428
60% 402.469678
70% 723.163197
80% 1441.071134
...
Always look at your data!
Identical metrics can represent very different data
Rugplot
ty10.describe()
DGS10
mean 6.291073
std 2.851161
min 1.370000
25% 4.190000
50% 6.040000
...
From categorical to quantitative variables
So far, we have analyzed quantitative variables
Categorical variables require a different approach
12
amex.Sector.nunique()
2002.0 19 # Mode
2015.0 11
1999.0 9
1993.0 7
2014.0 6
2013.0 5
2017.0 5
...
2009.0 1
1990.0 1
1991.0 1
Name: IPO Year, dtype: int64
2002 19
2015 11
1999 9
1993 7
2014 6
2004 5
2003 5
2017 5
...
1987 1
Name: IPO Year, dtype: int64
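A sketch of the frequency count shown above (illustrative values; the course applies this to the 'IPO Year' listing column):

```python
import pandas as pd

ipo_years = pd.Series([2002, 2002, 2015, 1999, 2002, 2015])

# value_counts() returns frequencies in descending order; the first row is the mode
print(ipo_years.value_counts())
```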
Summarize numeric data by category
So far: Summarize individual variables
Compute descriptive statistic like mean, quantiles
Examples:
Largest company by exchange
Sector
Basic Industries 724.899934
Capital Goods 1511.237373
Consumer Durables 839.802607
Consumer Non-Durables 3104.051206
Consumer Services 5582.344175
Energy 826.607608
Finance 1044.090205
Health Care 1758.709197
...
Many ways to aggregate
Last segment: Group by one variable and aggregate
Examples
Median market cap by sector and IPO year
IPO Year
1972.0 877.240005
1973.0 1445.697371
1986.0 1396.817381
1988.0 24.847526
...
2012.0 381.796074
2013.0 22.661533
2015.0 260.075564
2016.0 81.288336
Name: market_cap_m, dtype: float64
Categorical plots with seaborn
Specialized ways to plot combinations of categorical and numerical variables
Visualize estimates of summary statistics per category
Example: Mean Market Cap per Sector or IPO Year with indication of dispersion
Sector
Health Care 645
Finance 627
Technology 433
...
order = order.index.tolist()
Distributions by category
Last segment: Summary statistics
Number of observations, mean per category
What you learned
Import data from Excel and online sources
Combine datasets
Charlotte Werger
Data Scientist
Hi! My name is Charlotte
Index: A smaller sample of the market that is representative of the whole, e.g. S&P500,
Nasdaq, Russell 2000, MSCI World Index
What are portfolio weights?
Weight is the percentage composition of a particular asset in a portfolio
Weights and diversification (few large investments versus many small investments)
Weights determine your investment strategy, and can be set to optimize risk and expected
return
Return_t = (V_t − V_(t−1)) / V_(t−1)
Historic average returns are often used to calculate expected return

Warning for confusion: average return, cumulative return, active return, and annualized return
returns.head(2)
AAPL AMZN TSLA
date
2018-03-25 NaN NaN NaN
2018-03-26 -0.013772 0.030838 0.075705
0.05752375881537723
Risk of a portfolio
Investing is risky: individual assets will go up or down
The returns' spread around the mean is measured by the variance σ², a common measure of volatility:

σ² = Σ_{i=1}^{N} (X_i − μ)² / N
The correlation between asset 1 and 2 is denoted by ρ_{1,2}, and tells us to what extent the assets move together
The portfolio variance takes into account the individual assets' variances (σ₁², σ₂², etc.), the weights of the assets in the portfolio (w₁, w₂), as well as their correlation to each other

The standard deviation (σ) is equal to the square root of the variance (σ²); both are measures of volatility
This can be re-written in matrix notation, which you can use more easily in code:

In words, what we need to calculate in Python is: Portfolio variance = Weights transposed x (Covariance matrix x Weights)
AAPL FB GE GM WMT
AAPL 0.053569 0.026822 0.013466 0.018119 0.010798
FB 0.026822 0.062351 0.015298 0.017250 0.008765
GE 0.013466 0.015298 0.045987 0.021315 0.009513
GM 0.018119 0.017250 0.021315 0.058651 0.011894
WMT 0.010798 0.008765 0.009513 0.011894 0.041520
0.022742232726360567
2.3%
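A sketch of the matrix calculation, using the covariance matrix above. The equal weights are an assumption, but they closely reproduce the variance figure shown:

```python
import numpy as np

# Covariance matrix from the table above (AAPL, FB, GE, GM, WMT)
cov_matrix = np.array([
    [0.053569, 0.026822, 0.013466, 0.018119, 0.010798],
    [0.026822, 0.062351, 0.015298, 0.017250, 0.008765],
    [0.013466, 0.015298, 0.045987, 0.021315, 0.009513],
    [0.018119, 0.017250, 0.021315, 0.058651, 0.011894],
    [0.010798, 0.008765, 0.009513, 0.011894, 0.041520],
])

weights = np.full(5, 0.2)  # assumed equal weighting

# Portfolio variance = w^T (Sigma w)
portfolio_variance = weights.T @ cov_matrix @ weights
print(portfolio_variance)  # roughly 0.0227, i.e. about 2.3%
```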
Comparing returns
1. Annual Return: Total return earned over a period of one calendar year
2. Annualized return: Yearly rate of return inferred from any time period
3. Average Return: Total return realized over a longer period, spread out evenly over the
(shorter) periods.
4. Cumulative (compounding) return: A return that includes the compounded results of re-
investing interest, dividends, and capital gains.
date
2015-01-06 105.05
Name: AAPL, dtype: float64
apple_price.tail(1)
date
2018-03-29 99.75
Name: AAPL, dtype: float64
print (total_return)
0.5397420653068692
0.14602501482708763
date
2017-12-27 170.60
2017-12-28 171.08
2017-12-29 169.23
Name: AAPL, dtype: float64
print (annualized_return)
0.1567672968419047
Choose a portfolio
Portfolio 1 Portfolio 2
It defines an investment's return by measuring how much risk is involved in producing that return
Where: R_p is the portfolio return, R_f is the risk-free rate, and σ_p is the portfolio standard deviation
0.2286248397870068
0.6419569149994251
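The Sharpe ratio formula can be sketched as follows (the inputs here are illustrative, not the course's portfolios):

```python
# Sharpe ratio = (R_p - R_f) / sigma_p
portfolio_return = 0.10   # annualized portfolio return (assumed)
risk_free_rate = 0.01     # annualized risk-free rate (assumed)
portfolio_std = 0.14      # annualized standard deviation of returns (assumed)

sharpe_ratio = (portfolio_return - risk_free_rate) / portfolio_std
print(sharpe_ratio)
```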
Portfolio 1 Portfolio 2
In a perfect world returns are distributed normally
1 Source: "An Introduction to Omega", Con Keating and William Shadwick, The Finance Development Center, 2002
Rule of thumb:
1 Source: https://brownmath.com/stat/shape.htm
1 Source: Pimco
A distribution with kurtosis <3 is called platykurtic: tails are shorter and thinner, and the central peak is lower and broader.

A distribution with kurtosis >3 is called leptokurtic: tails are longer and fatter, and the central peak is higher and sharper (fat-tailed)
1 Source: https://brownmath.com/stat/shape.htm
date
2015-01-02 NaN
2015-01-05 -0.028172
2015-01-06 0.000094
Name: AAPL, dtype: float64
apple_returns.hist()
mean : 0.0006855391415724799
vol : 0.014459504468360529
skew : -0.012440851735057878
kurt : 3.197244607586669
Looking at downside risk
A good risk measure should focus on potential losses i.e. downside risk
0.07887683763760528
Active investing against a benchmark
Calculated as the difference between the actual return and the benchmark return.
Active return is achieved by "active" investing, i.e. taking overweight and underweight
positions from the benchmark.
Passive investment funds, or index trackers, don't use active return as a measure for
performance.
Tracking error is the name used for the difference between portfolio and benchmark for a passive investment fund.
print (grouped_df['active_weight'])
GICS Sector
Consumer Discretionary 20.257
Financials -2.116
...etc
What is a factor?
Factors in portfolios are like nutrients in food
1 Source: https://invesco.eu/investment-campus/educational-papers/factor-investing
# Plot results
df['corr'].plot()
Using factors to explain performance
Empirical factor models exist that have been tested on historic data.
R_pf = α + β_m * MKT + β_s * SMB + β_h * HML

MKT is the excess return of the market, i.e. R_m − R_f
SMB (Small Minus Big) a size factor
b1, b2 = model.params
Professional portfolio analysis tools
A strategy that works on historic data is not guaranteed to work well on future data, as markets change
1 GitHub: https://github.com/quantopian/pyfolio
pf.create_position_tear_sheet(returns, positions,
sector_mappings=sect_map)
Creating optimal portfolios
df = pd.read_csv('portfolio.csv')
df.head(2)
XOM RRC BBY MA PFE
date
2010-01-04 54.068794 51.300568 32.524055 22.062426 13.940202
2010-01-05 54.279907 51.993038 33.349487 21.997149 13.741367
Remember the Efficient Frontier?
raw_weights = ef.min_volatility()
Expected risk and return based on historic data
1 Source: https://systematicinvestor.github.io/Exponentially-Weighted-Volatility-RCPP
symbol
XOM 0.103030
BBY 0.394629
PFE 0.186058
Remember the Sortino ratio: it uses the variance of negative returns only
PyPortfolioOpt allows you to use semicovariance in the optimization; this is a measure of downside risk:
print(Sigma_semi)
Chapter 1: Calculating risk and return
Diversification
Fama-French 3-factor model to break down performance into explainable factors and alpha
Markowitz's portfolio optimization: efficient frontier, maximum Sharpe and minimum volatility portfolios