Portfolio optimisation when you don’t know the future (or the past)

Robert Carver
Kings College London
28th January 2020
Legal boilerplate bit:

Nothing in this presentation constitutes investment advice, or an offer or solicitation to conduct investment business. The material here is solely for educational purposes.

I am not currently regulated or authorised by the FCA, SEC, CFTC, or any other regulatory body to give investment advice, or indeed to do anything else.

Futures trading carries significant risks and is not suitable for all investors. Back tested and actual historic results are no guarantee of future performance. Use of the material in this presentation is entirely at your own risk.
Robert Carver

BA and MSc Economics

<1999 Working in Dubai, not in finance

2002-2004 Exotic Derivatives trader, Barclays Capital

2004-2006 Research Manager, CEPR

2006-2010 Head of fundamental strategies, AHL

2010-2013 Head of fixed income, AHL

2013 - Independent systematic trader
Author of 3 books:

“Systematic Trading”,

“Smart Portfolios”

“Leveraged Trading”

2018 - Visiting lecturer, QMUL
Today’s talk: Portfolio optimisation

General themes:
– Understand the tools you are using
– Understand their limitations
– Understand the uncertainty of your data


Python code:
https://gist.github.com/robcarver17/5204ea7b1a7d5723da0b01b8ba413e72

Data:
https://drive.google.com/file/d/1IBgMNuYivYR4hb_rHuIopZKnyOR7W5yb/view?usp=sharing
Today’s talk: Portfolio optimisation

The classic Markowitz optimisation

Two major problems

Building a better optimiser
The problem is to find portfolio weights for assets that maximise [something], given some estimates of likely future returns.

Specifically in this presentation:

Maximise μ_p / σ_p over the w_i, where μ_p = Σ_i w_i E(μ_i)
and σ_p = √( Σ_i Σ_j w_i w_j σ_ij )

Subject to Σ_i w_i = 1, w_i ≥ 0, σ_p < σ_max

w = weight, μ = mean (asset i or portfolio p)
σ_p = standard deviation of the portfolio
σ_ij = covariance of assets i and j
So we need estimates of:

How well assets are likely to perform on average – mean (μ_i)

How risky they are – standard deviation (σ_i)

How similar their returns are – correlation (ρ_ij)

[From σ and ρ we can derive the covariances used in the optimisation. But this is a more useful representation]

[Later in the presentation I will use the Sharpe Ratio, μ_i / σ_i, rather than the mean]

Objective function is quadratic

I use SLSQP:
http://www.pyopt.org/reference/optimizers.slsqp.html
from scipy.optimize import minimize
import numpy as np
import scipy.stats as stats

def optimise_with_sigma(sigma, mean_list):
    mus = np.array(mean_list, ndmin=2).transpose()
    number_assets = sigma.shape[1]
    start_weights = [1.0 / number_assets] * number_assets

    # Constraints - positive weights, adding to 1.0
    bounds = [(0.0, 1.0)] * number_assets
    cdict = [{'type': 'eq', 'fun': addem}]
    ans = minimize(
        neg_sharpe_ratio,
        start_weights, (sigma, mus),
        method='SLSQP',
        bounds=bounds,
        constraints=cdict,
        tol=0.00001)
    weights = ans['x']

    return weights
def neg_sharpe_ratio(weights, sigma, mus):
    # Returns minus the portfolio Sharpe ratio (as we're minimising)
    weights = np.matrix(weights)
    estreturn = (weights * mus)[0, 0]
    std_dev = variance(weights, sigma)**.5

    return -estreturn / std_dev

def portfolio_stdev(weights, sigma):
    std_dev = variance(weights, sigma)**.5
    return std_dev

def variance(weights, sigma):
    # Returns the variance (NOT standard deviation) given weights and sigma
    return (weights * sigma * weights.transpose())[0, 0]

def addem(weights):
    # Used for constraints; should be zero when the weights add up to 1.0
    return 1.0 - sum(weights)
# Useful for working in standard deviation and correlation space
# rather than covariance space

def optimise_with_corr_and_std(mean_list, stdev_list, corrmatrix):
    sigma = sigma_from_corr_and_std(stdev_list, corrmatrix)
    weights = optimise_with_sigma(sigma, mean_list)

    return weights

def sigma_from_corr_and_std(stdev_list, corrmatrix):
    # Covariance matrix: sigma_ij = stdev_i * corr_ij * stdev_j
    stdev = np.array(stdev_list, ndmin=2).transpose()
    sigma = stdev * corrmatrix * stdev.transpose()

    return sigma

The classic Markowitz optimisation

Two major problems

Building a better optimiser
Instability

optimise_with_corr_and_std([.04, .04, .04], [0.08, 0.08, 0.08],
    np.array([[1, .9, .9], [.9, 1, .9], [.9, .9, 1]]))   # identical assets, high correlation

optimise_with_corr_and_std([.04, .04, .04], [0.08, 0.08, 0.08],
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))         # identical assets, zero correlation

optimise_with_corr_and_std([.04, .04, .04], [0.08, 0.08, 0.08],
    np.array([[1, 0.7, 0], [0.7, 1, 0], [0, 0, 1]]))     # identical assets, mixed correlations

optimise_with_corr_and_std([.04, .04, .04], [0.08, 0.08, 0.12],
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))         # third asset riskier, zero correlation

optimise_with_corr_and_std([.04, .04, .04], [0.08, 0.08, 0.085],
    np.array([[1, .9, .9], [.9, 1, .9], [.9, .9, 1]]))   # third asset very slightly riskier, high correlation

optimise_with_corr_and_std([.04, .04, .045], [0.08, 0.08, 0.08],
    np.array([[1, .9, .9], [.9, 1, .9], [.9, .9, 1]]))   # third asset slightly higher mean, high correlation

optimise_with_corr_and_std([.04, .04, .06], [0.08, 0.08, 0.08],
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))         # third asset much higher mean, zero correlation
Instability...

High correlation + small difference in μ or σ → extreme portfolios

Things are marginally better if correlations are very low
Uncertainty

Can we measure μ, σ or ρ accurately?
● NO

We can’t predict the future – must assume the past will repeat itself

Need to fit a statistical model to historic returns (e.g. estimate μ, σ and ρ)
Uncertainty
We need to fit a statistical model. But...

Do we have the right model?

Will the model change?

Do we have accurate historical parameter
estimates?
– Easy to quantify. Can capture changes in model. Worth
exploring… “The Uncertainty of the Past”
Parameter sampling distributions

All statistics estimated from past data are
subject to uncertainty

Less data → more uncertainty
Parameter sampling distributions

Let’s pretend that we have a huge bag of balls, coloured
red and green

We’ve drawn some balls out already, but we want to know
the expected probability that the next ball we draw is red

If the bag is large enough, then we can assume that probability will be the same as it was for the initial contents of the bag

We don’t know what the initial condition was, but we can
infer it from the balls we’ve already drawn out

The more balls we’ve already got, the more confident we
can be about the initial conditions, and thus about our
expectations
Parameter sampling distributions

Example: Drawn GGGRGGGGRRGGGGG

N=15, 3 reds, 12 greens

Probability of a red, P = 3/15 = 0.20
But how confident can we be? It’s plausible that
the real P = 0.1 or 0.3; it could even be 0.4 and
we just got unlucky and drew a lot of greens.
If we repeatedly simulate the act of drawing 15
balls from a bag with P = 0.2, we can see how
likely it is that the real P has other values
[Figure: histogram of the estimated proportion of reds across many simulated draws of 15 balls; the estimates range from roughly 0.07 to 0.4]
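A minimal simulation along these lines (my sketch, not from the original slides), assuming 15 draws from a bag where the true probability of a red is 0.2:

import numpy as np

# Repeatedly draw 15 balls from a bag where the true P(red) = 0.2,
# and look at the spread of the estimated proportion of reds
np.random.seed(0)  # arbitrary seed, for reproducibility
true_p = 0.2
n_draws = 15
n_simulations = 10000

estimated_p = np.random.binomial(n_draws, true_p, size=n_simulations) / n_draws

# The spread of these estimates shows how uncertain an estimate from 15 balls is
print(np.percentile(estimated_p, [5, 50, 95]))  # roughly 0.07, 0.2 and 0.4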
Parameter sampling distributions

All statistics estimated from past data are
subject to uncertainty

Less data → more uncertainty

We can quantify the uncertainty by
bootstrapping the data
Portfolio weights are calculated from statistical estimates based on past data…

…and so are subject to sampling uncertainty!
Estimating with real data

import pandas as pd

WEEKS_IN_YEAR = 365.25/7

def annual_returns_from_weekly_data(data):
    return data.mean()*WEEKS_IN_YEAR

def annual_stdev_from_weekly_data(data):
    return data.std()*(WEEKS_IN_YEAR**.5)

def sharpe_ratio(data):
    return annual_returns_from_weekly_data(data)/annual_stdev_from_weekly_data(data)

def optimise_with_data(data):
    mean_list = annual_returns_from_weekly_data(data).values
    stdev_list = annual_stdev_from_weekly_data(data).values
    corrmatrix = data.corr().values

    weights = optimise_with_corr_and_std(mean_list, stdev_list, corrmatrix)

    return weights

data = pd.read_csv("returns.csv")
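The summary statistics and weights on the next slide can presumably be reproduced with something like this (my sketch; the function names are those defined above):

print(annual_returns_from_weekly_data(data))   # annualised mean returns
print(annual_stdev_from_weekly_data(data))     # annualised standard deviations
print(sharpe_ratio(data))                      # Sharpe ratios
print(data.corr())                             # correlation matrix
print(optimise_with_data(data))                # optimised portfolio weights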
              SP500    US10     US5
Return         2.6%    3.6%    2.6%
Std. dev      16.6%    5.6%    3.7%
Sharpe ratio   0.16    0.64    0.71

              SP500/US5   SP500/US10   US5/US10
Correlation       -0.28        -0.25       0.96

Based on weekly excess returns (futures, including rolldown) from 01/1998 to 10/2019

              SP500    US10     US5
Weight        14.5%      0%   85.5%

Bootstrap portfolio weights

import random
import matplotlib.pyplot as plt

def optimisation_with_random_bootstrap(data):
    bootstrapped_data = get_bootstrap_series(data)
    weights = optimise_with_data(bootstrapped_data)

    return weights

def get_bootstrap_series(data):
    length_of_series = len(data.index)
    random_indices = [int(random.uniform(0, length_of_series)) for _unused in range(length_of_series)]
    bootstrap_data = data.iloc[random_indices]

    return bootstrap_data

def bootstrap_optimisation_distributions(data, monte_count=1000):
    dist_of_weights = []
    for i in range(monte_count):
        single_bootstrap_weights = optimisation_with_random_bootstrap(data)
        dist_of_weights.append(single_bootstrap_weights)

    dist_of_weights = pd.DataFrame(dist_of_weights)
    dist_of_weights.columns = data.columns

    return dist_of_weights

weight_distr = bootstrap_optimisation_distributions(data, monte_count=1000)

plt.hist(weight_distr.SP500, bins=100)

You can speed this up by using e.g. pool objects
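For instance, a rough sketch using multiprocessing (mine, not from the slides; it assumes the functions and data above live in an importable module so the worker processes can see them):

from multiprocessing import Pool

def single_bootstrap_run(_unused_index):
    # One bootstrap optimisation on the global 'data'; the argument is just the map index
    return optimisation_with_random_bootstrap(data)

def parallel_bootstrap_distributions(monte_count=1000, processes=4):
    # Spread the bootstrap optimisations across several worker processes
    with Pool(processes=processes) as pool:
        all_weights = pool.map(single_bootstrap_run, range(monte_count))

    dist_of_weights = pd.DataFrame(all_weights, columns=data.columns)
    return dist_of_weights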


Portfolio weights – S&P 500
Portfolio weights – US 5 year
Portfolio weights – US 10 year
The sampling distribution of different inputs

How uncertain is each input into the optimisation (e.g. Sharpe Ratio, correlation, …)?

What effect does this have on the optimal portfolio weights?

We can plot:

The sampling distribution of the relevant input

The effect on the portfolio weights from changing that input (keeping all other inputs the same, ‘ceteris paribus’)
Sampling distribution code

def plot_changeling(code, factor, data):
    factor_distribution, weights_data = changeling_graph_data(code, factor, data)

    # Two plots: histogram of the factor distribution, and weights as the factor varies
    fig, (ax1, ax2) = plt.subplots(2)
    fig.suptitle("Distribution of %s for %s, average %.2f" % (factor, code, np.mean(factor_distribution)))

    ax1.hist(factor_distribution, bins=50)
    weights_data.plot(ax=ax2)

    analyse_changeling_results(code, factor, factor_distribution, data)

def changeling_graph_data(code, factor, data):
    assert factor in factor_list

    factor_distribution = get_factor_distribution(code, factor, data)
    weights_data = get_weights_data(code, factor, data, factor_distribution)

    return factor_distribution, weights_data

func_dict = dict(sharpe=sharpe_ratio, stdev=annual_stdev_from_weekly_data,
                 corr=correlation_for_code)
factor_list = list(func_dict.keys())

def get_factor_distribution(code, factor, data, monte_length=10000):
    factor_func = func_dict[factor]
    factor_distr = []
    for not_used in range(monte_length):
        bootstrap_data = get_bootstrap_series(data)
        factor_estimate = factor_func(bootstrap_data)
        factor_estimate_for_code = factor_estimate[code]
        factor_distr.append(factor_estimate_for_code)

    # Note we drop the outlying values, which are probably extreme
    factor_distr.sort()
    drop_values = int(monte_length*.005)
    factor_distr = factor_distr[drop_values:-drop_values]

    return factor_distr

def split_corr_code(code):
    split_code = code.split("/")
    instr1 = split_code[0]
    instr2 = split_code[1]

    return instr1, instr2

def join_corr_code(instr1, instr2):
    return '%s/%s' % (instr1, instr2)

def correlation_for_code(data):
    corr = data.corr()
    instruments = data.columns
    results_dict = {}
    for instr1 in instruments:
        for instr2 in instruments:
            code = join_corr_code(instr1, instr2)
            results_dict[code] = corr.loc[instr1, instr2]

    return results_dict
def get_weights_data(code, factor, data, factor_distribution, points_to_use=100):
    factor_range = [np.min(factor_distribution), np.max(factor_distribution)]
    factor_step = (factor_range[1] - factor_range[0])/points_to_use
    factor_values_to_test = np.arange(start=factor_range[0], stop=factor_range[1], step=factor_step)

    weight_results = []
    for factor_value_to_use in factor_values_to_test:
        weights = optimise_with_replaced_factor_value(data, code, factor, factor_value_to_use)
        weight_results.append(weights)

    # Nice format
    weight_results = pd.DataFrame(weight_results)
    weight_results.columns = data.columns

    return weight_results
def optimise_with_replaced_factor_value(data, code, factor, factor_value_to_use):
    mean_list, stdev_list, corrmatrix = replace_factor_value(data, code, factor, factor_value_to_use)
    weights = optimise_with_corr_and_std(mean_list, stdev_list, corrmatrix)

    return weights

def replace_factor_value(data, code, factor, factor_value_to_use):
    stdev_list = annual_stdev_from_weekly_data(data)
    if factor == "stdev":
        stdev_list[code] = factor_value_to_use
    stdev_list = stdev_list.values

    sharpe_list = sharpe_ratio(data)
    if factor == "sharpe":
        sharpe_list[code] = factor_value_to_use
    sharpe_list = sharpe_list.values

    mean_list = stdev_list * sharpe_list

    corrmatrix = data.corr()
    if factor == "corr":
        corrmatrix = replace_corr_value(corrmatrix, code, factor_value_to_use)
    corrmatrix = corrmatrix.values

    return mean_list, stdev_list, corrmatrix

def replace_corr_value(corrmatrix, code, factor_value_to_use):
    # code should be in the form instr1/instr2
    instr1, instr2 = split_corr_code(code)
    corrmatrix.loc[instr1, instr2] = factor_value_to_use
    corrmatrix.loc[instr2, instr1] = factor_value_to_use

    return corrmatrix
def analyse_changeling_results(code, factor, factor_distribution, data):
    factor5 = np.percentile(factor_distribution, 5)
    factor95 = np.percentile(factor_distribution, 95)
    print("There is a 90%% chance that %s for %s was between %.2f and %.2f" %
          (factor, code, factor5, factor95))

    weights5 = optimise_with_replaced_factor_value(data, code, factor, factor5)
    weights95 = optimise_with_replaced_factor_value(data, code, factor, factor95)

    if factor == "corr":
        code = split_corr_code(code)[0]

    instruments = list(data.columns)
    code_index = instruments.index(code)

    weight5_code = weights5[code_index]
    weight95_code = weights95[code_index]

    print("Giving weights for %s between %.3f and %.3f" % (code, weight5_code, weight95_code))
plot_changeling("SP500", "sharpe", data)

There is a 90% chance that sharpe for SP500 was between -0.18 and 0.50
Giving weights for SP500 between 0.098 and 0.170
plot_changeling("US10", "sharpe", data)

There is a 90% chance that sharpe for US10 was between 0.31 and 0.98
Giving weights for US10 between 0.000 and 0.846
plot_changeling("US5", "sharpe", data)

There is a 90% chance that sharpe for US5 was between 0.37 and 1.05
Giving weights for US5 between 0.000 and 0.860
plot_changeling("SP500", "stdev", data)

There is a 90% chance that stdev for SP500 was between 0.16 and 0.18
Giving weights for SP500 between 0.149 and 0.142
plot_changeling("US10", "stdev", data)

There is a 90% chance that stdev for US10 was between 0.05 and 0.06
Giving weights for US10 between 0.000 and 0.000
plot_changeling("US5", "stdev", data)

There is a 90% chance that stdev for US5 was between 0.04 and 0.04
Giving weights for US5 between 0.857 and 0.853
plot_changeling("SP500/US10", "corr", data)

There is a 90% chance that corr for SP500/US10 was between -0.32 and -0.19
Giving weights for SP500 between 0.145 and 0.145
plot_changeling("SP500/US5", "corr", data)

There is a 90% chance that corr for SP500/US5 was between -0.34 and -0.22
Giving weights for SP500 between 0.160 and 0.128
plot_changeling("US5/US10", "corr", data)

There is a 90% chance that corr for US5/US10 was between 0.95 and 0.96
Giving weights for US5 between 0.855 and 0.855
So what?

Always:
– Uncertainty in standard deviation estimates (which is tiny) has zero
effect on portfolio weights

For pairs of assets with high correlation:
– Uncertainty in correlations (which is tiny) has zero effect on portfolio
weights
– Uncertainty in Sharpe Ratios (which is large) has a massive effect on
weights: We see flipping from 0% to ~85% in a tiny range for bonds

For pairs of assets with low correlation:
– Uncertainty in correlations (which is medium sized) has no significant
effect on portfolio weights: For SP500 / US5 there is a 90% chance
the correct SP500 weight was between 12.8% and 16.0%
– Uncertainty in Sharpe Ratios (which is large) has a significant effect
on weights: For S&P 500 there is a 90% chance the correct weight
was between 9.8% and 17%
If you prefer closed forms...

How uncertain are SR estimates?

Under assumptions of independent Gaussian returns, the variance ω_SR of the SR estimate is:

ω_SR = (1 + 0.5·SR²) / N

where SR is the mean estimate of the Sharpe Ratio and N is the number of return observations.

A two-sided 95% confidence interval is:

( SR − 1.96·√ω_SR , SR + 1.96·√ω_SR )
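A minimal sketch of that interval in code (mine, not from the slides), taking N as the number of return observations behind the estimate:

import numpy as np

def sharpe_ratio_confidence_interval(sr_estimate, n_obs):
    # Variance of the SR estimate under independent Gaussian returns: (1 + 0.5*SR^2) / N
    omega_sr = (1 + 0.5 * sr_estimate**2) / n_obs
    half_width = 1.96 * np.sqrt(omega_sr)

    return sr_estimate - half_width, sr_estimate + half_width

# Example with made-up numbers: SR of 0.5 estimated from 250 observations
print(sharpe_ratio_confidence_interval(0.5, 250))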
How uncertain are estimates of σ?

The two-sided confidence interval is:

[ s·√( (n−1) / χ²_{1−α/2, n−1} ) ,  s·√( (n−1) / χ²_{α/2, n−1} ) ]

where α is the significance level (0.05 for a two-sided 95% confidence range), n is the sample size, s is the standard deviation estimate, and χ²_{p, n−1} is the pth quantile of the chi-squared distribution with n−1 degrees of freedom.
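Again a small sketch (mine), using scipy's chi-squared quantiles:

import numpy as np
from scipy.stats import chi2

def stdev_confidence_interval(s_estimate, n_obs, alpha=0.05):
    # Two-sided confidence interval for a standard deviation estimate
    df = n_obs - 1
    lower = s_estimate * np.sqrt(df / chi2.ppf(1 - alpha / 2, df))
    upper = s_estimate * np.sqrt(df / chi2.ppf(alpha / 2, df))

    return lower, upper

# Example with made-up numbers: stdev of 16.6% estimated from 1100 weekly observations
print(stdev_confidence_interval(0.166, 1100))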
How uncertain are correlation estimates?

We use the Fisher transformation:

z_r = 0.5·ln[ (1+r) / (1−r) ]

z then has lower and upper confidence bounds:

z_L = z_r − z_{1−α/2}·√( 1/(n−3) ) ,  z_U = z_r + z_{1−α/2}·√( 1/(n−3) )

where z_k is the kth percentage point of the standard normal distribution. We then transform these back into correlation space:

r_L = [exp(2·z_L) − 1] / [exp(2·z_L) + 1] ,  r_U = [exp(2·z_U) − 1] / [exp(2·z_U) + 1]
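And a corresponding sketch (mine), using scipy's normal quantile:

import numpy as np
from scipy.stats import norm

def correlation_confidence_interval(r_estimate, n_obs, alpha=0.05):
    # Fisher-transform the correlation, build a Gaussian interval, transform back
    z_r = 0.5 * np.log((1 + r_estimate) / (1 - r_estimate))
    half_width = norm.ppf(1 - alpha / 2) * np.sqrt(1.0 / (n_obs - 3))

    def back_transform(z):
        return (np.exp(2 * z) - 1) / (np.exp(2 * z) + 1)

    return back_transform(z_r - half_width), back_transform(z_r + half_width)

# Example with made-up numbers: correlation of 0.96 estimated from 1100 weekly observations
print(correlation_confidence_interval(0.96, 1100))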
An aside

We have been talking about “the uncertainty of the past”

But what matters is the future. How well can we predict future inputs into the optimisation process?

It turns out the ranking is the same as for past uncertainty:
– Standard deviation is very predictable (R² of 0.38, using last month’s value to predict next month’s)
– Correlations are somewhat predictable (R² of 0.2, between 1 and 6 months)
– Sharpe Ratios (and means) are very hard to predict (R² < 0.05 at all horizons)
Today’s talk: Portfolio optimisation

The classic Markowitz optimisation

Two major problems

Building a better optimiser
Don’t:

Use constraints: they are an in-sample fix, and they break the optimiser

Tweak inputs
Try:

Bootstrapping the weights
Bootstrap code
weight_distr = bootstrap_optimisation_distributions(data, monte_count=1000)
weight_distr.mean()
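The same bootstrapped DataFrame also gives the spread of the weights directly, e.g. (my addition):

# Percentiles of the bootstrapped weight distribution, not just its average
weight_distr.quantile([0.05, 0.5, 0.95])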
Try:

Bootstrapping the weights
– Can also bootstrap a distribution
– Can also bootstrap the efficient frontier
– Computationally slow
– Doesn’t use constraints efficiently
– Doesn’t distinguish between different forms of uncertainty

Set some/all inputs to identical values
Set some/all inputs to identical values

Using equal portfolio weights, if correlations and standard deviations are sufficiently similar (e.g. S&P 500 stocks)

Inverse volatility weighting (takes account of standard deviations but not correlations)

Setting all Sharpe Ratios to be equal to each other in the inputs for the optimisation
def optimise_with_identical_values(data, identical_SR=False,
                                   identical_stdev=False, identical_corr=False):
    if identical_stdev:
        stdev_list = get_identical_stdev(data)
    else:
        stdev_list = annual_stdev_from_weekly_data(data).values

    if identical_SR:
        mean_list = get_means_assuming_identical_SR(data, stdev_list)
    else:
        mean_list = annual_returns_from_weekly_data(data).values

    if identical_corr:
        corr_matrix = get_identical_corr(data)
    else:
        corr_matrix = data.corr().values

    weights = optimise_with_corr_and_std(mean_list, stdev_list, corr_matrix)

    return weights

def get_identical_corr(data):
    instrument_count = len(data.columns)
    estimated_corr = data.corr().values
    avg_corr = get_avg_corr(estimated_corr)
    corrmatrix = boring_corr_matrix(instrument_count, offdiag=avg_corr)

    return corrmatrix

def get_identical_stdev(data):
    estimated_stdev = annual_stdev_from_weekly_data(data)
    instrument_count = len(data.columns)
    average_stdev = estimated_stdev.mean()
    stdev_list = [average_stdev]*instrument_count

    return stdev_list

def get_means_assuming_identical_SR(data, using_stdev):
    average_SR = sharpe_ratio(data).mean()
    mean_list = [stdev*average_SR for stdev in using_stdev]

    return mean_list
from copy import copy

def boring_corr_matrix(size, offdiag=0.99, diag=1.0):
    """
    Create a boring correlation matrix

    :param size: dimensions
    :param offdiag: value to put in off diagonal
    :param diag: value to put in diagonal
    :return: np.array 2 dimensions, size
    """
    size_index = range(size)

    def _od(i, j, offdiag, diag):
        if i == j:
            return diag
        else:
            return offdiag

    m = [[_od(i, j, offdiag, diag) for i in size_index] for j in size_index]
    m = np.array(m)

    return m

def get_avg_corr(sigma):
    new_sigma = copy(sigma)
    np.fill_diagonal(new_sigma, np.nan)
    if np.all(np.isnan(new_sigma)):
        return np.nan

    avg_corr = np.nanmean(new_sigma)

    return avg_corr
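For example (my sketch), the variants described two slides back map onto calls like:

# Equal weights: treat all inputs as identical
optimise_with_identical_values(data, identical_SR=True, identical_stdev=True, identical_corr=True)

# Inverse-volatility-style weights: keep estimated standard deviations, equalise SRs and correlations
optimise_with_identical_values(data, identical_SR=True, identical_corr=True)

# Normal optimisation: use all the estimates as they are
optimise_with_identical_values(data)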
Try:

Bootstrapping the weights

Set some/all inputs to identical values
– But what if the data clearly shows differences?

Bayesian
Bayesian shrinkage

Average between estimates and a prior

For example: you roll a die and it comes up 6 ten times in a row. What is the probability of the next roll being a 6?

Your prior is that the die is fair: 1/6 = 0.1667

Your estimate based on the data is 1.0000

The probability (‘posterior’) will be somewhere between the two. The weight you place on each value will depend on several factors, including how many times the die has been rolled, and how much you trust the guy that gave you the die.
Bayesian shrinkage

μ = (1−w)·μ_P + w·μ_E

μ_E = estimated mean
μ_P = prior mean
w = shrinkage factor (w=0: use just the prior; w=1: use just the estimate)

We can apply a similar formula to Sharpe Ratios, ρ and σ
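Applying this to the die example above, with an arbitrarily chosen shrinkage factor of w = 0.3 (my illustration, not from the slides):

μ = (1 − 0.3)·0.1667 + 0.3·1.0 ≈ 0.42

i.e. the posterior sits between the fair-die prior and the suspicious estimate from the data.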
Bayesian code

def optimise_with_shrinkage_parameters(data, SR_shrinkage=0.0,
                                       corr_shrinkage=0.0, stdev_shrinkage=0.0):
    prior_stdev_list = np.array(get_identical_stdev(data))
    estimated_stdev_list = annual_stdev_from_weekly_data(data).values
    stdev_list = prior_stdev_list*stdev_shrinkage + estimated_stdev_list*(1 - stdev_shrinkage)

    prior_mean_list = np.array(get_means_assuming_identical_SR(data, stdev_list))
    estimated_mean_list = annual_returns_from_weekly_data(data).values
    mean_list = prior_mean_list*SR_shrinkage + estimated_mean_list*(1 - SR_shrinkage)

    prior_corr_matrix = get_identical_corr(data)
    estimated_corr_matrix = data.corr().values
    corr_matrix = prior_corr_matrix*corr_shrinkage + estimated_corr_matrix*(1 - corr_shrinkage)

    weights = optimise_with_corr_and_std(mean_list, stdev_list, corr_matrix)

    return weights
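For example (my sketch), the three cases in the table below correspond to calls like:

# Full shrinkage on all inputs -> equal weights
optimise_with_shrinkage_parameters(data, SR_shrinkage=1.0, corr_shrinkage=1.0, stdev_shrinkage=1.0)

# Full shrinkage on Sharpe Ratios and correlations, none on std. deviation -> inverse volatility portfolio
optimise_with_shrinkage_parameters(data, SR_shrinkage=1.0, corr_shrinkage=1.0)

# No shrinkage on anything -> normal optimisation
optimise_with_shrinkage_parameters(data)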
Full shrinkage on all inputs                                       →  Equal weights
Full shrinkage on Sharpe Ratios and correlations,
none on std. deviation                                             →  Inverse volatility portfolio
No shrinkage on anything                                           →  Normal optimisation
Bayesian
Advantages
– Intuitive results (with no shrinkage will recover original
optimisation results; with full shrinkage will recover prior;
shrinkage related to uncertainty)
– Can use different shrinkage for different parameter estimates
– Can be used with constraints
– Computationally fast (single optimisation)

Disadvantages
– What prior to use? (no cheating!) (Black-Litterman?)
– How much shrinkage? (depends on data, underlying stability of
problem)
Try:

Bootstrapping the weights

Set some/all inputs to identical values

Bayesian
– Black-Litterman
Try:

Bootstrapping the weights

Set some/all inputs to identical values

Bayesian
– Black-Litterman

Hierarchical methods
– HRP
– https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2708678
– https://mlfinlab.readthedocs.io/en/latest/implementations/portfolio_optimisation.html#hierarchical-risk-parity-hrp

Heuristic methods
– Handcrafting
– https://qoppac.blogspot.com/2018/12/portfolio-construction-through.html

Neural networks, machine learning, …
– Be careful!
Conclusions...

Understand your tools

Understand your data

Be careful out there!
My website: systematicmoney.org
My code: github.com/robcarver17/
My blog: qoppac.blogspot.com
Twittering: @investingidiocy
