Safe Portfolio Optimisation
Robert Macrae
Chris Watkins
Arcus Investment Limited
Royal Holloway, University of London
Abstract
We show how to optimise portfolios safely using limited amounts of historic price data,
by regularising an estimate of the historical covariance. We describe a simple form of
regularisation with a particularly direct financial interpretation, and explain how this
avoids various problems commonly associated with optimised portfolios, including
under-estimated risk, poor capture of expected returns and over-trading.
Introduction
Many authors describe how unconstrained optimisation using the historic covariance
matrix produces portfolios containing extreme long and short positions. These portfolios
are alarmingly sensitive to small changes in the covariance matrix, which leads to
overtrading [1]. Some authors [2] recommend introducing constraints on maximum allowed
position sizes to prevent this. This can improve performance, but is clearly not
addressing the root of the problem. We do not believe that it is widely known that a
simple modification of the historic covariance matrix improves performance
enormously. We explain why the raw covariance matrix needs to be changed, and show
how this improves optimisation. The paper is divided into two sections, which introduce
the problem and our proposed solution and illustrate them with synthetic data.
The Optimisation Problem
For simplicity we discuss only the case of unconstrained optimisation with longs and
shorts. The same considerations apply to long-only portfolios but constraints tend to
obscure the problem. Formally, let the investor’s anticipated returns relative to the
risk-free rate for securities 1 ... p be α = (α1, ..., αp). The anticipated return of a
portfolio consisting of an amount x1, ..., xp of each security is then α1x1 + ... + αpxp.
Let the covariance matrix of future returns be V, so that the variance of the return of
the i-th security is Vii and the covariance of securities i and j is Vij (which is equal
to Vji). The variance of the portfolio return is then Σij Vij xi xj, so the Anticipated
Sharpe Ratio (ASR) is:
ASR = (Σi αi xi) / (Σi,j xi xj Vij)^(1/2) = αᵀx / (xᵀVx)^(1/2)

The ASR is maximised when x = k V⁻¹α, where k is an arbitrary constant, as the ASR
of a portfolio does not depend on its size. The ASR of the optimal portfolio is
(αᵀV⁻¹α)^(1/2).
The covariance matrix V of future returns is unknown when the portfolio is optimised,
and optimisation is performed using some estimated or approximated covariance matrix
A. The portfolio chosen is proportional to A⁻¹α. Many methods of portfolio construction
can be expressed in these terms -- for example the widely used and very robust method
of weighting assets proportional to α corresponds to A = I. This is the asymptote of the
regularisation of A that we will investigate. Initially, however, we will start with the
simplest choice of making A equal to the historical covariance matrix C in order to
explain the problems mentioned in the introduction.
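As a concrete check of these formulas, the optimal weights and the resulting ASR can be computed directly. This is a minimal sketch using numpy; the covariance matrix and alphas below are illustrative, not taken from the paper:

```python
import numpy as np

def optimal_weights(A, alpha):
    """Unconstrained mean-variance weights, proportional to A^{-1} alpha."""
    return np.linalg.solve(A, alpha)

def anticipated_sharpe(x, alpha, V):
    """ASR = alpha'x / sqrt(x'Vx); independent of the overall size of x."""
    return (alpha @ x) / np.sqrt(x @ V @ x)

# Two uncorrelated securities of equal variance (illustrative numbers).
V = np.diag([0.04, 0.04])
alpha = np.array([0.10, 0.05])

x = optimal_weights(V, alpha)          # here A = V, the true covariance
asr = anticipated_sharpe(x, alpha, V)

# The optimal ASR equals (alpha' V^{-1} alpha)^(1/2), and scaling x leaves it unchanged.
assert np.isclose(asr, np.sqrt(alpha @ np.linalg.solve(V, alpha)))
assert np.isclose(anticipated_sharpe(5.0 * x, alpha, V), asr)
```

Because the ASR is scale-free, any k > 0 in x = k V⁻¹α gives the same ratio; k is fixed separately by a risk or capital budget.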
Optimisation Using the Historic Covariance Matrix
To understand the implications of inverting a matrix it is useful to conduct a principal
components analysis. In practice it is easiest to base the calculation on a singular
value decomposition of the matrix of returns R into two orthonormal matrices U, V and a
diagonal weight matrix W:

R = U W Vᵀ

If the mean return is subtracted from each series, we have

C = RᵀR = V W Uᵀ U W Vᵀ = V W² Vᵀ

This decomposition has a particularly simple interpretation. Each column of V is a
portfolio for which the sum of squared weights is one, and which is uncorrelated with all
other such portfolios. For convenience we refer to these as “eigenportfolios”. The
corresponding diagonal element of W² is the historic variance of this portfolio. Each
eigenportfolio can be treated as a synthetic asset with alpha given by Vi·α and variance
W²ii. Optimisation of these uncorrelated assets gives them weights proportional to
Vi·α / W²ii, so there will be large weights given to those eigenportfolios with small
estimated variances.
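The decomposition can be carried out numerically with a standard SVD routine. The sketch below uses random illustrative returns and numpy; the alphas are assumed, not part of the original:

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 500, 4                         # observations and securities (illustrative)
R = rng.standard_normal((T, p)) * 0.01
R -= R.mean(axis=0)                   # subtract the mean return from each series

# R = U W V^t: rows of Vt are the unit sum-of-squares "eigenportfolios".
U, w, Vt = np.linalg.svd(R, full_matrices=False)

# C = R^t R = V W^2 V^t, so w**2 holds each eigenportfolio's historic variance.
C = R.T @ R
assert np.allclose(C, Vt.T @ np.diag(w ** 2) @ Vt)

# Optimising the uncorrelated eigenportfolios weights each one by (V_i . alpha) / W^2_ii,
# so the smallest-variance eigenportfolios receive the largest weights.
alpha = np.full(p, 0.05)              # illustrative anticipated returns
eig_weights = (Vt @ alpha) / (w ** 2)
```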
Any practical historic covariance matrix is likely to be near singular, so some W²ii will
be very small indeed, resulting in the familiar list of problems outlined in the
introduction. Assuming for ease of description that the matrix is singular:
1) Gross underestimate of Volatility -- all of the portfolio weight is applied to
eigenportfolios in the null space of C, so estimated risk is zero.
2) Wasted Alpha -- alpha that lies along directions not in the null space will be
ignored in the pursuit of the apparently riskless portfolios. The optimiser will
give up real alpha exposure in the pursuit of illusory zero variance.
3) Excess Volatility -- possibilities of diversification into low, but non-zero,
volatility portfolios are not exploited.
4) Overtrading -- the null space will shift with every trivial modification to C,
causing excessive trading.
The same problems apply to all near-singular matrices, with the subspace in which there
is little variation taking the role of the null space.
Synthetic Example
To see if the problem was serious with practical amounts of data, we generated artificial
random returns data for various numbers of uncorrelated securities of equal volatility.
We chose to examine the range of 5 to 50 assets, with a number of observations ranging
from 1 to 50 times the number of assets. To permit comparison of portfolios with
different numbers of assets, we normalised the sum-of-squares position sizes to give all
portfolios equal true volatility of 5%.
For each set of data we construct the covariance matrix and decompose it to find the
eigenportfolios with maximum and minimum estimated volatilities. Figure 1 was
generated by replicating each case many times and plotting the average values for both
the maximum and minimum risk portfolios.
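A minimal replication of this experiment, under the stated assumptions (uncorrelated securities of equal true volatility, unit sum-of-squares positions), can be sketched with numpy; the specific counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 10, 200
T = 2 * p                      # only twice as many observations as assets
true_vol = 0.05                # every unit-norm portfolio has this true volatility

ratios = []
for _ in range(trials):
    R = rng.standard_normal((T, p)) * true_vol
    R -= R.mean(axis=0)
    C = R.T @ R / T
    vals, vecs = np.linalg.eigh(C)       # eigenportfolios of the sample covariance
    in_sample = np.sqrt(vals[0])         # estimated vol of the minimum-risk portfolio
    out_of_sample = true_vol             # its true vol: the assets are uncorrelated
    ratios.append(out_of_sample / in_sample)

# With a data-to-assets ratio of 2, out-of-sample volatility is typically
# several times the in-sample estimate.
assert np.median(ratios) > 1.5
```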
Fig 1: Volatility Estimates as a Function of Data to Assets Ratio
[Figure: average estimated volatility (0% to 10%) of the maximum- and minimum-risk
eigenportfolios for 5, 10, 20 and 50 assets, plotted against the ratio of observations
to assets on a log scale from 1 to 100, with the theoretical volatility of 5% shown for
reference.]
In practice we are only concerned with low-risk portfolios, for which the error is almost
independent of the number of assets involved. This permits us to construct Figure 2,
which shows quartiles of the ratio of out-of-sample to in-sample volatility for the
minimum risk portfolio.
Fig 2: Volatility Underestimate for Optimised Portfolios due to Finite Data
[Figure: upper and lower quartiles of the factor (0 to 10) by which volatility is
underestimated, plotted against the ratio of series length to number of assets from
1 to 10.]
There are two key conclusions from these graphs:
1) The required historic data is roughly proportional to the number of assets.
So long as you have ten times as many data points as you have assets, selection bias
leads to an average volatility underestimate of about 40%, which is not unreasonable in
comparison with the other uncertainties of the risk-control process. However, if you use
only twice as many, selection bias may lead to out-of-sample volatility being 2.5 to 3.5
times as large as you estimate. This result has been found to hold for much more general
artificial cases, including those with a shared "Market" factor and an APT structure, and
we believe it to be a good general guide. This guide can, however, be optimistic with
long-tailed and heteroskedastic distributions, since relatively few of the observations
make any significant contribution to the covariance and the effective number of data
points may be far lower than the total number.
2) Sufficient historic data is often unavailable in practice.
At first sight a requirement of 10 data points per asset does not appear too demanding,
since daily data is widely available, but there are pitfalls in using high-frequency
data. As a rule of thumb, if you are interested in forecasting financial series on a
certain time-scale, there are limited benefits to using data spaced more closely than
1/10 of this time-scale (since short-term effects dominate and closer points cease to be
independent) or extending further back than 10 times this time-scale (since
non-stationarity becomes a problem and older points cease to be relevant). If we were to
take these guides as hard limits then we would be limited to 100 independent and relevant
data points per security, so we could safely estimate risk for optimised portfolios
containing at most 10 securities! This conclusion is over-pessimistic, but it indicates
that optimisation using historic covariances may perform poorly for any practical
optimisation involving hundreds of securities, even if several years of daily data are
available.
Regularisation
This analysis suggests that regularisation is required. We have investigated several
methods of regularisation but, unless you have a strong prior about the form of the
covariance matrix, there seems little benefit in using complex methods. We prefer the
single-parameter approach of decomposing C = V W² Vᵀ as above and then setting
W²new,ii = w²min + W²ii. This method is equivalent to approximating A = C + w²min·I and
can be regarded as regularising by moving from C towards I to an extent given by w²min.
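A sketch of the edit itself, assuming numpy; the data and the value of w²min below are illustrative:

```python
import numpy as np

def plausibility_edit(C, w2_min):
    """Return A = C + w2_min * I.

    Equivalent to adding w2_min to every eigenportfolio variance, so no
    unit-norm portfolio can have an estimated variance below w2_min."""
    return C + w2_min * np.eye(C.shape[0])

rng = np.random.default_rng(2)
R = rng.standard_normal((12, 10)) * 0.05     # fewer observations than ideal
R -= R.mean(axis=0)
C = R.T @ R / len(R)                         # near-singular sample covariance

A = plausibility_edit(C, w2_min=5e-4)

# Every eigenvalue of A is at least w2_min; small eigenvalues are lifted,
# large ones are almost unchanged.
assert np.linalg.eigvalsh(A).min() >= 5e-4 - 1e-12
```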
How should the regularising parameter be chosen? Too little regularisation will leave
the problems above incompletely cured, but too much will ignore the historic covariance
structure. Choosing an appropriate amount of regularisation is a compromise, on which
there is a large research literature in statistics. We have experimented with three
general-purpose methods: leave-one-out cross-validation, a bootstrap method, and an
approximate Bayesian criterion. All of these methods tended to under-regularise, but the
approach is reassuringly robust to the level of regularisation chosen, and any of these
methods, or indeed an educated guess, is sufficient to achieve sensible results in
practice. We call this process “plausibility editing”, because it removes implausible
eigenportfolio variances that are implicit in the historic covariance matrix.
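As an illustration of parameter selection, the sketch below scores candidate values of w²min by the realised variance of the resulting minimum-variance portfolio on held-out data. This simple holdout scheme is a stand-in for the leave-one-out, bootstrap and Bayesian procedures mentioned above, not a reproduction of them; all the numbers are illustrative:

```python
import numpy as np

def min_var_weights(A):
    """Unit-sum minimum-variance weights under covariance estimate A."""
    x = np.linalg.solve(A, np.ones(A.shape[0]))
    return x / x.sum()

rng = np.random.default_rng(3)
p = 8
R = rng.standard_normal((80, p)) * 0.05      # illustrative uncorrelated returns
train, test = R[:40], R[40:]
C = np.cov(train, rowvar=False)

# Score each candidate by the held-out variance of the portfolio it produces.
candidates = [0.0, 1e-5, 1e-4, 1e-3, 1e-2]
scores = [(test @ min_var_weights(C + w2 * np.eye(p))).var() for w2 in candidates]
best = candidates[int(np.argmin(scores))]
```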
Benefits of Plausibility Editing
1) Guaranteed Floor on Estimate of Risk -- by construction no unit-norm
portfolio can have a variance estimate below w²min.
2) Efficient Use of Alpha -- alpha which lies along any direction i will be
exploited to an extent given by w²min / w²i.
3) Efficient Use of Diversification Opportunities -- only portfolios with
w²i >> w²min are unavailable for diversification.
4) No Overtrading -- the portfolio is insensitive to shifts in the null space.
In practical cases the most volatile few eigenportfolios will account for a large
proportion of the total volatility. Their volatilities will be almost unaffected by any
sensible w²min, so these benefits come at a very limited cost to the accuracy with which
the historic volatility structure is reproduced.
Plausibility editing has also been applied to the construction of a wide range of more
complex synthetic examples and to practical portfolios, and has produced consistently
sensible results. We believe it to have very broad applicability.
Conclusion
Principal components analysis of the historic covariance matrix shows why it is
inappropriate to use this matrix in portfolio optimisation, and suggests plausibility
editing as a form of regularisation which solves the associated problems. This technique
can easily be shown to work on synthetic data.
[1] Black, F. and Litterman, R., “Global Portfolio Optimisation”, Financial Analysts
Journal, September-October 1992. The authors give a description of what happens during
optimisation with an unregularised covariance matrix. They describe an optimisation
method in which the historic covariance matrix is unregularised, but the investor’s
views (the αs) are modified to achieve a similar effect.
[2] Frost, P.A. and Savarino, J.E., “For Better Performance, Constrain Portfolio
Weights”, Journal of Portfolio Management, Fall 1988, pp. 29-34.