Computers & Operations Research 39 (2012) 805–819
Contents lists available at ScienceDirect
Computers & Operations Research
journal homepage: www.elsevier.com/locate/caor
Combined forecasts in portfolio optimization: A generalized approach
Ozden Ustun a, Refail Kasimbeyli b,,1
a
b
Dumlupinar University, Faculty of Engineering, Department of Industrial Engineering, The Central Campus, 43100 Kutahya, Turkey
Izmir University of Economics, Faculty of Engineering and Computer Sciences, Department of Industrial Systems Engineering, Balc- ova 35330, Izmir, Turkey
a r t i c l e in fo
abstract
Available online 29 September 2010
In this paper a general mathematical model for portfolio selection problem is proposed. By considering
a forecasting performance according to the distributional properties of residuals, we formulate an
extended mean–variance–skewness model with 11 objective functions. Returns and return errors for
each asset obtained using different forecasting techniques, are combined in optimal proportions so as to
minimize the mean absolute forecast error. These proportions are then used in constructing six criteria
related to the mean, variance and skewness of return forecasts of assets in the future and forecasting
errors of returns of assets in the past. The obtained multi-objective model is scalarized by using the
conic scalarization method which guarantees to find all non-dominated solutions by considering
investor preferences in non-convex multi-objective problems. The obtained scalar problem is solved by
utilizing F-MSG algorithm. The performance of the proposed approach is tested on a real case problem
generated on the data derived from Istanbul Stock Exchange. The comparison is conducted with respect
to different levels of investor preferences over return, variance, and skewness and obtained results are
summarized.
& 2010 Elsevier Ltd. All rights reserved.
Keywords:
Multiple objective programming
Risk management
Global optimization
Portfolio optimization
Conic scalarization
F-MSG algorithm
1. Introduction
In recent years, risk management has attracted a great deal of
attention from both researchers and practitioners. Complexity
and uncertainty in many practical problems require new methods
and tools. Risk management can be defined as the process of
identification, analysis, and either acceptance or mitigation of
uncertainty in investment decision-making. Risk management
can be used as a tool for greater rewards, not just control against
loss. Risks are studied from different silo disciplinary perspectives, with a discussion of how various methods and tools are
used to optimize risk management the field of financial risk
management has experienced a fast and advanced growth at an
incredible speed [1–3]. The traditional financial risk management
approach is based on mean–variance framework of portfolio
theory, that is, selection and diversification [4]. It is recognized in
finance that risk can be understood in two types: systematic (nondiversifiable) risk, which is positively correlated with the rate of
return, and unsystematic (diversifiable) risk, which can be
diversified by increasing the number of securities invested [5–7].
The mean–variance model originally introduced by Markowitz
[4] plays an important and critical role in modern portfolio theory.
Corresponding author. Tel.: +90 232 4888418; fax: + 90 232 4888475.
E-mail addresses: oustun@dpu.edu.tr (O. Ustun),
refail.kasimbeyli@ieu.edu.tr (R. Kasimbeyli).
1
The author published under the name Rafail N. Gasimov until 2007.
0305-0548/$ - see front matter & 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2010.09.008
Markowitz’s portfolio model is a bi-criteria optimization problem
where a reasonable trade-off between return and risk is
considered by minimizing risk for a given level of expected
return, or by maximizing expected return for a given level of risk.
Since Markowitz’s pioneering work [4] was published, the mean–
variance model has revolutionized the way people think about a
portfolio of assets. This model has gained widespread acceptance
as a practical tool for portfolio optimization, and numerous later
studies have examined the issue of risk diversification. With the
continuous effort of various researchers, Markowitz’s seminal
work has been widely extended. The mean–variance analysis
provides a basis for the derivation of the equilibrium model
known variously as the Capital Asset Pricing Model (CAPM),
Sharpe–Lintner model, black model and two-factor model [8].
Extended models mainly include the mean semi-variance model
[9], mean absolute deviation model [10], mean target model [11]
and some frictional models, such as those reported in [9]. A
distinct characteristic of these studies is that only the first two
moments of return distributions are taken into account. But there
is a controversy over the issue of whether higher moments should
be accounted for in portfolio selection. Many academic researchers (see, e.g., [12]) argued that the higher moments cannot be
neglected unless there is a reason to believe that the asset returns
are normally distributed or the utility function is quadratic, or
that the higher moments are irrelevant to the investor’s decision.
As a result, in some recent studies, such as [12–15], the concept of
mean–variance trade-off has been extended to include the
skewness of return in portfolio selection. In other words, a
806
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
mean–variance–skewness trade-off model for portfolio selection
has been generated. One problem with the mean–variance–
skewness trade-off model for portfolio selection is that it is not
easy to find a trade-off between the three objectives because this
is a non-smooth multi-objective optimization problem. Until now,
many methods used to tackle this problem have been restricted to
goal programming or linear programming techniques. For example, Lai [13] gave a goal programming procedure that performs
portfolio selection based on competing and conflicting objectives
by maximizing both expected return and skewness while
minimizing the risk associated with the return (i.e., variance).
Liu et al. [15] also transformed the mean–variance–skewness
model with transaction costs into a linear programming problem
and verified its efficiency via a numerical example. However, the
main disadvantage of these algorithms is that they generally
converge slowly, if at all [16].
Furthermore, most existing studies only present some numerical examples with realized data. Because investment decisions
are oriented towards the future, realized returns are of no use
here. The benefits of portfolio optimization critically depend on
how accurately the involved measures are forecasted. In general,
opinion is that, it is very difficult to predict the expected return,
variance, and skewness, and therefore reducing forecast errors in
portfolio management becomes very important. The role of
covariance matrix of asset returns and its estimation procedure
has been addressed by using shrinkage estimators [17].
Most portfolio selection models in the literature only consider
the distribution properties of investment returns; other factors,
such as investors’ risk preferences and trading strategies, are not
taken into account. Unlike others, these important factors were
considered in the integrated approaches to portfolio selection
proposed by Leung et al. [14]. The authors proposed a multiobjective approach which explicitly encompasses the average
return, expected variance, and skewness in combining forecasts.
They dealt with the practical issue of how to incorporate different
analysts’ forecasts into an investment trading model and used the
polynomial goal programming approach of Lai [13] to scalarize
the multi-objective model. The scalarized problem, then is solved
by using a standard program package.
In [18], a new artificial intelligence technique is introduced
and a fast and efficient neural network-based methodology to
solve the trade-off of the mean–variance–skewness model is
proposed. The authors used the weighted sum scalarization
method to scalarize the multi-objective model. In the solution
process of the scalarized model which is a constrained optimization problem, the first order Karush–Kuhn–Tucker conditions are
used. Note that both the weighted sum scalarization method and
the Karush–Kuhn–Tucker conditions are well-known tools efficiently used in solving convex problems.
In this paper, by considering the forecasting performance
according to the distributional properties of the residuals, we
formulate an extended mean–variance–skewness model with 11
objective functions. Returns and return errors obtained using
different forecasting techniques for each asset are combined in
optimal proportions so as to minimize the mean absolute forecast
error. These data are then used in constructing the six criteria
related to the mean, variance and skewness of return forecasts of
assets in the future and forecasting errors of returns of assets in
the past. We construct the 11-objective portfolio optimization
model, which involves the objective functions representing
distributional properties of error forecasts, the long period
preferences and the number of assets in the portfolio of an
investor.
We propose the integrated approach to the solution of this
problem, that combines four stages—forecasting, modeling,
scalarization and solving the scalarized problem.
One of the main difficulties in solving the multi-objective
portfolio optimization problems arises in the scalarization stage
of the constructed mathematical model. The scalarization method
used for this purpose, must guarantee the calculation of all the
efficient solutions of the problem under consideration. Unfortunately, because of the non-convexity of such models, many
methods existing in the literature cannot guarantee to calculate
the efficient solutions corresponding to the preferences of
decision maker (see [19]).
In this paper, to scalarize the multi-objective mathematical
model constructed, we use the conic scalarization method
suggested by Kasimbeyli (see [19,20]). This method allows to
find all the efficient (Pareto optimal) solutions corresponding to
decision maker’s preferences in multi-objective optimization
problems without convexity and boundedness assumptions. We
present new features of this method and apply it to combine the
11-objective functions of our model. The problem obtained
after scalarization is not only non-convex but also non-differentiable (as many scalarized portfolio optimization problems). In the
fourth stage, the F-MSG algorithm (see, [21]) is utilized to solve the
non-differentiable and non-convex scalar optimization problem.
To the best of our knowledge, the 11-objective portfolio
optimization model proposed in this paper is not investigated in
the literature earlier. Some of the objective functions involved
were considered by different authors. For example, 7th and 8th
objective functions were considered by Ehrgott et al. [22,23]. The
first (expected performance in the future), the third (variance)
and the fifth objective functions (skewness) were considered by
Yu et al. [18]. One of the main differences of our approach is that,
in addition to these objective functions, our model involves also
the ones representing distributional properties of error forecasts
(second, fourth, sixth and 11th objective functions), the long
period preferences (ninth objective function) and the number of
assets (10th objective function) in the portfolio of an investor.
Besides, the size of the problem solved in the case study is larger
(N ¼20) than those considered in the literature. Additionally, the
proposed model eliminates some drawbacks of the previous
models. Ehrgott et al. [22] ignored the skewness of return. Their
five-objective model is based on purely historical data and the
authors used a utility function method for scalarizing the
proposed model. The differences between the utility function
and the conic scalarization approaches are presented in [23].
Leung et al. [14] used the polynomial goal programming to
scalarize a mean–variance–skewness model. The most important
weakness of traditional forms of goal programming approach is
that, it does not guarantee to calculate efficient (Pareto optimal)
solutions. Leung et al. [14] did not describe the optimization
method used for solving the scalarized problem. Yu et al. [18]
used the weighted-sum scalarization method to scalarize a mean–
variance–skewness model. The weighted-sum scalarization
method guarantees to calculate all efficient solutions only in
convex case. They used the first order Karush–Kuhn–Tucker
conditions and ordinary Lagrangian function to find optimal
solution of the scalarized problem. Thus the approach used is
restricted by differentiability and convexity conditions.
The performance of the integrated approach is tested on a real
case problem constructed on the data derived from Istanbul Stock
Exchange. The comparison is conducted with respect to different
levels of investor preferences over return, variance, and skewness
and obtained results are summarized.
In the case study, the error distribution of the predictions of
monthly returns are tested for normality using Wilk–Shapiro
W-test. The evidence indicates that many of error distributions
exhibit significant skewness. These findings confirm our argument that higher moments of error distributions cannot be
neglected in the portfolio selection. The empirical evidence
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
suggests that the incorporation of higher moments of error
forecasts into the investor’s portfolio decision causes a major
change in the construction of the optimal portfolio.
The paper is organized as follows. In Section 2, a model for
combining forecasts based on mean absolute forecasting errors is
described. A general portfolio selection model combined with
different analyst’ forecasts given in Section 3. In Section 4, the
scalarization method for the multi-objective model is explained. In
this section a nice illustrative example whose data are drawn from
[43] is also presented. The explanations how to use the F-MSG
algorithm for solving the obtained scalar problem are given in
Section 5. An empirical example demonstrating the efficiency of the
proposed approach is given in Section 6. In this section, to compare
the performance of the proposed approach, an additional model
suggested by Yu et al. [18] is analyzed. Finally, Section 7 presents
some conclusions on the paper.
2. Combination of different forecasts
Before any multi-objective investment portfolio can be constructed, we need to obtain the forecasts from different sources (or
analysts). To accomplish this task, we mimic the estimates made by
different analysts by creating the forecast series using various
forecasting techniques. In a study exploring the effects of combining
forecasts generated by different techniques, Makridakis and Winkler
[24] tested 1001 time series and its subset of 111 series with 14
individual forecasting techniques. They found that the variance of
overall mean absolute percentage error decreases exponentially
with the increasing number of techniques. According to their
empirical results, there is a drastic improvement in this performance
measure when the number of techniques used in forecast combination increases from 2 to 4. While a broad consensus is that forecast
combination improves forecast accuracy, there is no consensus
concerning how to form the forecast weights. The most recent
literature has focused on two particularly appealing methods—simple averaging and Bayesian averaging [25].
Guided by these results, three forecasting techniques – autoregressive integrated moving average, exponential smoothing, and time
series decomposition – are chosen to generate series of forecasts. In
the following sections we briefly summarize the logic of each
forecasting technique and describe the model used for the combination of different forecasts based on mean absolute forecasting error.
2.1. Autoregressive integrated moving average (ARIMA) technique
Since the seminal work of Box and Jenkins, ARIMA models have
been widely used in the forecasting of economic and financial time
series [26]. The general ARIMA(p,n,q) model can be written as
gt ¼ a0 þa1 gt1 þ þ ap gtp þ et þb1 et1 þ þbq etq ,
ð1Þ
where gt is the series of interest and et is the residual in period t, ai
and bj are coefficients of AR and MA terms, respectively, for
i¼1,y,p; j¼1,y,q. p is the number of autoregressive terms, n
indicates the degree of differencing used to obtain stationarity, and q
is the number of lagged forecast errors in Eq. (1). In our experiment,
we follow Box and Jenkins’ three stage method aimed at selecting an
appropriate model for estimating and forecasting a univariate time
series. This method consists of identification, estimation, and
diagnostic checking stages [14].
2.2. Exponential smoothing (ES)
Exponential smoothing is a procedure for continually revising
an estimate in the light of more recent experience. This method is
807
based on averaging (smoothing) past values of a series in a
decreasing (exponential) manner. The observations are weighted,
with more weight being given to the more recent observations.
More formally, the exponential smoothing equation is
g^ t þ 1 ¼ agt þð1 þ aÞg^ t ¼ g^ t þ aðgt g^ t Þ ¼ g^ t þ aet ,
ð2Þ
where g^ t is the forecast for period t, gt is the actual observation
made in period t, a is the smoothing constant (0 o a o1), and et is
the forecast error or residual in period t (et ¼ gt g^ t ). The value
assigned to a is the key to the analysis. One method of estimating
the a is an iterative procedure that minimizes the mean square
error due to a. The value of a is generally determined by using
standard statistical softwares such as MINITAB, RATS, SAS, etc.
2.3. Time series decomposition (TSD)
One approach to the analysis of time series data involves an
attempt to identify the component factors that influence each of
the periodic values in a series. This identification procedure is
called decomposition. Each component is identified separately so
that the time series can be projected into the future and used for
both short-run and long-run forecasting. For a series measured in
time periods less than a year, such as monthly and quarterly time
series, each original value is considered to be the multiplicative
product of four components, as shown in the following equation:
g ¼ TR CY SE IR,
ð3Þ
where g is the actual observations, TR is the trend component, CY
is the cyclical component, SE is the seasonal component, and IR is
the irregular component. It would be impractical to supply a
complete procedure of the TSD method, instead we refer the
reader to the book by Hanke and Reitsch [27, pp. 302–356].
2.4. A model for combining forecasts based on mean absolute
forecast error (MAFE)
The traditional trading approach which is guided by a single
forecasting technique often leads to a unique trading pattern, and
requires an investor to choose a specific forecasting model.
However, there is no guarantee that the performance of a certain
model is consistent from period to period. Instead, our approach
considers the optimal convex combination of data obtained by
using different forecasting techniques. The weights for constructing an optimal convex combination are found by minimizing the
mean absolute forecast error. Constructing a forecast portfolio by
this manner can possibly alleviate the impact resulted from an
unexpected performance deterioration of a single forecasting
model.
The idea of ‘‘forecast combination’’ was first studied by Bates
and Granger [28]. In their seminal work, the authors proposed a
linear combination of two forecasts with weights selected to
minimize the predicted forecast error variance. This research
study was later extended by Newbold and Granger [29] which
formed consensus from more than two forecasts.
The superior performance of combining forecasts over individual approaches was illustrated in the extensive empirical
evaluation conducted by Makridakis and Winkler [24] and Russel
and Adam [30]. A comprehensive review of works in this research
stream can be found in Clemen [31].
Let eitk be the forecast error generated by a forecasting
technique k in time period t for asset i, where k¼1,2,y,M,
t¼ 1,2,y,T and i¼1,2,y,N. M denotes the number of forecasting
models, T denotes the number of historical time periods, and N is
the number of available assets. A mathematical model minimizing
the mean absolute forecast error for each asset i is formulated
808
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
R i be the average return of asset i based on the combined
as follows:
ðMAFEi Þ
minimize
T X
M
X
forecasts over the next P months (in percent):
lik jeitk j=T
t ¼1k¼1
subject to
M
X
lik ¼ 1, lik Z 0, k ¼ 1,2, . . . ,M,
eit be the combined forecasting error of the return of asset i
where lik is the weight of the forecast k for asset i. MAFE measures
the forecast accuracy by averaging the magnitudes of the forecast
errors (absolute values of each error). MAFE is most useful when
the analyst wants to measure forecast error in the same units as
the original series. There are many other methods for evaluating a
forecasting technique such as the mean squared error (MSE), the
mean absolute percentage error (MAPE), the mean percentage
error (MPE), etc.
A series of the predicted returns and their residuals for each
asset are simultaneously outputs and inputs of the forecasting
and the mathematical modeling stages, respectively.
Let Ritk be a forecast of return generated by a forecasting
technique k in (future) time period t of asset i, where i¼1,2,y,N,
k¼1,2,y,M, t¼ T+1,y,T+P and P denotes the number of future time
T
periods. Also, let li ¼ ðli1 , li2 , . . . , liM Þ be the transpose of the
optimal weight vector obtained from the problem (MAFEi) used to
combine these forecasts. Consequently, the return of asset i in period
t based on the combined forecasts, can be described as follows:
M
X
lik Ritk , i ¼ 1,2, . . . ,N; t ¼ T þ 1, . . . ,T þP:
ð4Þ
k¼1
eit ¼
obtained from Eq. (5) for time period t,
e i be the average forecasting error of the return of asset i over
the last T months (in percent):
e i ¼ ð1=TÞ
T
X
eit ,
t¼1
sij be the covariance between the returns of assets i and j,
based on the combined forecasts over the next P months:
TX
þP
sij ¼ ð1=PÞ
ðRit R i ÞðRjt R j Þ,
t ¼ T þ1
s^ ij be the covariance between the combined forecasting errors
of assets i and j over the last T months:
s^ ij ¼ ð1=TÞ
T
X
ðeit e i Þðejt e j Þ,
t¼1
s3i denotes the skewness of the returns of asset i, based on the
combined forecasts over the next P months:
By the same manner, we can describe the combined error. Again, let
eitk be a forecasting error of return generated by a forecasting
technique or an analyst k in time period t of asset i. Then, the
forecasting error of return of asset i in period t based on the
combined forecasts can be described as
M
X
Rit ,
t ¼ T þ1
k¼1
Rit ¼
TX
þP
Ri ¼ ð1=PÞ
lik eitk , i ¼ 1,2, . . . ,N; t ¼ 1,2, . . . ,T:
ð5Þ
k¼1
The combined returns and errors of returns for each asset
found by using the optimal weights will be used as input
parameters in constructing a mean–variance–skewness model in
the next section.
3. An extended mean–variance–skewness model
TX
þP
s3i ¼ ð1=PÞ
ðRit R i Þ3 ,
t ¼ T þ1
siij and sijj be the coskewness’ between the returns of assets i
and j, based on the combined forecasts over the next P months:
TX
þP
siij ¼ ð1=PÞ
ðRit R i Þ2 ðRjt R j Þ,
t ¼ T þ1
TX
þP
sijj ¼ ð1=PÞ
ðRit R i ÞðRjt R j Þ2 ,
t ¼ T þ1
s^ 3i denotes the skewness of the forecasting error of asset i,
based on the combined forecasts over the last T months:
In this section we present the portfolio optimization problem
with 11-objective functions in the following form.
Parameters: Let
pit be the price (value) of the asset i in the time period t,
R1it be the relative change of the price of asset i over time period
t (in percent):
R1it ¼ ðpi,t pi,t1 Þ=pi,t1 ,
3
s^ i ¼ ð1=TÞ
T
X
ðeit e i Þ3 ,
t¼1
s^ iij and s^ ijj be the coskewness’ between the forecasting errors of
assets i and j, based on the combined forecasts over the last T
months:
s^ iij ¼ ð1=TÞ
T
X
ðeit e i Þ2 ðejt e j Þ,
t¼1
R12
i be the relative change of the price of asset i over the last 12
months (in percent):
R12
i ¼ ðpi,T pi,T12 Þ=pi,T12 ,
R36
be the relative change of the price of asset i over the last
i
three years (in percent):
R36
i ¼ ðpi,T pi,T36 Þ=pi,T36 ,
s^ ijj ¼ ð1=TÞ
T
X
ðeit e i Þðejt e j Þ2 ,
t¼1
lwi be the minimum proportion that must be held of asset i if
any of the asset i is held,
upi be the maximum proportion that can be held of asset i if
any of the asset i is held,
vi be the Wilk–Shapiro test statistic value according to the
Rit be the return of asset i in period t based on the combined
forecasts obtained from Eq. (4),
residual distributions of the asset i,
sri be the number of stars assigned to the asset i.
809
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Decision variables: Let,
and errors are defined as
xi be the proportion (0 rxi r 1) held of asset i,
zi be 1, if any of asset i is held, 0 otherwise.
f5 ðxÞ ¼
f6 ðxÞ ¼
f1 ðxÞ ¼
xi R i :
individual asset i measures the average forecasting error of the
return of the asset over the last T months (in percent) and is
therefore a measure for the long term expected error. The
average forecasting error of a portfolio is then given by
xi e i :
i¼1
A general opinion is that, it is very difficult to predict the
expected returns for all assets, and that the optimization
process is extremely sensitive to forecasting errors. Therefore,
the average forecasting error of the portfolio cannot be
neglected in the portfolio optimization process. For example,
assume that two assets with annual estimated returns
R 1 ¼ R 2 ¼ 5 (in percent) are available for the investment. Since
these returns are equal, the corresponding assets might
equally be preferred by an investor. Now assume that the
average forecasting errors of these return values are given to
be e1 ¼ 0:1 and e2 ¼ 0:5 (in percent), respectively. When the
average returns and forecasting errors of assets are simultaneously considered, the second asset becomes more attractive
than the first one. The reason is the following. The positive
average forecasting error shows that the realized return
exceeds the estimated one. Since an investor is interested in
maximizing the average return, he (or she) will prefer the
assets with positive average forecasting error. This analysis
leads to the assertion that, in general, the asset with the
greater average forecasting error should be preferred.
Variances: The third objective function is a risk function related
to the covariances of returns based on the combined forecasts
over the next P months:
f3 ðxÞ ¼
N X
N
X
xi xj sij :
i¼1j¼1
The fourth objective function is also a risk function related to
the covariances of forecasting errors based on the combined
forecasts over the last T months:
f4 ðxÞ ¼
N
X
ðx2i xj siij þxi x2j sijj Þ,
i ¼ 1 j ¼ 1ðj a iÞ
3
x3i s^ i þ 3
N
X
N
X
ðx2i xj s^ iij þ xi x2j s^ ijj Þ:
i ¼ 1 j ¼ 1ðj a iÞ
The positive skewness of forecasting errors shows that the
realized skewness of returns exceeds their estimated skewness. Therefore, investor desires to maximize the skewness of
excess return of a portfolio.
12-Month performance: The 12-month performance R12
of an
i
individual asset i measures the relative change of the price of
the asset over the last 12 months (in percent) and is therefore a
measure for the short term expected return. In particular, the
12-month performance of a portfolio is given by
f7 ðxÞ ¼
Average forecasting error: The average forecasting error e i for an
f2 ðxÞ ¼
N
X
N
X
xi R12
i
i¼1
i¼1
N
X
N
X
i¼1
Expected performance in the future: The expected performance
N
X
x3i s3i þ 3
i¼1
Objective functions: The generalized mean–variance–skewness
model of this paper consists of the following 11-objective
functions.
R i of an individual asset i measures the average return of the
asset over the next P months (in percent). We assume that
the returns follow some statistical distribution, so that the
expected return in the future can be obtained as a ‘‘weighted
sum’’ of the expected return of the individual assets in the
portfolio. Therefore the corresponding objective function f1(x)
is formulated as
N
X
N X
N
X
xi xj s^ ij :
i¼1j¼1
Skewnesses: In the similar way, the fifth and sixth objective
functions related to the skewnesses of the portfolio returns
again assuming a statistical distribution of the R12
values.
i
3-Year performance: By using the long term performance R36
i of
an individual asset i, the corresponding 3-year performance of
a portfolio is given by
f8 ðxÞ ¼
N
X
xi R36
i
i¼1
again assuming a statistical distribution of the R36
values.
i
Investor’s star ranking: The 5-year average monthly performance of an asset i is calculated as
1
Ri ¼
60
X
R1it =60:
t¼1
Then the investor or financial manager assigns between one
star (for a relatively poor performance) and up to five stars (for
a very good performance) by comparing the 5-year average
periodic performances of the available assets. We will assume
in the following that the ranking is additive in the sense that
the ranking of a portfolio of investment funds can be obtained
as the weighted sum of the rankings of the individual
investment funds in the portfolio. Consequently, the ninth
objective function can be written as
f9 ðxÞ ¼
N
X
xi sr i :
i¼1
Number of assets in the portfolio: The constant and variable
transaction costs generally depend on the number of assets in
a portfolio. Therefore, investor may be interested in minimizing the number of assets in a portfolio f10(z):
f10 ðzÞ ¼
N
X
zi :
i¼1
The Wilk–Shapiro test statistic: If the combined forecast model
is correct, the residuals should be normally distributed. For
taking into account the test results for normality of residual
distributions, we use the Wilk–Shapiro test (WS-test). This test
is selected as the methodology, since it is found to be the best
for testing normality under various alternative specifications
of the probability distribution [32]. Specifically, the WS-test
tests the hypothesis:
Hip0: The parent population is normally distributed.
Hip1: The parent population is not normally distributed.
To determine whether to reject the null hypothesis of
normality it is necessary to examine the probability associated
810
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
with the WS-statistic. If this probability is less than some
specified level (such as 0.05), it means that the null hypothesis
cannot be supported at the five percent level of significance.
We will assume in the following that the V values of the WStest statistics are additive in the sense that the V value of a
portfolio of investment assets can be obtained as the weighted
sum of the V values of the individual investment assets in the
portfolio. The weighted V value of a portfolio is given by
f11 ðxÞ ¼ X T V ¼
N
X
xi vi :
i¼1
Thus, our generalized mean–variance–skewness model for portfolio optimization problem (with 11-objective functions) can be
written as follows:
8
N
X
>
>
>
>
max f1 ðxÞ ¼
xi R i ,
>
>
>
>
i¼1
>
>
>
N
>
X
>
>
>
max f2 ðxÞ ¼
xi e i ,
>
>
>
>
i¼1
>
>
>
>
N X
N
>
X
>
>
>
min f3 ðxÞ ¼
xi xj sij ,
>
>
>
>
i ¼ 1j ¼ 1
>
>
>
>
N X
N
>
X
>
>
>
min f4 ðxÞ ¼
xi xj s^ ij ,
>
>
>
>
i ¼ 1j ¼ 1
>
>
>
>
N
N
N
>
X
X
X
>
>
>
x3i s3i þ3
ðx2i xj siij þxi x2j sijj Þ,
max f5 ðxÞ ¼
>
>
>
>
i
¼
1
i
¼
1
j
¼
1ðj
a
iÞ
>
>
>
>
>
N
N
N
X
X
X
>
>
3
>
>
x3i s^ i þ3
ðx2i xj s^ iij þ xi x2j s^ ijj Þ,
max f6 ðxÞ ¼
>
>
>
>
i
¼
1
i
¼
1
j
¼
1ðj
a
iÞ
>
>
<
N
X
ð6Þ
ðPÞ : max f ðxÞ ¼
xi R12
7
>
i ,
>
>
>
i
¼
1
>
>
>
N
>
X
>
>
>
xi R36
max f8 ðxÞ ¼
>
i ,
>
>
>
i
¼
1
>
>
>
>
N
>
X
>
>
>
xi sri ,
max f9 ðxÞ ¼
>
>
>
>
i¼1
>
>
>
>
N
>
X
>
>
>
zi ,
> min f10 ðzÞ ¼
>
>
>
i¼1
>
>
>
>
N
X
>
>
>
>
xi vi ,
max f11 ðxÞ ¼
>
>
>
i¼1
>
>
>
>
N
>
X
>
>
>
s:t: hðxÞ ¼
xi 1 ¼ 0,
>
>
>
>
i¼1
>
>
>
>
>
lwi zi r xi r upi zi , i ¼ 1, . . . ,N,
>
>
>
:
x A ½0,1, z A f0,1g, i ¼ 1, . . . ,N:
i
i
The constraint
N
X
xi ¼ 1
ð7Þ
i¼1
ensures that the proportions add to one. The constraints
lwi zi rxi rupi zi ,
i ¼ 1, . . . ,N
ð8Þ
ensures that if the asset i is held (zi ¼ 1) its proportion xi must lie
between lwi and upi, whilst if the asset i is not held (zi ¼0) its
proportion xi is zero.
It is important to note that the objective functions f6, f7, f8 and
f9(x) can be non-concave in general. With the combination of
continuous and binary variables, the resulting problem becomes a
non-convex and non-differentiable multi-objective mixed integer
programming problem.
In this paper, to find a set of efficient solutions of (P) we use a
special scalarization method for scalarizing the obtained multiobjective programming problem.
4. Scalarization
We begin this section with standard definitions from multiobjective programming.
Let Rpþ :¼ fy ¼ ðy1 , . . . ,yp Þ : yk Z 0,k ¼ 1, . . . ,pg, and let Y Rp be
non-empty set.
Definition 1.
1. An element yA Y is called non-dominated if ðfygRpþ Þ \Y ¼ fyg.
2. An element y A Y is called weakly non-dominated if ðfyg
intðRpþ ÞÞ \ Y ¼ |.
3. An element yA Y is called properly non-dominated (in the
sense of Benson) if y is a non-dominated element of Y and
the zero element of Rp is a non-dominated element of
clðconeðY þ Rpþ yÞÞ, where clðYÞ denotes the closure of a set Y
and coneðYÞ :¼ fay : a Z 0,yA Yg.
The set of all non-dominated, weakly non-dominated and
properly non-dominated elements of Y are denoted by Yn, Ywn
and Ypn, respectively.
Consider a multi-objective programming problem:
min½f1 ðxÞ, . . . ,fp ðxÞ,
ð9Þ
xAX
where X is a non-empty set of feasible solutions and fi : X-
R,i ¼ 1, . . . ,p are real-valued functions. Let f(x)¼(f1(x),y, fp(x))
and let Y :¼ f ðXÞ.
In the context of the portfolio optimization problem under
consideration, we will have
(
)
N
X
N
N
X ¼ ðx,zÞ A R þ f0,1g :
xi ¼ 1,lwi zi r xi r upi zi , 8i :
i¼1
Definition 2. A feasible element x A X is called efficient, weakly
efficient or properly efficient solution of multi-objective programming problem (9) if y¼ f(x) is a non-dominated, weakly nondominated or properly non-dominated element of Y, respectively.
The sets of efficient, weakly efficient and properly efficient
solutions of the multi-objective programming problem are
denoted by Xe, Xwe and Xpe, respectively.
Over the last decades, various interactive methods and
decision support systems have been developed to deal with
multi-objective programming problems. Although some of the
methods work well only on problems with convex objective
functions and a convex feasible region, the most of real life
problems such as portfolio optimization, are not convex.
Scalarization method is an important tool in the study of multiobjective optimization problems. It is a standard technique for
finding efficient solutions. Via scalarization, the multi-objective
problem is transformed into a single objective optimization problem
involving possibly some parameters or additional constraints.
Solving the single objective problem for a range of parameter values
yields (some) efficient solutions to the multi-objective programme.
One way of constructing such an optimization problem is through
using scalarizing functions. Multi-objective optimization methods
utilize different scalarizing functions in different ways.
In a large variety of methods, norms have usually been used to
identify the non-dominated elements that are closest to some
(reference) point. In particular, the family of lp norms has
811
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
extensively been studied by many researchers, including Yu [33],
Zeleny [34], Steuer [35], Lewandowski and Wierzbicki [36], and
many others. Gearhart [37] studied a family of norms including
the lp norms. The l1 norm and the augmented l1 norm turned out
to be very useful in generating non-dominated solutions of
general continuous and discrete multiple criteria programs and
led to the well-known weighted (augmented) Tchebycheff
scalarization and its variations. Kaliszewski [38] introduced a
modified l1 norm and showed its applicability in generating nondominated solutions. Recently, a new class of norms called the
oblique norms was proposed in Schandl et al. [39] to generate the
properly non-dominated points.
Comparisons of multi-objective optimization methods have
been realized in different ways. Wierzbicki [40,41] produced
seminal research on reference point methods and investigated the
characteristics of various scalarizing functions. These scalarizing
functions were designed to have a significant advantage over goal
programming by producing only non-dominated, or Paretooptimal, solutions. Several widely known scalarizing functions
and their modifications are discussed and compared in Miettinen
and Makela [42]. A nice comparison between the utility function
approach and the use of the conic scalarization approach for
calculating efficient solutions of multi-objective portfolio optimization problems is given in [23].
By summarizing the properties of different scalarization
techniques, the following questions related to a quality or
performance of the technique can be drawn:
Does this method always yields efficient solutions?
If so, can all efficient solutions be detected by this way?
Will the preference and reference point information of the
decision maker be taken into consideration by this technique?
The simple investigation on different techniques may show
that not all methods can positively answer all the questions raised
here.
In this section we explain a conic scalarization method, the
idea of which is based on the characterization theorem for Benson
properly non-dominated elements given in [20]. A full explanation of this method is given in [19].
In [20], a special class of monotonically increasing convex
functions is introduced. By using this class of functions,
characterization theorems via scalarization, for Benson properly
non-dominated elements of a set, are proven without any
restrictions on the set. Recently, the same class of functions are
used by Kasimbeyli who proved theorems fully characterizing
efficient solutions of multi-objective problems via scalarization
[19]. Now we present the main characterization theorem without
proof, which can easily be obtained from Theorems 5.8 and 5.9
presented in [19]. Let
W :¼ fðw, aÞ A Rpþ R : 0 r a ominfw1 , . . . ,wp gg:
ð10Þ
Theorem 1. Let a ¼ ða1 , . . . ,ap Þ A Rp be a reference point for problem
(9). The element y^ is a Benson properly non-dominated element of Y
if and only if there exists a pair ðw, aÞ A W such that, y^ A Y is an
optimal solution to the scalar minimization problem
min
yAY
p
X
k¼1
wk ðyk ak Þ þ a
p
X
jyk ak j:
ð11Þ
k¼1
The following proposition explains some important properties
of scalarizing functions gðw, aÞ : Rp -R used in Theorem 1 which
are defined for each pair of scalarizing parameters ðw, aÞ as
P
P
gðw, aÞ ðyÞ ¼ pk ¼ 1 wk yk þ pk ¼ 1 jyk j.
Proposition 1. The following statements hold:
(i) For each ðw, aÞ A W with a 4 0, the ‘‘lower’’ sublevel set C ðw, aÞ
of function gðw, aÞ defined by
C ðw, aÞ ¼ fy A Rp : wuy þ aJyJ1 r0g
ð12Þ
Rpþ ,
is a pointed closed convex cone containing
where wuy ¼
Pp
w
y
is
a
scalar
product
of
vectors
w
and
y,
and JyJ1 ¼
Pkp ¼ 1 k k
jy
j
is
an
l
norm
of
the
vector
y.
Note
that
a
cone
K is called
1
k
k¼1
pointed if K \ ðKÞ ¼ f0g.
(ii) ðw, a1 Þ,ðw, a2 Þ A W and a1 r a2 imply C ðw, a2 Þ D C ðw, a1 Þ.
Proof. (i) Let ðw, aÞ A W with a 4 0. It is clear that the set C ðw, aÞ
is a cone, and since the function gðw, aÞ ðyÞ ¼ wuyþ aJyJ is convex and
continuous, it follows that C ðw, aÞ is convex and closed. Now let
yA C ðw, aÞ. Then
wuyþ aJyJ r0:
ð13Þ
If to assume yA C ðw, aÞ, then we have
wuyþ aJyJ r0:
ð14Þ
Eqs. (13) and (14) together with the condition a 40 lead to y¼0
and thus prove that C ðw, aÞ is a pointed cone. Now let y A Rpþ .
Then y A Rpþ and since ðw, aÞ A W we have wðyÞaJyJ Z 0, or
wuyþ aJyJ r0 which means by the definition of C ðw, aÞ that
yA C ðw, aÞ.
(ii) Let ðw, a1 Þ,ðw, a2 Þ A W, a1 r a2 and let yA C ðw, a2 Þ. Then,
0 Z wuyþ a2 JyJ Z wuyþ a1 JyJ,
which implies that yA Cðw, a1 Þ and thus proves the lemma.
&
Now we explain how these theorems can be used to scalarize
the given multi-objective problem (9).
Suppose that decision maker specifies her (or his) preferences
by giving a weighting vector w ¼(w1,y,wp), with wi Z0,w1 þw2 þ
þwp ¼ 1, and a reference point a ¼(a1,y,ap). The reference
point a can be used in situations when the decision maker may
desire to calculate the non-dominated elements that are closest
(in a certain sense) to some point a.
It is evident that if x^ A X is an efficient solution to problem (9),
then it is also an efficient solution to the ‘‘shifted’’ multi-objective
programme
minðf1 ðxÞa1 , . . . ,fp ðxÞap Þ,
xAX
ð15Þ
where a A Rp is an arbitrary vector.
In this case we can formulate the scalarized problem, in the
form of (11):
(
)
p
p
X
X
wk ðfk ðxÞak Þ þ a
jfk ðxÞak j ,
min
xAX
k¼1
k¼1
where w ¼(w1,y,wp) is a weighting vector with wi Z0,w1 þw2 þ
þwp ¼ 1, a ¼(a1,y,ap) is a reference point specifying decision
maker’s preferences and a is a non-negative augmentation
parameter satisfying the condition 0 r a r minfw1 , . . . ,wp g. Theorem 1 implies that every solution to the scalar problem (11)
for 0 r a o minfw1 , . . . ,wp g is a proper efficient solution to the
problem (15), and therefore a proper efficient solution to the
problem (9). Note that taking a ¼ 0 reduces (11) to a weighted
sum scalarization problem. If x^ A X is an optimal solution to the
problem (11) for the case a ¼ 0, then we have
^
wuðf ðxÞaÞ Zwuðf ðxÞaÞ,
8x A X
or
^
wuðyaÞ Z wuðyaÞ,
8y A Y ¼ f ðXÞ,
^ The last inequality can also be written as
where y^ ¼ f ðxÞ.
^ Z0,
wuðyyÞ
8y A Y,
812
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
^ is a supportwhich means that the non-dominated point y^ ¼ f ðxÞ
ing point of the set Y¼ f(X) with respect to the hyperplane
^ ¼ 0g.
H ¼ fy A Rp : wuðyyÞ
Now let a be a positive number less than minfw1 , . . . ,wp g and
let x be an optimal solution of the scalar problem (11). Then
we have
wuðf ðxÞaÞ þ aJf ðxÞaJ1 Z wuðf ðxÞaÞ þ aJf ðxÞaJ1 ,
the weighted-sum and the conic scalarization methods.
xi R i ,
i¼1
4 X
4
X
minf3 ðxÞ ¼
xi xj sij ,
i¼1j¼1
8x A X
Subject to
or
wuðyaÞ þ aJyaJ1 ZwuðyaÞ þ aJyaJ1 ,
8yA Y ¼ f ðXÞ,
ð16Þ
f10 ðzÞ ¼
4
X
zi ¼ 2,
i¼1
where y ¼ f ðxÞ. The last inequality implies
wuðyyÞ þ aJyyJ1 Z 0,
4
X
maxf1 ðxÞ ¼
8y A Y ¼ f ðXÞ:
ð17Þ
hðxÞ ¼
It follows from the relation (17) that the conic surface
Cðw, a; yÞ ¼ fy A Rp : wuðyyÞ þ aJðyyÞJ1 ¼ 0g
0 rxi r zi ,
with vertex at y is a supporting cone to the set Y at the nondominated point y in the following sense:
xi A ½0,1,
ð19Þ
ð20Þ
The supporting philosophy described by the last two relations
can equivalently be written also in the form
Yfyg C þ ðw, aÞ ¼ fy A Rp : wuy þ aJyJ1 Z0g
ð21Þ
and
f0g Yfyg \ Cðw, aÞ ¼ fyA Rp : wuy þ aJyJ1 ¼ 0g:
i ¼ 1, . . . ,4,
zi A f0,1g, i ¼ 1, . . . ,4:
The preference weights of the objective functions f1(x) and
f3(x) are selected as w3 ¼ 1 w1 using w1 ¼ 0.01t, t ¼0,1,y,100
and a ¼ minfw1 ; w3 g0:001. The reference point is taken as
a¼(0.0026, 0.000656). Efficient points for this problem, calculated
by the conic and weighted sum scalarization methods, are
depicted in Figs. 1 and 2, respectively. These figures nicely
demonstrate the superiority of the conic scalarization method.
The efficient points shown in the triangle in Fig. 1 are calculated
by the conic scalarization method for the same set of weighting
parameters used in the weighted sum scalarization. As it can be
seen from Fig. 2, these points could not be calculated by the
weighted sum scalarization method. Thus, this example demonstrates that there may be some ‘‘hidden’’ efficient solutions of
portfolio optimization problems which cannot be detected by the
weighted sum scalarization method. The use of the conic
scalarization method allows decision maker to calculate also
such solutions.
Finally note that, earlier this method was successfully applied
in solving multi-objective faculty course assignment problems
and a 1.5-dimensional multi-objective assortment problems (see,
[44–46]).
and
y Y \ Cðw, a; yÞ:
xi 1 ¼ 0,
i¼1
ð18Þ
Y C þ ðw, a; yÞ ¼ fy A Rp : wuðyyÞ þ aJðyyÞJ1 Z0g
4
X
ð22Þ
As it was analyzed above, for a ¼ 0 the cone Cðw, aÞ becomes a
hyperplane. By Proposition 1, this cone becomes as smaller as a
increases, and approaches to a smallest cone containing Rpþ for
a maximal possible value of a, i.e., a ¼ minfw1 , . . . ,wp g.
So, the scalarized problem (11) allows to find all properly
efficient solutions of the multi-objective problem (9) corresponding to the decision maker’s preferences w and a by varying the
augmentation parameter a between zero and minfw1 , . . . ,wp g.
The following example is prepared to illustrate the role of the
conic scalarization method.
5. Modified subgradient algorithm based on feasible values
(F-MSG algorithm)
Example 1. In order to illustrate the effect of cardinality
constraints we consider the small (four asset) problem whose
data are drawn from the FTSE data set, [43]. We calculate efficient
elements of the following two-objective problem by using both
After scalarization, the obtained scalar problem is not only
non-convex but also a non-differentiable optimization problem.
0.006
0.005
Return
0.004
0.003
0.002
0.001
0.000
0
0.0005
0.001
0.0015
0.002
0.0025
Variance
Fig. 1. Efficient points obtained by the conic scalarization method for Example 1.
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
813
0.006
0.005
Return
0.004
0.003
0.002
0.001
0.000
0
0.0005
0.001
0.0015
Variance
0.002
0.0025
Fig. 2. Efficient points obtained by the weighted-sum scalarization method for Example 1.
An important class of exact methods used for solving mixed
integer problems with non-convex objective and/or constraints is
based on penalty function or augmented Lagrangian approaches.
The simple examples show that in many cases the solution to the
penalty problem can be made sufficiently close to the optimal
solution of the original problem by choosing the penalty
parameter large enough. However, solving a penalty problem
with a very large penalty parameter leads to computational
difficulties (see, for example, [47]).
In recent years, different augmented Lagrangian duality
schemes which are able to eliminate duality gap in a wide class
of non-convex optimization problems and to provide solution
algorithms have been studied (see, for example, [48–52]).
Modified subgradient (MSG) algorithm proposed by Gasimov
[49] has been designed for solving sharp augmented Lagrangian
dual problems. However, as many subgradient algorithms, this
algorithm also requires finding an unconstrained global minimum
of the Lagrangian function and knowing an optimal value of the
problem under consideration in order to update dual variables at
each iteration.
The Modified Subgradient Algorithm Based on Feasible Values,
or briefly F-MSG algorithm, recently developed by Kasimbeyli
et al. [21], is a generalized version of the modified subgradient
(MSG) algorithm. The new algorithm is proposed for solving sharp
augmented Lagrangian dual problems, in which some drawbacks
of the MSG algorithm were eliminated.
In this paper we utilize the F-MSG algorithm for solving the
scalarized multi-objective portfolio optimization model developed in Section 3. Next we present the F-MSG algorithm.
Consider a nonlinear programming problem:
ðSPÞ
minimize f0 ðxÞ
subject to x A X satisfying hðxÞ ¼ 0,
ð23Þ
and
Hðu,cÞ ¼ minLðx,u,cÞ,
xAX
respectively. The dual problem (P*) is then given by
ðP* Þ :
max
ðu,cÞ A Rm R þ
Hðu,cÞ:
Step 1. Choose positive numbers y1 , y2 , D1 , F and a number H1.
Set n¼ 1, t ¼0, r¼ 0.
Step 2. Choose ðun1 ,c1n Þ A Rm R þ and ‘ð1Þ 4 0 and set k¼1,
uk ¼un1, ck ¼ cn1.
Step 3. Given (uk,ck), solve the following constraint satisfaction
problem (CSP) P(Hn):
Find an element x A X such that f0 ðxÞ þ ck JhðxÞJ
/uk ,hðxÞS rHn ,If a solution to (25) does not exist
(for example, if ‘ðkÞ 4 F) then go to Step 6, otherwise, if a
solution xk exists then check whether h(xk)¼0 or not. If h(xk)¼0
(or if Jhðxk ÞJ r y1 ) then go to Step 5, otherwise go to Step 4.
Step 4. Update dual variables as
uk þ 1 :¼ uk bsk hðxk Þ,
ð26Þ
ck þ 1 :¼ ck þ ð1 þ bÞsk Jhðxk ÞJ,
ð27Þ
where sk is a positive stepsize parameter defined as
0 o sk ¼
dbðHn Lðxk ,uk ,ck ÞÞ
2
½b þ ð1 þ bÞ2 Jhðxk ÞJ2
ð24Þ
ð28Þ
or
0 o sk ¼
d½bðHn Lðxk ,uk ,ck ÞÞ þ ðcck ÞJhðxk ÞJ
2
½b þ ð1 þ bÞ2 Jhðxk ÞJ2
ð29Þ
with b 4 0, and 0 o d o2. We will also require the stepsize sk
corresponding to the dual variables (uk,ck) to satisfy the
following property:
sk Jhðxk ÞJ þ ck Juk J 4 ‘ðkÞ,
where S is a subset of a metric space X, and f0 : X-R and h :
X-Rm are given functions.
Denote by R þ ,J J and / , S the set of non-negative numbers,
the Euclidean norm and the Euclidean inner product on Rm ,
respectively. The sharp augmented Lagrangian L : X Rm
R þ -R and the dual function H : Rm R þ -R associated with
the problem (SP) are defined as
Lðx,u,cÞ :¼ f0 ðxÞ þcJhðxÞJ/u,hðxÞS
Algorithm 1. F-MSG algorithm.
ð30Þ
Set k¼k+ 1, update ‘ðkÞ in such a way that ‘ðkÞ- þ 1 as
k- þ 1, and repeat Step 3.
Step 5. Let xk be a solution to (25) with h(xk)¼0. In this case
L(xk,uk,ck)¼ f0(xk). Set r ¼r + 1 and check t. If t ¼0 then set
Dn þ 1 ¼ Dn , otherwise set Dn þ 1 ¼ 12 Dn . Check Dn þ 1 . If Dn þ 1 o y2
then stop, f0(xk) is an approximate optimal value, xk is an
approximate primal solution and (uk,ck) is an approximate dual
solution; otherwise set Hn þ 1 ¼ minff0 ðxk Þ,Hn Dn þ 1 g,n ¼ n þ 1
and go to Step 2.
Step 6. Set t¼t + 1. If r ¼0 then set Dn þ 1 ¼ Dn ; otherwise set
Dn þ 1 ¼ 12 Dn . Check Dn þ 1 . If Dn þ 1 o y2 then stop; in this case
the last calculated feasible solution xk is an approximate
primal solution and (uk; ck) is an approximate dual solution;
f0(xk) is an approximate optimal value. Otherwise set
Hn þ 1 ¼ Hn þ Dn þ 1 , n¼n + 1 and go to Step 2.
814
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Steps 3 and 4 are concerned with finding a feasible solution to
the CSP P(Hn) for a given real number Hn.
A positive number y1 chosen at Step 1 is an error bound for
checking the feasibility of a solution xk found to CSP P(Hn) at Step
3. If an exact feasible solution at kth iteration with h(xk)¼ 0 cannot
be found (within a reasonable time limit), the decision maker may
check whether Jhðxk ÞJ r y1 or not.
A positive number F chosen at Step 1 is a large number used as
a termination tolerance for estimating whether the right-hand
side number Hn in (25) is feasible or not. That is if the value of ‘ðkÞ
in (30) becomes greater than F not reaching an approximate or
exact feasible solution, then this means that a feasible solution to
(25) with the right-hand side number Hn does not exist, so the
number Hn has to be identified as an infeasible value. In this case
Step 6 of the outer loop will be visited. Otherwise the updating
formulas (26), (27) together with the stepsize parameters
calculated from (28) (or (29)) and (30) will force the sequence
of solutions xk of (25), to converge to a feasible solution at Step 4.
A positive number D1 chosen at Step 1 is a stepsize parameter
used for updating the right-hand side numbers at steps 5 and 6 in
the F-MSG algorithm. Choosing a suitable value for this parameter
is very important for the performance of the algorithm. In some
cases it may happen that a solution xk to (25) with the right-hand
side number Hn is a feasible solution (with h(xk)¼0) which
provides the same value Hn to the objective function f0. In such
cases the next right-hand side value Hn + 1 will be defined as
Hn þ 1 ¼ Hn Dn þ 1 , where a value of Dn þ 1 is equal to Dn or Dn =2 in
dependence of the value of a parameter t (see steps 5 and 6). It is
clear that in dependence of, how successful the value for D1 has
been chosen, the updated value of Hn + 1 may become (sufficiently)
close to the optimal value of the problem under consideration, or
conversely, it may go (too) far from the optimal value. This may
seriously affect a convergence rate of the algorithm.
When the values of parameters t and r both become different
from zero, this means that an interval with feasible and infeasible
endpoints has been found. From this iteration on, a value of Dn
will decrease at each outer iteration, which results in decreasing
of a length of the interval with feasible and infeasible endpoints.
Since the optimal value of the problem belongs to this interval,
algorithm stops if a value of the parameter Dn þ 1 becomes less
than the error bound e2 , and states that a last feasible value is an
approximate optimal value.
The method applied for solving the subproblem (25) at Step 3
is another factor which may affect the performance of the F-MSG
algorithm. In this paper we use the following way for finding a
solution to the subproblem (25). We ‘‘minimize’’ a fictitious
objective function (which is identically constant) with respect to
the single inequality from this subproblem. Such a problem can be
solved by using different standard optimization packages.
6. Empirical applications
In this section, the four-stage integrated approach is applied to
a real case multi-objective portfolio selection problem constructed by using the sample data taken from Istanbul Stock
Exchange (ISE) official web site (http://www.ise.org/). The data
consist of monthly rates of returns from January 2000 to
December 2004, a total of T¼60 months of observations of
N ¼20 stocks. The selected 20 stocks are chosen from the main list
of firms in ISE 100 index. These firms are ABANA ELEKTROMEKANIK, ALORKO HOLDING, ALORKO CARRIER, ALCATEL TELETAS,
ALORKO GMYO, ALTERNATIFBANK, ALTINYILDIZ, ANADOLU CAM,
ANADOLU SIGORTA, ARCELIK, ALTERNATIF YAT.ORT., ARSAN
TEKSTIL, ASELSAN, FINANSBANK, F-M IZMIT PISTON, FINANS
YAT.ORT., FRIGO PAK GIDA, FORD OTOSAN, GARANTI BANK.
Financial characteristics of these firms can be easily found on
the official web site of ISE (www.ise.gov.tr). One month returns
(Turkish Lira (TL) based) of stocks are regularly calculated and
published on this web site.
The portfolio selection problem related to these data is
formulated and solved into four stages. These stages are forecasting,
the model formulation, the scalarization of the multi-objective
model and the solution of the so-obtained scalarized problem.
Stage 1: Forecasting stage: In the forecasting stage, ARIMA,
exponential smoothing, and time series decomposition techniques (M ¼3) are used. The MINITAB statistical program (release
13.32) was used for determining the model parameters and
specifications for all the three forecasting techniques. By using
these techniques, series of a total of P¼12 monthly forecasts from
January 2005 to December 2005 for each stock are generated.
Obtained forecasts are combined using the optimal weights
found by solving the mean absolute forecast error minimization
problem for each stock i¼1,y,20 (see, Section 3.4):
minimize
ðMAFEi Þ
60 X
3
X
lik jeitk j=60
t ¼1k¼1
3
X
subject to
lik ¼ 1, lik Z 0, k ¼ 1,2,3,
k¼1
where k¼1 corresponds to ARIMA, k¼ 2 corresponds to exponential smoothing, and k¼3 corresponds to time series decomposition forecasting technique, and lik is the weight of forecasting
technique k for asset i. The problem (MAFEi) for each stock i is
solved by using GAMS/CPLEX linear programming solver. The
optimal weights lik in the combined forecast portfolio are
presented in Table 1.
For each set of optimal weights ðli1 , li2 , li3 Þ, the combined
forecast error eit is calculated as follows:
eit ¼
3
X
lik eitk , i ¼ 1,2, . . . ,20; t ¼ 1,2, . . . ,60,
ð31Þ
k¼1
where eikt is the forecast error generated by a forecasting technique
k¼1,2,3 in time period t¼1,2,y,60 for asset i¼1,y,20 (see, Section
3.4). A mean squared error formula used for evaluating the
performances of the individual (MSEik) and combined forecasts
(MSEi) are presented below:
MSEik ¼
60
X
ðeitk Þ2 =60,
i ¼ 1,2, . . . ,20, k ¼ 1, . . . ,3,
ð32Þ
t¼1
MSEi ¼
60
X
ðeit Þ2 =60,
i ¼ 1,2, . . . ,20:
ð33Þ
t¼1
The mean squared error values (MSEik) and (MSEi) are summarized
in Table 2.
As it can be seen from Table 2, the combined mean squared
error values (MSEi) are essentially less than the individual mean
squared error values (MSEik). For example, consider asset 18. The
Table 1
The optimal weights lik of forecast k for stock i in the combined forecast portfolio.
i¼
li1
li2
li3
i¼
li1
li2
li3
1
2
0.7
0
0.31
11
0.32
0
0.68
3
0.32
0
0.68
12
0.23
0.19
0.58
4
0.18
0
0.82
13
0.14
0.32
0.54
5
0.39
0
0.61
14
0.27
0.03
0.7
6
0.12
0
0.89
15
0.07
0
0.93
7
0.39
0
0.61
16
0.44
0
0.56
8
0.39
0.01
0.61
17
0
0.13
0.87
9
0.02
0.07
0.91
18
0.37
0
0.63
10
0.15
0
0.85
19
0.21
0.08
0.72
0
0.25
0.75
20
0.07
0
0.93
815
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Table 2
Mean squared error values.
1
i¼
MSEi
MSEi
MSEi
MSEi
200.56
260.38
232.55
185.41
1
2
3
279.61
349.05
243.89
215.35
11
i¼
MSEi
MSEi
MSEi
MSEi
2
12
403.27
468.21
356.91
320.92
1
2
3
964.42
1062.96
966.4
953.46
3
4
5
6
7
8
9
10
234.91
244.22
196.47
192.76
349.62
412.09
343.44
303.9
235.77
269.12
152.86
153.8
327.57
455.81
340.27
280.93
379.26
319.25
279.08
249.73
285.59
285.59
208.75
205.83
298.87
298.87
177.26
175.8
534.57
534.57
426.14
428.61
13
14
651.82
769.03
522.56
539.9
638.63
638.63
521.09
536.85
15
580.56
580.56
469.62
468.72
Table 3
Mean, variance, skewness and Wilk–Shapiro test statistics related to the
forecasting errors for stocks.
i
Mean ðe i Þ
Variance ðs^ i Þ
Skewness ðs^ i Þa
Wilk–Shapiro
statistic (vi)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0.88
0.58
0.51
0.47
0.99
1.07
0.33
1.19
0.37
0.51
0.07
0.64
0.06
0.62
0.5
0.93
0.97
0.64
0.09
0.21
185.41
215.35
192.76
303.9
153.8
280.93
249.73
205.83
175.8
428.61
320.92
953.46
539.9
536.85
468.72
261.33
364.29
256.55
253.83
413.18
0.03
0.29
0.1
0.17
0.31
0.3
0.04
0.43
0.22
0.6
0.46
1.81
0.59
1.78
1.42
0.39
0.3
0.18
0.67
1.29
0.99
0.98
0.99
0.99
0.99
0.99
0.98
0.99
0.99
0.98
0.99
0.91
0.98
0.91
0.94
0.99
0.98
0.99
0.98
0.94
2
3
16
a
Skewness represents third central moment of the forecasting errors. Since
the skewness of a normal distribution is zero, the skewness of a data set can
sometimes be examined to provide an informal check of normality.
optimal weights related to this asset are depicted in the 18th
column of Table 1: li1 ¼ 0:37, li2 ¼ 0, li3 ¼ 0:63. The combined and
individual mean squared error values MSE18 ¼256.55 and
MSE18,1 ¼425.54, MSE18,2 ¼435.35, MSE18,3 ¼319.9 are calculated
by using the formulas (31), (32) and (33). These values are
presented in the 18th column of Table 2.
2
Table 3 lists the values of the mean (ei ), variance (s^ i ), skewness
3
^
(s i ), and Wilk–Shapiro test statistics related to the forecasting errors
for each stock. These error statistics allows the investor to more
efficiently evaluate a forecasting performance in constituting his or
her portfolio. For example, the value 0.88 presented in the ‘‘Mean’’
(2nd) column of Table 3 for the 1st stock shows that the real return
was less than the forecasted value for this stock in average 0.88%.
The value 0.09 presented for the 19th stock at the same column in
this table shows that the real return was greater than the forecasted
return value for this stock in average 0.09%. Therefore, the
maximization of the mean of forecasting error, that is the function
f2(x) may influence to constitute the more beneficial and more
efficient portfolio.
It can be explained in a similar way that the portfolio with a
greater error skewness is more desirable.
2
Since the variance of forecasting errors (s^ i ) characterizes the
2
forecasting risk, the portfolio with a lesser s^ i is more preferable.
17
536.96
398.13
399.32
261.33
18
453.15
453.15
363.83
364.29
425.54
435.35
319.9
256.55
19
20
290.39
314.84
256.2
253.83
444.4
517.91
419.47
413.18
The closeness of the value of Wilk–Shapiro test statistic for
forecasting errors to 1 ensures that the forecasting error distribution
is close to the normal distribution. When the mean of forecasting
errors is close to 0, the maximization of the related objective
function (f11(x)) may lead to the forecasting which is unbiased.
Investment returns and residuals on each series of forecasts
are measured and then evaluated by three performance criteria,
namely, mean, variance, and skewness. Subsequently, the outputs
of this model are then used as input parameters in the multiobjective mean–variance–skewness model in the second stage.
Stage 2: Mathematical modeling stage: All the parameter values
(using the data taken from ISE) together with those calculated in
the first stage are used to construct the multi-objective mathematical model of the portfolio selection problem (P).
Stage 3: Scalarization stage: The scalarization of the multiobjective problem (P) with 11 objectives is performed applying
the following steps.
Step 3-1: Choose scalarization parameters: the weights
w¼(w1,y,w11), the augmentation parameter a and the reference
point a ¼(a1,y,a11). The weights w¼(w1,y,w11) and the
augmentation parameter a must be taken from the set
(see explanations in Section 4):
W :¼ fðw, aÞ A R11
þ R : 0 o a ominfw1 , . . . ,w11 gg:
ð34Þ
A set of 20 scalarization parameters ðw, aÞ used are presented in
Table 4. The reference point a A R11 is chosen as a ¼ (4,0,50,200,
700,7500,150,300,3,4, 0.97).
Step 3-2: In accordance with the conic scalarization method
explained in Section 4, the scalarized problem is formulated in the
following form:
(
)
11
11
X
X
^
^
ðSPÞ : minimize f ðxÞ ¼ a
jf ðxÞap j þ
wp ðf ðxÞap Þ
0
p
p¼1
p
p¼1
ð35Þ
subject to
hðxÞ ¼
20
X
xi 1 ¼ 0,
ð36Þ
i¼1
lwi zi r xi rupi zi
xi A ½0,1,zi A f0,1g
for i ¼ 1, . . . ,20,
for i ¼ 1, . . . ,20,
ð37Þ
ð38Þ
where the scalarization parameters ðw, a,aÞ determined in Step
3-1 are used, li ¼0.05, ui ¼0.3 for all i ¼ 1, . . . ,20 and f^ i ðxÞ ¼ fi ðxÞ
for i¼1,2,5,6,7,8,9,11, f^ j ðxÞ ¼ fj ðxÞ for i¼3,4,10. The reason for
taking some objective functions with minus sign is to put the
problem to a ‘‘minimization’’ form for all objectives.
Stage 4: Solving the scalarized problem: In this stage, F-MSG
algorithm is applied to solve the scalarized problem (SP) formed
in the previous stage. For a given pair of dual variables u and c, the
816
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Table 4
A set of 20 scalarization parameters.
No.
a
w1
w2
w3
w4
w5
w6
w7
w8
w9
w10
w11
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
0
0
0
0
0
0
0
0
0
0
0.49
0.19
0.32
0.19
0.11
0.14
0.09
0.09
0.09
0.09
1
0
0
0
0
0
0
0
0
0
0.5
0.2
0.33
0.2
0.12
0.15
0.15
0.1
0.15
0.09
0
1
0
0
0
0
0
0
0
0
–
0.2
–
–
0.12
–
–
0.1
–
0.09
0
0
0
0
0
1
0
0
0
0
0.5
0.2
0.33
0.2
0.12
0.175
0.15
0.1
0.15
0.09
0
0
0
0
0
0
1
0
0
0
–
–
–
0.2
0.12
0.175
0.15
0.1
0.15
0.09
0
0
0
0
0
0
0
1
0
0
–
–
–
0.2
0.14
0.175
0.15
0.1
0.1
0.09
0
0
0
0
0
0
0
0
1
0
–
–
–
0.2
0.14
0.175
0.15
0.1
0.1
0.09
0
0
1
0
0
0
0
0
0
0
–
0.2
–
–
0.12
–
–
0.1
–
0.09
0
0
0
1
0
0
0
0
0
0
–
0.2
–
–
0.12
–
–
0.1
–
0.09
0
0
0
0
1
0
0
0
0
0
–
–
0.33
–
–
0.15
–
–
0.15
0.09
0
0
0
0
0
0
0
0
0
0.5
–
–
–
–
–
–
0.15
0.1
0.1
0.09
0
0
0
0
0
0
0
0
0
0.5
–
–
–
–
–
–
0.1
0.1
0.1
0.09
‘‘–’’ It means that the investor does not consider the related objectives.
Lagrange function L is defined as
Lðx,z,u,cÞ ¼ f0 ðxÞ þ cJhðxÞJuhðxÞ,
ð39Þ
and the set X of primal feasible solutions is defined to be a set of
decision variables x and y satisfying relations (37) and (38):
X ¼ fðx,zÞ A R20 R20 : lwi zi rxi r upi zi ,xi A ½0,1,
zi A f0,1g for i ¼ 1, . . . ,20g:
The solution of the subproblem (25) in Step 3 of the F-MSG
algorithm is found by solving the following problem:
minimize Fðx,zÞ ¼ 0
subject to Lðx,z,uk ,ck Þ r Hn ,
ðx,zÞ A X,
ð40Þ
where F(x,z) is a ‘‘fictitious’’ objective function which is identically
zero (it can be taken any constant instead of zero). This problem is
solved by implementing GAMS/CONOPT/DNLP and GAMS/SNOPT/
DNLP derivative-free solvers. Documentation and information
about GAMS are available via World Wide Web at the URL:
http:www.gams.com. All the programs have been run on the HP
workstation XW6000 with WINDOWS operating system on two
processors.
Stepsize parameters were calculated by using the stepsize
formula (28), with b ¼ 1, d ¼ 1 and D1 ¼ 0:05. The initial Lagrange
multipliers are chosen as (u0,c0)¼( 1,1). The terminating parameters are chosen as y1 ¼ 1010 and y2 ¼ 102 .
F-MSG algorithm has been applied for the sets of scalarizing
parameters presented in Table 4. The corresponding computational results are depicted in Table 5. In Table 5, efficient points
presented on rows 1–10 correspond to the set of scalarizing
parameters presented in Table 4 on rows 1–10. For each set of
scalarizing parameters given on rows 11–20 of Table 4, two
efficient points are presented in the corresponding rows of
Table 5, where the row written in bold type corresponds to the
set of parameters presented in Table 4, and the other row
corresponds to the same set of weights with a ¼ 0. As it can be
seen from Table 5 different efficient points have been obtained for
the cases a ¼ 0 and a a0 (bold type) from 11th to 20th rows.
It is remarkable that by changing the value of the augmentation parameter a, different efficient solutions have been obtained
for the same set of weights. This situation demonstrates that there
may be several efficient points corresponding to the same set of
preference weights in non-convex multi-objective optimization
problems. The superiority of the conic scalarization method over
others is that, by changing the augmentation parameter a in its
feasible range (between zero and mini wi), one can calculate all
the existing efficient points corresponding to the given set of
preference weights.
To compare the performance of the proposed approach, we
analyze the mean–variance-skewness model suggested by Yu
et al. [18] using the approach presented in this paper. Parameters
for this model are derived from the combined forecasting values
of returns for 20 stocks from ISE.
Example 2.
maxf1 ðxÞ ¼
20
X
xi R i ,
i¼1
20 X
20
X
minf3 ðxÞ ¼
xi xj sij ,
i¼1j¼1
maxf5 ðxÞ ¼
20
X
x3i s3i þ 3
i¼1
20
X
20
X
ðx2i xj siij þxi x2j sijj Þ,
i ¼ 1 j ¼ 1ðj a iÞ
subject to
hðxÞ ¼
20
X
xi 1 ¼ 0,
i¼1
xi Z 0,
i ¼ 1, . . . ,20:
Let the reference points for three objective functions of this
model be a1 ¼4, a3 ¼50, a5 ¼700, respectively. For making the
comparison easy, the same reference points as in the main
example are used. We have chosen six different weight vectors
and solve the scalarized versions of this problem obtained by both
the weighted sum and the conic scalarization methods. Obtained
results are depicted in Tables 6 and 7.
As it can be seen from the results presented in Table 7, the
investor reaches the reference points for the weights given in
the rows 1, 3, 4 and 5 by using the conic scalarizing approach. For
817
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Table 5
Non-dominated points of problem (P) calculated by solving the scalarized problem (SP) for different sets of scalarized parameters presented in Table 4.
No.
f1
f2
f3
f4
f5
f6
f7
f8
f9
f10
f11
1
2
3
4
5
6
7
8
9
10
11
11.65
8.89
11.65
2.29
3.73
0.52
2.97
11.65
0.52
3.71
11.65
4.61
11.65
7.7
11.65
4.61
3.55
4.42
4.8
6.24
2.97
5.73
2.39
3.24
11.09
9.29
8.44
7.23
9.71
7.73
0.5
1.19
0.5
0.58
0.07
0.63
0.09
0.5
0.63
0.71
0.5
0.8
0.5
0.44
0.5
0.8
0.04
0.01
0.41
0.17
0.09
0.13
0.04
0.01
0.55
0.47
0.57
0.36
0.39
0.26
97.8
91.58
97.8
77.29
92.44
3.76
30.75
97.8
3.76
35.62
97.8
44.59
97.8
56.99
97.8
44.59
32.57
35.27
30.14
56.48
30.75
50
27.69
37.95
92.23
78.41
58.8
49.42
79.04
61.65
468.72
205.44
468.72
215.34
320.94
126.77
254.34
468.72
126.77
140.69
468.72
137.01
468.72
273.66
468.72
137.01
230.24
220.97
161.48
165.78
254.34
172.07
182.03
196.33
357.82
200.05
244.41
200
247.09
195.41
631.11
505.88
631.11
349.84
1401.35
6.9
177.97
631.11
6.9
126.05
631.11
282.7
631.11
398.9
631.11
282.7
306.99
370.8
150.14
709.52
177.97
663.97
315.22
459.48
800.61
727.81
346.5
378.31
784.01
645.56
14458.1
1270.5
14458.1
931.48
2625
689.67
2715.79
14458.1
689.67
37.64
14458.1
280.44
14458.1
3685.47
14458.1
280.44
2063.81
1291.1
149.3
442.87
2715.79
594.05
801.46
1032.41
6358.82
788.3
2994.77
875.11
1481.73
262.55
207.09
138.34
207.09
53.85
29.92
21.11
45.81
207.09
21.11
77.93
207.09
150
207.09
150
207.09
150
61.19
77.32
127.63
132.25
45.81
118.79
35.12
52.69
199.47
197.83
169.68
150
187.05
150
307.2
670
307.2
42.75
145.9
20.2
190.65
307.2
20.2
182.45
307.2
300
307.2
300
307.2
300
195.17
206.82
356.96
216.04
190.65
212.09
152.03
177.33
336.34
300
381.08
300
269.53
233.76
4
4
4
5
4
2.74
5
4
2.74
3.1
4
3
4
4.17
4
3
4.8
4.7
3.7
3.88
5
4.05
4.3
4.3
3.9
3.36
3.8
3.91
3.8
3.79
1
1
1
1
1
4
1
1
4
10
1
3
1
4
1
3
2
3
3
2
1
2
4
4
3
3
5
4
3
4
0.94
0.99
0.94
0.98
0.99
0.99
0.98
0.94
0.99
0.99
0.94
0.99
0.94
0.96
0.94
0.99
0.98
0.98
0.99
0.99
0.98
0.98
0.98
0.98
0.95
0.97
0.97
0.97
0.96
0.97
12
13
14
15
16
17
18
19
20
Table 6
Pareto efficient points obtained by using the weighted-sum approach for Example 2.
No.
w1
w3
w5
Weighted-sum
f1(x)
f3(x)
f5(x)
Runtime
(in seconds)
1
2
3
4
5
6
0.33
0.2
0.4
0.25
0.5
0.5
0.33
0.4
0.4
0.5
0.25
0.5
0.33
0.4
0.2
0.25
0.25
0
743.07
899.49
435.91
543.36
564.47
0.72
6.11
6.11
6.12
6.12
6.11
2.92
78.38
78.38
78.33
78.34
78.37
4.36
2324.04
2324.04
2323.96
2323.98
2324.03
5.06
1
1.18
1.39
1.64
1.81
1.98
Table 7
Pareto efficient points obtained by using the conic scalarization approach for
Example 2.
No. a
w1
w3
w5
f0(x)
f1(x)
f3(x)
f5(x)
Runtime
(in seconds)
1
2
3
4
5
0.33
0.2
0.4
0.25
0.5
0.33
0.4
0.4
0.5
0.25
0.33
0.4
0.2
0.25
0.25
8.99
408.34
9.41
9.12
9.20
5.61
6.12
5.46
5.41
5.78
50
78.31
50
50
50
1197.02
2323.92
1210.38
1210.62
1173.52
45.19
113.68
155.87
129.87
161.74
0.32
0.19
0.19
0.24
0.24
mean–variance–skewness model only and ignores the forecasting
errors and the other objectives of the proposed extended model,
can be exposed to a considerable risk.
Finally, to test the performance of proposed model according
to the objective functions f2(x) and f4(x) which are the distributional properties of the forecasting error, we present the next
example.
Example 3. The data consist of monthly rates of returns from
January 2000 to December 2004, a total of T¼60 months of
observations of N ¼20 stocks. We estimate the monthly rates of
returns from January 2005 to December 2005, a total of T¼12
months of N ¼20 stocks by using ARIMA. Two different cases, with
and without error distribution in the portfolio optimization, are
considered and compared. The performances for these two cases
are compared by using the actual data distribution of monthly
rates of returns from January 2005 to December 2005 of N ¼20
stocks and an additive utility function. In the first case, we assume
that the investor invests his/her money in the 20 stocks from ISE
by using Markowitz mean–variance model. In this case, investor
ignores the mean and variance of the prediction errors. The
corresponding mathematical model for this first case is as follows:
max f1 ðxÞ ¼
20
X
xi R i ,
i¼1
example in the row 1 of Table 7, the investor reaches better points
than the given reference points for first and fifth objectives. The
following objective function values are obtained for this problem:
f1(x) ¼5.61, f3(x)¼ 50 and f5(x) ¼1197.02. This example demonstrates that investor can obtain more satisfactory portfolios by
applying the generalized approach presented in this paper.
On the other hand, it is clear that the mean–variance–
skewness model is a special version of the proposed extended
model. Therefore, the investor, who prefers the usage of the
min f3 ðxÞ ¼
20 X
20
X
xi xj sij ,
i¼1j¼1
subject to
hðxÞ ¼
20
X
xi 1 ¼ 0,
i¼1
xi Z 0,
i ¼ 1, . . . ,20:
818
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
Assume that an investor’s aspiration levels (reference point)
according to the objectives are a1 ¼4 and a3 ¼50, respectively.
Twenty one different weight vectors are generated and the conic
scalarization method is applied for scalarization. Estimated and
realized mean and variance values are calculated according to the
obtained Pareto efficient solutions. The additive utility function of
the form U ¼ Uðf 1 ðxÞ,f 3 ðxÞÞ ¼ w1 f 1 ðxÞw3 f 3 ðxÞ is used to
compare the obtained results, where the functions f 1 ðxÞ and
f 3 ðxÞ are the actual mean return and variance of a portfolio in
2005. It is clear that investor desires to maximize the utility
function. Obtained results are given in Table 8.
It is remarkable that the investor reaches the individual
aspiration levels as seen in Table 8 for the estimated mean return
and variance. For all weights, the same Pareto efficient solution
(f1(x),f2(x))¼ (6.6, 0) is obtained. It seems that this is an ideal
efficient solution. The estimated variance value is zero. But the
selected 20 stocks does not contain a risk free asset. Actual
variance equals 92.45 for 2005, which is given in the eighth
column of the Table 8. The main reason for this difference
between the estimated and actual variances, is the prediction
error which is ignored by the investor. Risk is shifted to the
prediction errors.
In the second case, we again assume that the investor invests
his/her money in the 20 stocks from ISE by using Markowitz
mean–variance model. In this case, investor considers the mean
and variance of the prediction errors:
Table 9
The estimated and actual mean and variance values according to the obtained
Pareto efficient solutions for Example 3, second case.
No. a
w1 ¼w2 w3 ¼w4 f1(x) f2(x)
f3(x) f4(x)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
3.67
1.19
0.9
0.28
0.04
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.05
0.04
0.03
0
0
0.04
0.09
0.14
0.19
0.24
0.29
0.34
0.39
0.44
0.49
0.44
0.39
0.34
0.29
0.24
0.19
0.14
0.09
0.04
0
1
0.95
0.9
0.85
0.8
0.75
0.7
0.65
0.6
0.55
0.5
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
2.27
3.43
4.15
4.27
4.3
4.09
4.06
4.06
4.06
4.07
4.07
4.07
4.08
4.08
4.09
4.11
4.13
4.17
4.24
4.48
6.25
0.26
0.16
0
0.07
0.17
0.44
0.48
0.48
0.48
0.48
0.48
0.47
0.47
0.46
0.45
0.44
0.42
0.38
0.31
0.07
1.69
f 1 ðxÞ
f 3 ðxÞ
116.47
2.67
71.81
148.29
2.4
54.23
215.08
3.87
55.63
238.32
4.4
56.94
261.15
4.84
58.5
276.33
5.23
59.52
279.06
5.3
59.77
279.06
5.3
59.78
279.06
5.31
59.79
279.06
5.31
59.8
279.07
5.31
59.81
279.07
5.33
59.85
279.09
5.35
59.89
279.11
5.38
59.96
279.16
5.41
60.05
279.24
5.46
60.18
279.41
5.54
60.39
279.84
5.66
60.8
281.2
5.92
61.8
290.22
6.76
66.56
580.56 13.03 176.67
U
71.81
51.4
49.68
47.74
45.83
43.33
40.25
37
33.75
30.5
27.25
24
20.75
17.49
14.22
10.95
7.65
4.31
0.85
3.1
13.03
subject to
hðxÞ ¼
20
X
xi 1 ¼ 0,
i¼1
max f1 ðxÞ ¼
20
X
xi R i ,
xi Z 0,
i¼1
max f2 ðxÞ ¼
N
X
xi e i ,
i¼1
min f3 ðxÞ ¼
20 X
20
X
xi xj sij ,
i¼1j¼1
min f4 ðxÞ ¼
N X
N
X
xi xj s^ ij ,
i¼1j¼1
Table 8
The estimated and actual mean and variance values according to the obtained
Pareto efficient solutions for Example 3, first case.
No.
a
w1
w3
f1(x)
f3(x)
f 1 ðxÞ
f 3 ðxÞ
U
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
0
0.04
0.09
0.14
0.19
0.24
0.29
0.34
0.39
0.44
0.49
0.44
0.39
0.34
0.29
0.24
0.19
0.14
0.09
0.04
0
0
0.05
0.10
0.15
0.2
0.25
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.65
0.70
0.75
0.80
0.85
0.90
0.95
1
1
0.95
0.90
0.85
0.80
0.75
0.70
0.65
0.60
0.55
0.50
0.45
0.40
0.35
0.30
0.25
0.20
0.15
0.10
0.05
0
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
6.6
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
3.18
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
92.45
87.67
82.89
78.1
73.32
68.54
63.76
58.98
54.2
49.42
44.64
39.85
35.07
30.29
25.51
20.73
15.95
11.17
6.39
1.6
3.18
i ¼ 1, . . . ,20:
Assume that investor’s aspiration levels (reference point) according to the objectives are a1 ¼4, a2 ¼4, a3 ¼50, and a4 ¼ 50,
respectively. Twenty one different weight vectors are generated
and again the conic scalarization method is applied. Estimated
and actual mean and variance values are calculated according
to the obtained efficient solutions. The additive utility function
of the form U ¼ Uðf 1 ðxÞ,f 3 ðxÞÞ ¼ w1 f 1 ðxÞw3 f 3 ðxÞ is used to
compare the obtained results, which are presented in Table 9.
As it can be seen from the utility values calculated for both
cases (see, the last columns of Tables 8 and 9, investor reaches the
greater utility values for all weights in the second case. It is also
remarkable that, many efficient solutions obtained for the second
case, dominate the corresponding solutions obtained for the first
case. For example, the values of the vector (f 1 ðxÞ,f 2 ðxÞ) for
scalarization parameters presented on rows 4–20 in Table 9 are
better than the vector (f 1 ðxÞ, f 2 ðxÞ) ¼(3.18, 92.45) obtained for the
first case. Investor gained more return for lower risk in the second
case, than in first case. This example clearly demonstrates the
importance of consideration of distributional properties of
forecasting errors for investors.
7. Conclusion
In this study, we propose a general mean–variance–skewness
model for portfolio optimization. First, a convex combination of
three forecasting methods are used for monthly return predictions. Investment returns and residuals on each series of forecasts
are measured and then evaluated by three performance criteria,
namely mean, variance, and skewness. Subsequently, these
distributional properties of returns and residuals are used in the
multi-objective portfolio optimization model.
One of the main differences of our approach is that, it involves
the objective functions representing distributional properties of
error forecasts (second, fourth, sixth and 11th objective functions),
O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819
the long period preferences (ninth objective function) and the
number of assets (10th objective function) in the portfolio of an
investor. Besides, the size of the problem solved in the case study is
larger (N¼20) than those considered in the literature.
In the case study, the error distribution of the predictions of
monthly returns are tested for normality using Wilk–Shapiro
W-test. The evidence indicates that many of error distributions
exhibit significant skewness. These findings confirm our argument that higher moments of error distributions cannot be
neglected in the portfolio selection. The empirical evidence
suggests that the incorporation of higher moments of error
forecasts into the investor’s portfolio decision causes a major
change in the construction of the optimal portfolio.
The objective functions in this model are scalarized by using
the conic scalarization method in which investor preferences such
as weights and reference points can be easily incorporated. In the
fourth stage, F-MSG algorithm is utilized to solve the scalar
problem obtained. Because the conic scalarization method and
F-MSG algorithm integration does not require convexity assumptions on the mathematical model, it provides significant advantage in practice.
The superiority of the approach presented is demonstrated on
the illustrative examples.
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
References
[1] Wu D, Olson DL. Enterprise risk management: coping with model risk in a
large bank. Journal of the Operational Research Society 2010;61(2):179–90.
[2] Wu D, Olson DL. Enterprise risk management: small business scorecard
analysis. Production Planning and Control 2009;120(4):362–9.
[3] Wu D, Olson DL. Enterprise risk management: a DEA VaR approach in vendor
selection. International Journal of Production Research 2010;48(16):4919–32.
[4] Markowitz H. Portfolio selection. Journal of Finance 1952;7:77–91.
[5] Wu D, Olson DL. Introduction to the special issue on ‘‘Optimizing risk
management: methods and tools’’. Human and Ecological Risk Assessment
2009;15(2):220–6.
[6] Wu D, Olson DL. Supply chain risk, simulation, and vendor selection.
International Journal of Production Economics 2008;114(2):646–55.
[7] Wu D, Xie K, Chen G, Ping G. A risk analysis model in concurrent engineering
product development. Risk Analysis, in press, doi:10.1111/j.1539-6924.2010.
01432.x.
[8] Sharpe WF, Alexander GJ, Bailey JF. Investments. New Jersey: Prentice-Hall
International Inc.; 1999.
[9] Mao JCT. Models of capital budgeting, E-V vs. E-S. Journal of Financial and
Quantitative Analysis 1970;5:657–75.
[10] Simaan Y. Estimation risk in portfolio selection: the mean variance model
versus the mean absolute deviation model. Management Science
1997;43(10):1437–46.
[11] Fishburn DC. Mean-risk analysis with risk associated with below-target
returns. American Economical Review 1977;67:117–26.
[12] Konno H, Suzuki K. A mean–variance–skewness optimization model. Journal
of the Operations Research of Japan 1995;38:137–87.
[13] Lai T. Portfolio selection with skewness: a multiple-objective approach.
Review of Quantitative Finance and Accounting 1991;1:293–305.
[14] Leung MT, Daouk H, Chen AS. Using investment portfolio return to combine
forecasts: a multiobjective approach. European Journal of Operational
Research 2001;134:84–102.
[15] Liu SC, Wang SY, Qiu WH. A mean–variance–skewness model for portfolio
selection with transaction costs. International Journal of Systems Sciences
2003;34(4):255–62.
[16] Kennedy MP, Chua LO. Neural networks for nonlinear programming. IEEE
Transactions on Circuits and Systems 1988;35(3):554–62.
[17] Leoid O, Wolf M. Improved estimation of the covariance matrix of stock
returns with an application to portfolio selection. Journal of Empirical
Finance 2003;10:603–21.
[18] Yu L, Wang S, Lai KK. Neural network-based mean–variance–skewness model
for portfolio selection. Computers & Operations Research 2008;35(1):34–46.
[19] Kasimbeyli R. A nonlinear cone separation theorem and scalarization in
nonconvex vector optimization. SIAM Journal on Optimization 2010;20:
1591–619.
[20] Gasimov RN. Characterization of the Benson proper efficiency and scalarization in nonconvex vector optimization. In: Koksalan M, Zionts S, editors.
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
819
Multiple criteria decision making in the new millennium. Lecture notes in
economics and mathematical systems, vol. 507, 2001. p. 189–98.
Kasimbeyli RN, Ustun O, Rubinov AM. The modified subgradient algorithm
based on feasible values. Optimization 2009;58(5):535–60.
Ehrgott M, Klamroth K, Schwehm C. An MCDM approach to portfolio
optimization. European Journal of Operational Research 2004;155:752–70.
Ehrgott M, Waters C, Kasimbeyli R, Ustun O. Multiobjective programming
and multiattribute utility functions in portfolio optimization. INFOR
2009;47:31–42.
Makridakis S, Winkler RL. Averages of forecasts. Management Sciences
1983;29:987–96.
Hansen BE. Least-squares forecast averaging. Journal of Econometrics
2008;146:342–50.
Box G, Jenkins G. Time series analysis: forecasting and control. San Francisco:
Holden-Day; 1976.
Hanke JE, Reitsch AG. Business forecasting. New Jersey: Prentice-Hall Inc.;
1998.
Bates JM, Granger CWJ. The combination of forecasts. Operational Research
Quarterly 1969;20:451–68.
Newbold P, Granger CW. Experience with forecasting univariate time series
and the combination of forecasts. Journal of Royal Statistical Society
1974;137:231–46.
Russel TD, Adam Jr. EE. An empirical evaluation of alternative forecasting
combinations. Management Sciences 1987;31:1267–76.
Clemen RT. Combining forecasts: a review and annotated bibliography.
International Journal of Forecasting 1989;5:559–83.
Shapiro PJ, Wilk MB, Chen HJ. A comparative study of various tests for
normality. Journal of American Statistical Association 1968;63:1343–72.
Yu PL. A class of solutions for group decision problems. Management Science
1973;19:936–49.
Zeleny M. Compromise programming. In: Cochrane JL, Zeleny M, editors.
Multiple criteria decision making. Columbia: University of South Carolina;
1973. p. 262–301.
Steuer RE. Multiple criteria optimization: theory, computation and application. New York: Wiley; 1986.
Lewandowski A, Wierzbicki AP. Decision support systems using reference
point optimization, aspiration based decision support systems. Berlin:
Springer; 1989.
Gearhart WB. Compromise solutions and estimation of the noninferior set.
Journal of Optimization Theory and Applications 1979;28(1):29–47.
Kaliszewski I. A modified weighted Tchebycheff metric for multiple objective
programming. Computers & Operations Research 1987;14:315–23.
Schandl B, Klamroth K, Wiecek MM. Introducing oblique norms into multiple
criteria programming. Journal of Global Optimization 2002;23(1):81–97.
Wierzbicki AP. The use of reference objectives in multiobjective optimization.
In: Fandel G, Gal T, editors. MCDM theory and application, proceedings.
Lecture notes in economics and mathematical systems, vol. 177. Hagen:
Springer Verlag; 1980. p. 468–86.
Wierzbicki AP. Reference point methods in vector optimization and decision
support. Interim Report IR-98-017/ April, International Institute for Applied
Systems Analysis, Laxenburg, Austria; 1998.
Miettinen KM, Makela MM. On scalarizing functions in multiobjective
optimization. OR Spectrum 2002;24:193–213.
Chang TJ, Meade N, Beasley JE, Sharaiha YM. Heuristics for cardinality
constrained portfolio optimisation. Computers & Operations Research
2000;27(13):1271–302.
Ozdemir MS, Gasimov RN. The analytic hierarchy process and multiobjective
0–1 faculty course assignment. European Journal of Operational Research
2004;157(2):398–408.
Gasimov RN, Sipahioglu A, Sarac T. A multi-objective programming approach
to 1.5-dimensional assortment problem. European Journal of Operational
Research 2007;179(1):64–79.
Ismayilova NA, Sagir M, Gasimov RN. A multiobjective faculty-course-time
slot assignment problem with preferences. Mathematical and Computer
Modelling 2007;46:1017–29.
Bazaraa MS, Sherali HD, Shetty CM. Nonlinear programming: theory and
algorithms. New York: John Wiley & Sons Inc.; 1993.
Azimov AY, Gasimov RN. Stability and duality of nonconvex problems via
augmented Lagrangian. Cybernetics and Systems Analysis 2002;38:412–21.
Gasimov RN. Augmented Lagrangian duality and nondifferentiable optimization methods in nonconvex programming. Journal of Global Optimization
2002;24:187–203.
Gasimov RN, Rubinov AM. On augmented Lagrangians for optimization
problems with a single constraint. Journal of Global Optimization
2004;28(2):153–73.
Rubinov AM, Gasimov RN. The nonlinear and augmented Lagrangians for
nonconvex optimization problems with a single constraint. Applied and
Computational Mathematics 2002;1:142–57.
Burachik RS, Gasimov RN, Ismayilova NA, Kaya CY. On a modified subgradient
algorithm for dual problems via sharp augmented Lagrangian. Journal of
Global Optimization 2006;34:55–78.