Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Computers & Operations Research 39 (2012) 805–819 Contents lists available at ScienceDirect Computers & Operations Research journal homepage: www.elsevier.com/locate/caor Combined forecasts in portfolio optimization: A generalized approach Ozden Ustun a, Refail Kasimbeyli b,,1 a b Dumlupinar University, Faculty of Engineering, Department of Industrial Engineering, The Central Campus, 43100 Kutahya, Turkey Izmir University of Economics, Faculty of Engineering and Computer Sciences, Department of Industrial Systems Engineering, Balc- ova 35330, Izmir, Turkey a r t i c l e in fo abstract Available online 29 September 2010 In this paper a general mathematical model for portfolio selection problem is proposed. By considering a forecasting performance according to the distributional properties of residuals, we formulate an extended mean–variance–skewness model with 11 objective functions. Returns and return errors for each asset obtained using different forecasting techniques, are combined in optimal proportions so as to minimize the mean absolute forecast error. These proportions are then used in constructing six criteria related to the mean, variance and skewness of return forecasts of assets in the future and forecasting errors of returns of assets in the past. The obtained multi-objective model is scalarized by using the conic scalarization method which guarantees to find all non-dominated solutions by considering investor preferences in non-convex multi-objective problems. The obtained scalar problem is solved by utilizing F-MSG algorithm. The performance of the proposed approach is tested on a real case problem generated on the data derived from Istanbul Stock Exchange. The comparison is conducted with respect to different levels of investor preferences over return, variance, and skewness and obtained results are summarized. & 2010 Elsevier Ltd. All rights reserved. Keywords: Multiple objective programming Risk management Global optimization Portfolio optimization Conic scalarization F-MSG algorithm 1. Introduction In recent years, risk management has attracted a great deal of attention from both researchers and practitioners. Complexity and uncertainty in many practical problems require new methods and tools. Risk management can be defined as the process of identification, analysis, and either acceptance or mitigation of uncertainty in investment decision-making. Risk management can be used as a tool for greater rewards, not just control against loss. Risks are studied from different silo disciplinary perspectives, with a discussion of how various methods and tools are used to optimize risk management the field of financial risk management has experienced a fast and advanced growth at an incredible speed [1–3]. The traditional financial risk management approach is based on mean–variance framework of portfolio theory, that is, selection and diversification [4]. It is recognized in finance that risk can be understood in two types: systematic (nondiversifiable) risk, which is positively correlated with the rate of return, and unsystematic (diversifiable) risk, which can be diversified by increasing the number of securities invested [5–7]. The mean–variance model originally introduced by Markowitz [4] plays an important and critical role in modern portfolio theory.  Corresponding author. Tel.: +90 232 4888418; fax: + 90 232 4888475. E-mail addresses: oustun@dpu.edu.tr (O. Ustun), refail.kasimbeyli@ieu.edu.tr (R. Kasimbeyli). 1 The author published under the name Rafail N. Gasimov until 2007. 0305-0548/$ - see front matter & 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.cor.2010.09.008 Markowitz’s portfolio model is a bi-criteria optimization problem where a reasonable trade-off between return and risk is considered by minimizing risk for a given level of expected return, or by maximizing expected return for a given level of risk. Since Markowitz’s pioneering work [4] was published, the mean– variance model has revolutionized the way people think about a portfolio of assets. This model has gained widespread acceptance as a practical tool for portfolio optimization, and numerous later studies have examined the issue of risk diversification. With the continuous effort of various researchers, Markowitz’s seminal work has been widely extended. The mean–variance analysis provides a basis for the derivation of the equilibrium model known variously as the Capital Asset Pricing Model (CAPM), Sharpe–Lintner model, black model and two-factor model [8]. Extended models mainly include the mean semi-variance model [9], mean absolute deviation model [10], mean target model [11] and some frictional models, such as those reported in [9]. A distinct characteristic of these studies is that only the first two moments of return distributions are taken into account. But there is a controversy over the issue of whether higher moments should be accounted for in portfolio selection. Many academic researchers (see, e.g., [12]) argued that the higher moments cannot be neglected unless there is a reason to believe that the asset returns are normally distributed or the utility function is quadratic, or that the higher moments are irrelevant to the investor’s decision. As a result, in some recent studies, such as [12–15], the concept of mean–variance trade-off has been extended to include the skewness of return in portfolio selection. In other words, a 806 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 mean–variance–skewness trade-off model for portfolio selection has been generated. One problem with the mean–variance– skewness trade-off model for portfolio selection is that it is not easy to find a trade-off between the three objectives because this is a non-smooth multi-objective optimization problem. Until now, many methods used to tackle this problem have been restricted to goal programming or linear programming techniques. For example, Lai [13] gave a goal programming procedure that performs portfolio selection based on competing and conflicting objectives by maximizing both expected return and skewness while minimizing the risk associated with the return (i.e., variance). Liu et al. [15] also transformed the mean–variance–skewness model with transaction costs into a linear programming problem and verified its efficiency via a numerical example. However, the main disadvantage of these algorithms is that they generally converge slowly, if at all [16]. Furthermore, most existing studies only present some numerical examples with realized data. Because investment decisions are oriented towards the future, realized returns are of no use here. The benefits of portfolio optimization critically depend on how accurately the involved measures are forecasted. In general, opinion is that, it is very difficult to predict the expected return, variance, and skewness, and therefore reducing forecast errors in portfolio management becomes very important. The role of covariance matrix of asset returns and its estimation procedure has been addressed by using shrinkage estimators [17]. Most portfolio selection models in the literature only consider the distribution properties of investment returns; other factors, such as investors’ risk preferences and trading strategies, are not taken into account. Unlike others, these important factors were considered in the integrated approaches to portfolio selection proposed by Leung et al. [14]. The authors proposed a multiobjective approach which explicitly encompasses the average return, expected variance, and skewness in combining forecasts. They dealt with the practical issue of how to incorporate different analysts’ forecasts into an investment trading model and used the polynomial goal programming approach of Lai [13] to scalarize the multi-objective model. The scalarized problem, then is solved by using a standard program package. In [18], a new artificial intelligence technique is introduced and a fast and efficient neural network-based methodology to solve the trade-off of the mean–variance–skewness model is proposed. The authors used the weighted sum scalarization method to scalarize the multi-objective model. In the solution process of the scalarized model which is a constrained optimization problem, the first order Karush–Kuhn–Tucker conditions are used. Note that both the weighted sum scalarization method and the Karush–Kuhn–Tucker conditions are well-known tools efficiently used in solving convex problems. In this paper, by considering the forecasting performance according to the distributional properties of the residuals, we formulate an extended mean–variance–skewness model with 11 objective functions. Returns and return errors obtained using different forecasting techniques for each asset are combined in optimal proportions so as to minimize the mean absolute forecast error. These data are then used in constructing the six criteria related to the mean, variance and skewness of return forecasts of assets in the future and forecasting errors of returns of assets in the past. We construct the 11-objective portfolio optimization model, which involves the objective functions representing distributional properties of error forecasts, the long period preferences and the number of assets in the portfolio of an investor. We propose the integrated approach to the solution of this problem, that combines four stages—forecasting, modeling, scalarization and solving the scalarized problem. One of the main difficulties in solving the multi-objective portfolio optimization problems arises in the scalarization stage of the constructed mathematical model. The scalarization method used for this purpose, must guarantee the calculation of all the efficient solutions of the problem under consideration. Unfortunately, because of the non-convexity of such models, many methods existing in the literature cannot guarantee to calculate the efficient solutions corresponding to the preferences of decision maker (see [19]). In this paper, to scalarize the multi-objective mathematical model constructed, we use the conic scalarization method suggested by Kasimbeyli (see [19,20]). This method allows to find all the efficient (Pareto optimal) solutions corresponding to decision maker’s preferences in multi-objective optimization problems without convexity and boundedness assumptions. We present new features of this method and apply it to combine the 11-objective functions of our model. The problem obtained after scalarization is not only non-convex but also non-differentiable (as many scalarized portfolio optimization problems). In the fourth stage, the F-MSG algorithm (see, [21]) is utilized to solve the non-differentiable and non-convex scalar optimization problem. To the best of our knowledge, the 11-objective portfolio optimization model proposed in this paper is not investigated in the literature earlier. Some of the objective functions involved were considered by different authors. For example, 7th and 8th objective functions were considered by Ehrgott et al. [22,23]. The first (expected performance in the future), the third (variance) and the fifth objective functions (skewness) were considered by Yu et al. [18]. One of the main differences of our approach is that, in addition to these objective functions, our model involves also the ones representing distributional properties of error forecasts (second, fourth, sixth and 11th objective functions), the long period preferences (ninth objective function) and the number of assets (10th objective function) in the portfolio of an investor. Besides, the size of the problem solved in the case study is larger (N ¼20) than those considered in the literature. Additionally, the proposed model eliminates some drawbacks of the previous models. Ehrgott et al. [22] ignored the skewness of return. Their five-objective model is based on purely historical data and the authors used a utility function method for scalarizing the proposed model. The differences between the utility function and the conic scalarization approaches are presented in [23]. Leung et al. [14] used the polynomial goal programming to scalarize a mean–variance–skewness model. The most important weakness of traditional forms of goal programming approach is that, it does not guarantee to calculate efficient (Pareto optimal) solutions. Leung et al. [14] did not describe the optimization method used for solving the scalarized problem. Yu et al. [18] used the weighted-sum scalarization method to scalarize a mean– variance–skewness model. The weighted-sum scalarization method guarantees to calculate all efficient solutions only in convex case. They used the first order Karush–Kuhn–Tucker conditions and ordinary Lagrangian function to find optimal solution of the scalarized problem. Thus the approach used is restricted by differentiability and convexity conditions. The performance of the integrated approach is tested on a real case problem constructed on the data derived from Istanbul Stock Exchange. The comparison is conducted with respect to different levels of investor preferences over return, variance, and skewness and obtained results are summarized. In the case study, the error distribution of the predictions of monthly returns are tested for normality using Wilk–Shapiro W-test. The evidence indicates that many of error distributions exhibit significant skewness. These findings confirm our argument that higher moments of error distributions cannot be neglected in the portfolio selection. The empirical evidence O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 suggests that the incorporation of higher moments of error forecasts into the investor’s portfolio decision causes a major change in the construction of the optimal portfolio. The paper is organized as follows. In Section 2, a model for combining forecasts based on mean absolute forecasting errors is described. A general portfolio selection model combined with different analyst’ forecasts given in Section 3. In Section 4, the scalarization method for the multi-objective model is explained. In this section a nice illustrative example whose data are drawn from [43] is also presented. The explanations how to use the F-MSG algorithm for solving the obtained scalar problem are given in Section 5. An empirical example demonstrating the efficiency of the proposed approach is given in Section 6. In this section, to compare the performance of the proposed approach, an additional model suggested by Yu et al. [18] is analyzed. Finally, Section 7 presents some conclusions on the paper. 2. Combination of different forecasts Before any multi-objective investment portfolio can be constructed, we need to obtain the forecasts from different sources (or analysts). To accomplish this task, we mimic the estimates made by different analysts by creating the forecast series using various forecasting techniques. In a study exploring the effects of combining forecasts generated by different techniques, Makridakis and Winkler [24] tested 1001 time series and its subset of 111 series with 14 individual forecasting techniques. They found that the variance of overall mean absolute percentage error decreases exponentially with the increasing number of techniques. According to their empirical results, there is a drastic improvement in this performance measure when the number of techniques used in forecast combination increases from 2 to 4. While a broad consensus is that forecast combination improves forecast accuracy, there is no consensus concerning how to form the forecast weights. The most recent literature has focused on two particularly appealing methods—simple averaging and Bayesian averaging [25]. Guided by these results, three forecasting techniques – autoregressive integrated moving average, exponential smoothing, and time series decomposition – are chosen to generate series of forecasts. In the following sections we briefly summarize the logic of each forecasting technique and describe the model used for the combination of different forecasts based on mean absolute forecasting error. 2.1. Autoregressive integrated moving average (ARIMA) technique Since the seminal work of Box and Jenkins, ARIMA models have been widely used in the forecasting of economic and financial time series [26]. The general ARIMA(p,n,q) model can be written as gt ¼ a0 þa1 gt1 þ    þ ap gtp þ et þb1 et1 þ    þbq etq , ð1Þ where gt is the series of interest and et is the residual in period t, ai and bj are coefficients of AR and MA terms, respectively, for i¼1,y,p; j¼1,y,q. p is the number of autoregressive terms, n indicates the degree of differencing used to obtain stationarity, and q is the number of lagged forecast errors in Eq. (1). In our experiment, we follow Box and Jenkins’ three stage method aimed at selecting an appropriate model for estimating and forecasting a univariate time series. This method consists of identification, estimation, and diagnostic checking stages [14]. 2.2. Exponential smoothing (ES) Exponential smoothing is a procedure for continually revising an estimate in the light of more recent experience. This method is 807 based on averaging (smoothing) past values of a series in a decreasing (exponential) manner. The observations are weighted, with more weight being given to the more recent observations. More formally, the exponential smoothing equation is g^ t þ 1 ¼ agt þð1 þ aÞg^ t ¼ g^ t þ aðgt g^ t Þ ¼ g^ t þ aet , ð2Þ where g^ t is the forecast for period t, gt is the actual observation made in period t, a is the smoothing constant (0 o a o1), and et is the forecast error or residual in period t (et ¼ gt g^ t ). The value assigned to a is the key to the analysis. One method of estimating the a is an iterative procedure that minimizes the mean square error due to a. The value of a is generally determined by using standard statistical softwares such as MINITAB, RATS, SAS, etc. 2.3. Time series decomposition (TSD) One approach to the analysis of time series data involves an attempt to identify the component factors that influence each of the periodic values in a series. This identification procedure is called decomposition. Each component is identified separately so that the time series can be projected into the future and used for both short-run and long-run forecasting. For a series measured in time periods less than a year, such as monthly and quarterly time series, each original value is considered to be the multiplicative product of four components, as shown in the following equation: g ¼ TR  CY  SE  IR, ð3Þ where g is the actual observations, TR is the trend component, CY is the cyclical component, SE is the seasonal component, and IR is the irregular component. It would be impractical to supply a complete procedure of the TSD method, instead we refer the reader to the book by Hanke and Reitsch [27, pp. 302–356]. 2.4. A model for combining forecasts based on mean absolute forecast error (MAFE) The traditional trading approach which is guided by a single forecasting technique often leads to a unique trading pattern, and requires an investor to choose a specific forecasting model. However, there is no guarantee that the performance of a certain model is consistent from period to period. Instead, our approach considers the optimal convex combination of data obtained by using different forecasting techniques. The weights for constructing an optimal convex combination are found by minimizing the mean absolute forecast error. Constructing a forecast portfolio by this manner can possibly alleviate the impact resulted from an unexpected performance deterioration of a single forecasting model. The idea of ‘‘forecast combination’’ was first studied by Bates and Granger [28]. In their seminal work, the authors proposed a linear combination of two forecasts with weights selected to minimize the predicted forecast error variance. This research study was later extended by Newbold and Granger [29] which formed consensus from more than two forecasts. The superior performance of combining forecasts over individual approaches was illustrated in the extensive empirical evaluation conducted by Makridakis and Winkler [24] and Russel and Adam [30]. A comprehensive review of works in this research stream can be found in Clemen [31]. Let eitk be the forecast error generated by a forecasting technique k in time period t for asset i, where k¼1,2,y,M, t¼ 1,2,y,T and i¼1,2,y,N. M denotes the number of forecasting models, T denotes the number of historical time periods, and N is the number of available assets. A mathematical model minimizing the mean absolute forecast error for each asset i is formulated 808 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819  R i be the average return of asset i based on the combined as follows: ðMAFEi Þ minimize T X M X forecasts over the next P months (in percent): lik jeitk j=T t ¼1k¼1 subject to M X lik ¼ 1, lik Z 0, k ¼ 1,2, . . . ,M,  eit be the combined forecasting error of the return of asset i where lik is the weight of the forecast k for asset i. MAFE measures the forecast accuracy by averaging the magnitudes of the forecast errors (absolute values of each error). MAFE is most useful when the analyst wants to measure forecast error in the same units as the original series. There are many other methods for evaluating a forecasting technique such as the mean squared error (MSE), the mean absolute percentage error (MAPE), the mean percentage error (MPE), etc. A series of the predicted returns and their residuals for each asset are simultaneously outputs and inputs of the forecasting and the mathematical modeling stages, respectively. Let Ritk be a forecast of return generated by a forecasting technique k in (future) time period t of asset i, where i¼1,2,y,N, k¼1,2,y,M, t¼ T+1,y,T+P and P denotes the number of future time T periods. Also, let li ¼ ðli1 , li2 , . . . , liM Þ be the transpose of the optimal weight vector obtained from the problem (MAFEi) used to combine these forecasts. Consequently, the return of asset i in period t based on the combined forecasts, can be described as follows: M X lik Ritk , i ¼ 1,2, . . . ,N; t ¼ T þ 1, . . . ,T þP: ð4Þ k¼1 eit ¼ obtained from Eq. (5) for time period t,  e i be the average forecasting error of the return of asset i over the last T months (in percent): e i ¼ ð1=TÞ T X eit , t¼1  sij be the covariance between the returns of assets i and j, based on the combined forecasts over the next P months: TX þP sij ¼ ð1=PÞ ðRit R i ÞðRjt R j Þ, t ¼ T þ1  s^ ij be the covariance between the combined forecasting errors of assets i and j over the last T months: s^ ij ¼ ð1=TÞ T X ðeit e i Þðejt e j Þ, t¼1  s3i denotes the skewness of the returns of asset i, based on the combined forecasts over the next P months: By the same manner, we can describe the combined error. Again, let eitk be a forecasting error of return generated by a forecasting technique or an analyst k in time period t of asset i. Then, the forecasting error of return of asset i in period t based on the combined forecasts can be described as M X Rit , t ¼ T þ1 k¼1 Rit ¼ TX þP Ri ¼ ð1=PÞ lik eitk , i ¼ 1,2, . . . ,N; t ¼ 1,2, . . . ,T: ð5Þ k¼1 The combined returns and errors of returns for each asset found by using the optimal weights will be used as input parameters in constructing a mean–variance–skewness model in the next section. 3. An extended mean–variance–skewness model TX þP s3i ¼ ð1=PÞ ðRit R i Þ3 , t ¼ T þ1  siij and sijj be the coskewness’ between the returns of assets i and j, based on the combined forecasts over the next P months: TX þP siij ¼ ð1=PÞ ðRit R i Þ2 ðRjt R j Þ, t ¼ T þ1 TX þP sijj ¼ ð1=PÞ ðRit R i ÞðRjt R j Þ2 , t ¼ T þ1  s^ 3i denotes the skewness of the forecasting error of asset i, based on the combined forecasts over the last T months: In this section we present the portfolio optimization problem with 11-objective functions in the following form. Parameters: Let  pit be the price (value) of the asset i in the time period t,  R1it be the relative change of the price of asset i over time period t (in percent): R1it ¼ ðpi,t pi,t1 Þ=pi,t1 , 3 s^ i ¼ ð1=TÞ T X ðeit e i Þ3 , t¼1  s^ iij and s^ ijj be the coskewness’ between the forecasting errors of assets i and j, based on the combined forecasts over the last T months: s^ iij ¼ ð1=TÞ T X ðeit e i Þ2 ðejt e j Þ, t¼1  R12 i be the relative change of the price of asset i over the last 12 months (in percent): R12 i ¼ ðpi,T pi,T12 Þ=pi,T12 ,  R36 be the relative change of the price of asset i over the last i three years (in percent): R36 i ¼ ðpi,T pi,T36 Þ=pi,T36 , s^ ijj ¼ ð1=TÞ T X ðeit e i Þðejt e j Þ2 , t¼1  lwi be the minimum proportion that must be held of asset i if any of the asset i is held,  upi be the maximum proportion that can be held of asset i if any of the asset i is held,  vi be the Wilk–Shapiro test statistic value according to the  Rit be the return of asset i in period t based on the combined forecasts obtained from Eq. (4), residual distributions of the asset i,  sri be the number of stars assigned to the asset i. 809 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Decision variables: Let, and errors are defined as xi be the proportion (0 rxi r 1) held of asset i, zi be 1, if any of asset i is held, 0 otherwise. f5 ðxÞ ¼ f6 ðxÞ ¼ f1 ðxÞ ¼  xi R i :  individual asset i measures the average forecasting error of the return of the asset over the last T months (in percent) and is therefore a measure for the long term expected error. The average forecasting error of a portfolio is then given by xi e i : i¼1  A general opinion is that, it is very difficult to predict the expected returns for all assets, and that the optimization process is extremely sensitive to forecasting errors. Therefore, the average forecasting error of the portfolio cannot be neglected in the portfolio optimization process. For example, assume that two assets with annual estimated returns R 1 ¼ R 2 ¼ 5 (in percent) are available for the investment. Since these returns are equal, the corresponding assets might equally be preferred by an investor. Now assume that the average forecasting errors of these return values are given to be e1 ¼ 0:1 and e2 ¼ 0:5 (in percent), respectively. When the average returns and forecasting errors of assets are simultaneously considered, the second asset becomes more attractive than the first one. The reason is the following. The positive average forecasting error shows that the realized return exceeds the estimated one. Since an investor is interested in maximizing the average return, he (or she) will prefer the assets with positive average forecasting error. This analysis leads to the assertion that, in general, the asset with the greater average forecasting error should be preferred. Variances: The third objective function is a risk function related to the covariances of returns based on the combined forecasts over the next P months: f3 ðxÞ ¼ N X N X xi xj sij : i¼1j¼1 The fourth objective function is also a risk function related to the covariances of forecasting errors based on the combined forecasts over the last T months: f4 ðxÞ ¼ N X ðx2i xj siij þxi x2j sijj Þ, i ¼ 1 j ¼ 1ðj a iÞ 3 x3i s^ i þ 3 N X N X ðx2i xj s^ iij þ xi x2j s^ ijj Þ: i ¼ 1 j ¼ 1ðj a iÞ The positive skewness of forecasting errors shows that the realized skewness of returns exceeds their estimated skewness. Therefore, investor desires to maximize the skewness of excess return of a portfolio. 12-Month performance: The 12-month performance R12 of an i individual asset i measures the relative change of the price of the asset over the last 12 months (in percent) and is therefore a measure for the short term expected return. In particular, the 12-month performance of a portfolio is given by f7 ðxÞ ¼  Average forecasting error: The average forecasting error e i for an f2 ðxÞ ¼ N X N X xi R12 i i¼1 i¼1 N X N X i¼1  Expected performance in the future: The expected performance N X x3i s3i þ 3 i¼1 Objective functions: The generalized mean–variance–skewness model of this paper consists of the following 11-objective functions. R i of an individual asset i measures the average return of the asset over the next P months (in percent). We assume that the returns follow some statistical distribution, so that the expected return in the future can be obtained as a ‘‘weighted sum’’ of the expected return of the individual assets in the portfolio. Therefore the corresponding objective function f1(x) is formulated as N X N X N X xi xj s^ ij : i¼1j¼1  Skewnesses: In the similar way, the fifth and sixth objective functions related to the skewnesses of the portfolio returns again assuming a statistical distribution of the R12 values. i 3-Year performance: By using the long term performance R36 i of an individual asset i, the corresponding 3-year performance of a portfolio is given by f8 ðxÞ ¼ N X xi R36 i i¼1  again assuming a statistical distribution of the R36 values. i Investor’s star ranking: The 5-year average monthly performance of an asset i is calculated as 1 Ri ¼ 60 X R1it =60: t¼1 Then the investor or financial manager assigns between one star (for a relatively poor performance) and up to five stars (for a very good performance) by comparing the 5-year average periodic performances of the available assets. We will assume in the following that the ranking is additive in the sense that the ranking of a portfolio of investment funds can be obtained as the weighted sum of the rankings of the individual investment funds in the portfolio. Consequently, the ninth objective function can be written as f9 ðxÞ ¼ N X xi sr i : i¼1  Number of assets in the portfolio: The constant and variable transaction costs generally depend on the number of assets in a portfolio. Therefore, investor may be interested in minimizing the number of assets in a portfolio f10(z): f10 ðzÞ ¼ N X zi : i¼1  The Wilk–Shapiro test statistic: If the combined forecast model is correct, the residuals should be normally distributed. For taking into account the test results for normality of residual distributions, we use the Wilk–Shapiro test (WS-test). This test is selected as the methodology, since it is found to be the best for testing normality under various alternative specifications of the probability distribution [32]. Specifically, the WS-test tests the hypothesis: Hip0: The parent population is normally distributed. Hip1: The parent population is not normally distributed. To determine whether to reject the null hypothesis of normality it is necessary to examine the probability associated 810 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 with the WS-statistic. If this probability is less than some specified level (such as 0.05), it means that the null hypothesis cannot be supported at the five percent level of significance. We will assume in the following that the V values of the WStest statistics are additive in the sense that the V value of a portfolio of investment assets can be obtained as the weighted sum of the V values of the individual investment assets in the portfolio. The weighted V value of a portfolio is given by f11 ðxÞ ¼ X T V ¼ N X xi vi : i¼1 Thus, our generalized mean–variance–skewness model for portfolio optimization problem (with 11-objective functions) can be written as follows: 8 N X > > > > max f1 ðxÞ ¼ xi R i , > > > > i¼1 > > > N > X > > > max f2 ðxÞ ¼ xi e i , > > > > i¼1 > > > > N X N > X > > > min f3 ðxÞ ¼ xi xj sij , > > > > i ¼ 1j ¼ 1 > > > > N X N > X > > > min f4 ðxÞ ¼ xi xj s^ ij , > > > > i ¼ 1j ¼ 1 > > > > N N N > X X X > > > x3i s3i þ3 ðx2i xj siij þxi x2j sijj Þ, max f5 ðxÞ ¼ > > > > i ¼ 1 i ¼ 1 j ¼ 1ðj a iÞ > > > > > N N N X X X > > 3 > > x3i s^ i þ3 ðx2i xj s^ iij þ xi x2j s^ ijj Þ, max f6 ðxÞ ¼ > > > > i ¼ 1 i ¼ 1 j ¼ 1ðj a iÞ > > < N X ð6Þ ðPÞ : max f ðxÞ ¼ xi R12 7 > i , > > > i ¼ 1 > > > N > X > > > xi R36 max f8 ðxÞ ¼ > i , > > > i ¼ 1 > > > > N > X > > > xi sri , max f9 ðxÞ ¼ > > > > i¼1 > > > > N > X > > > zi , > min f10 ðzÞ ¼ > > > i¼1 > > > > N X > > > > xi vi , max f11 ðxÞ ¼ > > > i¼1 > > > > N > X > > > s:t: hðxÞ ¼ xi 1 ¼ 0, > > > > i¼1 > > > > > lwi zi r xi r upi zi , i ¼ 1, . . . ,N, > > > : x A ½0,1, z A f0,1g, i ¼ 1, . . . ,N: i i The constraint N X xi ¼ 1 ð7Þ i¼1 ensures that the proportions add to one. The constraints lwi zi rxi rupi zi , i ¼ 1, . . . ,N ð8Þ ensures that if the asset i is held (zi ¼ 1) its proportion xi must lie between lwi and upi, whilst if the asset i is not held (zi ¼0) its proportion xi is zero. It is important to note that the objective functions f6, f7, f8 and f9(x) can be non-concave in general. With the combination of continuous and binary variables, the resulting problem becomes a non-convex and non-differentiable multi-objective mixed integer programming problem. In this paper, to find a set of efficient solutions of (P) we use a special scalarization method for scalarizing the obtained multiobjective programming problem. 4. Scalarization We begin this section with standard definitions from multiobjective programming. Let Rpþ :¼ fy ¼ ðy1 , . . . ,yp Þ : yk Z 0,k ¼ 1, . . . ,pg, and let Y  Rp be non-empty set. Definition 1. 1. An element yA Y is called non-dominated if ðfygRpþ Þ \Y ¼ fyg. 2. An element y A Y is called weakly non-dominated if ðfyg intðRpþ ÞÞ \ Y ¼ |. 3. An element yA Y is called properly non-dominated (in the sense of Benson) if y is a non-dominated element of Y and the zero element of Rp is a non-dominated element of clðconeðY þ Rpþ yÞÞ, where clðYÞ denotes the closure of a set Y and coneðYÞ :¼ fay : a Z 0,yA Yg. The set of all non-dominated, weakly non-dominated and properly non-dominated elements of Y are denoted by Yn, Ywn and Ypn, respectively. Consider a multi-objective programming problem: min½f1 ðxÞ, . . . ,fp ðxÞ, ð9Þ xAX where X is a non-empty set of feasible solutions and fi : X- R,i ¼ 1, . . . ,p are real-valued functions. Let f(x)¼(f1(x),y, fp(x)) and let Y :¼ f ðXÞ. In the context of the portfolio optimization problem under consideration, we will have ( ) N X N N X ¼ ðx,zÞ A R þ  f0,1g : xi ¼ 1,lwi zi r xi r upi zi , 8i : i¼1 Definition 2. A feasible element x A X is called efficient, weakly efficient or properly efficient solution of multi-objective programming problem (9) if y¼ f(x) is a non-dominated, weakly nondominated or properly non-dominated element of Y, respectively. The sets of efficient, weakly efficient and properly efficient solutions of the multi-objective programming problem are denoted by Xe, Xwe and Xpe, respectively. Over the last decades, various interactive methods and decision support systems have been developed to deal with multi-objective programming problems. Although some of the methods work well only on problems with convex objective functions and a convex feasible region, the most of real life problems such as portfolio optimization, are not convex. Scalarization method is an important tool in the study of multiobjective optimization problems. It is a standard technique for finding efficient solutions. Via scalarization, the multi-objective problem is transformed into a single objective optimization problem involving possibly some parameters or additional constraints. Solving the single objective problem for a range of parameter values yields (some) efficient solutions to the multi-objective programme. One way of constructing such an optimization problem is through using scalarizing functions. Multi-objective optimization methods utilize different scalarizing functions in different ways. In a large variety of methods, norms have usually been used to identify the non-dominated elements that are closest to some (reference) point. In particular, the family of lp norms has 811 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 extensively been studied by many researchers, including Yu [33], Zeleny [34], Steuer [35], Lewandowski and Wierzbicki [36], and many others. Gearhart [37] studied a family of norms including the lp norms. The l1 norm and the augmented l1 norm turned out to be very useful in generating non-dominated solutions of general continuous and discrete multiple criteria programs and led to the well-known weighted (augmented) Tchebycheff scalarization and its variations. Kaliszewski [38] introduced a modified l1 norm and showed its applicability in generating nondominated solutions. Recently, a new class of norms called the oblique norms was proposed in Schandl et al. [39] to generate the properly non-dominated points. Comparisons of multi-objective optimization methods have been realized in different ways. Wierzbicki [40,41] produced seminal research on reference point methods and investigated the characteristics of various scalarizing functions. These scalarizing functions were designed to have a significant advantage over goal programming by producing only non-dominated, or Paretooptimal, solutions. Several widely known scalarizing functions and their modifications are discussed and compared in Miettinen and Makela [42]. A nice comparison between the utility function approach and the use of the conic scalarization approach for calculating efficient solutions of multi-objective portfolio optimization problems is given in [23]. By summarizing the properties of different scalarization techniques, the following questions related to a quality or performance of the technique can be drawn:  Does this method always yields efficient solutions?  If so, can all efficient solutions be detected by this way?  Will the preference and reference point information of the decision maker be taken into consideration by this technique? The simple investigation on different techniques may show that not all methods can positively answer all the questions raised here. In this section we explain a conic scalarization method, the idea of which is based on the characterization theorem for Benson properly non-dominated elements given in [20]. A full explanation of this method is given in [19]. In [20], a special class of monotonically increasing convex functions is introduced. By using this class of functions, characterization theorems via scalarization, for Benson properly non-dominated elements of a set, are proven without any restrictions on the set. Recently, the same class of functions are used by Kasimbeyli who proved theorems fully characterizing efficient solutions of multi-objective problems via scalarization [19]. Now we present the main characterization theorem without proof, which can easily be obtained from Theorems 5.8 and 5.9 presented in [19]. Let W :¼ fðw, aÞ A Rpþ  R : 0 r a ominfw1 , . . . ,wp gg: ð10Þ Theorem 1. Let a ¼ ða1 , . . . ,ap Þ A Rp be a reference point for problem (9). The element y^ is a Benson properly non-dominated element of Y if and only if there exists a pair ðw, aÞ A W such that, y^ A Y is an optimal solution to the scalar minimization problem min yAY p X k¼1 wk ðyk ak Þ þ a p X jyk ak j: ð11Þ k¼1 The following proposition explains some important properties of scalarizing functions gðw, aÞ : Rp -R used in Theorem 1 which are defined for each pair of scalarizing parameters ðw, aÞ as P P gðw, aÞ ðyÞ ¼ pk ¼ 1 wk yk þ pk ¼ 1 jyk j. Proposition 1. The following statements hold: (i) For each ðw, aÞ A W with a 4 0, the ‘‘lower’’ sublevel set C  ðw, aÞ of function gðw, aÞ defined by C  ðw, aÞ ¼ fy A Rp : wuy þ aJyJ1 r0g ð12Þ Rpþ , is a pointed closed convex cone containing where wuy ¼ Pp w y is a scalar product of vectors w and y, and JyJ1 ¼ Pkp ¼ 1 k k jy j is an l norm of the vector y. Note that a cone K is called 1 k k¼1 pointed if K \ ðKÞ ¼ f0g. (ii) ðw, a1 Þ,ðw, a2 Þ A W and a1 r a2 imply C  ðw, a2 Þ D C  ðw, a1 Þ. Proof. (i) Let ðw, aÞ A W with a 4 0. It is clear that the set C  ðw, aÞ is a cone, and since the function gðw, aÞ ðyÞ ¼ wuyþ aJyJ is convex and continuous, it follows that C  ðw, aÞ is convex and closed. Now let yA C  ðw, aÞ. Then wuyþ aJyJ r0: ð13Þ  If to assume yA C ðw, aÞ, then we have wuyþ aJyJ r0: ð14Þ Eqs. (13) and (14) together with the condition a 40 lead to y¼0 and thus prove that C  ðw, aÞ is a pointed cone. Now let y A Rpþ . Then y A Rpþ and since ðw, aÞ A W we have wðyÞaJyJ Z 0, or wuyþ aJyJ r0 which means by the definition of C  ðw, aÞ that yA C  ðw, aÞ. (ii) Let ðw, a1 Þ,ðw, a2 Þ A W, a1 r a2 and let yA C  ðw, a2 Þ. Then, 0 Z wuyþ a2 JyJ Z wuyþ a1 JyJ, which implies that yA Cðw, a1 Þ and thus proves the lemma. & Now we explain how these theorems can be used to scalarize the given multi-objective problem (9). Suppose that decision maker specifies her (or his) preferences by giving a weighting vector w ¼(w1,y,wp), with wi Z0,w1 þw2 þ    þwp ¼ 1, and a reference point a ¼(a1,y,ap). The reference point a can be used in situations when the decision maker may desire to calculate the non-dominated elements that are closest (in a certain sense) to some point a. It is evident that if x^ A X is an efficient solution to problem (9), then it is also an efficient solution to the ‘‘shifted’’ multi-objective programme minðf1 ðxÞa1 , . . . ,fp ðxÞap Þ, xAX ð15Þ where a A Rp is an arbitrary vector. In this case we can formulate the scalarized problem, in the form of (11): ( ) p p X X wk ðfk ðxÞak Þ þ a jfk ðxÞak j , min xAX k¼1 k¼1 where w ¼(w1,y,wp) is a weighting vector with wi Z0,w1 þw2 þ    þwp ¼ 1, a ¼(a1,y,ap) is a reference point specifying decision maker’s preferences and a is a non-negative augmentation parameter satisfying the condition 0 r a r minfw1 , . . . ,wp g. Theorem 1 implies that every solution to the scalar problem (11) for 0 r a o minfw1 , . . . ,wp g is a proper efficient solution to the problem (15), and therefore a proper efficient solution to the problem (9). Note that taking a ¼ 0 reduces (11) to a weighted sum scalarization problem. If x^ A X is an optimal solution to the problem (11) for the case a ¼ 0, then we have ^ wuðf ðxÞaÞ Zwuðf ðxÞaÞ, 8x A X or ^ wuðyaÞ Z wuðyaÞ, 8y A Y ¼ f ðXÞ, ^ The last inequality can also be written as where y^ ¼ f ðxÞ. ^ Z0, wuðyyÞ 8y A Y, 812 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 ^ is a supportwhich means that the non-dominated point y^ ¼ f ðxÞ ing point of the set Y¼ f(X) with respect to the hyperplane ^ ¼ 0g. H ¼ fy A Rp : wuðyyÞ Now let a be a positive number less than minfw1 , . . . ,wp g and let x be an optimal solution of the scalar problem (11). Then we have wuðf ðxÞaÞ þ aJf ðxÞaJ1 Z wuðf ðxÞaÞ þ aJf ðxÞaJ1 , the weighted-sum and the conic scalarization methods. xi R i , i¼1 4 X 4 X minf3 ðxÞ ¼ xi xj sij , i¼1j¼1 8x A X Subject to or wuðyaÞ þ aJyaJ1 ZwuðyaÞ þ aJyaJ1 , 8yA Y ¼ f ðXÞ, ð16Þ f10 ðzÞ ¼ 4 X zi ¼ 2, i¼1 where y ¼ f ðxÞ. The last inequality implies wuðyyÞ þ aJyyJ1 Z 0, 4 X maxf1 ðxÞ ¼ 8y A Y ¼ f ðXÞ: ð17Þ hðxÞ ¼ It follows from the relation (17) that the conic surface Cðw, a; yÞ ¼ fy A Rp : wuðyyÞ þ aJðyyÞJ1 ¼ 0g 0 rxi r zi , with vertex at y is a supporting cone to the set Y at the nondominated point y in the following sense: xi A ½0,1, ð19Þ ð20Þ The supporting philosophy described by the last two relations can equivalently be written also in the form Yfyg  C þ ðw, aÞ ¼ fy A Rp : wuy þ aJyJ1 Z0g ð21Þ and f0g  Yfyg \ Cðw, aÞ ¼ fyA Rp : wuy þ aJyJ1 ¼ 0g: i ¼ 1, . . . ,4, zi A f0,1g, i ¼ 1, . . . ,4: The preference weights of the objective functions f1(x) and f3(x) are selected as w3 ¼ 1 w1 using w1 ¼ 0.01t, t ¼0,1,y,100 and a ¼ minfw1 ; w3 g0:001. The reference point is taken as a¼(0.0026, 0.000656). Efficient points for this problem, calculated by the conic and weighted sum scalarization methods, are depicted in Figs. 1 and 2, respectively. These figures nicely demonstrate the superiority of the conic scalarization method. The efficient points shown in the triangle in Fig. 1 are calculated by the conic scalarization method for the same set of weighting parameters used in the weighted sum scalarization. As it can be seen from Fig. 2, these points could not be calculated by the weighted sum scalarization method. Thus, this example demonstrates that there may be some ‘‘hidden’’ efficient solutions of portfolio optimization problems which cannot be detected by the weighted sum scalarization method. The use of the conic scalarization method allows decision maker to calculate also such solutions. Finally note that, earlier this method was successfully applied in solving multi-objective faculty course assignment problems and a 1.5-dimensional multi-objective assortment problems (see, [44–46]). and y  Y \ Cðw, a; yÞ: xi 1 ¼ 0, i¼1 ð18Þ Y  C þ ðw, a; yÞ ¼ fy A Rp : wuðyyÞ þ aJðyyÞJ1 Z0g 4 X ð22Þ As it was analyzed above, for a ¼ 0 the cone Cðw, aÞ becomes a hyperplane. By Proposition 1, this cone becomes as smaller as a increases, and approaches to a smallest cone containing Rpþ for a maximal possible value of a, i.e., a ¼ minfw1 , . . . ,wp g. So, the scalarized problem (11) allows to find all properly efficient solutions of the multi-objective problem (9) corresponding to the decision maker’s preferences w and a by varying the augmentation parameter a between zero and minfw1 , . . . ,wp g. The following example is prepared to illustrate the role of the conic scalarization method. 5. Modified subgradient algorithm based on feasible values (F-MSG algorithm) Example 1. In order to illustrate the effect of cardinality constraints we consider the small (four asset) problem whose data are drawn from the FTSE data set, [43]. We calculate efficient elements of the following two-objective problem by using both After scalarization, the obtained scalar problem is not only non-convex but also a non-differentiable optimization problem. 0.006 0.005 Return 0.004 0.003 0.002 0.001 0.000 0 0.0005 0.001 0.0015 0.002 0.0025 Variance Fig. 1. Efficient points obtained by the conic scalarization method for Example 1. O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 813 0.006 0.005 Return 0.004 0.003 0.002 0.001 0.000 0 0.0005 0.001 0.0015 Variance 0.002 0.0025 Fig. 2. Efficient points obtained by the weighted-sum scalarization method for Example 1. An important class of exact methods used for solving mixed integer problems with non-convex objective and/or constraints is based on penalty function or augmented Lagrangian approaches. The simple examples show that in many cases the solution to the penalty problem can be made sufficiently close to the optimal solution of the original problem by choosing the penalty parameter large enough. However, solving a penalty problem with a very large penalty parameter leads to computational difficulties (see, for example, [47]). In recent years, different augmented Lagrangian duality schemes which are able to eliminate duality gap in a wide class of non-convex optimization problems and to provide solution algorithms have been studied (see, for example, [48–52]). Modified subgradient (MSG) algorithm proposed by Gasimov [49] has been designed for solving sharp augmented Lagrangian dual problems. However, as many subgradient algorithms, this algorithm also requires finding an unconstrained global minimum of the Lagrangian function and knowing an optimal value of the problem under consideration in order to update dual variables at each iteration. The Modified Subgradient Algorithm Based on Feasible Values, or briefly F-MSG algorithm, recently developed by Kasimbeyli et al. [21], is a generalized version of the modified subgradient (MSG) algorithm. The new algorithm is proposed for solving sharp augmented Lagrangian dual problems, in which some drawbacks of the MSG algorithm were eliminated. In this paper we utilize the F-MSG algorithm for solving the scalarized multi-objective portfolio optimization model developed in Section 3. Next we present the F-MSG algorithm. Consider a nonlinear programming problem: ðSPÞ minimize f0 ðxÞ subject to x A X satisfying hðxÞ ¼ 0, ð23Þ and Hðu,cÞ ¼ minLðx,u,cÞ, xAX respectively. The dual problem (P*) is then given by ðP* Þ : max ðu,cÞ A Rm R þ Hðu,cÞ: Step 1. Choose positive numbers y1 , y2 , D1 , F and a number H1. Set n¼ 1, t ¼0, r¼ 0. Step 2. Choose ðun1 ,c1n Þ A Rm  R þ and ‘ð1Þ 4 0 and set k¼1, uk ¼un1, ck ¼ cn1. Step 3. Given (uk,ck), solve the following constraint satisfaction problem (CSP) P(Hn): Find an element x A X such that f0 ðxÞ þ ck JhðxÞJ /uk ,hðxÞS rHn ,If a solution to (25) does not exist (for example, if ‘ðkÞ 4 F) then go to Step 6, otherwise, if a solution xk exists then check whether h(xk)¼0 or not. If h(xk)¼0 (or if Jhðxk ÞJ r y1 ) then go to Step 5, otherwise go to Step 4. Step 4. Update dual variables as uk þ 1 :¼ uk bsk hðxk Þ, ð26Þ ck þ 1 :¼ ck þ ð1 þ bÞsk Jhðxk ÞJ, ð27Þ where sk is a positive stepsize parameter defined as 0 o sk ¼ dbðHn Lðxk ,uk ,ck ÞÞ 2 ½b þ ð1 þ bÞ2 Jhðxk ÞJ2 ð24Þ ð28Þ or 0 o sk ¼ d½bðHn Lðxk ,uk ,ck ÞÞ þ ðcck ÞJhðxk ÞJ 2 ½b þ ð1 þ bÞ2 Jhðxk ÞJ2 ð29Þ with b 4 0, and 0 o d o2. We will also require the stepsize sk corresponding to the dual variables (uk,ck) to satisfy the following property: sk Jhðxk ÞJ þ ck Juk J 4 ‘ðkÞ, where S is a subset of a metric space X, and f0 : X-R and h : X-Rm are given functions. Denote by R þ ,J  J and /  ,  S the set of non-negative numbers, the Euclidean norm and the Euclidean inner product on Rm , respectively. The sharp augmented Lagrangian L : X  Rm  R þ -R and the dual function H : Rm  R þ -R associated with the problem (SP) are defined as Lðx,u,cÞ :¼ f0 ðxÞ þcJhðxÞJ/u,hðxÞS Algorithm 1. F-MSG algorithm. ð30Þ Set k¼k+ 1, update ‘ðkÞ in such a way that ‘ðkÞ- þ 1 as k- þ 1, and repeat Step 3. Step 5. Let xk be a solution to (25) with h(xk)¼0. In this case L(xk,uk,ck)¼ f0(xk). Set r ¼r + 1 and check t. If t ¼0 then set Dn þ 1 ¼ Dn , otherwise set Dn þ 1 ¼ 12 Dn . Check Dn þ 1 . If Dn þ 1 o y2 then stop, f0(xk) is an approximate optimal value, xk is an approximate primal solution and (uk,ck) is an approximate dual solution; otherwise set Hn þ 1 ¼ minff0 ðxk Þ,Hn Dn þ 1 g,n ¼ n þ 1 and go to Step 2. Step 6. Set t¼t + 1. If r ¼0 then set Dn þ 1 ¼ Dn ; otherwise set Dn þ 1 ¼ 12 Dn . Check Dn þ 1 . If Dn þ 1 o y2 then stop; in this case the last calculated feasible solution xk is an approximate primal solution and (uk; ck) is an approximate dual solution; f0(xk) is an approximate optimal value. Otherwise set Hn þ 1 ¼ Hn þ Dn þ 1 , n¼n + 1 and go to Step 2. 814 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Steps 3 and 4 are concerned with finding a feasible solution to the CSP P(Hn) for a given real number Hn. A positive number y1 chosen at Step 1 is an error bound for checking the feasibility of a solution xk found to CSP P(Hn) at Step 3. If an exact feasible solution at kth iteration with h(xk)¼ 0 cannot be found (within a reasonable time limit), the decision maker may check whether Jhðxk ÞJ r y1 or not. A positive number F chosen at Step 1 is a large number used as a termination tolerance for estimating whether the right-hand side number Hn in (25) is feasible or not. That is if the value of ‘ðkÞ in (30) becomes greater than F not reaching an approximate or exact feasible solution, then this means that a feasible solution to (25) with the right-hand side number Hn does not exist, so the number Hn has to be identified as an infeasible value. In this case Step 6 of the outer loop will be visited. Otherwise the updating formulas (26), (27) together with the stepsize parameters calculated from (28) (or (29)) and (30) will force the sequence of solutions xk of (25), to converge to a feasible solution at Step 4. A positive number D1 chosen at Step 1 is a stepsize parameter used for updating the right-hand side numbers at steps 5 and 6 in the F-MSG algorithm. Choosing a suitable value for this parameter is very important for the performance of the algorithm. In some cases it may happen that a solution xk to (25) with the right-hand side number Hn is a feasible solution (with h(xk)¼0) which provides the same value Hn to the objective function f0. In such cases the next right-hand side value Hn + 1 will be defined as Hn þ 1 ¼ Hn Dn þ 1 , where a value of Dn þ 1 is equal to Dn or Dn =2 in dependence of the value of a parameter t (see steps 5 and 6). It is clear that in dependence of, how successful the value for D1 has been chosen, the updated value of Hn + 1 may become (sufficiently) close to the optimal value of the problem under consideration, or conversely, it may go (too) far from the optimal value. This may seriously affect a convergence rate of the algorithm. When the values of parameters t and r both become different from zero, this means that an interval with feasible and infeasible endpoints has been found. From this iteration on, a value of Dn will decrease at each outer iteration, which results in decreasing of a length of the interval with feasible and infeasible endpoints. Since the optimal value of the problem belongs to this interval, algorithm stops if a value of the parameter Dn þ 1 becomes less than the error bound e2 , and states that a last feasible value is an approximate optimal value. The method applied for solving the subproblem (25) at Step 3 is another factor which may affect the performance of the F-MSG algorithm. In this paper we use the following way for finding a solution to the subproblem (25). We ‘‘minimize’’ a fictitious objective function (which is identically constant) with respect to the single inequality from this subproblem. Such a problem can be solved by using different standard optimization packages. 6. Empirical applications In this section, the four-stage integrated approach is applied to a real case multi-objective portfolio selection problem constructed by using the sample data taken from Istanbul Stock Exchange (ISE) official web site (http://www.ise.org/). The data consist of monthly rates of returns from January 2000 to December 2004, a total of T¼60 months of observations of N ¼20 stocks. The selected 20 stocks are chosen from the main list of firms in ISE 100 index. These firms are ABANA ELEKTROMEKANIK, ALORKO HOLDING, ALORKO CARRIER, ALCATEL TELETAS, ALORKO GMYO, ALTERNATIFBANK, ALTINYILDIZ, ANADOLU CAM, ANADOLU SIGORTA, ARCELIK, ALTERNATIF YAT.ORT., ARSAN TEKSTIL, ASELSAN, FINANSBANK, F-M IZMIT PISTON, FINANS YAT.ORT., FRIGO PAK GIDA, FORD OTOSAN, GARANTI BANK. Financial characteristics of these firms can be easily found on the official web site of ISE (www.ise.gov.tr). One month returns (Turkish Lira (TL) based) of stocks are regularly calculated and published on this web site. The portfolio selection problem related to these data is formulated and solved into four stages. These stages are forecasting, the model formulation, the scalarization of the multi-objective model and the solution of the so-obtained scalarized problem. Stage 1: Forecasting stage: In the forecasting stage, ARIMA, exponential smoothing, and time series decomposition techniques (M ¼3) are used. The MINITAB statistical program (release 13.32) was used for determining the model parameters and specifications for all the three forecasting techniques. By using these techniques, series of a total of P¼12 monthly forecasts from January 2005 to December 2005 for each stock are generated. Obtained forecasts are combined using the optimal weights found by solving the mean absolute forecast error minimization problem for each stock i¼1,y,20 (see, Section 3.4): minimize ðMAFEi Þ 60 X 3 X lik jeitk j=60 t ¼1k¼1 3 X subject to lik ¼ 1, lik Z 0, k ¼ 1,2,3, k¼1 where k¼1 corresponds to ARIMA, k¼ 2 corresponds to exponential smoothing, and k¼3 corresponds to time series decomposition forecasting technique, and lik is the weight of forecasting technique k for asset i. The problem (MAFEi) for each stock i is solved by using GAMS/CPLEX linear programming solver. The optimal weights lik in the combined forecast portfolio are presented in Table 1. For each set of optimal weights ðli1 , li2 , li3 Þ, the combined forecast error eit is calculated as follows: eit ¼ 3 X lik eitk , i ¼ 1,2, . . . ,20; t ¼ 1,2, . . . ,60, ð31Þ k¼1 where eikt is the forecast error generated by a forecasting technique k¼1,2,3 in time period t¼1,2,y,60 for asset i¼1,y,20 (see, Section 3.4). A mean squared error formula used for evaluating the performances of the individual (MSEik) and combined forecasts (MSEi) are presented below: MSEik ¼ 60 X ðeitk Þ2 =60, i ¼ 1,2, . . . ,20, k ¼ 1, . . . ,3, ð32Þ t¼1 MSEi ¼ 60 X ðeit Þ2 =60, i ¼ 1,2, . . . ,20: ð33Þ t¼1 The mean squared error values (MSEik) and (MSEi) are summarized in Table 2. As it can be seen from Table 2, the combined mean squared error values (MSEi) are essentially less than the individual mean squared error values (MSEik). For example, consider asset 18. The Table 1 The optimal weights lik of forecast k for stock i in the combined forecast portfolio. i¼ li1 li2 li3 i¼ li1 li2 li3 1 2 0.7 0 0.31 11 0.32 0 0.68 3 0.32 0 0.68 12 0.23 0.19 0.58 4 0.18 0 0.82 13 0.14 0.32 0.54 5 0.39 0 0.61 14 0.27 0.03 0.7 6 0.12 0 0.89 15 0.07 0 0.93 7 0.39 0 0.61 16 0.44 0 0.56 8 0.39 0.01 0.61 17 0 0.13 0.87 9 0.02 0.07 0.91 18 0.37 0 0.63 10 0.15 0 0.85 19 0.21 0.08 0.72 0 0.25 0.75 20 0.07 0 0.93 815 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Table 2 Mean squared error values. 1 i¼ MSEi MSEi MSEi MSEi 200.56 260.38 232.55 185.41 1 2 3 279.61 349.05 243.89 215.35 11 i¼ MSEi MSEi MSEi MSEi 2 12 403.27 468.21 356.91 320.92 1 2 3 964.42 1062.96 966.4 953.46 3 4 5 6 7 8 9 10 234.91 244.22 196.47 192.76 349.62 412.09 343.44 303.9 235.77 269.12 152.86 153.8 327.57 455.81 340.27 280.93 379.26 319.25 279.08 249.73 285.59 285.59 208.75 205.83 298.87 298.87 177.26 175.8 534.57 534.57 426.14 428.61 13 14 651.82 769.03 522.56 539.9 638.63 638.63 521.09 536.85 15 580.56 580.56 469.62 468.72 Table 3 Mean, variance, skewness and Wilk–Shapiro test statistics related to the forecasting errors for stocks. i Mean ðe i Þ Variance ðs^ i Þ Skewness ðs^ i Þa Wilk–Shapiro statistic (vi) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20  0.88  0.58  0.51  0.47  0.99  1.07  0.33  1.19  0.37  0.51  0.07  0.64  0.06  0.62  0.5  0.93  0.97  0.64 0.09  0.21 185.41 215.35 192.76 303.9 153.8 280.93 249.73 205.83 175.8 428.61 320.92 953.46 539.9 536.85 468.72 261.33 364.29 256.55 253.83 413.18 0.03  0.29 0.1 0.17 0.31 0.3  0.04  0.43 0.22 0.6 0.46 1.81 0.59 1.78 1.42 0.39 0.3 0.18 0.67 1.29 0.99 0.98 0.99 0.99 0.99 0.99 0.98 0.99 0.99 0.98 0.99 0.91 0.98 0.91 0.94 0.99 0.98 0.99 0.98 0.94 2 3 16 a Skewness represents third central moment of the forecasting errors. Since the skewness of a normal distribution is zero, the skewness of a data set can sometimes be examined to provide an informal check of normality. optimal weights related to this asset are depicted in the 18th column of Table 1: li1 ¼ 0:37, li2 ¼ 0, li3 ¼ 0:63. The combined and individual mean squared error values MSE18 ¼256.55 and MSE18,1 ¼425.54, MSE18,2 ¼435.35, MSE18,3 ¼319.9 are calculated by using the formulas (31), (32) and (33). These values are presented in the 18th column of Table 2. 2 Table 3 lists the values of the mean (ei ), variance (s^ i ), skewness 3 ^ (s i ), and Wilk–Shapiro test statistics related to the forecasting errors for each stock. These error statistics allows the investor to more efficiently evaluate a forecasting performance in constituting his or her portfolio. For example, the value 0.88 presented in the ‘‘Mean’’ (2nd) column of Table 3 for the 1st stock shows that the real return was less than the forecasted value for this stock in average 0.88%. The value 0.09 presented for the 19th stock at the same column in this table shows that the real return was greater than the forecasted return value for this stock in average 0.09%. Therefore, the maximization of the mean of forecasting error, that is the function f2(x) may influence to constitute the more beneficial and more efficient portfolio. It can be explained in a similar way that the portfolio with a greater error skewness is more desirable. 2 Since the variance of forecasting errors (s^ i ) characterizes the 2 forecasting risk, the portfolio with a lesser s^ i is more preferable. 17 536.96 398.13 399.32 261.33 18 453.15 453.15 363.83 364.29 425.54 435.35 319.9 256.55 19 20 290.39 314.84 256.2 253.83 444.4 517.91 419.47 413.18 The closeness of the value of Wilk–Shapiro test statistic for forecasting errors to 1 ensures that the forecasting error distribution is close to the normal distribution. When the mean of forecasting errors is close to 0, the maximization of the related objective function (f11(x)) may lead to the forecasting which is unbiased. Investment returns and residuals on each series of forecasts are measured and then evaluated by three performance criteria, namely, mean, variance, and skewness. Subsequently, the outputs of this model are then used as input parameters in the multiobjective mean–variance–skewness model in the second stage. Stage 2: Mathematical modeling stage: All the parameter values (using the data taken from ISE) together with those calculated in the first stage are used to construct the multi-objective mathematical model of the portfolio selection problem (P). Stage 3: Scalarization stage: The scalarization of the multiobjective problem (P) with 11 objectives is performed applying the following steps. Step 3-1: Choose scalarization parameters: the weights w¼(w1,y,w11), the augmentation parameter a and the reference point a ¼(a1,y,a11). The weights w¼(w1,y,w11) and the augmentation parameter a must be taken from the set (see explanations in Section 4): W :¼ fðw, aÞ A R11 þ  R : 0 o a ominfw1 , . . . ,w11 gg: ð34Þ A set of 20 scalarization parameters ðw, aÞ used are presented in Table 4. The reference point a A R11 is chosen as a ¼ (4,0,50,200, 700,7500,150,300,3,4, 0.97). Step 3-2: In accordance with the conic scalarization method explained in Section 4, the scalarized problem is formulated in the following form: ( ) 11 11 X X ^ ^ ðSPÞ : minimize f ðxÞ ¼ a jf ðxÞap j þ wp ðf ðxÞap Þ 0 p p¼1 p p¼1 ð35Þ subject to hðxÞ ¼ 20 X xi 1 ¼ 0, ð36Þ i¼1 lwi zi r xi rupi zi xi A ½0,1,zi A f0,1g for i ¼ 1, . . . ,20, for i ¼ 1, . . . ,20, ð37Þ ð38Þ where the scalarization parameters ðw, a,aÞ determined in Step 3-1 are used, li ¼0.05, ui ¼0.3 for all i ¼ 1, . . . ,20 and f^ i ðxÞ ¼ fi ðxÞ for i¼1,2,5,6,7,8,9,11, f^ j ðxÞ ¼ fj ðxÞ for i¼3,4,10. The reason for taking some objective functions with minus sign is to put the problem to a ‘‘minimization’’ form for all objectives. Stage 4: Solving the scalarized problem: In this stage, F-MSG algorithm is applied to solve the scalarized problem (SP) formed in the previous stage. For a given pair of dual variables u and c, the 816 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Table 4 A set of 20 scalarization parameters. No. a w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 0 0 0 0 0 0 0 0 0 0.49 0.19 0.32 0.19 0.11 0.14 0.09 0.09 0.09 0.09 1 0 0 0 0 0 0 0 0 0 0.5 0.2 0.33 0.2 0.12 0.15 0.15 0.1 0.15 0.09 0 1 0 0 0 0 0 0 0 0 – 0.2 – – 0.12 – – 0.1 – 0.09 0 0 0 0 0 1 0 0 0 0 0.5 0.2 0.33 0.2 0.12 0.175 0.15 0.1 0.15 0.09 0 0 0 0 0 0 1 0 0 0 – – – 0.2 0.12 0.175 0.15 0.1 0.15 0.09 0 0 0 0 0 0 0 1 0 0 – – – 0.2 0.14 0.175 0.15 0.1 0.1 0.09 0 0 0 0 0 0 0 0 1 0 – – – 0.2 0.14 0.175 0.15 0.1 0.1 0.09 0 0 1 0 0 0 0 0 0 0 – 0.2 – – 0.12 – – 0.1 – 0.09 0 0 0 1 0 0 0 0 0 0 – 0.2 – – 0.12 – – 0.1 – 0.09 0 0 0 0 1 0 0 0 0 0 – – 0.33 – – 0.15 – – 0.15 0.09 0 0 0 0 0 0 0 0 0 0.5 – – – – – – 0.15 0.1 0.1 0.09 0 0 0 0 0 0 0 0 0 0.5 – – – – – – 0.1 0.1 0.1 0.09 ‘‘–’’ It means that the investor does not consider the related objectives. Lagrange function L is defined as Lðx,z,u,cÞ ¼ f0 ðxÞ þ cJhðxÞJuhðxÞ, ð39Þ and the set X of primal feasible solutions is defined to be a set of decision variables x and y satisfying relations (37) and (38): X ¼ fðx,zÞ A R20  R20 : lwi zi rxi r upi zi ,xi A ½0,1, zi A f0,1g for i ¼ 1, . . . ,20g: The solution of the subproblem (25) in Step 3 of the F-MSG algorithm is found by solving the following problem: minimize Fðx,zÞ ¼ 0 subject to Lðx,z,uk ,ck Þ r Hn , ðx,zÞ A X, ð40Þ where F(x,z) is a ‘‘fictitious’’ objective function which is identically zero (it can be taken any constant instead of zero). This problem is solved by implementing GAMS/CONOPT/DNLP and GAMS/SNOPT/ DNLP derivative-free solvers. Documentation and information about GAMS are available via World Wide Web at the URL: http:www.gams.com. All the programs have been run on the HP workstation XW6000 with WINDOWS operating system on two processors. Stepsize parameters were calculated by using the stepsize formula (28), with b ¼ 1, d ¼ 1 and D1 ¼ 0:05. The initial Lagrange multipliers are chosen as (u0,c0)¼( 1,1). The terminating parameters are chosen as y1 ¼ 1010 and y2 ¼ 102 . F-MSG algorithm has been applied for the sets of scalarizing parameters presented in Table 4. The corresponding computational results are depicted in Table 5. In Table 5, efficient points presented on rows 1–10 correspond to the set of scalarizing parameters presented in Table 4 on rows 1–10. For each set of scalarizing parameters given on rows 11–20 of Table 4, two efficient points are presented in the corresponding rows of Table 5, where the row written in bold type corresponds to the set of parameters presented in Table 4, and the other row corresponds to the same set of weights with a ¼ 0. As it can be seen from Table 5 different efficient points have been obtained for the cases a ¼ 0 and a a0 (bold type) from 11th to 20th rows. It is remarkable that by changing the value of the augmentation parameter a, different efficient solutions have been obtained for the same set of weights. This situation demonstrates that there may be several efficient points corresponding to the same set of preference weights in non-convex multi-objective optimization problems. The superiority of the conic scalarization method over others is that, by changing the augmentation parameter a in its feasible range (between zero and mini wi), one can calculate all the existing efficient points corresponding to the given set of preference weights. To compare the performance of the proposed approach, we analyze the mean–variance-skewness model suggested by Yu et al. [18] using the approach presented in this paper. Parameters for this model are derived from the combined forecasting values of returns for 20 stocks from ISE. Example 2. maxf1 ðxÞ ¼ 20 X xi R i , i¼1 20 X 20 X minf3 ðxÞ ¼ xi xj sij , i¼1j¼1 maxf5 ðxÞ ¼ 20 X x3i s3i þ 3 i¼1 20 X 20 X ðx2i xj siij þxi x2j sijj Þ, i ¼ 1 j ¼ 1ðj a iÞ subject to hðxÞ ¼ 20 X xi 1 ¼ 0, i¼1 xi Z 0, i ¼ 1, . . . ,20: Let the reference points for three objective functions of this model be a1 ¼4, a3 ¼50, a5 ¼700, respectively. For making the comparison easy, the same reference points as in the main example are used. We have chosen six different weight vectors and solve the scalarized versions of this problem obtained by both the weighted sum and the conic scalarization methods. Obtained results are depicted in Tables 6 and 7. As it can be seen from the results presented in Table 7, the investor reaches the reference points for the weights given in the rows 1, 3, 4 and 5 by using the conic scalarizing approach. For 817 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Table 5 Non-dominated points of problem (P) calculated by solving the scalarized problem (SP) for different sets of scalarized parameters presented in Table 4. No. f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 1 2 3 4 5 6 7 8 9 10 11 11.65 8.89 11.65 2.29 3.73 0.52 2.97 11.65 0.52 3.71 11.65 4.61 11.65 7.7 11.65 4.61 3.55 4.42 4.8 6.24 2.97 5.73 2.39 3.24 11.09 9.29 8.44 7.23 9.71 7.73  0.5  1.19  0.5  0.58  0.07  0.63 0.09  0.5  0.63  0.71  0.5  0.8  0.5  0.44  0.5  0.8 0.04  0.01  0.41  0.17 0.09  0.13  0.04 0.01  0.55  0.47  0.57  0.36  0.39  0.26 97.8 91.58 97.8 77.29 92.44 3.76 30.75 97.8 3.76 35.62 97.8 44.59 97.8 56.99 97.8 44.59 32.57 35.27 30.14 56.48 30.75 50 27.69 37.95 92.23 78.41 58.8 49.42 79.04 61.65 468.72 205.44 468.72 215.34 320.94 126.77 254.34 468.72 126.77 140.69 468.72 137.01 468.72 273.66 468.72 137.01 230.24 220.97 161.48 165.78 254.34 172.07 182.03 196.33 357.82 200.05 244.41 200 247.09 195.41 631.11 505.88 631.11 349.84 1401.35  6.9 177.97 631.11  6.9 126.05 631.11 282.7 631.11 398.9 631.11 282.7 306.99 370.8 150.14 709.52 177.97 663.97 315.22 459.48 800.61 727.81 346.5 378.31 784.01 645.56 14458.1  1270.5 14458.1  931.48 2625  689.67 2715.79 14458.1  689.67 37.64 14458.1 280.44 14458.1 3685.47 14458.1 280.44 2063.81 1291.1 149.3 442.87 2715.79 594.05 801.46 1032.41 6358.82 788.3 2994.77 875.11 1481.73 262.55 207.09 138.34 207.09 53.85 29.92 21.11 45.81 207.09 21.11 77.93 207.09 150 207.09 150 207.09 150 61.19 77.32 127.63 132.25 45.81 118.79 35.12 52.69 199.47 197.83 169.68 150 187.05 150 307.2 670 307.2 42.75 145.9 20.2 190.65 307.2 20.2 182.45 307.2 300 307.2 300 307.2 300 195.17 206.82 356.96 216.04 190.65 212.09 152.03 177.33 336.34 300 381.08 300 269.53 233.76 4 4 4 5 4 2.74 5 4 2.74 3.1 4 3 4 4.17 4 3 4.8 4.7 3.7 3.88 5 4.05 4.3 4.3 3.9 3.36 3.8 3.91 3.8 3.79 1 1 1 1 1 4 1 1 4 10 1 3 1 4 1 3 2 3 3 2 1 2 4 4 3 3 5 4 3 4 0.94 0.99 0.94 0.98 0.99 0.99 0.98 0.94 0.99 0.99 0.94 0.99 0.94 0.96 0.94 0.99 0.98 0.98 0.99 0.99 0.98 0.98 0.98 0.98 0.95 0.97 0.97 0.97 0.96 0.97 12 13 14 15 16 17 18 19 20 Table 6 Pareto efficient points obtained by using the weighted-sum approach for Example 2. No. w1 w3 w5 Weighted-sum f1(x) f3(x) f5(x) Runtime (in seconds) 1 2 3 4 5 6 0.33 0.2 0.4 0.25 0.5 0.5 0.33 0.4 0.4 0.5 0.25 0.5 0.33 0.4 0.2 0.25 0.25 0  743.07  899.49  435.91  543.36  564.47 0.72 6.11 6.11 6.12 6.12 6.11 2.92 78.38 78.38 78.33 78.34 78.37 4.36 2324.04 2324.04 2323.96 2323.98 2324.03  5.06 1 1.18 1.39 1.64 1.81 1.98 Table 7 Pareto efficient points obtained by using the conic scalarization approach for Example 2. No. a w1 w3 w5 f0(x) f1(x) f3(x) f5(x) Runtime (in seconds) 1 2 3 4 5 0.33 0.2 0.4 0.25 0.5 0.33 0.4 0.4 0.5 0.25 0.33 0.4 0.2 0.25 0.25  8.99  408.34  9.41  9.12  9.20 5.61 6.12 5.46 5.41 5.78 50 78.31 50 50 50 1197.02 2323.92 1210.38 1210.62 1173.52 45.19 113.68 155.87 129.87 161.74 0.32 0.19 0.19 0.24 0.24 mean–variance–skewness model only and ignores the forecasting errors and the other objectives of the proposed extended model, can be exposed to a considerable risk. Finally, to test the performance of proposed model according to the objective functions f2(x) and f4(x) which are the distributional properties of the forecasting error, we present the next example. Example 3. The data consist of monthly rates of returns from January 2000 to December 2004, a total of T¼60 months of observations of N ¼20 stocks. We estimate the monthly rates of returns from January 2005 to December 2005, a total of T¼12 months of N ¼20 stocks by using ARIMA. Two different cases, with and without error distribution in the portfolio optimization, are considered and compared. The performances for these two cases are compared by using the actual data distribution of monthly rates of returns from January 2005 to December 2005 of N ¼20 stocks and an additive utility function. In the first case, we assume that the investor invests his/her money in the 20 stocks from ISE by using Markowitz mean–variance model. In this case, investor ignores the mean and variance of the prediction errors. The corresponding mathematical model for this first case is as follows: max f1 ðxÞ ¼ 20 X xi R i , i¼1 example in the row 1 of Table 7, the investor reaches better points than the given reference points for first and fifth objectives. The following objective function values are obtained for this problem: f1(x) ¼5.61, f3(x)¼ 50 and f5(x) ¼1197.02. This example demonstrates that investor can obtain more satisfactory portfolios by applying the generalized approach presented in this paper. On the other hand, it is clear that the mean–variance– skewness model is a special version of the proposed extended model. Therefore, the investor, who prefers the usage of the min f3 ðxÞ ¼ 20 X 20 X xi xj sij , i¼1j¼1 subject to hðxÞ ¼ 20 X xi 1 ¼ 0, i¼1 xi Z 0, i ¼ 1, . . . ,20: 818 O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 Assume that an investor’s aspiration levels (reference point) according to the objectives are a1 ¼4 and a3 ¼50, respectively. Twenty one different weight vectors are generated and the conic scalarization method is applied for scalarization. Estimated and realized mean and variance values are calculated according to the obtained Pareto efficient solutions. The additive utility function of the form U ¼ Uðf 1 ðxÞ,f 3 ðxÞÞ ¼ w1  f 1 ðxÞw3  f 3 ðxÞ is used to compare the obtained results, where the functions f 1 ðxÞ and f 3 ðxÞ are the actual mean return and variance of a portfolio in 2005. It is clear that investor desires to maximize the utility function. Obtained results are given in Table 8. It is remarkable that the investor reaches the individual aspiration levels as seen in Table 8 for the estimated mean return and variance. For all weights, the same Pareto efficient solution (f1(x),f2(x))¼ (6.6, 0) is obtained. It seems that this is an ideal efficient solution. The estimated variance value is zero. But the selected 20 stocks does not contain a risk free asset. Actual variance equals 92.45 for 2005, which is given in the eighth column of the Table 8. The main reason for this difference between the estimated and actual variances, is the prediction error which is ignored by the investor. Risk is shifted to the prediction errors. In the second case, we again assume that the investor invests his/her money in the 20 stocks from ISE by using Markowitz mean–variance model. In this case, investor considers the mean and variance of the prediction errors: Table 9 The estimated and actual mean and variance values according to the obtained Pareto efficient solutions for Example 3, second case. No. a w1 ¼w2 w3 ¼w4 f1(x) f2(x) f3(x) f4(x) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 3.67 1.19 0.9 0.28 0.04 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.04 0.03 0 0 0.04 0.09 0.14 0.19 0.24 0.29 0.34 0.39 0.44 0.49 0.44 0.39 0.34 0.29 0.24 0.19 0.14 0.09 0.04 0 1 0.95 0.9 0.85 0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 2.27 3.43 4.15 4.27 4.3 4.09 4.06 4.06 4.06 4.07 4.07 4.07 4.08 4.08 4.09 4.11 4.13 4.17 4.24 4.48 6.25  0.26  0.16 0 0.07 0.17 0.44 0.48 0.48 0.48 0.48 0.48 0.47 0.47 0.46 0.45 0.44 0.42 0.38 0.31 0.07 1.69 f 1 ðxÞ f 3 ðxÞ 116.47 2.67 71.81 148.29 2.4 54.23 215.08 3.87 55.63 238.32 4.4 56.94 261.15 4.84 58.5 276.33 5.23 59.52 279.06 5.3 59.77 279.06 5.3 59.78 279.06 5.31 59.79 279.06 5.31 59.8 279.07 5.31 59.81 279.07 5.33 59.85 279.09 5.35 59.89 279.11 5.38 59.96 279.16 5.41 60.05 279.24 5.46 60.18 279.41 5.54 60.39 279.84 5.66 60.8 281.2 5.92 61.8 290.22 6.76 66.56 580.56 13.03 176.67 U 71.81 51.4 49.68 47.74 45.83 43.33  40.25 37 33.75  30.5 27.25 24  20.75 17.49 14.22  10.95 7.65 4.31  0.85 3.1 13.03 subject to hðxÞ ¼ 20 X xi 1 ¼ 0, i¼1 max f1 ðxÞ ¼ 20 X xi R i , xi Z 0, i¼1 max f2 ðxÞ ¼ N X xi e i , i¼1 min f3 ðxÞ ¼ 20 X 20 X xi xj sij , i¼1j¼1 min f4 ðxÞ ¼ N X N X xi xj s^ ij , i¼1j¼1 Table 8 The estimated and actual mean and variance values according to the obtained Pareto efficient solutions for Example 3, first case. No. a w1 w3 f1(x) f3(x) f 1 ðxÞ f 3 ðxÞ U 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 0 0.04 0.09 0.14 0.19 0.24 0.29 0.34 0.39 0.44 0.49 0.44 0.39 0.34 0.29 0.24 0.19 0.14 0.09 0.04 0 0 0.05 0.10 0.15 0.2 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1 1 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 0.55 0.50 0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 6.6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 3.18 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45 92.45  92.45  87.67  82.89  78.1  73.32  68.54  63.76  58.98  54.2  49.42  44.64  39.85  35.07  30.29  25.51  20.73  15.95  11.17  6.39  1.6 3.18 i ¼ 1, . . . ,20: Assume that investor’s aspiration levels (reference point) according to the objectives are a1 ¼4, a2 ¼4, a3 ¼50, and a4 ¼ 50, respectively. Twenty one different weight vectors are generated and again the conic scalarization method is applied. Estimated and actual mean and variance values are calculated according to the obtained efficient solutions. The additive utility function of the form U ¼ Uðf 1 ðxÞ,f 3 ðxÞÞ ¼ w1  f 1 ðxÞw3  f 3 ðxÞ is used to compare the obtained results, which are presented in Table 9. As it can be seen from the utility values calculated for both cases (see, the last columns of Tables 8 and 9, investor reaches the greater utility values for all weights in the second case. It is also remarkable that, many efficient solutions obtained for the second case, dominate the corresponding solutions obtained for the first case. For example, the values of the vector (f 1 ðxÞ,f 2 ðxÞ) for scalarization parameters presented on rows 4–20 in Table 9 are better than the vector (f 1 ðxÞ, f 2 ðxÞ) ¼(3.18, 92.45) obtained for the first case. Investor gained more return for lower risk in the second case, than in first case. This example clearly demonstrates the importance of consideration of distributional properties of forecasting errors for investors. 7. Conclusion In this study, we propose a general mean–variance–skewness model for portfolio optimization. First, a convex combination of three forecasting methods are used for monthly return predictions. Investment returns and residuals on each series of forecasts are measured and then evaluated by three performance criteria, namely mean, variance, and skewness. Subsequently, these distributional properties of returns and residuals are used in the multi-objective portfolio optimization model. One of the main differences of our approach is that, it involves the objective functions representing distributional properties of error forecasts (second, fourth, sixth and 11th objective functions), O. Ustun, R. Kasimbeyli / Computers & Operations Research 39 (2012) 805–819 the long period preferences (ninth objective function) and the number of assets (10th objective function) in the portfolio of an investor. Besides, the size of the problem solved in the case study is larger (N¼20) than those considered in the literature. In the case study, the error distribution of the predictions of monthly returns are tested for normality using Wilk–Shapiro W-test. The evidence indicates that many of error distributions exhibit significant skewness. These findings confirm our argument that higher moments of error distributions cannot be neglected in the portfolio selection. The empirical evidence suggests that the incorporation of higher moments of error forecasts into the investor’s portfolio decision causes a major change in the construction of the optimal portfolio. The objective functions in this model are scalarized by using the conic scalarization method in which investor preferences such as weights and reference points can be easily incorporated. In the fourth stage, F-MSG algorithm is utilized to solve the scalar problem obtained. Because the conic scalarization method and F-MSG algorithm integration does not require convexity assumptions on the mathematical model, it provides significant advantage in practice. The superiority of the approach presented is demonstrated on the illustrative examples. [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] References [1] Wu D, Olson DL. Enterprise risk management: coping with model risk in a large bank. Journal of the Operational Research Society 2010;61(2):179–90. [2] Wu D, Olson DL. Enterprise risk management: small business scorecard analysis. Production Planning and Control 2009;120(4):362–9. [3] Wu D, Olson DL. Enterprise risk management: a DEA VaR approach in vendor selection. International Journal of Production Research 2010;48(16):4919–32. [4] Markowitz H. Portfolio selection. Journal of Finance 1952;7:77–91. [5] Wu D, Olson DL. Introduction to the special issue on ‘‘Optimizing risk management: methods and tools’’. Human and Ecological Risk Assessment 2009;15(2):220–6. [6] Wu D, Olson DL. Supply chain risk, simulation, and vendor selection. International Journal of Production Economics 2008;114(2):646–55. [7] Wu D, Xie K, Chen G, Ping G. A risk analysis model in concurrent engineering product development. Risk Analysis, in press, doi:10.1111/j.1539-6924.2010. 01432.x. [8] Sharpe WF, Alexander GJ, Bailey JF. Investments. New Jersey: Prentice-Hall International Inc.; 1999. [9] Mao JCT. Models of capital budgeting, E-V vs. E-S. Journal of Financial and Quantitative Analysis 1970;5:657–75. [10] Simaan Y. Estimation risk in portfolio selection: the mean variance model versus the mean absolute deviation model. Management Science 1997;43(10):1437–46. [11] Fishburn DC. Mean-risk analysis with risk associated with below-target returns. American Economical Review 1977;67:117–26. [12] Konno H, Suzuki K. A mean–variance–skewness optimization model. Journal of the Operations Research of Japan 1995;38:137–87. [13] Lai T. Portfolio selection with skewness: a multiple-objective approach. Review of Quantitative Finance and Accounting 1991;1:293–305. [14] Leung MT, Daouk H, Chen AS. Using investment portfolio return to combine forecasts: a multiobjective approach. European Journal of Operational Research 2001;134:84–102. [15] Liu SC, Wang SY, Qiu WH. A mean–variance–skewness model for portfolio selection with transaction costs. International Journal of Systems Sciences 2003;34(4):255–62. [16] Kennedy MP, Chua LO. Neural networks for nonlinear programming. IEEE Transactions on Circuits and Systems 1988;35(3):554–62. [17] Leoid O, Wolf M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of Empirical Finance 2003;10:603–21. [18] Yu L, Wang S, Lai KK. Neural network-based mean–variance–skewness model for portfolio selection. Computers & Operations Research 2008;35(1):34–46. [19] Kasimbeyli R. A nonlinear cone separation theorem and scalarization in nonconvex vector optimization. SIAM Journal on Optimization 2010;20: 1591–619. [20] Gasimov RN. Characterization of the Benson proper efficiency and scalarization in nonconvex vector optimization. In: Koksalan M, Zionts S, editors. [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] 819 Multiple criteria decision making in the new millennium. Lecture notes in economics and mathematical systems, vol. 507, 2001. p. 189–98. Kasimbeyli RN, Ustun O, Rubinov AM. The modified subgradient algorithm based on feasible values. Optimization 2009;58(5):535–60. Ehrgott M, Klamroth K, Schwehm C. An MCDM approach to portfolio optimization. European Journal of Operational Research 2004;155:752–70. Ehrgott M, Waters C, Kasimbeyli R, Ustun O. Multiobjective programming and multiattribute utility functions in portfolio optimization. INFOR 2009;47:31–42. Makridakis S, Winkler RL. Averages of forecasts. Management Sciences 1983;29:987–96. Hansen BE. Least-squares forecast averaging. Journal of Econometrics 2008;146:342–50. Box G, Jenkins G. Time series analysis: forecasting and control. San Francisco: Holden-Day; 1976. Hanke JE, Reitsch AG. Business forecasting. New Jersey: Prentice-Hall Inc.; 1998. Bates JM, Granger CWJ. The combination of forecasts. Operational Research Quarterly 1969;20:451–68. Newbold P, Granger CW. Experience with forecasting univariate time series and the combination of forecasts. Journal of Royal Statistical Society 1974;137:231–46. Russel TD, Adam Jr. EE. An empirical evaluation of alternative forecasting combinations. Management Sciences 1987;31:1267–76. Clemen RT. Combining forecasts: a review and annotated bibliography. International Journal of Forecasting 1989;5:559–83. Shapiro PJ, Wilk MB, Chen HJ. A comparative study of various tests for normality. Journal of American Statistical Association 1968;63:1343–72. Yu PL. A class of solutions for group decision problems. Management Science 1973;19:936–49. Zeleny M. Compromise programming. In: Cochrane JL, Zeleny M, editors. Multiple criteria decision making. Columbia: University of South Carolina; 1973. p. 262–301. Steuer RE. Multiple criteria optimization: theory, computation and application. New York: Wiley; 1986. Lewandowski A, Wierzbicki AP. Decision support systems using reference point optimization, aspiration based decision support systems. Berlin: Springer; 1989. Gearhart WB. Compromise solutions and estimation of the noninferior set. Journal of Optimization Theory and Applications 1979;28(1):29–47. Kaliszewski I. A modified weighted Tchebycheff metric for multiple objective programming. Computers & Operations Research 1987;14:315–23. Schandl B, Klamroth K, Wiecek MM. Introducing oblique norms into multiple criteria programming. Journal of Global Optimization 2002;23(1):81–97. Wierzbicki AP. The use of reference objectives in multiobjective optimization. In: Fandel G, Gal T, editors. MCDM theory and application, proceedings. Lecture notes in economics and mathematical systems, vol. 177. Hagen: Springer Verlag; 1980. p. 468–86. Wierzbicki AP. Reference point methods in vector optimization and decision support. Interim Report IR-98-017/ April, International Institute for Applied Systems Analysis, Laxenburg, Austria; 1998. Miettinen KM, Makela MM. On scalarizing functions in multiobjective optimization. OR Spectrum 2002;24:193–213. Chang TJ, Meade N, Beasley JE, Sharaiha YM. Heuristics for cardinality constrained portfolio optimisation. Computers & Operations Research 2000;27(13):1271–302. Ozdemir MS, Gasimov RN. The analytic hierarchy process and multiobjective 0–1 faculty course assignment. European Journal of Operational Research 2004;157(2):398–408. Gasimov RN, Sipahioglu A, Sarac T. A multi-objective programming approach to 1.5-dimensional assortment problem. European Journal of Operational Research 2007;179(1):64–79. Ismayilova NA, Sagir M, Gasimov RN. A multiobjective faculty-course-time slot assignment problem with preferences. Mathematical and Computer Modelling 2007;46:1017–29. Bazaraa MS, Sherali HD, Shetty CM. Nonlinear programming: theory and algorithms. New York: John Wiley & Sons Inc.; 1993. Azimov AY, Gasimov RN. Stability and duality of nonconvex problems via augmented Lagrangian. Cybernetics and Systems Analysis 2002;38:412–21. Gasimov RN. Augmented Lagrangian duality and nondifferentiable optimization methods in nonconvex programming. Journal of Global Optimization 2002;24:187–203. Gasimov RN, Rubinov AM. On augmented Lagrangians for optimization problems with a single constraint. Journal of Global Optimization 2004;28(2):153–73. Rubinov AM, Gasimov RN. The nonlinear and augmented Lagrangians for nonconvex optimization problems with a single constraint. Applied and Computational Mathematics 2002;1:142–57. Burachik RS, Gasimov RN, Ismayilova NA, Kaya CY. On a modified subgradient algorithm for dual problems via sharp augmented Lagrangian. Journal of Global Optimization 2006;34:55–78.