Computers & Operations Research 34 (2007) 3174 – 3186
www.elsevier.com/locate/cor
Spectral projected subgradient with a momentum term
for the Lagrangean dual approach
Alejandro Crema∗ , Milagros Loreto1 , Marcos Raydan1
Departamento de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas, 1041-A, Venezuela
Available online 4 January 2006
Abstract
The Lagrangean dual problem, with a non-differentiable convex objective function, is usually solved by using the subgradient
method, whose convergence is guaranteed if the optimal value of the dual objective function is known. In practice, this optimal
value is approximated by a previously computed bound. In this work, we combine the subgradient method with a different choice of
steplength, based on the recently developed spectral projected gradient method, that does not require either exact or approximated
estimates of the optimal value. We also add a momentum term to the subgradient direction that accelerates the convergence process
towards global solutions. To illustrate the behavior of our new algorithm we solve Lagrangean dual problems associated with
integer programming problems. In particular, we present and discuss encouraging numerical results for set covering problems and
generalized assignment problems.
© 2005 Elsevier Ltd. All rights reserved.
MSC: 90C10; 90C25
Keywords: Spectral projected gradient; Subgradient optimization; Set covering problems; Generalized assignment problems
1. Introduction
The subgradient method is a scheme for minimizing non-differentiable convex functions that was originally
developed by Shor in the 1970s. The classic reference on this topic is his book [1]. Over the last few decades, many
extensions and variations of the subgradient method have also been developed (see e.g., [2,3]). A standard iteration of the
subgradient method consists, mainly, in moving in the direction opposite to a subgradient direction at the current iterate,
as in the classical gradient method for differentiable functions. An interesting review on this topic can be found in [4].
Recently, the machinery for unconstrained problems has been extended to convex constrained problems. In particular,
authors like Alber et al. [5], among others, developed algorithms that can be seen as extensions of the projected
subgradient method for convex optimization in a Hilbert space. On the other hand, Birgin et al. [6] developed an
algorithm that can be considered as an extension of the classical projected gradient method for the minimization of
differentiable functions on convex sets. In that sense, the spectral projected subgradient method, introduced in this work,
∗ Corresponding author. Fax: +58 2 60 52 168.
E-mail addresses: acrema@kuaimare.ciens.ucv.ve (A. Crema), mloreto@kuaimare.ciens.ucv.ve (M. Loreto), mraydan@kuaimare.ciens.ucv.ve
(M. Raydan).
1 Supported by the Center of Scientific Computing at UCV.
0305-0548/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2005.11.024
can be considered as an extension of both methods. An additional feature of our proposal is the use of a momentum
term. The momentum term incorporates, in the present iteration, some influence of the past iterations, and this can help
to accelerate the convergence. Some references on this issue are [7–10]. In general, the momentum term is frequently
used in neural networks training, and also for backpropagation algorithms.
In this work, we consider the spectral projected subgradient method for the minimization of non-differentiable
functions on convex sets, specifically solving the Lagrangean dual problem that appears in integer programming
problems. The dual problem consists in minimizing, subject to simple constraints, a convex piecewise linear function.
In this case, solving the Lagrangean dual problem, by using the subgradient minimization method [3], usually requires
the value of the objective function at the optimal solution or an approximation of it. An important feature of the spectral
projected subgradient method is that, for solving the same problem, it does not need either the optimal value of the
objective function or an approximation of it.
The rest of this paper is organized as follows: in Section 2 we present the basic concepts related to integer programming, duality, and the algorithm based on subgradient directions. In Section 3 we present our spectral projected
subgradient algorithm, and discuss its relevant properties. In Section 4, we present numerical results comparing the
performance of the new algorithm with the classic one for set covering problems and generalized assignment problems.
Finally, in Section 5 we present our concluding remarks.
2. Integer programming and duality
Let us consider the following integer programming problem (P):
max  c^T x
s.t. Ax ≤ b,
     Dx ≤ e,
     x ∈ Z^n,
     x ≥ 0,
where Z represents the integer numbers, c, b and e are vectors, and A and D are matrices of suitable dimensions. The
following problem is the Lagrangean relaxation of P, relative to the unstructured or hard constraint block Ax ≤ b,
and will be denoted by P_λ:

max  c^T x + λ^T (b − Ax)
s.t. x ∈ X,

where X = {x ∈ Z^n, x ≥ 0 : Dx ≤ e} and λ ≥ 0 is a vector of suitable dimension. For us, Dx ≤ e represents the structured
or easy constraint block.
We are mainly concerned with the Lagrangean dual formulation of P, which will be referred to as problem (D), and is
given by

min  Z(λ)
s.t. λ ≥ 0,

where Z(λ) = max{c^T x + λ^T (b − Ax) : x ∈ X}.
One of the most common schemes for solving problem P is based on the branch-and-bound strategy. In this case,
the important ingredients are branching, relaxation, and fathoming [11]. In the relaxation process a new problem is
obtained whose optimal value is a bound for the optimal value of problem P, which is the main motivation for solving
the Lagrangean dual problem D. This problem can be solved using the subgradient optimization algorithm, which in
theory needs the optimal value of problem D (see e.g. [2,3]). Of course, in practice, this optimal value is approximated
by a previously computed bound.
We now present a brief review of the ideas associated with duality for solving integer programming problems. In
our presentation we follow [2,12,3].
It is well known that if Z_D is the optimal value of D then Z_D ≥ Z_IP, where Z_IP is the optimal value of problem P.
The Lagrangean dual approach can be used for obtaining good bounds on Z_IP.
In this work we assume that X is bounded and hence finite. Let X = {x^1, x^2, ..., x^m} and
Z(λ) = max_{i=1,...,m} (c^T x^i + λ^T (b − Ax^i)). If we denote f_i = b − Ax^i and h_i = c^T x^i then

Z(λ) = max_{i=1,...,m} { h_i + f_i^T λ }.

Therefore, Z(λ) is a piece-wise linear convex function. If Z(λ) were differentiable then the gradient method could be
used for solving D. Unfortunately, Z(λ) is not differentiable everywhere.
The so-called subgradient optimization algorithm (S), which will be described below, can be used for solving D and
converges under the following assumptions: the exact value Z_D is known and the norms of the obtained subgradients
are uniformly bounded. The value of Z_D is used to compute the steplength at every iteration. In general, Z_D is not
known, and only approximate values are available.
Starting from a given λ^0 ≥ 0, we now describe the kth iteration of the subgradient optimization algorithm, which will
be extended and combined later with some new ideas.

Subgradient optimization algorithm (S):

1. Choose ξ^k ∈ ∂Z(λ^k), the set of subgradient vectors of Z at λ^k.
   • If ξ^k = 0 stop: λ^k is an optimal solution.
   • Else
2. Set α_k = λ̂_k (Z(λ^k) − Z̃) / ‖ξ^k‖², where 0 < ε₁ ≤ λ̂_k ≤ 2 − ε₂, ε₂ > 0.
3. Set λ^{k+1}_j = max{λ^k_j − α_k ξ^k_j, 0}, ∀j.

Remarks. 1. Z̃ is an estimate of Z_D. In practice, different stopping criteria can be used. For example, the process can
be stopped when |Z(λ^k) − Z̃| ≤ tol, where tol has been pre-established, or when a maximum number of iterations has
been reached [13].
2. Note that neither the chosen subgradient directions nor the steplength used guarantee that Z(λ^{k+1}) < Z(λ^k), and
so this is not a descent algorithm.
3. Different heuristics can be used for obtaining λ̂_k.
The convergence theorem that supports the subgradient optimization algorithm (see [14]) states that if Z is a convex
function, an optimal solution λ* exists, and the subgradients are uniformly bounded, then lim_{k→∞} Z(λ^k) = Z(λ*), and
any limit point of {λ^k} is an optimal solution. However, the convergence of the method heavily depends on the quality
of the estimate Z̃.
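To make the scheme concrete, algorithm S can be sketched in a few lines of Python (a language choice of ours; the experiments reported later in the paper were run in Matlab). The oracle names `Z` and `subgrad`, the relaxation parameter `lam_hat`, and the tolerance are our own labels; the projection onto λ ≥ 0 is the componentwise maximum of step 3.

```python
import numpy as np

def subgradient_method(Z, subgrad, lam0, Z_tilde, lam_hat=0.1,
                       max_iter=500, tol=1e-6):
    """Sketch of algorithm S: projected subgradient with a Polyak-type
    steplength driven by the estimate Z_tilde of the optimal value Z_D.

    Z        -- callable returning the dual objective Z(lam)
    subgrad  -- callable returning one subgradient of Z at lam
    Z_tilde  -- estimate (bound) of the optimal dual value Z_D
    lam_hat  -- relaxation parameter, kept in (0, 2)
    """
    lam = np.asarray(lam0, dtype=float)
    best_val, best_lam = Z(lam), lam.copy()
    for _ in range(max_iter):
        xi = subgrad(lam)
        norm2 = float(xi @ xi)
        if norm2 == 0.0:                    # zero subgradient: lam is optimal
            break
        alpha = lam_hat * (Z(lam) - Z_tilde) / norm2
        lam = np.maximum(lam - alpha * xi, 0.0)   # projection onto lam >= 0
        val = Z(lam)
        if val < best_val:
            best_val, best_lam = val, lam.copy()
        if best_val - Z_tilde <= tol:       # close enough to the estimate
            break
    return best_lam, best_val
```

On a trivial one-dimensional example such as Z(λ) = λ with subgradient 1 and Z̃ = 0, the iterates shrink geometrically towards the minimizer λ = 0, which illustrates the dependence of the steplength on the quality of Z̃.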
In our next section, we combine the subgradient Lagrangean scheme with a different choice of steplength, based on
the recently developed spectral projected gradient method [6,15], that does not require either exact or approximated
estimates of the optimal value. We also add a momentum term [8] to the subgradient direction that in some difficult
cases accelerates the convergence process of the algorithm.
3. Spectral projected subgradient algorithm
Our main optimization problem can also be written as

min  Z(λ)
s.t. λ ∈ Ω,

where we recall that Z(λ) = max{c^T x + λ^T (b − Ax) : x ∈ X} is convex, piece-wise linear, and non-differentiable at
some points, Ω = {λ : λ ≥ 0} is a non-empty, closed, and convex subset of R^n, and X = {x ∈ Z^n, x ≥ 0 : Dx ≤ d}, where
b, c and d are vectors, and A and D are matrices of suitable dimensions.
In this work, we propose an extension of the spectral projected gradient (SPG) method [6,15] that can also be
viewed as an extension of the subgradient optimization algorithm; it will be called the spectral projected subgradient
(SPS) method.
The SPG method is related to the practical version, due to Bertsekas [16], of the classical gradient projection method of
Goldstein [17] and Levitin [18]. However, some crucial differences make this method much more efficient than its
gradient projection predecessors. The key issue is that the first trial step at each iteration is taken using the spectral
steplength introduced in [19] and later analyzed in [20–22], among others. The spectral step is a Rayleigh quotient
related with an average Hessian matrix. For a review containing the more recent advances on this special choice of
steplength see [23].
Therefore, it is natural to transport the spectral projected gradient idea, with a non-monotone line search, to the
projected subgradient setting, in order to speed up the convergence of the subgradient method and to use a steplength
that does not depend on the optimal value of the objective function.
A typical iteration, for obtaining x_{k+1}, in the SPG method can be described as follows:

1. Set d_k = P(x_k − α_k g(x_k)) − x_k, where P is the projection onto Ω.
2. Set x_{k+1} = x_k + δ_k d_k,

where the vector g_k is the gradient (or subgradient) of Z evaluated at x_k, the steplength α_k is obtained using the spectral
choice, and δ_k is obtained by means of a non-monotone line search technique along the direction d_k. For our non-monotone
globalization technique, we combine and extend the Grippo et al. [24] line search scheme with the recently
proposed globalization scheme of La Cruz et al. [25]. Roughly speaking, our acceptance condition for the next iterate is
Z(x_{k+1}) ≤ max_{0 ≤ j ≤ min{k, M−1}} Z(x_{k−j}) + γ ⟨d_k, g(x_k)⟩ + η_k,

where γ is a small positive number, and η_k is chosen such that

0 < Σ_k η_k < ∞.   (1)

The terms max_{0 ≤ j ≤ M−1} Z(x_{k−j}) and η_k > 0 are responsible for the sufficiently non-monotone behavior of Z(x_k).
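This acceptance test is easy to isolate as a small predicate. In the following sketch the names are ours: `gamma` stands for the small sufficient-decrease parameter and `eta_k` for the summable forcing term.

```python
def accepts(Z_new, Z_history, dk_dot_gk, eta_k, gamma=1e-4, M=10):
    """Non-monotone acceptance test (sketch): the trial value Z_new is
    accepted if it does not exceed the maximum of the last at most M
    stored objective values plus a directional term and the forcing
    term eta_k > 0.

    Z_history  -- list of past values Z(x_0), ..., Z(x_k)
    dk_dot_gk  -- inner product <d_k, g(x_k)>
    """
    reference = max(Z_history[-M:])   # max over the last M stored values
    return Z_new <= reference + gamma * dk_dot_gk + eta_k
```

Because the reference value is a maximum over past iterates and η_k is strictly positive, occasional increases of the objective are tolerated, which is exactly the "sufficiently non-monotone" behavior described above.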
The spectral choice of steplength has proved effective when combined with the negative gradient direction [26],
which has been historically associated with the steepest descent method, famous for being extremely slow on
ill-conditioned problems. Another proposed remedy for the slowness of the steepest descent method was given by
Nowland et al. [8], who add a momentum term to the gradient direction that has interesting practical properties. In
this work, we combine both strategies (spectral steplength and momentum term) successfully for the Lagrangean dual
problem.
The typical iteration of the idea proposed in [8] can be described as follows:

x_{k+1} = x_k + Δg_k,
Δg_k = −α_k g_k + μ Δg_{k−1},

where μ ∈ [0, 1] is the momentum parameter, and x_0 and Δg_0 are given. This idea can be adapted for obtaining the
direction d_k in the SPS method, as follows:

d_k = P(x_k − m_k) − x_k,

where

m_k = α_k g_k + μ m_{k−1},   m_0 = 0.

If μ = 0 then d_k = P(x_k − α_k g(x_k)) − x_k, and the original SPS scheme is recovered.
Now, the extension method applied to problem D, using subgradients, is given by

d_k = P(λ_k − α_k g_k) − λ_k,
λ_{k+1} = λ_k + δ_k d_k,
where g_k is a subgradient vector of Z at λ_k and the spectral steplength α_k is given by

α_k = (s_{k−1}^T s_{k−1}) / (s_{k−1}^T y_{k−1}),

where s_{k−1} = λ_k − λ_{k−1} and y_{k−1} = g_k − g_{k−1}. At the first iteration, α_k = 1.
The search direction d_k, when we add a momentum term, is given by d_k = P(λ_k − m_k) − λ_k with m_k = α_k g_k +
μ m_{k−1}, m_0 = 0. The idea consists in incorporating into the present iteration some influence of the past iterations. This
effectively adds inertia to the motion of a massive particle through the weight space, and avoids the typical zig-zagging
behavior of the steepest descent method. Indeed, it introduces a damping effect on the iterations by averaging gradient
components with opposite signs in flat areas. In some cases it prevents the iterative process from stagnating around a
local minimum, helping it to skip over these flat areas without performing too many unnecessary iterations. For more
details on these properties of the momentum term, see [7]. In practice, the momentum term can be chosen in two
different ways: using a learning rule, or fixed by the user as recommended by Tseng [10]. In this work we use a fixed
momentum term.
Combining all the ingredients described above, we obtain the following algorithm.
Spectral Projected Subgradient (SPS):

Given λ_0 ∈ Ω, an integer M ≥ 1, g_0 a subgradient of Z at λ_0, γ ∈ (0, 1), a parameter MAXITER for the maximum
number of iterations allowed, η_0 > 0, m_0 = 0, μ ∈ [0, 1], and α_0 = max(Z(λ_0), ‖g(λ_0)‖).

For k = 0, ..., MAXITER
• m_k = α_k g_k + μ m_{k−1}
• d_k = P(λ_k − m_k) − λ_k
• δ = 1
• λ_+ = λ_k + δ d_k
• η_k = η_0 / k^{1.1}
  While Z(λ_+) > max_{0 ≤ j ≤ min{k, M−1}} Z(λ_{k−j}) + γ δ ⟨d_k, g(λ_k)⟩ + η_k
    Compute δ_new
    δ = δ_new
    λ_+ = λ_k + δ d_k
  End while
• λ_{k+1} = λ_+
• s_k = λ_{k+1} − λ_k
• y_k = g(λ_{k+1}) − g(λ_k)
• α_{k+1} = (s_k^T s_k) / (s_k^T y_k)
End for
Remarks. 1. The parameter δ_new could be computed in many different ways. In our implementation we choose
δ_new = δ/2.
2. The choice η_k = η_0/k^{1.1} guarantees that (1) is satisfied. If we choose η_k = η_0/k^r with r > 1, then (1) will
also hold. Our choice r = 1.1 is suitable for the sufficiently non-monotone desired behavior of the method.
3. As in algorithm S, at each iteration only one subgradient g(λ_k) = g_k is computed. This calculation accounts for
most of the computational cost per iteration.
4. The SPS algorithm is stopped when MAXITER iterations are reached. Unfortunately, at this point we have no
theoretical criterion to stop the process.
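A minimal Python sketch of one way to implement this iteration follows. It is not the authors' code: the symbol names (`mu` for the momentum parameter, `gamma` for the sufficient-decrease parameter, `eta0` for the forcing-term constant) are our reading of the printed algorithm, the halving rule δ_new = δ/2 of Remark 1 is used for the backtracking, and the initialization α_0 = 1 is a simplification of the one stated above.

```python
import numpy as np

def sps(Z, subgrad, lam0, mu=0.7, gamma=1e-4, M=10, eta0=1.0, max_iter=500):
    """Sketch of SPS: momentum direction + spectral steplength +
    non-monotone backtracking line search, projected onto {lam >= 0}."""
    proj = lambda v: np.maximum(v, 0.0)        # projection P onto Omega
    lam = np.asarray(lam0, dtype=float)
    g = subgrad(lam)
    m = np.zeros_like(lam)
    alpha = 1.0                                # simplified alpha_0
    history = [Z(lam)]
    best_lam, best_val = lam.copy(), history[0]
    for k in range(max_iter):
        m = alpha * g + mu * m                 # momentum direction m_k
        d = proj(lam - m) - lam                # search direction d_k
        if np.allclose(d, 0.0):
            break
        eta = eta0 / (k + 1) ** 1.1            # summable forcing term
        ref = max(history[-M:])                # non-monotone reference value
        delta = 1.0
        lam_new = lam + d
        # backtracking with delta_new = delta / 2 (Remark 1)
        while (Z(lam_new) > ref + gamma * delta * float(d @ g) + eta
               and delta > 1e-12):
            delta /= 2.0
            lam_new = lam + delta * d
        g_new = subgrad(lam_new)
        s, y = lam_new - lam, g_new - g
        sty = float(s @ y)
        alpha = float(s @ s) / sty if sty > 1e-16 else 1.0   # spectral step
        lam, g = lam_new, g_new
        history.append(Z(lam))
        if history[-1] < best_val:
            best_val, best_lam = history[-1], lam.copy()
    return best_lam, best_val
```

On a smooth convex test function such as Z(λ) = ‖λ − (1, 1)‖², the sketch quickly drives the best recorded value close to zero while keeping all iterates in the non-negative orthant, which illustrates the interplay of the three ingredients.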
4. Numerical results
We compare the performance of the SPS algorithm with the classical S algorithm on two different types of test
problems: set covering problems and generalized assignment problems.
Table 1
Characteristics of set covering problems

Problem    Rows (m)    Columns (n)    Density (%)
4          200         1000           2
5          200         2000           2
6          200         1000           5
A          300         3000           2
B          300         3000           5
C          400         4000           2
D          400         4000           5
E          50          500            20
All the experiments were run in Matlab 5.3 on a Pentium III (550 MHz) with 320 MB of RAM, under Windows XP.
4.1. Set covering problems
A set covering problem is the problem of covering the rows of an m-row, n-column, zero-one matrix (aij ) by a subset
of the columns at minimum cost. These problems can be formulated as follows:
min  Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≥ 1,   i = 1, ..., m,
     x_j ∈ {0, 1},

with c_j > 0, a_ij ∈ {0, 1}, ∀ i, j.
If we define c̄_j = −c_j and ā_ij = −a_ij, then the problem P_λ can be written as

max  Σ_{j=1}^n c̄_j x_j + Σ_{i=1}^m λ_i ( −1 − Σ_{j=1}^n ā_ij x_j )
s.t. x_j ∈ {0, 1}.

An optimal solution of P_λ is given by

x_j = 1          if c̄_j − Σ_{i=1}^m λ_i ā_ij > 0,
x_j ∈ {0, 1}     if c̄_j − Σ_{i=1}^m λ_i ā_ij = 0,        (2)
x_j = 0          if c̄_j − Σ_{i=1}^m λ_i ā_ij < 0.

Let x* be a solution of P_λ; then the subgradient is given by

ξ_i = −1 − Σ_{j=1}^n ā_ij x*_j,   i = 1, ..., m.
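Evaluating Z(λ), an optimal x* via rule (2), and the corresponding subgradient is a few lines of vectorized code. The following sketch (our names, not the authors' code) breaks the ties of the middle case of (2) by setting x_j = 0, which is one of the allowed choices.

```python
import numpy as np

def covering_oracle(c, a, lam):
    """Lagrangean oracle for the set covering relaxation (sketch).
    c   -- positive cost vector, shape (n,)
    a   -- 0/1 coverage matrix, shape (m, n)
    lam -- multipliers lam >= 0, shape (m,)
    With c_bar = -c and a_bar = -a, an optimal x* sets x_j = 1 exactly
    when c_bar_j - sum_i lam_i a_bar_ij > 0 (ties resolved to 0 here),
    and the subgradient is xi_i = -1 - sum_j a_bar_ij x*_j.
    """
    c_bar, a_bar = -c, -a
    reduced = c_bar - lam @ a_bar          # c_bar_j - sum_i lam_i a_bar_ij
    x = (reduced > 0).astype(float)        # rule (2), ties set to 0
    z = float(c_bar @ x + lam @ (-1.0 - a_bar @ x))   # Z(lam)
    xi = -1.0 - a_bar @ x                  # subgradient of Z at lam
    return z, x, xi
```

For instance, with one row covered by two columns of costs 1 and 3 and a multiplier of 2, only the cheaper column is selected and the covering constraint is tight, so the subgradient component vanishes.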
In Table 1 we describe the set covering problems considered in our experiments. All of them were obtained from
Beasley's OR library [27]. Beasley [28] also used this set of problems in his work. The density parameter represents
the percentage of non-zero elements.
As mentioned before, there are different heuristics for computing λ̂. We follow the recipe given by Caprara
et al. [29], which consists in comparing the best and the worst bound for the optimal value in the last p iterations:

1. Set (at the beginning) λ̂ = 0.1.
Table 2
Results for set covering problems (Part I)

            SPS                            S
Problems    SPS*        itbest   Dev(%)    S*          itbest   Dev(%)    Z_D
4.1         −428.9966   160      0.0008    −423.3816   219      1.3097    −429
4.2         −512.0000   463      0         −508.8298   220      0.6192    −512
4.3         −516.0000   427      0         −514.9335   239      0.2067    −516
4.4         −493.8561   448      0.0291    −488.6496   195      1.0831    −494
4.5         −512.0000   275      0         −509.7710   220      0.4354    −512
4.6         −557.1859   499      0.0115    −555.5556   220      0.3041    −557.2500
4.7         −429.9992   329      0.0002    −423.3945   199      1.5362    −430
4.8         −488.0351   498      0.1279    −485.8885   200      0.5672    −488.6666
4.9         −637.8956   500      0.0164    −630.8328   197      1.1234    −638
4.10        −513.4759   370      0.0928    −509.8203   220      0.6198    −513
5.1         −251.0872   496      0.0549    −248.6560   199      1.0226    −251.2250
5.2         −299.3289   499      0.1442    −293.1867   175      2.1932    −299.7611
5.3         −225.9761   264      0.0106    −225.4144   219      0.2591    −226
5.4         −240.4017   500      0.0409    −238.9139   199      0.6595    −240.5000
5.5         −210.9999   241      0         −209.8727   219      0.5343    −211
5.6         −212.4880   408      0.2302    −211.5217   200      0.2256    −212
5.7         −291.6719   498      0.0363    −288.4858   200      1.1283    −291.7778
5.8         −286.4383   498      0.1957    −283.0592   198      1.3731    −287
5.9         −278.9931   488      0.0025    −273.9048   197      1.8262    −279
5.10        −264.9773   85       0.0086    −264.3398   240      0.2491    −265
6.1         −133.0122   500      0.0957    −125.2575   179      5.9202    −133.1396
6.2         −140.2567   500      0.1423    −135.9985   201      3.1739    −140.4565
6.3         −139.6418   500      0.3512    −134.7918   181      3.8122    −140.1340
6.4         −128.8822   500      0.0913    −123.4389   200      4.3109    −129
6.5         −152.6906   485      0.4319    −146.2715   179      4.6177    −153.3529
2. Every p iterations:
   • If |Z_D^worst − Z_D^best| / |Z_D^best| > tol then λ̂ = σ λ̂,
   • else λ̂ = τ λ̂.

In practice we use p = 20, σ = 0.5, τ = 1.5 and tol = 0.01.
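This update is simple enough to sketch directly. In the snippet below the names are ours: `sigma` and `tau` denote the shrink and growth factors (0.5 and 1.5 here), and `recent_bounds` would hold the last p dual values Z(λ^k).

```python
def update_lam_hat(lam_hat, recent_bounds, sigma=0.5, tau=1.5, tol=0.01):
    """Caprara-style relaxation-parameter update (sketch): compare the
    best and worst dual bounds seen over the last p iterations; if they
    differ by more than tol (relative), shrink lam_hat, otherwise grow it.
    """
    best, worst = min(recent_bounds), max(recent_bounds)
    if abs(worst - best) / abs(best) > tol:
        return sigma * lam_hat   # bounds still oscillating: be cautious
    return tau * lam_hat         # bounds nearly flat: be more aggressive
```

The intuition is that large oscillations of the bound signal an overly aggressive steplength, while a flat bound suggests the steplength may safely be increased.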
Concerning the initial guess, we follow the ideas developed in [29], which have proved effective for set covering
problems: set λ^0_i = min_{j ∈ J_i} c_j / |I_j|, where the index sets are defined as J_i = {j : a_ij = 1} and I_j = {i : a_ij = 1}.
Using the integrability property, Z_D equals the optimal value of the associated linear problem, which is the bound used
by the S algorithm. The parameters for the SPS method were fixed as follows: M = 10 and μ = 0.7 for all tests, which
are standard values recommended in the literature [26]. For this set of problems, we report itbest, the iteration at which
the best value is obtained before reaching 500 (MAXITER) iterations. The best value obtained by the SPS algorithm is
reported under the column (SPS*), and the best value obtained by the S algorithm is reported under the column (S*).
We also report the percentage deviation (Dev(%)) from the value Z_D for both methods. Tables 2 and 3 show the results
for problems obtained from Beasley's OR library, and Table 4 shows the percentage-deviation average for each type
of problem.
Comparing the results obtained for the SPS and the S methods in Tables 2–4, it is clear that the SPS algorithm
outperforms algorithm S in almost all cases. Nevertheless, for problems E.1–E.5 algorithm S achieves a much better
solution than the SPS algorithm. Notice also that, even using the exact optimal value Z_D, the S method failed to achieve
the optimal solution in most cases.
We now select some set covering problems to perform a different experiment, in which we analyze the CPU time
required by both algorithms to achieve a pre-established accuracy. For that, we first run each chosen problem with
the S algorithm, until 15 000 iterations (MAXITER) are reached or until the following
Table 3
Results for set covering problems (Part II)

            SPS                            S
Problems    SPS*        itbest   Dev(%)    S*          itbest   Dev(%)    Z_D
A.1         −245.7767   499      0.4295    −239.4092   177      3.0091    −246.8368
A.2         −246.8282   500      0.2700    −240.0630   200      3.0034    −247.4964
A.3         −227.6905   496      0.1357    −223.1457   200      2.1291    −228
A.4         −230.7771   500      0.2678    −227.0280   199      1.8880    −231.3968
A.5         −234.0998   500      0.3359    −230.2944   200      1.9560    −234.8889
B.1         −64.2460    500      0.4582    −61.8867    199      4.1136    −64.5417
B.2         −68.9861    496      0.4557    −64.7946    178      6.5039    −69.3019
B.3         −73.9028    499      0.3431    −70.8196    197      4.5007    −74.1572
B.4         −70.9567    500      0.3641    −66.5235    178      6.5891    −71.2160
B.5         −67.4474    496      0.3232    −64.2238    178      5.0872    −67.6661
C.1         −223.4008   500      0.1788    −215.5645   200      3.6803    −223.8010
C.2         −212.1952   496      0.3065    −206.5421   200      2.9624    −212.8475
C.3         −233.8021   500      0.3328    −225.5729   199      3.8409    −234.5829
C.4         −213.1523   500      0.3255    −206.5712   195      3.4029    −213.8483
C.5         −211.1172   493      0.2454    −206.3287   197      2.5080    −211.6365
D.1         −55.0057    498      0.5480    −50.7707    180      8.2050    −55.3088
D.2         −59.0354    498      0.5224    −55.4453    180      6.5719    −59.3454
D.3         −64.6676    493      0.6132    −60.1095    180      7.6185    −65.0666
D.4         −55.5219    500      0.5723    −52.3733    179      6.2108    −55.8415
D.5         −58.3457    497      0.4603    −54.9932    198      6.1798    −58.6155
E.1         −2.7278     21       21.0249   −3.3021     139      4.3978    −3.4540
E.2         −2.7528     10       18.6068   −3.2361     135      4.3168    −3.3821
E.3         −2.5740     48       21.9740   −3.1846     134      3.4648    −3.2989
E.4         −2.7278     21       21.0249   −3.3021     139      4.3978    −3.4540
E.5         −2.6083     114      23.0771   −3.2402     128      4.4414    −3.3908
Table 4
Percentage-deviation average for each type of set covering problems

Type    SPS       S
4       0.0279    0.7805
5       0.0724    0.9471
6       0.2225    4.3670
A       0.2878    2.3971
B       0.3888    5.3589
C       0.2778    3.2789
D       0.5432    6.9572
E       21.1415   4.2037
condition is satisfied:

|Z(λ^k) − Z_D| / |Z_D| ≤ 10^{−2}.
After that, we run the same experiment with the SPS algorithm until Z(λ^k) ≤ S*, where S* is the best value obtained
by the S algorithm, or until we reach MAXITER iterations. In this experiment, the momentum term is fixed as μ = 0.7.
For the S algorithm, using the recipe given by Caprara et al. [29], the required accuracy is not achieved in most cases;
therefore, we do not use it here. Instead, we fix λ̂_k = 0.1 for all k, and the accuracy was then achieved in most
experiments. It is worth mentioning that Caprara's choice, used in Tables 2–4, was better than choosing a fixed value
of λ̂_k ∈ (0, 2) to approximate Z_D in the first 500 iterations.
Table 5
Set covering problems (CPU time)

            SPS                             S
Problems    SPS*        Time       iter     S*          Time       iter     Z_D
4.1         −424.9327   8.4        36       −424.7109   1025.7     3748     −429
5.3         −223.8241   21.3       36       −223.7406   795.8      1437     −226
4.4         −489.3107   9.2        38       −489.0658   589.8      2235     −494
5.1         −248.8899   22.5       39       −248.7176   1113.6     2032     −251.2250
6.1         −131.9544   16.5       54       −131.8085   4615.1     11 927   −133.1396
A.1         −244.3807   72.4       53       −244.3693   7265.0     5631     −246.8368
B.1         −63.8405    125.4      61       −63.7956    25 488.9   15 000   −64.5417
C.1         −221.5730   112.0      38       −221.5639   18 648.7   7893     −223.8010
D.1         −54.4323    401.7      53       −54.3565    48 450.2   15 000   −55.3088
E.1         −3.3534     11 576.6   15 000   −3.4195     783.7      7979     −3.4540
In Table 5 we show the results of this experiment. We report the labels of the chosen problems, the CPU time in
seconds (Time), the number of iterations required (iter), and the best obtained values under the columns S* and SPS*,
respectively.
From Table 5, it is clear that the SPS algorithm achieves a similar accuracy with significantly fewer iterations than the
S algorithm, except for problem E.1. Concerning the CPU time, it is also clear that an iteration of the SPS algorithm
involves the same amount of computational effort as an iteration of the S algorithm. Indeed, the quotient between the
required CPU time and the number of iterations essentially coincides, for both algorithms, across all experiments.
4.2. The generalized assignment problem (GAP)
In this section, following [30], we report some numerical results with a certain type of test problem: the dual of
generalized assignment problems [31]. A generalized assignment problem consists in assigning m jobs to n machines
(m ≤ n). If job i is performed at machine j, it costs c_ij and requires p_ij time units. Given the total available time T_j at
machine j, we want to find the minimum cost assignment of the jobs to the machines. Formally, the problem is:
min  Σ_{i=1}^m Σ_{j=1}^n c_ij y_ij
s.t. Σ_{i=1}^m p_ij y_ij ≤ T_j,   j = 1, ..., n,
     Σ_{j=1}^n y_ij = 1,   i = 1, ..., m,
     y_ij ∈ {0, 1},   ∀ i, j,
where y_ij is the assignment variable, which is equal to 1 if the ith job is assigned to the jth machine and is equal to
0 otherwise. Defining c̄_ij = −c_ij and p̄_ij = −p_ij, and relaxing the time constraints for the machines, we obtain the
Lagrangean relaxation problem P_λ:

max  Σ_{j=1}^n Σ_{i=1}^m ( c̄_ij + λ_j p̄_ij ) y_ij + Σ_{j=1}^n λ_j T_j
s.t. Σ_{j=1}^n y_ij = 1,   i = 1, ..., m,
     y_ij ∈ {0, 1},   ∀ i, j.
Table 6
Characteristics of generalized assignment problems

Problems   No. of Prob   Rows (m)   Columns (n)   Optimal    Z_D
gapA       6             5          100           −1698      −1697.7
                         5          200           −3235      −3234.7
                         10         100           −1360      −1358.6
                         10         200           −2623      −2623.0
                         20         100           −1158      −1157.1
                         20         200           −2339      −2337.3
gapB       6             5          100           −1843      −1831.3
                         5          200           −3553      −3547.4
                         10         100           −1407      −1400.7
                         10         200           −2831      −2815.1
                         20         100           −1166      −1155.2
                         20         200           −2340      −2331.1
gapC       6             5          100           −1931      −1924.0
                         5          200           −3458      −3450.8
                         10         100           −1403      −1387.0
                         10         200           −2814      −2795.4
                         20         100           −1244      −1219.0
                         20         200           −2397      −2376.9
gapD       6             5          100           −6373      −6345.4
                         5          200           −12 796    −12 736.0
                         10         100           −6379      −6323.5
                         10         200           −12 601    −12 418.0
                         20         100           −6269      −6142.5
                         20         200           −12 452    −12 218.0
The assignment problem P_λ can be solved as follows. For each i choose j*_i such that

c̄_{i,j*_i} + λ_{j*_i} p̄_{i,j*_i} = max_{1 ≤ j ≤ n} { c̄_ij + λ_j p̄_ij }.

Let y* ≡ (y*_ij) be an optimal solution of P_λ (i.e., y*_ij = 1 if j = j*_i and y*_ij = 0 otherwise); then the subgradient is
given by

ξ_j = Σ_{i=1}^m p̄_ij y*_ij + T_j,   j = 1, ..., n.
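A sketch of this oracle follows, under our reading that each job i independently picks its best machine j, which is consistent with the assignment constraints kept in P_λ. The function and variable names are ours.

```python
import numpy as np

def gap_oracle(c, p, T, lam):
    """Lagrangean oracle for the GAP relaxation (sketch).
    c, p -- (m, n) cost and processing-time matrices
    T    -- (n,) machine capacities
    lam  -- (n,) multipliers for the relaxed time constraints
    With c_bar = -c and p_bar = -p, each job i picks the machine j
    maximizing c_bar[i, j] + lam[j] * p_bar[i, j]; the subgradient is
    xi_j = sum_i p_bar[i, j] * y*_ij + T_j.
    """
    c_bar, p_bar = -c, -p
    scores = c_bar + p_bar * lam           # broadcast lam over columns
    j_star = np.argmax(scores, axis=1)     # best machine for each job
    m, n = c.shape
    y = np.zeros((m, n))
    y[np.arange(m), j_star] = 1.0
    z = float(scores[np.arange(m), j_star].sum() + lam @ T)   # Z(lam)
    xi = (p_bar * y).sum(axis=0) + T       # subgradient w.r.t. lam
    return z, y, xi
```

With λ = 0 the oracle simply assigns every job to its cheapest machine, and the subgradient reduces to the residual machine capacities.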
In Table 6 we describe the set of generalized assignment problems considered in our experiments. All of them were
obtained from Beasley's OR library [27]. Chu and Beasley [32] also used this set of problems in their work. Using the
integrability property, a bound Z_D for the optimal value is obtained by solving the linear problem associated with P.
The problems were solved with methods S and SPS, using the initial guess λ_0 = (5, 5, ..., 5)^T and MAXITER = 250.
The momentum term was fixed as μ = 0.3 for all experiments. For computing λ̂_k, we follow the recipe given by
Caprara et al. [29]. In Table 7, we report itbest, the iteration where the best value is obtained before reaching MAXITER
iterations. The best value obtained by the SPS algorithm is reported under the column (SPS∗ ), and the best value obtained
by the S algorithm is reported under the column (S ∗ ). We also report the percentage deviation (Dev(%)) from the value
Z_D for both methods. Table 8 shows the percentage-deviation average for each type of problem.
From Tables 7 and 8 we observe once again that the SPS method outperforms the S method, using the bound Z_D, in
2 out of every 3 experiments. Moreover, we observe that in general the SPS method produces a more accurate solution
than the S method.
We select some generalized assignment problems to perform a different experiment similar to the second set of
experiments for set covering problems. In this experiment we would also like to analyze the CPU time required by
both algorithms to achieve a pre-established accuracy. Once again, we use the recipe given by Caprara et al. [29] to
Table 7
Results for generalized assignment problems

            SPS                            S
Problems    SPS*          itbest   Dev(%)    S*            itbest   Dev(%)
gapa5100    −1697.7       109      0         −1697.6       249      0.0059
gapa5200    −3234.7       11       0         −3234.7       158      0
gapa10100   −1358.6       211      0         −1358.4       250      0.0147
gapa10200   −2622.9       59       0.0038    −2623.0       249      0
gapa20100   −1157.0       71       0.0086    −1156.9       250      0.0173
gapa20200   −2337.3       67       0         −2337.3       193      0
gapb5100    −1831.3       27       0         −1831.2       180      0.0055
gapb5200    −3547.4       54       0         −3547.4       157      0
gapb10100   −1400.5       93       0.0143    −1400.5       160      0.0143
gapb10200   −2815.0       198      0.0036    −2814.9       160      0.0071
gapb20100   −1155.1       88       0.0087    −1155.0       159      0.0173
gapb20200   −2331.1       158      0         −2330.9       173      0.0086
gapc5100    −1923.9       179      0.0052    −1923.9       110      0.0052
gapc5200    −3450.8       221      0         −3450.7       158      0.0029
gapc10100   −1387.0       167      0         −1386.6       138      0.0288
gapc10200   −2795.4       52       0         −2795.4       139      0
gapc20100   −1217.9       92       0.0902    −1218.3       159      0.0574
gapc20200   −2375.9       64       0.0421    −2376.0       129      0.0379
gapd5100    −6345.3       243      0.0016    −6345.4       159      0
gapd5200    −12 736.0     119      0         −12 736.0     91       0
gapd10100   −6323.4       81       0.0016    −6323.4       140      0.0016
gapd10200   −12 418.0     181      0         −12 418.0     121      0
gapd20100   −6142.1       85       0.0065    −6142.2       153      0.0049
gapd20200   −12 218.0     119      0         −12 217.0     119      0.0082
Table 8
Percentage-deviation average for each type of generalized assignment problems

Type    SPS       S
gapa    0.0021    0.0063
gapb    0.0044    0.0088
gapc    0.0229    0.0220
gapd    0.0016    0.0024
compute λ̂_k, and the required accuracy was achieved in all experiments. We first run each chosen experiment with the
S algorithm, until 1000 iterations (MAXITER) are reached or until the following condition is satisfied:

|Z(λ^k) − Z_D| / |Z_D| ≤ tol,

where tol = 10^{−3}. After that, we run the same experiment with the SPS algorithm until Z(λ^k) ≤ S*, where S* is the
best value obtained by the S algorithm, or until we reach MAXITER iterations. In this experiment, the momentum term
is fixed as μ = 0.3.
In Table 9, we show the results of this experiment. We report the CPU time in seconds (Time), the number of iterations
(iter), and the best obtained values SPS* and S*.
From Table 9, it is clear that the SPS algorithm achieves a similar accuracy with significantly fewer iterations than the
S algorithm in all cases. Concerning the CPU time, it is also clear that an iteration of the SPS algorithm involves the
same amount of computational effort as an iteration of the S algorithm. Once again, the quotient between the required
CPU time and the number of iterations essentially coincides, for both algorithms, across all experiments.
Table 9
Generalized assignment problems (CPU time)

            SPS                              S
Problems    SPS*           iter   Time      S*             iter   Time
gapb10100   −1399.4        16     0.15      −1399.3        91     0.95
gapb20100   −1154.2706     34     0.45      −1154.1583     99     1.36
gapc10100   −1386.0893     25     0.27      −1385.7711     92     0.94
gapc20100   −1217.8388     59     0.81      −1217.8383     128    1.74
gapd10100   −6319.7904     29     0.29      −6317.5195     54     0.54
gapd10200   −12 413.0860   33     0.64      −12 407.6925   52     0.97
gapd20100   −6138.6518     47     0.66      −6136.4702     64     0.87
gapd20200   −12 208.8468   34     0.85      −12 206.9791   58     1.46
5. Final remarks
We have presented a new scheme (SPS) for solving Lagrangean dual problems associated with integer programming
problems that combines the subgradient method with the spectral choice of steplength and a momentum term. To
illustrate the wide range of possible applications of the SPS algorithm, we have reported numerical results on two
different types of problems: set covering problems and generalized assignment problems. Based on the computational
experiments carried out, the SPS algorithm tends to reach a closer approximation of the optimal solution
with fewer iterations than the S algorithm, without requiring any additional (and hypothetical) information, such as the
optimal value ZD of the objective function at the solution.
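The ingredients just mentioned (projected subgradient direction, spectral steplength, momentum term) can be illustrated with a minimal sketch; the function names, safeguards, and sign conventions below are our own assumptions, not the exact formulas from the paper.

```python
import numpy as np

def sps_step(lam, lam_prev, subgrad, project, alpha, mu=0.3):
    """One projected subgradient step with a momentum term (sketch).

    lam, lam_prev : current and previous dual iterates
    subgrad       : callable returning a subgradient of the dual function at lam
    project       : projection onto the dual feasible set (e.g. lam >= 0)
    alpha         : current steplength (e.g. the spectral choice below)
    mu            : momentum coefficient (the experiments above fix it at 0.3)
    """
    g = subgrad(lam)
    # subgradient move plus a momentum term that reuses the previous displacement
    trial = lam + alpha * g + mu * (lam - lam_prev)
    return project(trial), g

def spectral_steplength(lam, lam_prev, g, g_prev, a_min=1e-10, a_max=1e10):
    """Barzilai-Borwein-type steplength s^T s / |s^T y|, with safeguards."""
    s = lam - lam_prev
    y = g - g_prev
    sty = s.dot(y)
    if sty == 0.0:
        return a_max
    return min(a_max, max(a_min, s.dot(s) / abs(sty)))
```

The momentum term μ(λk − λk−1) reuses the previous displacement, in the spirit of the heavy-ball acceleration used in back-propagation training.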
The SPS method seems to be a good choice for large-scale problems, since it has very low memory requirements
and a low computational cost per iteration. Moreover, it seems to be suitable for general non-differentiable problems,
since it only requires a subgradient direction per iteration. This possible extension is an interesting topic that deserves
further investigation.
Finally, we would also like to study the convergence properties of the SPS method. This is an open issue that deserves
special attention in the near future.
Acknowledgements
We are indebted to two anonymous referees whose comments helped us to improve the quality of this paper.
References

[1] Shor NZ. Minimization methods for non-differentiable functions. Berlin: Springer Series in Computational Mathematics, Springer; 1985.
[2] Bertsimas D, Tsitsiklis JN. Introduction to linear optimization. Belmont, MA: Athena Scientific; 1997.
[3] Held M, Wolfe P, Crowder H. Validation of subgradient optimization. Mathematical Programming 1974;6:62–88.
[4] Balinski ML, Wolfe P, editors. Nondifferentiable optimization. Mathematical programming study, vol. 3. Amsterdam: North-Holland; 1975.
[5] Alber YI, Iusem AN, Solodov MV. On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Mathematical Programming 1998;81:23–35.
[6] Birgin EG, Martinez JM, Raydan M. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization 2000;10:1196–211.
[7] Bishop CM. Neural networks for pattern recognition. New York: Oxford; 1997.
[8] Plaut D, Nowlan S, Hinton GE. Experiments on learning by back propagation. Technical Report CMU-CS-86-126, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 1986.
[9] Solodov MV, Zavriev SK. Error stability properties of generalized gradient-type algorithms. Journal of Optimization Theory and Applications 1998;98:663–80.
[10] Tseng P. An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 1998;8:506–31.
[11] Geoffrion AM, Marsten RE. Integer programming algorithms: a framework and state-of-the-art survey. Management Science 1972;18(9):465–89.
[12] Geoffrion AM. Lagrangean relaxation for integer programming. Mathematical Programming Study 1974;2:82–114.
[13] Goffin JL. On convergence rates of subgradient optimization methods. Mathematical Programming 1977;13:329–47.
[14] Shapiro JF. Mathematical programming: structures and algorithms. New York: Wiley; 1979.
[15] Birgin EG, Martinez JM, Raydan M. Algorithm 813: SPG—software for convex-constrained optimization. ACM Transactions on Mathematical Software 2001;27:340–9.
[16] Bertsekas DP. On the Goldstein–Levitin–Polyak gradient projection method. IEEE Transactions on Automatic Control 1976;21:174–84.
[17] Goldstein AA. Convex programming in Hilbert space. Bulletin of the American Mathematical Society 1964;70:709–10.
[18] Levitin ES, Polyak BT. Constrained minimization problems. USSR Computational Mathematics and Mathematical Physics 1966;6:1–50.
[19] Barzilai J, Borwein JM. Two-point step size gradient methods. IMA Journal of Numerical Analysis 1988;8:141–8.
[20] Dai YH, Liao LZ. R-linear convergence of the Barzilai–Borwein gradient method. IMA Journal of Numerical Analysis 2002;22:1–10.
[21] Fletcher R. Low storage methods for unconstrained optimization. Lectures in Applied Mathematics (AMS) 1990;26:165–79.
[22] Raydan M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis 1993;13:321–6.
[23] Fletcher R. On the Barzilai–Borwein method. Technical Report NA/207, Department of Mathematics, University of Dundee, Dundee, Scotland; 2001.
[24] Grippo L, Lampariello F, Lucidi S. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis 1986;23:707–16.
[25] La Cruz W, Martinez JM, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems. Mathematics of Computation, 2005, to appear.
[26] Raydan M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization 1997;7:26–33.
[27] Beasley JE. OR-library: distributing test problems by electronic mail. Journal of the Operational Research Society 1990;41:1069–72.
[28] Beasley JE. An algorithm for set covering problems. European Journal of Operational Research 1987;31:85–93.
[29] Caprara A, Fischetti M, Toth P. A heuristic method for the set covering problem. Operations Research 1999;47:730–43.
[30] Nedić A, Bertsekas DP. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization 2001;12:109–38.
[31] Ross GT, Soland RM. Modeling facility location problems as generalized assignment problems. Management Science 1977;24:345–57.
[32] Chu PC, Beasley JE. A genetic algorithm for the generalised assignment problem. Computers & Operations Research 1997;24:17–23.