Computers & Operations Research 34 (2007) 3174 – 3186
www.elsevier.com/locate/cor
Spectral projected subgradient with a momentum term
for the Lagrangean dual approach
Alejandro Crema∗ , Milagros Loreto1 , Marcos Raydan1
Departamento de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas, 1041-A, Venezuela
Available online 4 January 2006
Abstract
The Lagrangean dual problem, with a non-differentiable convex objective function, is usually solved by using the subgradient
method, whose convergence is guaranteed if the optimal value of the dual objective function is known. In practice, this optimal
value is approximated by a previously computed bound. In this work, we combine the subgradient method with a different choice of
steplength, based on the recently developed spectral projected gradient method, that does not require either exact or approximated
estimates of the optimal value. We also add a momentum term to the subgradient direction that accelerates the convergence process
towards global solutions. To illustrate the behavior of our new algorithm we solve Lagrangean dual problems associated with
integer programming problems. In particular, we present and discuss encouraging numerical results for set covering problems and
generalized assignment problems.
© 2005 Elsevier Ltd. All rights reserved.
MSC: 90C10; 90C25
Keywords: Spectral projected gradient; Subgradient optimization; Set covering problems; Generalized assignment problems
1. Introduction
The subgradient method is a scheme for minimizing non-differentiable convex functions that was originally
developed by Shor in the 1970s. The classic reference on this topic is his book [1]. Over the last few decades, many
extensions and variations of the subgradient method have also been developed (see e.g., [2,3]). A standard iteration of the
subgradient method consists, mainly, in moving in the direction opposite to a subgradient direction at the current iterate,
as in the classical gradient method for differentiable functions. An interesting review on this topic can be found in [4].
Recently, the machinery for unconstrained problems has been extended to convex constrained problems. In particular,
authors like Alber et al. [5], among others, developed algorithms that can be seen as extensions of the projected
subgradient method for convex optimization in a Hilbert space. On the other hand, Birgin et al. [6] developed an
algorithm that can be considered as an extension of the classical projected gradient method for the minimization of
differentiable functions on convex sets. In that sense, the spectral projected subgradient method, introduced in this work,
∗ Corresponding author. Fax: +58 2 60 52 168.
E-mail addresses: acrema@kuaimare.ciens.ucv.ve (A. Crema), mloreto@kuaimare.ciens.ucv.ve (M. Loreto), mraydan@kuaimare.ciens.ucv.ve
(M. Raydan).
1 Supported by the Center of Scientific Computing at UCV.
0305-0548/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2005.11.024
can be considered as an extension of both methods. An additional feature of our proposal is the use of a momentum
term. The momentum term incorporates, in the present iteration, some influence of the past iterations, and this can help
to accelerate the convergence. Some references on this issue are [7–10]. In general, the momentum term is frequently
used in neural networks training, and also for backpropagation algorithms.
In this work, we consider the spectral projected subgradient method for the minimization of non-differentiable
functions on convex sets, specifically solving the Lagrangean dual problem that appears in integer programming
problems. The dual problem consists in minimizing, subject to simple constraints, a convex piecewise linear function.
In this case, solving the Lagrangean dual problem, by using the subgradient minimization method [3], usually requires
the value of the objective function at the optimal solution or an approximation of it. An important feature of the spectral
projected subgradient method is that, for solving the same problem, it does not need either the optimal value of the
objective function or an approximation of it.
The rest of this paper is organized as follows: in Section 2 we present the basic concepts related to integer programming, duality, and the algorithm based on subgradient directions. In Section 3 we present our spectral projected
subgradient algorithm, and discuss its relevant properties. In Section 4, we present numerical results comparing the
performance of the new algorithm with the classic one for set covering problems and generalized assignment problems.
Finally, in Section 5 we present our concluding remarks.
2. Integer programming and duality
Let us consider the following integer programming problem (P):
max  c^T x
s.t. Ax ≤ b,
     Dx ≤ e,
     x ∈ Z^n,
     x ≥ 0,
where Z represents the integer numbers, c, b and e are vectors, and A and D are matrices of suitable dimensions. The
following problem is the Lagrangean relaxation of P, relative to the unstructured or hard constraint block Ax ≤ b,
and will be denoted by P_λ:

max  c^T x + λ^T (b − Ax)
s.t. x ∈ X,

where X = {x ∈ Z^n, x ≥ 0 : Dx ≤ e} and λ ≥ 0 is a vector of suitable dimension. For us, Dx ≤ e represents the structured
or easy constraint block.
We are mainly concerned with the Lagrangean dual formulation of P, which will be referred to as problem (D), and is
given by

min  Z(λ)
s.t. λ ≥ 0,

where Z(λ) = max{c^T x + λ^T (b − Ax) : x ∈ X}.
One of the most common schemes for solving problem P is based on the branch-and-bound strategy. In this case,
the important ingredients are branching, relaxation, and fathoming [11]. In the relaxation process a new problem is
obtained whose optimal value is a bound for the optimal value of problem P, which is the main motivation for solving
the Lagrangean dual problem D. This problem can be solved using the subgradient optimization algorithm, which in
theory needs the optimal value of problem D (see e.g. [2,3]). Of course, in practice, this optimal value is approximated
by a previously computed bound.
We now present a brief review of the ideas associated with duality for solving integer programming problems. In
our presentation we follow [2,12,3].
It is well known that if Z_D is the optimal value of D then Z_D ≥ Z_IP, where Z_IP is the optimal value of problem P.
The Lagrangean dual approach can be used for obtaining good bounds on Z_IP.
In this work we assume that X is bounded and hence finite. Let X = {x^1, x^2, ..., x^m} and
Z(λ) = max_{i=1,...,m} (c^T x^i + λ^T (b − Ax^i)). If we denote f_i = b − Ax^i and h_i = c^T x^i then

Z(λ) = max_{i=1,...,m} { h_i + f_i^T λ }.

Therefore, Z(λ) is a piece-wise linear convex function. If Z(λ) were differentiable then the gradient method could be
used for solving D. Unfortunately, Z(λ) is not differentiable everywhere.
The so-called subgradient optimization algorithm (S), which will be described below, can be used for solving D and
converges under the following assumptions: the exact value Z_D is known and the norms of the obtained subgradients
are uniformly bounded. The value of Z_D is used to compute the steplength at every iteration. In general, Z_D is not
known, and only approximate values are available.
Starting from a given λ^0 ≥ 0, we now describe the kth iteration of the subgradient optimization algorithm, which will
be extended and combined later with some new ideas.

Subgradient optimization algorithm (S):

1. Choose ξ^k ∈ ∂Z(λ^k), the set of subgradient vectors of Z at λ^k.
   • If ξ^k = 0 stop: λ^k is an optimal solution.
   • Else
2. Set α_k = λ̂_k (Z(λ^k) − Z̃) / ‖ξ^k‖², where 0 < ε₁ ≤ λ̂_k ≤ 2 − ε₂, ε₂ > 0.
3. Set λ^{k+1}_j = max{λ^k_j − α_k ξ^k_j, 0}, ∀j.

Remarks. 1. Z̃ is an estimate of Z_D. In practice, different stopping criteria can be used. For example, the process can
be stopped when |Z(λ^k) − Z̃| ≤ tol, where tol has been pre-established, or when a maximum number of iterations has
been reached [13].
2. Note that neither the chosen subgradient directions nor the steplength used guarantee that Z(λ^{k+1}) < Z(λ^k), and
so this is not a descent algorithm.
3. Different heuristics can be used for obtaining λ̂_k.
The convergence theorem that supports the subgradient optimization algorithm (see [14]) states that if Z is a convex
function, an optimal solution λ* exists, and the subgradients are uniformly bounded, then lim_{k→∞} Z(λ^k) = Z(λ*), and
any limit point of {λ^k} is an optimal solution. However, the convergence of the method heavily depends on the quality
of the estimate Z̃.
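To make the scheme concrete, algorithm S can be sketched in a few lines of Python (a language choice of ours; the experiments reported later in the paper were run in Matlab). The oracle names `Z` and `subgrad`, the relaxation parameter `lam_hat`, and the tolerance are our own labels; the projection onto λ ≥ 0 is the componentwise maximum of step 3.

```python
import numpy as np

def subgradient_method(Z, subgrad, lam0, Z_tilde, lam_hat=0.1,
                       max_iter=500, tol=1e-6):
    """Sketch of algorithm S: projected subgradient with a Polyak-type
    steplength driven by the estimate Z_tilde of the optimal value Z_D.

    Z        -- callable returning the dual objective Z(lam)
    subgrad  -- callable returning one subgradient of Z at lam
    Z_tilde  -- estimate (bound) of the optimal dual value Z_D
    lam_hat  -- relaxation parameter, kept in (0, 2)
    """
    lam = np.asarray(lam0, dtype=float)
    best_val, best_lam = Z(lam), lam.copy()
    for _ in range(max_iter):
        xi = subgrad(lam)
        norm2 = float(xi @ xi)
        if norm2 == 0.0:                    # zero subgradient: lam is optimal
            break
        alpha = lam_hat * (Z(lam) - Z_tilde) / norm2
        lam = np.maximum(lam - alpha * xi, 0.0)   # projection onto lam >= 0
        val = Z(lam)
        if val < best_val:
            best_val, best_lam = val, lam.copy()
        if best_val - Z_tilde <= tol:       # close enough to the estimate
            break
    return best_lam, best_val
```

On a trivial one-dimensional example such as Z(λ) = λ with subgradient 1 and Z̃ = 0, the iterates shrink geometrically towards the minimizer λ = 0, which illustrates the dependence of the steplength on the quality of Z̃.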
In our next section, we combine the subgradient Lagrangean scheme with a different choice of steplength, based on
the recently developed spectral projected gradient method [6,15], that does not require either exact or approximated
estimates of the optimal value. We also add a momentum term [8] to the subgradient direction that in some difficult
cases accelerates the convergence process of the algorithm.
3. Spectral projected subgradient algorithm
Our main optimization problem can also be written as

min  Z(λ)
s.t. λ ∈ Ω,

where we recall that Z(λ) = max{c^T x + λ^T (b − Ax) : x ∈ X} is convex, piece-wise linear, and non-differentiable at
some points, Ω = {λ : λ ≥ 0} is a non-empty, closed, and convex subset of R^n, and X = {x ∈ Z^n, x ≥ 0 : Dx ≤ d}, where
b, c and d are vectors, and A and D are matrices of suitable dimensions.
In this work, we propose an extension of the spectral projected gradient (SPG) method [6,15] that can also be
viewed as an extension of the subgradient optimization algorithm; it will be called the spectral projected subgradient
(SPS) method.
The SPG method is related to the practical version, due to Bertsekas [16], of the classical gradient projection method of
Goldstein [17] and Levitin [18]. However, some crucial differences make this method much more efficient than its
gradient projection predecessors. The key issue is that the first trial step at each iteration is taken using the spectral
steplength introduced in [19] and later analyzed in [20–22], among others. The spectral step is a Rayleigh quotient
related with an average Hessian matrix. For a review containing the more recent advances on this special choice of
steplength see [23].
Therefore, it is natural to transport the spectral projected gradient idea, with a non-monotone line search, to the
projected subgradient setting, in order to speed up the convergence of the subgradient method and to use a steplength
that does not depend on the optimal value of the objective function.
A typical iteration, for obtaining x_{k+1}, in the SPG method can be described as follows:

1. Set d_k = P(x_k − α_k g(x_k)) − x_k, where P is the projection onto Ω.
2. Set x_{k+1} = x_k + δ_k d_k,

where the vector g_k is the gradient (or subgradient) of Z evaluated at x_k, the steplength α_k is obtained using the spectral
choice, and δ_k is obtained by means of a non-monotone line search technique along the direction d_k. For our non-monotone
globalization technique, we combine and extend the Grippo et al. [24] line search scheme with the recently
proposed globalization scheme of La Cruz et al. [25]. Roughly speaking, our acceptance condition for the next iterate is
Z(x_{k+1}) ≤ max_{0 ≤ j ≤ min{k, M−1}} Z(x_{k−j}) + γ ⟨d_k, g(x_k)⟩ + η_k,

where γ is a small positive number, and η_k is chosen such that

0 < Σ_k η_k < ∞.   (1)

The terms max_{0 ≤ j ≤ M−1} Z(x_{k−j}) and η_k > 0 are responsible for the sufficiently non-monotone behavior of Z(x_k).
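This acceptance test is easy to isolate as a small predicate. In the following sketch the names are ours: `gamma` stands for the small sufficient-decrease parameter and `eta_k` for the summable forcing term.

```python
def accepts(Z_new, Z_history, dk_dot_gk, eta_k, gamma=1e-4, M=10):
    """Non-monotone acceptance test (sketch): the trial value Z_new is
    accepted if it does not exceed the maximum of the last at most M
    stored objective values plus a directional term and the forcing
    term eta_k > 0.

    Z_history  -- list of past values Z(x_0), ..., Z(x_k)
    dk_dot_gk  -- inner product <d_k, g(x_k)>
    """
    reference = max(Z_history[-M:])   # max over the last M stored values
    return Z_new <= reference + gamma * dk_dot_gk + eta_k
```

Because the reference value is a maximum over past iterates and η_k is strictly positive, occasional increases of the objective are tolerated, which is exactly the "sufficiently non-monotone" behavior described above.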
The spectral choice of steplength has proved effective when combined with the negative gradient direction [26],
which has been historically associated with the steepest descent method, famous for being extremely slow on
ill-conditioned problems. Another proposed remedy for the slowness of the steepest descent method was given by
Nowland et al. [8], who add a momentum term to the gradient direction that has interesting practical properties. In
this work, we combine both strategies (spectral steplength and momentum term) successfully for the Lagrangean dual
problem.
The typical iteration of the idea proposed in [8] can be described as follows:

x_{k+1} = x_k + Δg_k,
Δg_k = −α_k g_k + μ Δg_{k−1},

where μ ∈ [0, 1] is the momentum parameter, and x_0 and Δg_0 are given. This idea can be adapted for obtaining the
direction d_k in the SPS method, as follows:

d_k = P(x_k − m_k) − x_k,

where

m_k = α_k g_k + μ m_{k−1},   m_0 = 0.

If μ = 0 then d_k = P(x_k − α_k g(x_k)) − x_k, and the original SPS scheme is recovered.
Now, the extension method applied to problem D, using subgradients, is given by

d_k = P(λ_k − α_k g_k) − λ_k,
λ_{k+1} = λ_k + δ_k d_k,
where g_k is a subgradient vector of Z at λ_k and the spectral steplength α_k is given by

α_k = (s_{k−1}^T s_{k−1}) / (s_{k−1}^T y_{k−1}),

where s_{k−1} = λ_k − λ_{k−1} and y_{k−1} = g_k − g_{k−1}. At the first iteration, α_k = 1.
The search direction d_k, when we add a momentum term, is given by d_k = P(λ_k − m_k) − λ_k with m_k = α_k g_k +
μ m_{k−1}, m_0 = 0. The idea consists in incorporating into the present iteration some influence of the past iterations. This
effectively adds inertia to the motion of a massive particle through the weight space, and avoids the typical zig-zagging
behavior of the steepest descent method. Indeed, it introduces a damping effect on the iterations by averaging gradient
components with opposite signs in flat areas. In some cases it prevents the iterative process from stagnating around a
local minimum, helping it to skip over these flat areas without performing too many unnecessary iterations. For more
details on these properties of the momentum term, see [7]. In practice, the momentum term can be chosen in two
different ways: using a learning rule, or fixed by the user as recommended by Tseng [10]. In this work we use a fixed
momentum term.
Combining all the ingredients described above, we obtain the following algorithm.
Spectral Projected Subgradient (SPS):

Given λ_0 ∈ Ω, an integer M ≥ 1, g_0 a subgradient of Z at λ_0, γ ∈ (0, 1), a parameter MAXITER for the maximum
number of iterations allowed, η_0 > 0, m_0 = 0, μ ∈ [0, 1], and α_0 = max(Z(λ_0), ‖g(λ_0)‖).

For k = 0, ..., MAXITER
• m_k = α_k g_k + μ m_{k−1}
• d_k = P(λ_k − m_k) − λ_k
• δ = 1
• λ_+ = λ_k + δ d_k
• η_k = η_0 / k^{1.1}
  While Z(λ_+) > max_{0 ≤ j ≤ min{k, M−1}} Z(λ_{k−j}) + γ δ ⟨d_k, g(λ_k)⟩ + η_k
    Compute δ_new
    δ = δ_new
    λ_+ = λ_k + δ d_k
  End while
• λ_{k+1} = λ_+
• s_k = λ_{k+1} − λ_k
• y_k = g(λ_{k+1}) − g(λ_k)
• α_{k+1} = (s_k^T s_k) / (s_k^T y_k)
End for
Remarks. 1. The parameter δ_new could be computed in many different ways. In our implementation we choose
δ_new = δ/2.
2. The choice η_k = η_0/k^{1.1} guarantees that (1) is satisfied. If we choose η_k = η_0/k^r with r > 1, then (1) will
also hold. Our choice r = 1.1 is suitable for the sufficiently non-monotone desired behavior of the method.
3. As in algorithm S, at each iteration only one subgradient g(λ_k) = g_k is computed. This calculation accounts for
most of the computational cost per iteration.
4. The SPS algorithm is stopped when MAXITER iterations are reached. Unfortunately, at this point we have no
theoretical criterion to stop the process.
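A minimal Python sketch of one way to implement this iteration follows. It is not the authors' code: the symbol names (`mu` for the momentum parameter, `gamma` for the sufficient-decrease parameter, `eta0` for the forcing-term constant) are our reading of the printed algorithm, the halving rule δ_new = δ/2 of Remark 1 is used for the backtracking, and the initialization α_0 = 1 is a simplification of the one stated above.

```python
import numpy as np

def sps(Z, subgrad, lam0, mu=0.7, gamma=1e-4, M=10, eta0=1.0, max_iter=500):
    """Sketch of SPS: momentum direction + spectral steplength +
    non-monotone backtracking line search, projected onto {lam >= 0}."""
    proj = lambda v: np.maximum(v, 0.0)        # projection P onto Omega
    lam = np.asarray(lam0, dtype=float)
    g = subgrad(lam)
    m = np.zeros_like(lam)
    alpha = 1.0                                # simplified alpha_0
    history = [Z(lam)]
    best_lam, best_val = lam.copy(), history[0]
    for k in range(max_iter):
        m = alpha * g + mu * m                 # momentum direction m_k
        d = proj(lam - m) - lam                # search direction d_k
        if np.allclose(d, 0.0):
            break
        eta = eta0 / (k + 1) ** 1.1            # summable forcing term
        ref = max(history[-M:])                # non-monotone reference value
        delta = 1.0
        lam_new = lam + d
        # backtracking with delta_new = delta / 2 (Remark 1)
        while (Z(lam_new) > ref + gamma * delta * float(d @ g) + eta
               and delta > 1e-12):
            delta /= 2.0
            lam_new = lam + delta * d
        g_new = subgrad(lam_new)
        s, y = lam_new - lam, g_new - g
        sty = float(s @ y)
        alpha = float(s @ s) / sty if sty > 1e-16 else 1.0   # spectral step
        lam, g = lam_new, g_new
        history.append(Z(lam))
        if history[-1] < best_val:
            best_val, best_lam = history[-1], lam.copy()
    return best_lam, best_val
```

On a smooth convex test function such as Z(λ) = ‖λ − (1, 1)‖², the sketch quickly drives the best recorded value close to zero while keeping all iterates in the non-negative orthant, which illustrates the interplay of the three ingredients.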
4. Numerical results
We compare the performance of the SPS algorithm with the classical S algorithm on two different types of test
problems: set covering problems and generalized assignment problems.
Table 1
Characteristics of set covering problems

Problem    Rows (m)    Columns (n)    Density (%)
4          200         1000           2
5          200         2000           2
6          200         1000           5
A          300         3000           2
B          300         3000           5
C          400         4000           2
D          400         4000           5
E          50          500            20
All the experiments were run in Matlab 5.3 on a Pentium III (550 MHz) with 320 MB of RAM, under Windows XP.
4.1. Set covering problems
A set covering problem is the problem of covering the rows of an m-row, n-column, zero-one matrix (aij ) by a subset
of the columns at minimum cost. These problems can be formulated as follows:
min  Σ_{j=1}^n c_j x_j
s.t. Σ_{j=1}^n a_ij x_j ≥ 1,   i = 1, ..., m,
     x_j ∈ {0, 1},

with c_j > 0, a_ij ∈ {0, 1}, ∀ i, j.
If we define c̄_j = −c_j and ā_ij = −a_ij, then the problem P_λ can be written as

max  Σ_{j=1}^n c̄_j x_j + Σ_{i=1}^m λ_i ( −1 − Σ_{j=1}^n ā_ij x_j )
s.t. x_j ∈ {0, 1}.

An optimal solution of P_λ is given by

x_j = 1          if c̄_j − Σ_{i=1}^m λ_i ā_ij > 0,
x_j ∈ {0, 1}     if c̄_j − Σ_{i=1}^m λ_i ā_ij = 0,        (2)
x_j = 0          if c̄_j − Σ_{i=1}^m λ_i ā_ij < 0.

Let x* be a solution of P_λ; then the subgradient is given by

ξ_i = −1 − Σ_{j=1}^n ā_ij x*_j,   i = 1, ..., m.
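Evaluating Z(λ), an optimal x* via rule (2), and the corresponding subgradient is a few lines of vectorized code. The following sketch (our names, not the authors' code) breaks the ties of the middle case of (2) by setting x_j = 0, which is one of the allowed choices.

```python
import numpy as np

def covering_oracle(c, a, lam):
    """Lagrangean oracle for the set covering relaxation (sketch).
    c   -- positive cost vector, shape (n,)
    a   -- 0/1 coverage matrix, shape (m, n)
    lam -- multipliers lam >= 0, shape (m,)
    With c_bar = -c and a_bar = -a, an optimal x* sets x_j = 1 exactly
    when c_bar_j - sum_i lam_i a_bar_ij > 0 (ties resolved to 0 here),
    and the subgradient is xi_i = -1 - sum_j a_bar_ij x*_j.
    """
    c_bar, a_bar = -c, -a
    reduced = c_bar - lam @ a_bar          # c_bar_j - sum_i lam_i a_bar_ij
    x = (reduced > 0).astype(float)        # rule (2), ties set to 0
    z = float(c_bar @ x + lam @ (-1.0 - a_bar @ x))   # Z(lam)
    xi = -1.0 - a_bar @ x                  # subgradient of Z at lam
    return z, x, xi
```

For instance, with one row covered by two columns of costs 1 and 3 and a multiplier of 2, only the cheaper column is selected and the covering constraint is tight, so the subgradient component vanishes.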
In Table 1 we describe the set covering problems considered in our experiments. All of them were obtained from
Beasley's OR library [27]. Beasley [28] also used this set of problems in his work. The density parameter represents
the percentage of non-zero elements.
As mentioned before, there are different heuristics for computing λ̂. We follow the recipe given by Caprara
et al. [29], which consists in comparing the best and the worst bound for the optimal value in the last p iterations:

1. Set (at the beginning) λ̂ = 0.1.
Table 2
Results for set covering problems (Part I)

            SPS                            S
Problems    SPS*        itbest   Dev(%)    S*          itbest   Dev(%)    Z_D
4.1         −428.9966   160      0.0008    −423.3816   219      1.3097    −429
4.2         −512.0000   463      0         −508.8298   220      0.6192    −512
4.3         −516.0000   427      0         −514.9335   239      0.2067    −516
4.4         −493.8561   448      0.0291    −488.6496   195      1.0831    −494
4.5         −512.0000   275      0         −509.7710   220      0.4354    −512
4.6         −557.1859   499      0.0115    −555.5556   220      0.3041    −557.2500
4.7         −429.9992   329      0.0002    −423.3945   199      1.5362    −430
4.8         −488.0351   498      0.1279    −485.8885   200      0.5672    −488.6666
4.9         −637.8956   500      0.0164    −630.8328   197      1.1234    −638
4.10        −513.4759   370      0.0928    −509.8203   220      0.6198    −513
5.1         −251.0872   496      0.0549    −248.6560   199      1.0226    −251.2250
5.2         −299.3289   499      0.1442    −293.1867   175      2.1932    −299.7611
5.3         −225.9761   264      0.0106    −225.4144   219      0.2591    −226
5.4         −240.4017   500      0.0409    −238.9139   199      0.6595    −240.5000
5.5         −210.9999   241      0         −209.8727   219      0.5343    −211
5.6         −212.4880   408      0.2302    −211.5217   200      0.2256    −212
5.7         −291.6719   498      0.0363    −288.4858   200      1.1283    −291.7778
5.8         −286.4383   498      0.1957    −283.0592   198      1.3731    −287
5.9         −278.9931   488      0.0025    −273.9048   197      1.8262    −279
5.10        −264.9773   85       0.0086    −264.3398   240      0.2491    −265
6.1         −133.0122   500      0.0957    −125.2575   179      5.9202    −133.1396
6.2         −140.2567   500      0.1423    −135.9985   201      3.1739    −140.4565
6.3         −139.6418   500      0.3512    −134.7918   181      3.8122    −140.1340
6.4         −128.8822   500      0.0913    −123.4389   200      4.3109    −129
6.5         −152.6906   485      0.4319    −146.2715   179      4.6177    −153.3529
2. Every p iterations:
   • If |Z_D^worst − Z_D^best| / |Z_D^best| > tol then λ̂ = σ λ̂,
   • else λ̂ = τ λ̂.

In practice we use p = 20, σ = 0.5, τ = 1.5 and tol = 0.01.
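This update is simple enough to sketch directly. In the snippet below the names are ours: `sigma` and `tau` denote the shrink and growth factors (0.5 and 1.5 here), and `recent_bounds` would hold the last p dual values Z(λ^k).

```python
def update_lam_hat(lam_hat, recent_bounds, sigma=0.5, tau=1.5, tol=0.01):
    """Caprara-style relaxation-parameter update (sketch): compare the
    best and worst dual bounds seen over the last p iterations; if they
    differ by more than tol (relative), shrink lam_hat, otherwise grow it.
    """
    best, worst = min(recent_bounds), max(recent_bounds)
    if abs(worst - best) / abs(best) > tol:
        return sigma * lam_hat   # bounds still oscillating: be cautious
    return tau * lam_hat         # bounds nearly flat: be more aggressive
```

The intuition is that large oscillations of the bound signal an overly aggressive steplength, while a flat bound suggests the steplength may safely be increased.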
Concerning the initial guess, we follow the ideas developed in [29], which have proved effective for set covering
problems: set λ^0_i = min_{j ∈ J_i} c_j / |I_j|, where the index sets are defined as J_i = {j : a_ij = 1} and I_j = {i : a_ij = 1}.
Using the integrability property, Z_D equals the optimal value of the associated linear problem, which is the bound used
by the S algorithm. The parameters for the SPS method were fixed as follows: M = 10 and μ = 0.7 for all tests, which
are standard values recommended in the literature [26]. For this set of problems, we report itbest, the iteration at which
the best value is obtained before reaching 500 (MAXITER) iterations. The best value obtained by the SPS algorithm is
reported under the column (SPS*), and the best value obtained by the S algorithm is reported under the column (S*).
We also report the percentage deviation (Dev(%)) from the value Z_D for both methods. Tables 2 and 3 show the results
for problems obtained from Beasley's OR library, and Table 4 shows the percentage-deviation average for each type
of problem.
Comparing the results obtained for the SPS and the S methods in Tables 2–4, it is clear that the SPS algorithm
outperforms algorithm S in almost all cases. Nevertheless, for problems E.1–E.5 algorithm S achieves a much better
solution than the SPS algorithm. Notice also that, even using the exact optimal value Z_D, the S method failed to achieve
the optimal solution in most cases.
We now select some set covering problems to perform a different experiment, in which we analyze the CPU time
required by both algorithms to achieve a pre-established accuracy. For that, we first run each chosen problem with
the S algorithm, until 15 000 iterations (MAXITER) are reached or until the following
Table 3
Results for set covering problems (Part II)

            SPS                            S
Problems    SPS*        itbest   Dev(%)    S*          itbest   Dev(%)    Z_D
A.1         −245.7767   499      0.4295    −239.4092   177      3.0091    −246.8368
A.2         −246.8282   500      0.2700    −240.0630   200      3.0034    −247.4964
A.3         −227.6905   496      0.1357    −223.1457   200      2.1291    −228
A.4         −230.7771   500      0.2678    −227.0280   199      1.8880    −231.3968
A.5         −234.0998   500      0.3359    −230.2944   200      1.9560    −234.8889
B.1         −64.2460    500      0.4582    −61.8867    199      4.1136    −64.5417
B.2         −68.9861    496      0.4557    −64.7946    178      6.5039    −69.3019
B.3         −73.9028    499      0.3431    −70.8196    197      4.5007    −74.1572
B.4         −70.9567    500      0.3641    −66.5235    178      6.5891    −71.2160
B.5         −67.4474    496      0.3232    −64.2238    178      5.0872    −67.6661
C.1         −223.4008   500      0.1788    −215.5645   200      3.6803    −223.8010
C.2         −212.1952   496      0.3065    −206.5421   200      2.9624    −212.8475
C.3         −233.8021   500      0.3328    −225.5729   199      3.8409    −234.5829
C.4         −213.1523   500      0.3255    −206.5712   195      3.4029    −213.8483
C.5         −211.1172   493      0.2454    −206.3287   197      2.5080    −211.6365
D.1         −55.0057    498      0.5480    −50.7707    180      8.2050    −55.3088
D.2         −59.0354    498      0.5224    −55.4453    180      6.5719    −59.3454
D.3         −64.6676    493      0.6132    −60.1095    180      7.6185    −65.0666
D.4         −55.5219    500      0.5723    −52.3733    179      6.2108    −55.8415
D.5         −58.3457    497      0.4603    −54.9932    198      6.1798    −58.6155
E.1         −2.7278     21       21.0249   −3.3021     139      4.3978    −3.4540
E.2         −2.7528     10       18.6068   −3.2361     135      4.3168    −3.3821
E.3         −2.5740     48       21.9740   −3.1846     134      3.4648    −3.2989
E.4         −2.7278     21       21.0249   −3.3021     139      4.3978    −3.4540
E.5         −2.6083     114      23.0771   −3.2402     128      4.4414    −3.3908
Table 4
Percentage-deviation average for each type of set covering problems

Type    SPS       S
4       0.0279    0.7805
5       0.0724    0.9471
6       0.2225    4.3670
A       0.2878    2.3971
B       0.3888    5.3589
C       0.2778    3.2789
D       0.5432    6.9572
E       21.1415   4.2037
condition is satisfied:

|Z(λ^k) − Z_D| / |Z_D| ≤ 10^{−2}.
After that, we run the same experiment with the SPS algorithm until Z(λ^k) ≤ S*, where S* is the best value obtained
by the S algorithm, or until we reach MAXITER iterations. In this experiment, the momentum term is fixed as μ = 0.7.
For the S algorithm, using the recipe given by Caprara et al. [29], the required accuracy is not achieved in most cases;
therefore, we do not use it here. Instead, we fix λ̂_k = 0.1 for all k, and the accuracy was then achieved in most
experiments. It is worth mentioning that Caprara's choice, used in Tables 2–4, was better than choosing a fixed value
of λ̂_k ∈ (0, 2) to approximate Z_D in the first 500 iterations.
Table 5
Set covering problems (CPU time)

            SPS                             S
Problems    SPS*        Time       iter     S*          Time       iter     Z_D
4.1         −424.9327   8.4        36       −424.7109   1025.7     3748     −429
5.3         −223.8241   21.3       36       −223.7406   795.8      1437     −226
4.4         −489.3107   9.2        38       −489.0658   589.8      2235     −494
5.1         −248.8899   22.5       39       −248.7176   1113.6     2032     −251.2250
6.1         −131.9544   16.5       54       −131.8085   4615.1     11 927   −133.1396
A.1         −244.3807   72.4       53       −244.3693   7265.0     5631     −246.8368
B.1         −63.8405    125.4      61       −63.7956    25 488.9   15 000   −64.5417
C.1         −221.5730   112.0      38       −221.5639   18 648.7   7893     −223.8010
D.1         −54.4323    401.7      53       −54.3565    48 450.2   15 000   −55.3088
E.1         −3.3534     11 576.6   15 000   −3.4195     783.7      7979     −3.4540
In Table 5 we show the results of this experiment. We report the labels of the chosen problems, the CPU time in
seconds (Time), the number of iterations required (iter), and the best obtained values under the columns S* and SPS*,
respectively.
From Table 5, it is clear that the SPS algorithm achieves a similar accuracy with significantly fewer iterations than the
S algorithm, except for problem E.1. Concerning the CPU time, it is also clear that an iteration of the SPS algorithm
involves the same amount of computational effort as an iteration of the S algorithm. Indeed, the quotient between the
required CPU time and the number of iterations essentially coincides, for both algorithms, across all experiments.
4.2. The generalized assignment problem (GAP)
In this section, following [30], we report some numerical results with a certain type of test problem: the dual of
generalized assignment problems [31]. A generalized assignment problem consists in assigning m jobs to n machines
(m ≤ n). If job i is performed at machine j, it costs c_ij and requires p_ij time units. Given the total available time T_j at
machine j, we want to find the minimum cost assignment of the jobs to the machines. Formally, the problem is:
min  Σ_{i=1}^m Σ_{j=1}^n c_ij y_ij
s.t. Σ_{i=1}^m p_ij y_ij ≤ T_j,   j = 1, ..., n,
     Σ_{j=1}^n y_ij = 1,   i = 1, ..., m,
     y_ij ∈ {0, 1},   ∀ i, j,
where y_ij is the assignment variable, which is equal to 1 if the ith job is assigned to the jth machine and is equal to
0 otherwise. Defining c̄_ij = −c_ij and p̄_ij = −p_ij, and relaxing the time constraints for the machines, we obtain the
Lagrangean relaxation problem P_λ:

max  Σ_{j=1}^n Σ_{i=1}^m ( c̄_ij + λ_j p̄_ij ) y_ij + Σ_{j=1}^n λ_j T_j
s.t. Σ_{j=1}^n y_ij = 1,   i = 1, ..., m,
     y_ij ∈ {0, 1},   ∀ i, j.
Table 6
Characteristics of generalized assignment problems

Problems   No. of Prob   Rows (m)   Columns (n)   Optimal    Z_D
gapA       6             5          100           −1698      −1697.7
                         5          200           −3235      −3234.7
                         10         100           −1360      −1358.6
                         10         200           −2623      −2623.0
                         20         100           −1158      −1157.1
                         20         200           −2339      −2337.3
gapB       6             5          100           −1843      −1831.3
                         5          200           −3553      −3547.4
                         10         100           −1407      −1400.7
                         10         200           −2831      −2815.1
                         20         100           −1166      −1155.2
                         20         200           −2340      −2331.1
gapC       6             5          100           −1931      −1924.0
                         5          200           −3458      −3450.8
                         10         100           −1403      −1387.0
                         10         200           −2814      −2795.4
                         20         100           −1244      −1219.0
                         20         200           −2397      −2376.9
gapD       6             5          100           −6373      −6345.4
                         5          200           −12 796    −12 736.0
                         10         100           −6379      −6323.5
                         10         200           −12 601    −12 418.0
                         20         100           −6269      −6142.5
                         20         200           −12 452    −12 218.0
The assignment problem P_λ can be solved as follows. For each i choose j*_i such that

c̄_{i,j*_i} + λ_{j*_i} p̄_{i,j*_i} = max_{1 ≤ j ≤ n} { c̄_ij + λ_j p̄_ij }.

Let y* ≡ (y*_ij) be an optimal solution of P_λ (i.e., y*_ij = 1 if j = j*_i and y*_ij = 0 otherwise); then the subgradient is
given by

ξ_j = Σ_{i=1}^m p̄_ij y*_ij + T_j,   j = 1, ..., n.
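A sketch of this oracle follows, under our reading that each job i independently picks its best machine j, which is consistent with the assignment constraints kept in P_λ. The function and variable names are ours.

```python
import numpy as np

def gap_oracle(c, p, T, lam):
    """Lagrangean oracle for the GAP relaxation (sketch).
    c, p -- (m, n) cost and processing-time matrices
    T    -- (n,) machine capacities
    lam  -- (n,) multipliers for the relaxed time constraints
    With c_bar = -c and p_bar = -p, each job i picks the machine j
    maximizing c_bar[i, j] + lam[j] * p_bar[i, j]; the subgradient is
    xi_j = sum_i p_bar[i, j] * y*_ij + T_j.
    """
    c_bar, p_bar = -c, -p
    scores = c_bar + p_bar * lam           # broadcast lam over columns
    j_star = np.argmax(scores, axis=1)     # best machine for each job
    m, n = c.shape
    y = np.zeros((m, n))
    y[np.arange(m), j_star] = 1.0
    z = float(scores[np.arange(m), j_star].sum() + lam @ T)   # Z(lam)
    xi = (p_bar * y).sum(axis=0) + T       # subgradient w.r.t. lam
    return z, y, xi
```

With λ = 0 the oracle simply assigns every job to its cheapest machine, and the subgradient reduces to the residual machine capacities.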
In Table 6 we describe the set of generalized assignment problems considered in our experiments. All of them were
obtained from Beasley's OR library [27]. Chu and Beasley [32] also used this set of problems in their work. Using the
integrability property, a bound Z_D for the optimal value is obtained by solving the linear problem associated with P.
The problems were solved with methods S and SPS, using the initial guess λ_0 = (5, 5, ..., 5)^T and MAXITER = 250.
The momentum term was fixed as μ = 0.3 for all experiments. For computing λ̂_k, we follow the recipe given by
Caprara et al. [29]. In Table 7, we report itbest, the iteration where the best value is obtained before reaching MAXITER
iterations. The best value obtained by the SPS algorithm is reported under the column (SPS∗ ), and the best value obtained
by the S algorithm is reported under the column (S ∗ ). We also report the percentage deviation (Dev(%)) from the value
Z_D for both methods. Table 8 shows the percentage-deviation average for each type of problem.
From Tables 7 and 8 we observe once again that the SPS method outperforms the S method, using the bound Z_D, in
2 out of every 3 experiments. Moreover, we observe that in general the SPS method produces a more accurate solution
than the S method.
We select some generalized assignment problems to perform a different experiment similar to the second set of
experiments for set covering problems. In this experiment we would also like to analyze the CPU time required by
both algorithms to achieve a pre-established accuracy. Once again, we use the recipe given by Caprara et al. [29] to
Table 7
Results for generalized assignment problems

            SPS                            S
Problems    SPS*          itbest   Dev(%)    S*            itbest   Dev(%)
gapa5100    −1697.7       109      0         −1697.6       249      0.0059
gapa5200    −3234.7       11       0         −3234.7       158      0
gapa10100   −1358.6       211      0         −1358.4       250      0.0147
gapa10200   −2622.9       59       0.0038    −2623.0       249      0
gapa20100   −1157.0       71       0.0086    −1156.9       250      0.0173
gapa20200   −2337.3       67       0         −2337.3       193      0
gapb5100    −1831.3       27       0         −1831.2       180      0.0055
gapb5200    −3547.4       54       0         −3547.4       157      0
gapb10100   −1400.5       93       0.0143    −1400.5       160      0.0143
gapb10200   −2815.0       198      0.0036    −2814.9       160      0.0071
gapb20100   −1155.1       88       0.0087    −1155.0       159      0.0173
gapb20200   −2331.1       158      0         −2330.9       173      0.0086
gapc5100    −1923.9       179      0.0052    −1923.9       110      0.0052
gapc5200    −3450.8       221      0         −3450.7       158      0.0029
gapc10100   −1387.0       167      0         −1386.6       138      0.0288
gapc10200   −2795.4       52       0         −2795.4       139      0
gapc20100   −1217.9       92       0.0902    −1218.3       159      0.0574
gapc20200   −2375.9       64       0.0421    −2376.0       129      0.0379
gapd5100    −6345.3       243      0.0016    −6345.4       159      0
gapd5200    −12 736.0     119      0         −12 736.0     91       0
gapd10100   −6323.4       81       0.0016    −6323.4       140      0.0016
gapd10200   −12 418.0     181      0         −12 418.0     121      0
gapd20100   −6142.1       85       0.0065    −6142.2       153      0.0049
gapd20200   −12 218.0     119      0         −12 217.0     119      0.0082
Table 8
Percentage-deviation average for each type of generalized assignment problems

Type    SPS       S
gapa    0.0021    0.0063
gapb    0.0044    0.0088
gapc    0.0229    0.0220
gapd    0.0016    0.0024
compute λ̂_k, and the required accuracy was achieved in all experiments. We first run each chosen experiment with the
S algorithm, until 1000 iterations (MAXITER) are reached or until the following condition is satisfied:

|Z(λ^k) − Z_D| / |Z_D| ≤ tol,

where tol = 10^{−3}. After that, we run the same experiment with the SPS algorithm until Z(λ^k) ≤ S*, where S* is the
best value obtained by the S algorithm, or until we reach MAXITER iterations. In this experiment, the momentum term
is fixed as μ = 0.3.
In Table 9, we show the results of this experiment. We report the CPU time in seconds (Time), the number of iterations
(iter), and the best obtained values SPS* and S*.
From Table 9, it is clear that the SPS algorithm achieves a similar accuracy with significantly fewer iterations than the
S algorithm in all cases. Concerning the CPU time, it is also clear that an iteration of the SPS algorithm involves the
same amount of computational effort as an iteration of the S algorithm. Once again, the quotient between the required
CPU time and the number of iterations essentially coincides, for both algorithms, across all experiments.
Table 9
Generalized assignment problems (CPU time)

            SPS                              S
Problems    SPS*           iter   Time      S*             iter   Time
gapb10100   −1399.4        16     0.15      −1399.3        91     0.95
gapb20100   −1154.2706     34     0.45      −1154.1583     99     1.36
gapc10100   −1386.0893     25     0.27      −1385.7711     92     0.94
gapc20100   −1217.8388     59     0.81      −1217.8383     128    1.74
gapd10100   −6319.7904     29     0.29      −6317.5195     54     0.54
gapd10200   −12 413.0860   33     0.64      −12 407.6925   52     0.97
gapd20100   −6138.6518     47     0.66      −6136.4702     64     0.87
gapd20200   −12 208.8468   34     0.85      −12 206.9791   58     1.46
5. Final remarks
We have presented a new scheme (SPS) for solving Lagrangean dual problems associated with integer programming
problems that combines the subgradient method with the spectral choice of steplength and a momentum term. To
illustrate the wide range of possible applications of the SPS algorithm, we have reported numerical results on two
different types of problems: set covering problems and generalized assignment problems. Based on the computational
experiments carried out, the SPS algorithm tends to reach a closer approximation of the optimal solution
with fewer iterations than the S algorithm, without requiring any additional (and hypothetical) information, such as the
optimal value ZD of the objective function at the solution.
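The ingredients just mentioned (projected subgradient direction, spectral steplength, momentum term) can be illustrated with a minimal sketch; the function names, safeguards, and sign conventions below are our own assumptions, not the exact formulas from the paper.

```python
import numpy as np

def sps_step(lam, lam_prev, subgrad, project, alpha, mu=0.3):
    """One projected subgradient step with a momentum term (sketch).

    lam, lam_prev : current and previous dual iterates
    subgrad       : callable returning a subgradient of the dual function at lam
    project       : projection onto the dual feasible set (e.g. lam >= 0)
    alpha         : current steplength (e.g. the spectral choice below)
    mu            : momentum coefficient (the experiments above fix it at 0.3)
    """
    g = subgrad(lam)
    # subgradient move plus a momentum term that reuses the previous displacement
    trial = lam + alpha * g + mu * (lam - lam_prev)
    return project(trial), g

def spectral_steplength(lam, lam_prev, g, g_prev, a_min=1e-10, a_max=1e10):
    """Barzilai-Borwein-type steplength s^T s / |s^T y|, with safeguards."""
    s = lam - lam_prev
    y = g - g_prev
    sty = s.dot(y)
    if sty == 0.0:
        return a_max
    return min(a_max, max(a_min, s.dot(s) / abs(sty)))
```

The momentum term μ(λk − λk−1) reuses the previous displacement, in the spirit of the heavy-ball acceleration used in back-propagation training.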
The SPS method seems to be a good choice for large-scale problems, since it has very low memory requirements
and a low computational cost per iteration. Moreover, it seems to be suitable for general non-differentiable problems,
since it only requires a subgradient direction per iteration. This possible extension is an interesting topic that deserves
further investigation.
Finally, we would also like to study the convergence properties of the SPS method. This is an open issue that deserves
special attention in the near future.
Acknowledgements
We are indebted to two anonymous referees whose comments helped us to improve the quality of this paper.
References

[1] Shor NZ. Minimization methods for non-differentiable functions. Berlin: Springer Series in Computational Mathematics, Springer; 1985.
[2] Bertsimas D, Tsitsiklis JN. Introduction to linear optimization. Belmont, MA: Athena Scientific; 1997.
[3] Held M, Wolfe P, Crowder H. Validation of subgradient optimization. Mathematical Programming 1974;6:62–88.
[4] Balinski ML, Wolfe P, editors. Nondifferentiable optimization. Mathematical programming study, vol. 3. Amsterdam: North-Holland; 1975.
[5] Alber YI, Iusem AN, Solodov MV. On the projected subgradient method for nonsmooth convex optimization in a Hilbert space. Mathematical Programming 1998;81:23–35.
[6] Birgin EG, Martinez JM, Raydan M. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization 2000;10:1196–211.
[7] Bishop CM. Neural networks for pattern recognition. New York: Oxford; 1997.
[8] Plaut D, Nowlan S, Hinton GE. Experiments on learning by back propagation. Technical Report CMU-CS-86-126, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA; 1986.
[9] Solodov MV, Zavriev SK. Error stability properties of generalized gradient-type algorithms. Journal of Optimization Theory and Applications 1998;98:663–80.
[10] Tseng P. An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM Journal on Optimization 1998;8:506–31.
[11] Geoffrion AM, Marsten RE. Integer programming algorithms: a framework and state-of-the-art survey. Management Science 1972;18(9):465–89.
[12] Geoffrion AM. Lagrangean relaxation for integer programming. Mathematical Programming Study 1974;2:82–114.
[13] Goffin JL. On convergence rates of subgradient optimization methods. Mathematical Programming 1977;13:329–47.
[14] Shapiro JF. Mathematical programming: structures and algorithms. New York: Wiley; 1979.
[15] Birgin EG, Martinez JM, Raydan M. Algorithm 813: SPG—software for convex-constrained optimization. ACM Transactions on Mathematical Software 2001;27:340–9.
[16] Bertsekas DP. On the Goldstein–Levitin–Polyak gradient projection method. IEEE Transactions on Automatic Control 1976;21:174–84.
[17] Goldstein AA. Convex programming in Hilbert space. Bulletin of the American Mathematical Society 1964;70:709–10.
[18] Levitin ES, Polyak BT. Constrained minimization problems. USSR Computational Mathematics and Mathematical Physics 1966;6:1–50.
[19] Barzilai J, Borwein JM. Two-point step size gradient methods. IMA Journal of Numerical Analysis 1988;8:141–8.
[20] Dai YH, Liao LZ. R-linear convergence of the Barzilai–Borwein gradient method. IMA Journal of Numerical Analysis 2002;22:1–10.
[21] Fletcher R. Low storage methods for unconstrained optimization. Lectures in Applied Mathematics (AMS) 1990;26:165–79.
[22] Raydan M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis 1993;13:321–6.
[23] Fletcher R. On the Barzilai–Borwein method. Technical Report NA/207, Department of Mathematics, University of Dundee, Dundee, Scotland; 2001.
[24] Grippo L, Lampariello F, Lucidi S. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis 1986;23:707–16.
[25] La Cruz W, Martinez JM, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems. Mathematics of Computation, 2005, to appear.
[26] Raydan M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization 1997;7:26–33.
[27] Beasley JE. OR-library: distributing test problems by electronic mail. Journal of the Operational Research Society 1990;41:1069–72.
[28] Beasley JE. An algorithm for set covering problems. European Journal of Operational Research 1987;31:85–93.
[29] Caprara A, Fischetti M, Toth P. A heuristic method for the set covering problem. Operations Research 1999;47:730–43.
[30] Nedić A, Bertsekas DP. Incremental subgradient methods for nondifferentiable optimization. SIAM Journal on Optimization 2001;12:109–38.
[31] Ross GT, Soland RM. Modeling facility location problems as generalized assignment problems. Management Science 1977;24:345–57.
[32] Chu PC, Beasley JE. A genetic algorithm for the generalised assignment problem. Computers & Operations Research 1997;24:17–23.