
A progression strategy of proximal algorithm for the unconstrained optimization

Marouane Nazih (marouane.nazih1@gmail.com) and Khalid Minaoui (khalid.minaoui@fsr.ac.ma)
LRIT, Associated Unit to CNRST (URAC 29), IT Rabat Center,
Faculty of Sciences in Rabat, Mohammed V University in Rabat,
B.P. 1014 RP, Rabat, Morocco

Abstract—In order to accurately solve the unconstrained optimization problem in which the objective function is nonlinear, a new optimization method, different from both line search and trust region, is presented in this paper. This new method is based on the family of proximal algorithms; we customize our algorithm to solve different unconstrained optimization problems, and we verify the theory of the proposed method via different numerical examples in which we compare the new algorithm with some existing state-of-the-art algorithms. Finally, the simulation results prove the performance of the algorithm and indicate the advantage of the proposed algorithm in the case where the hypotheses are verified.

Keywords—Optimization, Unconstrained optimization problems, Proximal algorithm.

I. INTRODUCTION

Optimization refers to finding an argument that minimizes or maximizes a given function (quadratic function, nonlinear function, sum-of-squares function, etc.). Many optimization algorithms seek the solution by solving either linear or nonlinear equations obtained by setting the derivative of the objective function equal to zero [1]. Unconstrained optimization problems consider the problem of minimizing an objective function that depends on real variables with no restrictions on their values.

Mathematically, let x ∈ R^N be a real vector with N ≥ 1 components and let f : R^N → R be a convex smooth function. Then our unconstrained optimization problem is:

    min_x f(x)    (1)

Unconstrained optimization problems arise directly in several applications in numerical optimization [2] and in cloud computing [3] [4], but they also arise indirectly from reformulations of constrained optimization problems, which have a wide range of applications in energy-efficient communications [6] [7] and in the Internet of Things (IoT) [5], e.g. to recover in a stable way a matrix from the matrix of detection data. Note that constrained problems can often be transformed into unconstrained problems with the help of Lagrange multipliers [13], and it is practical to replace the constraints of an optimization problem with penalized terms in the objective function and solve the problem as an unconstrained one.

In many nonlinear optimization problems, the objective function has a large number of local minima (and maxima). Finding an arbitrary local optimum is relatively straightforward with classical local optimization methods, but finding the global minimum (or maximum) of a function is far more difficult: analytical methods are frequently not applicable, and numerical solution strategies often face very hard challenges. In our case, where the objective function is not a quadratic function, many optimization methods use strategies to ensure that some subsequence of iterations converges to an optimal solution. There are two main families of global strategies for solving problem (1): the first, and the most popular for ensuring convergence, relies on line searches; the second, and increasingly popular, is the trust region.

At every iteration k, the next iterate x^(k+1) in both line search and trust region methods is constructed as:

    x^(k+1) = x^(k) + ρ^(k) d^(k)    (2)

by controlling the step length ρ^(k) and the direction d^(k) to enforce the descent condition:

    f(x^(k) + ρ^(k) d^(k)) < f(x^(k))    (3)

The two strategies differ in the order in which they choose the direction and the distance of the move: the line search strategy chooses the direction first, followed by the distance, whereas the trust-region strategy first chooses the maximum distance, followed by the direction.

It is well known in the philosophy of optimization that strategies for rapid convergence and robustness of the algorithms are often in direct conflict with the requirements of computer storage and speed. For example, the gradient method is simple to compute, but it is quite slow and finds only a local minimum [8]. There is also Newton's method, which minimizes a function using knowledge of its second derivative; it can be faster for good initial guesses when the second derivative is known and easy to compute. However, the analytic expression for the second derivative is often complicated or intractable, and the computation of the second derivative also requires a lot of computation.
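The generic iteration (2) with the descent condition (3) can be made concrete in code. Below is a minimal sketch of the gradient method with a backtracking choice of the step ρ^(k), written in Python/NumPy for illustration (the paper's own simulations use Matlab); the quadratic test function and all constants are assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative objective: f(x) = 0.5 x^T A x - b^T x, with A symmetric positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

def gradient_descent(x0, tol=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # convergence test on the gradient norm
            break
        d = -g                             # descent direction of the gradient method
        rho = 1.0
        # backtracking: halve rho until the descent condition (3) holds
        while f(x + rho * d) >= f(x) and rho > 1e-12:
            rho *= 0.5
        x = x + rho * d                    # iteration (2): x_{k+1} = x_k + rho_k d_k
    return x

x_hat = gradient_descent(np.zeros(2))
print(np.allclose(x_hat, np.linalg.solve(A, b), atol=1e-5))  # minimizer solves A x = b
```

On this well-conditioned quadratic the backtracking steps are accepted quickly, but as noted above, convergence of the plain gradient method degrades as the conditioning worsens.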

On the other hand, trust region algorithms are later developments than line search, and they have better convergence properties in general, since they change not only the length of the search vector but also its direction. However, solving the trust region subproblem at each iteration is more expensive than a simple line search, so the former may not pay off for large-scale problems [14]. For large-scale problems, a good strategy is to use the conjugate gradient algorithm, which in theory ends in at most N steps (with N the size of the problem), but which can take many more than N steps, or even fail to converge, if the problem is ill-conditioned [8].

In case the problem is ill-conditioned, an effective and practical way is to introduce regularization into our objective function instead of going fancy with these globalization algorithms.

Thus, the motivation of this paper is to develop a robust algorithm for unconstrained optimization problems which can be faster and more accurate compared to other algorithms. The proposed algorithm belongs to a third family of global strategies, called the proximal method, which seeks a global minimum. The algorithm introduces a regularization term into our objective function (1) to penalize the distance between two iterations. Problem (1) then benefits from the strong convexity property [15], which allows us to reach the global minimum in a small number of iterations, and we propose a strategy of progression of the parameter µ^(k) for our proximal algorithm to correctly parameterize the intensity of the penalty, which is adapted from one iteration to another.

The rest of the paper is organized as follows. In Section 2, we review the different approaches to unconstrained optimization while addressing the main algorithms that belong to each approach. In Section 3 we introduce our new optimization algorithm. In Section 4 we deal with the analysis of the simulation results, and finally Section 5 concludes the paper.

II. RELATED WORK

The principle of iterative methods is to generate an improving sequence of approximate solutions, in which the k-th approximation is derived from the previous ones. They start at a point x^(0) and generate a sequence {x^(k)} that is interrupted when no progress can be made or when a solution seems to have been approached with sufficient accuracy. To decide how to go from one iterate x^(k) to the next, these methods rely on local information on f obtained at the point x^(k), and possibly at the preceding points: they construct a local approximation m^(k)(x) (Fig. 1), the limited Taylor-Young expansion at the current point x^(k), which is used to solve a local problem and find a new iterate x^(k+1) for which the value of the objective function will be smaller. However, these algorithms must regularly return to the main objective function f(x) and cannot indefinitely deal with the ersatz that is m^(k)(x), at the risk of being dangerously misled. This step, which consists of going back from the local problem to the global problem, is called globalization.

Two methods of globalization are discussed in this section, the line search and the trust region approaches; a third method of globalization, known as the proximal point, is the main part of our contribution.

Fig. 1. Example of local approximation m^(k)(x) by Taylor-Young

A. Globalization by line search

In the line search approach, a descent direction d^(k) is constructed and a search is made along this direction, starting from x^(k), to find a new iterate whose value of the objective function is smaller. The direction d^(k) is a direction of descent if there exists C > 0 such that:

    f(x^(k) + ρ^(k) d^(k)) < f(x^(k)),  ∀ρ ∈ ]0, C]    (3)

which simply becomes, in the case where f is differentiable at x^(k),

    d^(k)T ∇_x f(x^(k)) < 0    (4)

Many algorithms belong to this category, such as the gradient method. This method considers a starting point x^(0) and tries to minimize the cost function f using the descent direction d^(k) = −g^(k), so that

    x^(k+1) = x^(k) − ρ^(k) g^(k),    (5)

where ρ^(k) is the descent step of the method and g^(k) is the gradient of the function f.

Although these methods are conceptually very simple and can be programmed directly, they are often slow in practice. They converge, but under often complex convergence conditions [8]. Therefore, the Conjugate Gradient method (CG) is used instead. This second line search method is very efficient for solving (1), especially when N is large, and it must converge in at most N iterations, where N is the dimension of the unknown vector. The search direction has the following form:

    d^(k) = −g^(k)                    if k = 0
    d^(k) = −g^(k) + β^(k) d^(k−1)    if k ≥ 1    (6)

with the coefficient of conjugation:

    β^(k) = (g^(k)T g^(k)) / (g^(k−1)T g^(k−1))    (7)

The third method in this category is the Quasi-Newton method. It consists in imitating the Newton method [14], where the optimization of a function is obtained from the successive minimizations of its second-order approximation. Indeed, the disadvantage of the Newton direction is that it requires knowledge of the Hessian of the objective function, and the calculation of this matrix can be complicated and costly. Moreover, for some problems, the criterion is not twice differentiable. This motivated the emergence of the Quasi-Newton methods [8], which define the direction by:
    d^(k) = −(B^(k))^(−1) g^(k)    (8)

where B^(k) is the approximation to the Hessian matrix. Readers seeking clarification on the methods developed in this section are invited to consult the work of Stoer and Bulirsch [9], for example.

Formally, the algorithm model based on line search is provided by Algorithm 1.

Algorithm 1: Choice of an initial iterate x^(0) ∈ R^n and a small parameter ε > 0. Initialization k=1.
• Step 1: Convergence test: if ||∇f(x^(k))||₂ < ε, stop the algorithm.
• Step 2: Determine a descent direction d^(k).
• Step 3: Determine a step of progression ρ^(k) such that the function f decreases sufficiently.
• Step 4: Determine a new iterate x^(k+1) = x^(k) + ρ^(k) d^(k).
• Step 5: Set k=k+1 and return to Step 1.

B. Globalization by Trust Region

Optimization methods by trust region are based on a simple idea: at each iteration, the local approximation m^(k)(x) (Fig. 1) is considered reliable in a given domain of determined validity, a trust region, whose size is adapted along the iterations; under some assumptions, the global convergence of this approach to a local minimum can be rigorously established [10]. At each iteration, the algorithm defines a local approximation m^(k)(x) which aims to approach the objective function in a trust region

    B^(k) = {x ∈ R^n : ||x − x^(k)||_k ≤ Δ^(k)}    (17)

where Δ^(k) is the Confidence Region Radius (CRR) and ||·||_k is a norm depending on the iteration. A progression step s^(k) is then computed by solving at each iteration the subproblem:

    minimize_s  m^(k)(x^(k) + s)
    subject to  ||s||_k ≤ Δ^(k)    (18)

Thereafter, the objective function f(x̃^(k)) is computed at the test point

    x̃^(k) = x^(k) + s^(k),    (19)

and compared to the value predicted by the local approximation m^(k)(x). If a sufficient reduction of the objective function is obtained, the test point is accepted as the next iterate and the CRR is increased or kept constant. Otherwise, the test point is rejected and the CRR is contracted, in the hope that the local approximation will give better predictions over a smaller region [10].

Formally, the algorithm model based on trust region is provided by Algorithm 2.

Algorithm 2: Let a starting point x^(0), an initial CRR Δ^(0) and the constants η₁, η₂, γ₁, γ₂ which satisfy the conditions

    0 < η₁ ≤ η₂ < 1  and  0 < γ₁ ≤ γ₂ < 1    (20)

Calculate f(x^(0)) and initialize k=1.
• Step 1: Define the local approximation. Choose the norm ||·||_k and define a local approximation m^(k) in B^(k).
• Step 2: Calculation of a progress step. Calculate a step s^(k) reducing sufficiently the local approximation m^(k) and such that x̃^(k) = x^(k) + s^(k) ∈ B^(k).
• Step 3: Accept or reject the test point. Evaluate f(x̃^(k)) and define the ratio

    ρ^(k) = [f(x^(k)) − f(x̃^(k))] / [m^(k)(x^(k)) − m^(k)(x̃^(k))]    (21)

  If ρ^(k) ≥ η₁, define x^(k+1) = x̃^(k); otherwise, x^(k+1) = x^(k).
• Step 4: Update the confidence region radius, choosing:

    Δ^(k+1) ∈ [Δ^(k), +∞[          if ρ^(k) ≥ η₂
    Δ^(k+1) ∈ [γ₂ Δ^(k), Δ^(k)]     if ρ^(k) ∈ [η₁, η₂[
    Δ^(k+1) ∈ [γ₁ Δ^(k), γ₂ Δ^(k)]  if ρ^(k) < η₁    (22)

• Step 5: Set k=k+1 and return to Step 1.

We must also mention that there are other techniques of globalization alongside line search and trust region. Among these techniques are those called proximal point, which are very close in spirit to the trust region methods. The so-called proximal methods are already present in the Martinet thesis [11], but the proximal point algorithm finds its full foundations in the work of Rockafellar [12]. In its generality, this algorithm was developed to search for a zero of a maximal monotone operator, one of its numerous applications being convex optimization. Formally, let us assume that f is a closed proper convex function. The proximal operator prox_f : R^N → R^N of f is defined by

    prox_f(v) = argmin_x ( f(x) + (1/(2µ)) ||x − v||₂² ),    (26)

where ||·||₂ is the usual Euclidean norm. The function minimized on the right-hand side is strongly convex and not everywhere infinite, so it has a unique minimizer for every v ∈ R^N. Figure 2 represents what a proximal operator does. The black lines are level curves of a convex function f; the blue line indicates the boundary of its domain. Evaluating the proximal operator at the blue points moves them to the corresponding red points. The three points in the domain of the function stay in the domain and move towards the minimum of the function, while the other two move to the boundary of the domain and towards the minimum of the function. The parameter µ controls the extent to which the proximal operator maps points towards the minimum of f, with larger values of µ associated with mapped points nearer the minimum, and smaller values giving a smaller movement towards the minimum.
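The role of µ in (26) can be checked numerically. For a quadratic f(x) = ½ xᵀAx − bᵀx, the minimization in (26) has a closed-form solution, obtained by setting the gradient Ax − b + (x − v)/µ to zero. The sketch below (Python/NumPy; the matrix, the point v and the µ values are illustrative assumptions) evaluates the proximal operator at one point for increasing µ and confirms the behaviour just described: larger µ maps the point closer to the minimizer of f.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # assumed symmetric positive definite
b = np.array([1.0, 1.0])
x_min = np.linalg.solve(A, b)       # unconstrained minimizer of f

def prox(v, mu):
    # argmin_x f(x) + 1/(2 mu) ||x - v||^2  =>  (A + I/mu) x = b + v/mu
    return np.linalg.solve(A + np.eye(len(v)) / mu, b + v / mu)

v = np.array([4.0, -3.0])
dists = [np.linalg.norm(prox(v, mu) - x_min) for mu in (0.1, 1.0, 10.0)]
print(dists[0] > dists[1] > dists[2])   # larger mu -> mapped point closer to the minimum
```

For this quadratic one can check that prox(v, µ) − x_min = (I + µA)⁻¹(v − x_min), so the distance to the minimizer shrinks monotonically as µ grows, exactly the behaviour attributed to µ above.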
Fig. 2. Evaluating a proximal operator at various points

III. THE PROPOSED ALGORITHM

The proposition is based on the proximal point algorithm with a progression strategy. In fact, the proximal point algorithm seeks a minimizer of f by successive approximations; in this context, the algorithm is characterized by a basic iteration of the form

    argmin_x  f(x) + (1/(2µ^(k))) ||x − x^(k)||²    (27)

The distance term is introduced in order to regularize the convex function f(x) and thus ensure the existence and uniqueness of the minimum x^(k+1).

Adapting µ at each iteration (not fixing it) improves the convergence rate. For this purpose, we analyzed the important role that µ plays in controlling the extent to which the proximal point maps points towards the minimum of f (Figure 2), so we suggest starting with a small µ (i.e. strong penalization, high 1/µ) and then testing whether the next point is near the minimum; if that is the case, we increase µ (weak penalization, small 1/µ), otherwise we continue with the same µ.

The proposed algorithm model based on the proximal point is provided as follows:

Algorithm 3: Let a starting point x^(0) and a fixed ε.
• Step 1: Initialization k=1. The proximal algorithm generates a sequence {x^(k)} of iterates, calculating x^(k+1) from x^(k) by:

    x^(k+1) = argmin_x  f(x) + (1/(2µ^(k))) ||x − x^(k)||²    (28)

• Step 2:
    if ||x^(k+1) − x^(k)||₂ ≤ ε then
        µ^(k+1) = 2 µ^(k)
    else
        µ^(k+1) = µ^(k)
    end if
• Step 3: Set k=k+1 and return to Step 1.

We can see some similarity between the proximal methods and the trust region methods: the first penalizes the distance between two iterates, while the second confines the iterate x^(k+1) around x^(k) by a constraint (the trust region). On the one hand, the intensity of the penalty is governed by the parameter µ^(k), which adapts from iteration to iteration; on the other hand, the Confidence Region Radius Δ^(k), also varying during the iterations, determines the size of the "containment" region. Small values of µ^(k) provoke a strong penalty whose effects are similar to a trust region of small radius Δ^(k). The regularization of proximal methods acts as a penalization, i.e. a form of constraint that is weak and easy to parameterize, whereas the trust region uses a strong, impracticable and impassable constraint.

IV. RESULTS AND DISCUSSION

In this section, we present experimental results and several simulations for the various iterative optimization methods. We illustrate the behavior of the different algorithms presented before, in order to highlight the impact of several factors on their performance, and we finish by introducing the simulations of our algorithm in comparison with the other relevant algorithms.

The simulation was done in Matlab on a computer with an Intel i5 CPU (2.6 GHz) and 10 GB of memory running 64-bit Mac OS. For simplicity, we have applied our algorithm to quadratic functions, because they are the basis of most nonlinear iterative algorithms, which try to find the solution by seeking the solution of an approximating quadratic function at a local point. As long as the algorithm used to find the solution of the quadratic function is fast, it positively influences the resolution of nonlinear functions, since the latter must find the solution of a quadratic function (the approximation m^(k)(x)) at each iteration.

To be able to compare the different algorithms, we need a performance index. For all scenarios, we choose to use an error index, defined as:

    E(k) = ||x^(k) − x*||₂² / ||x*||₂    (29)

When the performance index E(k) is close to 0, the results obtained are said to be better. So we must set a threshold below which we consider that the global minimum of our function is reached, for example ε = 10⁻⁶. As a result, the stop criterion will be E(k) < ε. Note also that the results are obtained after 20 Monte Carlo iterations.

For the first simulation, we compare the algorithms already present in the literature, to validate the projection between the theoretical and experimental results, and also to conclude on the performance of each algorithm. In that case, we consider a matrix of size 100×100 for a quadratic function and we compare the convergence of the four algorithms with respect to a threshold of iterations. Figure 3 shows the convergence of these algorithms with respect to the threshold ThresholdIter = 70 iterations. By comparing the four curves obtained in this figure, we note that after 70 iterations the gradient algorithm tends to an error of about 10⁻¹, the trust region algorithm tends to 10⁻⁶, and the quasi-Newton algorithm tends to 10⁻¹⁰, while the conjugate gradient algorithm reaches an error of about 10⁻¹².
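Algorithm 3, the proximal iteration with the µ-doubling progression, can be sketched as follows. This is a Python/NumPy illustration on a small quadratic, not the Matlab code used for the simulations; the test matrix, the starting µ and the tolerance ε are assumptions. For a quadratic f(x) = ½ xᵀAx − bᵀx, each proximal subproblem has the closed-form solution (A + I/µ) x^(k+1) = b + x^(k)/µ.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # assumed symmetric positive definite
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)           # exact minimizer, used only to check the result

def prox_step(x, mu):
    # Step 1: x_{k+1} = argmin f(x) + 1/(2 mu) ||x - x_k||^2 (closed form for a quadratic)
    return np.linalg.solve(A + np.eye(len(x)) / mu, b + x / mu)

def proximal_progression(x0, mu=0.1, eps=1e-6, n_iter=200):
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_next = prox_step(x, mu)
        # Step 2: when the iterates stall, weaken the penalty by doubling mu
        if np.linalg.norm(x_next - x) <= eps:
            mu *= 2.0
        x = x_next
    return x

x_hat = proximal_progression(np.zeros(2))
print(np.allclose(x_hat, x_star, atol=1e-8))
```

Doubling µ once the steps stall corresponds to weakening the penalty, which, by the contraction (I + µA)⁻¹ of each step, accelerates the final approach to the minimizer.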
Fig. 3. Performance index according to the number of iterations

These results allow us to conclude that the conjugate gradient algorithm is more efficient than the others.

In the second simulation, we consider a matrix of size 60×60 and we compare the convergence of the three relevant algorithms of the literature with our proposed algorithm, to situate our algorithm in relation to the algorithms already studied previously. Figure 4 shows the convergence of these algorithms. By comparing the four curves obtained in this figure, we observe that our proposed algorithm with the progression strategy requires a minimum of iterations with respect to the other algorithms. All these results allow us to conclude that our algorithm is more efficient than the others.

Fig. 4. Performance index according to the number of iterations

For the final simulation, we consider a matrix of size 60×80 and we compare the convergence of the two proposed algorithms. The results of the simulations show that the proximal algorithm with our strategy of advancing the parameter µ^(k) gives results even more accurate than the proximal algorithm with a constant µ^(k), and we can also observe that the proximal algorithm with our strategy needs fewer iterations: 200 iterations against 250 for the proximal algorithm with a constant µ^(k).

Fig. 5. Simulation of the proximal algorithm with both constant and flexible progression of µ^(k)

V. CONCLUSION

In this article, we presented an algorithm with a progression strategy for the proximal point parameter. This algorithm has the advantage of being fast and robust compared to other iterative optimization algorithms.

The main advantage of using the proximal algorithm with the progression strategy is its flexibility and simplicity.

In addition, we have made a complete comparison of the simulation results of the first-order iterative algorithms presented in the literature with those of our algorithm, which proved the robustness of our algorithm in terms of accuracy and speed of convergence.

Furthermore, the simulation results comparing the proximal point with a constant progression step and with our proposed strategy for advancing this parameter showed a good improvement in the accuracy of the results.

In conclusion, the proximal point algorithm seems to be a competitive candidate among optimization algorithms. We expect that the proximal point can be extended to other more complex functions and also to other applications.

REFERENCES

[1] G. K. Smyth, "Optimization and nonlinear equations," Wiley StatsRef: Statistics Reference Online, pp. 1–9, 2015.
[2] J. Sloan, D. Kesler, R. Kumar, and A. Rahimi, "A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance," in IEEE/IFIP International Conference on Dependable Systems and Networks, August 2010.
[3] S. Sardellitti, G. Scutari, and S. Barbarossa, "Joint optimization of radio and computational resources for multicell mobile-edge computing," IEEE Trans. on Signal and Information Processing over Networks, Dec 2014.
[4] L. Ismail and R. Barua, "Implementation and performance evaluation of a distributed conjugate gradient method in a cloud computing environment," Software: Practice and Experience, vol. 43, pp. 281–304, March 2013.
[5] Y. Xu et al., "Cognitive internet of things: A new paradigm beyond connection," IEEE Internet of Things Journal, vol. 1, April 2014.
[6] S. Cui, A. J. Goldsmith, and A. Bahai, "Energy-constrained modulation optimization," IEEE Transactions on Wireless Communications, vol. 4, Sept 2005.
[7] C. K. Ho and R. Zhang, "Optimal energy allocation for wireless communications with energy harvesting constraints," IEEE Transactions on Signal Processing, 2012.
[8] J. Nocedal and S. J. Wright, Numerical Optimization. Springer Series in Operations Research. Springer, 1999.
[9] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis. Springer-Verlag, New York, third edition, 2002.
[10] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust-Region Methods. MPS/SIAM Series on Optimization. SIAM, Philadelphia, USA, 2000.
[11] B. Martinet, Algorithms for Solving Optimization and Minimax Problems. PhD thesis, University of Grenoble, 1972.
[12] R. T. Rockafellar, "Monotone operators and the proximal point algorithm," SIAM Journal on Control and Optimization, pp. 877–898, 1976.
[13] D. Bertsekas, Nonlinear Programming. Athena Scientific, 1999.
[14] M. Ahookhosh and K. Amini, "An efficient nonmonotone trust-region method for unconstrained optimization," Numerical Algorithms, pp. 523–540, 2012.
[15] N. Parikh and S. Boyd, "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, 2013.
[16] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
