OQM Lecture Note - Part 8 Unconstrained Nonlinear Optimisation
1 Introduction
2 Convexity
In this section, we briefly consider what kind of situations could
make an optimisation problem particularly hard to solve. We will
need the following important definitions.
Definition: A point x∗ is called a local minimiser (or a local minimum point) of f(x) if there exists ε > 0 such that f(x) ≥ f(x∗) for all x with
0 < ‖x − x∗‖ < ε.
The value f(x∗) is called a local minimum value of f(x).
Similarly defined is a strict local minimiser (or a strict local minimum point), for which f(x) > f(x∗) for all x with
0 < ‖x − x∗‖ < ε.
It is possible for a function to have several local minima, and a local minimum need not be a global minimum.
Now we limit ourselves to a specific type of NLP problem: minimising a convex function (or maximising a concave function) over a convex set.
A function f (x) is called convex if for any two points (or vectors)
x1 ∈ D and x2 ∈ D and for any α ∈ [0, 1] we have
f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2).
The matrix of second-order partial derivatives, with (i, j) entry ∂²f(x)/∂xi∂xj, is called the Hessian matrix ∇²f(x) for f(x) at point x. Note that the Hessian is a symmetric matrix, since if f(x) has continuous second-order partial derivatives at point x we have, for 1 ≤ i, j ≤ n,
∂²f(x)/∂xi∂xj = ∂²f(x)/∂xj∂xi.
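As a quick illustration (not part of the original notes), a Hessian can be computed symbolically. The sketch below uses SymPy on the function that appears in Example 3 further down, so the result can be compared with the matrix stated there.

import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = -3 * x1**2 + 4 * x1 * x2 - 2 * x2**2

H = sp.hessian(f, (x1, x2))   # matrix of second-order partial derivatives
print(H)                      # Matrix([[-6, 4], [4, -4]])
print(H.is_symmetric())       # True, reflecting the equality of mixed partials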
Theorem 1. Suppose that f(x) has continuous second-order partial derivatives. Then f(x) is convex on Rⁿ if and only if, for every x, all principal minors of its Hessian matrix are nonnegative.

For a function on R² whose Hessian at every point is
[ 2  2 ]
[ 2  2 ]
the 1st principal minors are 2 > 0 and 2 > 0, and the 2nd principal minor is 2 × 2 − 2 × 2 = 0. So, Theorem 1 shows that such a function f(x) is convex on R².
A function f (x) is called concave if for any two points (or vectors)
x1 ∈ D and x2 ∈ D and for any α ∈ [0, 1] we have
f(αx1 + (1 − α)x2) ≥ αf(x1) + (1 − α)f(x2).
Theorem 2. Suppose that f(x) has continuous second-order partial derivatives. Then f(x) is concave on Rⁿ if and only if, for every x and for k = 1, . . . , n, all nonzero kth principal minors of its Hessian matrix have the same sign as (−1)^k.
Example 3. The Hessian of the function f(x) = −3x1² + 4x1x2 − 2x2² at any point x = (x1, x2) ∈ R² is
∇²f(x) = [ −6   4 ]
          [  4  −4 ]
The 1st principal minors are −6 < 0 and −4 < 0. The 2nd principal minor is (−6) × (−4) − 4 × 4 = 8 > 0. Theorem 2 shows that f(x) is a concave function on R².
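The following sketch (added for illustration; the helper names are not from the notes) checks the principal-minor conditions of Theorems 1 and 2 numerically for the Hessian of Example 3. It takes a kth principal minor to be the determinant of the submatrix obtained by keeping the same k rows and columns.

from itertools import combinations
import numpy as np

def principal_minors(H):
    # all principal minors of a square matrix H, as (order k, value) pairs
    n = H.shape[0]
    minors = []
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            minors.append((k, float(np.linalg.det(H[np.ix_(idx, idx)]))))
    return minors

def hessian_indicates_convex(H):
    # Theorem 1: all principal minors nonnegative
    return all(m >= 0 for _, m in principal_minors(H))

def hessian_indicates_concave(H):
    # Theorem 2: every nonzero kth principal minor has the same sign as (-1)**k
    # (exact zero tests are simplistic for floating point, but fine here)
    return all(m == 0 or np.sign(m) == (-1) ** k for k, m in principal_minors(H))

H = np.array([[-6.0, 4.0], [4.0, -4.0]])   # Hessian from Example 3
print(principal_minors(H))                 # [(1, -6.0), (1, -4.0), (2, 8.0)] up to rounding
print(hessian_indicates_convex(H))         # False
print(hessian_indicates_concave(H))        # True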
Now we discuss the domain, i.e. feasible region, of the considered
objective function.
Recall that the set S is called convex if for any two x1 , x2 ∈ S and
any α ∈ (0, 1) we have αx1 + (1 − α)x2 ∈ S.
Note:
• The intersection of convex sets is convex.
• If g(x) is a convex function, then the set {x : g(x) ≤ b} is convex for any constant b.

Now consider the NLP problem
min f(x)
s.t. x ∈ S.
3 Optimality Conditions
For a function f(x) of one variable, Taylor's formula at a point x∗ gives
f(x∗ + δ) = f(x∗) + f′(x∗)δ + (1/2)f″(x∗)δ² + o(δ²),
where o(δ²) is the remainder term. This is a formal way of denoting that this remainder term gets very small if δ is close to zero, and is “dominated” by the other terms.
Taylor's formula leads to the following necessary condition and sufficient condition for x∗ to be a local minimum of f(x).
First-order condition: If x∗ is a local minimum of f (x), then
f ′ (x∗ ) = 0.
This condition is also referred to as a necessary condition for a local minimum, since it must hold in order for x∗ to be a local minimum. But if f′(x∗) = 0, we do not know for sure that we have a local minimum; for example, f(x) = x³ satisfies f′(0) = 0, yet x = 0 is not a local minimum. Thus, it is not a sufficient condition, since it does not guarantee that x∗ will be a local minimum.
Second-order condition: If f ′ (x∗ ) = 0 and f ′′ (x∗ ) > 0, then x∗
is a local minimum of f (x).
This condition is also referred to as a sufficient condition for a
local minimum.
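As a small numerical illustration (the two functions below are hypothetical examples, not taken from the notes), the conditions can be checked with finite differences: f(x) = x³ satisfies the necessary condition at x = 0 without being a local minimum, while f(x) = (x − 1)² satisfies the sufficient condition at x = 1.

def fprime(f, x, h=1e-6):
    # central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def fsecond(f, x, h=1e-4):
    # central-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

cubic = lambda x: x**3           # f'(0) = 0, but x = 0 is not a local minimum
parabola = lambda x: (x - 1)**2  # f'(1) = 0 and f''(1) = 2 > 0, so x = 1 is a local minimum

print(fprime(cubic, 0.0), fsecond(cubic, 0.0))        # approx. 0 and 0
print(fprime(parabola, 1.0), fsecond(parabola, 1.0))  # approx. 0 and 2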
Similarly, for a function f(x) of n variables, Taylor's formula at x∗ gives
f(x∗ + d) = f(x∗) + (∇f(x∗))ᵀd + (1/2)dᵀ∇²f(x∗)d + o(‖d‖²).
The optimality conditions for the n-dimensional case are as follows.
In the n-dimensional case, the second-order derivative is generalised to the Hessian matrix. If x∗ is a stationary point, i.e. ∇f(x∗) = 0, then Taylor's formula at x∗ gives the approximation
f(x∗ + d) ≅ f(x∗) + (1/2)dᵀ∇²f(x∗)d.
If f(x∗ + d) ≥ f(x∗) for all sufficiently small d, i.e. x∗ is a local minimum, then we have dᵀ∇²f(x∗)d ≥ 0 for any d.
Definition: An n × n symmetric matrix A is called positive semidefinite if xᵀAx ≥ 0 for every x ∈ Rⁿ, and positive definite if xᵀAx > 0 for every nonzero x ∈ Rⁿ.
Example 6. The matrix
A = [  1  −1 ]
    [ −1   1 ]
gives
xᵀAx = (x1 − x2)² ≥ 0
for all x ∈ R². So A is positive semidefinite. Note that for any vector x = (x1, x1)ᵀ we have xᵀAx = 0. Hence, A is not positive definite.
Definition: Assume that A is an n × n matrix. A nonzero n-
dimensional vector x is called an eigenvector of A if it satisfies
the equality Ax = λx for some scalar λ. The scalar λ is called an
eigenvalue of A.
The eigenvalues of A can be found by solving the characteristic
equation
det(A − λI) = 0,
where “det” indicates the determinant.
Theorem 4. A symmetric matrix A is positive definite if and
only if all its eigenvalues are positive.
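A quick numerical check of Theorem 4 (an added sketch, not from the notes), applied to the matrix A of Example 6:

import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matrix from Example 6
eigvals = np.linalg.eigvalsh(A)            # eigenvalues of a symmetric matrix
print(eigvals)                             # [0. 2.] up to rounding

tol = 1e-12
print(np.all(eigvals > tol))    # False: A is not positive definite
print(np.all(eigvals > -tol))   # True: A is positive semidefinite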
Now we turn back to the optimality condition.
Theorem 5. (Second-order necessary condition)
If x∗ is a local minimum for an unconstrained NLP problem min f (x),
then
(i) ∇f(x∗) = 0, and
(ii) ∇²f(x∗) is positive semidefinite.
Theorem 6. (Second-order sufficient condition)
If
(i) ∇f(x∗) = 0, and
(ii) ∇²f(x∗) is positive definite,
then x∗ is a local minimum of f(x).

For the stationary point x∗ = (1, 2)ᵀ of the example considered here, we have
∇²f(x∗) = ∇²f(1, 2) = [  2  −1 ]
                      [ −1   2 ]
The eigenvalues of this matrix are found by solving the equation
det(∇²f(1, 2) − λI) = det [ 2 − λ    −1   ]
                          [  −1    2 − λ  ]
                    = (2 − λ)² − 1 = (3 − λ)(1 − λ) = 0
⇒ λ1 = 3, λ2 = 1.
Since both eigenvalues are positive, the Hessian matrix ∇2 f (1, 2)
is positive definite. Hence, by the second-order sufficient condition
the point x∗ = (1, 2)T is a local minimum of f (x).
4 Convexity Revisited
Recall that a function f (x) is called convex if for any two points
x1 and x2 in its domain and for any α ∈ [0, 1] we have
f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2).
5 Gradient Methods
In this section, we will study an important class of solution procedures for the unconstrained NLP. Once again, we consider the unconstrained minimisation problem
min f(x), x ∈ Rⁿ.
At each iteration k, a gradient method will
1. choose a direction dk such that (∇f(xk))ᵀdk < 0, and
2. choose a step size αk > 0, and set xk+1 = xk + αk dk.
A typical choice of direction is dk = −Dk∇f(xk) for some positive definite matrix Dk, since then
(∇f(xk))ᵀdk = (∇f(xk))ᵀ(−Dk∇f(xk)) = −(∇f(xk))ᵀDk∇f(xk) < 0.
Algorithm for Steepest Descent Method
The steepest descent method uses the direction
dk = −∇f(xk).
Step 0. Choose a starting point x0 and a tolerance ε > 0; set k = 0.
Step 1. Compute ∇f(xk). If ‖∇f(xk)‖ < ε, stop and return xk as a satisfactory approximate solution; otherwise let dk = −∇f(xk) and GO TO Step 2.
Step 2. Find a step size αk > 0 that minimises g(α) = f(xk + αdk) over α > 0, set xk+1 = xk + αk dk and k = k + 1, and GO TO Step 1.
The following iterations illustrate the method with tolerance ε = 0.5 and starting point x0 = (0, 0)ᵀ.
Iteration 0.
Step 0. Let x0 = (0, 0)ᵀ.
Step 1. ∇f(x0) = (0, −3)ᵀ, and ‖∇f(x0)‖ = √(0² + (−3)²) = 3 > 0.5 = ε.
Let d0 = −∇f(x0) = (0, 3)ᵀ. GO TO Step 2.
Step 2. To find the step size α0, we need to solve the problem
min_{α>0} g(α) = min_{α>0} f(x0 + αd0).
We have x0 + αd0 = (0, 3α)ᵀ, and minimising g(α) = f(0, 3α) over α > 0 gives α0 = 1/2 > 0. Hence,
x1 = x0 + α0 d0 = (0, 3/2)ᵀ.
Iteration 1.
Step 1. ∇f(x1) = (−3/2, 0)ᵀ, and ‖∇f(x1)‖ = √((−3/2)² + 0²) = 3/2 > ε = 0.5.
So let d1 = −∇f(x1) = (3/2, 0)ᵀ and GO TO Step 2.
Step 2. To find the step size α1, we need to solve the problem
min_{α>0} g(α) = min_{α>0} f(x1 + αd1).
Then we have
x1 + αd1 = (0, 3/2)ᵀ + α(3/2, 0)ᵀ = (3α/2, 3/2)ᵀ
⇒ g(α) = f(x1 + αd1) = f(3α/2, 3/2) = (9/4)(α² − α − 1)
dg(α)/dα = (9/4)(2α − 1) = 0 ⇒ α1 = 1/2 > 0
Hence,
x2 = x1 + α1 d1 = (0, 3/2)ᵀ + (1/2)(3/2, 0)ᵀ = (3/4, 3/2)ᵀ.
Iteration 2.
Step 1. ∇f(x2) = (0, −3/4)ᵀ, and ‖∇f(x2)‖ = 3/4 > ε = 0.5.
So let d2 = −∇f(x2) = (0, 3/4)ᵀ and GO TO Step 2.
Step 2. To find the step size α2, we need to solve the problem
min_{α>0} g(α) = min_{α>0} f(x2 + αd2).
Then we have
x2 + αd2 = (3/4, 3/2)ᵀ + α(0, 3/4)ᵀ = (3/4, 3/2 + 3α/4)ᵀ
⇒ g(α) = f(x2 + αd2) = f(3/4, 3/2 + 3α/4) = (9/16)(α² − α − 5)
dg(α)/dα = (9/16)(2α − 1) = 0 ⇒ α2 = 1/2 > 0
Hence,
x3 = x2 + α2 d2 = (3/4, 3/2)ᵀ + (1/2)(0, 3/4)ᵀ = (3/4, 15/8)ᵀ.
Iteration 3.
Step 1. ∇f(x3) = (−3/8, 0)ᵀ, and ‖∇f(x3)‖ = 3/8 = 0.375 < ε = 0.5. Stop and declare that x3 = (3/4, 15/8)ᵀ is a satisfactory approximate solution.
Recall that the optimum to this minimisation NLP occurs at x = (1, 2)ᵀ.
The steepest descent method was invented in the nineteenth century by Cauchy. Among the advantages of this method is that it does not require
• storing matrices.
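The objective function of the worked example is not written out in the surviving text, but the gradients and the Hessian that do appear are all consistent with the quadratic f(x) = x1² − x1x2 + x2² − 3x2. Assuming that objective, the following sketch (added for illustration) reproduces the iterates above, using the closed-form exact line search for a quadratic f(x) = (1/2)xᵀQx + cᵀx, namely α = (gᵀg)/(gᵀQg) with g = ∇f(x).

import numpy as np

# Assumed objective: f(x) = 0.5 x^T Q x + c^T x = x1^2 - x1*x2 + x2^2 - 3*x2
Q = np.array([[2.0, -1.0], [-1.0, 2.0]])
c = np.array([0.0, -3.0])

def grad(x):
    return Q @ x + c

def steepest_descent(x0, eps=0.5, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:     # stopping test from Step 1
            break
        d = -g                          # steepest descent direction
        alpha = (g @ g) / (d @ Q @ d)   # exact minimiser of f(x + alpha d)
        x = x + alpha * d
        print(f"iteration {k}: alpha = {alpha}, next point = {x}")
    return x

print(steepest_descent([0.0, 0.0]))   # stops at (3/4, 15/8), near the optimum (1, 2)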
Newton's method uses second-order information: around the current point xk, the objective is approximated by the quadratic
g(x) = f(xk) + (∇f(xk))ᵀ(x − xk) + (1/2)(x − xk)ᵀ∇²f(xk)(x − xk).
The minimum of g(x) occurs when the gradient of this quadratic is zero, i.e. the next approximation can be taken as the solution of the vector equation
∇f(xk) + ∇²f(xk)(x − xk) = 0,
that is, xk+1 = xk − [∇²f(xk)]⁻¹∇f(xk).
Apply Newton's method to find an approximate solution with ε = 0.5 and a starting point at the origin.
Solution. We have, using ∇f(x0) and ∇²f(x0) at x0 = (0, 0)ᵀ,
x1 = x0 − [∇²f(x0)]⁻¹∇f(x0) = (1, 2)ᵀ.
Step 2. Set k = 0 + 1 = 1.
Iteration 1.
Step 1.
∇f(x1) = (0, 0)ᵀ ⇒ ‖∇f(x1)‖ = 0 < ε = 0.5.
Stop and declare that a satisfactory approximate solution is found.
The point x1 = (1, 2)ᵀ is not only a satisfactory approximate solution but also a stationary point. Since ∇²f(x) is positive definite for any x ∈ R², it is actually a global minimum of f(x).
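Under the same assumed quadratic objective as in the steepest descent sketch above (f(x) = x1² − x1x2 + x2² − 3x2, consistent with the gradients and Hessian quoted in the notes), a minimal Newton iteration looks as follows; for a quadratic it reaches the exact minimiser (1, 2)ᵀ in a single step.

import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 2.0]])   # Hessian of the assumed quadratic objective
c = np.array([0.0, -3.0])

def grad(x):
    return H @ x + c

def newton(x0, eps=0.5, max_iter=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        # Newton step: solve H d = -g instead of forming the inverse explicitly
        x = x + np.linalg.solve(H, -g)
    return x

print(newton([0.0, 0.0]))   # [1. 2.]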
Further reading: Section 11.1–11.7 in the reference book “Operations Research: Ap-
plications and Algorithms” (Winston, 2004)