Optimization Lesson 2 - Constrained Multi-Variable Optimization
(ME3202)
• There may be ‘less than or equal to’ type constraints, which can be converted to the ‘greater than or equal to’ type by multiplying the constraints by −1.
Constrained Multi-variable Optimization Problems
• There is no restriction on the number of inequality constraints. However, the total number of active constraints (those satisfied with equality) must be less than or, at most, equal to the number of design variables.
• The inequality constraints can be scaled by any positive constant, and the equality constraints by any constant. This will not affect the feasible region and hence the optimum solution. All the foregoing transformations, however, affect the values of the Lagrange multipliers and also the performance of the numerical algorithms.
• The number of independent equality constraints must be less than or, at most, equal to the number of design variables, i.e., $M \le N$.
• If $M < N$, we have a feasible region to search for the optimal solution.
• When $M = N$, no optimization of the system is necessary, because the roots of the equality constraints are the only candidate points for the optimum design.
• If $M > N$, we have an overdetermined system of equations. In that case (a rank check is sketched below),
1. either some equality constraints are redundant (linearly dependent on other constraints); the redundant constraints can then be deleted and, if $M$ thereby becomes less than $N$, an optimum solution for the problem is possible,
2. or the constraints are inconsistent; in this case no solution to the design problem is possible and the problem formulation should be re-examined.
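• As an illustrative sketch (assuming the special case of linear equality constraints $h(x) = Ax - b = 0$, with hypothetical data), rank checks on $A$ and on the augmented matrix $[A \mid b]$ reveal redundancy or inconsistency:

```python
import numpy as np

# Hypothetical linear equality constraints h(x) = A x - b = 0 with M = 3, N = 3.
# The third row of A is the sum of the first two, so one constraint is redundant.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])
b = np.array([1.0, 1.0, 2.0])

rank_A  = np.linalg.matrix_rank(A)                        # rank of A
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))  # rank of [A | b]

if rank_Ab > rank_A:
    print("Inconsistent constraints: no feasible point exists.")
elif rank_A < A.shape[0]:
    print(f"{A.shape[0] - rank_A} redundant constraint(s); effective M = {rank_A}")
else:
    print("All equality constraints are independent.")
```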
Constrained Multi-variable Optimization Problems
• The inequality constraints $g_k(x_j) \ge 0$ are said to be active if they are satisfied with equality at the optimal point $x_j^*$, i.e., $g_k(x_j^*) = 0$. These are also called tight or binding constraints.
• For a feasible design, an inequality constraint may or may not be active, but the equality constraints always are.
• An inequality constraint is inactive if $g_k(x_j^*) > 0$.
• An inequality constraint is violated if $g_k(x_j^*) < 0$ (the opposite relationship with respect to the constraint’s original definition).
• An equality constraint is violated if $h_m(x_j) \ne 0$. So, by these definitions, an equality constraint is either active or violated at a given optimal point (a small classification sketch follows this list).
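• As an illustrative sketch (assuming a hypothetical tolerance `tol` and hypothetical constraint values), the definitions above can be applied at a trial point:

```python
# Classify inequality constraints of the g_k(x) >= 0 form at a trial point.
def classify(g_values, tol=1e-8):
    status = []
    for gk in g_values:
        if abs(gk) <= tol:
            status.append("active (binding)")
        elif gk > 0:
            status.append("inactive")
        else:
            status.append("violated")
    return status

# Hypothetical values of g_k evaluated at a candidate design point
print(classify([0.0, 1.5, -0.3]))   # ['active (binding)', 'inactive', 'violated']
```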
Solution Techniques for
Constrained Multi-Variable Optimization Problems
• Constrained Multi-Variable Optimization Problems can be solved analytically as well as numerically.
• The Lagrange multiplier method is an analytical technique for multi-variable optimization problems with equality constraints, though it can be extended to inequality-constrained cases as well.
• Using slack variables and the Lagrange multiplier technique, the inequality and equality constraints can be added to the objective function to form an unconstrained problem, whose solution gives the Kuhn-Tucker points, among which the optimal point may be present. This is a generalized method.
• Often, the necessary conditions of the Lagrange Multiplier Theorem lead to a nonlinear set of equations that cannot be solved analytically. In such cases, we must use a numerical algorithm, such as Newton’s method (a minimal numerical sketch follows this list).
• The numerical algorithms can be classified into direct and gradient-based methods.
• A solution to a constrained optimization problem may not exist; this may happen due to over-constraining or conflicting constraints. In that case, the formulation should be re-examined.
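• As an illustrative numerical sketch (assuming a hypothetical problem: minimize $f = x_1^2 + x_2^2$ subject to $h = x_1 + x_2 - 2 = 0$ and $g = x_1 - 0.5 \ge 0$), SciPy’s gradient-based SLSQP algorithm can be used:

```python
from scipy.optimize import minimize

# Objective and constraints; SciPy's 'ineq' convention means fun(x) >= 0.
f = lambda x: x[0]**2 + x[1]**2
cons = [{'type': 'eq',   'fun': lambda x: x[0] + x[1] - 2.0},   # h(x) = 0
        {'type': 'ineq', 'fun': lambda x: x[0] - 0.5}]          # g(x) >= 0

res = minimize(f, x0=[2.0, 0.0], method='SLSQP', constraints=cons)
print(res.x)   # expected to be approximately [1.0, 1.0]
```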
Multivariable Optimization with Equality Constraints
• Let us consider a multivariable optimization problem,
Minimize $f(x_j)$, for $j = 1$ to $N$
Subject to,
$h_m(x_j) = 0$, for $m = 1$ to $M$
• At the optimal point $(x_1^*, x_2^*)$, treating $x_2$ as an implicit function of $x_1$ through the constraint $h(x_1, x_2) = 0$, the necessary condition is
$$\left.\frac{\mathrm{d}f}{\mathrm{d}x_1}\right|_{(x_1^*,\,x_2^*)} = 0 \quad \ldots (7)$$
• Therefore,
$$\left.\frac{\partial f}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} + \left(-\,\frac{\partial f/\partial x_2}{\partial h/\partial x_2}\right)\Bigg|_{(x_1^*,\,x_2^*)} \left.\frac{\partial h}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} = 0 \quad \ldots (10)$$
Lagrange Multiplier
• We define a scalar quantity $v$, called the Lagrange multiplier, as:
$$v = -\,\left.\frac{\partial f/\partial x_2}{\partial h/\partial x_2}\right|_{(x_1^*,\,x_2^*)} \quad \ldots (11)$$
(Remember, $v$ is a sign-free variable; it can also be positive.)
• Replacing $v$, eq. (10) can be written as
$$\left.\frac{\partial f}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} + v \left.\frac{\partial h}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} = 0 \quad \ldots (12)$$
• From eq. (11), we can also write
$$\left.\frac{\partial f}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} + v \left.\frac{\partial h}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} = 0 \quad \ldots (13)$$
• Eqs. (12) & (13) can be written in general vector notation as
$$\nabla f(\mathbf{x}^*) + v\, \nabla h(\mathbf{x}^*) = \mathbf{0} \quad \ldots (14)$$
• where
$$\mathbf{x}^* = \begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix}, \qquad \nabla f(\mathbf{x}^*) = \begin{bmatrix} \left.\dfrac{\partial f}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} \\[2.5ex] \left.\dfrac{\partial f}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} \end{bmatrix} \quad \text{and} \quad \nabla h(\mathbf{x}^*) = \begin{bmatrix} \left.\dfrac{\partial h}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} \\[2.5ex] \left.\dfrac{\partial h}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} \end{bmatrix}$$
Necessary Condition for Optimality
for Lagrange Multiplier method
• Eq. (2) together with eqs. (12) and (13), or eq. (2) together with eq. (14), gives the necessary conditions of optimality for the problem under the Lagrange multiplier method.
• The necessary conditions are more commonly generated by constructing a function $L$, known as the Lagrange function, as $L(x_1, x_2, v) = f(x_1, x_2) + v\, h(x_1, x_2)$. For this case, $L$ is a function of three variables: $x_1$, $x_2$ and $v$.
• The necessary conditions for its extremum are given in terms of 𝐿, by:
$$\left.\frac{\partial L}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} = \left.\frac{\partial f}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} + v \left.\frac{\partial h}{\partial x_1}\right|_{(x_1^*,\,x_2^*)} = 0$$
$$\left.\frac{\partial L}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} = \left.\frac{\partial f}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} + v \left.\frac{\partial h}{\partial x_2}\right|_{(x_1^*,\,x_2^*)} = 0$$
$$\left.\frac{\partial L}{\partial v}\right|_{(x_1^*,\,x_2^*)} = h(x_1^*, x_2^*) = 0$$
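• As an illustrative sketch (assuming a hypothetical problem: minimize $f = x_1^2 + x_2^2$ subject to $h = x_1 + x_2 - 2 = 0$), the three conditions above can be solved symbolically:

```python
import sympy as sp

x1, x2, v = sp.symbols('x1 x2 v', real=True)
f = x1**2 + x2**2
h = x1 + x2 - 2

L = f + v*h                                            # Lagrange function L(x1, x2, v)
eqs = [sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, v)]  # dL/dx1 = dL/dx2 = dL/dv = 0
print(sp.solve(eqs, [x1, x2, v], dict=True))           # [{x1: 1, x2: 1, v: -2}]
```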
Lagrange Multiplier Method in General Index Notation
• Constraints $h_m(x_j) = 0$ must be differentiable.
• The number of constraints must be less than or equal to the number of design variables. If they are equal, solving the constraints themselves gives the solution.
• For a general multivariable (single-objective) optimization problem with equality constraints,
Minimize $f(x_j)$, for $j = 1$ to $N$
Subject to, $h_m(x_j) = 0$, for $m = 1$ to $M$
• if the Lagrange function is $L(x_j, v_m) = f(x_j) + v_m h_m(x_j)$ (the repeated index $m$ implies summation),
• or, expanding, $L(x_j, v_m) = f(x_j) + v_1 h_1(x_j) + v_2 h_2(x_j) + \cdots + v_M h_M(x_j)$, then,
Lagrange Multiplier Method in Index Notation …contd.
• The necessary conditions (otherwise known as the necessity theorem) for optimality under the Lagrange multiplier method are given by:
$$\left.\frac{\partial L}{\partial x_j}\right|_{x_j^*} = \left.\frac{\partial f}{\partial x_j}\right|_{x_j^*} + v_m \left.\frac{\partial h_m}{\partial x_j}\right|_{x_j^*} = 0$$
$$\left.\frac{\partial L}{\partial v_m}\right|_{x_j^*} = h_m(x_j^*) = 0$$
• The necessary conditions become sufficient conditions for $f(x_j)$ to have a constrained relative minimum (or maximum) at $x_j^*$ if:
1) the objective function $f(x_j)$ is convex (concave), which can be verified by checking the definiteness of its Hessian matrix (details can be found at the end of this PPT; a small check is sketched below), and
2) the constraints are of equality type.
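• Continuing the illustrative sketch above (same hypothetical objective $f = x_1^2 + x_2^2$), the Hessian of $f$ is positive definite, so $f$ is convex and the necessary conditions are also sufficient for a constrained minimum:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + x2**2

H = sp.hessian(f, (x1, x2))       # Hessian of the objective
print(H)                          # Matrix([[2, 0], [0, 2]])
print(H.is_positive_definite)     # True -> f is (strictly) convex
```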
Lagrange Multiplier Method in Index Notation …contd.
• Another way of checking the sufficient conditions is by constructing the bordered Hessian matrix $H^B$ as:
$$H^B = \begin{bmatrix} \mathbf{0} & U \\ U^T & V \end{bmatrix}$$
which is an $(M+N) \times (M+N)$ matrix, where
$$U = \left[\frac{\partial h_m}{\partial x_j}\right] = \begin{bmatrix} \dfrac{\partial h_1}{\partial x_1} & \dfrac{\partial h_1}{\partial x_2} & \cdots & \dfrac{\partial h_1}{\partial x_N} \\[2ex] \dfrac{\partial h_2}{\partial x_1} & \dfrac{\partial h_2}{\partial x_2} & \cdots & \dfrac{\partial h_2}{\partial x_N} \\[1ex] \vdots & \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial h_M}{\partial x_1} & \dfrac{\partial h_M}{\partial x_2} & \cdots & \dfrac{\partial h_M}{\partial x_N} \end{bmatrix}_{M \times N}$$
and
$$V = \left[\frac{\partial^2 L}{\partial x_i \partial x_j}\right] = \begin{bmatrix} \dfrac{\partial^2 L}{\partial x_1^2} & \dfrac{\partial^2 L}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 L}{\partial x_1 \partial x_N} \\[2ex] \dfrac{\partial^2 L}{\partial x_2 \partial x_1} & \dfrac{\partial^2 L}{\partial x_2^2} & \cdots & \dfrac{\partial^2 L}{\partial x_2 \partial x_N} \\[1ex] \vdots & \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial^2 L}{\partial x_N \partial x_1} & \dfrac{\partial^2 L}{\partial x_N \partial x_2} & \cdots & \dfrac{\partial^2 L}{\partial x_N^2} \end{bmatrix}_{N \times N}$$
• Starting with the principal minor of order $2M+1$, compute the last $(N - M)$ principal minors of $H^B$ at the point $x_j^*$ with the multipliers $v^*$.
If these principal minors alternate in sign, starting with $(-1)^{M+N}$, the point $x_j^*$ is a maximum point.
If these principal minors have the same sign, that of $(-1)^M$, the point $x_j^*$ is a minimum point.
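• As an illustrative sketch (same hypothetical problem: $f = x_1^2 + x_2^2$, $h = x_1 + x_2 - 2$, so $M = 1$, $N = 2$ and only one principal minor, of order $2M + 1 = 3$, needs checking):

```python
import sympy as sp

x1, x2, v = sp.symbols('x1 x2 v', real=True)
f = x1**2 + x2**2
h = x1 + x2 - 2
L = f + v*h

U = sp.Matrix([[sp.diff(h, x1), sp.diff(h, x2)]])   # M x N = 1 x 2
V = sp.hessian(L, (x1, x2))                         # N x N = 2 x 2
top = sp.zeros(1, 1).row_join(U)                    # [0 | U]
bot = U.T.row_join(V)                               # [U^T | V]
HB = top.col_join(bot)                              # (M+N) x (M+N) bordered Hessian

d = HB.subs({x1: 1, x2: 1, v: -2}).det()   # evaluate at x* = (1, 1), v* = -2
print(d)   # -4, which has the sign of (-1)^M = -1  ->  x* is a minimum point
```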
Multivariable Optimization with Inequality Constraints
• Let us consider a multivariable optimization problem,
Minimize $f(x_j)$, for $j = 1$ to $N$
Subject to, $g_k(x_j) \le 0$, for $k = 1$ to $K$
(in which the variable bounds $x_j^{lower} \le x_j \le x_j^{upper}$ are also included, so there are $K + 2N$ total inequalities)
• The inequality constraints can be transformed into equality constraints by adding non-negative slack terms $s_k^2$ (the square of $s_k$ is used so that non-negativity is imposed automatically, avoiding the additional constraints $s_k \ge 0$).
• Note:
‘greater-than-or-equal-to’ inequalities are first converted to ‘less-than-or-equal-to’ form by multiplying by $-1$, before the slack variable is added.
if the problem is a maximization, or if a constraint is kept in ‘greater-than-or-equal-to’ form, the corresponding Lagrange multipliers $u_k$ have to be non-positive.
Multivariable Optimization with Inequality Constraints
• Now, the problem definition becomes,
Minimize $f(x_j)$, for $j = 1$ to $N$
Subject to, $g_k(x_j, s_k) = g_k(x_j) + s_k^2 = 0$, for $k = 1$ to $K$
• This problem can now be solved conveniently by the method of Lagrange multipliers.
• For this, we construct the Lagrange function 𝐿 as,
• $L(x_j, u_k, s_k) = f(x_j) + u_k\, g_k(x_j, s_k)$, where $u_k$ is the vector of Lagrange multipliers (the letter $u$, instead of $v$, is used to indicate that in this case an inequality has been converted to an equality).
Multivariable Optimization with Inequality Constraints
• So, the necessary conditions for optimality under the Lagrange multiplier method are given by:
$$\left.\frac{\partial L}{\partial x_j}\right|_{x_j^*} = \left.\frac{\partial f}{\partial x_j}\right|_{x_j^*} + u_k \left.\frac{\partial g_k}{\partial x_j}\right|_{x_j^*} = 0$$
$$\left.\frac{\partial L}{\partial u_k}\right|_{x_j^*} = \left. g_k(x_j, s_k)\right|_{x_j^*} = 0$$
$$\left.\frac{\partial L}{\partial s_k}\right|_{x_j^*} = \left. 2\, u_k s_k \right|_{x_j^*} = 0$$
• It can be shown that $u_k \ge 0$, by considering how the gradients change along feasible directions.
• The solution of these equations gives the optimum point $x_j^*$, the values of the Lagrange multiplier vector $u_k$, and the values of the slack variable vector $s_k$, evaluated at $x_j^*$. Any $s_k = 0$ implies that the corresponding $g_k = 0$ (active) at $x_j^*$.
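• As an illustrative sketch (assuming a hypothetical problem: minimize $f = (x_1 - 1)^2 + (x_2 - 2)^2$ subject to $g = x_1 + x_2 - 2 \le 0$), the slack-variable formulation can be solved symbolically:

```python
import sympy as sp

x1, x2, u, s = sp.symbols('x1 x2 u s', real=True)
f = (x1 - 1)**2 + (x2 - 2)**2
g = x1 + x2 - 2

L = f + u*(g + s**2)                            # Lagrange function with squared slack
eqs = [sp.diff(L, w) for w in (x1, x2, u, s)]   # stationarity in x1, x2, u and s
print(sp.solve(eqs, [x1, x2, u, s], dict=True))
# The real solution is x1 = 1/2, x2 = 3/2, u = 1 >= 0, s = 0: g is active at x*.
```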
Karush-Kuhn-Tucker (KKT) Conditions for
Multivariable Optimization with (General) Constraints
• Previously known as the Kuhn-Tucker (KT) conditions, based on first-derivative tests. Sometimes known as the ‘first-order necessary conditions’.
• Combining the previous two cases, we construct the Lagrange function:
$$L(x_j, v_m, u_k, s_k) = f(x_j) + v_m h_m(x_j) + u_k\, g_k(x_j, s_k)$$
Karush-Kuhn-Tucker (KKT) Conditions for
Multivariable Optimization with (General) Constraints
2. Gradient (Stationarity) Conditions:
$$\left.\frac{\partial L}{\partial x_j}\right|_{x_j^*} = \left.\frac{\partial f}{\partial x_j}\right|_{x_j^*} + v_m \left.\frac{\partial h_m}{\partial x_j}\right|_{x_j^*} + u_k \left.\frac{\partial g_k}{\partial x_j}\right|_{x_j^*} = 0$$
3. Primal Feasibility Conditions:
$$\left.\frac{\partial L}{\partial v_m}\right|_{x_j^*} = \left. h_m(x_j)\right|_{x_j^*} = 0$$
$$\left.\frac{\partial L}{\partial u_k}\right|_{x_j^*} = \left. g_k(x_j, s_k)\right|_{x_j^*} = 0$$
4. Switching (Complementary Slackness) Conditions:
$$\left.\frac{\partial L}{\partial s_k}\right|_{x_j^*} = \left. 2\, u_k s_k \right|_{x_j^*} = 0$$
(remember, although $k$ is repeated here, $u_k s_k$ does not undergo Einstein summation)
Karush-Kuhn-Tucker (KKT) Conditions for Multivariable
Optimization with (General) Constraints
5. Non-negativity of the Lagrange multipliers for the inequalities or, Dual Feasibility: $\left. u_k \right|_{x_j^*} \ge 0$
(the sign convention depends on whether the problem is a minimization or a maximization and on the sense of the inequality constraints)
6. Regularity check: Gradients of the active constraints must be linearly independent. In such
a case the Lagrange multipliers for the constraints are unique.
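• As an illustrative sketch (assuming a hypothetical problem: minimize $f = (x_1 - 2)^2 + (x_2 - 1)^2$ subject to $g = x_1 + x_2 - 2 \le 0$), the KKT conditions can be verified numerically at the candidate point $x^* = (1.5, 0.5)$ with $u^* = 1$:

```python
import numpy as np

x_star, u_star = np.array([1.5, 0.5]), 1.0

grad_f = np.array([2*(x_star[0] - 2), 2*(x_star[1] - 1)])   # gradient of f at x*
grad_g = np.array([1.0, 1.0])                               # gradient of g at x*
g_val  = x_star.sum() - 2.0                                 # g(x*)

print("stationarity:", np.allclose(grad_f + u_star*grad_g, 0))   # True
print("primal feasibility:", g_val <= 1e-9)                      # True (g is active)
print("complementary slackness:", np.isclose(u_star*g_val, 0))   # True
print("dual feasibility:", u_star >= 0)                          # True
```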
• Convex Function: a function $f(x)$ is convex if, for any two points $x_1$ and $x_2$ in its domain and any $0 \le \lambda \le 1$,
$$f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$$
• Note: A convex set $C$ is a collection of points (vectors $x_j$) having the following property: if $P_1$ and $P_2$ are two arbitrary points in $C$, then the entire line segment $P_1 P_2$ must also be in $C$.
Convex Function: Graphical Representation
• From the figure, $PQ = f(\lambda x_1 + (1-\lambda) x_2)$, the ordinate of the curve at $x = \lambda x_1 + (1-\lambda) x_2$.
• Similarly, $PR = \lambda f(x_1) + (1-\lambda) f(x_2)$, the ordinate of the chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$ at the same $x$.
• Therefore, $PQ \le PR$ means $f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2)$, i.e., the function lies below its chord.
Convexity of a Single Variable Function
• For a single variable function,
if the second derivative of the function is greater than or equal to zero, $f''(x) \ge 0$, throughout the domain of $x$, then the function is convex. Example: $f(x) = x^2$. The domain of a convex function is a convex set.
if the second derivative of the function is less than or equal to zero, $f''(x) \le 0$, throughout the domain of $x$, then the function is concave. Example: $f(x) = \log x$, or $f(x) = \sqrt{x}$, or $f(x) = 1 - x^2$, etc.
if the second derivative of the function is equal to zero, $f''(x) = 0$, throughout the domain of $x$, then the function is both convex and concave, i.e., linear. Example: $f(x) = 4x$
if the second derivative of the function is greater than or equal to zero, $f''(x) \ge 0$, at some points in the domain of $x$ and less than or equal to zero, $f''(x) \le 0$, at some other points, then the function is neither convex nor concave. Example: $f(x) = \sin x$, or $f(x) = x^3$, etc. (see the sketch below)
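• As an illustrative sketch, the second-derivative test can be applied to some of the example functions above with SymPy:

```python
import sympy as sp

x = sp.symbols('x', real=True)

print(sp.diff(x**2, x, 2))        # 2        -> convex (f'' >= 0 everywhere)
print(sp.diff(sp.log(x), x, 2))   # -1/x**2  -> concave on its domain x > 0
print(sp.diff(4*x, x, 2))         # 0        -> linear: both convex and concave
print(sp.diff(sp.sin(x), x, 2))   # -sin(x)  -> changes sign: neither convex nor concave
```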
Convexity of a Multi-Variable Function
• For a multi-variable function, if the Hessian matrix is positive semidefinite, the function is convex; if the Hessian is positive definite, the function is strictly convex.
• Similarly, if the Hessian matrix of the function is negative semidefinite, the function is concave (a small check is sketched below).
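• As an illustrative sketch (assuming a hypothetical two-variable function $f = x_1^2 + 2x_2^2 + x_1 x_2$), the definiteness of the Hessian can be checked with SymPy:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + 2*x2**2 + x1*x2

H = sp.hessian(f, (x1, x2))            # constant Hessian [[2, 1], [1, 4]]
print(H.is_positive_definite)          # True  -> f is strictly convex
print((-H).is_positive_semidefinite)   # False -> f is not concave
```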
Properties of Convex / Concave Functions
• A function, 𝑓 𝑥 , is said to be concave, if the negative function, −𝑓 𝑥 , is convex and vice-versa.
• A convex / concave function need not be differentiable. Example: $f(x) = |x|$ is a convex function but is not differentiable at $x = 0$.
• A convex function need not be continuous. Example:
$$f(x) = \begin{cases} x^2, & -1 \le x < 1 \\ 2, & x = 1 \end{cases}$$
but the function is always continuous in the interior of the domain, $-1 < x < 1$.
• If $f(x)$ is convex, then for constants $a$ and $b$, $f(ax + b)$ will also be convex.
• If 𝑓 and 𝑔 are two convex functions, then, 𝑓 + 𝑔, max (𝑓, 𝑔) and α𝑓 (where 𝛼 is a non-negative
scalar, 𝛼 ≥ 0) are also convex.
• If 𝑓 and 𝑔 are two concave functions, then, 𝑓 + 𝑔, min (𝑓, 𝑔) and α𝑓 (where 𝛼 is a non-negative
scalar, 𝛼 ≥ 0) are also concave.
• A convex function always gives a minimum, and a concave function always gives a maximum.
• If $f$ is a convex function defined on a convex set, then every local minimum is a global minimum of $f$.