
4 NONLINEAR PROGRAMMING

Optimization problems whose mathematical models are characterized by nonlinear equations


are called Nonlinear Programming (NLP) problems. In Chapter 3 it was noted that these problems
also fell into the category of mathematical programming problems. Engineering design problems
are mostly nonlinear. In Chapter 2 several problems were examined graphically and it was evident
that curvature and the gradient of the functions involved had a significant influence on the
solution. In subsequent chapters we will continue to center the discussion of optimality conditions
and numerical techniques around two-variable problems because the ideas can also be expressed
graphically. Extension to more than two variables is quite straightforward and is simplest
when the presentation is made using vector algebra. MATLAB will be utilized for all
graphic needs.
Traditionally, there is a bottom-up presentation of material for nonlinear optimization.
Unconstrained problems are discussed first followed by constrained problems. For constrained
problems the equality constrained problem is discussed first. A similar progression is observed
with regard to the number of variables. A single-variable problem is introduced, followed by two
variables which is then extended to a general problem involving n variables. This order allows
incremental introduction of new concepts, but primarily allows the creative use of existing rules to
establish solutions to the extended problems.
An analytical foundation is essential to understand and establish the conditions that the
optimal solution will have to satisfy. This is not for the sake of mathematical curiosity, but is an
essential component of the numerical technique: notably the stopping criteria. The necessary
mathematical definitions and illustrations are introduced in this chapter. References are available
for refreshing the calculus and the numerical techniques essential to the development of NLP [1,
2]. The books familiar to the reader should do admirably. This chapter also introduces the
symbolic computation (computer algebra) resource available in MATLAB, namely, Symbolic
Math Toolbox [3].

4.1 PROBLEM DEFINITION


In NLP it is not essential that all the functions involved be nonlinear. It is sufficient if just one
of them is nonlinear. There are many examples in engineering in which only the objective function
is nonlinear while the constraints are linear. If in this case the objective function is a quadratic
function, these problems are termed linear quadratic problems (LQP). Optimization problems for
the most part rely on experience to identify the mathematical model comprising the design
variables, objective, and the constraints. A knowledge of engineering, or the appropriate
discipline, is also essential to establish a mathematical model. Primarily, this involves determining
the functional relationships among the design variables. The remaining task then is to establish the
solution.
How does one establish the solution to the nonlinear optimization problem?
In mathematics (after all at this stage there is a mathematical model for the problem) the
solution is obtained by satisfying the necessary and sufficient conditions related to the class of the
problems. The necessary conditions are those relations that a candidate for the optimum solution
must satisfy. If it does, then, and this is important, it may be an optimal solution. To qualify a
design vector XP (X represents the design vector) as an optimum, it must satisfy additional
relations called the sufficient conditions. Therefore, an optimum solution must satisfy both
necessary and sufficient conditions. This chapter establishes these conditions for the optimization
problem. Example 4.1 is established next and is used in several ways in the remainder of the
chapter to develop the conditions mentioned above. Once these conditions are available, the
numerical techniques in optimization will incorporate them to establish the solution.

4.1.1 Problem Formulation - Example 4.1


The problem is restricted to two variables to draw graphical support for some of the
discussions. There are two constraints, which, during the development of this chapter, may switch
between equality and inequality constraints to illustrate related features.
Problem: Find the rectangle of the largest area (in the positive quadrant) that can be
inscribed within a given ellipse while satisfying a prescribed linear constraint.
From the problem specification, the ellipse will provide an inequality constraint while the
linear relation among the variables will provide an equality constraint for this example.

Mathematical Model: Figure 4.1 captures the essence of the problem. The code files Ex41_1.m and
create_ellipse.m are necessary for MATLAB to create the figure. There are two variables, x1
and x2. There are standard mathematical expressions for the ellipse, and the straight line is no
problem after Chapter 3.
Standard Format: The standard format of the NLP is reproduced here for convenience:
Minimize f(x1, x2, …, xn) (4.1)
Subject to: hk(x1, x2, …, xn) = 0, k = 1, 2, …, l (4.2)
gj(x1, x2, …, xn) ≤ 0, j = 1, 2, …, m (4.3)
xi^l ≤ xi ≤ xi^u, i = 1, 2, …, n (4.4)
In vector notation
Minimize f(X): [X]n (4.5)
Subject to: [h(X)]l = 0 (4.6)
[g(X)]m ≤ 0 (4.7)
X^low ≤ X ≤ X^up (4.8)
For the specific problem being discussed, and referring to Figure 4.1, the design variables are the
coordinate values x1 and x2 that allow the computation of the rectangular area. The optimization
problem is
Minimize f(x1,x2): -x1x2 (4.9)
Subject to: h1(x1,x2): 20x1 + 15x2 - 30 = 0 (4.10)
h2(x1,x2): x1^2/4 + x2^2 - 1 = 0 (4.11)
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3 (4.12)
The side constraints in (4.12) can also be postulated as one-sided and can be written as
x1≥0; x2≥0 (4.13)
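The model is easily scripted for quick checks. A minimal sketch (the anonymous function handles below are illustrative and are not among the book's code files) that evaluates the objective and both constraints at a trial point:
f  = @(x1,x2) -x1.*x2;              % objective: negative of the rectangle area
h1 = @(x1,x2) 20*x1 + 15*x2 - 30;   % linear constraint
h2 = @(x1,x2) x1.^2/4 + x2.^2 - 1;  % ellipse constraint
[f(1,0.5) h1(1,0.5) h2(1,0.5)]      % trial point (1, 0.5): returns -0.5 -2.5 -0.5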

4.1.2 Discussion of Constraints


Using the relations (4.9)-(4.12) several additional classes of problems can be described by
including only a subset of the relations. They are examined below.
Unconstrained Problem: There are no functional constraints although the side constraints are
necessary to keep the solution finite. For this example,
Minimize f(x1,x2): -x1x2 (4.9)
In this problem, if the design variables are unbounded at the upper limit, then the solution would
be at the largest positive value of x1 and x2. A two-sided limit for the design variables is usually a
good idea. The designer does have the responsibility of defining an acceptable design space.
Equality Constrained Problem 1: The functional constraints in this problem are only
equalities. With reference to Example 4.1 the following problem can be set up (after changing the
inequality to an equality):
Minimize f(x1,x2): -x1x2
Subject to: h1(x1,x2): 20x1+15x2-30=0
h2(x1,x2): x1^2/4 + x2^2 - 1 = 0
0≤x1≤3; 0≤x2≤3
Intuitively, such a problem may not be optimized since the two constraints by themselves should
establish the values for the two design variables. The arguments used in LP for acceptable
problem definition are also valid here. There is always the possibility of multiple solutions, which
is a strong feature of nonlinear problems. In such an event the set of variables that yield the lowest
value for the objective will be the optimal solution. Note that such a solution is obtained by
scanning the acceptable solutions rather than through application of any rigorous conditions.
Equality Constrained Problem 2: If the problem were to include only one of the constraints,
for example, with the second constraint only,
Minimize f(x1,x2): -x1x2
Subject to: h2(x1,x2): x1^2/4 + x2^2 - 1 = 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
this is a valid optimization problem. A similar problem can be defined with the first constraint by
itself.
Inequality Constrained Problem: In this case the constraints are all inequalities. A variation
on Example 4.1 would be (the equality constraint is transformed to an inequality constraint)
Minimize f(x1,x2): -x1x2
Subject to: g1(x1,x2): 20x1 + 15x2 - 30 ≤ 0
g2(x1,x2): x1^2/4 + x2^2 - 1 ≤ 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
Like its counterpart in linear programming, this is a valid optimization problem. Equally valid
would be a problem that included just one of the constraints or any number of inequality
constraints.
It is essential to understand both the nature and the number of constraints as well as how they
affect the problem. In general, equality constraints are easy to handle mathematically, difficult to
satisfy numerically, and more restrictive on the search for the solution. Inequality constraints are
difficult to resolve mathematically and are more flexible with respect to the search for the optimal
solution as they define a larger feasible region. A well-posed problem requires that some
constraints are active (equality) at the optimal solution as otherwise it would be an unconstrained
solution.

4.2 MATHEMATICAL CONCEPTS


As in LP, some mathematical definitions are necessary before the necessary and sufficient
conditions for the NLP can be established. Definitions are needed for both the analytical
discussion as well as numerical techniques. MATLAB provides a Symbolic Math Toolbox which
permits symbolic computation integrated in the numerical environment of MATLAB. This allows
the user to explore problems in calculus, linear algebra, solutions of systems of equations, and other
areas. In fact, using symbolic computation students can easily recover the prerequisite information
for the course. A short hands-on introduction to symbolic computation is provided. The primary
definitions we need are derivatives, partial derivatives, matrices, derivatives of matrices, and
solutions of nonlinear equations.

4.2.1 Symbolic Computation Using MATLAB


One of the best ways to get familiar with symbolic computation is to take the quick online
introduction to the Symbolic Math Toolbox available in the MATLAB Demos dialog box [4]. The
following sequence locates the tutorial:
>> demos
Symbolic Math--> Introduction
The computational engine executing the symbolic operations in MATLAB is the kernel of
Maple marketed and supported by Waterloo Maple, Inc. If the reader is already familiar with
Maple, then MATLAB provides a hook through which Maple commands can be executed in
MATLAB. The symbolic computation in MATLAB is performed using a symbolic object or sym.
This is another data type like the number and string data types used in earlier exercises. The
Symbolic Math Toolbox uses sym objects to represent symbolic variables, expressions, and
matrices.
In the exercise that follows, a function of one variable and two functions of two variables
(based on the constraints of Example 4.1) are used for illustration. Drawing on the author's classroom
experience this preliminary discussion is in terms of variables x and y for improved
comprehension. In later sections, subscripts on x are used to define multiple variables so that the
transition to the general problem can be facilitated using vector description. The functions in these
exercises are
f(x) = 12 + (x-1)^2 (x-2)(x-3)
g1(x,y) = 20x + 15y - 30
g2(x,y) = 0.25x + y - 1
The following MATLAB session was captured as a diary file and edited in a text editor.
Therefore the MATLAB prompt does not appear. The boldface words are commands that the
reader will type at the command line.
x=sym('x') % defining x as a single symbolic object
x=
x
syms y f g1 g2 g % definition of multiple objects
whos % types of variables in the workspace
Name Size Bytes Class
f 1×1 126 sym object
g 1×1 126 sym object
g1 1×1 128 sym object
g2 1×1 128 sym object
x 1×1 126 sym object
y 1×1 126 sym object
Grand total is 14 elements using 760 bytes
f=12+(x-1)*(x-1)*(x-2)*(x-3) % constructing f
f=12+(x-1)*(x-1)*(x-2)*(x-3)
diff(f) % first derivative
ans=
2*(x-1)*(x-2)*(x-3)+(x-1)^2*(x-3)+(x-1)^2*(x-2)
% note the chain rule for derivatives
% note the independent variable is assumed to be x
diff(f,x,2) % the second derivative wrt x
ans=
2*(x-2)*(x-3)+4*(x-1)*(x-3)+4*(x-1)*(x-2)+2*(x-1)^2
diff(f,x,3) % the third derivative wrt x
ans =
24*x-42
g1=20*x+15*y-30 % define g1
g1 =
20*x+15*y-30
g2=0.25*x+y-1; % define g2
% g1,g2 can only have partial derivatives
% independent variables have to be identified
diff(g1,x) % partial derivative
ans =
20
diff(g1,y)
ans =
15
g=[g1;g2] % g column vector based on g1, g2
g=
[20*x+15*y-30]
[1/4*x+y-1]
% g can be the constraint vector in optimization problems
% the partial derivatives of g with respect to design
% variables is called the Jacobian matrix
% the properties of this matrix are important for
% numerical techniques
xy=[x y]; % row vector of variables
J=jacobian(g, xy) % calculating the Jacobian
J=
[20,15]
[1/4,1]
ezplot(f) % a plot of f for -2*pi <= x <= 2*pi (default)
ezplot(f,[0,4]) % plot between 0<=x<=4
df=diff(f);
hold on
ezplot(df,[0,4]) % plotting function and derivative
% combine with MATLAB graphics – draw a line
line([0 4],[0 0],'Color','r')
g
g=
[20*x+15*y-30]
[1/4*x+y-1]
% to evaluate g at x=1, y=2.5
subs(g,{x,y},{1,2.5})
ans =
27.5000
1.7500
Additional symbolic computations will be introduced through code as appropriate. Note that
the result of both numeric and symbolic computations can be easily combined along with graphics
to provide a powerful computing environment.

4.2.2 Basic Mathematical Concepts


The basic mathematical elements in the discussion of NLP are derivatives, partial derivatives,
vectors, matrices, Jacobian, and Hessian. We have used the Symbolic Math Toolbox in the
previous section to calculate some of these quantities without defining them. These topics will
have been extensively covered in the foundation courses on mathematics in most disciplines. A
brief review is offered in this section using MATLAB. This opportunity will also be utilized to
increase familiarity with the Symbolic Math Toolbox and incorporate it with the MATLAB
commands that were used in the earlier chapters.
Function of One Variable: f(x) identifies a function of one variable:
f(x) = 12 + (x-1)^2 (x-2)(x-3)
is used as a specific example of such a function. The derivative of the function at the location x is
written as
df/dx = f'(x) = lim(△x→0) △f/△x = lim(△x→0) [f(x + △x) - f(x)]/△x
x is the point about which the derivative is computed. △x is the distance to a neighboring point
whose location therefore will be x+△x. The value of the derivative is obtained as a limit of the
ratio of the difference in the value of the function at the two points (△f) the distance separating the
two points (△x), as this separation is reduced to zero. The computation of the derivative for the
specific example is usually obtained using the product rule. Results from the exercise on symbolic
computation provide the result
f'(x) = 2(x-1)(x-2)(x-3) + (x-1)^2(x-3) + (x-1)^2(x-2)
Table 4.1 illustrates the limiting process for the derivative of the function chosen as the example.
The derivative is being evaluated at the point x = 3. As the displacements are reduced, the value of the
ratio approaches the value of 4, which is the actual value of the derivative. When △x is large (with
a value of 1), the ratio is 18. This is significantly different from the value of 4. With △x of 0.1 the
value is around 4.851. With further reduction to 0.001 the ratio has a value of 4.008. From these
computations it can be expected that as △x approaches 0 the ratio will reach the exact value of 4.0.
Numerical Derivative Computation: Many numerical techniques in NLP require the
computation of derivatives. Most software implementing these techniques do not use symbolic
computation. Automation and ease of use require that these derivatives be computed numerically.
The results in Table 4.1 justify the numerical computation of a derivative through a technique
called the first forward difference [5]. Using a very small perturbation △x, the derivative at a point
is numerically calculated as the ratio of the change in the function value, △f, to the change in the
displacement, △x. For the example, the derivative at x=3 with △x=0.001 is obtained as
f'(3) ≈ [f(3.001) - f(3)]/0.001 = 4.008
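A minimal sketch of this computation (illustrative; not one of the book's code files) reproduces the entries of Table 4.1:
f = @(x) 12 + (x-1).^2.*(x-2).*(x-3);   % the example function
x = 3;                                  % point of evaluation
for dx = [1 0.1 0.001]                  % decreasing perturbations
    fprintf('dx = %6.3f  ratio = %8.4f\n', dx, (f(x+dx)-f(x))/dx)
end
% the ratio (18, 4.851, 4.008) approaches the exact derivative f'(3) = 4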
The derivative for the single-variable function at any value x is also called the slope or the
gradient of the function at that point. If a line is drawn tangent to the function at the value x, the
tangent of the angle that this line makes with the x-axis will have the same value as the derivative.
If this angle is θ, then
tan θ = df/dx
Figure 4.2 illustrates the tangency property of the derivative.


MATLAB Code: Figure 4.2 was created by the code tangent.m. In the same directory, the
code derivative.m provides the illustration of tangency through a figure animation in MATLAB.
In the figure window both the function (drawn as the curve) as well as the line representing △f and
△x are drawn. As the neighboring points are closer, it is evident from the figure that the curve can
be approximated by a straight line. When △x=0.001, the derivative is tangent to the curve as they
are indistinguishable.
Higher Derivatives: The derivative of the derivative is called the second derivative.
Formally it is defined as
d²f/dx² = f''(x) = lim(△x→0) [f'(x + △x) - f'(x)]/△x
Similarly the third derivative is defined as
d³f/dx³ = f'''(x) = lim(△x→0) [f''(x + △x) - f''(x)]/△x

This can go on provided the function has sufficient higher-order dependence on the independent
variable x.
Function of Two Variables: The two-variable function

is chosen to illustrate the required mathematical concepts. The first important feature of two or
more variables is that the derivatives defined for a single variable (ordinary derivatives) do not
apply. The equivalent concept here is the partial derivative. Partial derivatives are defined for each
independent variable. The partial derivative is denoted by the symbol ∂. These derivatives are
obtained in the same way as the ordinary derivative except the other variables are held at a
constant value. In the example, when computing the partial derivative with respect to x, the value of y is not
allowed to change.
∂f/∂x |(x,y) = lim(△x→0) [f(x + △x, y) - f(x, y)]/△x
The above relation expresses the partial derivative of the function f with respect to the variable x.
Graphically this suggests that the two points are displaced horizontally (assuming the x-axis is
horizontal). In the above expression (and prior ones too), the subscript after the vertical line
establishes the point about which the partial derivative with respect to x is being evaluated; (x,y)
represents any/all points. A similar expression can be written for the partial derivative of f with
respect to y. For this example:

For the point x=2, y=1:

Until this point we have made use of symbols representing changes in values (functions, variables)
without a formal definition. In this book
△( ): represents finite/significant changes in the quantity ( )
d( ), δ( ): represents differential/infinitesimal changes in ( )
Changes in functions occur due to changes in the variables. From calculus [1] the differential
change in f(x,y) (df) due to the differential changes in the variables x (dx) and y (dy) is expressed as
df = (∂f/∂x) dx + (∂f/∂y) dy (4.20)
For convenience and simplicity, the subscript representing the point where the expression is
evaluated is not indicated. The definition of the partial derivative is apparent in the above
expression as holding y at a constant value implies that dy=0. Another interpretation for the partial
derivative can be observed from the above definition: change of the function per unit change in
the variables.
Gradient of the Function: In the function of a single variable, the derivative was associated
with the slope. In two or more variables the slope is equivalent to the gradient. The gradient is a
vector, and at any point represents the direction in which the function will increase most rapidly.
Examining the conventional objective of NLP, minimization of objective functions, the gradient
has a natural part to play in the development of methods to solve the problem. The gradient is
composed of the partial derivatives organized as a vector. Vectors in this book are column vectors
unless otherwise noted. The gradient has a standard mathematical symbol, ∇. It is defined as
∇f(x, y) = [∂f/∂x  ∂f/∂y]^T (4.21)
At this stage it is appropriate to consolidate this information using graphics. The graphical
description of the example defined in Equation (4.18) has to be three dimensional as we need one
axis for x, another for y, and the third for f(x,y). Chapter 2 introduced three-dimensional plotting
using MATLAB. A more useful description of the problem is to use contour plots. Contour plots
are drawn for specific values of the function. In the illustration the values for the contours of f are
0, 1, 2, and 3. In Figure 4.3, two kinds of contour plots are shown. In the top half a three-
dimensional (3D) contour plot is shown, while on the lower half the same information is presented
as a two-dimensional (2D) contour plot (this is considered the standard contour plot). The 2D
contour plot is more useful for graphical support of some of the ideas in NLP. The 3D contour will
be dispensed with in the remainder of the book.
MATLAB Code: Fig4_3.m. The annotation in Figure 4.3 (including the tangent line and the
gradient arrow) was done through the plot editing commands available on the menu bar in the
figure window (MATLAB version 5.2 and later). A major portion of the plots are generated
through the statements in the m-file indicated above. The code mixes numeric and symbolic
computation. It allows multiple plots and targets graphic commands to specific plots.
Discussion of Figure 4.3: In Figure 4.3 point P is on contour f=0. Point Q is on the contour f=2.
Point S is on the contour f=1. Point R has the same y value as point P and the same x value as
point Q. For the values in the figure the contour value is 0.75 (this value should be displayed in
MATLAB window when the code Fig4_3.m is run).
Line PQ is a measure of △f.
Line PR represents the change in f when △y is 0.
Line RQ represents the change in f when △x is 0.
Mathematically, △f can only be estimated by adding the changes along the lines PR and RQ since
the definition of the partial derivatives only permits calculating changes along the coordinate
directions.
At point S the dotted line is the tangent. The gradient at the same point is normal
(perpendicular) to the tangent and is directed toward increasing the value of the function
(indicated by the arrow). By definition, if df represents a differential move along the gradient at
any point, then (dx, dy are measured along the gradient vector)
df = (∂f/∂x) dx + (∂f/∂y) dy ≠ 0 (4.22)
From the figure the value of f will change along the gradient direction. If df represents a differential
move along the tangent line, then (dx, dy are measured along the tangent line)
df = (∂f/∂x) dx + (∂f/∂y) dy = 0 (4.23)
Equation (4.23) should be zero because moving tangentially (by a small amount) the value of f is not
changed.
Jacobian: The Jacobian [J] defines a useful way to organize the gradients of several functions.
Using three variables and two functions f(x,y,z) and g(x,y,z) the definition of the Jacobian is
[J] = [∂f/∂x  ∂f/∂y  ∂f/∂z
       ∂g/∂x  ∂g/∂y  ∂g/∂z] (4.24)
In Equation (4.24) the gradients of the function appear in the same row. The first row is the
gradient of f while the second row is the gradient of g. If the two functions are collected into a
column vector, the differential changes [df dg]^T in the functions, due to the differential changes in
the variables [dx dy dz]^T, can be expressed as a matrix multiplication using the Jacobian
[df  dg]^T = [J] [dx  dy  dz]^T (4.25)
which is similar to Equation (4.20).


Hessian: The Hessian matrix [H] is the same as the matrix of second derivatives of a function of
several variables. For f(x,y)
[H] = [∂²f/∂x²   ∂²f/∂x∂y
       ∂²f/∂y∂x  ∂²f/∂y²] (4.26)
The Hessian matrix is symmetric, as can be verified directly for the example defined by Equation (4.18).
Function of n Variables: For f(X), where [X] = [x1, x2, …, xn]^T, the gradient is
∇f(X) = [∂f/∂x1  ∂f/∂x2  …  ∂f/∂xn]^T (4.27)
The Hessian is an n × n symmetric matrix of the second partial derivatives:
[H(X)] = [∂²f/∂xi∂xj], i = 1, 2, …, n; j = 1, 2, …, n (4.28)
Equations (4.27) and (4.28) will appear quite often in succeeding chapters. A few minutes of
familiarization will provide sustained comprehension later.
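Both quantities can be generated with the Symbolic Math Toolbox using jacobian alone, since the Hessian is the Jacobian of the gradient. A short sketch (the polynomial is an arbitrary illustration, not an example from the book):
syms x y
f = x^3 + 2*x*y + y^2;                    % an arbitrary two-variable function
gradf = jacobian(f, [x y]).'              % gradient as a column vector
H = jacobian(jacobian(f, [x y]), [x y])   % Hessian = Jacobian of the gradient
% H = [6*x, 2; 2, 2] -- symmetric, as Equation (4.26) requires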

4.2.3 Taylor's Theorem/Series


Single Variable: The Taylor series is a useful mechanism to approximate the value of the
function f(x) at the point (xp + △x) if the function is completely known at the point xp. The expansion is
(for finite n)
f(xp + △x) = f(xp) + f'(xp)△x + (1/2!)f''(xp)△x^2 + … + (1/n!)f^(n)(xp)△x^n + error (4.29)
The series is widely used in most disciplines to establish continuous models. It is the mainstay of
many numerical techniques, including those in optimization. Equation (4.29) is usually truncated
to the first two or three terms with the understanding the approximation will suffer some error
whose order depends on the term that is being truncated:
f(xp + △x) = f(xp) + f'(xp)△x + (1/2!)f''(xp)△x^2 + O(△x^3) (4.30)
If the first term is brought to the left, the equation, discarding the error term, can be written as
△f = f(xp + △x) - f(xp) = f'(xp)△x + (1/2!)f''(xp)△x^2 (4.31)
In Equation (4.31), the first term on the right is called the first-order/linear variation while the
second term is the second-order/quadratic variation.
Figure 4.4 demonstrates the approximations using Taylor's series of various orders at point
2.5 with respect to the original function (red). The principal idea is to deal with the approximating
curve which has known properties instead of the original curve. The constant Taylor series is
woefully inadequate. The linear expansion is only marginally better. The quadratic expansion
approximates the right side of the function compared to the left side which is in significant error.
The fifth-order expansion is definitely acceptable in the range shown. To see how the Taylor series
is used, consider the quadratic curve about the point x=2.5. At this point the value of the function
f(2.5)=11.4375, the value of the first derivative f'(2.5)=-0.75, and the value of the second
derivative is f''(2.5)=4. Using Equation (4.31),
f(2.5 + △x) = 11.4375 - 0.75△x + 2△x^2 (4.32)
For different values of △x, both positive and negative, the value of f(x) can be obtained. The plot
of Equation (4.32) should be the same as the one labeled quadratic in Figure 4.4.
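The comparison is easy to make directly. A sketch (illustrative; Fig4_4.m itself is not reproduced here) that evaluates the function against the quadratic expansion of Equation (4.32):
f  = @(x) 12 + (x-1).^2.*(x-2).*(x-3);   % original function
fq = @(dx) 11.4375 - 0.75*dx + 2*dx.^2;  % quadratic expansion, Equation (4.32)
dx = -0.5:0.25:0.5;                      % displacements about x = 2.5
[f(2.5 + dx); fq(dx)]                    % the rows agree closely near dx = 0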

MATLAB Code: Figure 4.4 is completely created using the code Fig4_4.m. It uses the MATLAB-
provided symbolic "taylor" function to generate the various expansions about the point 2.5.
Two or More Variables: The series is only expanded to the quadratic terms. The truncation error
is ignored. The two-variable function expansion is shown in detail and is also organized in terms of
vectors and matrices. The first-order expansion is expressed in terms of the gradient. The second-
order expansion is expressed in terms of the Hessian matrix:
f(xp + △x, yp + △y) = f(xp, yp) + (∂f/∂x)△x + (∂f/∂y)△y
+ (1/2!)[(∂²f/∂x²)△x^2 + 2(∂²f/∂x∂y)△x△y + (∂²f/∂y²)△y^2] (4.33)
If the displacements are organized as a column vector [△x △y]^T, the expansion in (4.33) can be
expressed in a condensed manner as
f(Xp + △X) = f(Xp) + ∇f(Xp)^T △X + (1/2!)△X^T [H(Xp)] △X (4.34)
For n variables, with Xp the current point and △X the displacement vector,
f(Xp + △X) = f(Xp) + ∇f(Xp)^T △X + (1/2!)△X^T [H(Xp)] △X (4.35)
4.3 GRAPHICAL SOLUTIONS


The graphical solutions are presented for three kinds of problems: unconstrained, equality
constrained, and inequality constrained, based on the functions in Example 4.1.

4.3.1 Unconstrained Problem

The problem, using the objective and side constraints of Example 4.1, is
Minimize f(x1,x2): -x1x2 (4.36)
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3 (4.37)
The side constraints serve to limit the design space. Considering the objective function in
(4.36), it is clear that the minimum value of f will be realized if the variables are at the maximum
(x1*=3, x2*=3). The value of the function is -9. Figure 4.5 illustrates the problem. The solution is
at the boundary of the design space. Fig4_5.m is the MATLAB code that will produce Figure 4.5.
The maximum value of the function is 0 (based on the side constraints). In Figure 4.5 the tangent
and the gradient to the objective function at the solution are drawn in using MATLAB plotedit
commands. In this particular example, the side constraints are necessary to determine the solution.
If the design space were increased the solution would correspondingly change.
The example chosen to illustrate the unconstrained problem defined by (4.36) and (4.37) is
not usually employed to develop the optimality conditions for general unconstrained problems.
Two requirements are expected to be satisfied by such problems: (1) the solution must be in the
interior of the design space and (2) there must be a unique solution, or the problem is unimodal.
Nevertheless, in practical applications these conditions may not hold. Most software is
developed to apply the optimality conditions and rarely verifies that these requirements are being
met. It is usually the designer's responsibility to ensure these requirements are met. To develop the
optimality conditions an alternate unconstrained problem is presented:

Figure 4.6 displays the problem and its solution. Fig4_6.m provides the code that will generate
most of Figure 4.6. It is clear that the solution appears to be at x1*=2 and x2*=2. The optimal value
of the objective function is -2. In the contour plot, the optimum is a point, making it difficult to
draw the gradient. If this is the case, then in three dimensions, at the optimum, the gradient will lie
in a plane tangent to the function surface. Moving in this plane should not change the value of the
objective function. This observation is used to develop the necessary conditions later.

4.3.2 Equality Constrained Problem


For a two-variable problem we can only utilize one constraint for a meaningful optimization
problem:
Minimize f(x1,x2): -x1x2 (4.36)
Subject to: h1(x1,x2): x1^2/4 + x2^2 - 1 = 0 (4.39)
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3 (4.37)
Figure 4.7 (through Fig4_7.m) illustrates the problem. The dashed line is the constraint. Since it is
an equality constraint, the solution must be a point on the dashed line. The contour of f=-1 appears
to just graze the constraint and therefore is the minimum possible value of the function without
violating the constraint. In Figure 4.7, the gradient of the objective function as well as the
gradient of the constraints at the solution are illustrated. It appears that at this point these gradients
are parallel even though they are directed opposite to each other. By definition the gradient is in
the direction of the most rapid increase of the function at the selected point. This is not a
coincidence. This fact is used to establish the necessary conditions for the problem.

4.3.3 Inequality Constrained Problem


The number of inequality constraints is not dependent on the number of variables. For illustration,
both constraint functions of Example 4.1 are formulated as inequality constraints:
Minimize f(x1,x2): -x1x2
Subject to: g1(x1,x2): 20x1 + 15x2 - 30 ≤ 0
g2(x1,x2): x1^2/4 + x2^2 - 1 ≤ 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
Figure 4.8 illustrates the graphical solution. The solution is at the intersection of the dotted lines.
The code is available in Fig4_8.m. The gradients are drawn using the plotedit functions on the
menubar in the window. Also font style and font size have been adjusted using the plotedit
commands. The optimal solution must lie on or to the left of the dashed line. Similarly it must lie
below or on the dashed curve. Simultaneously it should also decrease the objective function value
as much as possible.

4.3.4 Equality and Inequality Constraints


Example 4.1 is developed with both types of constraints present. It is reproduced here:
Minimize f(x1,x2): -x1x2
Subject to: h1(x1,x2): 20x1 + 15x2 - 30 = 0
g1(x1,x2): x1^2/4 + x2^2 - 1 ≤ 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
The graphical solution of Section 4.3.3 suggests that the inequality constraint will be active at the
solution. If that is the case, then Figure 4.8 will once again represent the graphical solution to this
section also. This will be verified in the next section.

4.4 ANALYTICAL CONDITIONS


Analytical conditions refer to the necessary and sufficient conditions that will permit the
recognition of the solution to the optimal design problem. The conditions developed here
empower the numerical techniques to follow later. They are introduced in the same sequence as in
the previous section. Instead of formal mathematical details, the conditions are established less
formally from the geometrical description of the problem and/or through intuitive reasoning.
Formal development of the analytical conditions can be found in References 6-8. For establishing
the conditions it is expected that the solution is in the interior of the feasible region, and there is
only one minimum.

4.4.1 Unconstrained Problem


The problem used for illustration is

Figure 4.6 provided a contour plot of the problem. Figure 4.9 provides a three-dimensional plot of
the same problem. It is a mesh plot. The commands to create Figure 4.9 are available in Fig4_9.m.
The figure needs to be rotated interactively to appear as illustrated. A tangent plane is drawn at the
minimum for emphasis. Figure 4.9 will be used to identify the properties of the function f(x1,x2) at
the minimum.
The minimum is identified by a superscript asterisk (X* or [x1*, x2*]). Studying Figure 4.9,
the function has a minimum value of -2, while x1*=2 and x2*=2. If the values of x1 and/or x2 were
to change even by a slight amount from the optimal value, in any direction, the value of the
function will certainly increase since X* is the lowest point of the convex surface representing
the function f. Representing the displacement from the optimum values of the variables as △X,
and the change in the function value from the optimum as △f, from direct observation it is clear
that the optimal solution must be a point that satisfies
△f = f(X* + △X) - f(X*) > 0 (4.40)
First-Order Conditions: The same idea can be applied in the limit, that is, for infinitesimal
displacement dx1 and dx2 about X*. The function itself can be approximated by a plane tangent to
the function at the solution (shown in Figure 4.9)(consider this as a first-order Taylor series
expansion). Moving to any point in the plane from the optimum (see Figure 4.9) will not change
the value of the function, therefore df=0. Moving away from optimum implies that dx1 and dx2 are
not zero. Rewriting Equation (4.20) in terms of x1 and x2,
df = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 = 0 (4.41)
Since this should hold for all points in the plane, where dx1 ≠ 0 and dx2 ≠ 0, therefore
∂f/∂x1 = 0; ∂f/∂x2 = 0 (4.42)
or the gradient of f at the optimum must be zero. That is,
∇f(x1*, x2*) = 0 (4.43)
Equation (4.43) expresses the necessary condition, or first-order conditions (FOC), for
unconstrained optimization. It is termed the first-order condition because Equation (4.43) uses the
gradient or the first derivative. Equation (4.43) is used to identify the possible solutions to the
optimization problem. If the function were flipped over, so that the same design would maximize
the value of the function, the solution for the variables would be at the same values of the design
variables. It is clear that Equation (4.43) applies to the maximum problem also. Equation (4.43) by
itself will not determine the minimum value of the function. Additional considerations are
necessary to ensure that the solution established by the first-order conditions is optimal, in this
case a minimum. For a general unconstrained problem, the necessary conditions of Equation
(4.43) can be stated as
∇f(X*) = 0 (4.44)
which is primarily a vector expression of the relation in Equation (4.43). Equation (4.44) is used to
establish the value of the design variables X* both analytically and numerically.
Second-Order Conditions: The second-order conditions (SOC) are usually regarded as
sufficient conditions. It can be inferred that these conditions will involve second derivatives of the
function. The SOC is often obtained through the Taylor expansion of the function to second order.
If X* is the solution, and △X represents the change of the variables from the optimal value which
will yield a change △f, then
△f = f(X* + △X) - f(X*) = ∇f(X*)^T △X + (1/2!)△X^T [H(X*)] △X (4.45)
This is similar to Equation (4.35) except the expansion is about the solution. △f must be greater
than zero. Employing the necessary conditions (4.44), the first term on the right-hand side of
Equation (4.45) is zero. This leaves the following inequality:
△X^T [H(X*)] △X > 0 (4.46)
where H(X*) is the Hessian matrix (the matrix of second derivatives) of the function f at the
possible optimum value X*. For the relations in Equation (4.46) to hold, the matrix H(X*) must be
positive definite. There are three ways to establish that H is positive definite.
(i) For all possible △X, △X^T H(X*) △X > 0.
(ii) The eigenvalues of H(X*) are all positive.
(iii) The determinants of all lower orders (submatrices) of H(X*) that include the main
diagonal are all positive.
Of the three, only (ii) and (iii) can be practically applied, which is illustrated below.
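Tests (ii) and (iii) are one-line computations in MATLAB. A quick sketch (the symmetric matrix H below is an arbitrary illustration, not the Hessian of the example):
H = [2 -1; -1 2];    % an arbitrary symmetric matrix for illustration
eig(H)               % test (ii): eigenvalues 1 and 3, both positive
[H(1,1) det(H)]      % test (iii): leading minors 2 and 3, both positive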
Example

FOC

Equations (4.47a) and (4.47b) represent a pair of linear equations which can be solved as x1*=2,
x2*=2. The value of the function f is -2. So far only the necessary conditions have been satisfied.
The values above can also refer to a point where the function is a maximum, or where there is a
saddle point.
SOC: For this problem [see Equation (4.26)]

Is it positive definite?
(i) Not possible to test all △X
(ii) To calculate the eigenvalues of H
The eigenvalues are λ = 1 and λ = 3, and the matrix is positive definite.
(iii) To calculate determinants of all orders that include the main diagonal and include the
element in the first row and first column,

The matrix is positive definite.


Sec4_4_1.m provides confirmation of the numerical values for this example using the
Symbolic Math Toolbox and basic MATLAB commands.

4.4.2 Equality Constrained Problem


The problem:
Minimize f(x1,x2): -x1x2 (4.36)
Subject to: h1(x1,x2): x1^2/4 + x2^2 - 1 = 0 (4.39)
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3 (4.37)
Figure 4.7 illustrated the graphical solution. It was noticed that at the solution, the gradient of the
objective function and the gradient of the constraint were parallel and oppositely directed.
Examining other feasible points in Figure 4.7(on the dashed curve) it can be ascertained that the
special geometrical relationship is only possible at the solution. At the solution a proportional
relationship exists between the gradients at the solution. Using the constant of proportionality λ1 (a
positive value) the relationship between the gradients can be expressed as
∇f(X*) = -λ1 ∇h1(X*) (4.48)
Equation (4.48) is usually obtained in a more formal way using the method of Lagrange
multipliers.
Method of Lagrange: In this method, the problem is transformed by introducing an
augmented function, called the Lagrangian, as the objective function subject to the same equality
constraints. The Lagrangian is defined as the sum of the original objective function and a linear
combination of the constraints. The coefficients of this linear combination are known as the
Lagrange multipliers. With reference to the example in this section,
F(x1, x2, λ1) = f(x1, x2) + λ1 h1(x1, x2) = -x1x2 + λ1(x1^2/4 + x2^2 - 1) (4.49)
The complete problem is developed as
Minimize F(x1, x2, λ1): -x1x2 + λ1(x1^2/4 + x2^2 - 1) (4.50)
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3 (4.37)
Is the solution to the transformed problem [Equations (4.50), (4.39), (4.37)] the same as the
solution to the original problem [Equations (4.36), (4.39), (4.37)]? If the design is feasible, then
most definitely yes. For feasible designs, h1(x1,x2)=0, and the objective functions in Equations
(4.36) and (4.50) are the same. If the design is not feasible, then by definition there is no solution
anyway.
The FOC are obtained by considering F(x1,x2,λ1) as an unconstrained function in the variables
x1, x2, λ1. This provides three relations to solve for x1*, x2*, λ1*:
∂F/∂x1 = ∂f/∂x1 + λ1 ∂h1/∂x1 = 0
∂F/∂x2 = ∂f/∂x2 + λ1 ∂h1/∂x2 = 0
∂F/∂λ1 = h1(x1, x2) = 0 (4.51)
Equations (4.51) express the FOC or necessary conditions for an equality constrained problem in
two variables. The last equation in the above set is the constraint equation. This ensures the
solution is feasible. The first two equations can be assembled in vector form to yield the same
information expressed by Equation (4.48), which was obtained graphically. The left-hand
expressions in the first two equations above are the gradient of the Lagrangian function:
∇F = ∇f + λ1 ∇h1 = 0
Applying Equations (4.51) to the example of this section:
-x2 + λ1 x1/2 = 0
-x1 + 2λ1 x2 = 0
x1^2/4 + x2^2 - 1 = 0 (4.52)
Equations (4.52) represent three equations in three variables which should determine the values
for x1*, x2*, λ1*. Note that Equations (4.52) only define the necessary conditions, which means the
solution could be a maximum also.
There is one problem with respect to the set of Equations (4.52). The equations are a
nonlinear set. Most prerequisite courses on numerical methods do not attempt to solve a nonlinear
system of equations. Usually, they only handle a single one through Newton-Raphson or the
bisection method. In fact, NLP or design optimization is primarily about techniques for solving a
system of nonlinear equations, albeit of specific forms. This means Equations (4.52) cannot be
solved until we advance further in this course. Fortunately, having a tool like MATLAB obviates
this difficulty, and provides a strong justification for using a numerical/symbolic tool for
supporting the development of the course.
MATLAB Code: In MATLAB, there are two ways of solving Equations (4.52), using
symbolic support functions or using the numerical support functions. Both procedures have
limitations when applied to highly nonlinear functions. The symbolic function is solve and the
numerical function is fsolve. The numerical technique is an iterative one and requires you to
choose an initial guess to start the procedure. Quite often several different guesses may be
required to find the solution. Solutions of systems of equations are fundamental to the rest of the
book. The following code is included as a hands-on exercise. It is available as Sec4_4_2.m. It
requires eqns4_4_2.m to execute the fsolve command. The contents of the files are listed below.
Sec4_4_2.m
% Necessary/Sufficient conditions for
% Equality constrained problem
%
% Optimization with MATLAB, Section 4.4.2
% Dr. P. Venkataraman
%
% Minimize f(x1,x2)=-x1*x2
%
%-------
% symbolic procedure
%----
% define symbolic variables
format compact
syms x1 x2 lam1 h1 F
% define F
F=-x1*x2+lam1*(x1*x1/4+x2*x2-1);
h1=x1*x1/4+x2*x2-1;
% the gradient of F
grad1=diff(F,x1);
grad2=diff(F,x2);
% optimal values
% satisfaction of necessary conditions
[lams1 xs1 xs2]=solve(grad1,grad2,h1,'x1,x2,lam1');
% the solution is returned as a vector of
% the three unknowns in case of multiple solutions
% lams1 is the solution vector for lam1 etc.
% IMPORTANT: the results are sorted alphabetically
% fprintf is used to print a string in the
% command window
% disp is used to print values of matrix
f=xs1.*xs2;
fprintf('The solution (x1*, x2*, lam1*, f*):\n'), ...
disp(double([xs1 xs2 lams1 f]))
% ----------
% Numerical procedure
%----
% solution to non-linear system using fsolve
% see help fsolve
%
% the unknowns have to be defined as a vector
% the functions have to be set up in an m-file
% define initial values
xinit=[1 1 0.5]'; % initial guess for x1, x2, lam1
% the equations to be solved are available in
% eqns4_4_2.m
xfinal=fsolve('eqns4_4_2', xinit);
fprintf('The numerical solution (x1*, x2*, lam1*): \n'),
disp(xfinal);
eqns4_4_2.m
function ret=eqns4_4_2(x)
% x is a vector
% x(1)=x1, x(2)=x2, x(3)=lam1
ret=[(-x(2)+0.5*x(1)*x(3)), ...
(-x(1)+2*x(2)*x(3)), ...
(0.25*x(1)*x(1)+x(2)*x(2)-1)];
Output In MATLAB Command Window
The solution (x1*, x2*, lam1*, f*) :
1.4142 0.7071 1.0000 -1.0000
-1.4142 -0.7071 1.0000 -1.0000
-1.4142 0.7071 -1.0000 1.0000
1.4142 -0.7071 -1.0000 1.0000
Optimization terminated successfully:
Relative function value changing by less than OPTIONS.TolFun
The numerical solution (x1*, x2*, lam1*) :
1.4141
0.7071
0.9997
The symbolic computation generates four solutions. Only the first one is valid for this problem.
This is decided by the side constraints expressed by Equation (4.37). This is an illustration of the a
priori manner by which the side constraints affect the problem. On the other hand, the numerical
techniques provide only one solution to the problem. This is a function of the initial guess.
Generally, numerical techniques will deliver solutions closest to the point they started from. The
solution is
x1*=1.4141; x2*=0.7071; λ1*=1.0
The solutions for x1 and x2 can be verified from the graphical solution in Figure 4.7. The solutions
can also be verified with a hand-held calculator. The value and sign of λ are usually
immaterial for establishing the optimum. For a well-posed problem it should be positive since
increasing the constraint value is useful only if we are trying to identify a lower value for the
minimum. This forces h and f to move in opposite directions (gradients). Increasing constraint
value can be associated with enlarging the feasible domain which may yield a better design.
Lagrange Multipliers: The Lagrange multiplier method is an elegant formulation to obtain
the solution to a constrained problem. In overview, it seems strange that we have to introduce an
additional unknown (λ1) to solve the constrained problem. This violates the conventional rule for
NLP that the fewer the variables, the better the chances of obtaining the solution. As indicated in
the discussion earlier, the Lagrangian allows the transformation of a constrained problem into an
unconstrained problem. The Lagrange multiplier also has a physical significance. At the solution it
expresses the ratio of the change in the objective function to the change in the constraint value. To
illustrate this, consider the differential of the Lagrangian F = f + λ1h1:
dF = df + λ1 dh1 + h1 dλ1 = df + λ1 dh1 (since h1 = 0 at the solution)
At the solution, the FOC require that dF = 0 (which can also be seen in the above detailed
expansion). Hence,
λ1 = -df/dh1
The above dependence does not affect the establishment of the optimal design. It does have an
important role in the discussion of design sensitivity in NLP problems.
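This interpretation can be checked numerically for the example of this section. In the sketch below (illustrative, not among the book's files; it assumes fsolve is available, as above) the constraint is perturbed to x1^2/4 + x2^2 - 1 - e = 0 and the resulting change in the optimal objective is compared with -λ1:
% sensitivity check: df/de should approximate -lam1 (here -1.0)
eqs = @(x,e) [-x(2) + 0.5*x(1)*x(3);
              -x(1) + 2*x(2)*x(3);
              0.25*x(1)^2 + x(2)^2 - 1 - e];
x0 = fsolve(@(x) eqs(x, 0),    [1 1 0.5]');  % solution of the FOC, e = 0
xe = fsolve(@(x) eqs(x, 0.01), [1 1 0.5]');  % solution with perturbed constraint
df = -xe(1)*xe(2) - (-x0(1)*x0(2));          % change in the optimal objective
df/0.01                                      % approximately -1.0 = -lam1*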
General Equality Constrained Problem: Remembering n - l > 0,
Minimize f(X): [X]n
Subject to: [h(X)]l = 0
X^low ≤ X ≤ X^up
The augmented problem with the Lagrangian:
F(X, λ) = f(X) + λ1h1(X) + λ2h2(X) + … + λlhl(X) = f(X) + Σ λk hk(X) = f(X) + λ^T h(X) (4.53)
In Equation (4.53), three equivalent representations for the Lagrangian are shown. The FOC are
∂F/∂xi = ∂f/∂xi + Σ λk ∂hk/∂xi = 0, i = 1, 2, …, n (4.54)
Equations (4.54) and (4.6) provide the n + l relations to determine the n + l unknowns X*, λ*.
Equation (4.8) is used after the solution is obtained, if necessary. Equation (4.54) is also expressed
as
∇F = ∇f + Σ λk ∇hk = 0 (4.55)
Second-Order Conditions: At the solution determined by the FOC, the function should
increase for changes in △X. Changes in △X are not arbitrary: they have to satisfy the linearized
equality constraint at the solution. It is taken for granted that the sufficient conditions are usually
applied in a small neighborhood of the optimum. Also, changes are contemplated only with
respect to X and not with respect to the Lagrange multiplier. The Lagrange method is often called
the method of undetermined coefficients, indicating that the multiplier is not a variable. In the analytical
derivation of the FOC for this problem the Lagrangian F was considered unconstrained.
Borrowing from the previous section (unconstrained minimization), the SOC can be expected to
satisfy the following relations:
△X^T [∇²F(X*)] △X > 0 (4.56)
In the above, [∇²F(X*)] is the Hessian of the Lagrangian with respect to the design variables only,
evaluated at the solution. Also, the FOC require that ∇F(X*) = 0. With reference to two variables
and one constraint:
(∂²F/∂x1²)△x1^2 + 2(∂²F/∂x1∂x2)△x1△x2 + (∂²F/∂x2²)△x2^2 > 0 (4.57)
dh1 = (∂h1/∂x1)△x1 + (∂h1/∂x2)△x2 = 0, so that △x2 = -(∂h1/∂x1 / ∂h1/∂x2)△x1 (4.58)
Substitute Equation (4.58) in Equation (4.57) and the SOC requires that the expression in braces
must be positive. The derivatives in Equations (4.57) and (4.58) are evaluated at the minimum.
Applying the second-order condition to the example of this section is left as an exercise for the
reader. Compared to the SOC for the unconstrained minimization problem, Equations (4.57) and
(4.58) are not easy to apply, especially the substitution of Equation (4.58). From a practical
perspective the SOC is not imposed for equality constrained problems. It is left to the designer to
ensure by other means that the solution is a minimum solution.

4.4.3 Inequality Constrained Optimization


The problem:
Minimize f(x1,x2): -x1x2
Subject to: g1(x1,x2): 20x1 + 15x2 - 30 ≤ 0
g2(x1,x2): x1^2/4 + x2^2 - 1 ≤ 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
The number of variables (n=2) and the number of inequality constraints (m=2) do not depend on
each other. In fact, an inequality constrained problem in a single variable can be usefully designed
and solved. Section 4.4.2 solved the equality constrained problem. If the above problem can be
transformed to an equivalent equality constrained problem, then we have found the solution. The
standard transformation requires a slack variable zj for each inequality constraint gj. Unlike LP
problems, the slack variable for NLP is not restricted in sign. Therefore, the square of the new
variable is added to the left-hand side of the corresponding constraint. This adds a positive value
to the left-hand side to bring the constraint up to zero. Of course a zero value will be added if the
constraint is already zero.
Transformation to an Equality Constrained Problem:
Minimize f(x1,x2): -x1x2
Subject to: 20x1 + 15x2 - 30 + z1^2 = 0
x1^2/4 + x2^2 - 1 + z2^2 = 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
There are four variables (x1, x2, z1, z2) and two equality constraints. It is a valid equality
constrained problem. The Lagrange multiplier method can be applied to this transformation. To
distinguish the multipliers associated with inequality constraints the symbol β is used. This is
strictly for clarity.
Method of Lagrange: The augmented function or the Lagrangian is
F(x1, x2, z1, z2, β1, β2) = -x1x2 + β1(20x1 + 15x2 - 30 + z1^2) + β2(x1^2/4 + x2^2 - 1 + z2^2) (4.61)
If the Lagrangian is considered as an unconstrained objective function, the FOC (necessary
conditions) are
∂F/∂x1 = -x2 + 20β1 + β2 x1/2 = 0 (4.62a)
∂F/∂x2 = -x1 + 15β1 + 2β2 x2 = 0 (4.62b)
∂F/∂z1 = 2β1 z1 = 0 (4.62c)
∂F/∂z2 = 2β2 z2 = 0 (4.62d)
∂F/∂β1 = 20x1 + 15x2 - 30 + z1^2 = 0 (4.62e)
∂F/∂β2 = x1^2/4 + x2^2 - 1 + z2^2 = 0 (4.62f)
Equations (4.62e) and (4.62f) are equality constraints. Equation set (4.62) provides six equations
to solve for x1*, x2*, z1*, z2*, β1*, β2*. By simple recombination, Equations (4.62c) to (4.62f) can
be collapsed to two equations, while the slack variables z1 and z2 can be eliminated from the
problem.
First multiply Equation (4.62c) by z1. Replace z1^2 by -g1 from Equation (4.62e). Drop the negative
sign as well as the coefficient 2 to obtain
β1 g1(x1, x2) = 0 (4.63a)
Equation (4.63b), β2 g2(x1, x2) = 0, is obtained by carrying out similar manipulations with Equations
(4.62d) and (4.62f). The FOC can be restated as
-x2 + 20β1 + β2 x1/2 = 0
-x1 + 15β1 + 2β2 x2 = 0
β1 g1 = 0; β2 g2 = 0 (4.63)
These four equations have to be solved for x1*, x2*, β1*, β2*. Note that z1 and z2 are not being
determined, which suggests that they can be discarded from the problem altogether. It would be
useful to pretend the z's never existed in the first place.
Equations (4.63) lay out a definite feature for a nontrivial solution: either βj is zero (and gj≠0)
or gj is zero (βj≠0). Since simultaneous equations are being solved, the conditions on the
multipliers and constraints must be satisfied simultaneously. For Equations (4.63), this translates
into the following four cases. The information on g in brackets is to emphasize an accompanying
consequence.
(a) β1 = 0 [g1 ≠ 0]; β2 = 0 [g2 ≠ 0]
(b) β1 = 0 [g1 ≠ 0]; g2 = 0
(c) g1 = 0; β2 = 0 [g2 ≠ 0]
(d) g1 = 0; g2 = 0 (4.64)
In Equation (4.64), if βj≠0 (or corresponding gj=0), then the corresponding constraint is an
equality. In the previous section a simple reasoning was used to show that the sign of the
multiplier must be positive (>0) for a well-formulated problem. While the sign of the multiplier
was ignored for the equality constrained problem, it is included as part of the FOC for the
inequality constrained problem. Before restating Equation (4.64), the Lagrangian is reformulated
without the slack variables as (we are going to pretend z never existed)
F(x1, x2, β1, β2) = f(x1, x2) + β1 g1(x1, x2) + β2 g2(x1, x2) (4.65)
which is the same formulation in Equation (4.53). The slack variable was introduced to provide
the transformation to an equality constraint. It is also evident that the construction of the
Lagrangian function is insensitive to the type of constraint. Since the multipliers tied to the
inequality constraints are required to be positive, while those corresponding to the equality
constraints are not, this book will continue to distinguish between the multipliers. This will
serve to enforce clarity of presentation. The FOC for the problem are
∂F/∂x1 = ∂f/∂x1 + β1 ∂g1/∂x1 + β2 ∂g2/∂x1 = 0
∂F/∂x2 = ∂f/∂x2 + β1 ∂g1/∂x2 + β2 ∂g2/∂x2 = 0
β1 g1 = 0 (β1 > 0 if g1 = 0; β1 = 0 if g1 < 0)
β2 g2 = 0 (β2 > 0 if g2 = 0; β2 = 0 if g2 < 0)
Equation set (4.62) and any single case in Equation (4.64) provides four equations to solve for the
four unknowns of the problem. All four sets must be solved for the solution. The best design is
decided by scanning the several solutions.
The sign of the multiplier in the solution is not a sufficient condition for the inequality
constrained problem. The value is unimportant for optimization but may be relevant for sensitivity
analysis. Generally a positive value of the multiplier indicates that the solution is not a local
maximum. Formally verifying a minimum solution requires consideration of the second derivative
of the Lagrangian. In practical situations, if the problem is well defined, the positive value of the
multiplier usually suggests a minimum solution. This is used extensively in the book to identify
the optimum solution.
Solution of the Example: The Lagrangian for the problem is defined as
F(x1, x2, β1, β2) = -x1x2 + β1(20x1 + 15x2 - 30) + β2(x1^2/4 + x2^2 - 1)
The FOC are
∂F/∂x1 = -x2 + 20β1 + β2 x1/2 = 0 (4.66a)
∂F/∂x2 = -x1 + 15β1 + 2β2 x2 = 0 (4.66b)
β1 (20x1 + 15x2 - 30) = 0 (4.66c)
β2 (x1^2/4 + x2^2 - 1) = 0 (4.66d)
The side constraints are
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
Case a: β1 = 0; β2 = 0: The solution is trivial, obtained by inspection of Equations (4.66a) and (4.66b):
x1* = 0; x2* = 0; f* = 0 (4.67)
The inequality constraints are satisfied. The side constraints are satisfied. The values in Equation
(4.67) represent a possible solution.
The solutions for the other cases are obtained using MATLAB. The code is available in
Sec4_4_3.m. For Cases b and c the appropriate multiplier is set to zero and the resulting three
equations in three unknowns are solved. For case d the complete set of four equations in four
unknowns is solved. Case d is also solved numerically. The MATLAB code will also run the code in
Fig4_8.m which contains the commands to draw the figure. The output from the Command
window is pasted below.
The solution *** Case (a) *** (x1*, x2*, f*, g1, g2):
0 0 0 -30 -1
The solution *** Case (b) *** (x1*, x2*, b2*, f*, g1, g2):
1.4142 0.7071 1.0000 -1.0000 8.8909 0
-1.4142 -0.7071 1.0000 -1.0000 -68.8909 0
-1.4142 0.7071 -1.0000 1.0000 -47.6777 0
1.4142 -0.7071 -1.0000 1.0000 -12.3223 0
The solution *** Case (c) *** (x1*, x2*, b1*, f*, g1, g2):
0.7500 1.0000 0.0500 -0.7500 0 0.1406
The solution *** Case (d) *** (x1*, x2*, b1*, b2*, f*, g1, g2):
1.8150 -0.4200 0.0426 -1.4007 0.7624 0 0
0.8151 0.9132 0.0439 0.0856 -0.7443 0 -0.0
Maximum number of function evaluations exceeded;
Increase options.MaxFunEvals
Optimizer is stuck at a minimum that is not a root
Try again with a new starting guess
The numerical solution (x1*, x2*, b1*, b2*):
0.8187 0.9085 0.0435 0.0913
The solution for Case a was discussed earlier and is confirmed above.
Case b has four solutions. The first one is unacceptable because constraint g1 is not satisfied.
The second solution is feasible as far as constraints g1 and g2 are concerned, but it does not
satisfy the side constraints. The third and fourth solutions are unacceptable for the same reason.
Thus, Case b does not provide an optimal solution. Case c is unacceptable because constraint g2
is in violation. Case d has two solutions, the first of which is not acceptable for several reasons.
The second solution satisfies all of the requirements:
•It satisfies the constraints. Both constraints are active.
•The multipliers are positive (maybe a sufficient condition).
•It satisfies the side constraints.
The solution is
x1* = 0.8151; x2* = 0.9132; β1* = 0.0439; β2* = 0.0856; f* = -0.7443
This is almost confirmed by the numerical solution, which appears to have a problem with
convergence. It is almost at the solution. It must be noted here that the MATLAB function used to
solve the problem is part of the standard package. The functions in the Optimization Toolbox will
certainly do better. Nevertheless, the incomplete solution is reported below but will be ignored
in favor of the symbolic solution available in Case d.
x1 = 0.8187; x2 = 0.9085; β1 = 0.0435; β2 = 0.0913
There are now two candidates for the solution: the trivial solution in Case a and the solution in
Case d. The solution in Case d is favored as it has a lower objective function value. All of the
cases above can be explored with respect to Figure 4.8. It is an excellent facilitator for
comprehending the cases and the solution.

4.4.4 General Optimization Problem


The general optimization problem is described in the set of Equations (4.1)-(4.8). The specific
problem in this chapter defined in Section 4.3 is
Minimize f(x1,x2): -x1x2
Subject to: h1(x1,x2): 20x1 + 15x2 - 30 = 0
g1(x1,x2): x1^2/4 + x2^2 - 1 ≤ 0
0 ≤ x1 ≤ 3; 0 ≤ x2 ≤ 3
The FOC for this problem is a combination of the conditions in Sections 4.4.2 and 4.4.3. No new
concepts are required. The Lagrange multiplier method is again utilized to set up the FOC. In the
following development the multipliers are kept distinct for comprehension.
Lagrange Multiplier Method: The problem is transformed by minimizing the Lagrangian
F(x1, x2, λ1, β1) = -x1x2 + λ1(20x1 + 15x2 - 30) + β1(x1^2/4 + x2^2 - 1)
The FOC are
∂F/∂x1 = -x2 + 20λ1 + β1 x1/2 = 0
∂F/∂x2 = -x1 + 15λ1 + 2β1 x2 = 0
h1 = 20x1 + 15x2 - 30 = 0 (4.68)
(a) β1 = 0 [g1 ≤ 0] (4.69a)
(b) g1 = x1^2/4 + x2^2 - 1 = 0 [β1 > 0] (4.69b)
Two solutions must be examined. The first, Case a, requires the solution of a system of three
equations in three unknowns λ1, x1, x2 using Equations (4.68) and (4.69a). The second is a system
of four equations in the four unknowns λ1, β1, x1, x2 using Equations (4.68) and (4.69b). Solution of
this optimization problem is one of the exercises at the end of the chapter.
Kuhn-Tucker Conditions: The FOC associated with the general optimization problem in
Equations (4.1)-(4.4) or (4.5)-(4.8) is termed the Kuhn-Tucker conditions. These conditions are
established in the same manner as in the previous section. The general optimization problem
(repeated here for the sake of completeness) is
Minimize f(x1, x2, …, xn) (4.1)
Subject to: hk(x1, x2, …, xn) = 0, k = 1, 2, …, l (4.2)
gj(x1, x2, …, xn) ≤ 0, j = 1, 2, …, m (4.3)
xi^l ≤ xi ≤ xi^u, i = 1, 2, …, n (4.4)
The Lagrangian:
F(X, λ, β) = f(X) + Σ λk hk(X) + Σ βj gj(X) (4.70)
There are n+l+m unknowns. The same number of equations are required to solve the problem.
These are provided by the FOC or the Kuhn-Tucker conditions:
n equations are obtained as
∂F/∂xi = ∂f/∂xi + Σ λk ∂hk/∂xi + Σ βj ∂gj/∂xi = 0, i = 1, 2, …, n (4.71)
l equations are obtained directly through the equality constraints
hk(x1, x2, …, xn) = 0, k = 1, 2, …, l (4.72)
m equations are applied through the 2^m cases. This implies that there are 2^m possible solutions.
These solutions must include Equations (4.71) and (4.72). Each case sets the multiplier βj or the
corresponding inequality constraint gj to zero. If the multiplier is set to zero, then the
corresponding constraint must be feasible for an acceptable solution. If the constraint is set to zero
(active constraint), then the corresponding multiplier must be positive for a minimum. With this in
mind the m equations can be expressed as
βj gj = 0; βj > 0 if gj = 0; gj ≤ 0 if βj = 0; j = 1, 2, …, m (4.73)
If the conditions in Equations (4.73) are not met, the design is not acceptable. In implementing
Equation (4.73), for each case a simultaneous total of m values and equalities must be assigned.
Once these FOC determine a possible solution, the side constraints have to be checked,
as evidenced in the examples earlier; this is not built into the FOC. It is only confirmed after a
possible solution has been identified. Equations (4.71)-(4.73) are referred to as the Kuhn-Tucker
conditions [see Example 4.3 (Section 4.5.2) for additional discussion of the Kuhn-Tucker
conditions].
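The bookkeeping of the 2^m cases is easily automated. A sketch (names illustrative) that enumerates the active-set combinations to be examined:
m = 2;                                % number of inequality constraints
for k = 0:2^m - 1
    active = bitget(k, 1:m);          % 1: g_j = 0 (active), 0: beta_j = 0
    fprintf('case %d: active constraints [%s]\n', k+1, num2str(active))
end
% each case, together with Equations (4.71) and (4.72), yields a square
% system; solutions are screened for feasibility and beta_j > 0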

4.5 EXAMPLES
Two examples are presented in this section. The first is an unconstrained problem that has
significant use in data reduction. Specifically, it illustrates the problem of curve fitting or
regression. The second is the flagpole design problem explored in Chapter 2.

4.5.1 Example 4.2


Problem: A set of y, z data is given. It is necessary to find the best straight line through the data. This exercise is termed curve fitting, data reduction, or regression. To keep this example simple, the data points are generated using the parabolic equation z = 0.5y^2. The data are generated in arbitrary order and at nonuniform intervals.
Least Squared Error: The objective function that drives a large number of curve fitting/regression methods is the minimization of the least squared error. The construction of the objective requires two entities: the data and the type of mathematical relationship assumed between them. In this example it is a straight line; generally it can be any polynomial, or any function for that matter. This is usually the way correlation equations are determined in experimental fluid dynamics and heat transfer. For this example the data are the collection of points (yi, zi), i = 1, 2, ..., n.
The expected straight line can be characterized by

zp = x1 y + x2
where x1 is the slope and x2 is the intercept. Using optimizing terminology, there are two design
variables. For a cubic polynomial, there will be four design variables. If zpi is the fitted value for the independent variable yi, the objective function, the square of the error summed over all of the data points, can be expressed as

Minimize f(x1, x2) = Σ (zpi − zi)^2 = Σ (x1 yi + x2 − zi)^2,   i = 1, 2, ..., n
It is possible to complete the formulation with side constraints, although that is not necessary. The FOC are

∂f/∂x1 = Σ 2 (x1 yi + x2 − zi) yi = 0
∂f/∂x2 = Σ 2 (x1 yi + x2 − zi) = 0
Expanding the expressions in the brackets, dividing through by 2, and reorganizing as a linear matrix equation,

[ Σ yi^2   Σ yi ] [ x1 ]   [ Σ yi zi ]
[ Σ yi     n    ] [ x2 ] = [ Σ zi    ]

Note that the matrices can be set up easily and solved using MATLAB. The Hessian matrix is twice the square matrix on the left; the SOC requires that it be positive definite.
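As a hint of how this is done, the following minimal sketch (not the complete Sec4_5_1.m discussed below; the data generation merely mimics the example) assembles and solves the normal equations:

n = 20;
y = 5*rand(n, 1);                 % data in arbitrary order, nonuniform spacing
z = 0.5*y.^2;                     % data generated from z = 0.5 y^2
A = [sum(y.^2) sum(y); sum(y) n]; % square matrix on the left
c = [sum(y.*z); sum(z)];          % right-hand side
x = A\c;                          % x(1) = slope, x(2) = intercept
zp = x(1)*y + x(2);               % fitted values
f = sum((zp - z).^2);             % objective function value
eig(A)                            % positive eigenvalues confirm the SOC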
Code: Sec4_5_1.m: Example 4.2 is solved using MATLAB and the results from the Command
window are shown below. Compare this code with that in Chapter 1. The results are displayed on a
plot (not included in the book). New features in the code are the use of random numbers, using the
sum command, and writing to the Command window. From the plot and the tabular values it is
clear that a linear fit is not acceptable for this example. The quality of the fit is not under
discussion in this example as it is apparent that a nonlinear fit should have been chosen. This is
left as an exercise. The following will appear in the Command window when the code is run.
Results from Linear fit
objective function: 16.5827
design variables x1, x2:
2.3458
-1.5305
yi zi zp diff
4.7427 11.2465 9.5951 1.6514
3.7230 6.9304 7.2031 -0.2727
4.3420 9.4266 8.6552 0.7714
0.3766 0.0709 -0.6470 0.7179
2.4040 2.8895 4.1089 -1.2193
4.1054 8.4272 8.1002 0.3270
4.1329 8.5404 8.1646 0.3757
3.3369 5.5674 6.2973 -0.7299
2.7835 3.8739 4.9991 -1.1253
4.2032 8.8334 8.3296 0.5039
3.1131 4.8456 5.7723 -0.9267
3.1164 4.8561 5.7802 -0.9241
3.2415 5.2538 6.0737 -0.8199
0.1954 0.0191 -1.0720 1.0911
0.7284 0.2653 0.1781 0.0871
4.5082 10.1619 9.0450 1.1168
1.9070 1.8183 2.9430 -1.1247
0.4974 0.1237 -0.3638 0.4874
1.9025 1.8098 2.9325 -1.1227
0.1749 0.0153 -1.1203 1.1356
Eigenvalues of Matrix A:
205.4076
4.5422

4.5.2 Example 4.3


This example is the same as Example 2.3, the flagpole problem. It is quite difficult to solve
and is chosen here to reveal several practical features that are not easily resolved. These are
situations where the designer will bring his experience and intuition into play. The problem was
fully developed in Section 2.3.3 and is not repeated here. The graphical solution is available in Figures 2.9 and 2.10. These figures are exploited in this section to determine the correct
combination of β’s for applying the Kuhn-Tucker conditions. If all of the parameters are
substituted in the various functions, the relations can be expressed with numerical coefficients as:
Kuhn-Tucker Conditions: Four Lagrange multipliers are introduced for the four inequality constraints and the Lagrangian is

where X = [x1, x2] and β = [β1, β2, β3, β4]T. The Kuhn-Tucker conditions require two equations using
the gradients of the Lagrangian with respect to the design variables

The actual expressions for the gradients obtained from Equation (4.80) are left as an exercise for the
student. The Kuhn-Tucker conditions are applied by identifying the various cases based on the
activeness of various sets of constraints.
For this problem n = 2 and m = 4. There are 2^m = 2^4 = 16 cases that must be investigated as part of the Kuhn-Tucker conditions given by Equation (4.73). While some of these cases are trivial, it is nevertheless a formidable task. This problem was solved graphically in Chapter 2 and the
same graphical solution can be exploited to identify the particular case that needs to be solved for
the solution. Visiting Chapter 2 and scanning Figure 2.9, it can be identified that constraints g1 and
g3 are active (as confirmed by the zoomed solution in Figure 2.10). The solution is x1*=0.68m and
x2*=0.65m.
If g1 and g3 are active constraints, then the multipliers β1 and β3 must be positive. By the same
reasoning the multipliers associated with the inactive constraints g2 and g4, β2 and β4, must be set
to zero. This information on the active constraints can be used to solve for x1*, x2* as this is a
system of two equations in two unknowns. This does not, however, complete the satisfaction of the Kuhn-Tucker conditions. β1 and β3 must be solved for and verified to be positive, and g2 and g4 must be evaluated and verified to be less than zero.
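A minimal sketch of how this case can be set up with the Symbolic Math Toolbox appears below; the objective and constraints are hypothetical stand-ins, since the actual flagpole expressions are not repeated in this section.

syms x1 x2 b1 b3 real
f  = x1^2 + x2^2;                 % hypothetical stand-in objective
g1 = 1 - x1*x2;                   % hypothetical stand-in for active g1
g3 = x1 - 2*x2;                   % hypothetical stand-in for active g3
L  = f + b1*g1 + b3*g3;           % b2 and b4 are set to zero
eqs = [jacobian(L, [x1 x2]).' == 0; g1 == 0; g3 == 0];
sol = solve(eqs, [x1, x2, b1, b3]);
% For a minimum: b1 > 0 and b3 > 0 at the solution, and the inactive
% constraints g2 and g4 must evaluate to negative values there.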
Sec4_5_2.m is the code segment that is used to solve the particular case discussed above.
The problem is solved symbolically. Two versions of the problem are solved: the original problem
and a scaled version. The objective function (4.76) is modified to have a coefficient of unity. This
should not change the value of the design variables (why?). You can experiment with the code and
verify if this is indeed true. Note the values of the Lagrange multipliers during this exercise and
infer the relation between the multipliers and the scaling of the objective function. The following discussion assumes that the code has been run and the values are available.
Referring to the solution of the case above, the optimal values for the design variables are
x1*=0.6773 m and x2*=0.6443 m. This is very close to the graphical solution. Before we conclude
that the solution is achieved, take note of the multipliers: β1* = 5.2622e-010 and β3* = -0.1270. If a multiplier is negative, then the point is not a minimum. The values of the constraints at the optimal values of the design are g2 = -4.5553e+007 and g4 = -0.0320. Both values confirm that these constraints are inactive. At least the solution is feasible: all constraints are satisfied. The value of β1 is zero even though the corresponding constraint is active (g1 = 0). This corresponds to the trivial case in the Kuhn-Tucker conditions.
Examining the other cases, which involve the intersection of different pairs of inequalities assumed to be active, the trivial case shows up often (left as an exercise to explore). The only case
when the multipliers are positive is when g2 and g3 are considered active. The solution, however,
violates the side constraints. The graphical solution is rather explicit in identifying the minimum at
the points established in the first solution. The trivial case of the multiplier being zero when the
constraint is active is observed with respect to the first two constraints, which express stress relations. In general, the terms in these inequalities have large orders of magnitude, especially when the problem also contains constraints on displacements, areas, or the like. The large values exert so strong an influence on the problem that constraints of lower magnitude have little or no significance. This is usually a severe problem in numerical investigations. The remedy for this is scaling of the functions.
Scaling: Consider the order of magnitude in Equations (4.77a) and (4.77c). Numerical
calculations are driven by larger magnitudes. Inequality (4.77c) will be ignored in relation to the
other functions even though the graphical solution indicates that g3 is active. This is a frequent occurrence in all kinds of numerical techniques. The standard approach to minimizing the impact of large variations in magnitude among different equations is to normalize the relations. In practice this is also extended to the variables. This is referred to as scaling the variables and scaling the functions. Many current software packages will scale the problem without user intervention.
Scaling Variables: The presence of side constraints in problem formulation allows a natural
definition of scaled variables. The user-defined upper and lower bounds are used to scale each
variable between 0 and 1. Therefore,

x̄i = (xi − xi^l) / (xi^u − xi^l)    (4.81a)
xi = xi^l + x̄i (xi^u − xi^l)        (4.81b)

where xi^l and xi^u are the lower and upper bounds on variable xi.
In the original problem, Equation (4.81b) is used to substitute for the original variables, after which the problem can be expressed in terms of the scaled variables. An alternate formulation is to use only the upper value of the side constraint to scale the design variable:

x̄i = xi / xi^u    (4.82)

While this limits the upper scaled value to 1, it does not set the lower scaled value to zero. For the example of this section, there is no necessity for scaling the design variables since their order of magnitude is one, which is exactly what scaling attempts to achieve.
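A minimal sketch of both scalings, with hypothetical bounds and a hypothetical design point, is:

xl = [0.1; 0.1];  xu = [1.0; 1.0];   % hypothetical side constraints
x  = [0.68; 0.65];                   % a design point
xbar = (x - xl)./(xu - xl);          % Equation (4.81a): scaled to [0, 1]
xorg = xl + xbar.*(xu - xl);         % Equation (4.81b): recover originals
xalt = x./xu;                        % Equation (4.82): upper bound only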
Scaling the Constraints: Scaling of the functions in the problem is usually critical for a
successful solution. Numerical techniques used in optimization are iterative. In each iteration
usually the gradient of the functions at the current value of the design variables is involved in the
calculations. This gradient, expressed as a matrix, is called the Jacobian matrix or simply the
Jacobian. Sophisticated scaling techniques [7,8] employ the diagonal entries of this matrix as
metrics to scale the respective functions. These entries are evaluated at the starting values of the
design variables. The functions can also be scaled in the same manner as Equations (4.81) and (4.82). For the former, expected lower and upper values of the constraints are necessary. In this exercise, the constraints will be scaled using a relation similar to Equation (4.82). The scaling factor for each constraint will be determined using the starting value, or initial guess, for the variables. A starting value of 0.6 for both design variables is selected to compute the values necessary for scaling the functions. The scaling constants for the equations are calculated as

The first three constraints are divided through by their scaling constants. The last equation is
unchanged. The objective function has a coefficient of one. The scaled problem is

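The computation of such scaling constants can be sketched as follows; the constraint expressions are hypothetical stand-ins for the flagpole functions:

syms x1 x2 real
g  = [x1^2 - x2; 10*x1 + x2 - 5; x1*x2 - 1];   % hypothetical g1, g2, g3
x0 = [0.6; 0.6];                               % starting value for scaling
sc = abs(double(subs(g, [x1; x2], x0)));       % scaling constants
gs = g./sc;                                    % scaled constraints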
Sec4_5_2_scaled.m: The code (could have been part of Sec4_5_2.m) investigates the scaled
problem in the same way that the original problem was investigated. Primarily, this exploits the
information from the graphical solution. It is expected that the solution for the design variables
will be the same although the multiplier values are expected to be different. From the information
in the Workspace window the optimal values for the design variables are x1*=0.6774 and
x2* = 0.6445. All of the constraints are feasible. The optimal values of the multipliers have changed. There is no longer a zero multiplier; it has turned positive. There is still one multiplier with a negative sign, implying the point has not met the Kuhn-Tucker conditions. Actually, this should have been anticipated, since scaling cannot change the sign of a multiplier.
There is still the matter of the inconsistency between the graphical solution and the failure to satisfy the Kuhn-Tucker conditions. Exploring this particular example has been rather tortuous, but it has provided the opportunity to examine important related issues like scaling.
Another important consideration is the scope of the Kuhn-Tucker conditions. These conditions
apply only at regular points.
Regular Points: Regular points [8] arise when equality constraints, h(X), are present in the problem. The concept is also extended to active inequality constraints (pseudo equality constraints).
Kuhn-Tucker conditions are valid only for regular points. The two essential features of a regular
point X* are
• The point X* is feasible (satisfies all constraints).
• The gradients of the equality constraints as well as the active inequality constraints at X* must form a linearly independent set of vectors.
In the scaled problem of Example 4.3 the constraints g1 and g3 are active. The solution x1* = 0.6774 and x2* = 0.6445 is feasible. The gradient of g1 is ∇g1 = [-37.6314 35.4548]T. The gradient of g3 is ∇g3 = [-34.3119 30.4786]T. The two gradients are almost parallel to each other, which is also evident in the graphical solution. The gradients therefore are not linearly independent, and the Kuhn-Tucker conditions cannot be applied at this point. This does not mean the point is not a local minimum; there is graphical evidence that it is indeed so.
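The regularity check itself is easy to carry out numerically. A minimal sketch using the gradient values quoted above:

G = [-37.6314  -34.3119;      % columns are the gradients of g1 and g3
      35.4548   30.4786];
r = rank(G);                  % full column rank implies independence
c = (G(:,1).'*G(:,2))/(norm(G(:,1))*norm(G(:,2)))
% c is very nearly 1: the gradients are almost parallel. Even if the
% numerical rank is 2, the set is effectively dependent, and the
% Kuhn-Tucker conditions are unreliable at this point.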
The Kuhn-Tucker conditions are the only available formal conditions for recognizing optimum values, and they are universally applied without regard to regularity. Some additional considerations should be kept in mind [8]:
• If equality constraints are present and all of the inequality constraints are inactive, then the points satisfying the Kuhn-Tucker conditions may be a minimum, a maximum, or a saddle point. Higher-order conditions are necessary to identify the type of solution.
• If the multipliers of the active inequality constraints are positive, the point cannot be a local maximum. It may not be a local minimum either. A point may be a maximum if the multiplier of an active inequality constraint is zero.

REFERENCES
1. Stein, S. K., Calculus and Analytical Geometry, McGraw-Hill, New York, 1987.
2. Hostetler, G. H., Santina, M. S., and Montalvo, P. D., Analytical, Numerical, and Computational Methods for Science and Engineering, Prentice-Hall, Englewood Cliffs, NJ, 1991.
3. Moler, C. and Costa, P. J., Symbolic Math Toolbox: For Use with MATLAB, User's Guide, MathWorks Inc., MA.
4. MATLAB Demos, Symbolic Toolbox Introduction, online resource in MATLAB, MathWorks Inc., MA.
5. Burden, R. L. and Faires, J. D., Numerical Analysis, 4th ed., PWS-Kent Publishing Company, Boston, 1989.
6. Fox, R. L., Optimization Methods for Engineering Design, Addison-Wesley, Reading, MA, 1971.
7. Vanderplaats, G. N., Numerical Optimization Techniques for Engineering Design, McGraw-Hill, New York, 1984.
8. Arora, J. S., Introduction to Optimum Design, McGraw-Hill, New York, 1989.
9. Kuhn, H. W. and Tucker, A. W., "Nonlinear Programming," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman (ed.), University of California Press, 1951.

PROBLEMS
(In many of the problems below, you are required to obtain the numerical solution.)
4.1 Define two nonlinear functions in two variables. Find their solution through contour plots.
4.2 For the functions in Problem 4.1 obtain the gradients of the function and the Jacobian.
Confirm them using the Symbolic Math Toolbox.
4.3 Define the design space (choose side constraints) for a two-variable problem. Define two nonlinear functions in two variables that do not have a solution within this space. Graphically confirm the result.
4.4 Define a design space for a two-variable problem. Define two nonlinear functions in two
variables that have at least two solutions within the space.
4.5 Define a nonlinear function of two variables. Choose a contour value and draw the contour. Identify a point on the contour. Calculate the value of the gradient at that point. Draw the gradient at the point using the computed value. Calculate the Hessian at the above point.
4.6 Using the relation in Equation (4.23), establish that the gradient is normal to the tangent [a
physical interpretation for Equation (4.23)].
4.7 Define a nonlinear function of three variables. Choose a point in the design space. Find the
gradient of the function at the point. Calculate the Hessian matrix at the same point.
4.8 Express the Taylor series expansion (quadratic) of the function f(x) = (2 − 3x + x^2) sin x about the point x = 0.707. Confirm your results through the Symbolic Math Toolbox. Plot the original function and the approximation.
4.9 Expand the function f(x, y) = 10(1 − x^2)^2 + (y − 2)^2 quadratically about the point (1, 1). How will you display the information? Draw the contours of the original function and the approximation.
4.10 Obtain the solution graphically for the case when Equation (4.10) is added to the problem
defined by Equations (4.36), (4.39), and (4.37). Obtain the solution using the Symbolic Math
Toolbox.
4.11 Obtain the solution to the optimization problem in Section 4.4.4 using MATLAB.
4.12 In Section 4.5.1, obtain the coefficients for a quadratic fit to the problem.
4.13 Use an example to show that optimum values for the design variables do not change if the
objective function is multiplied by a constant. Prove the same if a constant is added to the
function.
4.14 How does scaling of the objective function affect the value of the multipliers? Use an
example to infer the result. How does scaling the constraint affect the multipliers? Check with an
example.
