Optimal Control and Dynamic Programming
1 Optimal Control
We consider first simple unconstrained optimization problems, and then demonstrate how these may be generalized to handle constrained optimization problems. With the observation that an optimal control problem is a form of constrained optimization problem, variational methods are used to derive an optimal controller, which embodies Pontryagin's Minimum Principle. Subsequently, an alternative approach based on Bellman's Principle of Optimality and dynamic programming is used to derive the Hamilton-Jacobi equation.
Consider first $L : \Re \to \Re$. We want to find
\[ \min_u L(u). \]
Let us assume that $L$ is sufficiently smooth, and consider the Taylor expansion
\[ L(u) = L(u_0) + \frac{dL}{du}\Big|_{u=u_0} (u - u_0) + \frac{1}{2}\frac{d^2 L}{du^2}\Big|_{u=u_0} (u - u_0)^2 + \dots \]
Then we have the necessary condition
\[ \frac{dL}{du}\Big|_{u=u_0} = 0 \]
and the sufficient condition
\[ \frac{d^2 L}{du^2}\Big|_{u=u_0} > 0. \]
Note that these are only conditions for a local minimum. Additional condi-
tions are required to find the global minimum if the function is non-convex.
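As a quick numerical check of these conditions (an illustrative sketch, not part of the derivation; the function below is a hypothetical example), the first and second derivatives can be evaluated symbolically:

import sympy as sp

# Hypothetical example: L(u) = (u - 2)**2 + 1, a smooth convex function.
u = sp.symbols('u')
L = (u - 2)**2 + 1

dL = sp.diff(L, u)        # first derivative
d2L = sp.diff(L, u, 2)    # second derivative

# Necessary condition: dL/du = 0 at the candidate point u0.
candidates = sp.solve(sp.Eq(dL, 0), u)
for u0 in candidates:
    # Sufficient condition: d2L/du2 > 0 at u0 (local minimum).
    print(u0, d2L.subs(u, u0) > 0)   # -> 2 True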
If we have a function of more than one variable, that is $L : \Re^n \to \Re$, we have the following conditions:
\[ \frac{\partial L}{\partial u}\Big|_{u=u_0} = \left\{ \frac{\partial L}{\partial u_1}, \frac{\partial L}{\partial u_2}, \dots, \frac{\partial L}{\partial u_n} \right\}\Big|_{u=u_0} = 0 \]
and
\[ \frac{\partial^2 L}{\partial u^2}\Big|_{u=u_0} = \begin{pmatrix} \frac{\partial^2 L}{\partial u_1^2} & \dots & \frac{\partial^2 L}{\partial u_1 \partial u_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 L}{\partial u_n \partial u_1} & \dots & \frac{\partial^2 L}{\partial u_n^2} \end{pmatrix}\Big|_{u=u_0} > 0. \]
Necessary condition:
\[ \frac{\partial L}{\partial x} = \begin{pmatrix} 12x_1 + 3x_2 - 8 \\ 3x_1 + 4x_2 + 3 \end{pmatrix} = 0 \]
Solving these equations we find $x_0 = \left(\tfrac{41}{39}, -\tfrac{20}{13}\right)$, and when we insert $x_0$ into the Hessian matrix we see that
\[ \frac{\partial^2 L}{\partial x^2}(x_0) = \begin{pmatrix} 12 & 3 \\ 3 & 4 \end{pmatrix}. \]
The resulting matrix is positive definite. We conclude that the point $x_0$ is a minimum.
The plot in Figure 1 shows the function for which we are finding the mini-
mum.
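The stationary point and the definiteness check can also be reproduced numerically; the following sketch (not part of the original notes) uses only the gradient and Hessian entries given in the example above:

import numpy as np

# Gradient of L set to zero:
#   12*x1 + 3*x2 - 8 = 0
#    3*x1 + 4*x2 + 3 = 0
A = np.array([[12.0, 3.0],
              [3.0, 4.0]])
b = np.array([8.0, -3.0])

x0 = np.linalg.solve(A, b)            # stationary point, approx (41/39, -20/13)
print(x0)                             # [ 1.0513 -1.5385]

# The Hessian is constant for this example; check positive definiteness via eigenvalues.
H = A                                 # Hessian equals the coefficient matrix here
eigvals = np.linalg.eigvalsh(H)
print(eigvals, np.all(eigvals > 0))   # all positive -> x0 is a minimum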
Consider now the constrained problem
\[ \min_x L(x) \quad \text{subject to} \quad f(x) = 0, \]
with $L : \Re^n \to \Re$ and $f : \Re^n \to \Re^m$. Then this problem is equivalent to finding the stationary points of the function
\[ H(x, \lambda) = L(x) + \lambda^T f(x), \]
which is called the Hamiltonian of the optimization problem. The coefficients $\lambda \in \Re^m$ are called the Lagrange multipliers of the system.
Proof
Without loss of generality we consider $x \in \Re^2$ and a scalar constraint. The necessary conditions for a minimum are
\[ \frac{\partial H}{\partial (x, \lambda)} = 0, \]
or
\[ \frac{\partial H}{\partial x_1} = \frac{\partial L}{\partial x_1} + \lambda \frac{\partial f}{\partial x_1} = 0 \]
\[ \frac{\partial H}{\partial x_2} = \frac{\partial L}{\partial x_2} + \lambda \frac{\partial f}{\partial x_2} = 0 \]
\[ \frac{\partial H}{\partial \lambda} = f(x_1, x_2) = 0. \]
The third condition is equivalent to the constraint of the original problem being satisfied.
The first two conditions are equivalent to saying that the vectors
\[ \begin{pmatrix} \frac{\partial L}{\partial x_1} \\ \frac{\partial L}{\partial x_2} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{pmatrix} \]
are parallel, or collinear. If these vectors are parallel, then the matrix
\[ \begin{pmatrix} \frac{\partial L}{\partial x_1} & \frac{\partial f}{\partial x_1} \\ \frac{\partial L}{\partial x_2} & \frac{\partial f}{\partial x_2} \end{pmatrix} \]
has rank less than 2, which means that the linear system obtained by equating the derivative of the Hamiltonian to zero has a non-trivial solution in $\lambda$.
With the help of a diagram in Figure 2, it is easy to understand that where
we have a minimum or maximum the two gradients (with the red vector
representing the gradient of L and the black vector representing the gradient
of f ) have to be parallel, as otherwise one can increase or decrease the value
of L while satisfying the constraint f (x) = 0.
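The following small sketch (an illustrative problem, not from the notes) applies exactly these stationarity conditions to the hypothetical problem of minimizing $x_1^2 + x_2^2$ subject to $f(x) = x_1 + x_2 - 1 = 0$:

import numpy as np

# Hypothetical example: minimize L(x) = x1^2 + x2^2 subject to f(x) = x1 + x2 - 1 = 0.
# Hamiltonian: H(x, lam) = x1^2 + x2^2 + lam*(x1 + x2 - 1).
# Stationarity (dH/dx1 = dH/dx2 = dH/dlam = 0) gives a linear system:
#   2*x1        + lam = 0
#          2*x2 + lam = 0
#   x1 + x2           = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # 0.5 0.5 -1.0: the gradients of L and f are parallel at the minimum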
The optimal control problem is to find a control
\[ u : [0, T] \to \Re^m \]
such that the performance index is minimized and the final-state constraint and the system equations are satisfied.
Theorem 2 Solutions of the Optimal Control Problem also solve the following set of differential equations:
State equation:
\[ \dot{x} = H_\lambda = f(x, u) \tag{4} \]
Co-state equation:
\[ -\dot{\lambda} = H_x = \frac{\partial L}{\partial x} + \lambda^T \frac{\partial f}{\partial x} \tag{5} \]
Optimality condition:
\[ 0 = H_u = \frac{\partial L}{\partial u} + \lambda^T \frac{\partial f}{\partial u} \tag{6} \]
State initial condition:
\[ x(0) = x_0 \tag{7} \]
Co-state final condition:
\[ \lambda^T(T) = \left( \phi_x + \nu^T \psi_x \right)\big|_{x=x(T)} \tag{8} \]
Here $\phi$ denotes the terminal cost, $\psi$ the final-state constraint and $\nu$ the corresponding multiplier.
Proof The first variation of the augmented performance index is
\[ \delta J = \left[ \phi_x + \nu^T \psi_x - \lambda^T \right] \delta x \big|_{x=x(T)} + \int_0^T \left[ (H_x + \dot{\lambda}^T)\,\delta x + H_u\,\delta u + (H_\lambda - \dot{x}^T)\,\delta \lambda \right] dt + \psi(x)^T \delta \nu \big|_{x=x(T)}. \]
Now, for the function $u : [0, T] \to \Re^m$ to minimize the cost function, $\delta J$ must be zero for any value of the differentials. Thus, all the expressions multiplying the differentials have to be zero for every $t \in [0, T]$. This observation gives the equations as required.
Remark 1 Just as in the static optimization case, where the zeros of the
derivatives represent candidates to be tested for extremum, the solutions of
the system described in Theorem 2 are to be seen as candidates to be the
optimal solution and their optimality must be tested for each particular case.
In other words, Pontryagin's Minimum Principle delivers necessary, but
not sufficient, conditions for optimality.
for given values of x(t) and λ(t). In other words, Pontryagin's Minimum Principle states that the Hamiltonian is minimized over all admissible u for the optimal values of the state and co-state.
Remark 4 Special attention is needed in the case where H_u is constant with respect to u, i.e. H_u does not depend on u. In this case the solution is found where the constraints on u are active. This sort of solution is called a bang-bang solution.
Example (minimal energy): minimize
\[ J = \int_0^T \tfrac{1}{2} u^2(t)\, dt \]
subject to
\[ \dot{x}_1 = x_2, \quad x_1(0) = x_{10} \]
\[ \dot{x}_2 = u, \quad x_2(0) = x_{20} \]
\[ \psi(x(T)) = x(T) = 0. \]
The equations of Theorem 2 become
\[ \dot{x}_1 = H_{\lambda_1} = x_2; \quad x_1(0) = x_{10} \]
\[ \dot{x}_2 = H_{\lambda_2} = u; \quad x_2(0) = x_{20} \]
\[ -\dot{\lambda}_1 = H_{x_1} = 0; \quad \lambda_1(T) = \nu_1 \]
\[ -\dot{\lambda}_2 = H_{x_2} = \lambda_1; \quad \lambda_2(T) = \nu_2 \]
where
\[ H(x, u, \lambda) = \tfrac{1}{2} u^2 + \lambda_1 x_2 + \lambda_2 u. \]
Now we see that
\[ H_u = u + \lambda_2 = 0. \]
Thus, we can solve the differential equations and see that
\[ \lambda_1(t) = \nu_1 \]
\[ \lambda_2(t) = -\nu_1 (t - T) + \nu_2 \]
\[ u(t) = \nu_1 (t - T) - \nu_2. \]
Placing these linear expressions in the dynamic equations for x and using
the initial conditions x(0) = x0 , we obtain a linear system of equations with
respect to (1 , 2 ), which gives us the final parametrization of the control law
u. Figure 3 shows the result of applying this control law with T = 15.
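The computation of (ν1, ν2) and the resulting trajectory can be sketched numerically as follows; the initial conditions below are assumed values chosen for illustration only, not taken from the notes:

import numpy as np

# Sketch with assumed data: solve for (nu1, nu2) so that x(T) = 0 under
# u(t) = nu1*(t - T) - nu2, then simulate the double integrator.
T = 15.0
x10, x20 = 1.0, 0.0           # assumed initial conditions

# Integrating x2dot = u and x1dot = x2 analytically and imposing x1(T) = x2(T) = 0
# gives the linear system  A @ [nu1, nu2] = b:
A = np.array([[-T**2 / 2.0, -T],
              [-T**3 / 3.0, -T**2 / 2.0]])
b = np.array([-x20, -x10 - x20 * T])
nu1, nu2 = np.linalg.solve(A, b)

# Forward Euler simulation to check that the state reaches the origin at time T.
dt, x1, x2 = 1e-3, x10, x20
for k in range(int(T / dt)):
    t = k * dt
    u = nu1 * (t - T) - nu2    # minimal-energy control law
    x1 += x2 * dt
    x2 += u * dt
print(x1, x2)                  # both close to 0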
Example (minimum time): we now minimize the time needed to reach the origin,
\[ J = \int_0^T 1\, dt = T, \]
subject to the same double-integrator dynamics and the input constraint
\[ -1 \le u(t) \le 1, \quad \forall t \ge 0. \]
Now
\[ H(x, u, \lambda) = 1 + \lambda_1 x_2 + \lambda_2 u. \]
We notice that $H_u = \lambda_2$, i.e. $H_u$ does not depend on u, and thus the extremum is reached at the boundaries, i.e. $u^*(t) = \pm 1$ for all $t \ge 0$. Using the Minimum Principle we see that
\[ u^*(t) = -\operatorname{sign}(\lambda_2(t)). \]
Figure 4 depicts the result of deploying this control. Note that the control law
is discontinuous. Further, we observe that the system now needs less time to
reach the origin than in the previous Minimal Energy example. Of course,
this success is obtained at the cost of deploying more actuator energy.
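The bang-bang character of this law can be illustrated with assumed multiplier values (hypothetical numbers, not from the notes): since λ2(t) is affine in t, the control u*(t) = -sign(λ2(t)) is piecewise constant and switches at most once.

import numpy as np

# Illustration only (assumed multiplier values): lambda2(t) = -nu1*(t - T) + nu2
# is affine in t, so u*(t) = -sign(lambda2(t)) switches at most once.
T = 5.0
nu1, nu2 = 1.0, -2.0             # assumed final co-state values

t = np.linspace(0.0, T, 12)
lam2 = -nu1 * (t - T) + nu2      # co-state lambda2 along the horizon
u = -np.sign(lam2)               # minimum-time control: u* = -sign(lambda2)
print(np.column_stack((t, u)))   # u switches from -1 to +1 where lam2 crosses zero (around t = 3)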
\[ V(x(t), t) = J_2(t) \]
Definition 1 The function V(x, t) is called the value function or the cost to go. It represents the value of the solution of the optimal control problem starting at x at the time t, subject to the system dynamics
\[ \dot{x} = f(x(t), u(t)), \]
and with the boundary condition
\[ V(x, T) = \phi(x(t_f), t_f). \]
It follows that
\[ -\frac{dV}{dt} = \min_u L(x(t), u(t)) \]
\[ -\left( \frac{\partial V}{\partial t} + \frac{\partial V}{\partial x}\frac{dx}{dt} \right) = \min_u L(x(t), u(t)) \]
\[ -\frac{\partial V}{\partial t} = \min_u \left\{ \frac{\partial V}{\partial x} f(x(t), u(t)) + L(x(t), u(t)) \right\} \]
\[ -\frac{\partial V}{\partial t} = \min_u H\!\left(x, \frac{\partial V}{\partial x}, u\right). \]
Definition 2 This equation is known as the Hamilton-Jacobi equation.
This is a partial differential equation in V, with boundary condition $V(x, T) = \phi(x(t_f), t_f)$. Solving this equation gives the solution of the optimal control problem, with the control being a state feedback law given by
\[ u^\star = \arg\min_u H\!\left(x, \frac{\partial V}{\partial x}, u\right). \]
Note that the co-state $\lambda$ introduced in the previous section is then given by $\frac{\partial V}{\partial x}$.
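As a minimal sketch of how the value function can be computed numerically (the scalar system, cost and grid below are assumed for illustration, not taken from the notes), one can apply the dynamic programming recursion V(x, t) = min_u [ L(x, u) Δt + V(x + f(x, u) Δt, t + Δt) ] backward in time on a grid:

import numpy as np

# Assumed illustrative problem: xdot = u, running cost L = x^2 + u^2,
# zero terminal cost, horizon [0, T]; V is computed on a state grid.
T, dt = 1.0, 0.01
xs = np.linspace(-2.0, 2.0, 201)       # state grid on [-a, a] with spacing Delta
us = np.linspace(-3.0, 3.0, 61)        # control grid

V = np.zeros_like(xs)                  # terminal condition V(x, T) = 0
for _ in range(int(T / dt)):           # march backwards from T to 0
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        # Bellman recursion over the control grid, with linear interpolation of V.
        x_next = np.clip(x + us * dt, xs[0], xs[-1])
        cost = (x**2 + us**2) * dt + np.interp(x_next, xs, V)
        V_new[i] = cost.min()
    V = V_new

# Value at t = 0 for two sample states.
print(V[np.argmin(np.abs(xs))], V[np.argmin(np.abs(xs - 1.0))])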
When the Hamilton-Jacobi-Bellman equation is to be solved in practical cases, numerical methods seem appropriate. However, the solution of the Hamilton-Jacobi equation by numerical methods is only tractable for low-dimensional systems. For higher-dimensional systems, the number of data points required grows exponentially with the dimension of the system (due to the space and time discretization). For example, for a nonlinear system of dimension n where the value function is to be determined on a hyper-cube $[-a, a]^n$ with a spatial discretization $\Delta$, a total of $\left(\frac{2a}{\Delta}\right)^n$ points will have to be stored. For even modest requirements the amount of data becomes very large and the problem becomes intractable.
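For instance (illustrative numbers only), with a = 1 and Δ = 0.01 the point count (2a/Δ)^n grows as follows:

# Illustrative only: number of grid points (2a/Delta)^n for a = 1, Delta = 0.01.
a, delta = 1.0, 0.01
for n in range(1, 7):
    points = (2 * a / delta) ** n
    print(n, f"{points:.0e}")   # 2e+02, 4e+04, 8e+06, ... exponential growth in n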