Deterministic Continuous Time Optimal Control and The Hamilton-Jacobi-Bellman Equation
In the next three lectures we will look at continuous-time deterministic optimal control. We
begin by deriving the continuous-time analog of the DPA, known as the Hamilton-Jacobi-
Bellman equation.
Dynamics
Consider the continuous-time system

ẋ(t) = f (x(t), u(t)), 0 ≤ t ≤ T, (9.1)
u(t) = µ(t, x(t)), (9.2)

where
• time t ∈ R≥0 and T is the terminal time;
• x(t) ∈ S is the state, with S the state space;
• u(t) ∈ U is the control input, with U the control constraint set;
• f (·, ·) describes the system dynamics;
• µ(·, ·) ∈ Π is an admissible feedback control law.
Cost
We consider the following scalar-valued cost function:
h(x(T )) + ∫_0^T g(x(τ ), u(τ )) dτ, (9.3)

where h(·) is the terminal cost and g(·, ·) is the stage cost.
For an initial time t and state x ∈ S, the closed loop cost associated with feedback control law
µ(·, ·) ∈ Π is
Jµ(t, x) := h(x(T )) + ∫_t^T g(x(τ ), u(τ )) dτ subject to (9.1), (9.2). (9.4)
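For concreteness, the following minimal Python sketch (an illustration added here, not part of the original notes) approximates the closed-loop cost (9.4) by Euler integration of the closed-loop dynamics; the dynamics f, costs g and h, and policy mu used in the demo are hypothetical placeholders.

def closed_loop_cost(f, g, h, mu, t0, x0, T, N=1000):
    """Approximate J_mu(t0, x0) = h(x(T)) + integral of g(x, u) from t0 to T
    by Euler integration of the closed-loop dynamics xdot = f(x, mu(t, x))."""
    dt = (T - t0) / N
    t, x, cost = t0, x0, 0.0
    for _ in range(N):
        u = mu(t, x)               # feedback control law
        cost += g(x, u) * dt       # accumulate stage cost
        x += f(x, u) * dt          # Euler step of the dynamics
        t += dt
    return cost + h(x)             # add terminal cost

# Hypothetical test problem (not from the notes): scalar integrator,
# quadratic stage and terminal costs, proportional feedback.
if __name__ == "__main__":
    f = lambda x, u: u
    g = lambda x, u: x**2 + u**2
    h = lambda x: x**2
    mu = lambda t, x: -x
    print(closed_loop_cost(f, g, h, mu, t0=0.0, x0=1.0, T=1.0))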
Objective
Construct an optimal feedback control law µ∗ ∈ Π such that

Jµ∗(t, x) ≤ Jµ(t, x) ∀µ ∈ Π, ∀t ∈ [0, T ], ∀x ∈ S.
The associated closed loop cost J∗(t, x) := Jµ∗(t, x) is called the cost-to-go at state x and time
t, and J∗(·, ·) is the cost-to-go function or value function.
Assumption 9.1. For any admissible control law µ, initial time t ∈ [0, T ] and initial condition
x(t) ∈ S, there exists a unique state trajectory x(τ ) that satisfies

dx(τ )/dτ = f (x(τ ), µ(τ, x(τ ))), t ≤ τ ≤ T.
Assumption 9.1 is required for the problem to be well defined. Ensuring that it is satisfied for
a particular problem requires tools from the theory of differential equations, and is beyond the
scope of this class.
Example 1: Existence
ẋ(t) = x(t)², x(0) = 1

Solution:

x(t) = 1/(1 − t)
⇒ finite escape time: x(t) → ∞ as t → 1.
⇒ solution does not exist for T ≥ 1.
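As a quick numerical illustration (added here, not from the notes), the following Python snippet checks that x(t) = 1/(1 − t) satisfies ẋ = x² with x(0) = 1 and tabulates the blow-up as t → 1.

import numpy as np

# Exact solution of xdot = x^2, x(0) = 1 is x(t) = 1/(1 - t).
x = lambda t: 1.0 / (1.0 - t)
xdot = lambda t: 1.0 / (1.0 - t) ** 2   # derivative of x(t)

# Check that the ODE is satisfied: xdot(t) - x(t)^2 == 0 (up to round-off).
ts = np.array([0.0, 0.5, 0.9, 0.99, 0.999])
print(np.max(np.abs(xdot(ts) - x(ts) ** 2)))

# Blow-up as t -> 1: the solution escapes to infinity in finite time.
for t in ts:
    print(f"t = {t:5.3f}   x(t) = {x(t):.1f}")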
Example 2: Uniqueness
ẋ(t) = x(t)^{1/3}, x(0) = 0

Solution:

x(t) = 0 ∀t

or x(t) = 0 for 0 ≤ t ≤ τ, x(t) = (2(t − τ )/3)^{3/2} for t > τ (for any τ ≥ 0)
⇒ infinite number of solutions
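A short numerical check (an added illustration, not from the notes) confirms that both candidate trajectories satisfy ẋ = x^{1/3}, so the initial value problem indeed has more than one solution; the switching time τ = 0.5 below is an arbitrary choice.

import numpy as np

tau = 0.5                                   # arbitrary switching time
t = np.linspace(0.0, 2.0, 2001)
s = np.clip(t - tau, 0.0, None)             # (t - tau), clipped at 0

# Candidate 1: x(t) = 0 for all t, with derivative 0.
x1, x1dot = np.zeros_like(t), np.zeros_like(t)

# Candidate 2: x(t) = 0 for t <= tau, (2(t - tau)/3)^(3/2) for t > tau.
x2 = (2.0 * s / 3.0) ** 1.5
x2dot = (2.0 * s / 3.0) ** 0.5              # closed-form derivative

# Both residuals are zero up to round-off, so both satisfy xdot = x^(1/3):
# the initial value problem has infinitely many solutions (one per tau).
print(np.max(np.abs(x1dot - x1 ** (1.0 / 3.0))))
print(np.max(np.abs(x2dot - x2 ** (1.0 / 3.0))))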
The Hamilton-Jacobi-Bellman Equation
Let us first divide the time horizon [0, T ] into N pieces, and define δ := T /N . Furthermore, define
xk := x(kδ), uk := u(kδ) for k = 0, 1, ..., N , and approximate the differential equation
ẋ(kδ) = f (x(kδ), u(kδ)) by
(xk+1 − xk)/δ = f (xk , uk ), k = 0, 1, ..., N − 1,
which leads to the following DPA recursion for the discretized problem:

JN (x) = h(x), ∀x ∈ S
Jk (x) = min_{u∈U} [ g(x, u)δ + Jk+1 (x + f (x, u)δ) ], ∀x ∈ S, k = 0, 1, ..., N − 1.

Identifying Jk (x) with J∗(kδ, x), setting t = kδ, and Taylor-expanding J∗(t + δ, x + f (x, u)δ) about (t, x) gives

J∗(t, x) = min_{u∈U} [ g(x, u)δ + J∗(t, x) + ∂J∗(t, x)/∂t · δ + ∂J∗(t, x)/∂x · f (x, u)δ + o(δ) ]
⇔ 0 = min_{u∈U} [ g(x, u)δ + ∂J∗(t, x)/∂t · δ + ∂J∗(t, x)/∂x · f (x, u)δ + o(δ) ]
Dividing by δ,

0 = min_{u∈U} [ g(x, u) + ∂J∗(t, x)/∂t + ∂J∗(t, x)/∂x · f (x, u) + o(δ)/δ ].
Taking the limit of the above as N → ∞, or equivalently as δ → 0, and assuming we can swap
the limit and minimization operations, results in:
0 = min_{u∈U} [ g(x, u) + ∂J∗(t, x)/∂t + ∂J∗(t, x)/∂x · f (x, u) ] ∀t ∈ [0, T ], ∀x ∈ S
subject to the terminal condition J∗(T, x) = h(x) for all x ∈ S.
The above equation is called the Hamilton-Jacobi-Bellman (HJB) Equation. Note that in the
above informal derivation we rely on J∗(·, ·) being smooth in x and t.
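The discretization used in this derivation also suggests a simple numerical scheme: grid the state space, run the backward recursion Jk(x) = min_u [ g(x, u)δ + Jk+1(x + f(x, u)δ) ], and interpolate Jk+1 between grid points. The sketch below is a minimal illustration of that idea for a scalar system; the test problem (f(x, u) = −x + u, g = x² + u², h = 0) and all grids are arbitrary choices, not taken from the notes.

import numpy as np

def hjb_grid_solver(f, g, h, x_grid, u_grid, T, N):
    """Backward recursion J_k(x) = min_u [ g(x,u)*delta + J_{k+1}(x + f(x,u)*delta) ]
    on a scalar state grid, with linear interpolation of J_{k+1}."""
    delta = T / N
    J = h(x_grid)                                    # J_N(x) = h(x)
    for _ in range(N):
        Q = np.empty((len(u_grid), len(x_grid)))
        for i, u in enumerate(u_grid):
            x_next = x_grid + f(x_grid, u) * delta   # Euler step at each grid point
            x_next = np.clip(x_next, x_grid[0], x_grid[-1])
            Q[i] = g(x_grid, u) * delta + np.interp(x_next, x_grid, J)
        J = Q.min(axis=0)                            # minimize over the control grid
    return J                                         # approximation of J*(0, x) on the grid

# Arbitrary test problem (hypothetical, for illustration only).
if __name__ == "__main__":
    f = lambda x, u: -x + u
    g = lambda x, u: x**2 + u**2
    h = lambda x: np.zeros_like(x)
    x_grid = np.linspace(-2.0, 2.0, 201)
    u_grid = np.linspace(-1.0, 1.0, 41)
    J0 = hjb_grid_solver(f, g, h, x_grid, u_grid, T=1.0, N=100)
    print(J0[np.argmin(np.abs(x_grid - 1.0))])       # approx. cost-to-go near x = 1, t = 0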
Example 3:
In this example, we will hand-craft an optimal policy and show that the resulting cost-to-go
satisfies the HJB. Consider

ẋ(t) = u(t), u(t) ∈ [−1, 1], 0 ≤ t ≤ 1,

with cost (1/2) x(1)², i.e. g(x, u) = 0 and h(x) = x²/2.
Since we only care about the square of the terminal state, we can construct a candidate optimal
policy that drives the state towards 0 as quickly as possible and maintains it at 0 once it is
at 0. The corresponding control policy is
u(t) = µ(t, x) = { −1 if x > 0;  0 if x = 0;  1 if x < 0 } = −sgn(x).
For a given initial time t and initial state x, the cost Jµ (t, x) associated with the above policy is
Jµ(t, x) = (1/2) (max{0, |x| − (1 − t)})².
We will verify that this cost function satisfies the HJB and is therefore indeed the cost-to-go
function.
For fixed t:
Figure 9.1: Cost function with fixed t and its partial derivative with respect to x
∂Jµ(t, x)/∂x = sgn(x) max{0, |x| − (1 − t)}
For fixed x:
Figure 9.2: Cost function with fixed x and its partial derivative with respect to t
∂Jµ(t, x)/∂t = max{0, |x| − (1 − t)}

Substituting these partial derivatives into the HJB (with g = 0 and f (x, u) = u) gives

min_{u∈[−1,1]} [ ∂Jµ(t, x)/∂t + ∂Jµ(t, x)/∂x · u ] = max{0, |x| − (1 − t)} − max{0, |x| − (1 − t)} = 0,

with the minimum attained at u = −sgn(x). Moreover, Jµ(1, x) = x²/2 = h(x), so the terminal condition
holds. Hence Jµ satisfies the HJB and is indeed the cost-to-go function J∗.
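As a quick numerical cross-check (not part of the original notes, and assuming, as in the reconstruction above, dynamics ẋ = u with |u| ≤ 1 and terminal cost x(1)²/2), the sketch below evaluates the HJB residual min_u [ ∂Jµ/∂t + ∂Jµ/∂x · u ] for the closed-form Jµ using finite differences at a few sample points; the residuals are close to zero, consistent with the calculation just carried out.

import numpy as np

# Closed-form candidate cost-to-go for Example 3 (xdot = u, |u| <= 1, cost x(1)^2 / 2).
J = lambda t, x: 0.5 * np.maximum(0.0, np.abs(x) - (1.0 - t)) ** 2

def hjb_residual(t, x, eps=1e-6):
    """min over u in [-1, 1] of dJ/dt + dJ/dx * u, via central finite differences."""
    dJdt = (J(t + eps, x) - J(t - eps, x)) / (2 * eps)
    dJdx = (J(t, x + eps) - J(t, x - eps)) / (2 * eps)
    # For f(x, u) = u and g = 0, the minimizing u is -sign(dJ/dx),
    # so the minimum equals dJ/dt - |dJ/dx|.
    return dJdt - np.abs(dJdx)

# Sample points (t, x); the residuals should all be ~0.
for t, x in [(0.0, 1.5), (0.2, -2.0), (0.5, 0.3), (0.9, 0.0)]:
    print(f"t = {t:.1f}, x = {x:+.1f}, HJB residual = {hjb_residual(t, x):+.2e}")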
Example 4:
Consider now the system

ẋ(t) = u(t)x(t), u(t) ∈ [−1, 1], 0 ≤ t ≤ 1,

with cost x(1), i.e. g(x, u) = 0 and h(x) = x. The optimal policy is again µ(t, x) = −sgn(x): it drives
a positive state towards 0 and makes a negative state as negative as possible, giving the cost-to-go
x e^{−(1−t)} for x ≥ 0 and x e^{1−t} for x < 0.

Figure 9.3: Cost-to-go Jµ(1/2, x) as a function of x, with branches x e^{−1/2} for x ≥ 0 and x e^{1/2} for x < 0
However, it is clear that the associated cost-to-go function is not differentiable with respect to x
at x = 0 (see Figure 9.3), and it therefore does not satisfy the HJB. This illustrates the fact that the HJB
is in general not a necessary condition for optimality, but it is sufficient, as we will see next. (One can
address this shortcoming by introducing generalized solutions to the HJB partial differential equation,
such as viscosity solutions.)
Theorem 9.1. Suppose V (t, x) is a solution to the HJB equation, that is, V is continuously
differentiable in t and x, and is such that
min_{u∈U} [ g(x, u) + ∂V (t, x)/∂t + ∂V (t, x)/∂x · f (x, u) ] = 0 ∀x ∈ S, 0 ≤ t ≤ T (9.7)

subject to V (T, x) = h(x) ∀x ∈ S.
Suppose also that µ(t, x) attains the minimum in (9.7) for all t and x. Then, under Assumption 9.1,
V (t, x) is equal to the cost-to-go function, i.e.

V (t, x) = J∗(t, x), ∀x ∈ S, 0 ≤ t ≤ T.
Proof. For any initial time t ∈ [0, T ] and any initial condition x(t) = x, x ∈ S, let û(τ ) ∈ U
for all τ ∈ [t, T ] be any admissible control trajectory, and let x̂(τ ) be the corresponding state
trajectory, i.e. the unique solution to the ODE dx̂(τ )/dτ = f (x̂(τ ), û(τ )) with x̂(t) = x. From (9.7) we
have, for all τ ∈ [t, T ],
0 ≤ g(x̂(τ ), û(τ )) + ∂V (τ, x)/∂τ |_{x=x̂(τ )} + ∂V (τ, x)/∂x |_{x=x̂(τ )} · f (x̂(τ ), û(τ ))

0 ≤ g(x̂(τ ), û(τ )) + d/dτ (V (τ, x̂(τ ))),
where d/dτ (·) denotes the total derivative with respect to τ . Integrating the above inequality
over τ ∈ [t, T ] yields
0 ≤ ∫_t^T g(x̂(τ ), û(τ )) dτ + V (T, x̂(T )) − V (t, x),

and, using the terminal condition V (T, x̂(T )) = h(x̂(T )),

V (t, x) ≤ h(x̂(T )) + ∫_t^T g(x̂(τ ), û(τ )) dτ.
The preceding inequalities become equalities for the minimizing µ(τ, x(τ )) of (9.7):
V (t, x) = h(x(T )) + ∫_t^T g(x(τ ), µ(τ, x(τ ))) dτ
where x(τ ) is the unique solution to the ODE ẋ(τ ) = f (x(τ ), µ(τ, x(τ ))) with x(t) = x. Thus
V (t, x) is the cost-to-go at state x at time t, and µ(τ, x(τ )) is an optimal control trajectory. We
thus have