
Introduction to Optimal Control Theory and Hamilton-Jacobi Equations

Seung Yeal Ha
Department of Mathematical Sciences
Seoul National University

A priori message from SYHA

"The main purpose of this series of lectures is to make you understand the ABC of OCT and to help you prepare for an advanced course on stochastic optimal control theory that you might take in the future."

Lecture Schedule

• Lecture 1: ABC of Optimal Control Theory

• Lecture 2: PMP vs. Bellman's dynamic programming

• Lecture 3: Hamilton-Jacobi equations (classical theory)

• Lecture 4: Hamilton-Jacobi equations (modern theory)

References

• An Introduction to Mathematical Optimal Control Theory by Lawrence C. Evans. Available at www.math.berkeley.edu/~evans.

• Optimal Control Theory: An Introduction by Donald E. Kirk.

• Introduction to the Mathematical Theory of Control by Alberto Bressan and Benedetto Piccoli.

What do you mean by control of a system?

Control of a system has a double meaning:

• (Weak sense): checking or testing whether the system's behavior is satisfactory.

• (Strong sense): putting things in order to guarantee that the system behaves as desired.

Maxims

• "Since the building of the universe is perfect and is created by a most wise Creator, nothing arises in the universe in which one cannot see the sense of some maximum or minimum." (Leonhard Euler)

• "The words control theory are, of course, of recent origin, but the subject itself is much older, since it contains the classical calculus of variations as a special case, and the first calculus of variations problems go back to classical Greece." (Hector J. Sussmann)

Lecture 1: ABC of Optimal Control Theory

Goal of OCT

The objective of OCT is to determine the control signals that will cause a process to satisfy the physical constraints and, at the same time, minimize (or maximize) some performance criterion.

Minimize cost; maximize payoff or utility.

"A problem well put is a problem half solved."

Key words of Lecture 1

• Controlled system, state, control, performance measure

• Controllability, reachable set, linear time-invariant system

• Bang-bang control (principle), Kalman's rank theorem, observability

Three giants of modern control theory

• Richard Bellman (August 26, 1920 – March 19, 1984)

Bellman's dynamic programming (1953)

• Lev Pontryagin (3 September 1908 – 3 May 1988)

Pontryagin's maximum principle (PMP) (1956)

• Rudolf E. Kalman (19 May 1930 – 2 July 2016)

Kalman filter, Kalman's rank theorem (1960)

Controlled dynamical systems

• (Dynamical system)

ẋ = f(x, t), x = (x1, ..., xn), x = x(t) ∈ R^n.

x: state (variable)

• (Controlled dynamical system)

ẋ = f(x, u, t), u = (u1, ..., um), u = u(t) ∈ R^m.

u: control (parameter)

• Example 1: Merton's optimal investment and consumption

◦ Step A: (Modeling of the problem)

Consider an individual whose wealth today is W0 and who will live exactly T years. His task is to plan the rate of consumption of wealth C(s) for 0 ≤ s ≤ T. All wealth not yet consumed earns interest at a fixed rate r. For simplicity, we assign no utility to final-time wealth (a bequest).

Let W(t) be the amount of wealth at time t. Then

Ẇ = rW − C, 0 < t ≤ T,    W(0) = W0.

◦ Step B: (Identification of physical constraints)

W(t) ≥ 0, W(T) = 0, C(t) ≥ 0.

◦ Step C: (Performance measure)

P[C] = ∫_0^T e^{−ρs} U(C(s)) ds,    max P[C],

where ρ is a discounting rate and U is the utility function of consumption.

• Reformulation as a calculus of variations problem: since C = rW − Ẇ,

max_{W(·)} ∫_0^T e^{−ρs} U(rW − Ẇ) ds, subject to W(0) = W0.
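
Below is a minimal numerical sketch of this example, not Merton's optimal policy: it assumes log utility U(C) = log C and a hypothetical proportional consumption rule C(t) = cW(t), ignores the terminal constraint W(T) = 0 for simplicity, and simply evaluates the discounted-utility payoff for a few values of c. All parameter values are illustrative.

# A minimal sketch (illustrative, not Merton's optimal policy). Assumptions:
# log utility U(C) = log C, a hypothetical proportional rule C(t) = c W(t),
# and the terminal constraint W(T) = 0 is ignored for simplicity.
import numpy as np

def payoff(c, W0=1.0, r=0.05, rho=0.1, T=10.0, n=10_000):
    dt = T / n
    W, P = W0, 0.0
    for k in range(n):
        t = k * dt
        C = c * W                       # consumption rule C(t) = c W(t)
        P += np.exp(-rho * t) * np.log(max(C, 1e-12)) * dt   # discounted utility
        W += (r * W - C) * dt           # Euler step for W' = rW - C
    return P

for c in (0.05, 0.1, 0.2, 0.4):         # compare a few consumption fractions
    print(f"c = {c:.2f}: P[C] = {payoff(c):+.4f}")
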
• Example 2 (automobile problem): minimum-time optimal control

◦ Step A: (Modeling of the problem)

The car is to be driven in a straight line from the point 0 to the point e. The distance of the car from 0 at time t is d(t). For simplicity, we model the car as a unit point mass that can be accelerated by using the throttle or decelerated by using the brake.

We set

d̈(t) = α(t) + β(t),

where α and β stand for the throttle acceleration and the braking deceleration, respectively.

Again we set

x1 := d, x2 := ḋ, u1 := α, u2 := β.

Then our controlled dynamical system is given by

ẋ(t) = [0 1; 0 0] x(t) + [0 0; 1 1] u(t),

with the two-point boundary conditions

x(t0) = (0, 0)^t, x(tf) = (e, 0)^t.

◦ Step B: (Identification of physical constraints)

State constraints:
0 ≤ x1 ≤ e, 0 ≤ x2.

Control constraints:
0 ≤ u1 ≤ M1, −M2 ≤ u2 ≤ 0.

Assume that the amount of gasoline at time t = t0 is G gallons and that the rate of gas consumption is proportional to both acceleration and speed; then the amount of gasoline used must satisfy

∫_{t0}^{tf} (k1 u1(t) + k2 x2(t)) dt ≤ G.

◦ Step C: (Performance measure)

Minimum-time control:

minimize J := tf − t0.

In general, the performance measure takes the form

J = g(x(tf), tf) + ∫_{t0}^{tf} r(x(t), u(t), t) dt,

g ≡ 0: Lagrange problem; r ≡ 0: Mayer problem. For instance, minimum time is the Lagrange problem with r ≡ 1, since tf − t0 = ∫_{t0}^{tf} 1 dt.

Admissible controls: controls satisfying the physical constraints; a trajectory generated by an admissible control is an admissible trajectory.

Optimal control problem

The optimal control problem is to find an admissible control u* which causes the system ẋ = f(x, u, t) to follow an admissible trajectory x* that maximizes the performance measure (payoff)

P = g(x(tf), tf) + ∫_{t0}^{tf} r(x(t), u(t), t) dt,

where g(x(tf), tf) is the terminal payoff and the integral is the running payoff.

In this lecture, Uad denotes the set of all admissible controls:

Uad := {u : R+ → U : u = u(·) is measurable and satisfies the constraints},

and U = [−1, 1]^m is symmetric and convex.

• Basic problem

To find a control u* = u*(t) ∈ Uad which maximizes the payoff:

P[u*] ≥ P[u], u ∈ Uad.


◦ Main questions

1. Does an optimal control exist? (existence of an optimal control)

2. How can we characterize an optimal control mathematically? (characterization of an optimal control)

3. How can we construct an optimal control? (realization of an optimal control)

Two examples

1. Control of production and consumption

x(t) := amount of output produced at time t ≥ 0.

◦ Assumptions:

• We consume some fraction of the output at each time.

• We reinvest the remaining fraction.

Let u = u(t) be the fraction of output reinvested at time t ≥ 0, 0 ≤ u ≤ 1. In this case

ẋ = ux, 0 < t ≤ T,    x(0) = x0,

where
x ≥ 0, 0 ≤ u ≤ 1, U = [0, 1],

and the payoff is

P[u] = ∫_0^T (1 − u(t)) x(t) dt.
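
For this example the optimal control is known to be bang-bang (reinvest everything, then consume everything). The sketch below evaluates, under illustrative values, the payoff of the one-switch policy u = 1 on [0, s], u = 0 on (s, T], for which P reduces to the closed form x0 e^s (T − s), maximized at s = T − 1.

# A hedged sketch with illustrative values: under the one-switch policy
# u = 1 on [0, s], u = 0 on (s, T], the dynamics x' = ux give x(t) = x0 e^t
# up to time s and x constant afterwards, so P[u] = x0 e^s (T - s).
import numpy as np

def payoff(s, T=10.0, x0=1.0):
    return x0 * np.exp(s) * (T - s)     # closed form for the switching policy

ss = np.linspace(0.0, 10.0, 1001)
best = ss[np.argmax(payoff(ss))]
print(f"best switching time ~ {best:.2f} (calculus gives s = T - 1 = 9)")
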
2. A pendulum problem

θ̈ + λθ̇ + ω² sin θ = u, 0 < t ≤ T,    θ(0) = θ0, θ̇(0) = ω0.

We use the approximation

sin θ ≈ θ, |θ| ≪ 1,

to get the linear approximate equation

θ̈ + λθ̇ + ω² θ = u, 0 < t ≤ T,    θ(0) = θ0, θ̇(0) = ω0.

So the main question is to determine the control u so that (θ, θ̇) approaches (0, 0) as soon as possible (a minimum-time control problem).

We set
x1 = θ, x2 = θ̇;

then we have

d/dt (x1, x2)^t = [0 1; −ω² −λ] (x1, x2)^t + (0, u)^t.

We now set

τ = τ[u(·)] : the first time such that (x1, x2) = (0, 0).

We also define

P[u] = −∫_0^τ 1 dt = −τ.
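
As a small illustration (not the minimum-time control), one can simulate the linearized pendulum under an ad hoc linear feedback u = −Kx and watch (θ, θ̇) approach (0, 0); the gains and parameters below are assumptions for the sketch.

# A small sketch (assumed gains and parameters, not the minimum-time control):
# simulate the linearized pendulum x' = Ax + Bu under an ad hoc linear
# feedback u = -Kx and check that (theta, theta') approaches (0, 0).
import numpy as np
from scipy.integrate import solve_ivp

omega, lam = 1.0, 0.1                   # illustrative frequency and damping
A = np.array([[0.0, 1.0], [-omega**2, -lam]])
B = np.array([0.0, 1.0])
K = np.array([1.0, 2.0])                # ad hoc stabilizing gains (assumption)

def rhs(t, x):
    u = -K @ x                          # linear feedback control
    return A @ x + B * u

sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0])
print("state at t = 20:", sol.y[:, -1])   # close to (0, 0)
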
Controllability

In order to be able to do whatever we want with the given dynamical system under a control input, the system must be controllable.

Consider the controlled dynamics

ẋ = f(x, u), t0 < t < ∞,    x(t0) = x0.

We denote the solution of the above controlled system by x(t; t0, x0).

• A natural a priori question before optimal control

For fixed initial and target states x0, xf ∈ R^n, can we find a control u ∈ Uad and tf < ∞ such that

x(tf; t0, x0) = xf ?

• Controllability question

Given the initial point x0 ∈ R^n and a set S ⊂ R^n, does there exist a control u steering the system to the set S in finite time?

For the case S = {xf}, the controllability question asks: do there exist T < ∞ and u ∈ Uad such that

ẋ = f(x, u), 0 < t < ∞,    x(0) = x0, x(T) = xf ?

Controllability for a linear system

From now on, consider a linear-control system and S = {0}, i.e., for M ∈ M^{n×n}, N ∈ M^{n×m},

ẋ = Mx + Nu,    x(0) = x0.    (1)

Definition:

1. The linear-control system (1) is completely controllable ⇐⇒ for any x0, xf ∈ R^n, there exists a control u : [0, tf] → R^m such that x(tf) = xf.

2. Reachable set:

C(t) := {x0 : ∃ u ∈ Uad such that x(t; u, x0) = 0},

C := ∪_{t≥0} C(t) : the reachable set.

• Simple observation

0 ∈ C(t); and x0 ∈ C(t), t̂ > t =⇒ x0 ∈ C(t̂).

Looking for necessary and sufficient conditions for controllability.

Snapshot of ODE theory

Consider a homogeneous linear ODE with a constant matrix M ∈ M^{n×n}:

ẋ = Mx, t > 0,    x(0) = x0.

Then we have
x(t) = Φ(t) x0,

where Φ is the fundamental matrix defined by

Φ(t) = e^{tM} := Σ_{k=0}^∞ t^k M^k / k!.

Consider an inhomogeneous linear system:

ẋ = Mx + f(t), t > 0,    x(0) = x0.

Then the solution x is given by the variation of parameters formula (Duhamel's formula)

x(t) = Φ(t) x0 + ∫_0^t Φ(t − s) f(s) ds,

where Φ(t − s) = Φ(t) Φ^{−1}(s).
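
As a quick numerical sanity check (with a made-up M, f, and x0), one can compute Φ(t) = e^{tM} with scipy's expm and compare Duhamel's formula against a direct ODE solve:

# A quick numerical check with made-up M, f, and x0: compute Phi(t) = e^{tM}
# via scipy's expm and compare Duhamel's formula
#     x(t) = Phi(t) x0 + int_0^t Phi(t - s) f(s) ds
# against a direct ODE solve.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, trapezoid

M = np.array([[0.0, 1.0], [-1.0, 0.0]])
f = lambda t: np.array([0.0, np.cos(t)])
x0 = np.array([1.0, 0.0])
t = 2.0

direct = solve_ivp(lambda s, x: M @ x + f(s), (0.0, t), x0,
                   rtol=1e-10, atol=1e-12).y[:, -1]

ss = np.linspace(0.0, t, 2001)
integrand = np.array([expm((t - s) * M) @ f(s) for s in ss])
duhamel = expm(t * M) @ x0 + trapezoid(integrand, ss, axis=0)

print(np.max(np.abs(direct - duhamel)))   # agreement up to quadrature error
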
We now return to

ẋ = Mx + Nu, t > 0,    x(0) = x0.

Then by Duhamel's formula, we have

x(t) = Φ(t) x0 + Φ(t) ∫_0^t Φ^{−1}(s) N u(s) ds.

Note that

x0 ∈ C(t) ⇐⇒ x(t) = 0 ⇐⇒ x0 = −∫_0^t Φ^{−1}(s) N u(s) ds, for some u ∈ Uad.

• Theorem (Geometry of the reachable set)

The reachable set C is symmetric and convex, i.e.,

(i) x0 ∈ C =⇒ −x0 ∈ C;
(ii) x0, x̂0 ∈ C, λ ∈ [0, 1] =⇒ λ x0 + (1 − λ) x̂0 ∈ C.

Kalman's rank theorem (1960)

Consider

ẋ = Mx + Nu, M ∈ M^{n×n}, N ∈ M^{n×m}.    (2)

We define the controllability matrix

G(M, N) := [N | MN | ··· | M^{n−1} N].

• Definition

The linear control system (2) is controllable ⇐⇒ C = R^n.

• Theorem

rank G(M, N) = n ⇐⇒ 0 ∈ C°, the interior of C.

Proof. • (⇐=) Suppose that 0 ∈ C°. Note that

rank G(M, N) ≤ n.

If rank G(M, N) < n, then there exists b ≠ 0 such that

b^t G(M, N) = 0.

This yields

b^t N = b^t M N = ··· = b^t M^{n−1} N = O.

By the Cayley-Hamilton theorem, we also have

b^t M^k N = O, k ≥ 0, and hence b^t Φ^{−1}(t) N = O, since Φ^{−1}(t) = e^{−tM} is a power series in M.

We now claim that b is perpendicular to C(t), so that C° = ∅, contradicting 0 ∈ C°.

Indeed, if x0 ∈ C(t), then

x0 = −∫_0^t Φ^{−1}(s) N u(s) ds, u ∈ Uad.

Therefore,

b^t x0 = −∫_0^t b^t Φ^{−1}(s) N u(s) ds = 0.
0
• (=⇒) Suppose that 0 ∉ C°, i.e.,

0 ∈ (C°)^c ⊂ ∩_{t≥0} (C°(t))^c.

Then 0 ∉ C°(t) for all t ≥ 0; since 0 ∈ C(t), this gives

0 ∈ ∂C(t).

Since C(t) is convex, there exists a supporting hyperplane: b ≠ 0 such that

b^t x0 ≤ 0, x0 ∈ C(t).

For x0 ∈ C(t),

x0 = −∫_0^t Φ^{−1}(s) N u(s) ds, u ∈ Uad.

Thus

b^t x0 = −∫_0^t b^t Φ^{−1}(s) N u(s) ds ≤ 0.

Since Uad is symmetric (u ∈ Uad =⇒ −u ∈ Uad), the integral takes both signs, and this yields

b^t Φ^{−1}(s) N = 0.

By repeatedly differentiating the above relation at s = 0, we have

b^t N = b^t M N = ··· = b^t M^{n−1} N = O, i.e., b^t G(M, N) = 0.

Hence rank G(M, N) < n. □
• Theorem Let λ denote an eigenvalue of M.

rank G(M, N) = n and Re(λ) ≤ 0 for every eigenvalue λ =⇒ the system (2) is controllable.

• Theorem (Kalman, 1960)

The system (2) is controllable ⇐⇒ rank G(M, N) = n.

This rank indicates how many components of the system are sensitive to the action of the control.

• Examples: n = 2, m = 1, U = [−1, 1]. (A sketch checking Kalman's rank condition for all three systems follows the list.)

1.
ẋ1 = 0, ẋ2 = u(t), t > 0.

2.
ẋ1 = x2, ẋ2 = u(t), t > 0.

3.
ẋ1 = u(t), ẋ2 = u(t), t > 0.
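
As a quick check, here is a small Python sketch (illustrative, using numpy) that forms G(M, N) = [N | MN] for each of the three systems and computes its rank; only example 2 satisfies rank G = n = 2.

# A sketch checking Kalman's rank condition for the three examples above:
# G(M, N) = [N | MN] for n = 2; rank G = 2 means controllable.
import numpy as np

def kalman_rank(M, N):
    n = M.shape[0]
    G = np.hstack([np.linalg.matrix_power(M, k) @ N for k in range(n)])
    return np.linalg.matrix_rank(G)

examples = {
    "1: x1' = 0,  x2' = u": (np.zeros((2, 2)), np.array([[0.0], [1.0]])),
    "2: x1' = x2, x2' = u": (np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]])),
    "3: x1' = u,  x2' = u": (np.zeros((2, 2)), np.array([[1.0], [1.0]])),
}
for name, (M, N) in examples.items():
    print(name, "-> rank G =", kalman_rank(M, N))   # 1, 2, 1: only example 2 is controllable
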
Observability

In order to see what is going on inside the system under observation, the system must be observable.

• Consider an uncontrolled system:

ẋ = Mx, t > 0, M ∈ M^{n×n}, x ∈ R^n,    x(0) = x0 : unknown.    (3)

Note that
x(t) = e^{tM} x0.    (4)

Once we know x0, then we know everything!

Suppose that we have observed data y:

y(t) = N x(t), N ∈ M^{m×n}, m ≪ n.

• Observability question

Given the observation y = y(·), which is low-dimensional, can we in principle reconstruct the high-dimensional true dynamics x = x(·)?

Thus the problem is how to recover x0 ∈ R^n from the observed data y.

Note that

y(0) = N x(0) = N x0,
ẏ(0) = N ẋ(0) = N M x0,
···
y^{(n−1)}(0) = N x^{(n−1)}(0) = N M^{n−1} x0.

This yields

[ y(0); ẏ(0); ···; y^{(n−1)}(0) ] = [ N; NM; ···; N M^{n−1} ] x0,

where [ N; NM; ···; N M^{n−1} ] denotes the matrix whose block rows are N, NM, ..., N M^{n−1} (the observability matrix).

Thus, the system (3) with observation y = Nx is observable

⇐⇒ rank [ N; NM; ···; N M^{n−1} ] = n.
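
The reconstruction idea can be sketched numerically: stack y(0), ẏ(0), ..., y^{(n−1)}(0), check the rank condition, and solve for x0. The matrices M, N and the state x0 below are made-up test data, not from the lecture.

# A sketch of the reconstruction idea with made-up test data: stack the
# derivatives y(0), y'(0), ..., y^{(n-1)}(0) and solve for x0.
import numpy as np

M = np.array([[0.0, 1.0], [-2.0, -3.0]])
N = np.array([[1.0, 0.0]])              # observe only the first component
x0 = np.array([0.7, -1.3])              # "unknown" initial state

n = M.shape[0]
O = np.vstack([N @ np.linalg.matrix_power(M, k) for k in range(n)])  # observability matrix
ys = O @ x0                             # the observed derivatives y^{(k)}(0)

assert np.linalg.matrix_rank(O) == n    # rank condition: x0 is recoverable
print(np.linalg.solve(O, ys))           # recovers x0 = (0.7, -1.3)
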
• Definition

The system ẋ = Mx, t > 0, x(0) = x0, with observation y = Nx, is observable

⇐⇒ for any two solutions x1 and x2 such that N x1 = N x2 on [0, t], we have x1(0) = x2(0).

Examples: N = I (observable); N = 0 (not observable).

• Theorem (Observability is a dual concept of controllability)

ẋ = Mx, t > 0, with observation y = Nx, is observable

⇐⇒ ż = M^t z + N^t u is controllable.

La Salle's bang-bang control

Uad := {u : [0, ∞) → U = [−1, 1]^m : u is measurable}.

• Definition Let u ∈ Uad.

u = (u1, ..., um) is a bang-bang control ⇐⇒ |ui(t)| = 1, ∀ t > 0, i = 1, ..., m.

• Theorem (Bang-bang principle)

Consider ẋ = Mx + Nu, x(0) = x0. If x0 ∈ C(t), then there exists a bang-bang control u* = u*(·) such that

x0 = −∫_0^t Φ^{−1}(s) N u*(s) ds.
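
For a feel of what a bang-bang control does, here is an illustrative simulation (an assumed setup, not from the lecture): the double integrator ẋ1 = x2, ẋ2 = u with |u| ≤ 1 is steered from (d0, 0) to the origin by the classical one-switch minimum-time control u = −1 for t < √d0, then u = +1.

# An illustrative bang-bang control in action (assumed setup): steer the
# double integrator x1' = x2, x2' = u, |u| <= 1, from (d0, 0) to (0, 0)
# with a single switch at t1 = sqrt(d0) (minimum time 2 sqrt(d0)).
import numpy as np
from scipy.integrate import solve_ivp

d0 = 4.0
t1 = np.sqrt(d0)                        # switching time

def rhs(t, x):
    u = -1.0 if t < t1 else 1.0         # bang-bang: |u(t)| = 1 for all t
    return [x[1], u]

sol = solve_ivp(rhs, (0.0, 2 * t1), [d0, 0.0], max_step=1e-3)
print("final state:", sol.y[:, -1])     # close to (0, 0)
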
Preliminaries for the bang-bang principle

L∞ = L∞(0, t; R^m) = {u : [0, t] → R^m : ess sup_{0≤s≤t} |u(s)| < ∞},

||u||_{L∞} := ess sup_{0≤s≤t} |u(s)|, L∞ = (L1)*, so L∞ carries a weak-star topology as a dual space.

• Definition Let un, u ∈ L∞.

un → u in the weak-star topology

⇐⇒ ∫_0^t un(s) ϕ(s) ds → ∫_0^t u(s) ϕ(s) ds

for every test function ϕ ∈ L1, i.e., every ϕ with ∫_0^t |ϕ(s)| ds < ∞.
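
A concrete illustration of this convergence (an assumed example, not from the lecture): the oscillating bang-bang functions u_n(s) = sign(sin(2πns)) are bounded by 1 yet converge weak-star to 0, as the quadrature below suggests.

# A numerical illustration with an assumed test function: the oscillations
# u_n(s) = sign(sin(2 pi n s)) satisfy |u_n| = 1 yet converge weak-star to 0,
# since their integrals against any L^1 function phi tend to 0.
import numpy as np

s = np.linspace(0.0, 1.0, 200_000, endpoint=False)
ds = s[1] - s[0]
phi = np.exp(-s)                        # an arbitrary L^1 test function
for n in (1, 10, 100, 1000):
    un = np.sign(np.sin(2 * np.pi * n * s))
    print(n, np.sum(un * phi) * ds)     # -> 0 as n grows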

• Theorem (Banach-Alaoglu)

Any bounded, weak-star closed set in L∞ is weak-star compact; in particular, the closed unit ball is.

• Corollary

If un ∈ Uad := {u : [0, t] → [−1, 1]^m : u is measurable}, then there exists a subsequence {unk} of {un} such that

unk → u in the weak-star topology.

• Definition Let K ⊂ R^n and z ∈ K.

1. K is convex ⇐⇒ ∀ x, y ∈ K and 0 ≤ λ ≤ 1: λx + (1 − λ)y ∈ K.

2. z ∈ K is an extreme point ⇐⇒ there do not exist x, x̂ ∈ K with x ≠ x̂ and λ ∈ (0, 1) such that z = λx + (1 − λ)x̂.

• Theorem (Krein-Milman)

K ≠ ∅ convex and weak-star compact =⇒ K has at least one extreme point.

The proof of the bang-bang principle

Let x0 ∈ C(t). Then we set

K := {u ∈ Uad : u steers x0 to 0 at time t}.

• Lemma

K ≠ ∅ is convex and weak-star compact.

Granting the lemma, the Krein-Milman theorem gives an extreme point u* ∈ K, and one then shows that any extreme point of K must be a bang-bang control.
