Optimal Control
Exercise Book

Professor: Donghwan Lee
Student: Federico Berto
Contents
Exercises from Lecture 1
Exercise 1.13
Exercise 1.15
Exercise 1.17
Exercise 1.24
Exercise 1.68
Exercise 1.69
Exercise 1.78
Exercise 1.81
Exercise 1.84
Exercise 1.85
Exercise 1.96
Exercise 1.13
Solution:
We first prove convergence for a monotonically nondecreasing sequence that is bounded above.
Let (x_k)_{k∈N} be such a sequence. Since by assumption (x_k) is bounded above, c = sup_k(x_k) is finite. For every ε > 0 there exists an index K such that x_K > c − ε; otherwise, c − ε would be an upper bound, contradicting the definition of c. By monotonicity, for every k > K we have x_K ≤ x_k ≤ c, hence |c − x_k| ≤ |c − x_K| < ε. Therefore the limit of (x_k)_{k∈N} is by definition sup_k(x_k), which means that the sequence converges to that value.
For a monotonically nonincreasing sequence that is bounded below, convergence to ℓ = inf_k(x_k) can be shown in the same way.
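The argument above can be illustrated numerically (a sketch; the sequence x_k = k/(k+1), nondecreasing and bounded above by 1, is a hypothetical example not taken from the text):

```python
# Monotone nondecreasing sequence bounded above: x_k = k/(k+1).
# Its supremum is 1, and by the proof above the sequence converges to it.
def x(k):
    return k / (k + 1)

terms = [x(k) for k in range(1, 10001)]

# Monotonicity: each term is >= the previous one
assert all(a <= b for a, b in zip(terms, terms[1:]))

# For eps = 1e-3 there is an index K with x_K > sup - eps,
# and all later terms stay within eps of the supremum.
eps = 1e-3
K = next(i for i, t in enumerate(terms) if t > 1 - eps)
assert all(abs(1 - t) < eps for t in terms[K:])
```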
Exercise 1.15
Devise a set X whose supremum exists and find its supremum. What is the maximum
of the set? Do the same for the infimum.
Solution:
Exercise 1.17
Devise a sequence (x_k)_{k=1}^∞ whose upper limit exists and find its upper limit. Do the same for the infimum.
Solution:
We can devise the following sequence (x_k), whose upper limit exists:

x_k = k/(2k + 1) = 1/3, 2/5, 3/7, . . . (1)

We can easily find its upper limit:

lim sup_{k→∞} (x_k) = 1/2 (2)

As for the infimum, we can devise another sequence (x_k) as:

x_k = (2k + 1)/k = 3, 5/2, 7/3, . . . (3)

Since this sequence is decreasing with limit 2, its infimum is

inf_k (x_k) = 2 (4)
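A quick numerical sanity check of the two limits above, sketched in Python for the sequences defined in (1) and (3):

```python
from fractions import Fraction

# Terms of (1): x_k = k/(2k+1), and of (3): x_k = (2k+1)/k.
seq1 = [Fraction(k, 2 * k + 1) for k in range(1, 2001)]
seq2 = [Fraction(2 * k + 1, k) for k in range(1, 2001)]

# (1) is increasing, so its upper limit equals its supremum, approaching 1/2.
assert seq1 == sorted(seq1)
assert abs(float(seq1[-1]) - 0.5) < 1e-3

# (3) is decreasing toward 2, so its infimum is 2 (approached, never attained).
assert seq2 == sorted(seq2, reverse=True)
assert min(float(t) for t in seq2) > 2
assert abs(float(seq2[-1]) - 2) < 1e-2
```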
Exercise 1.24
Devise examples of an open set and a closed set.
Solution:
An example of an open set S is the region enclosed by a circle, not including the circumference: S = {(x, y) ∈ R² | x² + y² < r²} for some fixed radius r > 0. In this case, the boundary is not included.
If we take the same set but including its boundary, S = {(x, y) ∈ R² | x² + y² ≤ r²}, then we obtain an example of a closed set.
Exercise 1.68
Prove the two following propositions.
Proposition 1 (First-order necessary condition for constrained optimality):
Suppose that f is a C¹ function and x* is its local minimum. Then ⟨∇f(x*), d⟩ ≥ 0 for all feasible directions d.
Proof:
Define g(α) := f(x* + αd) for a feasible direction d, so that g(0) = f(x*) and g(α) = g(0) + g′(0)α + o(α). We claim that g′(0) ≥ 0 and prove it by contradiction. Suppose g′(0) < 0. By the definition of the little-o term,

lim_{α→0} o(α)/α = 0 (8)

Hence there exists a small enough ε > 0 such that for all α satisfying 0 < α < ε we have

|o(α)|/α < −g′(0) (9)

Hence we have

g(α) − g(0) = g′(0)α + o(α) ≤ g′(0)α + |o(α)| < g′(0)α − g′(0)α = 0 (11)

which contradicts the hypothesis of x* being a local minimum. Therefore, the claim g′(0) ≥ 0 is true, and by the chain rule g′(0) = ⟨∇f(x*), d⟩.
We have thus shown that, since d is an arbitrary feasible direction, ⟨∇f(x*), d⟩ ≥ 0 for all feasible directions d.
Proposition 2 (Second-order necessary condition for constrained optimality):
Suppose that f is a C² (twice continuously differentiable) function and x* is its local minimum. Then,

dᵀ∇²f(x*)d ≥ 0 (14)

for all feasible directions d such that ⟨∇f(x*), d⟩ = 0.
Proof:
Define g(α) := f(x* + αd). By the second-order Taylor expansion,

g(α) = g(0) + g′(0)α + (1/2)g″(0)α² + o(α²) (17)

where the term g′(0)α cancels out since g′(0) = ⟨∇f(x*), d⟩ = 0 by our hypothesis. We claim that g″(0) ≥ 0; we can prove the claim by contradiction.
Suppose g″(0) < 0. There exists a small enough ε > 0 such that for all α satisfying 0 < α < ε we have, following from the definition of small o:

|o(α²)|/α² < −(1/2)g″(0) (18)

∴ |o(α²)| < −(1/2)g″(0)α² (19)

Hence this yields

g(α) − g(0) = (1/2)g″(0)α² + o(α²) ≤ (1/2)g″(0)α² + |o(α²)| < (1/2)g″(0)α² − (1/2)g″(0)α² = 0 (20)

However, given that g(α) := f(x* + αd), this implies f(x* + αd) < f(x*) for all sufficiently small α > 0, which contradicts the hypothesis of x* being a local minimum. Therefore, the claim g″(0) ≥ 0 is true, and by the chain rule g″(0) = dᵀ∇²f(x*)d.
We have thus shown that, since d is an arbitrary feasible direction, dᵀ∇²f(x*)d ≥ 0 for all feasible directions d.
Exercise 1.69
Suppose that f is a C² function and x* is a point of its domain at which we have ⟨∇f(x*), d⟩ ≥ 0 and dᵀ∇²f(x*)d > 0 for every nonzero feasible direction d. Is x* necessarily a local minimum of f?
Solution:
With g(α) := f(x* + αd) for a nonzero feasible direction d, the interesting case is ⟨∇f(x*), d⟩ = 0, where g(α) − g(0) = (1/2)g″(0)α² + o(α²) and g″(0) = dᵀ∇²f(x*)d > 0. Following from the definition of small o, for small enough α we have

|o(α²)| < (1/2)g″(0)α² (33)

so that, in the case o(α²) < 0, (34)

o(α²) > −(1/2)g″(0)α² (35)

∴ (1/2)g″(0)α² + o(α²) > 0 (36)

so g(α) > g(0) for all sufficiently small α > 0 along every such direction.
Exercise 1.78
Give an example where a local minimum x∗ is not a regular point and the above
necessary condition (namely first-order necessary condition for constrained optimality)
is false (justify both of these claims).
Solution:
which leads to

(4, 2)ᵀ + λ₁* (−2, 0)ᵀ + λ₂* (−4, 0)ᵀ = 0 (46)

However, there exist no λ₁*, λ₂* that can cancel out the second element of ∇f(x*); this is due to the linearly dependent gradients ∇h₁(x*) and ∇h₂(x*). Therefore, we have shown that the necessary condition fails.
Exercise 1.81
Prove that the set of continuous functions f : [a, b] → R, f ∈ C⁰, is a vector space with the usual addition and scalar multiplication.
Solution:
Let's take generic functions f, g, and h which satisfy the above conditions, and consider a generic point x ∈ [a, b] with function values f(x) = i, g(x) = j, h(x) = k, where i, j, k ∈ R. We also consider scalar values α, β ∈ R.
We have to show the vector addition + satisfies the following conditions ∀x:
• Closure: f + g is continuous as the sum of continuous functions, and f(x) + g(x) = i + j ∈ R
• Commutative law: f(x) + g(x) = i + j = j + i = g(x) + f(x)
• Associative law: (f(x) + g(x)) + h(x) = (i + j) + k = i + (j + k) = f(x) + (g(x) + h(x))
• Additive identity: the zero function z(x) := 0 is continuous and satisfies f(x) + z(x) = i + 0 = i = f(x)
• Additive inverses: given f(x) = i, the function l := −f, l : [a, b] → R with l(x) = −i, is continuous and satisfies f(x) + l(x) = 0
Since these conditions are all satisfied, the vector addition + is well defined.
We now have to show that the scalar multiplication · satisfies the following conditions:
• Closure: α · f is continuous and α · f(x) = α · i ∈ R
• Distributive law: α(f(x) + g(x)) = α · i + α · j = αf(x) + αg(x)
• Distributive law: (α + β) · f(x) = (α + β) · i = α · i + β · i = αf(x) + βf(x)
• Associative law: α(β · f(x)) = α(β · i) = (α · β)i = (α · β)f(x)
• Multiplicative identity: 1 · f(x) = i = f(x)
Exercise 1.84
For y ∈ C⁰, define:

||y||_{Lp} := ( ∫_a^b |y(x)|^p dx )^{1/p} (47)

where p is a positive integer and a, b ∈ R with a < b. Prove that it is a norm on the vector space C⁰.
Solution:
We need to prove that the expression defines a norm. Thus, it has to satisfy the following properties:
• Nonnegativity: for every y ∈ C⁰, the integrand |y(x)|^p is nonnegative, so the integral over [a, b] with b > a is nonnegative, and so is its 1/p-th power. Therefore ||y||_{Lp} ≥ 0.
• Absolute homogeneity: given c ∈ R, we have that:

||cy||_{Lp} = ( ∫_a^b |c y(x)|^p dx )^{1/p} = ( |c|^p ∫_a^b |y(x)|^p dx )^{1/p} (48)

So, we can move |c| out of the integral and get:

( |c|^p ∫_a^b |y(x)|^p dx )^{1/p} = |c| ( ∫_a^b |y(x)|^p dx )^{1/p} = |c| · ||y||_{Lp} (49)

• Positive definiteness: we want to show that only y = 0 makes the norm zero. Since a < b by hypothesis, the only way to make the norm zero is |y(x)| = 0 for all x; by the continuity of y, this means y = 0.
• Triangle inequality: we want to show that

( ∫_a^b |y(x) + z(x)|^p dx )^{1/p} ≤ ( ∫_a^b |y(x)|^p dx )^{1/p} + ( ∫_a^b |z(x)|^p dx )^{1/p} (50)

which is Minkowski's inequality for integrals.
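Properties (49) and (50) can be spot-checked numerically (a sketch with hypothetical sample functions and p = 3; the quadrature is a plain midpoint rule, not part of the original text):

```python
import math

def lp_norm(f, p, a, b, n=4000):
    # Midpoint-rule approximation of ( integral_a^b |f(x)|^p dx )^(1/p)
    h = (b - a) / n
    s = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n))
    return (s * h) ** (1.0 / p)

# Hypothetical test functions on [0, 1]
y = lambda x: math.sin(3 * x)
z = lambda x: x ** 2 - 0.5
p, a, b = 3, 0.0, 1.0

# Absolute homogeneity (49): ||c y|| = |c| ||y||
c = -2.5
assert abs(lp_norm(lambda x: c * y(x), p, a, b) - abs(c) * lp_norm(y, p, a, b)) < 1e-9

# Triangle inequality (50) (Minkowski)
assert lp_norm(lambda x: y(x) + z(x), p, a, b) <= lp_norm(y, p, a, b) + lp_norm(z, p, a, b) + 1e-12
```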
Exercise 1.85
For x, y ∈ C⁰, define

⟨x, y⟩ = ∫_a^b x(t)y(t) dt (54)

where a, b ∈ R with a < b. Prove that it is an inner product on the vector space C⁰.
Solution:
If the above expression satisfies the following conditions, then it is an inner product:
• Conjugate symmetry:

⟨x, y⟩ = ∫_a^b x(t)y(t)dt = ∫_a^b y(t)x(t)dt = ⟨y, x⟩ (55)

The expression holds because multiplication in R is commutative; conjugate symmetry is just symmetry in R.
• Linearity:

⟨αx, y⟩ = ∫_a^b αx(t)y(t)dt = α ∫_a^b x(t)y(t)dt = α⟨x, y⟩ (56)

holds because of the properties of the integral, where α ∈ R. Also, we can see that the following holds:

⟨x + y, z⟩ = ∫_a^b [x(t) + y(t)]z(t)dt = ∫_a^b x(t)z(t)dt + ∫_a^b y(t)z(t)dt = ⟨x, z⟩ + ⟨y, z⟩ (57)

thus satisfying the linearity conditions.
• Positive definiteness:

⟨x, x⟩ = ∫_a^b x(t)x(t)dt = ∫_a^b x(t)² dt ≥ 0 (58)

where x(t)² ≥ 0. Since the integral of a nonnegative continuous function over [a, b] is zero only if the function vanishes identically, the expression is positive unless x = 0.
Therefore, we can conclude the expression is indeed an inner product on the vector space C⁰.
Exercise 1.96
Consider the space V = C⁰([0, 1]), let g : R → R be a C¹ function, and define the functional J on V by J(y) = ∫_0^1 g(y(x))dx. Show that its first variation exists and is given by the formula

δJ_y(η) = ∫_0^1 g′(y(x))η(x)dx (59)
Solution:
Define h(α) := J(y + αη) = ∫_0^1 g(y(x) + αη(x))dx. Then the first variation is

δJ_y(η) = h′(0) = ∂/∂α |_{α=0} ∫_0^1 g(y(x) + αη(x))dx = ∫_0^1 g′(y(x))η(x)dx (63)
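Formula (59) can be checked against a finite-difference quotient (a sketch; the choices g(z) = z³, y(x) = x, η(x) = sin(πx) are hypothetical examples, not from the text):

```python
import math

# First-variation check for J(y) = integral_0^1 g(y(x)) dx.
g = lambda z: z ** 3          # hypothetical g, with g'(z) = 3 z^2
gp = lambda z: 3 * z ** 2
y = lambda x: x
eta = lambda x: math.sin(math.pi * x)

def integral(f, n=4000):
    # midpoint rule on [0, 1]
    h = 1.0 / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

def J(alpha):
    return integral(lambda x: g(y(x) + alpha * eta(x)))

# Formula (59): delta J_y(eta) = integral_0^1 g'(y(x)) eta(x) dx
delta_J = integral(lambda x: gp(y(x)) * eta(x))

# Compare with the difference quotient (J(a) - J(0)) / a for small a
a = 1e-6
assert abs((J(a) - J(0)) / a - delta_J) < 1e-4
```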
Solution:
Let's take as an example of a variational problem the shortest path in the plane connecting two different points A and B. By intuition, we know that the solution of this problem is the straight line connecting the two points; however, we can prove it by treating the problem as a variational problem.
Let's consider the following Cartesian coordinates: A = (x₀, y₀) and B = (x₁, y₁) with x₀ < x₁, and let y(x) ∈ C² describe the coordinate y as a function of x, for x₀ ≤ x ≤ x₁. Therefore, we want to minimize the following cost functional:

L = ∫_A^B ds = ∫_{x₀}^{x₁} √(dx² + dy²) = ∫_{x₀}^{x₁} √(1 + y′(x)²) dx (65)

subject to

y(x₀) = y₀, y(x₁) = y₁ (67)
Exercise 2.21
Find an extremal for the problem

min_{y:[0,1]→R} J(y) = ∫_0^1 y(x)² y′(x)² dx subject to y(0) = 0, y(1) = 1 (68)

Solution:
Let's consider the simplified notation y(x) = y and y′(x) = y′ for convenience. The first step is to write the Lagrangian, which is given by:

L(x, y, y′) = y² y′² (69)

Its partial derivatives are

L_y = 2y y′² (70)

L_{y′} = 2y² y′ (71)

The Euler-Lagrange equation L_y = d/dx L_{y′} then gives 2yy′² = 4yy′² + 2y²y″, that is, yy′² + y²y″ = 0, which can be rewritten as d/dx (yy′) = 0. Hence yy′ is constant, so y² = c₁x + c₂. The boundary conditions y(0) = 0 and y(1) = 1 yield c₂ = 0 and c₁ = 1, so the extremal is y*(x) = √x.
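The derivation can be sanity-checked numerically (a sketch; the first integral yy′ = 1/2 and the cost value 1/4 follow from the computation above):

```python
import math

# Check the extremal y*(x) = sqrt(x): along it, y y' is constant (= 1/2),
# which is the first integral d/dx (y y') = 0 derived above.
ystar = lambda x: math.sqrt(x)
yprime = lambda x: 0.5 / math.sqrt(x)

for x in [0.1, 0.3, 0.5, 0.9]:
    assert abs(ystar(x) * yprime(x) - 0.5) < 1e-12

# Boundary conditions
assert ystar(0.0) == 0.0 and ystar(1.0) == 1.0

# The cost of the extremal: J = integral_0^1 (y y')^2 dx = 1/4
n = 2000
h = 1.0 / n
J = sum((ystar((i + 0.5) * h) * yprime((i + 0.5) * h)) ** 2 for i in range(n)) * h
assert abs(J - 0.25) < 1e-9
```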
Exercise 2.22
Find an extremal for the problem

min_{y:[0,1]→R} J(y) = ∫_0^1 { y′(x)² + 2y(x)e^x } dx subject to y(0) = 0, y(1) = 1 (75)

Solution:
Let's consider the simplified notation y(x) = y and y′(x) = y′ for convenience. The first step is to write down the Lagrangian of the problem:
L(x, y, y′) = y′² + 2y e^x. The Euler-Lagrange equation L_y = d/dx L_{y′} gives 2e^x = 2y″, i.e. y″ = e^x. Integrating twice:

y = e^x + c₀ x + c₁ (84)

Applying the boundary conditions y(0) = 0 and y(1) = 1 yields:

1 + c₁ = 0 (85)

e + c₀ + c₁ = 1 (86)

by which we obtain c₁ = −1 and c₀ = 2 − e. So, the extremal for the problem is:

y(x) = e^x + (2 − e)x − 1 (88)
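The extremal obtained from y″ = e^x with the boundary conditions y(0) = 0, y(1) = 1 can be verified numerically (a sketch using central differences for the second derivative):

```python
import math

# Extremal candidate: y(x) = e^x + (2 - e) x - 1.
# It must satisfy y'' = e^x and the boundary conditions y(0) = 0, y(1) = 1.
y = lambda x: math.exp(x) + (2 - math.e) * x - 1

assert abs(y(0.0)) < 1e-12
assert abs(y(1.0) - 1.0) < 1e-12

# Second derivative via central differences
h = 1e-5
for x in [0.2, 0.5, 0.8]:
    ypp = (y(x + h) - 2 * y(x) + y(x - h)) / h ** 2
    assert abs(ypp - math.exp(x)) < 1e-4
```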
Our goal is to find a path between two fixed points in a vertical plane such that a particle sliding without friction along this path takes the shortest possible time to travel from one point to the other. The problem can be formulated as follows:

min_{y:[a,b]→[0,∞)} J(y) = ∫_a^b √(1 + y′(x)²) / √(y(x)) dx (90)

subject to (91)

y(a) = y₀, y(b) = y₁ (92)
Let's consider the simplified notation y(x) = y and y′(x) = y′ for convenience. Moreover, we consider the problem starting at x = 0 for the sake of convenience.
We can use the "no x" form of the Euler-Lagrange equation: the Lagrangian does not depend explicitly on x, so the condition simplifies to

L − L_{y′} y′ = √(1/(2c)), with c ∈ R, c > 0 (93)

where √(1/(2c)) is a nonnegative constant. The reason why we chose this formulation will become clear during the derivation.
Given the Lagrangian L = √(1 + y′²)/√y, we can derive that:

√(1 + y′²)/√y − y′²/(√y √(1 + y′²)) = √(1/(2c)) (94)

1/(√y √(1 + y′²)) = √(1/(2c)) (95)

1/(y (1 + y′²)) = 1/(2c) (96)

y (1 + (dy/dx)²) = 2c (97)

dy/dx = √((2c − y)/y) (98)

dx = √(y/(2c − y)) dy (99)

We can apply the substitution y = c − ct, dy = −c dt, to make the integral easier, yielding:

dx = −c √((1 − t)/(1 + t)) dt (101)

In order to proceed with the integration, we can recall the following trigonometric identities:

1 − cos θ = 2 sin²(θ/2), 1 + cos θ = 2 cos²(θ/2), sin θ = 2 sin(θ/2) cos(θ/2) (102-104)
So, applying the substitution t = cos θ, dt = −sin θ dθ, we can rewrite the equation as:

dx = c sin θ √((1 − cos θ)/(1 + cos θ)) dθ = c sin θ √(2 sin²(θ/2) / (2 cos²(θ/2))) dθ = c tan(θ/2) sin θ dθ (105)

Given that sin θ = 2 sin(θ/2) cos(θ/2), we get:

dx = 2c sin²(θ/2) dθ = c(1 − cos θ) dθ (106)

By integrating, we get:

∫ dx = ∫ c(1 − cos θ) dθ =⇒ x = c(θ − sin θ) + c₁ (107)

We can obtain the value of y by noting that t = cos θ and y = c − ct. So:

y = c(1 − cos θ) (108)

Given the numerical conditions we could calculate the value of c. Moreover, if the value of x does not start from 0 but is translated by a, then we translate x by that amount; notice that this does not influence the value of y. Summarizing, the following equations describe a cycloid:

x(θ) = c(θ − sin θ) + c₁, y(θ) = c(1 − cos θ)
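The cycloid can be checked against the first integral y(1 + y′²) = 2c obtained in the derivation (a sketch; the value of c is a hypothetical example):

```python
import math

# Along the cycloid x(theta) = c(theta - sin theta), y(theta) = c(1 - cos theta),
# the conservation law y (1 + (dy/dx)^2) = 2c must hold.
c = 1.7  # hypothetical constant
for theta in [0.5, 1.0, 2.0, 3.0]:
    dx = c * (1 - math.cos(theta))   # dx/dtheta
    dy = c * math.sin(theta)         # dy/dtheta
    yp = dy / dx                     # dy/dx by the chain rule
    y = c * (1 - math.cos(theta))
    assert abs(y * (1 + yp ** 2) - 2 * c) < 1e-9
```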
Exercise 2.30
Confirm directly from the equations (2.10), (2.11), (2.12) (Hamilton's canonical equations) that in the "no y" case p is constant along extremals and in the "no x" case H is constant along extremals.
Solution:
We first show that p is constant along extremals in the "no y" case. For ease of notation, let y(x) = y, y′(x) = y′, p(x) = p throughout the proofs.
Hamilton's canonical equation for p is:

dp/dx = −H_y(x, y, p), x ∈ [a, b] (112)

In the "no y" case the Hamiltonian does not depend on y, so H_y = 0; hence dp/dx = 0 and p is constant along extremals.
Now, let's show that H is constant along extremals in the "no x" case. The Hamiltonian can be expressed as:

H(y, y′, p) = p y′ − L(y, y′) (120)

In the "no x" case, L (and therefore H) does not depend explicitly on x. Differentiating the Hamiltonian with respect to x along an extremal yields

d/dx H(y, y′, p) = d/dx (p y′ − L(y, y′)) (122)

= ṗ y′ + p y″ − L_y y′ − L_{y′} y″ (123)

= (ṗ − L_y) y′ + (p − L_{y′}) y″ = 0 (124)

since along extremals p = L_{y′} and ṗ = L_y (the Euler-Lagrange equation). Hence, given that d/dx H(y, y′, p) = 0, we can write:

H(y, y′, p) = constant (125)

thus showing that in the "no x" case the Hamiltonian H is constant along extremals.
Exercise 2.36
Find the curve y* that minimizes

J(y) = (1/2) ∫_0^1 y′(x)² dx (126)

subject to the integral constraint ∫_0^1 y(x) dx = 1/6 (127)
Solution:
Let's consider the simplified notation y(x) = y and y′(x) = y′ for convenience. Thus, we can use the first-order necessary condition for integral-constrained optimality:

L_y − d/dx L_{y′} + λ (M_y − d/dx M_{y′}) = 0, ∀x ∈ [a, b] (128)

where L = (1/2)y′² and M = y. Substituting (and absorbing the sign of the free multiplier λ), we obtain:

d/dx y′ + λ · 1 = 0 (129)

y″ = −λ (130)

Integrating twice, we get: (131)

y′ = −λx + c₁ (132)

y = −λx²/2 + c₁x + c₂ (133)

The optimal curve y*(x) is:

y*(x) = −λx²/2 + c₁x + c₂ (135)

Now, we can find the value of λ from the integral constraint. So:

∫_0^1 ( −λx²/2 + c₁x + c₂ ) dx = 1/6 (136)

[ −λx³/6 + c₁x²/2 + c₂x ]_0^1 = 1/6 (137)

−λ/6 + c₁/2 + c₂ = 1/6 (138)

λ = 3c₁ + 6c₂ − 1 (139)
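Relation (139) can be verified by numerical integration (a sketch; the sample values of c₁, c₂ are hypothetical):

```python
# Check relation (139): with y(x) = -lam x^2/2 + c1 x + c2 and lam = 3 c1 + 6 c2 - 1,
# the integral constraint integral_0^1 y dx = 1/6 holds for any c1, c2.
def constraint_integral(c1, c2):
    lam = 3 * c1 + 6 * c2 - 1
    n = 20000
    h = 1.0 / n
    return sum(-lam * ((i + 0.5) * h) ** 2 / 2 + c1 * (i + 0.5) * h + c2
               for i in range(n)) * h

for c1, c2 in [(0.0, 0.0), (1.0, -0.5), (-2.0, 3.0)]:
    assert abs(constraint_integral(c1, c2) - 1 / 6) < 1e-6
```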
Solution:
We want to maximize the area under a curve of fixed length. Let's consider Dido's problem, formulated as follows:

max_{y:[a,b]→R} J(y) = ∫_a^b y(x) dx (141)

subject to y(a) = y(b) = 0, ∫_a^b √(1 + y′(x)²) dx = C₀ (142)

Let's also consider the simplified notation y(x) = y and y′(x) = y′ for convenience. Thus, we can use the first-order necessary condition for integral-constrained optimality:

L_y − d/dx L_{y′} + λ (M_y − d/dx M_{y′}) = 0, ∀x ∈ [a, b] (143)
where L = y and M = √(1 + y′²). Substituting, we obtain:

1 − λ d/dx ( y′/√(1 + y′²) ) = 0 (144)

d/dx ( λ y′/√(1 + y′²) ) = 1 (145)

λ y′/√(1 + y′²) = x + c₀ (146)

λ² y′² = (1 + y′²)(x + c₀)² (147)

(λ² − (x + c₀)²) y′² = (x + c₀)² (148)

y′ = (x + c₀)/√(λ² − (x + c₀)²) (149)

y = ∫ (x + c₀)/√(λ² − (x + c₀)²) dx = −√(λ² − (x + c₀)²) + c₁ (150)

that is, (x + c₀)² + (y − c₁)² = λ², which describes a circle. We can assert that, given the geometry of the problem, this extremal is indeed a maximizer of the area functional and not a minimizer; thus we have shown that curves solving Dido's problem are circular arcs.
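The integration step (150) can be spot-checked with a finite-difference derivative (a sketch; the constants λ, c₀ are hypothetical):

```python
import math

# Check (150): d/dx [ -sqrt(lam^2 - (x + c0)^2) ] = (x + c0)/sqrt(lam^2 - (x + c0)^2),
# so the extremal satisfies (x + c0)^2 + (y - c1)^2 = lam^2, a circle.
lam, c0 = 2.0, 0.3   # hypothetical constants
F = lambda x: -math.sqrt(lam ** 2 - (x + c0) ** 2)   # antiderivative
f = lambda x: (x + c0) / math.sqrt(lam ** 2 - (x + c0) ** 2)   # integrand (149)

h = 1e-6
for x in [-0.5, 0.0, 0.7, 1.2]:
    deriv = (F(x + h) - F(x - h)) / (2 * h)   # central difference
    assert abs(deriv - f(x)) < 1e-5
```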
Solution:
Since the catenary will take the shape minimizing the potential energy, the problem can be stated as follows:

min_{y:[a,b]→[0,∞)} J(y) = ∫_a^b y(x) √(1 + y′(x)²) dx (157)

subject to

∫_a^b √(1 + y′(x)²) dx = C₀, y(a) = y₀, y(b) = y₁ (158)

As in the previous exercises, let's also consider the simplified notation y(x) = y and y′(x) = y′ for convenience.
Let's use the "no x" form of the Euler-Lagrange equation: the Lagrangian does not depend explicitly on x, so the condition simplifies to

L − L_{y′} y′ = c with c ∈ R (159)

Given that the Lagrangian for the problem is L = y√(1 + y′²), we can write the condition as:

y√(1 + y′²) − y y′²/√(1 + y′²) = c (160)

y/√(1 + y′²) = c (161)

Now, we can solve this for the first-order derivative:

y² = c² + c² y′² (163)

y′² = (y/c)² − 1 (164)

which yields

y′ = ±√((y/c)² − 1) (165)
so,

dy/dx = ±√((y/c)² − 1) (166)

± dy/√(y² − c²) = dx/c (167)

± ∫ dy/√(y² − c²) = ∫ dx/c (168)

Solving these integrals leads to the following solutions:

ln( (y + √(y² − c²))/c ) = x/c (170)

ln( (y − √(y² − c²))/c ) = −x/c (171)

which lead to

(y + √(y² − c²))/c = e^{x/c} (173)

(y − √(y² − c²))/c = e^{−x/c} (174)

Summing the two expressions, the solution can then be written as:

y = c (e^{x/c} + e^{−x/c})/2 (176)

We can substitute the exponential terms with the hyperbolic cosine

cosh(x/c) = (e^{x/c} + e^{−x/c})/2 (177)

which leads us to the final equation

y(x) = c cosh(x/c) (178)

which gives the optimal curves for the catenary problem.
Given the numerical conditions, we could also compute the value of c. We notice that if the reference frame has a vertical y axis pointing "up" (opposite to the gravitational field), then the parameter satisfies c > 0.
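The catenary (178) can be checked directly against the first integral (161) (a sketch; the value of c is a hypothetical example):

```python
import math

# Check that y(x) = c cosh(x/c) satisfies the first integral (161):
# y / sqrt(1 + y'^2) = c, using y' = sinh(x/c) and 1 + sinh^2 = cosh^2.
c = 0.8  # hypothetical constant
y = lambda x: c * math.cosh(x / c)
yp = lambda x: math.sinh(x / c)

for x in [-1.0, -0.2, 0.0, 0.5, 1.3]:
    assert abs(y(x) / math.sqrt(1 + yp(x) ** 2) - c) < 1e-12
```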
Exercise 2.47
Is Legendre's necessary condition satisfied by the admissible extremal for the problem of minimizing

J(y) = ∫_0^2 √(1 + y(x)² y′(x)²) dx (179)

subject to y(0) = 1 and y(2) = 3? Find L_{y′y′}(x, y(x), y′(x)) explicitly.
Solution:
We have to show that Legendre's condition holds, which means that:

L_{y′y′}(x, y(x), y′(x)) ≥ 0, ∀x ∈ [a, b] (180)

Firstly, we find the admissible extremal via the Euler-Lagrange equation. Let's also use the simplified notation y(x) = y, y′(x) = y′ and L = L(x, y(x), y′(x)) for convenience.

L = √(1 + y² y′²) (181)

∴ L_y = y y′² / √(1 + y² y′²) (182)

L_{y′} = y² y′ / √(1 + y² y′²) (183)

The Euler-Lagrange equation becomes:

y y′² / √(1 + y² y′²) = d/dx [ y² y′ / √(1 + y² y′²) ] (185)

A general solution of the equation is the following:

y(x) = c₂ √(c₁ + 2x) (186)

Given the boundary conditions, we can get:

y(0) = 1 =⇒ c₂√c₁ = 1; y(2) = 3 =⇒ c₂√(c₁ + 4) = 3 =⇒ c₁ = 1/2, c₂ = √2 (187)

Therefore the admissible extremal is y*(x) = √2 √(1/2 + 2x), i.e. y*(x)² = 1 + 4x. Now, let's verify Legendre's condition. Differentiating L_{y′} once more with respect to y′:

L_{y′y′} = y² / ( √(1 + y² y′²) )³ (188)

which is nonnegative for every y, since y² ≥ 0 and the denominator is positive. Along the admissible extremal we have y*² = 1 + 4x and y* y*′ = 2, so y*² y*′² = 4 and

L_{y′y′}(x, y*(x), y*′(x)) = (1 + 4x)/5^{3/2} ≥ 0, ∀x ∈ [0, 2] (189)

Hence L_{y′y′}(x, y(x), y′(x)) ≥ 0 for all x ∈ [0, 2], and thus we have shown that Legendre's condition for optimality holds.
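The quantities used above can be checked numerically along the extremal (a sketch):

```python
import math

# Legendre's condition along the admissible extremal y*(x) = sqrt(2) sqrt(1/2 + 2x):
# L_{y'y'} = y^2 / (1 + y^2 y'^2)^{3/2}, and along y* we have y y' = 2.
ystar = lambda x: math.sqrt(2) * math.sqrt(0.5 + 2 * x)
ystar_p = lambda x: math.sqrt(2) / math.sqrt(0.5 + 2 * x)   # derivative of y*

for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
    y, yp = ystar(x), ystar_p(x)
    assert abs(y * yp - 2.0) < 1e-12                      # first integral y y' = 2
    L_ypyp = y ** 2 / (1 + y ** 2 * yp ** 2) ** 1.5
    assert L_ypyp >= 0.0                                  # Legendre's condition
    assert abs(L_ypyp - (1 + 4 * x) / 5 ** 1.5) < 1e-12   # explicit formula
```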
Solve the problem

u* := argmin_{u:[0,1]→R} J(u) := ∫_0^1 [ (1/2)x(t)² + (P/2)u(t)² ] dt (194)

subject to

ẋ(t) = u(t), x(0) = 1, x(1) = free (195)

where P > 0 is a weighting parameter. Find an optimal solution u* and x* using the Euler-Lagrange condition.
Solution:
The Hamiltonian is

H(t, x(t), u(t), p(t)) = ⟨p(t), f(t, x(t), u(t))⟩ − L(t, x(t), u(t)) = p(t)u(t) − (1/2)x(t)² − (P/2)u(t)² (197)

Now, we have to calculate the adjoint vector p(t), which satisfies the differential equation ṗ(t) = −H_x = x(t), with the final-state condition p(1) = 0. Now, we have the first-order condition on the Hamiltonian:
H_u(t, x(t), u(t), p(t)) = p(t) − P u(t) = 0, from which we obtain u(t) = p(t)/P. We can summarize the influence on the state as ẋ(t) = u(t) = p(t)/P.
We can observe that ẍ(t) = ṗ(t)/P = x(t)/P. We solve this second-order differential equation, whose solution is (writing ω := 1/√P):

x(t) = c₁ e^{ωt} + c₂ e^{−ωt}
p(t) = P ẋ(t) = √P (c₁ e^{ωt} − c₂ e^{−ωt}) (204)

Now, we can obtain the values of c₁ and c₂ from the boundary conditions:

x(0) = 1 =⇒ c₁ + c₂ = 1
p(1) = 0 =⇒ c₁ e^{ω} − c₂ e^{−ω} = 0 =⇒ c₂ = c₁ e^{2ω} (205)

hence c₁ = 1/(1 + e^{2ω}) and c₂ = e^{2ω}/(1 + e^{2ω}), so that for t ∈ [0, 1]:

u*(t) = ẋ(t) = ( e^{ωt} − e^{ω(2−t)} ) / ( √P (1 + e^{2ω}) ) (206)

We will now discuss the influence of the weighting value P on the control input.
1. Case P → 0 (ω → ∞): at the initial time,

lim_{P→0} u*(0) = lim_{ω→∞} ( 1 − e^{2ω} ) / ( (1/ω)(1 + e^{2ω}) ) = lim_{ω→∞} −ω (e^{2ω} − 1)/(e^{2ω} + 1) = −∞ (208)

In this case, we can deduce that the weighting P going to zero makes the control input diverge near t = 0: intuitively, the less the input is penalized, the larger the control input will be in absolute value, driving the state toward zero as quickly as possible.
2. Case P → ∞ (ω → 0): expanding the exponentials to first order in ω,

lim_{P→∞} u*(t) = lim_{ω→0} ω ( e^{ωt} − e^{ω(2−t)} ) / (1 + e^{2ω}) = lim_{P→∞} (t − 1)/P = 0, t ∈ [0, 1] (210)

In this case, we can deduce that the weighting P going to infinity makes the control input go to zero: the more the input is penalized, the smaller the control input's absolute value will be along the optimal trajectory.
Solve the problem

u* := argmin_{u:[0,t_f]→[−1,1]} J := t_f (212)

subject to (213)

ẋ₁(t) = −4x₁(t) + 2x₂(t) + 2u(t), x₁(0) = x₀, x₂(0) = 0
ẋ₂(t) = 3x₁(t) − 3x₂(t), x₁(t_f) = 0, x₂(t_f) = 0 (214)

Solution:
For simplicity, let us consider the simplified notation x = x(t), u = u(t) and p = p(t). We first calculate the Hamiltonian as

H(x, u, p, p₀) = p₁(−4x₁ + 2x₂ + 2u) + p₂(3x₁ − 3x₂) + p₀

Let's consider the set U = [−1, 1] for which u ∈ U. Then, by Pontryagin's maximum principle, the optimal control maximizes the Hamiltonian; since H depends on u only through the term 2p₁u, we have u*(t) = 1 for p₁*(t) > 0 and u*(t) = −1 for p₁*(t) < 0,
where the value of p₁* can be found from the costate optimality equation, which is independent of p₀ in this case:

ṗ* = −H_x(t, x*, u*, p*) = [ 4p₁* − 3p₂* ; −2p₁* + 3p₂* ] (222)

In order to get the switching time, solving p₁*(t) = 0, the following ODE can be solved by imposing the condition on the derivative of the state given by the Hamiltonian, along with the initial and final conditions:

[ ẋ₁* ; ẋ₂* ] = [ −4x₁ + 2x₂ + 2u ; 3x₁ − 3x₂ ] (225)

which yields

[ x₁* ; x₂* ] = (1/5) e^{−6t} [ k₁(2e^{5t} + 3) + 2k₂(e^{5t} − 1) ; 3k₁(e^{5t} − 1) + k₂(3e^{5t} + 2) ] + [ 1 ; 1 ], k₁, k₂ ∈ R, for p₁*(t) ≥ 0 (227)

[ x₁* ; x₂* ] = (1/5) e^{−6t} [ k₁(2e^{5t} + 3) + 2k₂(e^{5t} − 1) ; 3k₁(e^{5t} − 1) + k₂(3e^{5t} + 2) ] + [ −1 ; −1 ], k₁, k₂ ∈ R, for p₁*(t) < 0 (228)

After obtaining the switching time, at which p₁* changes sign, we can complete the controller by choosing u* = 1 if p₁* = 0. Therefore, the general solution is a bang-bang controller (in which the control switches abruptly between two inputs). We can write the controller as:

u*(t) = 1 if p₁*(t) ≥ 0
u*(t) = −1 if p₁*(t) < 0 (229)
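The pieces of (225)-(228) can be sanity-checked numerically (a sketch; the constants k₁, k₂ below are hypothetical sample values):

```python
import math

# With A = [[-4, 2], [3, -3]] and B = [2, 0], the constant offset [u, u] for
# u = +/-1 is an equilibrium of xdot = A x + B u, and the e^{-6t}/e^{5t}
# expression in (227) is a homogeneous solution reducing to [k1, k2] at t = 0.
A = [[-4.0, 2.0], [3.0, -3.0]]
B = [2.0, 0.0]

def Ax(v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

for u in (1.0, -1.0):
    xeq = [u, u]
    rhs = [a + b * u for a, b in zip(Ax(xeq), B)]
    assert abs(rhs[0]) < 1e-12 and abs(rhs[1]) < 1e-12   # equilibrium

def hom(k1, k2, t):
    e5, e6 = math.exp(5 * t), math.exp(-6 * t)
    return [e6 / 5 * (k1 * (2 * e5 + 3) + 2 * k2 * (e5 - 1)),
            e6 / 5 * (3 * k1 * (e5 - 1) + k2 * (3 * e5 + 2))]

x0 = hom(0.4, -1.1, 0.0)
assert abs(x0[0] - 0.4) < 1e-12 and abs(x0[1] + 1.1) < 1e-12

# hom also solves xdot = A x: compare a finite-difference derivative with A x
t, h = 0.3, 1e-6
xd = [(a - b) / (2 * h) for a, b in zip(hom(0.4, -1.1, t + h), hom(0.4, -1.1, t - h))]
ax = Ax(hom(0.4, -1.1, t))
assert abs(xd[0] - ax[0]) < 1e-4 and abs(xd[1] - ax[1]) < 1e-4
```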
Solve the problem

u* := argmin_{u:[0,T]→R} J(u) := (1/2) q x(T)² + (1/2) ∫_0^T u(t)² dt (230)

subject to (231)

ẋ(t) = a x(t) + b u(t), x(0) = x₀ (232)

where a, b are given nonzero scalars, x₀ ∈ R is an initial state, T > 0 is a fixed final time, and q > 0 is a given positive scalar.
Solution:
We assume the following: we can take p₀ = −1, given that this problem is scalar; imposing a value different from −1 would not change the result (which can be proved, e.g., via the condition for optimality: when we set the gradient to zero, p and p₀ being both scalars, we can eliminate p₀ by absorbing it into a rescaled p := p_new). We will make this assumption also for the other scalar problems.
By substituting c₁ into the previous expression, we get the optimal p as

p*(t) = −q x*(T) e^{a(T−t)} (244)

in which the optimal final point x*(T) can be obtained by solving the differential equation for the state. Finally, since H_u = pb − u = 0, we get the expression for the optimal control:

u*(t) = b p*(t) = −q b x*(T) e^{a(T−t)} (245)

Given such a control, the corresponding dynamics are provided by the ODE
where the constant k > 0 models the growth rate of our reinvestment and u(t)x(t) is the amount of reinvested output. Let us take as a payoff functional

J(u) = ∫_0^T (1 − u(t)) x(t) dt (249)

subject to (251)

ẋ(t) = k u(t) x(t) (252)
x(0) = x₀ (253)

Solution:
We recast the problem as the minimization

u* := argmin_{u:[0,T]→[0,1]} − ∫_0^T (1 − u(t)) x(t) dt (254)

subject to (255)

ẋ(t) = k u(t) x(t) (256)
x(0) = x₀ (257)
where minimizing the negative cost functional is the same as maximizing the positive one. We consider the Hamiltonian formulation as in Equation (311), and for simplicity we use the notation x = x(t), u = u(t) and p = p(t), considering the scalar case. The Hamiltonian will be:

H(t, x, u, p) = p k u x + (1 − u) x

Let's consider the set U = [0, 1] for which u ∈ U. Then, by Pontryagin's maximum principle, since H is affine in u with coefficient x(pk − 1), we have u*(t) = 1 when p(t)k > 1 and u*(t) = 0 when p(t)k < 1 (for x > 0).
We notice that this is a bang-bang controller with a switching time governed by the costate equation; by solving the costate equation, we can get the switching time. The terminal condition is

p(T) = K_x(x(T)) = 0 (264)

since the terminal cost K is zero here. Therefore the ODE for the costate, ṗ = −H_x, becomes:

ṗ*(t) = −1 − u*(t)(p*(t)k − 1), p*(T) = 0 (265)

Since p(T) = 0 and k > 0, by continuity, for t close to the final time T we have p(t) ≤ 1/k. In this case, u*(t) = 0 and the ODE becomes:

ṗ*(t) = −1, p*(T) = 0 =⇒ p*(t) = T − t, valid while p*(t) ≤ 1/k (266)
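From (266), the switching time is where p*(t) = T − t reaches 1/k, i.e. t* = T − 1/k (assuming T > 1/k). A quick check (a sketch; T and k are hypothetical sample values):

```python
# Near the final time u* = 0 and p*(t) = T - t, per (266). The control
# switches when p*(t) reaches 1/k, i.e. at t* = T - 1/k (assuming T > 1/k).
T, k = 5.0, 0.5
p = lambda t: T - t
t_switch = T - 1 / k

assert abs(p(t_switch) * k - 1.0) < 1e-12     # p(t*) k = 1
assert p(T) == 0.0                            # terminal condition (264)
# For t in (t*, T], p(t) k < 1, consistent with u* = 0 there
for t in [t_switch + 0.1, 4.0, 4.9]:
    assert p(t) * k < 1.0
```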
Exercise 4.13
A young investor has earned in the stock market a large amount of money S and plans to spend it so as to maximize his enjoyment through the rest of his life without working. He estimates that he will live exactly T more years and that his capital x(t) should be reduced to zero at time T, i.e., x(T) = 0. Also, he models the evolution of his capital by the differential equation

ẋ(t) = αx(t) − u(t) (270)

where x(0) = S is his initial capital, α > 0 is a given interest rate, and u(t) ≥ 0 is his rate of expenditure. The total enjoyment he will obtain is given by

∫_0^T e^{−βt} √(u(t)) dt (271)

Here β is some positive scalar, which serves to discount future enjoyment. Find the optimal control u*(t), t ∈ [0, T].
Solution:
We recast the problem as

u* := argmin_u − ∫_0^T e^{−βt} √(u(t)) dt (272)

subject to (273)

ẋ(t) = αx(t) − u(t), x(0) = S, x(T) = 0 (274)

where minimizing the negative enjoyment is the same as maximizing the positive one.
We first calculate the Hamiltonian as in Equation (311), with the simplified notation x = x(t), u = u(t), p = p(t), considering that the problem has scalar variables only:

H(t, x, u, p) = p(αx − u) + e^{−βt}√u (275)
Solving the costate equation ṗ = −H_x = −αp, we get p(t) = c₁e^{−αt}. The maximizing u* is a stationary point of H in u with nonpositive second derivative. Thus:

dH/du = (1/2) e^{−βt} u^{−1/2} − p = 0 (279)

=⇒ u*(t) = (1/(4p²)) e^{−2βt} (280)

also

d²H/du² = −(1/4) e^{−βt} u^{−3/2} ≤ 0 (281)

so this stationary point is indeed a maximum. Substituting p(t) = c₁e^{−αt}:

u*(t) = (1/(4c₁²)) e^{(2α−2β)t} (283)

In order to obtain the optimal solution, we need to solve the linear ODE given by the state equation:

ẋ = αx − (1/(4c₁²)) e^{(2α−2β)t} (284)

The solution of the ODE will be of the form x(t) = x_h(t) + x_p(t), where x_h(t) is the homogeneous solution and x_p(t) is a particular solution. The homogeneous solution is:

x_h(t) = c₂ e^{αt}, c₂ = constant (285)

The particular solution can be divided into two cases:
1. Case α ≠ 2β: solving the ODE, we get the general solution:

x(t) = x_h(t) + x_p(t) = c₂ e^{αt} − (1/(4c₁²(α − 2β))) e^{(2α−2β)t} (286)
Given the conditions x(0) = S and x(T) = 0, we get the unknown coefficients:

1/(4c₁²) = −S(α − 2β)/(1 − e^{(α−2β)T}) (287)

c₂ = −S e^{(α−2β)T}/(1 − e^{(α−2β)T}) (288)

2. Case α = 2β: the particular solution is x_p(t) = −(t/(4c₁²)) e^{αt}, and imposing x(0) = S and x(T) = 0 gives c₂ = S and 1/(4c₁²) = S/T.
Summarizing, the optimal control is:

u*(t) = S(2β − α)/(1 − e^{(α−2β)T}) · e^{(2α−2β)t}, for α ≠ 2β
u*(t) = (S/T) e^{(2α−2β)t} = (S/T) e^{αt}, for α = 2β (292)
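The α = 2β branch of (292) can be checked directly (a sketch; S, T, α are hypothetical sample values):

```python
import math

# Check the alpha = 2*beta branch of (292): u*(t) = (S/T) e^{alpha t}.
# Then x(t) = S (1 - t/T) e^{alpha t} solves xdot = alpha x - u
# with x(0) = S and x(T) = 0.
S, T, alpha = 10.0, 4.0, 0.3
u = lambda t: S / T * math.exp(alpha * t)
x = lambda t: S * (1 - t / T) * math.exp(alpha * t)

assert abs(x(0.0) - S) < 1e-12
assert abs(x(T)) < 1e-12

h = 1e-6
for t in [0.5, 2.0, 3.5]:
    xdot = (x(t + h) - x(t - h)) / (2 * h)   # central difference
    assert abs(xdot - (alpha * x(t) - u(t))) < 1e-6
```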
Exercise 4.14
Solve the problem

u* := argmin_{u:[0,T]→R} J(u) = ∫_0^T √(1 + u(t)²) dt (293)

subject to (294)

ẋ(t) = u(t), x(0) = x₀ (295)

where x₀ ∈ R is an initial state and T > 0 is a fixed final time.
Solution:
For simplicity, let us consider the simplified notation x = x(t), u = u(t) and p = p(t). We first calculate the Hamiltonian as

H(t, x, u, p, p₀) = ⟨p, f(t, x, u)⟩ + p₀ L(t, x, u) (296)
Given that the problem has scalar variables only, we can write the Hamiltonian (with p₀ = −1) as:

H(t, x, u, p) = pu − √(1 + u²) (297)

Since the function is concave in u, we can find the maximizer of the Hamiltonian by setting the derivative with respect to u equal to zero and considering the positive value for u:

∂H/∂u = p − u/√(1 + u²) = 0 (298)

=⇒ p² + p²u² = u² (299)

u² = p²/(1 − p²) (300)

=⇒ u = √(p²/(1 − p²)) (301)

The costate equation gives

ṗ* = −H_x = −∂/∂x [ pu − √(1 + u²) ] = 0 (303)

=⇒ p*(t) = K, where K ∈ R (304)

Moreover, given that this is a fixed-time, free-endpoint problem, we can also apply the boundary condition p*(T) = 0, since the terminal cost K is 0. Therefore, since the optimal p*(t) is constant, K = 0 and we can write p*(t) = 0, ∀t ∈ [0, T]. The optimal control then is:

u*(t) = √( p*(t)²/(1 − p*(t)²) ) (306)

=⇒ u*(t) = 0, ∀t ∈ [0, T] (307)

In this case, given that ẋ(t) = u(t) = 0, the state variable is constant; thus x(t) = x(0) = x₀, ∀t ∈ [0, T].
Exercise 4.15
Solve the problem

u* := argmin_{u:[0,T]→R} J(u) = ∫_0^T √(1 + u(t)²) dt (308)

subject to (309)

ẋ(t) = u(t), x(0) = x₀, x(T) = x₁ (310)

Solution:
Similarly to the previous exercise, let us consider the simplified notation x = x(t), u = u(t) and p = p(t). We first calculate the Hamiltonian as

H(t, x, u, p, p₀) = ⟨p, f(t, x, u)⟩ + p₀ L(t, x, u) (311)

Given that the problem has scalar variables only, we can write the Hamiltonian as:

H(t, x, u, p) = pu − √(1 + u²) (312)

Since the function is concave in u, we can find the maximizer of the Hamiltonian by setting the derivative with respect to u equal to zero:

∂H/∂u = p − u/√(1 + u²) = 0 (313)

=⇒ p² + p²u² = u² (314)

u² = p²/(1 − p²) (315)

=⇒ u = √(p²/(1 − p²)) (316)

The costate equation gives

ṗ* = −H_x = −∂/∂x [ pu − √(1 + u²) ] = 0 (318)

=⇒ p*(t) = K, where K ∈ R (319)
We now need to consider the condition x(T) = x₁. Hence, given that x(0) = x₀ and

ẋ = u (320)

we have

x(T) − x(0) = ∫_0^T u dt = ∫_0^T √( p(t)²/(1 − p(t)²) ) dt (321)

since p(t) = K ∈ R, p does not depend on time t, so (322)

= ∫_0^T √( p²/(1 − p²) ) dt = T √( p²/(1 − p²) ) (323)

=⇒ ( (x₁ − x₀)/T )² = p²/(1 − p²) (324)

Defining V := ((x₁ − x₀)/T)², we can solve for p² = V/(1 + V). Then, plugging this p* back into the equation for the optimal control, we get:

u*(t) = √( p²/(1 − p²) ) = √( (V/(1+V)) / (1 − V/(1+V)) ) = √( (V/(1+V)) · (1+V) ) = √V = (x₁ − x₀)/T (329)

This means that the optimal control is the constant velocity moving the system from x₀ to x₁.
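The constant control found in (329) gives cost T√(1 + u*²), the straight-line arc length; a perturbed control with the same endpoint should cost more (a sketch; x₀, x₁, T are hypothetical sample values):

```python
import math

x0, x1, T = 0.0, 3.0, 2.0
ustar = (x1 - x0) / T          # constant optimal control from (329)

def integrate(f, n=20000):
    # midpoint rule on [0, T]
    h = T / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

def cost(u):
    return integrate(lambda t: math.sqrt(1 + u(t) ** 2))

# A perturbed control with the same endpoint (the sine integrates to zero over [0, T])
u_pert = lambda t: ustar + math.sin(2 * math.pi * t / T)
assert abs(integrate(u_pert) - (x1 - x0)) < 1e-6      # still transfers x0 -> x1
assert abs(integrate(lambda t: ustar) - (x1 - x0)) < 1e-9

# The constant control achieves the smaller cost (straight-line arc length)
assert cost(lambda t: ustar) < cost(u_pert)
assert abs(cost(lambda t: ustar) - T * math.sqrt(1 + ustar ** 2)) < 1e-9
```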
Solution:
We can prove this by contradiction. Assume a < b and choose ε = (b − a)/2 > 0. By hypothesis, we have

a + (b − a)/2 ≥ b ⟺ a/2 ≥ b/2 ⟺ a ≥ b (331)

which contradicts our assumption. Hence, a ≥ b.
Exercise 5.5
Complete the proof of the principle of optimality by showing the reverse inequality V(t, x) ≤ V̄(t, x).
Solution:
After defining

V̄(t, x) := inf_{u:[t,t+Δt]→U} { ∫_t^{t+Δt} L(s, x(s), u(s)) ds + V(t + Δt, x(t + Δt)) } (332)

we have shown that V(t, x) ≥ V̄(t, x). We can finish the proof of the equality by proving that V(t, x) ≤ V̄(t, x). By definition of the value function, for any control u on [t, t + Δt] with trajectory x(·), and any ε > 0, we can pick an ε-optimal continuation on the remaining interval, obtaining

V(t, x) ≤ ∫_t^{t+Δt} L(s, x(s), u(s)) ds + V(t + Δt, x(t + Δt)) + ε

Since ε > 0 is arbitrary and the bound holds for every u on [t, t + Δt], taking the infimum of the right-hand side gives

V(t, x) ≤ V̄(t, x) (338)

Thus, we have shown that V(t, x) ≤ V̄(t, x). Combined with the already proved V(t, x) ≥ V̄(t, x), we have shown the equality V(t, x) = V̄(t, x).
Consider the problem

u* := argmin_u J(u) (340)

subject to (341)

ẋ(t) = Ax(t) + Bu(t), x(0) = x₀ (342)

where

J(u) := (x(T) − x_d(T))ᵀ Q_f (x(T) − x_d(T)) + ∫_0^T [ (x(t) − x_d(t))ᵀ Q (x(t) − x_d(t)) + (u(t) − u_d(t))ᵀ R (u(t) − u_d(t)) ] dt (343)
Find a formulation of an optimal policy and the corresponding HJB equation (similar to the Riccati equation) using the sufficient condition.
Hint: consider a candidate value function of the following form
Solution:
Following the hint, we take the quadratic candidate V̂(x, t) = xᵀS₂(t)x + xᵀs₁(t) + s₀(t), whose partial derivatives are

∂V̂(x, t)/∂t = xᵀṠ₂(t)x + xᵀṡ₁(t) + ṡ₀(t)    (347)
∂V̂(x, t)/∂x = 2xᵀS₂(t) + s₁(t)ᵀ    (348)
and then we can rearrange the HJB equation by substituting. We notice that, given the problem's convexity, we can obtain a global minimum u*(t) and therefore replace the infimum with a minimum:

−( xᵀṠ₂(t)x + xᵀṡ₁(t) + ṡ₀(t) ) = min_{u∈ℝᵐ} { (x − x_d(t))ᵀ Q (x − x_d(t)) + (u − u_d(t))ᵀ R (u − u_d(t)) + (2xᵀS₂(t) + s₁(t)ᵀ)(Ax + Bu) }

We can also set the first derivative of the right-hand side with respect to u to zero as a sufficient condition for finding the optimal control policy:

∂/∂u = 2(u − u_d(t))ᵀR + (2xᵀS₂(t) + s₁ᵀ(t))B = 0    (349)
⟹ u*(t) = u_d(t) − R⁻¹Bᵀ( S₂(t)x + ½ s₁(t) )    (350)
Plugging this result back into the HJB equation and matching coefficients, we get the conditions for the coefficients, with terminal conditions:

S₂(T) = Q_f    (355)
s₁(T) = −2Q_f x_d(T)    (356)
s₀(T) = x_d(T)ᵀ Q_f x_d(T)    (357)

We also notice that the solution for S₂ is the same as in the simpler LQR problem; hence, as expected, it is also positive semidefinite.
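The S₂ coefficient can also be checked numerically. The sketch below (using a toy 2×2 system of our own choosing, not the exercise's data) integrates the standard finite-horizon Riccati ODE Ṡ₂ = −Q − AᵀS₂ − S₂A + S₂BR⁻¹BᵀS₂ backward from S₂(T) = Q_f and confirms that, over a long horizon, S₂(0) approaches the ARE solution:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Toy data chosen for illustration (not the exercise's system)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
Qf = np.eye(2)
T = 10.0

def riccati_rhs(t, s):
    # dS2/dt = -Q - A^T S2 - S2 A + S2 B R^{-1} B^T S2
    S = s.reshape(2, 2)
    dS = -Q - A.T @ S - S @ A + S @ B @ np.linalg.inv(R) @ B.T @ S
    return dS.ravel()

# Integrate backward in time from S2(T) = Qf down to t = 0
sol = solve_ivp(riccati_rhs, (T, 0.0), Qf.ravel(), rtol=1e-8, atol=1e-10)
S2_0 = sol.y[:, -1].reshape(2, 2)

# Over a long horizon, S2(0) approaches the ARE solution
P = solve_continuous_are(A, B, Q, R)
print(np.allclose(S2_0, P, atol=1e-4))  # True
```

This backward integration is the usual way the finite-horizon coefficients are obtained in practice.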
u* := argmin_{u:[0,1]→ℝ} J(u) := ∫₀¹ [x(t)u(t)]² dt    (358)
subject to    (359)
ẋ(t) = x(t)u(t), t ∈ [0, 1]    (360)
x(0) = 1    (361)

Obtain the optimal feedback solution by solving the associated HJB equation.
Hint: First show that the HJB partial differential equation admits a solution that is quadratic in x.
Solution:
where we have substituted the infimum with a minimum due to the problem's convexity. We can then find the u(t) minimizing the equation by setting the first derivative with respect to u equal to 0 (we use the simplified notation here too):

∂/∂u [ x²u² + V̂_x xu ] = 0    (364)
⟹ 2ux² + V̂_x x = 0    (365)
⟹ u*(t) = −V̂_x(t, x)/(2x)    (366)
This first-order nonlinear ordinary differential equation yields the following solution:

s(t) = 1/(c₁ + t)    (377)

In order to find the value of s(t), we apply the boundary condition for optimality at the final time t_final = 1, which is

s(1) = 0    (379)

Combined with the form of the solution above, the only way to obtain 0 is to let c₁ → ∞, which forces s(t) ≡ 0. Hence s(t) is indeed independent of time: it is the constant s = 0, and the optimal value function V̂(t, x) = s x² = 0 is therefore independent of time as well.

Since u*(t) = −V̂_x/(2x) = 0 and thus ẋ(t) = x(t)u(t) = 0, the system keeps the initial state: x(t) = x(0) = 1 for all t ∈ [0, 1].
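This conclusion is easy to check numerically: u ≡ 0 attains cost 0, which no other control can beat, since the running cost (xu)² is nonnegative. A minimal forward-Euler sketch (the random comparison control is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
dt = 1.0 / N

def J(u):
    # Forward-Euler simulation of xdot = x u from x(0) = 1,
    # accumulating the running cost (x u)^2
    x, cost = 1.0, 0.0
    for uk in u:
        cost += (x * uk) ** 2 * dt
        x += x * uk * dt
    return cost

print(J(np.zeros(N)))               # 0.0: the optimal cost, x(t) stays at 1
print(J(rng.normal(size=N)) > 0.0)  # True: any other control costs more
```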
ẋ(t) = A x(t) + B u(t), with

A = [[0, 1, 0, 0],
     [0, 0, −mg/M, 0],
     [0, 0, 0, 1],
     [0, 0, (M+m)g/(Ml), 0]],
B = [0, 1/M, 0, −1/(Ml)]ᵀ    (383)

where x₁(t) = x(t), x₂(t) = ẋ(t), x₃(t) = θ(t), and x₄(t) = θ̇(t).
Consider the LQR problem

u* := argmin_{u:[0,∞)→ℝᵐ} J(u) := ∫₀^∞ [ x(t)ᵀQx(t) + u(t)ᵀRu(t) ] dt    (384)
subject to    (385)
ẋ(t) = Ax(t) + Bu(t), x(0) = x₀    (386)

where A and B are the system matrices above, with m = 1, M = 2, l = 3 and g = 9.8.
Tasks:
1. Appropriately choose the weights Q ⪰ 0 and R > 0 to maintain the pendulum at the vertical position, i.e., θ(t) = 0 and θ̇(t) = 0, as much as possible.
2. Find an optimal policy for the LQR problem using Python or Matlab functions.
3. Plot trajectories of the state x(t) and the control input u(t) over a certain time interval to demonstrate the performance of your optimal control policy.
4. In the answer, please include your Python or Matlab code.
5. Change the weights, observe how the performance changes, and discuss the results.
Solution:
We want to keep the pendulum in the upright position as much as possible. Therefore, we choose the values of Q and R as follows:
• Matrix Q: the values on the diagonal of Q act as "penalties" on the corresponding state variables, i.e., we want θ and θ̇ to be as small as possible, hence they get a greater penalty. The designed matrix is:

Q = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 100, 0],
     [0, 0, 0, 100]]    (387)

• Matrix R: the weight on the control effort. As the code below shows, we use a small scalar weight ρ = 0.1, i.e., R = ρI, allowing relatively aggressive control inputs.

# Import libraries
import numpy as np
import scipy.linalg

# Parameters
m = 1  # pendulum mass
M = 2  # cart mass
l = 3  # pendulum length
g = 9.8  # gravity acceleration

# Build matrices
A = np.array([[0, 1, 0, 0],
              [0, 0, -m*g/M, 0],
              [0, 0, 0, 1],
              [0, 0, (M+m)*g/(M*l), 0]])
B = np.array([[0], [1/M], [0], [-1/(M*l)]])
rho = 0.1
Q = np.matrix([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 100, 0],
               [0, 0, 0, 100]])
R = rho * np.eye(1)  # control dimension is 1
Then we define the LQR controller as follows, where X is the solution to the Riccati equation solved via the Python library SciPy. The closed-loop gain K is then defined as:

K = R⁻¹BᵀX    (389)

Python code:

def LQR(A, B, Q, R):
    """Solve the continuous time lqr controller.
    dx/dt = A x + B u
    cost = integral x.T*Q*x + u.T*R*u
    """
    # ref Bertsekas, p.151
    # solve the continuous algebraic Riccati equation
    X = scipy.linalg.solve_continuous_are(A, B, Q, R)
    # compute the LQR gain K = R^{-1} B^T X
    K = np.linalg.inv(R) @ (B.T @ X)
    # closed-loop eigenvalues
    eigVals = np.linalg.eigvals(A - B @ K)
    return K, X, eigVals

K, X, eigVals = LQR(A, B, Q, R)
In order to simulate the system, we design a class simulating the inverted pendulum and its physical behavior in time, which is discretized in steps of τ = 0.02 s:

class ControlledPendulum():
    ''' Continuous version of the OpenAI Gym cartpole '''
    def __init__(self, M, m, l, tau=0.02, g=9.81):
        self.gravity = g
        self.masscart = M
        self.masspole = m
        self.total_mass = (self.masspole + self.masscart)
        self.length = l  # actually half the pole's length
        self.polemass_length = (self.masspole * self.length)
        self.force_mag = 30.0
        self.tau = tau  # seconds between state updates
        self.state = None  # Initialize through reset
We can now start the simulation. We set up the model and set as initial state:

x₀ = [1, 0, 0, 0]ᵀ    (391)

# Control loop
for i in range(500):
    u = -K.dot(x)      # the control input is u* = -Kx
    x = model.step(u)  # propagate
    trajectory.append(x)
    controls.append(u)
Position and velocity plots We use the following code for plotting:

# Trajectory plotting
import matplotlib.pyplot as plt
We get the result shown in Figure 2: the controller is able to stabilize both the position and the velocity of the system.
Angular position and angular velocity plots With an analogous plotting snippet we obtain Figure 3, from which we can infer that the controller is able to stabilize both the angular position and the angular velocity of the system, and that their values are indeed very close to 0.
We can see from Figure 4 that the system uses a control input u(t) which stabilizes the state. Moreover, as expected, lim_{t→∞} u(t) = 0.
Changing the weights We can now change the weights to see how the system performance changes. Say we want to penalize the position variation more, in order to get a faster stabilization of the position and velocity. We design the matrix Q as:

Q = [[100, 0, 0, 0],
     [0, 100, 0, 0],
     [0, 0, 100, 0],
     [0, 0, 0, 100]]    (393)

while keeping all the other parameters the same for a fair comparison. We get the control input gain as:

K = [−31.6227766, −69.00397721, −612.25803618, −334.32523211]    (394)

Compared with the previous gain, the absolute values of the entries are higher; this results in a more "aggressive" control. The plots follow, in the same order as in the previous experiment.
These results confirm our hypothesis: the position and velocity stabilize faster, at the cost of a larger variation of the angular position and velocity. Moreover, since the weight for the control input was chosen as 0.1, the control input's absolute value is much larger than before: the first was in the range u(t) ∈ (−4, 2), while the new control input has a much wider range, around u(t) ∈ (−40, 20). Thus, a more aggressive controller uses more energy for a faster stabilization.
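The comparison between the two weight choices can be reproduced compactly with SciPy alone, independently of the simulation class above (a sketch; `lqr_gain` is a helper name of our own):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

m, M, l, g = 1.0, 2.0, 3.0, 9.8
A = np.array([[0, 1, 0, 0],
              [0, 0, -m * g / M, 0],
              [0, 0, 0, 1],
              [0, 0, (M + m) * g / (M * l), 0]])
B = np.array([[0.0], [1 / M], [0.0], [-1 / (M * l)]])
R = 0.1 * np.eye(1)

def lqr_gain(Q):
    # K = R^{-1} B^T X, with X the ARE solution
    X = solve_continuous_are(A, B, Q, R)
    return np.linalg.inv(R) @ B.T @ X

K_soft = lqr_gain(np.diag([1.0, 1.0, 100.0, 100.0]))      # original weights
K_hard = lqr_gain(np.diag([100.0, 100.0, 100.0, 100.0]))  # heavier position weight

# The heavier weighting yields a larger ("more aggressive") gain,
# and the closed loop remains stable
print(np.linalg.norm(K_hard) > np.linalg.norm(K_soft))    # True
print(np.all(np.linalg.eigvals(A - B @ K_hard).real < 0))  # True
```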
Solution:
In order to solve the Algebraic Riccati Equation, we have to find the matrices defining the system. In this case, we have:

A = [[0, 1], [0, 0]],  B = [[0], [1]],  Q = [[1, 0], [0, 1]],  R = 1    (398)

We can now use Python's SciPy library to solve the ARE:

# Build matrices
A = np.matrix([[0, 1],
               [0, 0]])
B = np.matrix([[0], [1]])
Q = np.eye(2)  # Identity matrix, dim = 2
R = np.eye(1)  # Identity matrix, dim = 1

by which we get

P = [[1.73205081, 1], [1, 1.73205081]] ≈ [[√3, 1], [1, √3]]    (399)
By substituting this P into the Riccati Differential Equation, we get

Ṗ = [[0, 0], [0, 0]]    (401)

hence proving that P is indeed the steady-state solution of the RDE.
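We can also verify by direct substitution that this P solves the ARE, AᵀP + PA − PBR⁻¹BᵀP + Q = 0:

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
s3 = np.sqrt(3.0)
P = np.array([[s3, 1.0], [1.0, s3]])

# ARE residual: should be the 2x2 zero matrix
residual = A.T @ P + P @ A - P @ B @ np.linalg.inv(R) @ B.T @ P + Q
print(np.allclose(residual, np.zeros((2, 2))))  # True
```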
Exercise 6.24
Consider the infinite-horizon LQR problem

u* := argmin_{u:[0,∞)→ℝ} J(u) := ∫₀^∞ [ qx(t)² + ru(t)² ] dt    (402)
subject to    (403)
ẋ(t) = ax(t) + bu(t), t ∈ [t₀, ∞)    (404)
x(t₀) = x₀    (405)

where a, q, r > 0 and b ∈ ℝ is arbitrary. Find the optimal control law. Moreover, show that as r → 0 the eigenvalue of the optimal closed-loop system moves off to −∞, while as r → ∞ the eigenvalue of the optimal closed-loop system tends to −a.
Solution:
with P being the solution to the ARE. In this case, we have scalars instead of matrices, so the ARE becomes

0 = aP + Pa + q − Pb(1/r)bP    (409)
⟹ −b²P²/r + 2aP + q = 0    (410)

Solving the quadratic equation and choosing the positive sign (since we have the condition that P ≥ 0), we get:

P = r( a + √(a² + b²q/r) ) / b²    (411)

The eigenvalue of the closed-loop system is the scalar a − Kb, where K = r⁻¹bP. Substituting,

a − Kb = a − b²P/r = a − ( a + √(a² + b²q/r) ) = −√(a² + b²q/r)

so as r → 0 the eigenvalue moves off to −∞, while as r → ∞ it tends to −√(a²) = −a, as required.
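These limits are easy to check numerically (a = 1, b = 2, q = 3 are example values of our own choosing):

```python
import numpy as np

a, b, q = 1.0, 2.0, 3.0  # example values with a, q > 0, chosen arbitrarily

def closed_loop_eig(r):
    # P is the positive root of -b^2 P^2 / r + 2 a P + q = 0
    P = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
    K = b * P / r                # scalar LQR gain K = r^{-1} b P
    return a - b * K             # equals -sqrt(a^2 + b^2 q / r)

print(closed_loop_eig(1e-8) < -1e4)             # True: eigenvalue -> -inf as r -> 0
print(abs(closed_loop_eig(1e8) - (-a)) < 1e-3)  # True: eigenvalue -> -a as r -> inf
```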
where

A = [[−10, 0, 0, 0, 0, 0],
     [0.0729, −0.0558, −0.997, 0.0802, 0.0415, 0],
     [−4.75, 0.598, −0.115, −0.0318, 0, 0],
     [1.53, −3.05, 0.388, −0.465, 0, 0],
     [0, 0, 0.0805, 1, 0, 0],
     [0, 0, 1, 0, 0, −0.3333]],
B = [1, 0, 0, 0, 0, 0]ᵀ    (419)

and

C = [0, 0, 1, 0, 0, −0.3333],  D = 0    (420)

Minimize the sum of the energy of the output y and the energy of the control u. The main effort is to minimize the energy of y, which is supposed to be zero in a steady-state condition. So we put a weight q = 9.527 > 1 on the energy of y. The problem is now as follows.
u* := argmin_{u:[0,∞)→ℝ} J(u) := ∫₀^∞ [ q y(t)ᵀy(t) + u(t)² ] dt    (421)
subject to    (422)
ẋ(t) = Ax(t) + Bu(t), t ∈ [t₀, ∞)    (423)
y(t) = Cx(t) + Du(t), t ∈ [t₀, ∞)    (424)
x(t₀) = x₀    (425)
Tasks:
1. Find an optimal policy for the LQR problem by solving the ARE using Python or Matlab functions.
2. Plot trajectories of y(t) and u(t).
3. In the answer, please include your Python or Matlab code.
Solution:
First of all, we need to find the matrix Q. Given D = 0, the output energy is

q y(t)ᵀ y(t) = q (Cx(t))ᵀ (Cx(t)) = x(t)ᵀ (q CᵀC) x(t)

hence we can consider Q = q CᵀC. We solve the problem in Python. The first step is declaring all the variables:

import numpy as np
import scipy.linalg

# Build matrices
A = np.matrix([[-10, 0, 0, 0, 0, 0],
               [0.0729, -0.0558, -0.997, 0.0802, 0.0415, 0],
               [-4.75, 0.598, -0.115, -0.0318, 0, 0],
               [1.53, -3.05, 0.388, -0.465, 0, 0],
               [0, 0, 0.0805, 1, 0, 0],
               [0, 0, 1, 0, 0, -0.3333]])
B = np.matrix([[1], [0], [0], [0], [0], [0]])
C = np.matrix([0, 0, 1, 0, 0, -0.3333])
D = 0*np.eye(1)
q = 9.527
Q = q * C.T * C  # output weight lifted to the state: Q = q C^T C
R = np.eye(1)    # weight on the scalar control input
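As a quick sanity check on this construction (using a random state of our own choosing), the lifted weight reproduces the output energy exactly:

```python
import numpy as np

q = 9.527
C = np.array([[0.0, 0.0, 1.0, 0.0, 0.0, -0.3333]])
Q = q * C.T @ C  # lifted state weight, since D = 0 implies y = C x

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 1))
y = C @ x
# q * y^T y equals x^T Q x for any state x
print(np.isclose(q * float(y.T @ y), float(x.T @ Q @ x)))  # True
```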
Then we define the LQR controller as follows, where X is the solution to the Riccati equation solved via the Python library SciPy. The closed-loop gain K is then defined as:

K = R⁻¹BᵀX    (428)

Python code:

def LQR(A, B, Q, R):
    """Solve the continuous time lqr controller.
    dx/dt = A x + B u
    cost = integral x.T*Q*x + u.T*R*u
    """
    # ref Bertsekas, p.151
    # solve the continuous algebraic Riccati equation
    X = scipy.linalg.solve_continuous_are(A, B, Q, R)
    # compute the LQR gain K = R^{-1} B^T X
    K = np.linalg.inv(R) @ (B.T @ X)
    # closed-loop eigenvalues
    eigVals = np.linalg.eigvals(A - B @ K)
    return K, X, eigVals

K, X, eigVals = LQR(A, B, Q, R)
By running the code with our designed controller we get the gain

K = [1.05951967, −0.19104882, −2.31972318, 0.09916995, 0.03704914, 0.48581648]    (429)

In order to simulate the system, we design a class simulating the lateral model of the Boeing 747 and its physical behavior in time, discretized in steps of τ = 0.02 s:
class ControlledBoeing747():
    def __init__(self, A, B, C, D, dt=0.02):
        """ Simulate the system by calculating the state
        variation with: x_{t+1} = x_t + dx*dt
        where dx = Ax + Bu and y = Cx + Du
        """
We set as initial condition the following x₀, drawn with NumPy's random number generator:

# Declare model
model = ControlledBoeing747(A, B, C, D)

# Initial condition
x0 = np.matrix(np.random.uniform(-0.5, 0.5, size=6)).T
model.reset(x_initial=x0)

and then we run for 1000 time steps, equivalent to a simulation of 20 seconds:
# Trajectory plotting
import matplotlib.pyplot as plt

y, u = [], []
for i in range(len(trajectory_outputs)):
    y.append(trajectory_outputs[i].item())
    u.append(controls[i].item())
Control input plot We use the following code for plotting the control input:
As we can see from the graphs, the LQR controller is able to stabilize the output and bring it to steady state within the time span we set. Moreover, as expected, the control input satisfies lim_{t→∞} u*(t) = 0, and the output is zeroed out in steady state.