8-Data Driven MPC
Learning-based MPC
Alberto Bemporad
imt.lu/ab
Hybrid MPC
Stochastic MPC
Course page:
http://cse.lab.imtlucca.it/~bemporad/mpc_course.html
[Figure: evolution of control methods from 1900 to >2020 (source: https://books.google.com/ngrams). Methods shown: mechanical design, complex and functional analysis, root locus, linear algebra, state-space, pole placement, Kalman filtering, Lyapunov methods, nonlinear control, stability analysis, feedback synthesis, robust control, LMI-based methods, semidefinite programming, numerical optimization, model predictive control (MPC), and, with a question mark, machine learning (ML). Inset: MPC receding-horizon scheme (past/future, reference r(t), outputs yk, inputs uk, predicted outputs over t, t+k, t+N)]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 4/85
Machine Learning (ML)
• Massive set of techniques to extract mathematical models from data
  - Unsupervised learning
      - Dimensionality reduction: linear PCA, nonlinear PCA, autoencoders, …
      - Clustering: K-means clustering, density-based spatial clustering, …
  - Supervised learning
      - Classification: ridge classification, logistic regression, naïve Bayes classification, …
      - Regression: linear regression (least-squares, ridge regression, Lasso, elastic-net), kernel least-squares, support vector regression, Gaussian process regression, …
      - Classification and regression: support vector machines, K-nearest neighbors, decision trees, ensemble methods (bagging, bootstrap, random forests), neural networks, …
  - Semi-supervised learning
  - Reinforcement learning
– Policy gradient methods: learn optimal policy coefficients directly from data using stochastic gradient descent
[Figure: data → model → prediction (x ↦ y); image of Gauss]
• Gray-box (or physics-informed) models: a mix of white-box (physics-based) and black-box (data-driven) models, which can be quite effective
– I/O data only: set xt = value of an inner layer of the network (Prasad, Bequette, 2003),
such as an autoencoder (Masti, Bemporad, 2021)
• Alternative for MPC: learn the entire prediction (Masti, Smarra, D'Innocenzo, Bemporad, 2020)
[Figure: MPC loop (model-based optimizer, process, measurements, state estimator; image: aecdiagnostics.com) and workflow: 1. collect, 2. train, 3. codegen, 4. deploy]
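$\mathrm{C_2H_4} + \tfrac{1}{2}\mathrm{O_2} \rightarrow \mathrm{C_2H_4O}$
$\mathrm{C_2H_4} + 3\,\mathrm{O_2} \rightarrow 2\,\mathrm{CO_2} + 2\,\mathrm{H_2O}$
$\mathrm{C_2H_4O} + \tfrac{5}{2}\mathrm{O_2} \rightarrow 2\,\mathrm{CO_2} + 2\,\mathrm{H_2O}$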
x1 = gas density, x2 = ethylene concentration, x3 = ethylene oxide concentration, x4 = temperature in reactor
u1 = feed volumetric flow rate, u2 = ethylene concentration in feed

$$\dot x_1 = u_1 (1 - x_1 x_4)$$
$$\dot x_2 = u_1 (u_2 - x_2 x_4) - A_1 e^{\gamma_1/x_4} (x_2 x_4)^{1/2} - A_2 e^{\gamma_2/x_4} (x_2 x_4)^{1/4}$$
$$\dot x_3 = -u_1 x_3 x_4 + A_1 e^{\gamma_1/x_4} (x_2 x_4)^{1/2} - A_3 e^{\gamma_3/x_4} (x_3 x_4)^{1/2}$$
$$\dot x_4 = \frac{1}{x_1}\Big[u_1 (1 - x_4) + B_1 e^{\gamma_1/x_4} (x_2 x_4)^{1/2} + B_2 e^{\gamma_2/x_4} (x_2 x_4)^{1/4} + B_3 e^{\gamma_3/x_4} (x_3 x_4)^{1/2} - B_4 (x_4 - T_c)\Big]$$
$$y = x_3$$
Neural network model: $x_{k+1} = \mathcal{N}(x_k, u_k)$ (6 inputs, 4 outputs → 112 coefficients)
[Figure: fit error on validation samples; x3,k+1 is reproduced from xk, uk with max 0.4% error]
MPC of Ethylene Oxidation Plant
• MPC settings:
  sampling time         Ts = 5 s
  prediction horizon    N = 10
  control horizon       Nu = 3
  constraints           0.0704 ≤ u1 ≤ 0.7042
  cost function         $\sum_{k=0}^{N-1} (y_{k+1} - r_{k+1})^2 + \frac{1}{100}(u_{1,k} - u_{1,k-1})^2$
[Figure: closed-loop simulation results (three panels, time 0-250); measured disturbance at t = 200]
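The cost above can be evaluated for a candidate input sequence with the sketch below. It assumes the predictions y_{k+1} come from some model (for example the network N). It also assumes the input is held constant after the control horizon, a common blocking convention adopted here. The function names are hypothetical, and the sketch rejects sequences that violate the input bounds.

```python
U1_MIN, U1_MAX = 0.0704, 0.7042   # bounds on u1 from the MPC settings
N, NU = 10, 3                     # prediction horizon and control horizon

def expand_moves(free_moves, n=N):
    """Repeat the last free move so the sequence covers the whole prediction horizon."""
    free_moves = list(free_moves)
    return free_moves + [free_moves[-1]] * (n - len(free_moves))

def mpc_cost(y_pred, r, u_seq, u_prev, w_du=1.0 / 100.0):
    """sum_{k=0}^{N-1} (y_{k+1} - r_{k+1})^2 + w_du * (u_{1,k} - u_{1,k-1})^2.

    y_pred[k] and r[k] hold y_{k+1} and r_{k+1}; u_prev is u_{1,-1}, the last applied input.
    """
    if any(not (U1_MIN <= u <= U1_MAX) for u in u_seq):
        raise ValueError("input sequence violates 0.0704 <= u1 <= 0.7042")
    cost, last = 0.0, u_prev
    for y, ref, u in zip(y_pred, r, u_seq):
        cost += (y - ref) ** 2 + w_du * (u - last) ** 2
        last = u
    return cost

# Usage: three free moves expanded to N = 10 inputs, perfect tracking assumed
u_seq = expand_moves([0.30, 0.35, 0.40])
cost = mpc_cost([1.0] * N, [1.0] * N, u_seq, u_prev=0.30)
```

With perfect tracking, only the two nonzero input increments (0.05 each) contribute, weighted by 1/100.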
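[Figure: autoencoder-based state-space model. Encoder E maps past I/O data to the state xk, the state update map f gives x*k+1, and decoder D maps xk, x*k+1, xk+1 back to outputs (image: publicdomainvectors.org)]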
s.t. $x_k = e(I_{k-1})$, $k = k_0, \ldots, N$
$x^\star_{k+1} = f(x_k, u_k)$, $k = k_0, \ldots, N-1$
$\hat O_k = d(x_k)$, $O^\star_k = d(x^\star_k)$, $k = k_0, \ldots, N$
where $O_k = [y_k' \ \ldots \ y_{k-m}']'$ and $I_k = [y_k' \ \ldots \ y_{k-n_a+1}' \ u_k' \ \ldots \ u_{k-n_b+1}']'$
• Quasi-LPV structure for MPC: set $f(x_k, u_k) = A(x_k, u_k) \begin{bmatrix} x_k \\ 1 \end{bmatrix} + B(x_k, u_k) u_k$ and $y_k = C(x_k, u_k) \begin{bmatrix} x_k \\ 1 \end{bmatrix}$
($A_{ij}, B_{ij}, C_{ij}$ = feedforward NNs)
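The sketch below computes one step of the quasi-LPV model. It assumes the matrices are returned by user-supplied callables of z = (x_k, u_k). The functions `A_fun`, `B_fun`, and `C_fun` are hypothetical hand-written stand-ins for the trained feedforward networks A_ij, B_ij, C_ij. Once A, B, and C are evaluated, x_{k+1} is affine in x_k and u_k. Freezing them along a predicted trajectory therefore yields a linear prediction model, which is plausibly one reason the structure suits MPC.

```python
import math

def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def qlpv_step(x, u, A_fun, B_fun, C_fun):
    """x_{k+1} = A(x,u) [x; 1] + B(x,u) u,   y_k = C(x,u) [x; 1]."""
    z = list(x) + list(u)        # scheduling variables (x_k, u_k)
    x_ext = list(x) + [1.0]      # extended state [x_k; 1]
    Ax = matvec(A_fun(z), x_ext)
    Bu = matvec(B_fun(z), u)
    x_next = [a + b for a, b in zip(Ax, Bu)]
    y = matvec(C_fun(z), x_ext)
    return x_next, y

# Hypothetical stand-ins for the NN entries (n = 2 states, m = 1 input, p = 1 output)
def A_fun(z):
    return [[0.9, 0.1 * math.tanh(z[0]), 0.0],
            [0.0, 0.8, 0.05 * math.tanh(z[2])]]

def B_fun(z):
    return [[1.0], [0.5 * (1.0 + math.tanh(z[1]))]]

def C_fun(z):
    return [[1.0, 0.0, 0.0]]

x_next, y = qlpv_step([1.0, 2.0], [0.0], A_fun, B_fun, C_fun)
```

With constant A, B, C the same function reduces to an ordinary linear (affine) state-space model.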
$y_k = h_k(x_0, u_0, \ldots, u_{k-1})$, $k = 1, \ldots, N$
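$\min_\theta\ r(\theta) + \sum_{k=0}^{N-1} \ell(y_k, f(x_k, \theta))$, where $f$ is a feedforward neural network with layers $v_j = A_j f_{j-1}(v_{j-1}) + b_j$ and $\theta = (A_1, b_1, \ldots, A_L, b_L)$

RNN model:
$x_{k+1} = f_x(x_k, u_k, \theta_x)$
$y_k = f_y(x_k, \theta_y)$
where $f_x, f_y$ = feedforward neural networks

Training problem:
$\min_{\theta_x, \theta_y, x_0, x_1, \ldots, x_{N-1}}\ r(x_0, \theta_x, \theta_y) + \frac{1}{N} \sum_{k=0}^{N-1} \ell(y_k, f_y(x_k, \theta_y))$
s.t. $x_{k+1} = f_x(x_k, u_k, \theta_x)$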
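The equality constraints can be used to eliminate x_1, …, x_{N−1} by forward simulation (single shooting). The training objective then depends on (x_0, θ_x, θ_y) only. The sketch below does this for a hypothetical scalar RNN with f_x(x, u) = tanh(a x + b u) and f_y(x) = c x + d. It assumes ℓ is the squared error and omits the regularizer r(·).

```python
import math

def f_x(x, u, theta_x):
    """Hypothetical state update: one tanh neuron, theta_x = (a, b)."""
    a, b = theta_x
    return math.tanh(a * x + b * u)

def f_y(x, theta_y):
    """Hypothetical linear output map, theta_y = (c, d)."""
    c, d = theta_y
    return c * x + d

def simulate(x0, theta_x, theta_y, u_seq):
    """Return y_0, ..., y_{N-1} with hidden states from x_{k+1} = f_x(x_k, u_k, theta_x)."""
    x, ys = x0, []
    for u in u_seq:
        ys.append(f_y(x, theta_y))
        x = f_x(x, u, theta_x)
    return ys

def rnn_loss(x0, theta_x, theta_y, u_seq, y_seq):
    """(1/N) sum_k (y_k - f_y(x_k, theta_y))^2 with the hidden states simulated forward."""
    y_hat = simulate(x0, theta_x, theta_y, u_seq)
    return sum((y - yh) ** 2 for y, yh in zip(y_seq, y_hat)) / len(y_seq)

# Usage: data generated by the model itself give zero loss at the true parameters
u_seq = [math.sin(0.3 * k) for k in range(50)]
y_seq = simulate(0.1, (0.7, 0.5), (2.0, -0.1), u_seq)
loss_true = rnn_loss(0.1, (0.7, 0.5), (2.0, -0.1), u_seq, y_seq)
loss_wrong = rnn_loss(0.1, (0.2, 0.5), (2.0, -0.1), u_seq, y_seq)
```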
• Main issue: the states xk are hidden, i.e., they are unknowns of the problem
• The RNN and its hidden state xk can be estimated online from a streaming dataset
{uk, yk}, and/or offline by processing multiple epochs of a given dataset
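Below is a minimal extended Kalman filter (EKF) sketch of online estimation. It uses a hypothetical scalar model x_{k+1} = θ x_k + u_k, y_k = x_k in place of an RNN. The parameter is appended to the state as a random walk, so the filter estimates the hidden state and the parameter jointly from the streaming pairs (u_k, y_k). The noise levels q and r are tuning assumptions.

```python
import random

def ekf_train(u_seq, y_seq, theta0, x0=0.0, q=1e-8, r=1e-2):
    """EKF on the augmented state (x, theta) of x_{k+1} = theta*x_k + u_k, y_k = x_k."""
    x, th = x0, theta0
    P = [[1.0, 0.0], [0.0, 1.0]]              # initial covariance of (x, theta)
    for u, y in zip(u_seq, y_seq):
        # Measurement update with H = [1, 0]
        s = P[0][0] + r                        # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s      # Kalman gains for x and theta
        e = y - x                              # innovation
        x, th = x + k0 * e, th + k1 * e
        P = [[(1.0 - k0) * P[0][0], (1.0 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        # Time update: theta is a random walk, Jacobian F = [[theta, x], [0, 1]]
        F = [[th, x], [0.0, 1.0]]
        x = th * x + u
        FP = [[sum(F[i][m] * P[m][j] for m in range(2)) for j in range(2)]
              for i in range(2)]
        P = [[sum(FP[i][m] * F[j][m] for m in range(2)) + (q if i == j else 0.0)
              for j in range(2)] for i in range(2)]
    return x, th

# Usage: noise-free data from a "true" parameter 0.8, filter started at 0.2
random.seed(0)
theta_true, x_true = 0.8, 0.0
u_seq, y_seq = [], []
for _ in range(200):
    u = random.uniform(-1.0, 1.0)
    y_seq.append(x_true)                       # y_k = x_k
    u_seq.append(u)
    x_true = theta_true * x_true + u
x_hat, theta_hat = ekf_train(u_seq, y_seq, theta0=0.2)
```

Offline use would simply rerun the same filter over the dataset for several epochs, starting each pass from the previous estimates.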
• N = 2000 samples used for training, 1499 for testing the model
• Same data as used in the NNARX modeling demo of the SYS-ID Toolbox for MATLAB
[Figure: MSE loss versus epoch (two panels, 0-20 and 0-500 epochs)]
Training RNNs by EKF - Examples
• Compare BFR with respect to the NNARX model (SYS-ID TBX):
[Figure: test data, open-loop simulation (on a model instance), samples 0-1500]
• Repeat training with ℓ1-penalty $\tau \left\| \begin{bmatrix} \theta_x \\ \theta_y \end{bmatrix} \right\|_1$
[Figure: BFR (%) on training and test data, and percentage of zeros in θx, θy, versus the ℓ1-regularization parameter τ (10^-6 to 10^-3)]
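The BFR formula is not spelled out here. A common definition, assumed in the sketch below, is BFR = 100 · max(0, 1 − ‖y − ŷ‖₂ / ‖y − ȳ‖₂), where ȳ is the mean of the measured output. The helper `percent_zeros` (a hypothetical name) computes the sparsity measure that is plotted against τ.

```python
import math

def bfr(y, y_hat):
    """Best fit rate in percent: 100 * max(0, 1 - ||y - y_hat|| / ||y - mean(y)||)."""
    y_mean = sum(y) / len(y)
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))
    den = math.sqrt(sum((a - y_mean) ** 2 for a in y))
    return 100.0 * max(0.0, 1.0 - num / den)

def percent_zeros(weights, tol=0.0):
    """Percentage of parameters driven to (numerical) zero, e.g. by the l1 penalty."""
    return 100.0 * sum(1 for w in weights if abs(w) <= tol) / len(weights)
```

A perfect open-loop simulation gives 100%, and predicting the mean of y gives 0%.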
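• Linearized model: given a current guess $\theta_x^h, \theta_y^h, x_0^h, \ldots, x_{N-1}^h$, approximate the state trajectory to first order through the sensitivity matrices
$M_{0x} = I$, $M_{0\theta_x} = 0$
$M_{(k+1)x} = \nabla_x f_x(x_k^h, u_k, \theta_x^h) M_{kx}$
$M_{(k+1)\theta_x} = \nabla_x f_x(x_k^h, u_k, \theta_x^h) M_{k\theta_x} + \nabla_{\theta_x} f_x(x_k^h, u_k, \theta_x^h)$
• Update $x_0^{h+1}, \theta_x^{h+1}, \theta_y^{h+1}$ by applying either a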
[Figure: two panels versus training time (s)]
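The sensitivity recursion can be checked numerically. The sketch assumes a hypothetical scalar model f_x(x, u, θ) = tanh(θ x + u). It propagates M_kx = ∂x_k/∂x_0 and M_kθ = ∂x_k/∂θ along the current trajectory and compares them with central finite differences.

```python
import math

def simulate_with_sensitivities(x0, theta, u_seq):
    """x_{k+1} = tanh(theta*x_k + u_k); returns x_N, dx_N/dx0, dx_N/dtheta."""
    x, m_x, m_th = x0, 1.0, 0.0          # M_0x = I, M_0theta = 0
    for u in u_seq:
        a = theta * x + u
        d = 1.0 - math.tanh(a) ** 2      # tanh'(a)
        dfdx, dfdth = d * theta, d * x   # gradients at (x_k, u_k, theta)
        m_x, m_th = dfdx * m_x, dfdx * m_th + dfdth
        x = math.tanh(a)
    return x, m_x, m_th

us = [0.3, -0.5, 0.8, 0.1, -0.2]
xN, m_x, m_th = simulate_with_sensitivities(0.4, 0.7, us)
h = 1e-6
fd_th = (simulate_with_sensitivities(0.4, 0.7 + h, us)[0]
         - simulate_with_sensitivities(0.4, 0.7 - h, us)[0]) / (2 * h)
fd_x = (simulate_with_sensitivities(0.4 + h, 0.7, us)[0]
        - simulate_with_sensitivities(0.4 - h, 0.7, us)[0]) / (2 * h)
```

The recursion costs one pass along the trajectory, whereas finite differences need extra simulations per parameter.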
$\min_{\theta_x, \theta_y, x_0}\ r(x_0, \theta_x, \theta_y) + \sum_{k=0}^{N-1} \ell(y_k, f_y(x_k, \theta_y)) + g(\theta_x, \theta_y)$
s.t. $x_{k+1} = f_x(x_k, u_k, \theta_x)$
• ADMM + Seq. LS = NAILS algorithm (Nonconvex ADMM Iterations and Sequential LS)
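$\begin{bmatrix} x_0^{t+1} \\ \theta_x^{t+1} \\ \theta_y^{t+1} \end{bmatrix} = \arg\min_{x_0, \theta_x, \theta_y} V(x_0, \theta_x, \theta_y) + \frac{\rho}{2} \left\| \begin{bmatrix} \theta_x - \nu_x^t + w_x^t \\ \theta_y - \nu_y^t + w_y^t \end{bmatrix} \right\|_2^2$   ((sequential) LS)
$\begin{bmatrix} \nu_x^{t+1} \\ \nu_y^{t+1} \end{bmatrix} = \mathrm{prox}_{\frac{1}{\rho} g}\big(\theta_x^{t+1} + w_x^t,\ \theta_y^{t+1} + w_y^t\big)$   (proximal step)
$\begin{bmatrix} w_x^{t+1} \\ w_y^{t+1} \end{bmatrix} = \begin{bmatrix} w_x^t + \theta_x^{t+1} - \nu_x^{t+1} \\ w_y^t + \theta_y^{t+1} - \nu_y^{t+1} \end{bmatrix}$   (update dual vars)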
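The sketch below runs the three steps on a hypothetical convex special case. V is a linear least-squares loss in two parameters θ (no x_0, no θ_x/θ_y split) and g = τ‖·‖₁. The LS step then has a closed form, and the proximal step is soft-thresholding. It illustrates the ADMM mechanics only, not the sequential LS used for nonconvex RNN training.

```python
import math

def soft_threshold(v, thr):
    """prox of thr*|.|: shrink v toward zero by thr."""
    return math.copysign(max(abs(v) - thr, 0.0), v)

def solve2(M, rhs):
    """Solve the 2x2 linear system M z = rhs by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

def admm_lasso(A, b, tau, rho=1.0, iters=200):
    """ADMM for min 0.5||A theta - b||^2 + tau ||nu||_1 s.t. theta = nu (2 parameters)."""
    H = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]   # A'A
    Atb = [sum(row[i] * bk for row, bk in zip(A, b)) for i in range(2)]            # A'b
    H_rho = [[H[i][j] + (rho if i == j else 0.0) for j in range(2)] for i in range(2)]
    nu, w, theta = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for _ in range(iters):
        # LS step: argmin_theta V(theta) + rho/2 ||theta - nu + w||^2
        theta = solve2(H_rho, [Atb[i] + rho * (nu[i] - w[i]) for i in range(2)])
        # Proximal step: nu = prox_{g/rho}(theta + w), i.e. soft-thresholding
        nu = [soft_threshold(theta[i] + w[i], tau / rho) for i in range(2)]
        # Dual update
        w = [w[i] + theta[i] - nu[i] for i in range(2)]
    return theta, nu

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 0.0, 1.0]
theta_ls, _ = admm_lasso(A, b, tau=0.0)      # no penalty: recovers least squares (1, 0)
_, nu_sparse = admm_lasso(A, b, tau=100.0)   # huge penalty: all parameters zeroed
```

The split variable ν carries the exact zeros produced by the proximal step, while θ is the smooth LS iterate.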
[Figure: BFR (%) and percentage of zeros, with τx = τy = τ]
• Fluid-damper example: group-Lasso regularization $g(\nu) = \tau_g \sum_{i=1}^{n_x} \|\nu_i\|_2$
to zero out entire rows and columns and reduce the state dimension automatically
[Figure: BFR (%) on training and test data (best fit on test data highlighted) and final model order versus the group-Lasso regularization parameter τg (10^-4 to 10^1)]
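The proximal step for the group-Lasso penalty is block soft-thresholding. Each group ν_i is scaled by max(0, 1 − τ_g/(ρ‖ν_i‖₂)), so groups with a small norm are set exactly to zero. The sketch assumes each group collects the weights tied to one state component. Counting the surviving groups then gives the resulting model order. Here `thr` stands for τ_g/ρ.

```python
import math

def prox_group_lasso(groups, thr):
    """Block soft-thresholding: prox of thr * sum_i ||nu_i||_2, applied group by group."""
    out = []
    for v in groups:
        norm = math.sqrt(sum(vi * vi for vi in v))
        scale = max(0.0, 1.0 - thr / norm) if norm > 0.0 else 0.0
        out.append([scale * vi for vi in v])
    return out

def model_order(groups):
    """Number of groups (state components) that remain nonzero."""
    return sum(1 for v in groups if any(vi != 0.0 for vi in v))

out = prox_group_lasso([[3.0, 4.0], [0.1, 0.1], [0.0, 0.0]], thr=1.0)
```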
[Figure: fluid-damper dataset (Schoukens, Ljung, 2019): measured signals, including input [V], versus sample number (×10^4), split into training and test data]
• ℓ2-regularization: $r(x_0, \theta_x, \theta_y) = \frac{0.01}{2}\big(\|\theta_x\|_2^2 + \|\theta_y\|_2^2\big) + \frac{0.1}{2}\|x_0\|_2^2$
• Training: use NAILM over 150 epochs (1 epoch = 77505 training samples)
Deep Nonlinear MPC for Autonomous Driving
• Model validation on test data:
[Figure panels: one-step-ahead prediction on test data; open-loop predictions]
[Figure: toolchain from model training (Tensorflow, Keras, PyTorch, scikit-learn, ODYS-NN training) to automatic C-code generation for embedded MPC]
Direct Data-driven MPC
[Figure: MPC scheme with prediction model, optimization algorithm, model-based optimizer, process, and measurements (image: aecdiagnostics.com)]
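[Figure: feedback scheme with reference r, error e, input u, output y, controller Kp, plant G, and reference model M; experimental setup with input u and angle θ]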
[Figure: angle θ [rad] tracking the reference r with and without MPC, input u [V], and input increments ∆u [V] versus time [s]. Desired tracking performance achieved; constraints on input increments satisfied]
[Figure: MPC generating the reference ro for the loop formed by controller Kp and plant G, with reference model M]
• Evaluating J(θ) requires synthesizing Kp(θ) from data and simulating the
nominal model and control law
$y_p(\theta, t) = M(\theta) r(t)$, $u_p(\theta, t) = K_p(\theta)\big(r(t) - y_p(\theta, t)\big)$
$\Delta u_p(\theta, t) = u_p(\theta, t) - u_p(\theta, t-1)$
$G(z) = \dfrac{z - 0.4}{z^2 + 0.15 z - 0.325}$, $y_L(t) = G(z) u(t)$
[Figure: simulation results, including a comparison with an LQR design]
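G(z) corresponds to the difference equation y(t) = −0.15 y(t−1) + 0.325 y(t−2) + u(t−1) − 0.4 u(t−2). The sketch below simulates it from zero initial conditions and computes the input increments Δu(t) = u(t) − u(t−1) that enter the constraints. The poles of G are 0.5 and −0.65, and its DC gain is G(1) = 0.6/0.825 ≈ 0.727.

```python
def simulate_G(u):
    """y(t) = -0.15 y(t-1) + 0.325 y(t-2) + u(t-1) - 0.4 u(t-2), zero initial conditions."""
    y = []
    for t in range(len(u)):
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        u1 = u[t - 1] if t >= 1 else 0.0
        u2 = u[t - 2] if t >= 2 else 0.0
        y.append(-0.15 * y1 + 0.325 * y2 + u1 - 0.4 * u2)
    return y

def increments(u):
    """Delta u(t) = u(t) - u(t-1), taking u(-1) = 0."""
    return [u[t] - (u[t - 1] if t >= 1 else 0.0) for t in range(len(u))]

y_step = simulate_G([1.0] * 200)   # unit-step response, settles at G(1)
```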
ut = π(st , pt )
• Simplifications:
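$K_t \leftarrow K_{t-1} - \alpha_t D(K_{t-1})$, where
$D(K_{t-1}) = \sum_{i=1}^{N_s} \sum_{j=1}^{N_p} \sum_{k=1}^{N_q} \nabla_K J_L\big(\pi_{K_{t-1}}, s_0^{(i)}, \{r_\ell^{(j)}, d_\ell^{(k)}\}\big)$
1. Acquire current st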
[Figure: reference rt and output yt versus time t (0-30000); settings ni = 3, no = 3, L = 20, N0 = 50, Nr = 1, Nq = 10]
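Below is a toy version of the update K_t ← K_{t−1} − α_t D(K_{t−1}) under strong simplifying assumptions:
- a hypothetical unstable scalar plant x⁺ = 1.2x + u with the linear policy u = −Kx;
- only the initial state s_0 is sampled (no references or disturbances);
- the gradient of J_L is estimated by central finite differences rather than computed exactly;
- D is averaged over the batch, which is equivalent to rescaling α.

For this plant and these weights the infinite-horizon LQR gain is about 1.103, and SGD should approach it.

```python
import random

A_SYS, B_SYS = 1.2, 1.0     # hypothetical unstable scalar plant x+ = a x + b u
Q, R, L = 1.0, 0.1, 20      # stage weights and horizon of J_L

def closed_loop_cost(K, s0):
    """J_L = sum_{l=0}^{L-1} Q x_l^2 + R u_l^2 under the policy u_l = -K x_l."""
    x, J = s0, 0.0
    for _ in range(L):
        u = -K * x
        J += Q * x * x + R * u * u
        x = A_SYS * x + B_SYS * u
    return J

def grad_estimate(K, batch, h=1e-5):
    """D(K): sum over sampled initial states of dJ_L/dK (central finite differences)."""
    return sum((closed_loop_cost(K + h, s) - closed_loop_cost(K - h, s)) / (2.0 * h)
               for s in batch)

def sgd_policy(K0=0.5, alpha=0.05, iters=300, batch_size=8, seed=0):
    """K_t <- K_{t-1} - alpha * D(K_{t-1}) / batch_size, fresh batch of s0 at each step."""
    rng = random.Random(seed)
    K = K0
    for _ in range(iters):
        batch = [rng.uniform(-2.0, 2.0) for _ in range(batch_size)]
        K -= alpha * grad_estimate(K, batch) / batch_size
    return K

K_sgd = sgd_policy()
```

Because J_L scales with s_0² for a linear policy, every batch points toward the same optimal gain here. With references and disturbances sampled as in D(K), this would no longer hold exactly.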
Continuously Stirred Tank Reactor (CSTR) (courtesy: apmonitor.com): the model is unknown
• Feed: concentration 10 kgmol/m³, temperature 298.15 K
• Signals: reactor temperature T, coolant temperature TC
• Weights: $Q_y = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $R = 0.1$, $Q_q = \begin{bmatrix} 0.01 & 0 \\ 0 & 0 \end{bmatrix}$
• Validation phase: cost of KSGD = 4.3 · 10³, cost of KID = 2.4 · 10⁴, i.e., SGD beats SYS-ID + LQR
[Figure: reference rt, output yt, state x2t, and input ut versus time t, for the policies KSGD and KID]
$f(x) = \sum_{t=0}^{T} \|y(t) - r(t)\|^2$   (tracking quality)
• The algorithm should not require the gradient ∇f(x) of f(x), in particular if
experiments are involved (derivative-free or black-box optimization)
• The algorithm should not get stuck on local minima (global optimization)
• The algorithm should make the fewest evaluations of the cost function f
(which is expensive to evaluate)
$\min_x f(x)$ s.t. $\ell \le x \le u$, $g(x) \le 0$
• CAVEAT: building and minimizing a surrogate f̂(x) iteratively may easily miss the global optimum!
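$z(x) = \frac{2}{\pi} \Delta F \tan^{-1}\left(\frac{1}{\sum_{i=1}^{N} w_i(x)}\right)$, or 0 if $x \in \{x_1, \ldots, x_N\}$
where $w_i(x) = \dfrac{e^{-\|x - x_i\|^2}}{\|x - x_i\|^2}$ and $\Delta F$ = observed range of $f(x_i)$
[Figure: exploration function z(x) plotted on [-3, 3]]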
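The sketch below is a direct transcription of z(x) and w_i(x). It assumes sampled points X = [x_1, …, x_N] with observed values F = [f(x_1), …, f(x_N)]. z vanishes at the sampled points and approaches ΔF far from them. GLIS combines it with the surrogate f̂ to trade off exploitation and exploration; that weighting is not reproduced here.

```python
import math

def idw_weights(x, X):
    """w_i(x) = exp(-||x - x_i||^2) / ||x - x_i||^2, or None if x coincides with a sample."""
    w = []
    for xi in X:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        if d2 == 0.0:
            return None
        w.append(math.exp(-d2) / d2)
    return w

def exploration(x, X, F):
    """z(x) = (2/pi) * DeltaF * atan(1 / sum_i w_i(x)), and 0 at sampled points."""
    w = idw_weights(x, X)
    if w is None:
        return 0.0
    delta_f = max(F) - min(F)            # observed range of f(x_i)
    return 2.0 / math.pi * delta_f * math.atan(1.0 / sum(w))

X = [[-1.0], [0.5], [2.0]]
F = [3.0, 1.0, 4.0]
```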
benchmark          n   CPU time BO (s)   CPU time GLIS (s)
camelsixhumps      2         4.82               0.62
hartman3           3        26.27               3.35
hartman6           6        54.37               8.80
himmelblau         2         7.40               0.90
rosenbrock8        8        63.09              13.73
stepfunction2      4        11.72               1.81
styblinski-tang5   5        37.02               6.10
Results computed on 20 runs per test; BO = MATLAB's bayesopt fcn
[Figure: convergence of BO and GLIS on each benchmark, best cost versus number of function evaluations]
$\min \sum_{k=0}^{50-1} (y_{k+1} - r(t))^2 + \big(W_{\Delta u}(u_k - u_{k-1})\big)^2$
s.t. $x_{k+1} = A x_k + B u_k$
$y_k = C x_k$
$-1.5 \le u_k \le 1.5$
$u_k \equiv u_{N_u}$, $\forall k = N_u, \ldots, N-1$
• Range: −5 ≤ x1 ≤ 3 and 1 ≤ x2 ≤ 50
[Figure: closed-loop output and input (0-100), and best cost versus number of function evaluations]
• Closed-loop performance score: $J = \int_0^T |p(t) - p_{ref}(t)| + 30|\phi(t)|\,dt$
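With sampled closed-loop signals, the score can be approximated numerically. The sketch below applies the trapezoidal rule to the integrand |p − p_ref| + 30|φ|. It assumes φ is expressed in the units used in the simulation (not specified here) and allows a non-uniform time grid.

```python
def closed_loop_score(t, p, p_ref, phi, w_phi=30.0):
    """Trapezoidal approximation of J = int_0^T |p - p_ref| + 30 |phi| dt."""
    g = [abs(pk - rk) + w_phi * abs(fk) for pk, rk, fk in zip(p, p_ref, phi)]
    return sum(0.5 * (g[k] + g[k + 1]) * (t[k + 1] - t[k]) for k in range(len(t) - 1))

# Usage: constant 1 m position error over 2 s with zero angle
J = closed_loop_score([0.0, 1.0, 2.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```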
[Figure: two closed-loop simulations (0-40 s): position p (m) versus reference pref, angle φ (deg), and force u (N)]
• Bayesian optimization gives similar results, but with larger computation effort
• Cons:
there are often multiple objectives, and it is not clear how to blend them into a single one
• Key idea: learn a surrogate of the (latent) objective function from preferences
[Figure: target image and four candidate images (1-4)]
• Realistic image synthesis of material appearance is based on models with
many parameters x1, . . . , xn
• Defining an objective function f(x) is hard, while a human can easily assess
whether an image resembles the target one or not
• Preference gallery tool: at each iteration, the user compares two images
generated with two different parameter instances
Active preference learning algorithm
(Bemporad, Piga, Machine Learning, 2021)
[Figure: latent function f(x), surrogate function f̂(x), and next sample xN+1 (parameters to test)]
• Fit a surrogate fˆ(x) that respects the preferences expressed by the decision
maker at sampled points (by solving a QP)
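The source fits f̂ by solving a QP. As a swapped-in, simpler illustration, the sketch below fits an RBF surrogate f̂(x) = Σ_b β_b φ(|x − x_b|) by subgradient descent on a hinge-loss form of the same idea. Each preference (i, j) means x_i was preferred to x_j and penalizes violations of f̂(x_i) ≤ f̂(x_j) − σ. A small ridge term on β is added. The inverse quadratic RBF, σ, λ, and the step size are all assumptions.

```python
def rbf(r, eps=1.0):
    """Inverse quadratic radial basis function (an assumed choice)."""
    return 1.0 / (1.0 + (eps * r) ** 2)

def surrogate(x, X, beta):
    """fhat(x) = sum_b beta_b * rbf(|x - X_b|)."""
    return sum(bb * rbf(abs(x - xb)) for bb, xb in zip(beta, X))

def fit_pref_surrogate(X, prefs, sigma=0.1, lam=1e-3, alpha=0.05, iters=500):
    """Minimize lam/2 ||beta||^2 + sum_(i,j) max(0, fhat(x_i) - fhat(x_j) + sigma)."""
    n = len(X)
    Phi = [[rbf(abs(X[a] - X[b])) for b in range(n)] for a in range(n)]

    def fh(beta, a):
        return sum(Phi[a][b] * beta[b] for b in range(n))

    def objective(beta):
        hinge = sum(max(0.0, fh(beta, i) - fh(beta, j) + sigma) for i, j in prefs)
        return 0.5 * lam * sum(bk * bk for bk in beta) + hinge

    beta = [0.0] * n
    best, best_obj = list(beta), objective(beta)
    for _ in range(iters):
        g = [lam * bk for bk in beta]                       # ridge-term gradient
        for i, j in prefs:
            if fh(beta, i) - fh(beta, j) + sigma > 0.0:     # margin violated: add subgradient
                for b in range(n):
                    g[b] += Phi[i][b] - Phi[j][b]
        beta = [bk - alpha * gk for bk, gk in zip(beta, g)]
        obj = objective(beta)
        if obj < best_obj:                                  # subgradient steps are not monotone
            best, best_obj = list(beta), obj
    return best, best_obj

X = [0.0, 1.0, 2.0]
prefs = [(0, 1), (1, 2)]          # x_0 preferred to x_1, x_1 preferred to x_2
beta, obj = fit_pref_surrogate(X, prefs)
```

Whenever the returned objective is below σ, every hinge term is below σ, so every expressed preference is strictly respected by f̂.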
s.t. $x_{k+1} = A x_k + B u_k$
$y_k = C x_k$
$-1.5 \le u_k \le 1.5$
$u_k \equiv u_{N_u}$, $\forall k = N_u, \ldots, N-1$
[Figure: closed-loop input (0-100)]
• Result: W ∆u = 0.6888, Nu = 2
[Figure: results versus W∆u (log scale, 10^-6 to 10^4) and versus number of iterations]
– sampling time
– prediction and control horizons
– weights on input increments ∆v , ∆δ
[Figure: two obstacle-avoidance (OA) simulations showing vehicle and obstacle lateral position yf [m], velocity v [km/hr] (input and reference), and steering angle [°] versus xf [m]
 - first: Ts = 0.332 s, Nu = 16, Np = 17, log(qu11) = 0.06, log(qu22) = 2.02, tcomp = 0.0867 s
 - second: Ts = 0.243 s, Nu = 12, Np = 17, log(qu11) = 0.19, log(qu22) = 0.70, tcomp = 0.0846 s]
Optimal MPC parameters:
– sample time = 85 ms (CPU time = 80.8 ms)
– prediction horizon = 16
– control horizon = 5
– weight on ∆v = 1.82
– weight on ∆δ = 8.28
[Figure: closed-loop simulation with the optimal parameters: vehicle and obstacle lateral position yf [m] (with OA), velocity v [km/hr] (input and reference), and steering angle [°] versus xf [m]]
• Example:
– x = (initial distance between ego car and obstacle, obstacle acceleration, …)
– Critical scenario: time-to-collision is too short, excessive jerk of ego car, …
$\dot\theta = \dfrac{v \sin(\delta)}{L}$, with $(x_f, w_f)$ = front-wheel position
$I_{\rm collision}$ = true if $\exists t \in T_{\rm sim}$ s.t. $\big(d_{x_f}^{SV,i}(x, t) \le L\big)\ \&\ \big(d_{w_f}^{SV,i}(x, t) \le W\big)$
79 28.09 30.00 70.29 10.00 74.79 31.74 SV 1
40 34.30 30.00 60.59 10.00 77.80 35.97
red = optimal solution found by GLIS solver
Ego car changes lane to avoid #1, but
cannot brake fast enough to avoid #2
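The collision indicator can be evaluated on simulated trajectories. The sketch assumes the distances d^{SV,i} are absolute longitudinal and lateral gaps between the ego front-wheel position (x_f, w_f) and surrounding vehicle i, sampled on T_sim. The trajectories and the values of L and W are hypothetical.

```python
def collision_indicator(ego, sv, L, W):
    """True if at some sampled t the longitudinal gap is <= L and the lateral gap is <= W.

    ego, sv: lists of (x_f, w_f) positions on the same time grid T_sim
    (gaps taken as absolute differences, an assumption of this sketch).
    """
    return any(abs(xe - xs) <= L and abs(we - ws) <= W
               for (xe, we), (xs, ws) in zip(ego, sv))

# Usage: ego car at 20 m/s toward a stopped vehicle 50 m ahead, 3 s on a 0.1 s grid
times = [0.1 * k for k in range(31)]
ego = [(20.0 * t, 0.0) for t in times]
same_lane = [(50.0, 0.0)] * len(times)
next_lane = [(50.0, 3.5)] * len(times)
L_HYP, W_HYP = 4.5, 1.8      # hypothetical longitudinal and lateral thresholds [m]
hit = collision_indicator(ego, same_lane, L_HYP, W_HYP)
miss = collision_indicator(ego, next_lane, L_HYP, W_HYP)
```

A scenario optimizer such as GLIS would search over the scenario parameters x for trajectories where this indicator (or a graded criticality measure) is triggered.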