8-Data Driven MPC

This document discusses using machine learning techniques to design model predictive control (MPC). It describes using machine learning to identify prediction models from data and then using reinforcement learning to learn the optimal MPC law directly from data. Specifically, it mentions using autoencoders or recurrent neural networks to identify nonlinear models, and Q-learning or policy gradient methods to learn the MPC policy. The goal is to design MPC systems from data using combined machine learning and control approaches.


Model Predictive Control

Learning-based MPC

Alberto Bemporad

imt.lu/ab

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 1/85


Course structure
✓ Basic concepts of model predictive control (MPC) and linear MPC
✓ Linear time-varying and nonlinear MPC
✓ Quadratic programming (QP) and explicit MPC
✓ Hybrid MPC
✓ Stochastic MPC
• Learning-based MPC (or data-driven MPC)

Course page:
http://cse.lab.imtlucca.it/~bemporad/mpc_course.html

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 2/85


Machine learning and control engineering

[Figure: timeline of control engineering and its mathematical foundations — frequency-domain methods, Bode, Nyquist, root locus (1930–1950, built on complex analysis and statistics); state-space, pole placement, LQR, Kalman filtering (1960–1970, built on linear algebra and numerical optimization); Lyapunov methods, nonlinear control, stability analysis, feedback synthesis (1970–1980, built on functional analysis); LMI-based methods, robust control, system identification, model predictive control (>1990, built on semidefinite programming); machine learning (ML) entering the picture (>2020) — leading to MPC's familiar picture of predicted outputs and manipulated inputs over the horizon t to t+N.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 3/85


MPC and ML
• MPC and ML = main trends in control R&D in industry!

[Figure: Google Books ngram frequencies of "model predictive control", "machine learning", "nonlinear control", "system identification", "PID control" (source: https://books.google.com/ngrams).]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 4/85
Machine Learning (ML)
• Massive set of techniques to extract mathematical models from data:

– Supervised learning
  • Classification: ridge classification, logistic regression, naïve Bayes classification, support vector machines, K-nearest neighbors, decision trees, ensemble methods (bagging, bootstrap, random forests), neural networks, ...
  • Regression: linear regression (least-squares, ridge regression, Lasso, elastic-net), kernel least-squares, support vector regression, Gaussian process regression, ...

– Unsupervised learning
  • Clustering: K-means clustering, density-based spatial clustering, ...
  • Dimensionality reduction: linear PCA, nonlinear PCA, autoencoders, ...

– Semi-supervised learning

– Reinforcement learning
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 5/85


Machine Learning (ML)

• Good mathematical foundations from artificial intelligence, statistics, optimization

• Works very well in practice (despite training being most often a nonconvex optimization problem ...)

• Used in myriads of very diverse application domains

• Availability of excellent open-source software tools also explains success


scikit-learn, TensorFlow/Keras, PyTorch, JAX, Flux.jl, ...

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 6/85


MPC design from data
1. Use machine learning to get a prediction model from data (system identification)

– Autoencoders, recurrent neural networks (nonlinear models)

– Online learning of feedforward/recurrent neural networks by EKF

– Piecewise affine regression to learn hybrid models

2. Use reinforcement learning to learn the MPC law from data

– Q-learning: learn the Q-function defining the MPC law from data

– Policy gradient methods: learn optimal policy coefficients directly from data using stochastic gradient descent

– Global optimization methods: learn MPC parameters (weights, models, horizon, solver tolerances, ...) by optimizing observed closed-loop performance

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 7/85


Learning prediction models for MPC
Control-oriented nonlinear models
• Black-box models: purely data-driven. Use training data to fit a prediction model that can explain them

[Diagram: data → prediction model, mapping x → y.]

• Physics-based models: use physical principles to create a prediction model (e.g.: weather forecast, chemical reaction, mechanical laws, ...)

[Diagram: physical laws (Galileo, Newton, Maxwell, Gauss, Faraday, Pascal, Boyle) → prediction model, mapping x → y.]

• Gray-box (or physics-informed) models: mix of the two, can be quite effective

"All models are wrong, but some are useful."


(George E. P. Box)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 8/85
Models for control systems design
• Prediction models for model predictive control:
  – complex model = complex controller → the model must be as simple as possible
  – easy to linearize (to get Jacobian matrices for nonlinear optimization)

• Prediction models for state estimation:
  – complex model = complex Kalman filter
  – easy to linearize

• Models for virtual sensing:
  – no need to use simple models (except for computational reasons)

• Models for diagnostics:
  – usually a classification problem to solve
  – complexity is also less of an issue

Model classes:
  – Linear models: linear I/O models (ARX, ARMAX, ...), subspace linear SYS-ID, linear regression (ridge, elastic-net, Lasso)
  – Piecewise linear models: decision trees, neural nets + (leaky) ReLU, K-means + linear models
  – Nonlinear models: basis functions + linear regression, neural networks, K-nearest neighbors, support vector machines, kernel methods, random forests
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 9/85
Nonlinear SYS-ID based on Neural Networks
• Neural networks proposed for nonlinear system identification since the '90s
(Narendra, Parthasarathy, 1990) (Hunt et al., 1992) (Suykens, Vandewalle, De Moor, 1996)

• NNARX models: use a feedforward neural network to approximate the nonlinear difference equation yt ≈ N(y_{t−1}, ..., y_{t−na}, u_{t−1}, ..., u_{t−nb}) — see the sketch below

• Neural state-space models:

– w/ state data: fit a neural network model x_{t+1} ≈ Nx(xt, ut), yt ≈ Ny(xt)

– I/O data only: set xt = value of an inner layer of the network (Prasad, Bequette, 2003), such as an autoencoder (Masti, Bemporad, 2021)

• Alternative for MPC: learn the entire prediction (Masti, Smarra, D'Innocenzo, Bemporad, 2020)

  y_{t+k} = hk(xt, ut, ..., u_{t+k−1}), k = 1, ..., N

• Recurrent neural networks are more appropriate for accurate open-loop predictions, but more difficult to train (see later ...)
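A minimal, hedged sketch of fitting an NNARX model with scikit-learn (not the course's code; the system, network sizes, and data are illustrative):

```python
# Illustrative NNARX fit: approximate y_t ≈ N(y_{t-1..t-na}, u_{t-1..t-nb}).
import numpy as np
from sklearn.neural_network import MLPRegressor

def narx_regressors(u, y, na=2, nb=2):
    """Stack past outputs/inputs into the NNARX regressor matrix."""
    n0 = max(na, nb)
    X = [np.concatenate([y[t-na:t][::-1], u[t-nb:t][::-1]])
         for t in range(n0, len(y))]
    return np.array(X), y[n0:]

# toy data from a first-order nonlinear system (placeholder for real data)
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 1000)
y = np.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.8*np.tanh(y[t-1]) + 0.5*u[t-1]

X, Y = narx_regressors(u, y)
model = MLPRegressor(hidden_layer_sizes=(6, 6), activation="tanh",
                     max_iter=2000).fit(X, Y)
print("one-step R^2:", model.score(X, Y))
```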
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 10/85
NLMPC based on Neural Networks
• Approach: use a neural network model for prediction

[Block diagram: the model-based optimizer (neural prediction model + nonlinear optimization algorithm) receives set-points r(t) and state estimates, sends inputs u(t) to the process; measured outputs y(t) are fed back through a state estimator.]

• MPC design workflow:

1. collect data
2. train neural model
3. codegen NLMPC
4. deploy controller
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 11/85


MPC of Ethylene Oxidation Plant
• Chemical process = oxidation of ethylene to ethylene oxide in a nonisothermal
continuously stirred tank reactor (CSTR)

  C2H4 + (1/2) O2 → C2H4O
  C2H4 + 3 O2 → 2 CO2 + 2 H2O
  C2H4O + (5/2) O2 → 2 CO2 + 2 H2O

• Nonlinear model (dimensionless variables): (Durand, Ellis, Christofides, 2016)

  ẋ1 = u1(1 − x1x4)
  ẋ2 = u1(u2 − x2x4) − A1 e^(γ1/x4)(x2x4)^(1/2) − A2 e^(γ2/x4)(x2x4)^(1/4)
  ẋ3 = −u1x3x4 + A1 e^(γ1/x4)(x2x4)^(1/2) − A3 e^(γ3/x4)(x3x4)^(1/2)
  ẋ4 = [u1(1 − x4) + B1 e^(γ1/x4)(x2x4)^(1/2) + B2 e^(γ2/x4)(x2x4)^(1/4) + B3 e^(γ3/x4)(x3x4)^(1/2) − B4(x4 − Tc)] / x1
  y = x3

  x1 = gas density, x2 = ethylene concentration, x3 = ethylene oxide concentration, x4 = temperature in reactor
  u1 = feed volumetric flow rate, u2 = ethylene concentration in feed

• u1 = manipulated variable, x3 = controlled output, u2 = measured disturbance
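For instance, the right-hand side above can be coded to generate training data with an ODE integrator. This is only a sketch: the rate constants A_i, B_i, activation energies γ_i, and coolant temperature Tc below are placeholders, not the values of (Durand, Ellis, Christofides, 2016):

```python
import numpy as np
from scipy.integrate import solve_ivp

A1 = A2 = A3 = 1.0        # placeholder rate constants
B1 = B2 = B3 = B4 = 1.0   # placeholder rate constants
g1 = g2 = g3 = -1.0       # placeholder activation energies (gamma_i)
Tc = 1.0                  # placeholder coolant temperature
pos = lambda a: max(a, 0.0)   # numerical guard for the fractional powers

def cstr_rhs(t, x, u1, u2):
    x1, x2, x3, x4 = x
    r1 = A1*np.exp(g1/x4)*pos(x2*x4)**0.5
    r2 = A2*np.exp(g2/x4)*pos(x2*x4)**0.25
    r3 = A3*np.exp(g3/x4)*pos(x3*x4)**0.5
    return [u1*(1 - x1*x4),
            u1*(u2 - x2*x4) - r1 - r2,
            -u1*x3*x4 + r1 - r3,
            (u1*(1 - x4) + (B1/A1)*r1 + (B2/A2)*r2 + (B3/A3)*r3
             - B4*(x4 - Tc))/x1]

sol = solve_ivp(cstr_rhs, (0.0, 5.0), [1.0, 1.0, 0.03, 1.0], args=(0.35, 0.5))
```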

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 12/85


Neural Network Model of Ethylene Oxidation Plant
• Train a state-space neural-network model x_{k+1} = N(xk, uk):

  – 1,000 training samples {uk, xk}
  – 2 hidden layers (6 neurons, 6 neurons)
  – 6 inputs, 4 outputs
  – sigmoidal activation function
  → 112 coefficients

• NN model trained by the ODYS Deep Learning toolset (model fitting + Jacobians → neural model in C)

• Model validated on 200 samples: x_{3,k+1} reproduced from xk, uk with max 0.4% error

[Figure: x3 vs. open-loop predicted x3 on validation data, and x3 fit error (≈10⁻⁵ scale) over the 200 validation samples.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 13/85
MPC of Ethylene Oxidation Plant
• MPC settings:

  sampling time: Ts = 5 s
  prediction horizon: N = 10
  control horizon: Nu = 3
  constraints: 0.0704 ≤ u1 ≤ 0.7042
  cost function: Σ_{k=0}^{N−1} (y_{k+1} − r_{k+1})² + (1/100)(u_{1,k} − u_{1,k−1})²
  (measured disturbance step @ t = 200)

• We compare 3 different configurations:

– NLMPC based on the physical model

– Switched linear MPC based on 3 linear models obtained by linearizing the nonlinear model at C2H4O = {0.03, 0.04, 0.05}

– NLMPC based on the black-box neural network model
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 14/85


MPC of Ethylene Oxidation Plant - Closed-loop results
[Figure: closed-loop C2H4O concentration vs. time (0–250 s) for model-based NLMPC, switched linear MPC, and neural NLMPC.]
• Neural and model-based NLMPC have similar closed-loop performance


• Neural NLMPC requires no physical model
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 15/85
Learning nonlinear state-space models for MPC
(Masti, Bemporad, 2021)

• Idea: use autoencoders and artificial neural networks to learn a nonlinear state-space model of desired order from input/output data

[Diagram: ANN with hourglass structure (Hinton, Salakhutdinov, 2006) — encoder E maps past I/O (y_{k−1}, ..., y_{k−n}, u_{k−1}, ..., u_{k−m}) into the state xk; the state-update map f maps (xk, uk) to x*_{k+1}; decoder D reconstructs the outputs (yk, ..., y_{k−n+1}) from the state.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 16/85


Learning nonlinear state-space models for MPC
• Training problem: choose na, nb, nx and solve

  min_{f,d,e} Σ_{k=k0}^{N−1} α(ℓ1(Ôk, Ok) + ℓ1(Ô_{k+1}, O_{k+1}))   ← output map
              + βℓ2(x*_{k+1}, x_{k+1})   ← state map
              + γℓ3(O_{k+1}, O*_{k+1})

  s.t. xk = e(I_{k−1}), k = k0, ..., N   ← dead-beat observer
       x*_{k+1} = f(xk, uk), k = k0, ..., N−1
       Ôk = d(xk), O*k = d(x*k), k = k0, ..., N

  where Ok = [y′k ... y′_{k−m}]′ and Ik = [y′k ... y′_{k−na+1} u′k ... u′_{k−nb+1}]′

• Model complexity can be reduced by adding group-LASSO penalties

• Quasi-LPV structure for MPC: set

  f(xk, uk) = A(xk, uk)[xk; 1] + B(xk, uk)uk
  yk = C(xk, uk)[xk; 1]   (Aij, Bij, Cij = feedforward NNs)

• Different options for the state observer:

– use encoder e to map past I/O into xk (dead-beat observer)
– design an extended Kalman filter based on the obtained model f, d
– simultaneously fit a state observer x̂_{k+1} = s(xk, uk, yk) with loss ℓ4(x̂_{k+1}, x_{k+1})
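A conceptual PyTorch sketch of this training loss (the weights α, β, γ, network sizes, and layer choices are illustrative assumptions, not the paper's architecture):

```python
import torch, torch.nn as nn

nI, nO, nx = 12, 6, 3            # sizes of I_k, O_k, and the hidden state
e = nn.Sequential(nn.Linear(nI, 16), nn.Tanh(), nn.Linear(16, nx))      # encoder
d = nn.Sequential(nn.Linear(nx, 16), nn.Tanh(), nn.Linear(16, nO))      # decoder
f = nn.Sequential(nn.Linear(nx + 1, 16), nn.Tanh(), nn.Linear(16, nx))  # state map
opt = torch.optim.Adam(list(e.parameters()) + list(d.parameters())
                       + list(f.parameters()), lr=1e-3)
mse = nn.MSELoss()

def loss(I_prev, I_now, O_now, O_next, u_now, alpha=1.0, beta=1.0, gamma=1.0):
    xk = e(I_prev)                                  # x_k = e(I_{k-1})
    xk1 = e(I_now)                                  # x_{k+1} = e(I_k)
    xk1_star = f(torch.cat([xk, u_now], dim=-1))    # x*_{k+1} = f(x_k, u_k)
    rec = mse(d(xk), O_now) + mse(d(xk1), O_next)   # output-map (reconstruction) terms
    return (alpha*rec + beta*mse(xk1_star, xk1)     # state-map consistency term
            + gamma*mse(d(xk1_star), O_next))       # predicted-output term
```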
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 17/85
Learning nonlinear neural state-space models for MPC
• Example: nonlinear two-tank benchmark problem (www.mathworks.com)

  x1(t+1) = x1(t) − k1 √(x1(t)) + k2 √(u(t))
  x2(t+1) = x2(t) + k3 √(x1(t)) − k4 √(x2(t))
  y(t) = x2(t) + u(t)

  The model is totally unknown to the learning algorithm
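A sketch of generating I/O training data from the equations above (the coefficients k1..k4 and input signal are illustrative; the benchmark treats the model as unknown):

```python
import numpy as np

k1, k2, k3, k4 = 0.5, 0.4, 0.2, 0.3   # placeholder tank coefficients
rng = np.random.default_rng(1)

def step(x, u):
    x1, x2 = x
    x1n = x1 - k1*np.sqrt(x1) + k2*np.sqrt(u)
    x2n = x2 + k3*np.sqrt(x1) - k4*np.sqrt(x2)
    return np.array([max(x1n, 0.0), max(x2n, 0.0)])  # keep levels nonnegative

x = np.array([0.5, 0.5])
U, Y = [], []
for t in range(5000):
    u = rng.uniform(0, 1)
    U.append(u); Y.append(x[1] + u)   # y(t) = x2(t) + u(t)
    x = step(x, u)
```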


• Artificial neural network (ANN): 3 hidden layers, 60 exponential linear unit (ELU) neurons

• For a given number of model parameters, the autoencoder approach is superior to NNARX

• Jacobians directly obtained from the ANN structure

• LTV MPC: the performance achieved with the derivative-based controller suggests that an LTV-MPC formulation might also work well; its robustness is assessed using a model achieving 61% BFR in open loop

• Computation time per step: ≈40 ms for Kalman filtering & MPC problem construction

[Figure: LTV-MPC closed-loop results.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 18/85


Learning affine neural predictors for MPC
(Masti, Smarra, D'Innocenzo, Bemporad, 2020)

• Alternative: learn the entire prediction

  yk = hk(x0, u0, ..., u_{k−1}), k = 1, ..., N

• LTV-MPC formulation: linearize hk around nominal inputs ūj

  yk = hk(x0, ū0, ..., ū_{k−1}) + Σ_{j=0}^{k−1} ∂hk/∂uj (x0, ū0, ..., ū_{k−1})(uj − ūj)

  Example: ūk = MPC sequence optimized @ k−1

• Avoid computing Jacobians by fitting hk in the affine form

  yk = fk(x0, ū0, ..., ū_{k−1}) + gk(x0, ū0, ..., ū_{k−1}) [u0 − ū0; ...; u_{k−1} − ū_{k−1}]

  cf. (Liu, Kadirkamanathan, 1998)

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 19/85


Learning affine neural predictors for MPC
• Example: apply the affine neural predictor to the nonlinear two-tank benchmark problem
  (10000 training samples, ANN with 2 layers of 20 ReLU neurons)

  Best fit rate BFR = max{0, 1 − ∥ŷ − y∥₂ / ∥y − ȳ∥₂}

  prediction step   BFR
  1                 0.959
  2                 0.958
  4                 0.948
  7                 0.915
  10                0.858

• Closed-loop LTV-MPC results: [Figure: controlled system tracks the reference over 350 steps; control action shown.]

• Model complexity reduction: add a group-LASSO term with penalty λ

  λ        BFR (average on all prediction steps)   # nonzero weights
  0.01     0.853                                   328
  0.005    0.868                                   363
  0.001    0.901                                   556
  0.0005   0.911                                   888
  0        0.917                                   9000
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 20/85


On the use of neural networks for MPC
• Neural prediction models can speed up the MPC design a lot

• Experimental data need to well cover the operating range (as in linear system identification)

• No need to define linear operating ranges with NNs: it is a one-shot model-learning step

• Physical models may better predict unseen situations than black-box models

• Physical modeling can help driving the choice of the nonlinear model structure to use (gray-box models)

• The NN model can be updated online for adaptive nonlinear MPC
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 21/85


Learning neural network models for control
Training feedforward neural networks
• Feedforward neural network model:

  v1k = A1 xk + b1
  v2k = A2 f1(v1k) + b2
  ...
  vLk = AL f_{L−1}(v_{(L−1)k}) + bL
  ŷk = fL(vLk)

  yk = fy(xk, θ),  θ = (A1, b1, ..., AL, bL)

  E.g.: xk = current state & input, or xk = (y_{k−1}, ..., y_{k−na}, u_{k−1}, ..., u_{k−nb})

• Training problem: given a dataset {x0, y0, ..., x_{N−1}, y_{N−1}}, solve

  min_θ r(θ) + Σ_{k=0}^{N−1} ℓ(yk, f(xk, θ))

• It is a nonconvex, unconstrained, nonlinear programming problem that can be solved by stochastic gradient descent, quasi-Newton methods, ... and EKF!
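A generic instance of this training problem in PyTorch (toy data; the ℓ2 regularization r(θ) is realized via weight_decay and the loss ℓ is the squared error — all sizes are illustrative):

```python
import torch, torch.nn as nn

X = torch.randn(500, 4)                               # toy regressors x_k
Y = torch.sin(X.sum(dim=1, keepdim=True))             # toy targets y_k
net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2, weight_decay=1e-4)

for epoch in range(200):          # (stochastic) gradient descent epochs
    opt.zero_grad()
    loss = ((net(X) - Y)**2).mean()   # sum of l(y_k, f(x_k, theta)) / N
    loss.backward()
    opt.step()
```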
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 22/85
Training recurrent NN's via EKF
Training feedforward neural networks by EKF
(Singhal, Wu, 1989) (Puskorius, Feldkamp, 1994)

• Key idea: treat the parameter vector θ of the feedforward neural network as a constant state

  θ_{k+1} = θk + νk
  yk = f(xk, θk) + ηk

  and use EKF to estimate θk online from a streaming dataset {xk, yk} (sketch below)

• The ratio Var[νk]/Var[ηk] is related to the learning rate

• The initial matrix (P_{0|−1})⁻¹ is related to quadratic regularization on θ
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 23/85


Recurrent neural networks

• Recurrent Neural Network (RNN) model:

  x_{k+1} = fx(xk, uk, θx)
  yk = fy(xk, θy)

  fx, fy = feedforward neural networks: vj = Aj f_{j−1}(v_{j−1}) + bj, θ = (A1, b1, ..., AL, bL)

  (e.g.: general RNNs, LSTMs, RESNETS, physics-informed NNs, ...)

• Training problem: given a dataset {u0, y0, ..., u_{N−1}, y_{N−1}}, solve

  min_{θx, θy, x0, x1, ..., x_{N−1}} r(x0, θx, θy) + (1/N) Σ_{k=0}^{N−1} ℓ(yk, fy(xk, θy))
  s.t. x_{k+1} = fx(xk, uk, θx)

• Main issue: the xk are hidden states, i.e., they are unknowns of the problem
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 24/85


Training RNNs via Extended Kalman Filtering
Training RNNs by EKF
(Puskorius, Feldkamp, 1994) (Wang, Huang, 2011) (Bemporad, 2023)

• Estimate both the hidden states xk and the parameters θx, θy by EKF based on the model

  x_{k+1} = fx(xk, uk, θ_{xk}) + ξk
  [θ_{x(k+1)}; θ_{y(k+1)}] = [θ_{xk}; θ_{yk}] + νk
  yk = fy(xk, θ_{yk}) + ηk

  – the ratio Var[νk]/Var[ηk] is related to the learning rate of the training algorithm
  – the inverse of the initial matrix P0 is related to an ℓ2-penalty on θx, θy

• The RNN and its hidden state xk can be estimated online from a streaming dataset {uk, yk}, and/or offline by processing multiple epochs of a given dataset

• Can handle general smooth strongly convex loss functions/regularization terms

• Can add an ℓ1-penalty λ∥[θx; θy]∥₁ to sparsify θx, θy by changing the EKF update into

  [x̂(k|k); θx(k|k); θy(k|k)] = [x̂(k|k−1); θx(k|k−1); θy(k|k−1)] + M(k)e(k) − λP(k|k−1) [0; sign(θx(k|k−1)); sign(θy(k|k−1))]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 25/85


Training RNNs by EKF - Examples
• Dataset: magneto-rheological fluid damper, 3499 I/O data (Wang, Sano, Chen, Huang, 2009)

• N = 2000 data used for training, 1499 for testing the model

• Same data used in the NNARX modeling demo of the SYS-ID Toolbox for MATLAB

• RNN model: 4 hidden states, shallow state-update and output functions, 6 neurons, atan activation, I/O feedthrough

• Compare with gradient descent (Adam); MATLAB+CasADi implementation (MacBook Pro 14'' M1 Max)

[Figure: MSE loss of EKF vs. Adam against training time (s) and against epochs — EKF reaches a lower loss in fewer passes over the data.]

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 26/85
Training RNNs by EKF - Examples
• Compare BFR¹ w.r.t. NNARX model (SYS-ID TBX):

  EKF = 92.82, Adam = 89.12, NNARX(6,2) = 88.18 (training)
  EKF = 89.78, Adam = 85.51, NNARX(6,2) = 85.15 (test)

[Figure: open-loop simulation on test data (on a model instance): EKF 90.67% vs. NNARX(6,2) 85.15% against the measured output over 1500 samples.]

• Repeat training with ℓ1-penalty τ∥[θx; θy]∥₁

[Figure: BFR on training/test data and percentage of zeros in θx, θy as a function of the ℓ1-regularization parameter τ (10⁻⁶ to 10⁻³).]

¹ Best fit rate BFR = 100(1 − ∥Y − Ŷ∥₂/∥Y − ȳ∥₂), averaged over 20 runs from different initial weights
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 27/85


Training LSTMs by EKF - Examples
• Use EKF to train a Long Short-Term Memory (LSTM) model
(Hochreiter, Schmidhuber, 1997) (Bonassi et al., 2020)

  xa(k+1) = σG(WF u(k) + UF xb(k) + bF) ⊙ xa(k) + σG(WI u(k) + UI xb(k) + bI) ⊙ σC(WC u(k) + UC xb(k) + bC)
  xb(k+1) = σG(WO u(k) + UO xb(k) + bO) ⊙ σC(xa(k+1))
  y(k) = fy(xb(k), u(k), θy)

  σG(α) = 1/(1 + e^(−α)), σC(α) = tanh(α)
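One step of this LSTM recursion in NumPy (a sketch: the weight containers are assumptions, with xa = cell state and xb = hidden state):

```python
import numpy as np

sG = lambda a: 1/(1 + np.exp(-a))   # sigma_G (logistic gate)
sC = np.tanh                         # sigma_C

def lstm_step(xa, xb, u, W, U, b):
    """W, U, b: dicts with keys 'f','i','c','o' holding the gate weights."""
    fgate = sG(W['f'] @ u + U['f'] @ xb + b['f'])   # forget gate
    igate = sG(W['i'] @ u + U['i'] @ xb + b['i'])   # input gate
    ogate = sG(W['o'] @ u + U['o'] @ xb + b['o'])   # output gate
    xa_next = fgate*xa + igate*sC(W['c'] @ u + U['c'] @ xb + b['c'])
    xb_next = ogate*sC(xa_next)
    return xa_next, xb_next
```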

• Training results (mean and std over 20 runs):

  model              BFR        Adam           EKF
  RNN (nθ = 107)     training   89.12 (1.83)   92.82 (0.33)
                     test       85.51 (2.89)   89.78 (0.58)
  LSTM (nθ = 139)    training   89.60 (1.34)   92.63 (0.43)
                     test       85.56 (2.68)   88.97 (1.31)

• EKF training applicable to arbitrary classes of black/gray box recurrent models!

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 28/85


Training RNNs by EKF - Examples
• Dataset: 2000 I/O data of a linear system with binary outputs

  x(k+1) = [0.8 0.2 −0.1; 0 0.9 0.1; 0.1 −0.1 0.7] x(k) + [−1; 0.5; 1] u(k) + ξ(k),  Var[ξi(k)] = σ²
  y(k) = 1 if [−2 1.5 0.5] x(k) − 2 + η(k) ≥ 0, 0 otherwise,  Var[η(k)] = σ²

• N = 1000 data used for training, 1000 for testing the model

• Train a linear state-space model with 3 states and sigmoidal output function

  f1y(y) = 1/(1 + e^(−A1y [x′(k) u(k)]′ − b1y))

  EKF accuracy [%]:
  σ       test    training
  0.000   98.02   97.91
  0.001   95.33   98.66
  0.010   97.99   98.52
  0.100   94.56   95.44
  0.200   93.71   92.22

• Training loss: (modified) cross-entropy loss

  ℓ_CEϵ(y(k), ŷ) = Σ_{i=1}^{ny} [−yi(k) log(ϵ + ŷi) − (1 − yi(k)) log(1 + ϵ − ŷi)]

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 29/85


Training RNNs via Sequential Least Squares
Training RNNs by Sequential Least-Squares
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• RNN training problem = optimal control problem:

  min_{θx, θy, x0, x1, ..., x_{N−1}} r(x0, θx, θy) + Σ_{k=0}^{N−1} ℓ(yk, ŷk)
  s.t. x_{k+1} = fx(xk, uk, θx)
       ŷk = fy(xk, uk, θy)

  – θx, θy, x0 = manipulated variables, ŷk = output, yk = reference, uk = measured disturbance
  – r(x0, θx, θy) = input penalty, ℓ(yk, ŷk) = output penalty
  – N = prediction horizon, control horizon = 1

• Linearized model: given a current guess θxʰ, θyʰ, x0ʰ, ..., x_{N−1}ʰ, approximate

  Δx_{k+1} = (∇x fx)′ Δxk + (∇θx fx)′ Δθx
  Δyk = (∇xk fy)′ Δxk + (∇θy fy)′ Δθy

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 30/85


Training RNNs by Sequential Least-Squares
• Linearized dynamic response: Δxk = M_{kx} Δx0 + M_{kθx} Δθx

  M_{0x} = I, M_{0θx} = 0
  M_{(k+1)x} = ∇x fx(xkʰ, uk, θxʰ) M_{kx}
  M_{(k+1)θx} = ∇x fx(xkʰ, uk, θxʰ) M_{kθx} + ∇θx fx(xkʰ, uk, θxʰ)

• Take a 2nd-order expansion of the loss ℓ and regularization term r

• Solve a least-squares problem to get the increments Δx0, Δθx, Δθy

• Update x0^{h+1}, θx^{h+1}, θy^{h+1} by applying either

– a line-search (LS) method based on the Armijo rule, or

– a trust-region method (Levenberg-Marquardt) (LM)

• The resulting training method is a Generalized Gauss-Newton method with very good convergence properties (Messerer, Baumgärtner, Diehl, 2021)

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 31/85


Training RNNs by Sequential LS and ADMM
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• Fluid-damper example: (4 states, shallow NNs w/ 4 neurons, I/O feedthrough)

[Figure: MSE loss on training data vs. training time (s) for NAILS, NAILM, EKF, AMSGrad — mean value and range over 20 runs from different random initial weights.]

  NAILS = GGN method with line search, NAILM = GGN method with LM steps

  BFR       training       test
  NAILS     94.41 (0.27)   89.35 (2.63)
  NAILM     94.07 (0.38)   89.64 (2.30)
  EKF       91.41 (0.70)   87.17 (3.06)
  AMSGrad   84.69 (0.15)   80.56 (0.18)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 32/85
Training RNNs by Sequential LS and ADMM
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• We also want to handle non-smooth (and non-convex) regularization terms:

  min_{θx, θy, x0} r(x0, θx, θy) + Σ_{k=0}^{N−1} ℓ(yk, fy(xk, θy)) + g(θx, θy)
  s.t. x_{k+1} = fx(xk, uk, θx)

• Idea: use the alternating direction method of multipliers (ADMM) by splitting

  min_{θx, θy, x0, νx, νy} r(x0, θx, θy) + Σ_{k=0}^{N−1} ℓ(yk, fy(xk, θy)) + g(νx, νy)
  s.t. x_{k+1} = fx(xk, uk, θx)
       [νx; νy] = [θx; θy]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 33/85


Training RNNs by Sequential LS and ADMM
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• ADMM + Seq. LS = NAILS algorithm (Nonconvex ADMM Iterations and Sequential LS):

  (x0^{t+1}, θx^{t+1}, θy^{t+1}) = arg min_{x0, θx, θy} V(x0, θx, θy) + (ρ/2) ∥[θx − νx^t + wx^t; θy − νy^t + wy^t]∥₂²   (sequential LS)

  (νx^{t+1}, νy^{t+1}) = prox_{(1/ρ)g}(θx^{t+1} + wx^t, θy^{t+1} + wy^t)   (proximal step)

  (wx^{t+1}, wy^{t+1}) = (wx^t + θx^{t+1} − νx^{t+1}, wy^t + θy^{t+1} − νy^{t+1})   (update dual vars)
• Fluid-damper example: Lasso regularization g(νx, νy) = τx∥νx∥₁ + τy∥νy∥₁, with τx = τy = τ

[Figure: BFR on training/test data and percentage of zeros in θx, θy vs. the ℓ1-regularization parameter τ (mean results over 20 runs from different initial weights).]

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 34/85


Training RNNs by Sequential LS and ADMM
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• Fluid-damper example: Lasso regularization g(νx, νy) = 0.2∥νx∥₁ + 0.2∥νy∥₁

  training    BFR            BFR            sparsity     CPU      #
  algorithm   training       test           %            time     epochs
  NAILS       91.00 (1.66)   87.71 (2.67)   65.1 (6.5)   11.4 s   250
  NAILM       91.32 (1.19)   87.80 (1.86)   64.1 (7.4)   11.7 s   250
  EKF         89.27 (1.48)   86.67 (2.71)   47.9 (9.1)   13.2 s   50
  AMSGrad     91.04 (0.47)   88.32 (0.80)   16.8 (7.1)   64.0 s   2000
  Adam        90.47 (0.34)   87.79 (0.44)   8.3 (3.5)    63.9 s   2000
  DiffGrad    90.05 (0.64)   87.34 (1.14)   7.4 (4.5)    63.9 s   2000

  ≈ same fit as SGD/EKF, but sparser models and faster (CPU: Apple M1 Pro)

• Fluid-damper example: group-Lasso regularization g(νᵢᵍ) = τg Σ_{i=1}^{nx} ∥νᵢᵍ∥₂ to zero entire rows and columns and reduce the state dimension automatically

[Figure: BFR on training/test data and final model order vs. the group-Lasso regularization parameter τg — good choice: nx = 3 (best fit on test data).]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 35/85


Training RNNs by Sequential LS and ADMM
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• Fluid-damper example: quantization of θx, θy for simplifying the model arithmetic, with leaky-ReLU activation function

  g(θi) = 0 if θi ∈ Q, +∞ otherwise;  Q = multiples of 0.1 between −0.5 and 0.5

– BFR = 84.36 (training), 78.43 (test) ← NAILS w/ quantization

– BFR = 17.64 (training), 12.79 (test) ← no ADMM, just quantize after training

– Training time: ≈ 12 s (w/ quantization), 7 s (no ADMM)

• Note: no convergence to a global minimum is guaranteed

• NAILS/LM = flexible & efficient algorithms for training control-oriented RNNs
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 36/85


Training RNNs - Silverbox benchmark
(Wigren, Schoukens, 2013)

• Silverbox benchmark (Duffing oscillator): 10 traces of ≈8600 data used for training, 40000 for testing (Schoukens, Ljung, 2019)

[Figure: input [V] and output [V] traces (~120000 samples), split into test data and training data.]

Data download: http://www.nonlinearbenchmark.org
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 37/85


Training RNNs - Silverbox benchmark
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• RNN model: 8 states, 3 layers of 8 neurons, atan activation, no I/O feedthrough

• Initial state: encode x0 as the output of a NN with atan activation, 2 layers of 4 neurons, receiving 8 past inputs and 8 past outputs:

  min_{θx0, θx, θy} r(θx0, θx, θy) + Σ_{j=1}^{M} Σ_{k=0}^{N−1} ℓ(yk^j, ŷk^j)
  s.t. x_{k+1}^j = fx(xk^j, uk^j, θx), ŷk^j = fy(xk^j, uk^j, θy)
       x0^j = fx0(v^j, θx0),  v = [y_{−1}; ...; y_{−8}; u_{−1}; ...; u_{−8}]

• ℓ2-regularization: r(θx0, θx, θy) = (0.01/2)(∥θx∥₂² + ∥θy∥₂²) + (0.1/2)∥θx0∥₂²

• Total number of parameters: nθx + nθy + nθx0 = 296 + 225 + 128 = 649

• Training: use NAILM over 150 epochs (1 epoch = 77505 training samples)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 38/85


Training RNNs - Silverbox benchmark
(Bemporad, 2021 - http://arxiv.org/abs/2112.15348)

• Identification results on test data²:

  identification method        RMSE [mV]      BFR [%]
  ARX (ml) [1]                 16.29 [4.40]   69.22 [73.79]
  NLARX (ms) [1]               8.42 [4.20]    83.67 [92.06]
  NLARX (mlc) [1]              1.75 [1.70]    96.67 [96.79]
  NLARX (ms8c50) [1]           1.05 [0.30]    98.01 [99.43]
  Recurrent LSTM model [2]     2.20           95.83
  SS encoder [3] (nx = 4)      [1.40]         [97.35]
  NAILM                        0.35           99.33

  [1] Ljung, Zhang, Lindskog, Juditski, 2004; [2] Ljung, Andersson, Tiels, Schön, 2020; [3] Beintema, Toth, Schoukens, 2021

• NAILM training time ≈ 400 s (MATLAB+CasADi on Apple M1 Max CPU)

• Repeat training with ℓ1-regularization: [Figure: RMSE (mV) vs. number of model parameters (5–1280), comparing NAILM against ARX (ml), NLARX (ms, mlc, ms8c50), and LSTM.]

² Trained RNN: http://cse.lab.imtlucca.it/~bemporad/shared/silverbox/rnn888.zip
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 39/85


Training RNNs
• Computation time (Intel Core i9-10885H CPU @2.40GHz):

  language         autodiff   EKF / time step   seq. LS / epoch
  Python 3.8.1     PyTorch    ≈ 30 ms           (N/A)
  Python 3.8.1     JAX        ≈ 9 ms            ≈ 1.0 s
  Julia 1.7.1      Flux.jl    ≈ 2 ms            ≈ 0.8 s
  MATLAB R2021a    CasADi     ≈ 0.5 ms          ≈ 0.1 s

• Several sparsity patterns can be exploited in EKF updates (supported by the ODYS EKF and ODYS Deep Learning libraries)

• Note: the extension to gray-box identification + state estimation is immediate

• Note: RNN training by EKF can be used to generalize output disturbance models for offset-free set-point tracking to nonlinear I/O disturbance models
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 40/85


Deep Nonlinear MPC for Autonomous Driving
• Goal: track desired longitudinal speed (vx), lateral displacement (ey), and orientation (ΔΨ)

• Inputs: wheel torque Tw and steering angle δ

• Constraints: on ey and lateral displacement s (for obstacle avoidance) and on the manipulated inputs Tw, δ

• Sampling time: 100 ms

• Model: gray-box bicycle model

– kinematics is simple to model (white box)

– tire forces are harder to model + stiff wheel-slip-ratio dynamics (kf, kr) ⇒ small integration step required

– learn a black-box neural-network model!

(Boni, Capelli, Frascati @ODYS, 2021)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 41/85
Deep Nonlinear MPC for Autonomous Driving
• ODYS Deep Learning Toolset used to learn a neural network with input (vx, vy, ω, kf, kr, Tw, δ) @k and output (vx, vy, ω, kf, kr) @k+1

• Data generated from a high-fidelity simulation model with noisy measurements, sampled @10 Hz

• Neural network model: 2 hidden layers, 55 neurons each

• Advantages of the black-box (neural network) model:

– no physical model required describing tire-road interaction

– directly learn the model in discrete time (Ts = 100 ms)

[Figure: vehicle body states vx [m/s], vy [m/s], ω [rad/s] vs. time.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 42/85
Deep Nonlinear MPC for Autonomous Driving
• Model validation on test data: [Figure: one-step-ahead predictions and open-loop predictions vs. time.]

• C-code (network + Jacobians) automatically generated for ODYS MPC:

  TensorFlow/Keras, PyTorch, scikit-learn, ODYS-NN training → automatic C-code generation → embedded MPC
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 43/85


Deep Nonlinear MPC for Autonomous Driving
• Closed-loop MPC: overtake vehicle #1, keep safety distance from vehicle #2

[Figure: closed-loop traces of δ [deg], Tw [Nm], total # QP iterations, SQP iterations, ΔΨ [deg], ey [m], vx [m/s] vs. time.]

• Good reference tracking, constraints on ey, vx satisfied, smooth command action
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 44/85
Direct Data-driven MPC
Direct data-driven MPC

[Block diagram: the model-based optimizer (prediction model + optimization algorithm) computes inputs u(t) from set-points r(t) and measurements of the process outputs y(t).]

• Can we design an MPC controller without first identifying a model of the open-loop process?

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 45/85


Data-driven direct controller synthesis
(Campi, Lecchini, Savaresi, 2002) (Formentin et al., 2015)

[Block diagram: controller Kp in closed loop with plant G (disturbance d, scheduling signal p); a reference model M from r to y defines the virtual reference rv and virtual error ev.]

• Collect a set of data {u(t), y(t), p(t)}, t = 1, ..., N

• Specify a desired closed-loop linear model M from r to y

• Compute rv(t) = M# y(t) from the pseudo-inverse model M# of M

• Identify a linear (LPV) model Kp from ev = rv − y (the virtual tracking error) to u — a toy sketch follows
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 46/85


Direct data-driven MPC
• Design a linear MPC (reference governor) to generate the reference r
(Bemporad, Mosca, 1994) (Gilbert, Kolmanovsky, Tan, 1994)

[Block diagram: the MPC receives the desired reference r0 and generates r for the inner closed loop M′ (Kp + G), whose linear prediction model M is totally known!]

• MPC designed to handle input/output constraints and improve performance
(Piga, Formentin, Bemporad, 2017)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 47/85


Direct data-driven MPC - An example
• Experimental results: MPC handles soft constraints on u, Δu and y
(motor equipment by courtesy of TU Delft)

[Figure: θ [rad] tracking with and without MPC, and input u [V] and increments Δu [V] vs. time — desired tracking performance achieved, constraints on input increments satisfied.]

No open-loop process model is identified to design the MPC controller!
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 48/85


Optimal direct data-driven MPC

• Question: how to choose the reference model M?

[Block diagram: MPC + Kp in closed loop with plant G, compared against the reference model M: which M makes Kp an optimal controller?]

• Can we choose M from data so that Kp is an optimal controller?
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 49/85


Optimal direct data-driven MPC
(Selvi, Piga, Bemporad, 2018)

• Idea: parameterize the desired closed-loop model M(θ) and optimize

  min_θ J(θ) = (1/N) Σ_{t=0}^{N−1} Wy(r(t) − yp(θ, t))² + WΔu Δup²(θ, t) + Wfit(u(t) − uv(θ, t))²

  (the first two terms = performance index, the last term = identification error)

• Evaluating J(θ) requires synthesizing Kp(θ) from data and simulating the nominal model and control law:

  yp(θ, t) = M(θ)r(t)
  up(θ, t) = Kp(θ)(r(t) − yp(θ, t))
  Δup(θ, t) = up(θ, t) − up(θ, t − 1)

• The optimal θ is obtained by solving a (non-convex) nonlinear programming problem
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 50/85


Optimal direct data-driven MPC
(Selvi, Piga, Bemporad, 2018)

• Results: linear process

  G(z) = (z − 0.4)/(z² + 0.15z − 0.325)

  The data-driven controller is only 1.3% worse than model-based LQR (= SYS-ID on the same data + LQR design)

  [Figure: closed-loop tracking comparison.]

• Results: nonlinear (Wiener) process

  yL(t) = G(z)u(t)
  y(t) = |yL(t)| arctan(yL(t))

  The data-driven controller is 24% better than LQR based on the identified open-loop model!

  [Figure: closed-loop tracking comparison.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 51/85


Data-driven optimal policy search
Data-driven optimal policy search
(Ferrarotti, Bemporad, 2019)

• Plant + environment dynamics (unknown):

  s_{t+1} = h(st, pt, ut, dt)

– st: states of plant & environment
– pt: exogenous signal (e.g., reference)
– ut: control input
– dt: unmeasured disturbances

• Control policy: deterministic control policy π: R^{ns+np} → R^{nu}

  ut = π(st, pt)

• The closed-loop performance of an execution is defined as

  J∞(π, s0, {pℓ, dℓ}_{ℓ=0}^∞) = Σ_{ℓ=0}^∞ ρ(sℓ, pℓ, π(sℓ, pℓ))

  ρ(sℓ, pℓ, π(sℓ, pℓ)) = stage cost
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 52/85


Optimal Policy Search Problem
• Optimal policy:

  π* = arg min_π J(π),  J(π) = E_{s0, {pℓ, dℓ}}[J∞(π, s0, {pℓ, dℓ})]  (expected performance)

• Simplifications:

– Finite parameterization: π = πK(st, pt) with K = parameters to optimize

– Finite horizon: JL(π, s0, {pℓ, dℓ}_{ℓ=0}^{L−1}) = Σ_{ℓ=0}^{L−1} ρ(sℓ, pℓ, π(sℓ, pℓ))

• Optimal policy search: use stochastic gradient descent (SGD)

  Kt ← K_{t−1} − αt D(K_{t−1})

  with D(K_{t−1}) = descent direction
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 53/85


Descent Direction
• The descent direction D(K_{t−1}) is computed by generating:

– Ns perturbations s0^{(i)} around the current state st
– Nr random reference signals rℓ^{(j)} of length L
– Nd random disturbance signals dℓ^{(h)} of length L

  D(K_{t−1}) = Σ_{i=1}^{Ns} Σ_{j=1}^{Nr} Σ_{h=1}^{Nd} ∇K JL(πK_{t−1}, s0^{(i)}, {rℓ^{(j)}, dℓ^{(h)}})

  SGD step = mini-batch of size M = Ns · Nr · Nd

• Computing ∇K JL requires predicting the effect of π over L future steps

• We use a local linear model just for computing ∇K JL, obtained by running recursive linear system identification (see the sketch below)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 54/85


Optimal Policy Search Algorithm
• At each step t:

1. Acquire the current st

2. Recursively update the local linear model

3. Estimate the direction of descent D(K_{t−1})

4. Update the policy: Kt ← K_{t−1} − αt D(K_{t−1})

• If the policy is learned online and needs to be applied to the process:

– Compute the nearest policy Kt* to Kt that stabilizes the local model:

  Kt* = arg min_K ∥K − Kt∥₂²
  s.t. K stabilizes the local linear model  (a linear matrix inequality)

• When the policy is learned online, exploration is guaranteed by the reference rt
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 55/85


Special Case: Output Tracking
• xt = [yt, y_{t−1}, ..., y_{t−no}, u_{t−1}, u_{t−2}, ..., u_{t−ni}]
  Δut = ut − u_{t−1} = control input increment

• Stage cost: ∥y_{t+1} − rt∥²_{Qy} + ∥Δut∥²_R + ∥q_{t+1}∥²_{Qq}

• Integral action dynamics: q_{t+1} = qt + (y_{t+1} − rt)

  st = [xt; qt],  pt = rt

• Linear policy parameterization:

  πK(st, rt) = −Ks·st − Kr·rt,  K = [Ks; Kr]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 56/85


Example: retrieve LQR from data
  x_{t+1} = [−0.669 0.378 0.233; −0.288 −0.147 −0.638; −0.337 0.589 0.043] xt + [−0.295; −0.325; −0.258] ut
  yt = [−1.139 0.319 −0.571] xt
  (the model is unknown)

• Online tracking performance (no disturbance, dt = 0):

  Qy = 1, R = 0.1, Qq = 1;  ni = no = 3, L = 20;  Ns = 50, Nr = 1, Nd = 10

[Figure: output yt tracks the reference rt over 30000 time steps.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 57/85


Example: retrieve LQR from data
Evolution of the error ∥Kt − Kopt∥₂:

[Figure: ∥Kt − Kopt∥₂ decays toward 0 over 30000 time steps.]

  KSGD = [−1.255, 0.218, 0.652, 0.895, 0.050, 1.115, −2.186]
  Kopt = [−1.257, 0.219, 0.653, 0.898, 0.050, 1.141, −2.196]

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 58/85


Nonlinear Example

• Continuously stirred tank reactor (CSTR) (apmonitor.com) — the model is unknown
  Feed: concentration 10 kg mol/m³, temperature 298.15 K

• Noisy measurements: T = T̂ + ηT, CA = ĈA + ηC, with ηT, ηC ∼ N(0, σ²), σ = 0.01

• Weights: Qy = [1 0; 0 0], R = 0.1, Qq = [0.01 0; 0 0]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 59/85


Nonlinear Example
• Online learning: ni = 2, no = 3, L = 10;  Ns = 50, Nr = 20, Nd = 20

[Figure: online learning and validation traces of the concentration CA (with reference rt), reactor temperature T, and coolant temperature TC, for the learned policy KSGD vs. the policy KID from SYS-ID + LQR (courtesy: apmonitor.com).]

• Validation phase: cost of KSGD = 4.3·10³ vs. cost of KID = 2.4·10⁴ → SGD beats SYS-ID + LQR

• Extended to switching-linear and nonlinear policies, and to collaborative learning
(Ferrarotti, Bemporad, 2020a) (Ferrarotti, Bemporad, 2020b) (Ferrarotti, Breschi, Bemporad, 2021)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 60/85
Learning optimal MPC calibration
MPC calibration problem
• The design depends on a vector x of MPC parameters

• Parameters can be many things:

– MPC weights, prediction model coefficients, horizons
– covariance matrices used in Kalman filters
– tolerances used in numerical solvers
– ...

• Define a performance index f over a closed-loop simulation or real experiment. For example:

  f(x) = Σ_{t=0}^T ∥y(t) − r(t)∥²   (tracking quality)

• Auto-tuning = find the best combination of parameters by solving the global optimization problem

  min_x f(x)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 61/85
Global optimization algorithms for auto-tuning

What is a good optimization algorithm to solve min f (x) ?

• The algorithm should not require the gradient ∇f (x) of f (x), in particular if
experiments are involved (derivative-free or black-box optimization )

• The algorithm should not get stuck on local minima (global optimization)

• The algorithm should make the fewest evaluations of the cost function f
(which is expensive to evaluate)

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 62/85


Auto-tuning - Global optimization algorithms
• Several derivative-free global optimization algorithms exist: (Rios, Sahinidis, 2013)

– Lipschitzian-based partitioning techniques:
  • DIRECT (DIvide in RECTangles) (Jones, 2001)
  • Multilevel Coordinate Search (MCS) (Huyer, Neumaier, 1999)

– Response surface methods:
  • Kriging (Matheron, 1967), DACE (Sacks et al., 1989)
  • Efficient global optimization (EGO) (Jones, Schonlau, Welch, 1998)
  • Bayesian optimization (Brochu, Cora, De Freitas, 2010)

– Genetic algorithms (GA) (Holland, 1975)

– Particle swarm optimization (PSO) (Kennedy, 2010)

– ...

• New method: radial basis function surrogates + inverse distance weighting
(GLIS) (Bemporad, 2020) cse.lab.imtlucca.it/~bemporad/glis

pip install glis
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 63/85


Auto-tuning - GLIS
• Goal: solve the global optimization problem

  min_x f(x)
  s.t. ℓ ≤ x ≤ u, g(x) ≤ 0

• Step #0: get random initial samples x1, ..., x_{Ninit} (Latin hypercube sampling)

• Step #1: given N samples of f at x1, ..., xN, build the surrogate function

  f̂(x) = Σ_{i=1}^N βi φ(ε∥x − xi∥₂),  φ = radial basis function

  Example: φ(εd) = 1/(1 + (εd)²) (inverse quadratic)

  The vector β solves f̂(xi) = f(xi) for all i = 1, ..., N (a linear system)

• CAVEAT: iteratively building and minimizing f̂ alone may easily miss the global optimum!
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 64/85


Auto-tuning - GLIS
• Step #2: construct the IDW exploration function

  z(x) = (2/π) ΔF tan⁻¹(1 / Σ_{i=1}^N wi(x)), or 0 if x ∈ {x1, ..., xN}

  where wi(x) = e^(−∥x−xi∥²)/∥x − xi∥² and ΔF = observed range of f(xi)

• Step #3: optimize the acquisition function

  x_{N+1} = arg min f̂(x) − δz(x)
  s.t. ℓ ≤ x ≤ u, g(x) ≤ 0

  to get the new sample x_{N+1}  (δ = exploitation vs. exploration tradeoff)

• Iterate the procedure to get new samples x_{N+2}, ..., x_{Nmax} — a minimal sketch follows
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 65/85


GLIS vs Bayesian Optimization
[Figure: best function value vs. number of iterations for BO and GLIS on each benchmark problem.]

  problem            n   BO [s]   GLIS [s]
  ackley             2   29.39    3.13
  adjiman            2   3.29     0.68
  branin             2   9.66     1.17
  camelsixhumps      2   4.82     0.62
  hartman3           3   26.27    3.35
  hartman6           6   54.37    8.80
  himmelblau         2   7.40     0.90
  rosenbrock8        8   63.09    13.73
  stepfunction2      4   11.72    1.81
  styblinski-tang5   5   37.02    6.10

  Results computed on 20 runs per test; BO = MATLAB's bayesopt function
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 66/85


Auto-tuning: MPC example
• We want to auto-tune the linear MPC controller

  min Σ_{k=0}^{50−1} (y_{k+1} − r(t))² + (WΔu(uk − u_{k−1}))²
  s.t. x_{k+1} = Axk + Buk
       yk = Cxk
       −1.5 ≤ uk ≤ 1.5
       uk ≡ u_{Nu}, ∀k = Nu, ..., N−1

• Calibration parameters: x = [log10 WΔu, Nu]

• Range: −5 ≤ x1 ≤ 3 and 1 ≤ x2 ≤ 50

• Closed-loop performance objective (see the sketch below):

  f(x) = Σ_{t=0}^T (y(t) − r(t))² + (1/2)(u(t) − u(t−1))² + 2Nu
         (track well)  (smooth control action)  (small QP)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 67/85
Auto-tuning: Example
[Figure: closed-loop output vs. reference and input trajectories with the auto-tuned parameters, and best function value (~220 → ~70) vs. number of function evaluations over 150 evaluations.]

• Result: x* = [−0.2341, 2.3007], i.e., WΔu = 0.5833, Nu = 2
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 68/85
MPC Autotuning Example
(Forgione, Piga, Bemporad, 2020)

• Linear MPC applied to a cart-pole system: 14 parameters to tune

– sample time
– weights on outputs and input increments
– prediction and control horizons
– covariance matrices of the Kalman filter
– absolute and relative tolerances of the QP solver

• Closed-loop performance score: J = ∫₀ᵀ |p(t) − pref(t)| + 30|φ(t)| dt

• MPC parameters tuned using 500 iterations of GLIS

• Performance tested with a simulated cart on two hardware platforms (PC, Raspberry PI)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 69/85


MPC Autotuning Example
[Figure: closed-loop position p vs. pref (m), angle φ (deg), and force u (N) for the MPC optimized for a desktop PC (optimal sample time = 6 ms) and for a Raspberry PI (optimal sample time = 22 ms).]

• MPC parameters tuned by the GLIS global optimizer (500 function evaluations)

• Auto-calibration can squeeze max performance out of the available hardware

• Bayesian optimization gives similar results, but with larger computation effort
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 70/85


Auto-tuning: Pros and Cons
• Pros:

– Selection of the calibration parameters x to test is fully automatic

– Applicable to any calibration parameter (weights, horizons, solver tolerances, ...)

– Rather arbitrary performance index f(x) (tracking performance, response time, worst-case number of flops, ...)

• Cons:

– Need to quantify an objective function f(x)

– No room for qualitative assessments of closed-loop performance

– Often have multiple objectives, not clear how to blend them into a single one
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 71/85


Active preference learning
(Bemporad, Piga, Machine Learning, 2021)

• The objective function f(x) is not available (latent function)

• We can only express a preference between two choices:

  π(x1, x2) = −1 if x1 "better" than x2 [f(x1) < f(x2)]
               0 if x1 "as good as" x2 [f(x1) = f(x2)]
               1 if x2 "better" than x1 [f(x1) > f(x2)]

• We want to find a global optimum x* (= "better" than any other x):

  find x* such that π(x*, x) ≤ 0, ∀x ∈ X, ℓ ≤ x ≤ u

• Active preference learning: iteratively propose a new sample to compare

• Key idea: learn a surrogate of the (latent) objective function from preferences
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 72/85


Preference-learning example
(Brochu, de Freitas, Ghosh, 2007)

[Figure: target image and four synthesized candidate images.]

• Realistic image synthesis of material appearance is based on models with many parameters x1, ..., xn

• Defining an objective function f(x) is hard, while a human can easily assess whether an image resembles the target one or not

• Preference gallery tool: at each iteration, the user compares two images generated with two different parameter instances
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 73/85
Active preference learning algorithm
(Bemporad, Piga, Machine Learning, 2021)
[Figure: latent function f(x), surrogate function f̂(x), exploration function z(x), and acquisition function a(x) = f̂(x) − δz(x), whose minimum gives the next sample x_{N+1}.]

• Fit a surrogate f̂(x) that respects the preferences expressed by the decision maker at the sampled points (by solving a QP)

• Minimize an acquisition function f̂(x) − δz(x) to get a new sample x_{N+1}

• Compare x_{N+1} to the current "best" point and iterate
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 74/85
Semi-automatic calibration by preference-based learning
• Use the preference-based optimization (GLISp) algorithm for semi-automatic tuning of MPC (Zhu, Bemporad, Piga, 2021)

• Latent function = the calibrator's (unconscious) score of closed-loop MPC performance

• GLISp proposes a new combination x_{N+1} of MPC parameters to test

• By observing test results, the calibrator expresses a preference, telling whether x_{N+1} is "better", "similar", or "worse" than the current best combination

• Preference learning algorithm: update the surrogate f̂(x) of the latent function, optimize the acquisition function, ask for a preference, and iterate

[Diagram: testing & assessment → preference → preference-based learning algorithm → new control parameters.]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 75/85


Preference-based tuning: MPC example
• Semi-automatic tuning of x = [log10 WΔu, Nu] in the linear MPC

  min Σ_{k=0}^{50−1} (y_{k+1} − r(t))² + (WΔu(uk − u_{k−1}))²
  s.t. x_{k+1} = Axk + Buk
       yk = Cxk
       −1.5 ≤ uk ≤ 1.5
       uk ≡ u_{Nu}, ∀k = Nu, ..., N−1

[Figure: closed-loop output vs. reference and input trajectories with the preference-tuned parameters.]

• Same performance index to assess closed-loop quality, but now unknown: only preferences are available

• Result: WΔu = 0.6888, Nu = 2
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 76/85


Preference-based tuning: MPC example

Sampled points during preference learning Best function value


50
170
45
160
40 150
140
35
130
control horizon

30
120
25
110
20
100
15
90
10

5 80

0
10-6 10-4 10-2 100 102 104 0 20 40 60 80 100 120 140
Wdu

tested combinations of MPC params (latent) performance index

"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 77/85


Preference-based tuning: MPC example
(Zhu, Bemporad, Piga, 2021)

• Example: calibration of a simple MPC for lane keeping (2 inputs, 3 outputs)

  ẋ = v cos(θ + δ)
  ẏ = v sin(θ + δ)
  θ̇ = (v/L) sin(δ)

• Multiple control objectives: "optimal obstacle avoidance", "pleasant drive", "CPU time small enough", ... — not easy to quantify in a single function

• 5 MPC parameters to tune:

– sampling time
– prediction and control horizons
– weights on input increments Δv, Δδ
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 78/85


Preference-based tuning: MPC example
• Preference query window:

[Figure: two candidate closed-loop simulations shown side by side — e.g. Ts = 0.332 s, Nu = 16, Np = 17, log(qu11) = 0.06, log(qu22) = 2.02, tcomp = 0.0867 s vs. Ts = 0.243 s, Nu = 12, Np = 17, log(qu11) = 0.19, log(qu22) = 0.70, tcomp = 0.0846 s — with vehicle and obstacle trajectories yf [m] vs. xf [m], speed v [km/hr] vs. reference, and steering δ [°].]
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 79/85


Preference-based tuning: MPC example
• Convergence after 50 GLISp iterations (= 49 queries):

[Figure: final closed-loop trajectories — the vehicle avoids the obstacle and tracks the speed reference.]

• Optimal MPC parameters:

– sample time = 85 ms (CPU time = 80.8 ms)
– prediction horizon = 16
– control horizon = 5
– weight on Δv = 1.82
– weight on Δδ = 8.28

• Note: no need to define a closed-loop performance index explicitly!

• Extended to handle also unknown constraints (Zhu, Piga, Bemporad, 2021)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 80/85


Corner-case detection
Corner-case detection problem
(Zhu, Bemporad, Kneissl, Esen, 2022)

• Goal: detect undesired simulation scenarios (= corner-cases)

• Let x = parameters defining the scenario, XODD = operational design domain, x ∈ XODD ⊆ Rⁿ

• Critical scenario = a vector x* for which the closed-loop behavior is critical

• Example:
– x = (initial distance between ego car and obstacle, obstacle acceleration, ...)
– critical scenario: time-to-collision too short, excessive jerk of ego car, ...

• Key idea: use the global optimizer GLIS to generate critical corner-cases

  x* ∈ arg min_{x ∈ XODD} f(x)
  s.t. ℓ ≤ x ≤ u

  f(x) = criticality of the closed-loop simulation (or experiment) determined by scenario x (the smaller f(x), the more critical x is)
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 81/85


Corner-case detection: Case study
• Problem: find critical scenarios in automated driving w/ obstacles

• MPC controller for lane keeping and obstacle avoidance based on a simple kinematic bicycle model (Zhu, Piga, Bemporad, 2021)

  ẋf = v cos(θ + δ)
  ẇf = v sin(θ + δ)
  θ̇ = (v/L) sin(δ)

  (xf, wf) = front-wheel position

• Black-box optimization problem: given k obstacles, solve

  min_{ℓ ≤ x ≤ u} Σ_{i=1}^k d^{SV,i}_{xf,critical}(x) + d^{SV,i}_{wf,critical}(x)
  s.t. other constraints
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 82/85
Corner-case detection: Case study
• Cost function terms to minimize: for each obstacle #i, define

  d^{SV,i}_{xf,critical}(x) =
    min_{t∈Tcollision} d^{SV,i}_{xf}(x, t)   if I^i_collision          (min time of collision with #i)
    L                                        if ∼I^i_collision & I_collision   (collision with another #j ≠ #i)
    Σ_{t∈Tsim} d^{SV,i}_{xf}(x, t)           if ∼I_collision           (no collision)

  d^{SV,i}_{wf,critical}(x) =
    min_{t∈Tcollision} d^{SV,i}_{wf}(x, t)   if I^i_collision
    wf,safe                                  if ∼I^i_collision & I_collision
    Σ_{t∈Tsim} d^{SV,i}_{wf}(x, t)           if ∼I_collision

  I^i_collision = true if ∃t ∈ Tsim s.t. (d^{SV,i}_{xf}(x, t) ≤ L) & (d^{SV,i}_{wf}(x, t) ≤ W)
  I_collision = true if ∃h s.t. I^h_collision = true
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 83/85


Corner-case detection: Case study
• Logical scenario 1: GLIS identifies 64 collision cases within 100 simulations

  iter   xf1⁰    v1⁰     xf2⁰    v2⁰     xf3⁰    v3⁰
  51     15.00   30.00   44.14   10.00   49.10   47.39   ← optimal solution found by GLIS
  79     28.09   30.00   70.29   10.00   74.79   31.74
  40     34.30   30.00   60.59   10.00   77.80   35.97

  The ego car changes lane to avoid #1, but cannot brake fast enough to avoid #2

• Logical scenario 2: GLIS identifies 9 collision cases within 100 simulations

  iter   xf1⁰    v1⁰     tc
  28     12.57   46.94   16.75   ← optimal solution found by GLIS
  16     17.53   47.48   23.65
  88     44.54   41.26   16.02

  The ego car changes lane to avoid #1, but cannot decelerate in time for the sudden lane change of #1
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 84/85
Learning-based MPC: final remarks
• Learning-based MPC is a formidable combination for advanced control:
– MPC / online optimization is an extremely powerful control methodology
– ML is extremely useful to get control-oriented models and control laws from data

• Ignoring ML tools would be a mistake (a lot to "learn" from machine learning)

• ML cannot replace control engineering:

– Black-box modeling can be a failure: better use gray-box models when possible
– Approximating the control law can be a failure: don't abandon online optimization
– Pure AI-based reinforcement-learning methods can also be a failure

• A wide spectrum of research opportunities and new practices is open!
"Model Predictive Control" - © 2023 A. Bemporad. All rights reserved. 85/85
