
Adaptive Uncertainty Quantification for Scenario-based Control Using Meta-learning of Bayesian Neural Networks

Yajie Bao, Intelligent Fusion Technology, Inc., Germantown, MD 20874 USA (e-mail: yajie.bao@intfusiontech.com)
Javad Mohammadpour Velni, Clemson University, Clemson, SC 29634 USA (e-mail: javadm@clemson.edu)
Abstract

Scenario-based optimization and control has proven to be an efficient approach to account for system uncertainty. In particular, the performance of scenario-based model predictive control (MPC) schemes depends on the accuracy of uncertainty quantification. However, current learning- and scenario-based MPC (sMPC) approaches employ a single time-invariant probabilistic model (learned offline), which may not accurately describe time-varying uncertainties. Instead, this paper presents a model-agnostic meta-learning (MAML) of Bayesian neural networks (BNN) for adaptive uncertainty quantification that would be subsequently used for adaptive-scenario-tree model predictive control design of nonlinear systems with unknown dynamics to enhance control performance. In particular, the proposed approach learns both a global BNN model and an updating law to refine the BNN model. At each time step, the updating law transforms the global BNN model into more precise local BNN models in real time. The adapted local model is then used to generate scenarios for sMPC design at each time step. A probabilistic safety certificate is incorporated in the scenario generation to ensure that the trajectories of the generated scenarios contain the real trajectory of the system and that all the scenarios adhere to the constraints with a high probability. Experiments using closed-loop simulations of a numerical example demonstrate that the proposed approach can improve the performance of scenario-based MPC compared to using only one BNN model learned offline for all time steps.

keywords:
Uncertainty quantification, learning-based control, scenario-based control, model predictive control, meta-learning, Bayesian neural networks.
This work was financially supported by the United States National Science Foundation under award CMMI-2302219.

1 Introduction

Uncertainty is ubiquitous in control systems and may lead to system constraint violation and/or performance deterioration of a designed controller. Scenario-based optimization, especially scenario-based model predictive control (MPC), has been developed to account for system uncertainties by representing them with a scenario tree, and to reduce the conservativeness inherent to open-loop robust MPC by introducing recourse into the optimal control problem (Bemporad et al., 2002). However, generating scenarios offline based on worst-case uncertainty descriptions obtained a priori can limit the performance of sMPC. Generating a scenario tree that accurately represents the evolution of uncertainties requires an adequate description of the uncertainties, which is generally not available in practice. Using data-driven models of uncertain nonlinear systems for MPC design (aka learning-based MPC) has attracted increasing attention (Hewing et al., 2020; Mesbah et al., 2022), as machine learning has proven to be effective for modeling time-varying and/or hard-to-model dynamics. However, it remains challenging for data-driven modeling to provide accurate and generalizable models, which are critical for sMPC performance. One particular challenge is how to cope with the discrepancy between the application environment and the data-collecting environment (aka data drift (Ackerman et al., 2020)), which can degrade model accuracy and, subsequently, control performance. Therefore, adapting data-driven models to new environments in real time is necessary to ensure a desired control performance (Bao et al., 2020b, 2021).

Gaussian process regression (GPR) (Rasmussen, 2003) is the most widely used approach for data-driven characterization of model uncertainty in learning-based MPC (Koller et al., 2018; Hewing et al., 2019; Soloperto et al., 2018; Bonzanini et al., 2021). While experiments demonstrated that GPR was able to capture structural model uncertainty, GPR suffers from a computational complexity that is cubic in the data size, which may restrict the size of data used for efficient offline training and online evaluation. Moreover, GPR assumes that the model uncertainty can be described by a joint Gaussian distribution, which may not hold in some applications. Alternatively, Bayesian neural networks (BNNs) have been increasingly used for uncertainty quantification (Bao et al., 2021; Bao and Mohammadpour Velni, 2022b) and learning-based MPC (Bao et al., 2023b, a). BNNs treat the weights of deterministic neural network (NN) models as random variables and provide estimates of the posterior distributions conditioned on a dataset. Compared to GPR, BNNs can model both epistemic and aleatoric uncertainties with arbitrary distributions, be trained efficiently using 'Bayes by Backprop' (Blundell et al., 2015), and be quickly evaluated without using the training dataset to compute kernel matrices. BNNs can be viewed as an ensemble of deterministic networks (ANN models) combined by the posterior distributions and can provide robust predictions using cheap model averaging (Carbone et al., 2020). However, BNNs suffer from high computational cost and noisy gradients as a result of estimating the evidence lower bound (ELBO) from a single sample of weights (Jospin et al., 2020), which makes it harder to adapt BNN models on a small batch of data in real time.

To tackle the challenges of online adaptation of BNNs, we resort to model-agnostic meta-learning (MAML) methods. MAML (Finn et al., 2017) is compatible with BNNs that are trained with gradient descent (GD). Moreover, MAML trains models that are easy to fine-tune and enables fast adaptation of deep networks with good generalization performance. Specifically, MAML explicitly trains a model on various tasks such that using a small amount of training data to update the model parameters for a small number of GD steps produces a good model for a new task. In the case of control design, predicting system outputs/behaviors at each time step is viewed as a different task, such that sudden changes in the environment (at any time step) can be coped with through model adaptation. Moreover, unlike Bao et al. (2020b), which requires a batch of closed-loop data for online transfer learning, and Bao et al. (2021), which uses adaptive sliding mode control to derive the updating law, the adaptation here is performed by a parameterized updating law that takes the recent system trajectory and the global model parameters as inputs and outputs the adapted parameters of a more precise local model.

For sMPC design using BNN models, a BNN model is initially learned to model state- and input-dependent uncertainties, and the statistics of the BNN predictions are then used to generate adaptive scenarios online. Additionally, the sMPC (Bao et al., 2023b) using BNNs improved the robust control performance with respect to sMPC with a fixed scenario tree and with respect to an adaptive scenario-based MPC using Gaussian process regression, by realizing a less conservative estimation of the model uncertainty. This paper aims to further improve the control performance by adapting the BNN model online. Fig. 1 illustrates the proposed scheme for adaptive sMPC design using model-agnostic meta-learning of BNNs.

Figure 1: Schematic of adaptive sMPC using MAML of BNNs, where a global BNN model is adapted online using recent trajectories for scenario generation.

Additionally, Finn (2018) proposed a meta-learning approach for adaptive control using reinforcement learning with deterministic models. To quantify uncertainty, Harrison et al. (2018) used meta-learning of Bayesian linear regression in the feature space to model long-term dynamics and employed an LQR-type control scheme with a linearized dynamics model for uncertainty-aware control. Instead, Richards et al. (2021) proposed adaptive control-oriented meta-learning to directly train a parametric adaptive controller using closed-loop simulations, which adapts well to each model of an ensemble constructed from past inputs/outputs data. Different from the aforementioned works, this paper presents a MAML approach for fast online adaptation of BNN-based uncertainty model and then employs the adapted model for sMPC design of uncertain constrained nonlinear systems. The main contribution of this paper lies in presenting MAML of BNN models for time-varying, state- and input-dependent uncertainty quantification.

2 Problem Formulation and Preliminaries

Consider a constrained, discrete-time nonlinear system with state- and input-dependent uncertainty of the form

\begin{align}
x(k+1) &= f(x(k),u(k)) + g(x(k),u(k)), \tag{1a}\\
x &\in \mathcal{X},\ u \in \mathcal{U}, \tag{1b}
\end{align}

where $x$ is the state, $u$ is the control input, and $k\in\mathbb{N}$ is the time instant; $f:\mathcal{X}\times\mathcal{U}\rightarrow\mathcal{X}$ describes a known, Lipschitz continuous model in (1a), while $g:\mathcal{X}\times\mathcal{U}\rightarrow\Omega$ represents an a priori unknown model error term, which is assumed to be Lipschitz continuous; $\mathcal{X}\subseteq\mathbb{R}^{n_x}$ and $\mathcal{U}\subseteq\mathbb{R}^{n_u}$ in (1b) are the constraint sets of the states and inputs, respectively. $\mathcal{X}$ is assumed to be convex. The initial state is $x(0)=x_0$.

Remark 1

The known model f𝑓fitalic_f can be a first principles-based model or obtained as a data-driven model. One interesting class of models is linear parameter-varying (LPV) models. LPV models use a linear structure to capture time-varying and nonlinear dynamics, which allows the development of computationally efficient control design methods. Moreover, data-driven methods have been increasingly developed for the global identification of state-space LPV models (e.g., see Rizvi et al. (2018); Bao and Mohammadpour Velni (2022a)).

Assuming a dataset $\mathcal{D}=\{(x^{(i)},u^{(i)},x^{(i+1)}) \mid i=1,\cdots,N_{\mathcal{D}}\}$ that covers the entire feasible space $\mathcal{X}\times\mathcal{U}\times\mathcal{X}$ has been collected from the real system (1), this paper aims to learn an adaptable BNN model of $g$ using MAML for sMPC design with safety guarantees.

The closed-loop system model using sMPC can be expressed by

\begin{equation}
\begin{split}
x(k+1) &= f(x(k),\kappa(x(k))) + g\left(x(k),\kappa(x(k))\right)\\
&\triangleq \Phi_{\kappa}(x(k))
\end{split}
\tag{2}
\end{equation}

with $\kappa:\mathcal{X}\times\mathbb{N}\rightarrow\mathcal{U}$ denoting the control law. We use $\mathbf{x}(k|x_0)\triangleq\{x(1),\cdots,x(k)\}$ to denote the solutions to the model (2) given the initial state $x_0$.

Definition 1 (Koller et al. (2018))

Given $x_0\in\mathcal{X}$, the system (1a) is said to be safe under a control law $\kappa$ if

\begin{equation}
\forall k\in\mathbb{N},\ \Phi_{\kappa}(x(k))\in\mathcal{X},\ \kappa(x(k))\in\mathcal{U}. \tag{3}
\end{equation}

The system (1a) is $\delta$-safe under $\kappa$ if

\begin{equation}
\forall k\in\mathbb{N},\ \Pr\left[\Phi_{\kappa}(x(k))\in\mathcal{X},\ \kappa(x(k))\in\mathcal{U}\right]\geq\delta, \tag{4}
\end{equation}

where $\Pr[\cdot]$ denotes the probability of an event.
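
As a rough illustration of Definition 1, the probability in (4) can be estimated empirically by simulating closed-loop rollouts of (2) and counting, for each time step, how often the state and input constraints hold. The sketch below is not from the paper: the function `estimate_delta`, the scalar dynamics, the uniform model-error sampler, the linear feedback law, and the box constraints are all hypothetical choices made only to keep the example small, and the model error is treated as a random draw (as it would be when sampled from a learned probabilistic model).

```python
import random

def estimate_delta(f, g_sampler, kappa, x0, in_X, in_U,
                   horizon=20, rollouts=500, seed=0):
    """Empirical estimate of delta in (4): the smallest per-step fraction
    of rollouts that satisfy both the state and the input constraints."""
    rng = random.Random(seed)
    satisfied = [0] * horizon
    for _ in range(rollouts):
        x = x0
        for k in range(horizon):
            u = kappa(x)
            # One sampled realization of the closed-loop model (2).
            x_next = f(x, u) + g_sampler(x, u, rng)
            if in_X(x_next) and in_U(u):
                satisfied[k] += 1
            x = x_next
    return min(count / rollouts for count in satisfied)

# Hypothetical scalar example (all numbers are illustrative assumptions).
f = lambda x, u: 0.9 * x + u                             # known model f
g_sampler = lambda x, u, rng: rng.uniform(-0.05, 0.05)   # sampled model error
kappa = lambda x: -0.5 * x                               # simple feedback law
delta_hat = estimate_delta(
    f, g_sampler, kappa, x0=1.0,
    in_X=lambda x: abs(x) <= 2.0,
    in_U=lambda u: abs(u) <= 1.0,
)
```

A finite number of rollouts only gives a statistical estimate over a finite horizon, so a check of this kind can falsify a claimed $\delta$ but cannot by itself certify (4) for all $k\in\mathbb{N}$.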

Next, we introduce the preliminaries of meta-learning and BNNs for uncertainty quantification.

2.1 Meta-learning

Meta-learning seeks the adaptation of machine learning models to unseen tasks that are vastly different from trained tasks (Peng, 2020). Specifically, given a distribution $p(\mathcal{T})$ of tasks, meta-learning aims to solve $\min_{w}\mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\mathcal{L}(\mathcal{D};w)$, where $\mathcal{T}=\{\mathcal{D},\mathcal{L}\}$ denotes a task consisting of a dataset $\mathcal{D}$ and a loss function $\mathcal{L}$, and $w$ denotes the meta-knowledge (such as the choice of optimizer and the function class for a task).

Meta-learning consists of an inner level for base learning and an outer level as the meta-learner. At the inner level, a new task with a dataset $\mathcal{D}_{source}^{(i)}$ from a set of $M$ source tasks $\mathcal{D}_{source}=\{(\mathcal{D}_{source}^{train},\mathcal{D}_{source}^{val})^{(i)}\}_{i=1}^{M}$ is presented, and the agent aims at quickly learning the concepts associated with the task from the training observations, i.e., finding $\rho^{*}=\arg\max_{\rho}\log p(\rho|w^{*},\mathcal{D}_{source}^{train})$. This quick adaptation is facilitated by knowledge accumulated across earlier tasks (aka meta-knowledge) (Huisman et al., 2021). At the outer level, the learner updates the inner-level algorithm such that an outer objective function (e.g., generalization performance) is improved for learning meta-knowledge, i.e., $w^{*}=\arg\max_{w}\log p(w|\mathcal{D}_{source})$. In this way, meta-learning can continually perform self-improvement as the number of tasks increases.
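
The inner and outer levels described above can be sketched on a toy problem. The example below is an illustration under strong simplifying assumptions and not the method of this paper: it uses a deterministic scalar model $y\approx wx$ instead of a BNN, a single gradient step as the inner-level learner, and the first-order approximation of the MAML meta-gradient (the derivative through the inner step is ignored). All function names, learning rates, and task slopes are hypothetical.

```python
import random

def mse(w, data):
    """Mean squared error of the scalar model y ~ w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def adapt(w, train, inner_lr=0.1, steps=1):
    """Inner level: a few gradient steps on the task's training split."""
    for _ in range(steps):
        w = w - inner_lr * mse_grad(w, train)
    return w

def meta_train(tasks, w=0.0, inner_lr=0.1, outer_lr=0.5, epochs=200):
    """Outer level: update the shared initialization w so that the
    adapted parameters perform well on each task's validation split."""
    for _ in range(epochs):
        meta_grad = 0.0
        for train, val in tasks:
            w_task = adapt(w, train, inner_lr)
            # First-order approximation: validation gradient evaluated at
            # the adapted parameters, ignoring d(w_task)/d(w).
            meta_grad += mse_grad(w_task, val)
        w = w - outer_lr * meta_grad / len(tasks)
    return w

def make_task(slope, rng, n=10):
    """Hypothetical task: noiseless samples of y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n)]
    pts = [(x, slope * x) for x in xs]
    return pts[:n], pts[n:]  # (train split, validation split)

rng = random.Random(0)
tasks = [make_task(s, rng) for s in (1.0, 2.0, 3.0)]
w_meta = meta_train(tasks)                 # meta-knowledge: a good initialization
new_train, new_val = make_task(2.5, rng)   # unseen task
w_local = adapt(w_meta, new_train)         # fast adaptation with one step
```

Here the meta-knowledge $w$ is simply a shared initialization, which is the choice made in MAML; a single inner step on the unseen task moves `w_local` closer to that task's slope than `w_meta`.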

To quantify uncertainties, we use BNNs as the base learner for MAML. For fast online adaptation, rather than collecting a batch of closed-loop data for online transfer learning (Bao et al., 2020b), we adopt an unsupervised domain adaptation approach, and the online adaptation will be based on the updating law, which avoids noisy gradients of training BNNs by backpropagation. The details of MAML of BNNs will be provided in Section 4.

2.2 Bayesian Neural Networks

The key component of a BNN is the DenseVariational layer, which approximates the posterior density $p(W|\mathcal{D})$ of the parameters $W$ by variational inference (VI) given a prior density $p(W)$, where $\mathcal{D}$ denotes a dataset. A reparameterization trick is employed to parameterize $q(W_j;\theta_j)$ with parameters $\theta_j$ for approximating $p(W|\mathcal{D})$, i.e.,

\begin{equation}
W_j=\mu_{W_j}+\sigma_{W_j}\odot\epsilon_{W_j} \tag{5}
\end{equation}

where $\theta_j=\{\mu_{W_j},\sigma_{W_j}\}$ in this case; $\odot$ denotes element-wise multiplication; $\epsilon_{W_j}\sim\mathcal{N}(0,I)$. A multi-layer, fully connected BNN is used to model the unknown vector-valued function $g$. The dataset $\mathcal{D}_g=\{\mathrm{x}^{(i)}=(x^{(i)},u^{(i)}),g^{(i)} \mid i=1,\cdots,N_g\}$ for training the BNN is obtained by computing $g^{(i)}=x^{(i+1)}-f(x^{(i)},u^{(i)})=g(x^{(i)},u^{(i)})$ on the dataset $\mathcal{D}$, where $\mathrm{x}^{(i)}$ denotes the input to the BNN, and $g^{(i)}$ is the uncertainty to be predicted by the BNN. The details of training BNNs can be found in Bao et al. (2021). With the trained BNN, the probability density function of $\hat{g}$ for a given $(x(k),u(k))$ can be approximated by sampling from the posterior distributions of weights using Monte Carlo (MC) methods and computing $\hat{g}$ for each set of sampled weights.
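
The reparameterization (5) and the MC approximation of the predictive distribution can be sketched in a few lines. The example below is only an illustration: the tiny network (two tanh hidden units, scalar output, no biases), the values of `mu` and `sigma`, and the names `sample_weights`, `g_hat`, and `mc_predictions` are hypothetical and do not correspond to a trained model or to any particular library API.

```python
import math
import random

def sample_weights(mu, sigma, rng):
    """Reparameterization (5): W = mu + sigma * eps with eps ~ N(0, I),
    applied element-wise to a flat list of weights."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def g_hat(weights, x, u):
    """Hypothetical network with inputs (x, u), two tanh hidden units,
    and a scalar output; stands in for one sampled deterministic model."""
    w11, w12, w21, w22, v1, v2 = weights
    h1 = math.tanh(w11 * x + w12 * u)
    h2 = math.tanh(w21 * x + w22 * u)
    return v1 * h1 + v2 * h2

def mc_predictions(mu, sigma, x, u, n_mc=1000, seed=0):
    """Draw n_mc weight samples and evaluate the network once per sample;
    the returned list approximates the predictive distribution of g."""
    rng = random.Random(seed)
    return [g_hat(sample_weights(mu, sigma, rng), x, u) for _ in range(n_mc)]

# Illustrative variational parameters (assumed, not learned).
mu = [0.5, -0.3, 0.2, 0.8, 1.0, -0.5]
sigma = [0.1] * 6
samples = mc_predictions(mu, sigma, x=1.0, u=0.5)
```

Note that evaluating a prediction requires only the variational parameters and the query point, not the training data, which is the evaluation-cost advantage over kernel methods mentioned in the introduction.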

2.3 Scenario-based MPC Design Approach

At the time instant $k$, the stochastic MPC minimizes

\begin{equation}
\mathbb{E}\left\{\sum_{i=0}^{N-1}\ell(x(i|k),u(i|k))+V_{N}(x(N|k))\right\} \tag{6}
\end{equation}

where $\mathbb{E}\{\cdot\}$ is the expectation operator over the random vector sequence $\mathbf{g}=\{g(0),\cdots,g(N-1)\}$. The uncertainties of $g$ are propagated forward through the prediction model (1a), making it difficult to derive the closed-form probability density function of $\mathbf{g}$. To evaluate the cost in (6), scenario-based MPC (sMPC) uses a tree of discrete scenarios to represent the uncertainty evolution of a system. Consequently, the scenario-based optimal control problem for an uncertain system at time step $k$ can be formulated as follows:

\begin{align}
\min_{x^{j},u^{j}}\quad & \sum_{j=1}^{S}p^{j}\left[\sum_{i=0}^{N-1}\ell\left(x^{j}(i|k),u^{j}(i|k)\right)+V_{N}\left(x^{j}(N|k)\right)\right] \tag{7a}\\
\text{s.t.}\quad & x^{j}(i+1|k)=f\left(x^{j}(i|k),u^{j}(i|k)\right)+\hat{g}^{j}(i|k), \tag{7b}\\
& \left(x^{j}(i|k),u^{j}(i|k)\right)\in\mathcal{X}\times\mathcal{U}, \tag{7c}\\
& x^{j}(0|k)=x(k), \tag{7d}\\
& u^{j}(i|k)=u^{l}(i|k)~\text{if}~x^{p(j)}(i|k)=x^{p(l)}(i|k), \tag{7e}
\end{align}

where the superscript $j\in\{1,\ldots,S\}$ is the index of the scenario; $p^{j}$ is the probability of the $j$-th scenario; $\ell\left(x^{j}(i|k),u^{j}(i|k)\right)$ and $V_{N}\left(x^{j}(N|k)\right)$ are the stage cost and terminal cost for the $j$-th scenario, respectively; $N$ is the prediction horizon length; $\hat{g}^{j}$ is the uncertainty realization in the $j$-th scenario; and (7e) is the non-anticipativity constraint. The control law can be determined by the solution to (7) as $\kappa\left(x(k)\right)=\mathbf{u}^{\star}(0|k)$.
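
To make the structure of (7) concrete, the following sketch evaluates the objective (7a) for given candidate input sequences by rolling out the prediction model (7b) in each scenario. It is an illustration only and not an sMPC solver: it assumes a scalar state, holds each scenario's uncertainty realization $\hat{g}^{j}$ constant over the horizon, enforces non-anticipativity only at the shared root node, and omits the constraints (7c). The function `scenario_cost` and the example dynamics and costs are hypothetical.

```python
def scenario_cost(x0, inputs, g_hats, probs, f, stage_cost, terminal_cost):
    """Probability-weighted cost (7a) over S scenarios.

    inputs[j][i] plays the role of u^j(i|k); g_hats[j] is a constant
    uncertainty realization for scenario j (a simplifying assumption).
    """
    # All scenarios start from the same node x(k), see (7d), so by the
    # non-anticipativity constraint (7e) they must share the first input.
    first = inputs[0][0]
    if any(seq[0] != first for seq in inputs):
        raise ValueError("first input must be identical across scenarios")
    total = 0.0
    for u_seq, g_hat, p in zip(inputs, g_hats, probs):
        x, cost = x0, 0.0
        for u in u_seq:
            cost += stage_cost(x, u)
            x = f(x, u) + g_hat          # prediction model (7b)
        total += p * (cost + terminal_cost(x))
    return total

# Hypothetical example: two scenarios, horizon N = 2.
J = scenario_cost(
    x0=1.0,
    inputs=[[-1.0, 0.0], [-1.0, 0.0]],
    g_hats=[0.0, 0.5],
    probs=[0.5, 0.5],
    f=lambda x, u: x + u,
    stage_cost=lambda x, u: x * x + u * u,
    terminal_cost=lambda x: x * x,
)
# J == 2.625: scenario 1 costs 2.0, scenario 2 costs 3.25
```

In an actual sMPC implementation this cost would be minimized over the inputs subject to (7b)-(7e) by a nonlinear programming solver; only the first input $\mathbf{u}^{\star}(0|k)$ is then applied to the system.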

3 Learning-based Scenario Generation for sMPC Design

At each time step $k$, we draw $\bar{N}_{\text{MC}}$ samples from normal distributions and calculate the weights $W^{(i)}$ by applying the transformation (5) to the $i$-th sample. Although Lemma 1 in Bao and Mohammadpour Velni (2023) states that the trajectories of the $\bar{N}_{\text{MC}}$ sampled models encompass the system trajectory, $\bar{N}_{\text{MC}}$ can be too large to be practical for online optimization. To reduce the number of scenarios, we instead evaluate the estimate $\hat{g}^{(i)}(k)$ using $W^{(i)}$, then calculate the sample mean $\hat{\mu}_{g(k)}=\frac{1}{\bar{N}_{\text{MC}}}\sum_{i=1}^{\bar{N}_{\text{MC}}}\hat{g}^{(i)}(k)$ and standard deviation $\hat{\sigma}_{g(k)}=\sqrt{\frac{1}{\bar{N}_{\text{MC}}}\sum_{i=1}^{\bar{N}_{\text{MC}}}(\hat{g}^{(i)}(k)-\hat{\mu}_{g(k)})^{\top}(\hat{g}^{(i)}(k)-\hat{\mu}_{g(k)})}$, and use $\hat{\mu}_{g(k)}$, $\hat{\mu}_{g(k)}+m^{j}\hat{\sigma}_{g(k)}$, and $\hat{\mu}_{g(k)}-m^{j}\hat{\sigma}_{g(k)}$, $j=1,\cdots,\frac{S-1}{2}$, as the scenarios, where $m^{j}$ are the tuning parameters and $S$ is the number of scenarios. The probabilities of the scenarios are calculated using the moment matching method (Høyland and Wallace, 2001) to maintain the original statistical properties. To save computational cost, we only update the uncertainty estimation when solving (7) and fix the scenarios over the prediction horizon. Specifically, we use the solution $u^{*}(1|k-1)$ to (7) at $k-1$ and the state $x(k)$ to estimate the uncertainty $\hat{g}(k)$, and set $\hat{g}(i|k)=\hat{g}(k)$, $i=0,\cdots,N-1$, for (7) at $k$. This approach is more tractable than considering time-varying uncertainties and adaptive scenarios within the prediction horizon, as the uncertainties are input-dependent and the control inputs over the prediction horizon are decision variables of the sMPC problem. When the uncertainties do not change significantly within the prediction horizon, fixing the uncertainty estimation is reasonable and less conservative than using worst-case error bounds.
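
The scenario construction above can be sketched for the smallest nontrivial case. The example below assumes a scalar uncertainty and $S=3$ scenarios with a single tuning parameter $m$. The probabilities are chosen so that the three-point distribution reproduces the sample mean and variance, which is one simple closed-form instance of matching the first two moments and is not claimed to be the exact procedure of Høyland and Wallace (2001) or of this paper. The function name `generate_scenarios` and the sample values are hypothetical.

```python
import math

def generate_scenarios(samples, m=1.5):
    """Build S = 3 scenarios {mu, mu + m*sigma, mu - m*sigma} from MC
    samples of a scalar uncertainty, with probabilities matching the
    sample mean and variance. Requires m >= 1 so that the probability
    of the central scenario is nonnegative."""
    if m < 1.0:
        raise ValueError("m must be at least 1 for valid probabilities")
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((g - mu) ** 2 for g in samples) / n)
    scenarios = [mu, mu + m * sigma, mu - m * sigma]
    # Symmetric outer scenarios keep the mean at mu; their probability p
    # solves 2 * p * (m * sigma)^2 = sigma^2, i.e. p = 1 / (2 m^2).
    p_side = 1.0 / (2.0 * m * m)
    probs = [1.0 - 2.0 * p_side, p_side, p_side]
    return scenarios, probs

# Illustrative MC samples of g_hat(k) (assumed values).
scenarios, probs = generate_scenarios([0.0, 1.0, 2.0, 3.0, 4.0], m=1.5)
```

Larger values of $m$ push the outer scenarios further from the mean while shifting probability mass to the central scenario, so $m$ trades off coverage of extreme realizations against conservativeness.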

4 Meta-learning of BNN for sMPC Design

In this section, we present the model-agnostic meta-learning (MAML) approach for fast adaptation of BNN models. In particular, MAML learns a parameterized adaptation law that transforms a global BNN model (i.e., the meta-knowledge on a distribution of tasks) into a local BNN model with improved accuracy at each time step. The local model provides a tighter uncertainty quantification for scenario generation and thus improves the control performance. Specifically, meta-learning aims to find meta-knowledge that is useful for a distribution of tasks $p(\mathcal{T})$, i.e.,

$$w^{*}=\arg\min_{w}\mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\mathcal{L}(\mathcal{D};w) \tag{8}$$

where $\mathcal{D}$ denotes a dataset, $\mathcal{L}$ is the loss function, and $\mathcal{T}=\{\mathcal{D},\mathcal{L}\}$. To adapt to a specific task $\mathcal{T}^{(i)}$, meta-learning finds task-specific parameters $\theta^{(i)}$ by solving

$$\min_{\theta^{(i)}}\mathcal{L}^{(i)}(\mathcal{D}^{(i)},\theta^{(i)};w^{*}) \tag{9}$$

based on the meta-knowledge, which facilitates quick adaptation. In our case, inspired by Finn (2018), we view the model adaptation at each time step as a different task and obtain the parameters of the model at time step $k$ by $\theta(k)=w^{*}+\Delta\theta_{\psi}(\tau(k-M,k-1))$, where $\tau(k-M,k-1)=\left(x(k-M),u(k-M),\cdots,x(k-1),u(k-1)\right)$ denotes the trajectory of the system in the last $M$ time steps and $\Delta\theta_{\psi}$ is the step size function represented by an ANN with parameters $\psi$. Fig. 2 demonstrates the online adaptation of the global BNN model parameterized by $w^{*}$ to the local BNN model parameterized by $\theta(k)$. Then, the meta-learning problem is

$$\min_{w,\psi}~\mathbb{E}_{i\sim\mathcal{U}\{M,\cdots,N-K\}}\left[\mathcal{L}(\tau(i,i+K),\theta(i))\right] \tag{10}$$
$$\text{s.t.}~~\theta(i)=w+\Delta\theta_{\psi}(\tau(i-M,i-1)). \tag{11}$$

Moreover, the loss function of a task is

$$\begin{aligned}\mathcal{L}&=\mathcal{L}(\tau(i,i+K),\theta(i))\\&=\frac{1}{K}\sum_{k=1}^{K}\mathcal{L}(\hat{x}(i+k),x(i+k);\theta(i+k-1)).\end{aligned}$$

Note that the loss function is evaluated on future transitions so that the prediction errors of the adapted models are minimized over the next $K$ time steps. The training process is summarized in Algorithm 1, and the testing process is presented in Algorithm 2.

Algorithm 1 Learning the meta-knowledge
Require: learning rates $\alpha,\beta\in\mathbb{R}^{+}$ for $\psi$ and $w$, respectively; number of tasks $N_{\mathcal{T}}$; dataset $\mathcal{D}$
1:  Initialize $w$ and $\psi$
2:  while not done do
3:     for $j=1$ to $N_{\mathcal{T}}$ do
4:        Sample $i\sim\mathcal{U}\{M,\cdots,N-K\}$ and form $\tau(i-M,i-1)$, $\tau(i,i+K)$
5:        $\theta(i)=w+\Delta\theta_{\psi}(\tau(i-M,i-1))$
6:        $\mathcal{L}_{j}=\mathcal{L}(\tau(i,i+K),\theta(i))$
7:        $\psi\leftarrow\psi-\alpha\nabla_{\psi}\mathcal{L}_{j}$
8:     end for
9:     meta-update: $w\leftarrow w-\beta\frac{1}{N_{\mathcal{T}}}\sum_{j=1}^{N_{\mathcal{T}}}\nabla_{w}\mathcal{L}_{j}$
10:  end while
11:  return $w^{*}$ and $\Delta\theta_{\psi}$
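The structure of Algorithm 1 can be illustrated on a toy problem. The sketch below is not the BNN training used in this work: it assumes a deterministic scalar model $\hat{x}(k+1)=\theta x(k)+u(k)$, replaces the ANN $\Delta\theta_{\psi}$ with a hypothetical one-parameter map $\psi\,\phi(\tau)$, uses a squared-error loss with hand-derived gradients, evaluates all $K$ transitions with $\theta(i)$, and runs a fixed number of iterations in place of "while not done". All function names and numerical values are illustrative assumptions.

```python
import math
import random

def make_data(n, seed=0):
    """Toy scalar system x(k+1) = a(k) x(k) + u(k) with slowly varying a(k)."""
    rng = random.Random(seed)
    xs, us = [0.5], []
    for k in range(n):
        us.append(rng.uniform(-0.5, 0.5))
        a = 0.5 + 0.3 * math.sin(0.05 * k)
        xs.append(a * xs[-1] + us[-1])
    return xs, us

def feature(xs, us, i, M):
    """Hypothetical summary of tau(i-M, i-1): local least-squares estimate of a."""
    ks = range(i - M, i - 1)
    num = sum(xs[k] * (xs[k + 1] - us[k]) for k in ks)
    den = sum(xs[k] ** 2 for k in ks) + 1e-6
    return num / den

def task_loss(xs, us, i, K, theta):
    """Mean squared one-step error over tau(i, i+K) and its derivative w.r.t. theta."""
    loss, grad = 0.0, 0.0
    for k in range(1, K + 1):
        err = theta * xs[i + k - 1] + us[i + k - 1] - xs[i + k]
        loss += err ** 2 / K
        grad += 2.0 * err * xs[i + k - 1] / K
    return loss, grad

def meta_train(xs, us, M=5, K=3, alpha=0.1, beta=0.1, n_tasks=8, n_iters=300, seed=1):
    rng = random.Random(seed)
    n, w, psi = len(us), 0.0, 0.0
    for _ in range(n_iters):               # stands in for "while not done"
        grad_w = 0.0
        for _ in range(n_tasks):
            i = rng.randint(M, n - K)      # i ~ U{M, ..., N-K}
            phi = feature(xs, us, i, M)
            theta = w + psi * phi          # theta(i) = w + Delta_theta_psi(tau)
            _, g = task_loss(xs, us, i, K, theta)
            psi -= alpha * g * phi         # per-task update of psi
            grad_w += g / n_tasks          # d theta / d w = 1
        w -= beta * grad_w                 # meta-update of w
    return w, psi

def mean_loss(xs, us, w, psi, M=5, K=3):
    """Average task loss over all admissible start indices."""
    idx = range(M, len(us) - K + 1)
    total = sum(task_loss(xs, us, i, K, w + psi * feature(xs, us, i, M))[0] for i in idx)
    return total / len(idx)

xs, us = make_data(400)
loss_before = mean_loss(xs, us, 0.0, 0.0)
w, psi = meta_train(xs, us)
loss_after = mean_loss(xs, us, w, psi)
```

In this toy setting, $w$ plays the role of the global parameter and $\psi\,\phi(\tau)$ shifts it toward the locally valid value of $a(k)$, so comparing `loss_after` with `loss_before` indicates whether the adaptation law has been learned.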
Figure 2: Online adaptation of the BNN model.
Algorithm 2 Online adaptive control design approach
Require: meta-knowledge $w^{*}$; update rule $\Delta\theta_{\psi}$; experience $\tau(-M,-1)$ and number of time steps $\bar{N}$ for control
1:  for $k=0$ to $\bar{N}$ do
2:     Adapt model by $\theta(k)=w^{*}+\Delta\theta_{\psi}(\tau(k-M,k-1))$
3:     Compute control input $u(k)$ using the model with $\theta(k)$
4:     Apply $u(k)$ and collect data $(x(k),u(k),x(k+1))$ for validating/fine-tuning the model
5:  end for
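The online loop of Algorithm 2 can be written as a short skeleton. This is a sketch under stated assumptions: `delta_theta`, `controller`, and `plant_step` are hypothetical callables standing in for the learned update rule, the sMPC solve, and the real system, and the toy usage at the end uses scalar stand-ins rather than a BNN.

```python
from collections import deque

def run_adaptive_control(w_star, delta_theta, controller, plant_step, x0, history, n_steps):
    """Adapt the model, compute the input, apply it, and log the transition.

    All callables are hypothetical placeholders:
      delta_theta(window)  -> list of parameter offsets (the learned update rule),
      controller(x, theta) -> input u(k), e.g., from the sMPC solve,
      plant_step(x, u)     -> next state x(k+1).
    history holds the M most recent (x, u) pairs, i.e., tau(-M, -1).
    """
    window = deque(history, maxlen=len(history))
    x, log = x0, []
    for _ in range(n_steps):
        # theta(k) = w* + Delta_theta_psi(tau(k-M, k-1))
        theta = [w + d for w, d in zip(w_star, delta_theta(list(window)))]
        u = controller(x, theta)
        x_next = plant_step(x, u)
        log.append((x, u, x_next))  # data for validating or fine-tuning the model
        window.append((x, u))       # slide the M-step window forward
        x = x_next
    return log

# Toy usage with scalar stand-ins for the model, controller, and plant.
log = run_adaptive_control(
    w_star=[0.5],
    delta_theta=lambda win: [0.1 * sum(x for x, _ in win) / len(win)],
    controller=lambda x, theta: -theta[0] * x,
    plant_step=lambda x, u: 0.9 * x + u,
    x0=1.0,
    history=[(1.0, 0.0)] * 5,
    n_steps=10,
)
```

The bounded `deque` keeps exactly the last $M$ state-input pairs, so the adaptation at step $k$ only sees $\tau(k-M,k-1)$.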

We directly learn the parameters of the weight posteriors as the meta-knowledge. For fast adaptation, the task-specific parameters of the weight posteriors are modeled as a function of the past trajectory and the meta-knowledge. Moreover, we initialize the meta-knowledge with the model learned in Section 2.2 to facilitate learning.

Prior distributions affect the accuracy of the BNN models, and their selection has been studied by Fortuin et al. (2021). A proper prior is usually unknown or hard to choose. For MAML of BNNs, we directly use the learned posterior as the prior of the BNN weights. In this way, the local model adjusts its posteriors while staying as close as possible to the posteriors of the global model, i.e.,

$$\begin{split}\min_{w,\psi}\Big(&\mathbb{E}_{q(W;w,\psi)}\left[\log q(W;w,\psi)\right]-\mathbb{E}_{q(W;w,\psi)}\left[\log p(W;\theta_{0})\right]\\&-\mathbb{E}_{q(W;w,\psi)}\left[\log p(\mathcal{D}|W)\right]\Big),\end{split} \tag{12}$$

where $\theta_{0}$ denotes the parameters of the learned posterior.
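For reference, the first two terms of (12) combine into a Kullback-Leibler divergence, which makes explicit that the learned posterior acts as a regularizer on the adapted posterior. This is a standard identity written in the notation of (12), not an additional result of this work:

```latex
\mathbb{E}_{q(W;w,\psi)}\left[\log q(W;w,\psi)\right]
  - \mathbb{E}_{q(W;w,\psi)}\left[\log p(W;\theta_{0})\right]
  = \mathrm{KL}\big(q(W;w,\psi)\,\|\,p(W;\theta_{0})\big),
\qquad\text{so (12) reads}\qquad
\min_{w,\psi}\ \mathrm{KL}\big(q(W;w,\psi)\,\|\,p(W;\theta_{0})\big)
  - \mathbb{E}_{q(W;w,\psi)}\left[\log p(\mathcal{D}|W)\right].
```

The divergence term penalizes departures from the global posterior, while the expected log-likelihood term rewards fitting the data $\mathcal{D}$.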

5 Experimental Results and Validation

Consider the following nonlinear system

$$\left\{\begin{array}{l}\dot{x}_{1}=10x_{1}-x_{1}x_{2}^{2}+0.5x_{1}^{2}+0.5x_{1}u_{1}+0.5u_{2}\\ \dot{x}_{2}=-x_{2}+0.1x_{1}^{2}+3x_{1}^{2}x_{2}-x_{1}x_{2}u_{1}\end{array}\right. \tag{13}$$

with the state and input constraints

$$\begin{split}&-5\leq x_{1}\leq 3,~~~~0\leq x_{2}\leq 10;\\&-1\leq u_{1}\leq 1,~-1\leq u_{2}\leq 1.\end{split}$$

The system model (13) is assumed to be unknown, but data can be collected from the system for modeling and control design purposes.
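For readers who wish to reproduce the data collection, the vector field (13) and the box constraints can be coded directly. The sketch below is illustrative: the function names are hypothetical, and the fourth-order Runge-Kutta step with a zero-order hold on the input is an assumed discretization, since the integration scheme is not specified here.

```python
def dynamics(x, u):
    """Right-hand side of (13): returns [x1_dot, x2_dot]."""
    x1, x2 = x
    u1, u2 = u
    x1_dot = 10 * x1 - x1 * x2 ** 2 + 0.5 * x1 ** 2 + 0.5 * x1 * u1 + 0.5 * u2
    x2_dot = -x2 + 0.1 * x1 ** 2 + 3 * x1 ** 2 * x2 - x1 * x2 * u1
    return [x1_dot, x2_dot]

def rk4_step(x, u, dt=0.1):
    """One Runge-Kutta step with the input held constant (assumed discretization)."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1 = dynamics(x, u)
    k2 = dynamics(shifted(x, k1, dt / 2), u)
    k3 = dynamics(shifted(x, k2, dt / 2), u)
    k4 = dynamics(shifted(x, k3, dt), u)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def in_constraints(x, u):
    """Check the state and input box constraints stated with (13)."""
    return (-5 <= x[0] <= 3 and 0 <= x[1] <= 10
            and -1 <= u[0] <= 1 and -1 <= u[1] <= 1)
```

The default `dt=0.1` matches the sampling time used for data collection; the origin with zero input is an equilibrium of (13), which `rk4_step` preserves.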

5.0.1 Plant-model Mismatch Modeling

We applied a random input sequence drawn from the uniform distribution $U(-0.5,0.5)$ to the system and collected a dataset $\mathcal{D}=\{u^{(i)},x^{(i)},x^{(i+1)}|i=1,\cdots,1000\}$ with a sampling time of $0.1$ s. The dataset was randomly split into training and testing sets with a ratio of $75\%/25\%$. First, using the approach of Bao et al. (2020a), we learned an ANN-based linear parameter-varying (LPV) model as the nominal model, i.e., $f(x(k),u(k))=A(\rho(k))x(k)+B(\rho(k))u(k)$, where $\rho=[x_{1};x_{1}x_{2}]$ are used as the scheduling variables. In particular, we used two one-layer ANNs without activation functions to represent the matrix functions $A$ and $B$, respectively. Fig. 3 shows the validation results of the ANN-based LPV model.
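The structure of the nominal model can be sketched as follows. This is an illustration only: it assumes each one-layer ANN is an affine map from $\rho$ to the four entries of a $2\times 2$ matrix (the bias terms are an assumption), and the parameter values in `params` are hypothetical placeholders, not learned weights.

```python
def affine_layer(weights, bias, rho):
    """One layer without activation: maps rho to a flattened 2x2 matrix."""
    return [sum(w * r for w, r in zip(row, rho)) + b for row, b in zip(weights, bias)]

def lpv_predict(x, u, params):
    """Nominal prediction f(x, u) = A(rho) x + B(rho) u with rho = [x1, x1 * x2]."""
    rho = [x[0], x[0] * x[1]]
    a = affine_layer(params["WA"], params["bA"], rho)  # [A11, A12, A21, A22]
    b = affine_layer(params["WB"], params["bB"], rho)  # [B11, B12, B21, B22]
    return [
        a[0] * x[0] + a[1] * x[1] + b[0] * u[0] + b[1] * u[1],
        a[2] * x[0] + a[3] * x[1] + b[2] * u[0] + b[3] * u[1],
    ]

# Placeholder parameters: A(rho) = [[1 + rho1, 0], [0, 1]] and B(rho) = 0.
params = {
    "WA": [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "bA": [1.0, 0.0, 0.0, 1.0],
    "WB": [[0.0, 0.0]] * 4,
    "bB": [0.0, 0.0, 0.0, 0.0],
}
x_next = lpv_predict([2.0, 3.0], [1.0, 1.0], params)
```

Because $A$ and $B$ depend on the state only through $\rho$, the model is linear in $x$ and $u$ for a fixed scheduling value.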

(a) $\textup{BFR}_{x_{1}}=93.94\%$.
(b) $\textup{BFR}_{x_{2}}=92.77\%$.
Figure 3: Validation results of the nominal model for the second example.

Then, we used the proposed MAML of BNN to model the mismatch between the system and the nominal model using the dataset $\mathcal{D}_{g}=\{(x^{(i)},u^{(i)}),x^{(i+1)}-f(x^{(i)},u^{(i)})|i=1,\cdots,1000\}$.
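Constructing $\mathcal{D}_{g}$ amounts to subtracting the nominal prediction from each measured next state. A minimal sketch, assuming the trajectory is stored as lists of states `xs` (one more entry than inputs) and inputs `us`, and that `nominal` is any callable implementing $f$; these names are hypothetical:

```python
def build_mismatch_dataset(xs, us, nominal):
    """Pair each (x(i), u(i)) with the residual x(i+1) - f(x(i), u(i))."""
    data = []
    for i in range(len(us)):
        pred = nominal(xs[i], us[i])
        residual = [xn - p for xn, p in zip(xs[i + 1], pred)]
        data.append(((xs[i], us[i]), residual))
    return data

# Toy usage with an identity nominal model f(x, u) = x.
xs_demo = [[0.0, 0.0], [1.0, 2.0], [3.0, 5.0]]
us_demo = [[0.0, 0.0], [1.0, 1.0]]
d_g = build_mismatch_dataset(xs_demo, us_demo, lambda x, u: x)
```

Each entry of `d_g` is an input-target pair for the mismatch model $g$.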

Technical details for MAML of the BNN: We consider the form $g(x(k),u(k))=h(x(k))[\rho(k);u(k)]$. To represent $h$, we used one DenseVariational layer without an activation function, fully connected to a three-layer fully-connected ANN with ELU activation functions. The prior is $p(W^{4})=\pi\mathcal{N}(0,(\sigma_{1})^{2})+(1-\pi)\mathcal{N}(0,(\sigma_{2})^{2})$ with $\pi=0.5$, $\sigma_{1}=1.5$, and $\sigma_{2}=0.1$. Each of the two hidden layers has $8$ units. Moreover, we used a fully-connected layer to model the updating rule $\Delta\theta_{\psi}(\tau(k-M,k-1))$ with $M=5$. Additionally, we only update the posteriors of the weights and bias terms in the DenseVariational layer to reduce the model complexity, which was sufficient for modeling and safe control as shown in the experimental results. First, we trained a BNN model and then used the posteriors of the weights in this BNN as the priors of the global model weights to enhance the learning efficiency. Specifically, similar to the transfer learning approach for BNN training (Bao et al., 2021), we first trained an ANN model that shares the same architecture with the BNN model and then transferred the weights of the ANN model to the BNN model to improve the training efficiency of the BNN model. For model optimization, we used the Adam optimizer. Both the ANN and BNN models were trained for $30,000$ epochs with a batch size of $32$ and an initial learning rate of $10^{-3}$. The meta-learning of the BNN by Algorithm 1 was executed for $100$ epochs with an initial learning rate of $10^{-5}$.
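The scale-mixture prior stated above can be evaluated in a few lines. The sketch below is illustrative and assumes the prior factorizes over individual weights, which is a common convention for this type of prior but is not stated explicitly here; the function names are hypothetical.

```python
import math

def normal_pdf(w, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_mixture_prior(weights, pi=0.5, sigma1=1.5, sigma2=0.1):
    """Log-density of independent weights under the two-component scale mixture."""
    return sum(
        math.log(pi * normal_pdf(w, sigma1) + (1.0 - pi) * normal_pdf(w, sigma2))
        for w in weights
    )
```

The narrow component ($\sigma_{2}=0.1$) concentrates mass near zero, while the wide component ($\sigma_{1}=1.5$) keeps the tails heavy enough to allow large weights.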

Results and discussion: The performance comparison between BNN and MAML of BNN on the testing set is shown in Fig. 4. The average model obtained by MAML-BNN is considerably more accurate than that obtained by BNN, and the credible intervals (CIs) of MAML-BNN are far less conservative than those of BNN, which demonstrates that the proposed approach can improve the uncertainty quantification of BNN. Moreover, the closed-loop simulation results show that the improved model also improves the control performance.

(a) Comparison between the average models w.r.t. $x_{1}$.
(b) Comparison between the average models w.r.t. $x_{2}$.
(c) CIs by BNN w.r.t. $x_{1}$.
(d) CIs of $x_{1}$ by MAML-BNN.
(e) CIs by BNN w.r.t. $x_{2}$.
(f) CIs of $x_{2}$ by MAML-BNN.
Figure 4: Comparison between BNN and MAML-BNN for the second example.

5.0.2 Closed-loop Simulations

Next, we use the MAML-BNN for scenario generation and design of an adaptive sMPC scheme. Similar to the first example, the control objective here is to stabilize the system while satisfying the system constraints.

At each time instant, we sampled $\bar{N}_{\text{MC}}=50$ models to estimate the mean $\hat{\mu}_{g}$ and standard deviation $\hat{\sigma}_{g}$. Subsequently, at each node of the scenario tree in the robust horizon, we used $\hat{\mu}_{g}$, $\hat{\mu}_{g}+3\hat{\sigma}_{g}$, and $\hat{\mu}_{g}-3\hat{\sigma}_{g}$ as the discrete scenarios branching from that node. Furthermore, the worst-case bounds of the mismatch are $|g_{1}|\leq 0.21$ and $|g_{2}|\leq 0.85$. When the predictions of the scenarios fall outside the bounds of $g$, we use the bounds in place of the predictions and assign uniform probabilities to the scenarios. For sMPC design, we used a prediction horizon $N=7$ and a robust horizon $N_{r}=1$. The parameters of the adopted quadratic stage and terminal cost functions are $Q=I$, $R=100I$, and $P=I$, and the initial state is $x(0)=[-1;5]$. As shown in Fig. 5, the proposed approach stabilized the system while satisfying the system constraints. Moreover, the trajectories of the three scenarios contain the trajectory of the system with high probability, as required for the probabilistic safety guarantee.
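The saturation of scenarios at the worst-case bounds can be sketched as follows. This is one possible reading of the rule above, under stated assumptions: `sigma` is treated as a scalar added to every component of the mean, the probabilities of all scenarios at a node are reset to uniform as soon as any scenario is clipped, and the input probabilities (which would come from moment matching) are made-up values. Function names are hypothetical.

```python
G_BOUNDS = [0.21, 0.85]  # worst-case bounds |g1| <= 0.21 and |g2| <= 0.85

def three_scenarios(mu, sigma):
    """mu, mu + 3*sigma, mu - 3*sigma, with a scalar sigma (an assumption)."""
    return [
        list(mu),
        [m + 3.0 * sigma for m in mu],
        [m - 3.0 * sigma for m in mu],
    ]

def clip_scenarios(scenarios, probabilities, bounds=G_BOUNDS):
    """Saturate scenarios at the bounds; use uniform probabilities if any was clipped."""
    clipped, any_clipped = [], False
    for g in scenarios:
        g_sat = [max(-b, min(b, v)) for v, b in zip(g, bounds)]
        any_clipped = any_clipped or g_sat != list(g)
        clipped.append(g_sat)
    if any_clipped:
        probabilities = [1.0 / len(scenarios)] * len(scenarios)
    return clipped, probabilities

# Example: the upper scenario exceeds the bound on g1 and is saturated at 0.21.
scen, probs = clip_scenarios(three_scenarios([0.1, 0.5], 0.05), [0.6, 0.2, 0.2])
```

When no scenario leaves the bounds, the function returns the scenarios and the supplied probabilities unchanged.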

Figure 5: Closed-loop simulation results of the proposed sMPC with BNN by MAML for uncertainty estimation and weighted scenarios for system stabilization.
Figure 6: Closed-loop simulation results of the sMPC with BNN, clearly showing that the controller designed based on the BNN model failed to stabilize the system due to the inaccurate uncertainty quantification.

For comparison, we also examine the performance of the adaptive sMPC using the predictions of the plant-model mismatch by BNN without MAML. Fig. 6 illustrates that $x_{1}$ did not converge to $0$, which implies that the controller has failed to stabilize the system due to the inaccurate uncertainty quantification of the global BNN model.

6 Concluding Remarks

A model-agnostic meta-learning approach was presented to fine-tune BNN models online for sMPC design with safety guarantees. In particular, a global BNN model and an updating law for model adaptation were learned from data in the training phase. Then, the local BNN model adapted from the global model using the updating law was used to generate scenarios online for sMPC design purposes. To ensure safety, the behaviors of the generated scenarios contained the system behavior with high probability, and the constraints were enforced for all the scenarios. The closed-loop simulations demonstrated that the proposed approach improved model accuracy and control performance compared with sMPC designed based on a global BNN model.

References

  • Ackerman et al. (2020) Ackerman, S., Farchi, E., Raz, O., Zalmanovici, M., and Dube, P. (2020). Detection of data drift and outliers affecting machine learning model performance over time. 10.48550/ARXIV.2012.09258. URL https://arxiv.org/abs/2012.09258.
  • Bao et al. (2021) Bao, Y., Mohammadpour Velni, J., and Shahbakhti, M. (2021). Epistemic uncertainty quantification in state-space LPV model identification using Bayesian neural networks. IEEE Control Systems Letters, 5(2), 719–724.
  • Bao et al. (2023a) Bao, Y., Abbas, H.S., and Mohammadpour Velni, J. (2023a). A learning- and scenario-based MPC design for nonlinear systems in LPV framework with safety and stability guarantees. International Journal of Control. 10.1080/00207179.2023.2212814.
  • Bao et al. (2023b) Bao, Y., Chan, K.J., Mesbah, A., and Velni, J.M. (2023b). Learning-based adaptive-scenario-tree model predictive control with improved probabilistic safety using robust Bayesian neural networks. International Journal of Robust and Nonlinear Control, 33(5), 3312–3333.
  • Bao and Mohammadpour Velni (2022a) Bao, Y. and Mohammadpour Velni, J. (2022a). An overview of data-driven modeling and learning-based control design methods for nonlinear systems in LPV framework. In Proc. of the 5th IFAC Workshop on Linear Parameter Varying Systems.
  • Bao and Mohammadpour Velni (2022b) Bao, Y. and Mohammadpour Velni, J. (2022b). Safe control of nonlinear systems in LPV framework using model-based reinforcement learning. International Journal of Control, 1–12.
  • Bao and Mohammadpour Velni (2023) Bao, Y. and Mohammadpour Velni, J. (2023). A hybrid neural network approach for adaptive scenario-based model predictive control in the LPV framework. IEEE Control Systems Letters, 7, 1921–1926. 10.1109/LCSYS.2023.3283493.
  • Bao et al. (2020a) Bao, Y., Mohammadpour Velni, J., Basina, A., and Shahbakhti, M. (2020a). Identification of state-space linear parameter-varying models using artificial neural networks. IFAC-PapersOnLine, 53(2), 5286–5291.
  • Bao et al. (2020b) Bao, Y., Mohammadpour Velni, J., and Shahbakhti, M. (2020b). An online transfer learning approach for identification and predictive control design with application to RCCI engines. In Dynamic Systems and Control Conference, volume 84270, V001T21A003. American Society of Mechanical Engineers.
  • Bao et al. (2021) Bao, Y., Thesma, V., and Mohammadpour Velni, J.M. (2021). Physics-guided and neural network learning-based sliding mode control. IFAC-PapersOnLine, 54(20), 705–710.
  • Bemporad et al. (2002) Bemporad, A., Borrelli, F., Morari, M., et al. (2002). Model predictive control based on linear programming - the explicit solution. IEEE Transactions on Automatic Control, 47(12), 1974–1985.
  • Blundell et al. (2015) Blundell, C., Cornebise, J., Kavukcuoglu, K., and Wierstra, D. (2015). Weight uncertainty in neural network. In International Conference on Machine Learning, 1613–1622. PMLR.
  • Bonzanini et al. (2021) Bonzanini, A.D., Paulson, J.A., Makrygiorgos, G., and Mesbah, A. (2021). Fast approximate learning-based multistage nonlinear model predictive control using Gaussian processes and deep neural networks. Computers & Chemical Engineering, 145, 107174.
  • Carbone et al. (2020) Carbone, G., Wicker, M., Laurenti, L., Patane, A., Bortolussi, L., and Sanguinetti, G. (2020). Robustness of Bayesian neural networks to gradient-based attacks. arXiv preprint arXiv:2002.04359.
  • Finn et al. (2017) Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, 1126–1135. PMLR.
  • Finn (2018) Finn, C.B. (2018). Learning to learn with gradients. University of California, Berkeley.
  • Fortuin et al. (2021) Fortuin, V., Garriga-Alonso, A., Wenzel, F., Rätsch, G., Turner, R., van der Wilk, M., and Aitchison, L. (2021). Bayesian neural network priors revisited. arXiv preprint arXiv:2102.06571.
  • Harrison et al. (2018) Harrison, J., Sharma, A., Calandra, R., and Pavone, M. (2018). Control adaptation via meta-learning dynamics. In Workshop on Meta-Learning at NeurIPS, volume 2018.
  • Hewing et al. (2019) Hewing, L., Kabzan, J., and Zeilinger, M.N. (2019). Cautious model predictive control using Gaussian process regression. IEEE Transactions on Control Systems Technology, 28(6), 2736–2743.
  • Hewing et al. (2020) Hewing, L., Wabersich, K.P., Menner, M., and Zeilinger, M.N. (2020). Learning-based model predictive control: Toward safe learning in control. Annual Review of Control, Robotics, and Autonomous Systems, 3, 269–296.
  • Høyland and Wallace (2001) Høyland, K. and Wallace, S.W. (2001). Generating scenario trees for multistage decision problems. Management Science, 47(2), 295–307.
  • Huisman et al. (2021) Huisman, M., van Rijn, J.N., and Plaat, A. (2021). A survey of deep meta-learning. Artificial Intelligence Review, 1–59.
  • Jospin et al. (2020) Jospin, L.V., Buntine, W., Boussaid, F., Laga, H., and Bennamoun, M. (2020). Hands-on Bayesian neural networks–a tutorial for deep learning users. arXiv preprint arXiv:2007.06823.
  • Koller et al. (2018) Koller, T., Berkenkamp, F., Turchetta, M., and Krause, A. (2018). Learning-based model predictive control for safe exploration. In 2018 IEEE Conference on Decision and Control (CDC), 6059–6066. IEEE.
  • Mesbah et al. (2022) Mesbah, A., Wabersich, K.P., Schoellig, A.P., Zeilinger, M.N., Lucia, S., Badgwell, T.A., and Paulson, J.A. (2022). Fusion of machine learning and MPC under uncertainty: What advances are on the horizon? Proceedings of the American Control Conference.
  • Peng (2020) Peng, H. (2020). A comprehensive overview and survey of recent advances in meta-learning. arXiv preprint arXiv:2004.11149.
  • Rasmussen (2003) Rasmussen, C.E. (2003). Gaussian processes in machine learning. Summer school on machine learning, 63–71. Springer.
  • Richards et al. (2021) Richards, S.M., Azizan, N., Slotine, J.J.E., and Pavone, M. (2021). Adaptive-control-oriented meta-learning for nonlinear systems. arXiv preprint arXiv:2103.04490.
  • Rizvi et al. (2018) Rizvi, S.Z., Mohammadpour Velni, J., Abbasi, F., Tóth, R., and Meskin, N. (2018). State-space LPV model identification using kernelized machine learning. Automatica, 88, 38–47.
  • Soloperto et al. (2018) Soloperto, R., Müller, M.A., Trimpe, S., and Allgöwer, F. (2018). Learning-based robust model predictive control with state-dependent uncertainty. IFAC-PapersOnLine, 51(20), 442–447.