Yuliang Gu (yuliang3@illinois.edu)
Sheng Cheng (chengs@illinois.edu)
Naira Hovakimyan (nhovakim@illinois.edu)
Mechanical Science and Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801

Proto-MPC: An Encoder-Prototype-Decoder Approach for Quadrotor Control in Challenging Winds

Abstract

Quadrotors are increasingly used in the evolving field of aerial robotics for their agility and mechanical simplicity. However, inherent uncertainties, such as aerodynamic effects, coupled with quadrotors’ operation in dynamically changing environments, pose significant challenges for traditional control designs based on nominal models. To address these challenges, we propose a multi-task meta-learning method called Encoder-Prototype-Decoder (EPD), which effectively balances shared and distinctive representations across diverse training tasks. We then integrate the EPD model into a model predictive control problem (Proto-MPC) that enables the quadrotor to adapt to, and operate across, a spectrum of dynamically changing tasks with an efficient online implementation. We validate the proposed method in simulations, which demonstrate Proto-MPC’s robust performance in trajectory tracking of a quadrotor subject to static and spatially varying side winds.

keywords:
Multi-task Learning, Meta Learning, Model Predictive Control, Aerial Robotics

1 Introduction

In the evolving field of aerial robotics, quadrotors are widely used due to their agility and versatility in various applications. To fully leverage this agility, controller designs rely heavily on quadrotor models. These models are generally derived from the Newton-Euler equations, which can hardly accommodate the dynamic uncertainties of real-world applications (e.g., wind, aerodynamic effects, slung or sloshing payloads). To address this limitation, recent research has used advanced machine learning methods, such as Gaussian processes (Torrente et al., 2021) and NeuralODEs (Chee et al., 2022), to learn accurate dynamical models from real-world data and integrate them with model-based control design, which can significantly enhance system performance.

Quadrotors operating in real-world scenarios frequently encounter a range of structurally similar yet seemingly different tasks, each with its own dynamical uncertainties. For instance, a quadrotor might face varying side-wind conditions or be tasked with transporting slung payloads of unknown mass. Such varied tasks pose a distinct challenge for the above-mentioned control methods. A single data-driven model often falls short of optimal performance across diverse scenarios, whereas training separate models for case-specific tasks is inefficient due to 1) the difficulty of collecting data for each specific case and 2) potentially time-consuming online switching between trained models, each of which uses a relatively large number of parameters. To tackle these challenges, a growing line of research investigates online learning and meta-learning techniques. These methods operate in an offline-online framework (O’Connell et al., 2022; Jiahao et al., 2023; Richards et al., 2021; Wang et al., 2024), adapting the learned models or retraining new models in real time to match the changing characteristics of operational tasks. (A more detailed literature review is available in the Appendix.)

Integrating online learning methods into model-based control design poses several key challenges: 1) adaptivity: the system must respond rapidly to conditions that change in real time; 2) model fidelity: as data-driven models evolve through online learning, they risk losing essential knowledge acquired from the initial training data, which can lead to unpredictable behavior and degraded performance in situations they were originally designed to handle; 3) exploration vs. exploitation: striking the right balance between exploring new data and exploiting existing knowledge is critical for reliable real-time performance.

Figure 1: Framework Overview. a) Collecting data on multiple tasks; b) Pretraining to ensure that encoder-decoder pairs can capture the overall patterns of the data; c) Jointly training task-specific prototype decoders to capture distinctive task features and regularizing the encoder to avoid overfitting; d) Online implementation of Proto-MPC with prototype-decoder-based adaptation.

To address these challenges, we introduce Proto-MPC, a novel multi-task meta-learning-based model predictive control (MPC) framework. Central to our method is an Encoder-Prototype-Decoder (EPD) model designed to learn the residual dynamics of the quadrotor from diverse tasks. The EPD model comprises two key components: a universal deep neural network (DNN) encoder and a set of task-specific linear prototype decoders. On the one hand, the encoder learns the common, essential patterns across the task datasets, providing a generalized understanding of the tasks by mapping data to representations on a low-dimensional manifold (i.e., features). On the other hand, each linear prototype decoder captures the distinctive characteristics of a specific task in a computationally efficient way (owing to its linear form). In the online inference stage, the encoder maps incoming data to features, while the prototype decoders serve as a “basis” for interpolating the encoded features into residuals in the dynamics. This architecture allows a new decoder aligned with the current task’s characteristics to be computed quickly online. Moreover, this adaptive approach keeps the MPC’s residual dynamical model accurate and up to date. We evaluate the proposed framework on a quadrotor subject to side winds of various speeds. The results showcase the generalization and fast adaptation of the proposed Proto-MPC framework.

The contributions of this paper are summarized as follows: we propose Proto-MPC, a novel model predictive control framework for quadrotor control subject to uncertainties and disturbances; and we propose an EPD model as a data-driven augmentation to the physics-based dynamics that captures these uncertainties. With tunable parameters, the EPD model balances generalization across a wide array of tasks, both trained and unseen, against rapid adaptation to dynamically evolving task conditions.

2 Background: Nonlinear MPC for Quadrotor Control

We consider the 6-DoF rigid-body dynamics of the quadrotor (with mass $m$ and inertia $J$):

$$\dot{\boldsymbol{p}}=\boldsymbol{v},\ \dot{\boldsymbol{v}}=m^{-1}f\boldsymbol{z}_{B}+\boldsymbol{g},\ \dot{\boldsymbol{q}}=\frac{1}{2}\boldsymbol{q}\otimes[0\ \boldsymbol{\omega}^{\top}]^{\top},\ \dot{\boldsymbol{\omega}}=J^{-1}(\boldsymbol{M}-\boldsymbol{\omega}\times J\boldsymbol{\omega}), \tag{1}$$

where $\boldsymbol{p}\in\mathbb{R}^{3}$ and $\boldsymbol{v}\in\mathbb{R}^{3}$ stand for the position and velocity of the quadrotor in the inertial frame, $\boldsymbol{q}=[q_{0},q^{\top}]^{\top}\in\mathbb{S}^{3}$ (where $q_{0}\in\mathbb{R}$ and $q\in\mathbb{R}^{3}$) is the unit quaternion for rotation from the inertial to the body frame, and $\boldsymbol{\omega}\in\mathbb{R}^{3}$ is the angular velocity in the body frame. The gravitational acceleration is denoted by $\boldsymbol{g}$. The vector $\boldsymbol{z}_{B}$ is the unit vector aligned with the $z$-axis of the body frame.
The state $\boldsymbol{x}=[\boldsymbol{p}^{\top}\ \boldsymbol{v}^{\top}\ \boldsymbol{q}^{\top}\ \boldsymbol{\omega}^{\top}]^{\top}$ follows a discretized version of the dynamics in (1), written as $\boldsymbol{x}_{k+1}=f_{\text{nom}}(\boldsymbol{x}_{k},\boldsymbol{u}_{k})$, with control $\boldsymbol{u}=[f\ \boldsymbol{M}^{\top}]^{\top}\in\mathbb{R}^{4}$ (total thrust $f$ and moment $\boldsymbol{M}\in\mathbb{R}^{3}$).
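To make the discretization concrete, the following is a minimal plain-Python sketch of (1) and of one possible $f_{\text{nom}}$. The forward-Euler scheme, the quaternion renormalization, the diagonal inertia, $\boldsymbol{g}=[0,0,-9.81]^{\top}$, and all function names are assumptions for illustration; we also read $\boldsymbol{q}$ as the rotation taking the inertial frame onto the body frame, so that rotating $\boldsymbol{e}_3$ by $\boldsymbol{q}$ gives $\boldsymbol{z}_B$ in inertial coordinates.

```python
import math

def quat_mul(a, b):
    """Hamilton product a ⊗ b of scalar-first quaternions [q0, q1, q2, q3]."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return [a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0]

def rotate(q, v):
    """Rotate vector v by unit quaternion q: q ⊗ [0, v] ⊗ q*."""
    q_conj = [q[0], -q[1], -q[2], -q[3]]
    return quat_mul(quat_mul(q, [0.0] + list(v)), q_conj)[1:]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def f_cont(x, u, m, J, g=(0.0, 0.0, -9.81)):
    """Right-hand side of (1); x = [p, v, q, omega] (13 entries), u = [f, M] (4); J diagonal."""
    v, q, w = x[3:6], x[6:10], x[10:13]
    thrust, M = u[0], u[1:4]
    z_B = rotate(q, [0.0, 0.0, 1.0])                  # body z-axis in inertial coordinates
    v_dot = [thrust / m * z_B[i] + g[i] for i in range(3)]
    q_dot = [0.5 * c for c in quat_mul(q, [0.0] + list(w))]
    wxJw = cross(w, [J[i] * w[i] for i in range(3)])
    w_dot = [(M[i] - wxJw[i]) / J[i] for i in range(3)]
    return list(v) + v_dot + q_dot + w_dot            # p_dot = v

def f_nom(x, u, dt, m, J):
    """Assumed forward-Euler step x_{k+1} = x_k + dt * f(x_k, u_k), then renormalize q."""
    x_next = [xi + dt * di for xi, di in zip(x, f_cont(x, u, m, J))]
    norm_q = math.sqrt(sum(c * c for c in x_next[6:10]))
    x_next[6:10] = [c / norm_q for c in x_next[6:10]]
    return x_next
```

At hover (identity attitude, zero rates, thrust equal to $mg$), one step leaves the state unchanged; a higher-order integrator could replace the Euler step without changing the interface.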

The model in (1) describes the nominal case with no dynamical uncertainty. In general, uncertainties (e.g., wind or aerodynamic effects) exist in a real system. To account for their impact on the system, we consider lumped uncertainties (see (Wu et al., 2022)), denoted by $f_{\Delta}$, in the dynamics, resulting in the real dynamics $f_{\text{real}}=f_{\text{nom}}+f_{\Delta}$. In this paper, we learn the lumped uncertainties as $\hat{f}_{\Delta}$. The objective is for the learned dynamics $f_{\text{nom}}+\hat{f}_{\Delta}$ to closely approximate the actual dynamics $f_{\text{real}}$, so that they can serve as a trustworthy model in an MPC formulation. We consider the following nonlinear MPC problem:

$$\begin{aligned}\boldsymbol{u}_{0:N-1}^{\star}=\ &\underset{\boldsymbol{u}_{0:N-1}}{\operatorname{argmin}}\ \sum_{k=0}^{N-1}\Big(\lVert\boldsymbol{x}_{k}-\bar{\boldsymbol{x}}_{k}\rVert^{2}_{Q}+\lVert\boldsymbol{u}_{k}-\bar{\boldsymbol{u}}_{k}\rVert^{2}_{R}\Big)+\lVert\boldsymbol{x}_{N}-\bar{\boldsymbol{x}}_{N}\rVert^{2}_{Q_{N}}\\ \text{subject to}\ &\boldsymbol{x}_{k+1}=f_{\text{nom}}(\boldsymbol{x}_{k},\boldsymbol{u}_{k})+\hat{f}_{\Delta}(\boldsymbol{x}_{k},\boldsymbol{u}_{k}),\quad\boldsymbol{x}_{0}=\boldsymbol{x}_{\text{init}},\\ &\boldsymbol{u}_{\min}\leq\boldsymbol{u}_{k}\leq\boldsymbol{u}_{\max},\quad k=0,\dots,N-1,\end{aligned} \tag{2}$$

where $N$ is the prediction horizon, $\bar{\boldsymbol{x}}_{k}$ and $\bar{\boldsymbol{u}}_{k}$ denote the reference state and control, $Q$ and $R$ are the penalty matrices for deviating from the references, $Q_{N}$ is the terminal penalty matrix, and $\boldsymbol{u}_{\min}$ and $\boldsymbol{u}_{\max}$ represent the limits on the control actions.
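The structure of (2) can be illustrated by evaluating the objective of one candidate control sequence under the residual-augmented model. This is a sketch under stated assumptions: the weights are diagonal, and the bound constraint is enforced here by clipping, whereas an actual MPC solver would optimize over $\boldsymbol{u}_{0:N-1}$ and treat the bounds as constraints. The callables `f_nom` and `f_res` are hypothetical stand-ins for $f_{\text{nom}}$ and $\hat{f}_{\Delta}$.

```python
def weighted_sq(e, diag):
    """||e||^2_Q for a diagonal weight Q given as the list of its diagonal entries."""
    return sum(d * ei * ei for d, ei in zip(diag, e))

def clip(u, u_min, u_max):
    """Elementwise projection onto the box u_min <= u <= u_max."""
    return [min(max(ui, lo), hi) for ui, lo, hi in zip(u, u_min, u_max)]

def mpc_cost(x_init, U, X_ref, U_ref, f_nom, f_res, Q, R, Q_N, u_min, u_max):
    """Objective of (2) for a candidate sequence U = [u_0, ..., u_{N-1}].
    X_ref holds N + 1 reference states and U_ref holds N reference controls."""
    x = list(x_init)
    cost = 0.0
    for k, u in enumerate(U):
        u = clip(u, u_min, u_max)
        cost += weighted_sq([a - b for a, b in zip(x, X_ref[k])], Q)
        cost += weighted_sq([a - b for a, b in zip(u, U_ref[k])], R)
        # x_{k+1} = f_nom(x_k, u_k) + f_hat_Delta(x_k, u_k)
        x = [a + b for a, b in zip(f_nom(x, u), f_res(x, u))]
    cost += weighted_sq([a - b for a, b in zip(x, X_ref[len(U)])], Q_N)
    return cost

# Toy 1-D example: f_nom(x, u) = x + u with a constant learned residual of 0.1
cost = mpc_cost([0.0], [[1.0], [1.0]], [[0.0]] * 3, [[0.0]] * 2,
                lambda x, u: [x[0] + u[0]], lambda x, u: [0.1],
                [1.0], [1.0], [1.0], [-10.0], [10.0])
```

In the toy example, the residual shifts the predicted states to 1.1 and 2.2, so the learned term directly changes the cost that the optimizer sees.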

3 Method

3.1 Dataset

Consider a set of $N$ tasks, $\mathcal{T}=\{T_{k}\}_{k=1:N}$. We are given their corresponding datasets, $\mathcal{D}=\{D^{T_{k}}\}_{k=1:N}$, where $D^{T_{k}}=\{(x,y)\}^{T_{k}}$ consists of task-specific independent and identically distributed (i.i.d.) input-output pairs. The joint distribution of the input-output pairs in $D^{T_{k}}$ is $P^{T_{k}}(x,y)$.
The task-specific batch data (of size $n$), $D_{n}^{T_{k}}$, is sampled uniformly from $D^{T_{k}}$, denoted as $D_{n}^{T_{k}}\sim D^{T_{k}}$, and its empirical distribution is $P_{n}^{T_{k}}(x,y)$.

3.2 Prototype-Decoder-Based Meta-Learning

In our approach, we decompose the learned residual dynamics into the following form:

$$y=\hat{f}_{\Delta}(\boldsymbol{x},\boldsymbol{u})=w\phi_{\theta}(x), \tag{3}$$

where $x=\mathrm{concat}[\boldsymbol{x},\boldsymbol{u}]$ represents the concatenated state and control input vectors, $\phi_{\theta}$ is an encoder, and $w$ is a linear decoder. Here, $\phi_{\theta}:\mathbb{R}^{17}\rightarrow\mathbb{R}^{p}$ is a DNN parameterized by $\theta$ that encodes the input data (the 13-dimensional state and the 4-dimensional control) into a feature space $\mathbb{R}^{p}$. The decoder $w$ is a matrix of appropriate dimensions, with $w\in\mathcal{W}=\{w:\|w\|_{2}=\sigma_{\max}(w)<w_{0}\}$. The decoder maps the encoded features to the output as residuals in the dynamics.
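The decomposition (3) and the decoder set $\mathcal{W}$ can be sketched in plain Python. The one-hidden-layer tanh encoder, the feature size $p=4$, the 3-dimensional output, and the random weights are hypothetical choices for illustration only; $\sigma_{\max}(w)$ is estimated by power iteration on $w^{\top}w$.

```python
import math
import random

def matvec(w, z):
    """Matrix (list of rows) times vector."""
    return [sum(wij * zj for wij, zj in zip(row, z)) for row in w]

def encoder(theta, x):
    """Hypothetical phi_theta: R^17 -> R^p with one tanh hidden layer."""
    W1, b1, W2, b2 = theta
    h = [math.tanh(a + b) for a, b in zip(matvec(W1, x), b1)]
    return [a + b for a, b in zip(matvec(W2, h), b2)]

def residual(w, theta, state, control):
    """y = f_hat_Delta(x, u) = w phi_theta(concat[x, u]), as in (3)."""
    return matvec(w, encoder(theta, list(state) + list(control)))

def spectral_norm(w, iters=200, seed=0):
    """Estimate ||w||_2 = sigma_max(w) by power iteration on w^T w."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    for _ in range(iters):
        wv = matvec(w, v)
        wtwv = [sum(w[i][j] * wv[i] for i in range(len(w))) for j in range(len(w[0]))]
        norm = math.sqrt(sum(c * c for c in wtwv))
        if norm == 0.0:          # w^T w v = 0, e.g. w = 0
            return 0.0
        v = [c / norm for c in wtwv]
    return math.sqrt(sum(c * c for c in matvec(w, v)))

def in_decoder_set(w, w0):
    """Membership test for W = {w : sigma_max(w) < w0}."""
    return spectral_norm(w) < w0

# Example with random (untrained) weights: p = 4 features, 3 outputs.
rng = random.Random(1)
p, n_in, n_hid, n_out = 4, 17, 8, 3
theta = ([[rng.gauss(0, 0.3) for _ in range(n_in)] for _ in range(n_hid)], [0.0] * n_hid,
         [[rng.gauss(0, 0.3) for _ in range(n_hid)] for _ in range(p)], [0.0] * p)
w = [[rng.gauss(0, 0.3) for _ in range(p)] for _ in range(n_out)]
y = residual(w, theta, [0.0] * 13, [9.81, 0.0, 0.0, 0.0])
```

Because only $w$ is task-specific and linear, swapping or rescaling a decoder costs a single matrix-vector product once the features are computed.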

We use the encoder-decoder in (3) to capture the residual dynamics when a quadrotor conducts structurally similar yet seemingly different tasks, such as flying in side winds of different speeds. However, an encoder-decoder pair with fixed parameters may require significant modifications, or separate models, to adapt to different tasks. To tackle this multi-task scenario, we introduce the EPD model, which comprises a task-agnostic encoder $\phi_{\theta}$ and a set of task-specific prototype decoders $\mathbf{W}=\{\mathbf{w}_{k}\}_{k=1:N}$. (Note that we use the bold font $\mathbf{w}$ to denote a prototype decoder, which should be distinguished from an arbitrary decoder denoted by $w$.) On the one hand, the encoder $\phi_{\theta}$ is trained to be task-agnostic in the sense that it captures the essential characteristics of all task datasets and allows for fast adjustments of the decoder. On the other hand, each prototype decoder $\mathbf{w}_{k}$ takes the encoded features and outputs precise task-relevant residuals, which essentially fine-tunes the EPD model to operate on the given task $T_{k}$. As key components of our method, the prototype decoders are used as a “basis” to span a subspace in the task space, which enables 1) offline inter-task regularization and 2) online inter-task interpolation.
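Because the decoders are linear, any decoder in the span of the prototypes, $w(\alpha)=\sum_{k}\alpha_{k}\mathbf{w}_{k}$, is cheap to form online. This section does not specify how the interpolation weights are chosen; the least-squares fit of $\alpha$ to recent encoded data below is purely an illustrative assumption, as are the function names and the example prototypes.

```python
def matvec(w, z):
    """Matrix (list of rows) times vector."""
    return [sum(wij * zj for wij, zj in zip(row, z)) for row in w]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A z = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def combine(prototypes, alpha):
    """w(alpha) = sum_k alpha_k * w_k, with decoders stored as lists of rows."""
    rows, cols = len(prototypes[0]), len(prototypes[0][0])
    return [[sum(a * P[i][j] for a, P in zip(alpha, prototypes)) for j in range(cols)]
            for i in range(rows)]

def fit_alpha(prototypes, feats, targets, lam=1e-9):
    """Hypothetical weights: minimize sum_i ||y_i - sum_k alpha_k w_k z_i||^2 + lam ||alpha||^2."""
    K = len(prototypes)
    G = [[lam if a == b else 0.0 for b in range(K)] for a in range(K)]
    h = [0.0] * K
    for z, y in zip(feats, targets):
        preds = [matvec(P, z) for P in prototypes]      # w_k z_i for each prototype
        for a in range(K):
            h[a] += sum(pa * yi for pa, yi in zip(preds[a], y))
            for b in range(K):
                G[a][b] += sum(pa * pb for pa, pb in zip(preds[a], preds[b]))
    return solve(G, h)

# Hypothetical prototypes and encoded features z_i = phi_theta(x_i)
protos = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 2.0]]
targets = [matvec(combine(protos, [0.3, 0.7]), z) for z in feats]
alpha = fit_alpha(protos, feats, targets)
w_new = combine(protos, alpha)
```

The linear system has one unknown per prototype, so its size is set by the number of tasks rather than by the encoder width, which is what makes this kind of online update inexpensive.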

3.3 Prototype Decoder

In this subsection, we formally define and derive a prototype decoder. In brief, given an encoder $\phi_{\theta}$, a prototype decoder is the decoder, within a given set of decoders, that is most representative of the task data. The representativeness of an encoder-decoder pair $(\phi_{\theta},w)$ for a task $T_{k}$ is measured by its empirical risk on the batch data of task $T_{k}$:

$$\mathcal{R}^{T_{k}}_{n}(w,\phi_{\theta})=\frac{1}{n}\sum_{(x_{i},y_{i})\in D_{n}^{T_{k}}}\|y_{i}-\hat{y}_{i}\|^{2}, \tag{4}$$

where $\hat{y}_{i}=w\phi_{\theta}(x_{i})$, and $D_{n}^{T_{k}}$ is sampled from $D^{T_{k}}$. To ensure that the pair $(\phi_{\theta},w)$ captures the overall data patterns effectively, the empirical risk must be bounded by a predefined threshold. We define this property as the achievability condition:

Definition 3.1.

(Achievability) For a task $T_{k}\in\mathcal{T}$, an encoder-decoder pair $(\phi_{\theta},w)$ is achievable with some $R_{0}\in\mathbb{R}^{+}$ if:

$$\lim_{n\to\infty}\mathcal{R}_{n}^{T_{k}}(w,\phi_{\theta})=\lim_{n\to\infty}\mathbb{E}_{D_{n}^{T_{k}}\sim D^{T_{k}}}\big[\mathcal{R}(w,\phi_{\theta})\big]=\lim_{n\to\infty}\int\mathcal{R}(w,\phi_{\theta})\,dP_{n}^{T_{k}}(x,y)\leq R_{0}. \tag{5}$$

The achievability condition essentially imposes an upper bound on the expected risk, ensuring that an encoder-decoder pair has a bounded error over the entire task dataset. To satisfy the achievability condition, one can pretrain the encoder by alternating minimization, i.e., minimizing the empirical risk by alternating between $\phi_{\theta}$ and $w$ (see Remark 3.3 for further discussion). We summarize the pretraining procedure in Algorithm 1 in the Appendix. The pretraining step is critical because it lets the model learn from data in a “lossy” way while staying anchored to the core features.
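The alternating minimization described above can be illustrated on a toy problem. This sketch is not Algorithm 1 from the Appendix: it assumes a scalar input, a two-feature encoder $\phi_{\theta}(x)=[\tanh(\theta_{1}x),\tanh(\theta_{2}x)]$, a closed-form (ridge-regularized) least-squares decoder step, and a finite-difference gradient step for the encoder. Each update is kept only if it does not increase the empirical risk (4), so the recorded risk is non-increasing; comparing the final value with a hypothetical $R_{0}$ is only a batch-level proxy for the limit in (5).

```python
import math

def encode(theta, x):
    """Toy encoder phi_theta(x) = [tanh(theta_j * x)]_j for a scalar input x."""
    return [math.tanh(t * x) for t in theta]

def risk(w, theta, data):
    """Empirical risk (4) for scalar outputs: mean of (y - w . phi_theta(x))^2."""
    return sum((y - sum(wj * fj for wj, fj in zip(w, encode(theta, x)))) ** 2
               for x, y in data) / len(data)

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A z = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def decoder_step(theta, data, lam=1e-9):
    """With phi_theta fixed: ridge-regularized least squares for w."""
    p = len(theta)
    A = [[lam if i == j else 0.0 for j in range(p)] for i in range(p)]
    b = [0.0] * p
    for x, y in data:
        f = encode(theta, x)
        for i in range(p):
            b[i] += f[i] * y
            for j in range(p):
                A[i][j] += f[i] * f[j]
    return solve(A, b)

def encoder_step(w, theta, data, lr=0.5, eps=1e-6):
    """With w fixed: one forward-difference gradient step on theta."""
    base = risk(w, theta, data)
    grad = []
    for j in range(len(theta)):
        t = list(theta)
        t[j] += eps
        grad.append((risk(w, t, data) - base) / eps)
    return [t - lr * g for t, g in zip(theta, grad)]

def alternating_minimization(theta, data, iters=50):
    """Alternate decoder and encoder updates; keep an update only if the risk does not increase."""
    w = [0.0] * len(theta)
    history = [risk(w, theta, data)]
    for _ in range(iters):
        w_new = decoder_step(theta, data)
        if risk(w_new, theta, data) <= history[-1]:
            w = w_new
        theta_new = encoder_step(w, theta, data)
        if risk(w, theta_new, data) <= risk(w, theta, data):
            theta = theta_new
        history.append(risk(w, theta, data))
    return w, theta, history

# Toy task data from a hypothetical residual function
xs = [-2.0 + 4.0 * i / 49 for i in range(50)]
data = [(x, 0.5 * math.tanh(2.0 * x) - math.tanh(0.5 * x)) for x in xs]
w, theta, history = alternating_minimization([1.0, 0.2], data)
R0 = 1e-3                          # hypothetical threshold
achievable = history[-1] <= R0     # batch-level proxy for (5)
```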

Given an encoder $\phi_{\theta}$, we define the set of decoders satisfying (5) as the task-achievable decoder set $\mathcal{A}_{\phi_{\theta}}(T_{k})=\big\{w\in\mathcal{W}:\lim_{n\to\infty}\mathcal{R}_{n}^{T_{k}}(w,\phi_{\theta})\leq R_{0}\big\}$. The task-achievable decoder set $\mathcal{A}_{\phi_{\theta}}(T_{k})$ thus specifies a task-specific achievable region in $\mathcal{W}$. We are now ready to introduce a novel component called the prototype decoder, which is a representative of $\mathcal{A}_{\phi_{\theta}}(T_{k})$. The prototype decoder is a critical part of our model, aimed at effectively capturing the individual characteristics of each task:

Definition 3.2.

(Prototype Decoder) For a given task $T_{k}\in\mathcal{T}$, the prototype decoder, denoted by $\mathbf{w}_{k}$, achieves the minimal empirical risk over the achievable set: $\mathbf{w}_{k}=\operatorname{argmin}_{w\in\mathcal{A}_{\phi_{\theta}}(T_{k})}\mathcal{R}_{n}^{T_{k}}(w,\phi_{\theta})$.

The prototype decoder captures the central characteristics of its task by achieving the minimal risk among all achievable decoders. This choice follows the principle of risk minimization: for each task, it selects the achievable decoder that fits the task data best. In practice, the prototype decoder can be computed empirically via

\begin{equation}
\mathbf{w}_{k,\mathrm{emp}} = \operatorname{argmin}_{w \in \bar{\mathcal{W}}} \mathcal{R}_n^{T_k}(w, \phi_{\theta}), \tag{6}
\end{equation}

where $\bar{\mathcal{W}}$ is a finite set of achievable decoders. This empirical computation admits a geometric interpretation of the prototype decoder: it is the geometric center of the achievable decoders under the “distance” defined by the risk, a concept closely related to Prototypical Networks (Snell et al., 2017) for few-shot classification. Similarly, the prototype decoder acts as a representative of its task in our EPD model framework.
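As an illustration of (6), the sketch below picks the prototype from a finite candidate set. It assumes linear decoders acting on precomputed encoder features, a mean-squared-error empirical risk (the exact form of \eqref{eq:emp_risk} is not restated in this section), and a finite-sample threshold `r0` standing in for the limit in the achievability condition; the function names are hypothetical.

```python
import numpy as np


def empirical_risk(w, features, targets):
    """Mean squared error of a linear decoder w (d x m) on encoded features (n x d)."""
    residuals = features @ w - targets
    return float(np.mean(np.sum(residuals ** 2, axis=1)))


def empirical_prototype(candidates, features, targets, r0=np.inf):
    """Eq. (6): return the lowest-risk decoder in a finite candidate set and its index.

    Candidates whose finite-sample risk is not below r0 are dropped first; this
    stands in for the achievability condition (an assumption of this sketch).
    """
    risks = [empirical_risk(w, features, targets) for w in candidates]
    achievable = [i for i, r in enumerate(risks) if r < r0]
    if not achievable:
        raise ValueError("no candidate decoder meets the risk threshold r0")
    k = min(achievable, key=lambda i: risks[i])
    return candidates[k], k


# Toy usage: `features` plays the role of phi_theta(x_i), `targets` of 3-D residual forces.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
w_true = rng.normal(size=(4, 3))
targets = features @ w_true
candidates = [w_true + 0.5 * rng.normal(size=(4, 3)) for _ in range(5)]
candidates.append(w_true + 0.01)  # a near-perfect candidate at index 5
w_proto, k_proto = empirical_prototype(candidates, features, targets)
```

Filtering by `r0` before the argmin mirrors the order in Definition 3.2: achievability is checked first, and risk minimization then selects the representative.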

Remark 3.3.

In rate-distortion theory, the empirical risk in \eqref{eq:emp_risk} is in fact a distortion measure between sequences (Cover, 1999). In our formulation, an achievable decoder set $\mathcal{A}_{\phi_{\theta}}(T_k)$ with an encoder $\phi_{\theta}$ specifies a rate-distortion region for a given task $T_k$. Moreover, the encoder-prototype-decoder pair \eqref{proto} attains the rate-distortion function, i.e., the infimum rate for a given distortion threshold $R_0$. The Blahut-Arimoto algorithm (Arimoto, 1972), an alternating minimization procedure, is used to compute the rate-distortion function. Specialized to our setting, it can pretrain the model to ensure achievability by alternating between the encoder and the decoders to minimize the empirical risk. In addition, the achievability constraint in effect imposes an information bottleneck (Tishby et al., 2000) that balances the compression-representation trade-off.
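To make the alternating idea concrete, the following sketch alternates between an exact least-squares decoder step and a gradient step on the encoder. It is a generic alternating minimization, not the Blahut-Arimoto iteration itself (which alternates over probability distributions), and it assumes a hypothetical linear encoder $\phi_{\theta}(x) = \theta^T x$, linear task-specific decoders, and a squared-error risk so that both steps have closed forms.

```python
import numpy as np


def total_risk(theta, decoders, tasks):
    """Sum over tasks of mean_i ||x_i^T theta W_k - y_i||^2 (squared loss assumed)."""
    return sum(float(np.mean(np.sum((X @ theta @ W - Y) ** 2, axis=1)))
               for W, (X, Y) in zip(decoders, tasks))


def fit_decoders(theta, tasks):
    """Decoder step: with the encoder fixed, each linear decoder is a least-squares fit."""
    return [np.linalg.lstsq(X @ theta, Y, rcond=None)[0] for X, Y in tasks]


def alternating_pretrain(theta, tasks, n_iter=200, lr=2e-4):
    """Alternate a gradient step on a hypothetical linear encoder phi_theta(x) = theta^T x
    with an exact refit of the task-specific decoders."""
    decoders = fit_decoders(theta, tasks)
    history = [total_risk(theta, decoders, tasks)]
    for _ in range(n_iter):
        # Encoder step: gradient of the summed risk with all decoders held fixed.
        grad = sum(2.0 / X.shape[0] * X.T @ (X @ theta @ W - Y) @ W.T
                   for W, (X, Y) in zip(decoders, tasks))
        theta = theta - lr * grad
        decoders = fit_decoders(theta, tasks)
        history.append(total_risk(theta, decoders, tasks))
    return theta, decoders, history


# Toy usage: three tasks sharing one "true" encoder but with different decoders.
rng = np.random.default_rng(1)
theta_true = rng.normal(size=(6, 3))
tasks = []
for _ in range(3):
    X = rng.normal(size=(100, 6))
    tasks.append((X, X @ theta_true @ rng.normal(size=(3, 2))))
theta, decoders, history = alternating_pretrain(rng.normal(size=(6, 3)), tasks)
```

Because each decoder step is an exact minimizer and the encoder step uses a small learning rate, the recorded total risk should not increase between iterations; with a DNN encoder, the encoder step would use automatic differentiation instead of the closed-form gradient.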

With the prototype decoder effectively capturing task-specific characteristics, we next introduce a prototype-decoder-based meta-update method to fine-tune the encoder. This approach is designed to prevent overfitting to the training tasks, keeping the encoder general enough for diverse tasks while preserving the EPD model’s ability to adapt to specific tasks online.

Figure 2: Illustration of the statistical model of the task distribution.

3.4 Encoder Meta-Update based on Prototype Decoder

The prototype decoder is a local construct that represents only its corresponding task. The global relationships among prototypes are embedded in the encoder in a black-box manner, yet they govern our ability to understand the underlying task similarities and leverage them for generalization. To expose these global relationships, we introduce an $N$-dimensional statistical model that uses the prototype set as a basis of the “task” distribution space (see Figure 2):

\begin{equation}
\mathcal{S}_{\mathbf{W}}(\mathbf{a}) = \Big\{ \sum_{i=1}^{N} a_i \mathbf{w}_i \;\Big|\; \sum_{i=1}^{N} a_i = 1 \text{ and } a_i \geq 0 \Big\}, \tag{7}
\end{equation}

where $\mathbf{a} = [a_1, a_2, \ldots, a_N]^T$ is the coordinate vector in the prototype basis, representing the location of a task distribution in this model. With the model structure \eqref{statistical_model}, we introduce a prototype-decoder-based meta-update strategy for training the encoder across tasks, focusing on exploring the subspace spanned by the prototype basis. The exploration is achieved by adjusting the learning direction through negative weighting of the risk gradients associated with the other tasks’ prototypes (see Figure 3). For task $T_k \in \mathcal{T}$, the one-step meta update is given by

\begin{equation}
\theta \leftarrow \theta - \epsilon \Big( (1-\beta) \nabla_{\theta} \mathcal{R}_n^{T_k}(\mathbf{w}_k, \phi_{\theta}) - \beta \sum_{\mathbf{w}' \in \mathbf{W} \setminus \{\mathbf{w}_k\}} \nabla_{\theta} \mathcal{R}_n^{T_k}(\mathbf{w}', \phi_{\theta}) \Big), \tag{8}
\end{equation}

where $\beta \in [0, 1)$ is a trade-off parameter balancing task-specific learning and inter-task interpolation, and $\epsilon$ is the learning rate. With $\beta = 0$, the task-specific prototype remains highly representative of its corresponding task, but this choice restricts interpolation on the statistical model \eqref{statistical_model}. Increasing $\beta$ broadens the model’s interpolation and coverage during learning but degrades representativeness, in the sense of a higher risk on the given task. The choice of $\beta$ should therefore follow the target performance metric: a smaller $\beta$ for concentrated representation of the training tasks and a larger $\beta$ for better extrapolation to new tasks.
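A minimal sketch of one meta-update step (8) follows. To keep the gradients in closed form it assumes a hypothetical linear encoder $\phi_{\theta}(x) = \theta^T x$, matrix-valued linear prototype decoders, and a squared-error empirical risk; with the DNN encoder of the EPD model, `risk_grad_theta` would be replaced by automatic differentiation. All names are illustrative.

```python
import numpy as np


def task_risk(theta, w, X, Y):
    """R_n(w, phi_theta) = mean_i ||x_i^T theta w - y_i||^2 (squared loss assumed)."""
    return float(np.mean(np.sum((X @ theta @ w - Y) ** 2, axis=1)))


def risk_grad_theta(theta, w, X, Y):
    """Closed-form gradient of task_risk with respect to the encoder parameters theta."""
    return 2.0 / X.shape[0] * X.T @ (X @ theta @ w - Y) @ w.T


def meta_update(theta, prototypes, k, X_k, Y_k, beta=0.1, eps=1e-3):
    """One step of Eq. (8) using a batch (X_k, Y_k) from task T_k."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    own = risk_grad_theta(theta, prototypes[k], X_k, Y_k)
    # T_k's risk gradients evaluated with every *other* prototype, weighted by -beta.
    others = sum((risk_grad_theta(theta, w, X_k, Y_k)
                  for j, w in enumerate(prototypes) if j != k),
                 np.zeros_like(theta))
    return theta - eps * ((1.0 - beta) * own - beta * others)


# Toy usage with three random prototypes and a batch from task T_0.
rng = np.random.default_rng(2)
theta = rng.normal(size=(6, 3))
prototypes = [rng.normal(size=(3, 2)) for _ in range(3)]
X_0 = rng.normal(size=(100, 6))
Y_0 = X_0 @ rng.normal(size=(6, 3)) @ prototypes[0]
theta_new = meta_update(theta, prototypes, 0, X_0, Y_0, beta=0.2)
```

With `beta=0` the step reduces to plain gradient descent on task $T_k$'s risk under its own prototype, which is a quick sanity check of the sign convention in (8).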

Figure 3: Illustration of the meta update. \textcolor{blue}{Blue} indicates the weighted gradients and \textcolor{red}{red} indicates the update direction.

The meta update is reminiscent of gradient manipulation in multi-task learning (Maninis et al., 2019; Liu et al., 2021a, b), which aims to balance the learning quality of task-shared and task-specific representations. In our case, this balance is already addressed explicitly by the EPD structure; the adversarial gradient regularization in the meta update is instead designed to explore the vicinity of a given task by introducing tendencies toward other tasks.

3.5 Proto-MPC

Combined with MPC, the EPD model offers an adaptation strategy for handling uncertainties or disturbances associated with tasks. If privileged information about the task is available online, Proto-MPC can use the task-specific residual dynamics model given by the corresponding prototype decoder. Otherwise, when task information is not immediately available, the prototype decoders can be interpolated using online data to infer a residual dynamics model.

With Privileged Task Information: Under this condition, the MPC can readily choose which model (i.e., which prototype decoder) to use. Formally, we model the task information provided by external modules as a privileged-information operator PI: $\mathbf{w}_k = \texttt{PI}(D_n^{T_{\text{query}}})$, where $D_n^{T_{\text{query}}} = \{(x_1, y_1), \ldots, (x_n, y_n)\}^{T_{\text{query}}}$ is a batch of data from the real-time task $T_{\text{query}}$. This operation outputs the prototype candidate to be used by the MPC.

Without Privileged Task Information: When task information is not immediately available during operation, the statistical model $\mathcal{S}_{\mathbf{W}}(\mathbf{a})$ (with the prototype decoders as a basis) enables a sampling-based real-time adaptation strategy that is more efficient than recursively solving the empirical risk minimization. Intuitively, this strategy sequentially locates the operational task $T_{\text{query}}$ in the (sub)space spanned by the prototype decoders.

Unlike the offline learning stage, the online adaptation stage shifts the focus from exploration to exploitation. For exploitation, a challenge comes from the center region of $\mathcal{S}_{\mathbf{W}}(\mathbf{a})$ (see Figure 2), a low-confidence region that is poorly represented in the training data. In particular, the point $\mathbf{a}^* = [\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}]^T$ at the center of $\mathcal{S}_{\mathbf{W}}(\mathbf{a})$ represents the state of highest uncertainty, where all tasks are equally probable. To address this challenge, we propose a prototype-based coordinate sampling method with an acceptance criterion, which sequentially updates $\mathbf{a}$ within the high-confidence region of $\mathcal{S}_{\mathbf{W}}(\mathbf{a})$.

For the prototype coordinate $\mathbf{a}$, its $k$th element $\mathbf{a}_k$ can be interpreted as the probability that the task $T_{\text{query}}$ is $T_k$, i.e., $\mathbf{a}_k = P(T_{\text{query}} = T_k)$. The coordinate $\mathbf{a}$ therefore gives the probability distribution of $T_{\text{query}}$ over the task set $\mathcal{T}$. In practice, given $D_n^{T_{\text{query}}}$, we can approximate $\mathbf{a}_k$ empirically with a Boltzmann distribution:

\begin{equation}
\mathbf{a}_k = P(T_{\text{query}} = T_k) \approx P_{\mathrm{emp}}(T_{\text{query}} = T_k) = \mathbf{a}_{\mathrm{emp},k} = \frac{\exp\big(-\gamma \mathcal{R}_n^{T_{\text{query}}}(\mathbf{w}_k, \phi_{\theta})\big)}{\sum_{\mathbf{w}' \in \mathbf{W}} \exp\big(-\gamma \mathcal{R}_n^{T_{\text{query}}}(\mathbf{w}', \phi_{\theta})\big)}, \tag{9}
\end{equation}

where $\gamma > 0$ is a scaling parameter that controls the weight on the risk (a smaller $\gamma$ tends to “flatten out” $P_{\mathrm{emp}}$). To keep $\mathbf{a}_{\mathrm{emp}}$ away from the point of highest uncertainty $\mathbf{a}^*$, we define an acceptance criterion based on the Kullback–Leibler divergence with a predefined acceptance threshold $D_0$: if $D_{KL}(\mathbf{a}_{\mathrm{emp}} \,\|\, \mathbf{a}^*) > D_0$, then $\mathbf{a}_{\mathrm{emp}}$ is considered bounded away from $\mathbf{a}^*$ and is accepted. At inference time, $\mathbf{a}_{\mathrm{emp}}$ can be computed recursively from a moving-horizon data buffer to update the decoder weights online, while the acceptance criterion keeps $\mathbf{a}_{\mathrm{emp}}$ away from the central low-confidence region. We summarize the adaptation scheme of Proto-MPC in Algorithm 2. The block diagram of Proto-MPC for controlling the quadrotor is illustrated in Fig. 1d.
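The online adaptation loop can be sketched in plain Python. For the uniform center $\mathbf{a}^*$, the divergence simplifies to $D_{KL}(\mathbf{a} \,\|\, \mathbf{a}^*) = \log N - H(\mathbf{a})$, where $H$ is the Shannon entropy, so the acceptance test $D_{KL} > D_0$ is equivalent to $H(\mathbf{a}_{\mathrm{emp}}) < \log N - D_0$. The sketch assumes decoders flattened into lists, a user-supplied empirical-risk function `risk_fn` (hypothetical), coordinates initialized at $\mathbf{a}^*$, and that a rejected $\mathbf{a}_{\mathrm{emp}}$ leaves the last accepted coordinates in place; these choices are assumptions of this sketch, not details taken from Algorithm 2.

```python
import math
from collections import deque


def boltzmann_coordinates(risks, gamma):
    """Eq. (9): a_k proportional to exp(-gamma * R_k); a smaller gamma flattens a."""
    r_min = min(risks)  # shifting by the minimum avoids underflow and cancels in the ratio
    weights = [math.exp(-gamma * (r - r_min)) for r in risks]
    total = sum(weights)
    return [w / total for w in weights]


def kl_to_uniform(a):
    """D_KL(a || a*) for the uniform a* = [1/N, ..., 1/N], i.e. log N - H(a)."""
    n = len(a)
    return sum(p * math.log(p * n) for p in a if p > 0.0)


def interpolate_decoder(a, prototypes):
    """Point sum_k a_k w_k of the statistical model (7); decoders as flat lists."""
    return [sum(a_k * w_k[j] for a_k, w_k in zip(a, prototypes))
            for j in range(len(prototypes[0]))]


class PrototypeAdapter:
    """Hypothetical online adapter: moving-horizon buffer plus KL acceptance test."""

    def __init__(self, prototypes, risk_fn, gamma=5.0, d0=0.1, horizon=50):
        self.prototypes = prototypes
        self.risk_fn = risk_fn              # risk_fn(w, batch) -> float, user supplied
        self.gamma, self.d0 = gamma, d0
        self.buffer = deque(maxlen=horizon)  # oldest samples drop out automatically
        n = len(prototypes)
        self.a = [1.0 / n] * n              # start at the uninformative center a*

    def update(self, x, y):
        self.buffer.append((x, y))
        risks = [self.risk_fn(w, list(self.buffer)) for w in self.prototypes]
        a_emp = boltzmann_coordinates(risks, self.gamma)
        if kl_to_uniform(a_emp) > self.d0:  # accept only if bounded away from a*
            self.a = a_emp                  # otherwise keep the last accepted a (assumption)
        return interpolate_decoder(self.a, self.prototypes)


def toy_risk(w, batch):
    """Mean squared error of a scalar decoder w = [gain] on (x, y) pairs."""
    return sum((w[0] * x - y) ** 2 for x, y in batch) / len(batch)


adapter = PrototypeAdapter([[2.0], [4.0], [6.0]], toy_risk, gamma=5.0, d0=0.1, horizon=10)
adapter.update(0.0, 0.0)                   # uninformative sample: a_emp equals a*, rejected
a_after_first = list(adapter.a)
w_after_second = adapter.update(1.0, 6.0)  # data consistent with the third prototype
```

Working in the entropy form makes the role of $D_0$ easy to read: any threshold in $(0, \log N)$ rules out coordinates that are too close to uniform, and larger thresholds demand a more decisive task identification before the decoder is updated.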

4 Experiments

In this section, we evaluate our method in simulation. We use the RotorPy simulator (Folk et al., 2023), a multirotor simulation environment with aerodynamic wrenches, to collect data for training the EPD model and to test Proto-MPC.

Experimental Setup: The training task set consists of constant side winds in the $x$-direction at speeds of 2, 4, and 6 m/s. In this scenario, the lumped forces dominate the residual dynamics $f_{\Delta}$; therefore, only the lumped forces are included in the learned residual dynamics for this setup. See the Appendix for details on the MPC implementation, data collection, and training results of the EPD model.

Experimental Results: We compare our method with 1) nonlinear MPC with the nominal model $f_{\text{nom}}$, 2) KNODE-MPC-Online (Jiahao et al., 2023), and 3) MPC with a task-specific DNN residual model ($f_{\theta}^{T_k}$ is a DNN trained on the $T_k$-specific dataset). In other words, for each task, a DNN residual model is trained on and deployed for that task. In contrast, for all testing trials in this section, the prototype-based meta-model is kept fixed, so adaptation relies only on the prototype decoders corresponding to constant side winds of 2, 4, and 6 m/s. We evaluate our method under static and dynamic wind scenarios. For the former, we command the quadrotor to track the training trajectory under constant side winds of various speeds. For the latter, we command the quadrotor to track different testing trajectories under spatially dependent winds (0–10 m/s along the $x$-direction; see the illustration in Figure 4).

Figure 4: Spatially varying wind distribution.

Constant Side Wind: Table 1 compares the tracking RMSE of nominal MPC, task-specific DNN-MPC, KNODE-MPC-Online, and Proto-MPC under constant side winds with speeds from 0 to 10 m/s. We followed the implementation of KNODE-MPC-Online described in Jiahao et al. (2023), which handles sudden mass changes online, and adapted it to our side-wind setup. Empirically, we found that the original implementation suffers from instability with the online-learned model in our setup. To address this, we applied spectral normalization to control the Lipschitz constant of the online-learned KNODE model, which improved its closed-loop stability.
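For reference, spectral normalization divides each weight matrix by its largest singular value, usually estimated by power iteration. Because a feed-forward network with 1-Lipschitz activations (such as ReLU or tanh) has a Lipschitz constant no larger than the product of its layers' spectral norms, normalizing every layer caps that bound at 1. The NumPy sketch below illustrates the mechanism only; the layer sizes and iteration count are placeholders, not the configuration used for KNODE-MPC-Online here.

```python
import numpy as np


def spectral_normalize(weight, n_iter=500, seed=0):
    """Return (weight / sigma, sigma), with sigma the largest singular value
    of `weight` estimated by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=weight.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v)
        u = weight @ v
        u /= np.linalg.norm(u)
    sigma = float(u @ weight @ v)
    return weight / sigma, sigma


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: an upper bound on the Lipschitz constant of a
    feed-forward network whose activations are 1-Lipschitz (e.g. ReLU, tanh)."""
    return float(np.prod([np.linalg.norm(w, 2) for w in weights]))


# Toy usage: a three-layer network with 12 inputs and 3 outputs (placeholder sizes).
rng = np.random.default_rng(3)
layers = [rng.normal(size=(32, 12)), rng.normal(size=(32, 32)), rng.normal(size=(3, 32))]
normalized = [spectral_normalize(w)[0] for w in layers]
bound_before = lipschitz_upper_bound(layers)
bound_after = lipschitz_upper_bound(normalized)  # close to 1 after normalization
```

In PyTorch, `torch.nn.utils.spectral_norm` wraps a layer with the same normalization; by default it runs one power-iteration step per forward pass in training mode and reuses the estimate across steps, which is cheaper than the many iterations used in this standalone sketch.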

The results show a substantial RMSE reduction for task-specific DNN-MPC, KNODE-MPC-Online, and Proto-MPC relative to the nominal MPC baseline. The task-specific DNN-MPC is expected to exhibit superior tracking performance, since its DNN is trained specifically for each task’s wind condition. Both KNODE-MPC-Online and Proto-MPC reduce the $x$-axis RMSE of the nominal MPC by roughly half or more at wind speeds of 4 m/s and above. However, Proto-MPC requires less online computation than KNODE-MPC-Online because it only updates the decoders instead of training the whole model online. This comparison demonstrates both a significant improvement over the baseline MPC and Proto-MPC’s ability to generalize to wind speeds unseen during training (0, 8, and 10 m/s), with lower online computational demands.

Table 1: Tracking RMSE on the training trajectory (shown in Figure 6) under constant side winds of different speeds. The 2, 4, and 6 m/s cases (in bold) are the wind speeds of the training tasks.
\begin{tabular}{llcccccc}
\hline
RMSE [m] & Axis & 0 m/s & \textbf{2 m/s} & \textbf{4 m/s} & \textbf{6 m/s} & 8 m/s & 10 m/s \\
\hline
nominal-MPC & $x$ & 0.10 & 0.15 & 0.24 & 0.36 & 0.48 & 0.63 \\
 & $y$ & 0.07 & 0.07 & 0.08 & 0.08 & 0.09 & 0.10 \\
 & $z$ & 0.03 & 0.03 & 0.05 & 0.08 & 0.12 & 0.16 \\
\hline
Task-DNN-MPC & $x$ & -- & 0.08 & 0.09 & 0.11 & 0.12 & 0.15 \\
 & $y$ & -- & 0.06 & 0.05 & 0.06 & 0.05 & 0.05 \\
 & $z$ & -- & 0.03 & 0.03 & 0.03 & 0.04 & 0.04 \\
\hline
KNODE-MPC-Online (with Spectral Normalization) & $x$ & 0.08 & 0.09 & 0.11 & 0.18 & 0.26 & 0.31 \\
 & $y$ & 0.05 & 0.07 & 0.07 & 0.13 & 0.16 & 0.11 \\
 & $z$ & 0.06 & 0.05 & 0.08 & 0.17 & 0.18 & 0.22 \\
\hline
Proto-MPC (with PI) & $x$ & 0.09 & 0.07 & 0.09 & 0.11 & 0.17 & 0.30 \\
 & $y$ & 0.04 & 0.04 & 0.05 & 0.05 & 0.05 & 0.05 \\
 & $z$ & 0.03 & 0.02 & 0.03 & 0.03 & 0.03 & 0.04 \\
\hline
Proto-MPC (without PI) & $x$ & 0.10 & 0.07 & 0.13 & 0.17 & 0.24 & 0.32 \\
 & $y$ & 0.04 & 0.04 & 0.04 & 0.05 & 0.05 & 0.05 \\
 & $z$ & 0.03 & 0.02 & 0.02 & 0.03 & 0.03 & 0.04 \\
\hline
\end{tabular}
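The relative reductions in Table 1 can be recomputed directly from its $x$-axis rows. The short script below copies those values and reports each method's RMSE as a fraction of the nominal-MPC RMSE at every wind speed; it performs arithmetic only and adds no new data.

```python
# x-axis tracking RMSE [m] copied from Table 1; wind speeds in m/s.
WIND = [0, 2, 4, 6, 8, 10]
RMSE_X = {
    "nominal-MPC": [0.10, 0.15, 0.24, 0.36, 0.48, 0.63],
    "KNODE-MPC-Online": [0.08, 0.09, 0.11, 0.18, 0.26, 0.31],
    "Proto-MPC (with PI)": [0.09, 0.07, 0.09, 0.11, 0.17, 0.30],
    "Proto-MPC (without PI)": [0.10, 0.07, 0.13, 0.17, 0.24, 0.32],
}


def ratio_to_nominal(method):
    """RMSE of `method` divided by the nominal-MPC RMSE, per wind speed (2 decimals)."""
    base = RMSE_X["nominal-MPC"]
    return [round(m / b, 2) for m, b in zip(RMSE_X[method], base)]


for name in RMSE_X:
    if name != "nominal-MPC":
        print(name, dict(zip(WIND, ratio_to_nominal(name))))
```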

Figure 5: Tracking performance under spatially varying winds on different trajectories. The first row shows the tracking performance of MPC with the nominal model, the second row that of KNODE-MPC-Online (with spectral normalization), and the third row that of Proto-MPC. The colorbar highlights the deviation from the reference trajectory.

Spatially Varying Wind: To test Proto-MPC’s task-adaptation capacity under this condition, the quadrotor is subject to a wind in the $x$-direction whose speed varies from 0 to 10 m/s. We compare Proto-MPC with nominal MPC and KNODE-MPC-Online (with spectral normalization) on several trajectories. Figure 5 shows the tracking performance, with the colorbar highlighting the deviation from the reference trajectory. Table 2 reports the RMSE of the three methods on the testing trajectories (the associated box plot is given in Fig. 9 in the Appendix). Under these drastically changing wind conditions, Proto-MPC achieves the lowest $x$-axis RMSE (the wind direction) on all three trajectories and the lowest RMSE on every axis for trajectories 1 and 2, while requiring less online computation than KNODE-MPC-Online.

Table 2: Tracking RMSE on the testing trajectories (shown in Figure 5) under spatially varying wind.
\begin{tabular}{llccc}
\hline
RMSE [m] & Axis & Trajectory 1 & Trajectory 2 & Trajectory 3 \\
\hline
nominal-MPC & $x$ & 0.25 & 0.31 & 0.35 \\
 & $y$ & 0.05 & 0.06 & 0.11 \\
 & $z$ & 0.06 & 0.06 & 0.09 \\
\hline
KNODE-MPC-Online (with Spectral Normalization) & $x$ & 0.15 & 0.17 & 0.22 \\
 & $y$ & 0.06 & 0.05 & 0.09 \\
 & $z$ & 0.09 & 0.08 & 0.06 \\
\hline
Proto-MPC (without PI) & $x$ & 0.12 & 0.15 & 0.18 \\
 & $y$ & 0.03 & 0.03 & 0.12 \\
 & $z$ & 0.02 & 0.02 & 0.08 \\
\hline
\end{tabular}

5 Conclusion

This paper proposes a novel EPD model designed to capture shared and distinctive features across various training tasks. The EPD model consists of a universal task-agnostic DNN encoder and a set of task-specific linear prototype decoders that balance task-shared and task-specific representations. In the online setting, the encoder maps incoming data to features, while the linear prototype decoders serve as a “basis” for interpolating the encoded features, which allows fast computation of a new decoder aligned with the current task’s characteristics. We then use the EPD model to capture the residual dynamics in Proto-MPC, which can quickly adapt the model to uncertainties from dynamically evolving task scenarios. We evaluate Proto-MPC in controlling a quadrotor to track agile trajectories under various static and dynamic side-wind conditions; the results demonstrate its robust performance compared to nominal MPC and its generalization capacity compared to MPC augmented with task-specific DNN residual models. Future directions include deploying this framework in real-world experiments and investigating how the geometric properties of prototype decoders can help us better understand the underlying relationships between tasks on the manifold.

\acks

This work is supported by NASA under the Cooperative Agreement 80NSSC20M0229 and University Leadership Initiative grant 80NSSC22M0070, NSF-AoF Robust Intelligence award #2133656, NSF SLES #2331878, and DoD HQ00342110002.

References

  • Arimoto (1972) Suguru Arimoto. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, 18(1):14–20, 1972.
  • Chee et al. (2022) Kong Yao Chee, Tom Z Jiahao, and M Ani Hsieh. KNODE-MPC: A knowledge-based data-driven predictive control framework for aerial robots. IEEE Robotics and Automation Letters, 7(2):2819–2826, 2022.
  • Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
  • Cover (1999) Thomas M Cover. Elements of information theory. John Wiley & Sons, 1999.
  • Diehl et al. (2006) Moritz Diehl, Hans Georg Bock, Holger Diedam, and P-B Wieber. Fast direct multiple shooting algorithms for optimal robot control. Fast motions in biomechanics and robotics: optimization and feedback control, pages 65–93, 2006.
  • Folk et al. (2023) Spencer Folk, James Paulos, and Vijay Kumar. RotorPy: A python-based multirotor simulator with aerodynamics for education and research. arXiv preprint arXiv:2306.04485, 2023.
  • Hwangbo et al. (2017) Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a quadrotor with reinforcement learning. IEEE Robotics and Automation Letters, 2(4):2096–2103, 2017.
  • Jiahao et al. (2023) Tom Z Jiahao, Kong Yao Chee, and M Ani Hsieh. Online dynamics learning for predictive control with an application to aerial robots. In Proceedings of the Conference on Robot Learning, pages 2251–2261. PMLR, 2023.
  • Joshi et al. (2021) Girish Joshi, Jasvir Virdi, and Girish Chowdhary. Asynchronous deep model reference adaptive control. In Proceedings of the Conference on Robot Learning, pages 984–1000. PMLR, 2021.
  • Kabzan et al. (2019) Juraj Kabzan, Lukas Hewing, Alexander Liniger, and Melanie N Zeilinger. Learning-based model predictive control for autonomous racing. IEEE Robotics and Automation Letters, 4(4):3363–3370, 2019.
  • Lambert et al. (2019) Nathan O Lambert, Daniel S Drew, Joseph Yaconelli, Sergey Levine, Roberto Calandra, and Kristofer SJ Pister. Low-level control of a quadrotor with deep model-based reinforcement learning. IEEE Robotics and Automation Letters, 4(4):4224–4230, 2019.
  • Liu et al. (2021a) Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, and Qiang Liu. Conflict-averse gradient descent for multi-task learning. Advances in Neural Information Processing Systems, 34:18878–18890, 2021a.
  • Liu et al. (2021b) Liyang Liu, Yi Li, Zhanghui Kuang, J Xue, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Towards impartial multi-task learning. In Proceedings of the International Conference on Learning Representations, 2021b.
  • Maninis et al. (2019) Kevis-Kokitsi Maninis, Ilija Radosavovic, and Iasonas Kokkinos. Attentive single-tasking of multiple tasks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1851–1860, 2019.
  • Mellinger and Kumar (2011) Daniel Mellinger and Vijay Kumar. Minimum snap trajectory generation and control for quadrotors. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, pages 2520–2525. IEEE, 2011.
  • O’Connell et al. (2022) Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural-fly enables rapid learning for agile flight in strong winds. Science Robotics, 7(66):eabm6597, 2022.
  • Richards et al. (2021) SM Richards, N Azizan, J-JE Slotine, and M Pavone. Adaptive-control-oriented meta-learning for nonlinear systems. In Robotics: Science and Systems, 2021.
  • Saviolo et al. (2022) Alessandro Saviolo, Guanrui Li, and Giuseppe Loianno. Physics-inspired temporal learning of quadrotor dynamics for accurate model predictive trajectory tracking. IEEE Robotics and Automation Letters, 7(4):10256–10263, 2022.
  • Saviolo et al. (2023) Alessandro Saviolo, Jonathan Frey, Abhishek Rathod, Moritz Diehl, and Giuseppe Loianno. Active learning of discrete-time dynamics for uncertainty-aware model predictive control. IEEE Transactions on Robotics, 2023.
  • Snell et al. (2017) Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. Advances in neural information processing systems, 30, 2017.
  • Tishby et al. (2000) Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000.
  • Torrente et al. (2021) Guillem Torrente, Elia Kaufmann, Philipp Föhn, and Davide Scaramuzza. Data-driven MPC for quadrotors. IEEE Robotics and Automation Letters, 6(2):3769–3776, 2021.
  • Verschueren et al. (2018) Robin Verschueren, Gianluca Frison, Dimitris Kouzoupis, Niels van Duijkeren, Andrea Zanelli, Rien Quirynen, and Moritz Diehl. Towards a modular software package for embedded optimization. IFAC-PapersOnLine, 51(20):374–380, 2018.
  • Wang et al. (2024) Bingheng Wang, Zhengtian Ma, Shupeng Lai, and Lin Zhao. Neural moving horizon estimation for robust flight control. IEEE Transactions on Robotics, 40:639–659, 2024.
  • Williams et al. (2017) Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M Rehg, Byron Boots, and Evangelos A Theodorou. Information theoretic MPC for model-based reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1714–1721. IEEE, 2017.
  • Wu et al. (2022) Zhuohuan Wu, Sheng Cheng, Kasey A Ackerman, Aditya Gahlawat, Arun Lakshmanan, Pan Zhao, and Naira Hovakimyan. $\mathcal{L}_1$ adaptive augmentation for geometric tracking control of quadrotors. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), pages 1329–1336. IEEE, 2022.

6 Appendix

6.1 Related Work

We give a brief overview of learning-based control methods, focusing on their applications to quadrotors. Hwangbo et al. (2017) use model-free reinforcement learning to train an end-to-end neural-network control policy that stabilizes a quadrotor from challenging initial poses (e.g., upside-down). In contrast to end-to-end methods, Lambert et al. (2019) learn a deep neural network (DNN) dynamical model and use model-based reinforcement learning to achieve stable attitude control near hover. Within the model predictive control framework, using an accurate data-driven model has been shown to enhance control performance, as in previous work on racing cars (Williams et al., 2017; Kabzan et al., 2019). Similarly, Saviolo et al. (2022) design an MPC based on models learned from real-world data with a physics-inspired Temporal Convolutional Network. Alternatively, rather than learning the full dynamics, a series of works use machine learning within the MPC formulation to learn a robust augmented model consisting of a first-principles nominal model and a data-driven residual dynamical model. For example, Torrente et al. (2021) use Gaussian processes to account for aerodynamic effects arising from the fast ego-motion of the quadrotor, and Chee et al. (2022) propose KNODE-MPC, which explicitly incorporates prior physical knowledge (the nominal model) into the learning of the augmented model using NeuralODE (Chen et al., 2018).

Real-time adaptation to uncertainties is critical for robots operating in dynamic and uncertain environments. Accordingly, online (active) learning (Saviolo et al., 2023) and meta-learning (Richards et al., 2021) techniques are increasingly used in model-based control design. Jiahao et al. (2023) extend KNODE-MPC (Chee et al., 2022) to an online setting that recursively constructs a data-augmented dynamical model in real time during deployment. Instead of retraining or learning a new model, one can also fine-tune an offline-trained model using real-time data, e.g., by adapting the last-layer weights of a DNN that represents a parametric uncertainty (Joshi et al., 2021). Closely related to our Proto-MPC is NeuralFly (O’Connell et al., 2022), which uses DNN basis functions to learn a representation shared across various strong wind conditions. NeuralFly explicitly removes task-specific information from the learned DNN through adversarial learning. Consequently, to ensure a stable update of the linear coefficients of the basis functions during operation, it requires a Kalman-filter estimator to regulate the covariance of the DNN outputs. While effective, this introduces additional estimator and control gains to tune. In contrast, Proto-MPC, equipped with an encoder for the shared representation and a set of task-specific prototype decoders, not only generalizes effectively across diverse tasks but also adapts quickly to dynamically changing task conditions.

6.2 Algorithms

Require: risk threshold $R_0$
Input: training dataset $\mathcal{D}=\{D^{T_k}\}_{k=1:N}$
Output: encoder $\phi_{\theta}$ and prototype decoder set $\mathbf{W}=\{\mathbf{w}_k\}_{k=1:N}$

for $k=1,2,\dots,N$ do
    randomly initialize $w$ $\triangleright$ Pretrain to ensure achievability (3.1)
    while $\mathcal{R}^{T_k}(w,\phi_{\theta})>R_0$ do
        $w\leftarrow\arg\min_{w\in\mathcal{W}}\mathcal{R}^{T_k}_n(w,\phi_{\theta})$
        $\theta\leftarrow\theta-\epsilon\nabla_{\theta}\mathcal{R}^{T_k}_n(w,\phi_{\theta})$

while not done do
    for $k=1,2,\dots,N$ do
        $\mathbf{w}_k\leftarrow\arg\min_{w\in\mathcal{A}^{T_k}(w)}\mathcal{R}^{T_k}_n(\phi_{\theta},w)$ $\triangleright$ Compute prototype decoder \eqref{eq:emp prototype}
    randomly sample $T_i\sim\mathcal{T}$
    $\theta\leftarrow\theta-\epsilon\Big((1-\beta)\nabla_{\theta}\mathcal{R}^{T_i}_n(\mathbf{w}_i,\phi_{\theta})-\beta\sum_{\mathbf{w}'\in\mathbf{W}\setminus\{\mathbf{w}_i\}}\nabla_{\theta}\mathcal{R}^{T_i}_n(\mathbf{w}',\phi_{\theta})\Big)$ $\triangleright$ Meta update \eqref{regularized_meta_update}

Algorithm 1 Training loop for prototype-decoder-based meta-learning. The empirical risk is computed on a batch $D_n^{T_k}\sim D^{T_k}$ sampled uniformly from the task dataset, i.e., $\mathcal{R}^{T_k}_n(\phi_{\theta},w)=\frac{1}{n}\sum_{(x_i,y_i)\in D_n^{T_k}}\|y_i-\phi_{\theta}(x_i)w\|^2$.
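
To make the main loop of Algorithm 1 concrete, the sketch below runs it on a deliberately tiny, hypothetical model: the encoder is a scalar linear map $\phi_\theta(x)=\theta x$, each prototype decoder $\mathbf{w}_k$ is a scalar, the admissible set $\mathcal{A}^{T_k}$ is assumed unconstrained so the prototype is the closed-form least-squares fit, and tasks are visited round-robin instead of sampled from $\mathcal{T}$. The pretraining phase is omitted. All function names (`risk`, `grad_theta`, `prototype`, `meta_direction`, `train`) are illustrative, not the authors' implementation.

```python
import random


def risk(theta, w, batch):
    """Empirical risk R_n(phi_theta, w) = 1/n * sum (y - theta * x * w)^2."""
    return sum((y - theta * x * w) ** 2 for x, y in batch) / len(batch)


def grad_theta(theta, w, batch):
    """dR_n/dtheta for the scalar toy encoder phi_theta(x) = theta * x."""
    return sum(-2.0 * (y - theta * x * w) * x * w for x, y in batch) / len(batch)


def prototype(theta, batch):
    """Prototype decoder: closed-form argmin_w of R_n (unconstrained least squares)."""
    num = sum(y * theta * x for x, y in batch)
    den = sum((theta * x) ** 2 for x, _ in batch)
    return num / den


def meta_direction(theta, i, protos, batch_i, beta):
    """(1 - beta) * grad at own prototype - beta * sum of grads at the other prototypes."""
    own = grad_theta(theta, protos[i], batch_i)
    others = sum(grad_theta(theta, w, batch_i) for k, w in enumerate(protos) if k != i)
    return (1.0 - beta) * own - beta * others


def train(tasks, theta=1.0, beta=0.4, eps=0.01, steps=100, n=8, seed=0):
    """Main loop of Algorithm 1 (pretraining omitted) over a list of task datasets."""
    rng = random.Random(seed)
    protos = []
    for step in range(steps):
        # Uniformly sample a batch D_n^{T_k} from every task dataset.
        batches = [rng.sample(data, n) for data in tasks]
        # Compute each task's prototype decoder with the current encoder.
        protos = [prototype(theta, b) for b in batches]
        # Round-robin task choice stands in for sampling T_i ~ T.
        i = step % len(tasks)
        theta -= eps * meta_direction(theta, i, protos, batches[i], beta)
    return theta, protos
```

With noiseless linear tasks and $\beta=0$, the loop reduces to per-task least squares and the prototypes recover the task slopes. With $\beta>0$, the second term performs gradient ascent on each task's risk under the other tasks' prototypes, which is how this update pushes the shared encoder toward task-distinctive features.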
Require: moving-horizon data buffer $\mathcal{D}_n$. The moving-horizon data buffer $\mathcal{D}_n(t)$ stores a sequence of real-time data of fixed length $n$, i.e., $\mathcal{D}_n(t)=\{(x_i,y_i)\}_{i=t-n}^{t}$.
Require: acceptance criterion $D_0$
randomly initialize $\mathbf{a}_0, w_0$

for current time $t=0,1,\dots$ do
    if Privileged Information is available then
        $w_t\leftarrow$ PI($\mathcal{D}_n(t)$)
    else
        $\mathbf{a}_{emp}\leftarrow$ EmpDistribution($\mathcal{D}_n(t)$) $\triangleright$ Compute empirical distribution \eqref{eq:empirical a}
        if $D_{KL}(\mathbf{a}_{emp}\|\mathbf{a}^{*})>D_0$ then
            Accept $\mathbf{a}_{emp}$ and $\mathbf{a}_t\leftarrow\mathbf{a}_{emp}$ $\triangleright$ Acceptance criterion
        else
            Reject $\mathbf{a}_{emp}$ and $\mathbf{a}_t\leftarrow\mathbf{a}_{t-1}$
        $w_t\leftarrow\mathbf{a}_t[1]\mathbf{w}_1+\dots+\mathbf{a}_t[N]\mathbf{w}_N$ $\triangleright$ Compute decoder using prototypes
    MPC $\leftarrow\hat{f}_{\Delta}=\phi_{\theta}(x)w_t$ $\triangleright$ MPC with adapted residual model

Algorithm 2 Proto-MPC
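
The decoder-adaptation branch of Algorithm 2 can be sketched as follows. The KL divergence, the acceptance test against $D_0$, and the convex combination $w_t=\sum_k\mathbf{a}_t[k]\mathbf{w}_k$ follow the listing above. The rule inside `emp_distribution` (a softmax over the negative per-prototype risk on the buffer, with a temperature `tau`) is a hypothetical stand-in for \eqref{eq:empirical a}, which is not reproduced here, and the reference distribution $\mathbf{a}^{*}$ is passed in explicitly as `a_ref`. Decoders are stored as nested lists so the sketch needs only the standard library.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two distributions over the N prototypes; eps guards log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def emp_distribution(buffer_risks, tau=1.0):
    """HYPOTHETICAL stand-in for eq. (empirical a): softmax of -risk / tau.

    buffer_risks[k] is the risk of prototype w_k evaluated on the buffer D_n(t).
    """
    shift = min(buffer_risks)  # subtract the minimum for numerical stability
    weights = [math.exp(-(r - shift) / tau) for r in buffer_risks]
    total = sum(weights)
    return [w / total for w in weights]


def combine_prototypes(a, prototypes):
    """w_t = a[1] w_1 + ... + a[N] w_N for matrix decoders stored as nested lists."""
    rows, cols = len(prototypes[0]), len(prototypes[0][0])
    return [[sum(ak * wk[r][c] for ak, wk in zip(a, prototypes)) for c in range(cols)]
            for r in range(rows)]


def adapt_step(buffer_risks, a_prev, a_ref, prototypes, d0, tau=1.0):
    """Non-privileged branch of Algorithm 2 at one time step; returns (a_t, w_t)."""
    a_emp = emp_distribution(buffer_risks, tau)
    if kl_divergence(a_emp, a_ref) > d0:
        a_t = a_emp   # accept the empirical distribution
    else:
        a_t = a_prev  # reject it and keep a_{t-1}
    return a_t, combine_prototypes(a_t, prototypes)
```

If one passes $\mathbf{a}_{t-1}$ as `a_ref`, the test acts as a hysteresis: the mixture weights only change when the buffer evidence has moved far enough from the current weights, which avoids re-weighting the prototypes on every small fluctuation.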

6.3 Experimental Setup

Figure 6: Reference trajectory for training and data collection.

MPC Implementation: we follow the formulation in equation 2 over a time horizon $T=1\,\mathrm{s}$ with discretization step $\Delta t=T/N=1/20\,\mathrm{s}$. We transform the optimal control problem into a nonlinear program (NLP) via the multiple shooting method and solve it using sequential quadratic programming in a real-time iteration scheme (SQP-RTI) (Diehl et al., 2006). The NLP is implemented using acados (Verschueren et al., 2018).
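
As a rough illustration of the discretization above, the sketch builds the grid of $N=T/\Delta t=20$ shooting intervals for $T=1\,\mathrm{s}$ and evaluates the multiple-shooting continuity (defect) constraints $x_{k+1}-F(x_k,u_k)=0$ that the NLP imposes, with $F$ a fixed-step RK4 integrator. The dynamics are a hypothetical 1-D point mass with a constant `residual` acceleration standing in for $\hat f_\Delta$; this is not the quadrotor model of equation 2, and acados generates its own integrators and SQP-RTI solver rather than using code like this.

```python
T = 1.0       # prediction horizon [s]
N = 20        # number of shooting intervals
DT = T / N    # discretization step: 0.05 s


def dynamics(x, u, residual=0.5):
    """HYPOTHETICAL 1-D point mass: x = (position, velocity), u = commanded acceleration.

    The constant `residual` plays the role of a learned residual term f_hat_Delta.
    """
    _, v = x
    return (v, u + residual)


def rk4_step(x, u, dt=DT):
    """One explicit RK4 step, i.e. the discrete map F(x_k, u_k)."""
    def add(a, b, s):
        return tuple(ai + s * bi for ai, bi in zip(a, b))
    k1 = dynamics(x, u)
    k2 = dynamics(add(x, k1, dt / 2), u)
    k3 = dynamics(add(x, k2, dt / 2), u)
    k4 = dynamics(add(x, k3, dt), u)
    return tuple(xi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def shooting_defects(xs, us):
    """Continuity constraints x_{k+1} - F(x_k, u_k), k = 0..N-1; all zero when feasible."""
    return [tuple(a - b for a, b in zip(xs[k + 1], rk4_step(xs[k], us[k])))
            for k in range(len(us))]
```

In multiple shooting, both the node states `xs` and the controls `us` are decision variables, and the solver drives these defects to zero; a single forward rollout, as in a quick check, satisfies them exactly by construction.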

Data Collection: we use the polynomial trajectory shown in Figure 6 for data collection, which is generated with the minimum-snap trajectory generation algorithm (Mellinger and Kumar, 2011). The data are collected by a quadrotor controlled by a nonlinear MPC with the nominal model $f_{\text{nom}}$. The learning task set consists of constant side winds in the $x$-direction at speeds of 2, 4, and 6 m/s. For each wind condition, we collect 50 s of data for training the EPD model.
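
The section does not spell out how the regression targets are formed from the logged flight data. One common choice, shown here purely as an assumption, is to finite-difference the measured velocity and subtract the force predicted by the nominal model, so that each label is the force the nominal model fails to explain: $\Delta f_k=m\,(v_{k+1}-v_k)/\Delta t-f_{\text{nom},k}$. The helper name, the mass `mass`, and the step `dt` are hypothetical.

```python
def residual_force_labels(velocities, nominal_forces, mass, dt):
    """ASSUMED labelling scheme: Delta f_k = m * (v_{k+1} - v_k) / dt - f_nom_k.

    velocities:     3-D velocity samples v_0..v_K in the world frame
    nominal_forces: 3-D net forces f_nom_0..f_nom_{K-1} predicted by the nominal model
    Returns K residual force vectors, one per finite-difference interval.
    """
    labels = []
    for k in range(len(velocities) - 1):
        accel = [(b - a) / dt for a, b in zip(velocities[k], velocities[k + 1])]
        labels.append([mass * ak - fk for ak, fk in zip(accel, nominal_forces[k])])
    return labels
```

For example, a constant side force in $x$ that the nominal model ignores shows up as a constant $x$-component in every label.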

Training EPD model: in this experimental setup, the EPD model takes the states and controls $[\mathbf{x},\mathbf{u}]\in\mathbb{R}^{17}$ as input and outputs the residual lumped forces $\Delta f\in\mathbb{R}^{3}$. The encoder is a deep neural network with layer sizes $[17,64,64,50,4]$, and the linear decoder is a matrix $w\in\mathbb{R}^{4\times 3}$ with $\sigma(w)<3.0$. We follow Algorithm 1 to train the EPD model.
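
A minimal sketch of the EPD forward pass with the stated sizes: an encoder with layers $[17,64,64,50,4]$ maps $[\mathbf{x},\mathbf{u}]$ to a 4-D feature, and a $4\times 3$ linear decoder produces $\Delta f\in\mathbb{R}^3$. The tanh activation, the random initialization, and the reading of $\sigma(w)$ as the largest singular value (enforced by rescaling $w$ after an update) are our assumptions; the section does not specify them. Plain Python lists keep the snippet dependency-free.

```python
import math
import random

LAYERS = [17, 64, 64, 50, 4]  # encoder layer sizes from the text
SIGMA_MAX = 3.0               # bound on sigma(w), read here as the spectral norm


def init_encoder(rng):
    """Random (weights, biases) per layer; the initialization scheme is illustrative."""
    return [([[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(LAYERS[:-1], LAYERS[1:])]


def encode(params, z):
    """phi_theta(z): tanh hidden layers (assumed) and a linear output layer."""
    h = z
    for idx, (W, b) in enumerate(params):
        h = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        if idx < len(params) - 1:
            h = [math.tanh(v) for v in h]
    return h


def decode(phi, w):
    """Row vector phi (length 4) times decoder w (4 x 3) gives the residual force (length 3)."""
    return [sum(phi[i] * w[i][j] for i in range(len(phi))) for j in range(len(w[0]))]


def spectral_norm(w, iters=200):
    """Largest singular value of w via power iteration on w^T w."""
    cols = len(w[0])
    v = [1.0] * cols
    for _ in range(iters):
        wv = [sum(row[j] * v[j] for j in range(cols)) for row in w]                # w v
        v_new = [sum(w[i][j] * wv[i] for i in range(len(w))) for j in range(cols)]  # w^T w v
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in v_new]
    wv = [sum(row[j] * v[j] for j in range(cols)) for row in w]
    return math.sqrt(sum(x * x for x in wv))


def project_decoder(w, bound=SIGMA_MAX):
    """Rescale w so that its spectral norm is at most `bound`."""
    s = spectral_norm(w)
    if s <= bound:
        return w
    return [[x * bound / s for x in row] for row in w]
```

Rescaling is one simple way to keep a linear decoder within a norm ball; since the whole matrix is scaled uniformly, the direction of the decoder is preserved and only its gain is capped.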

6.4 Learning Results

Figure 7(b) shows the task-specific batch loss curves during training. The gradual reduction of the loss indicates that the decoders capture the essential features of their corresponding tasks, while the stable variance band indicates that the encoder's representation is lossy in a controlled manner, which leaves room for online adaptation. Figure 7(a) validates the inference capability of the learned network on the training tasks. The impact of the trade-off parameter $\beta$ on inter-task regularization is discussed in Section 6.5.

Figure 7: Results for training the EPD model. (a) Ground truth vs. predicted forces on a validation dataset; note the different axis scales. (b) Smoothed batch loss curve with each task highlighted.

6.5 Impact of the trade-off parameter $\beta$ on inter-task regularization

Figure 8 illustrates the role of the trade-off parameter $\beta$. The progression across the plots suggests that, as $\beta$ increases, the model transitions from task-specific learning to more regularized, across-task learning. When $\beta=0$, no inter-task regularization is performed, and the risks of different tasks show distinctive patterns. We highlight the case $\beta=0.4$, where the model strikes a desirable balance between distinctive task representations and uniform across-task regularization. The uniformity of the clusters suggests that the encoder is trained to capture the inherent patterns of the residual dynamics. Note that the systematic variation of the clusters along the $x$-direction is consistent with the prior physical knowledge that the side winds of different intensities act in the $x$-direction.

Figure 8: Normalized error distribution with each task highlighted for $\beta=0$, 0.2, 0.4, and 0.6. For better visualization, we only show the task-specific distribution of the normalized risk for the $x$ and $y$ components.

6.6 Error distribution for tracking performance subject to spatially varying wind

Figure 9 shows the tracking error distribution under the spatially varying wind, supplementing the RMSE results in Table 2. Each box spans the interquartile range of the errors, from the 25th to the 75th percentile. Compared to nominal MPC and KNODE-MPC-online, Proto-MPC not only reduces the mean tracking error in all components but also yields a more concentrated error distribution, indicating more consistent tracking performance.
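
For reference, the quantities summarized by each box in Figure 9 and by the RMSE in Table 2 can be computed from per-time-step error samples as sketched below. Using linear-interpolation quartiles (`method="inclusive"` in Python's `statistics.quantiles`, which matches NumPy's default percentile interpolation) is our assumption about how the boxes were drawn; the helper names are illustrative.

```python
import math
import statistics


def rmse(errors):
    """Root-mean-square error of a sequence of scalar tracking errors [m]."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def box_summary(errors):
    """Median and interquartile range (25th to 75th percentile) of the errors."""
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}
```

A smaller `iqr` corresponds to the more concentrated error distribution discussed above, while `rmse` weights large excursions more heavily.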

Figure 9: Distribution of tracking errors (m) on the testing trajectories associated with Fig. 5, subject to spatially varying wind.