Learning to Boost the Performance
of Stable Nonlinear Systems
Abstract
The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures in pursuit of the unparalleled performance achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over arbitrarily deep neural network classes of performance-boosting controllers for stable nonlinear systems; crucially, we guarantee closed-loop stability even if optimization is halted prematurely, and even when the ground-truth dynamics are unknown, with vanishing conservatism in the class of stabilizing policies as the model uncertainty is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.
Index Terms:
Optimal control, Closed-loop stability, Learning for control, Internal model control, Uncertain systems, Distributed control
I Introduction
The success of control systems across a broad spectrum of applications — from manufacturing to water, power, and transportation networks [1] — is rooted not only in advancements in sensing, computation, and communication but also in the growing availability of methods for designing model-based controllers capable of stabilizing nonlinear systems at nominal operating conditions.
However, in many applications, merely stabilizing the closed-loop system is not sufficient; achieving satisfactory performance is also crucial, often necessitating the integration of additional control loops. In Nonlinear Optimal Control (NOC), performance requirements are typically encoded in the shape of the cost function that the control policy strives to minimize. Consequently, it is beneficial to develop NOC algorithms that accommodate general nonlinear costs to enable sophisticated closed-loop behaviors, such as collision avoidance or waypoint tracking in swarms of robots.
In this paper, we tackle the following performance-boosting problem: given a discrete-time nonlinear system that is stable or has been pre-stabilized using a base controller, how can we enhance its performance during the transient — that is, before the system settles into a steady state — by employing general cost functions without compromising stability?
A first approach to designing performance-boosting regulators involves resorting to NOC methods with stability guarantees. Despite extensive research in this area [2], the NOC problem is fully understood only when the system dynamics are linear and the cost admits a convex reformulation. For nonlinear systems, traditional methods for addressing NOC include dynamic programming and the maximum principle [3, 4]. However, the computation of NOC policies through these methods often faces significant computational challenges [4]. Furthermore, to ensure stability, stringent limitations must be imposed on the class of costs that can be utilized. An alternative approach to tackling performance-boosting is offered by receding-horizon control schemes, such as Nonlinear Model Predictive Control (NMPC) [5]. These controllers are based on real-time optimization; a finite-horizon NOC problem is solved at each time instant to determine the control input. However, a significant limitation of NMPC is that the control policy can seldom be precomputed and stored in an explicit form, which makes NMPC inapplicable when the control platform lacks the computational resources necessary to solve mathematical programs in real-time. Moreover, similar to NOC, ensuring stability requires imposing strong limitations on the class of admissible cost functions [5].
More recently, Reinforcement Learning (RL) and Deep Neural Networks (DNNs) have emerged as powerful tools that enable agents to understand and optimally interact with complex environments and dynamical systems, e.g., [6, 7]. Many RL approaches are based on minimizing arbitrary cost functions, calling for the use of broad sets of candidate nonlinear control policies. To this end, RL methods often employ families of policies that incorporate deep Neural Networks (NNs), due to their ability to model rich classes of nonlinear functions. These capabilities have led to remarkable applications, such as four-legged robots navigating challenging terrains [8] and drones that can outperform humans in races [9, 10]. On the other hand, general methodologies for designing RL policies for nonlinear dynamical systems, while ensuring closed-loop stability, are currently scarce and may be limited by strong assumptions [11, 12, 13]. As a result, so far the applicability of RL approaches has been mainly limited to systems that are not safety-critical.
Independent of their application in RL, NNs have been employed in model-based control since the 1990s for approximating nonlinear receding horizon policies [14, 15] or synthesizing nonlinear regulators from scratch [16]. Recent results on the design of provably stabilizing DNN control policies fall into two categories. The first one comprises constrained optimization approaches [11, 17, 18] that ensure global or local stability by enforcing Lyapunov-like inequalities during optimization. However, conservative stability constraints can severely restrict the range of admissible policies or fail to produce a viable controller even when it exists. Additionally, enforcing constraints such as linear matrix inequalities becomes a computational bottleneck in large-scale applications.
The second category embraces unconstrained optimization approaches, aiming to define classes of control policies with built-in stability guarantees [19, 20, 21]. These methods, which are similar to those developed in this paper, allow unconstrained optimization over finitely many parameters (using, for instance, standard gradient descent techniques) without sacrificing stability, regardless of the chosen parameter values. Optimizing over sets of stabilizing policies has two main benefits. First, it completely decouples the stabilization problem from the choice of the cost being optimized. Second, it enables fail-safe design, that is, the ability to guarantee closed-loop stability even if the policy optimization ends at a local minimum or is prematurely halted. However, these approaches are limited to discrete-time linear systems [19, 20] or to continuous-time systems in the port-Hamiltonian form [21]. While recent work overcomes the limitations above [22, 23], knowledge of the system model is never perfect in real-world applications, and the impact of modeling errors on the parametrizations of stable closed-loop maps for nonlinear systems has remained largely unexplored.
I-A Contributions
This paper explores approaches to solve performance-boosting problems in general discrete-time, time-varying systems. Specifically, we develop unconstrained optimization approaches based on classes of state-feedback policies that induce closed-loop dynamics described by stable and arbitrarily deep NNs.
After formally stating the performance-boosting problem in Section II, we present our first contribution, which provides a complete characterization of the class of stability-preserving controllers for stable systems. This result is presented in Section III and reveals that an Internal Model Control (IMC) structure [24, 25, 26] allows characterizing, without conservatism, the class of all stability-preserving controllers, where the only free parameter is an operator. Our results hinge on adapting nonlinear variants of the Youla parametrization [27, 28] to discrete-time systems. Further, we examine the relationship with the recently proposed nonlinear System Level Synthesis (SLS) framework developed in [29]. In Section IV, our main contribution is to show that the proposed approach is compatible with scenarios where only an approximate system description is available, such as models identified from data or derived from simplified physical principles. Specifically, under a finite-gain assumption on the model mismatch, stability can always be preserved by embedding a nominal system model and optimizing over nonlinear controllers with a sufficiently reduced gain on the free parameter. Importantly, the method ensures vanishing conservatism as the model uncertainty approaches zero. Additionally, by considering networks of interconnected subsystems, we demonstrate how the IMC structure of our controllers naturally lends itself to the development of distributed policies where the communication topology mirrors the subsystem couplings.
Finally, Section V bridges the gap between theoretical developments and computations, showing how to use Recurrent Equilibrium Networks (RENs) [30, 31] to obtain a finite-dimensional parametrization of performance-boosting controllers that can include DNNs. The final part of the paper in Section VI presents several simulations by considering coordination problems for mobile robots. Specifically, we show how, similarly to RL, the freedom in specifying the optimization cost allows designing NN controllers that can boost various forms of performance and safety, reaching beyond classical optimal control objectives consisting of the sum of stage-costs over time [3].
I-B Notation
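Signals and operators: The set of all sequences $\mathbf{x} = (x_0, x_1, x_2, \ldots)$, where $x_t \in \mathbb{R}^n$, $t \in \mathbb{N}$, is denoted as $\ell^n$. Moreover, $\mathbf{x}$ belongs to $\ell_p^n \subset \ell^n$ with $p \in \mathbb{N}$ if $\|\mathbf{x}\|_p = \left(\sum_{t=0}^{\infty} |x_t|^p\right)^{1/p} < \infty$, where $|\cdot|$ denotes any vector norm. We say that $\mathbf{x} \in \ell_\infty^n$ if $\sup_t |x_t| < \infty$. When clear from the context, we omit the superscript $n$ from $\ell^n$ and $\ell_p^n$. An operator $\mathbf{A}: \ell^n \to \ell^m$ is said to be $\ell_p$-stable¹ if it is causal and $\mathbf{A}(\mathbf{w}) \in \ell_p^m$ for all $\mathbf{w} \in \ell_p^n$. Equivalently, we write $\mathbf{A} \in \mathcal{L}_p$. We say that an operator $\mathbf{A}: \ell^n \to \ell^m$ has finite $\ell_p$-gain $\gamma(\mathbf{A}) > 0$ if $\|\mathbf{A}(\mathbf{w})\|_p \leq \gamma(\mathbf{A}) \|\mathbf{w}\|_p$, for all $\mathbf{w} \in \ell_p^n$.
¹We also say that the operator is stable, for short, when the value of $p$ is clear from the context.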
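Time-series: We use the notation $x_{j:i}$ to refer to the truncation of $\mathbf{x}$ to the finite-dimensional vector $(x_i, x_{i+1}, \ldots, x_j)$. An operator $\mathbf{A}: \ell^n \to \ell^m$ is said to be causal if $\mathbf{A}(\mathbf{x}) = (A_0(x_0), A_1(x_{1:0}), \ldots, A_t(x_{t:0}), \ldots)$. If in addition $A_t(x_{t:0}) = A_t(x_{t-1:0}, 0)$, then $\mathbf{A}$ is said to be strictly causal. Similarly, we define $\mathbf{A}_{j:i}(\mathbf{x}) = (A_i(x_{i:0}), \ldots, A_j(x_{j:0}))$. For a matrix $M \in \mathbb{R}^{m \times n}$, $M\mathbf{x} = (M x_0, M x_1, \ldots) \in \ell^m$.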
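Graph theory: Given an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ described by the set of nodes $\mathcal{V} = \{1, \ldots, N\}$ and the set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$, we denote the set of neighbors of node $i$, including $i$ itself, by $\mathcal{N}_i = \{j \mid (i, j) \in \mathcal{E}\} \cup \{i\}$. We denote with $\mathrm{col}(x^1, \ldots, x^N)$ a vector which consists of the stacked subvectors from $x^1$ to $x^N$, and with $x^{\mathcal{N}_i}$ a vector composed by the stacked subvectors of all neighbors of node $i$, i.e., $x^{\mathcal{N}_i} = \mathrm{col}_{j \in \mathcal{N}_i}(x^j)$. For a signal $\mathbf{x} = (x_0, x_1, \ldots)$, where $x_t = \mathrm{col}(x_t^1, \ldots, x_t^N)$, $x_t^i \in \mathbb{R}^{n_i}$, and $t \in \mathbb{N}$, we denote with $\mathbf{x}^i$ the sequence $(x_0^i, x_1^i, \ldots)$. Similarly, we define the sequence $\mathbf{x}^{\mathcal{N}_i}$.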
II The Performance-boosting Problem
We consider nonlinear discrete-time time-varying systems
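$$x_t = f_t(x_{t-1:0}, u_{t-1:0}) + w_t, \qquad t = 1, 2, \ldots, \tag{1}$$
where $x_t \in \mathbb{R}^n$ is the state vector, $u_t \in \mathbb{R}^m$ is the control input, and $w_t \in \mathbb{R}^n$ stands for unknown process noise, with $x_0 = w_0$. The system model (1) is very general. For instance, it can describe the dynamics of the error between the state of a nonlinear system and a reference trajectory in $\ell^n$. In operator form, system (1) is equivalent to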
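$$\mathbf{x} = \mathbf{F}(\mathbf{x}, \mathbf{u}) + \mathbf{w}, \tag{2}$$
where $\mathbf{F}: \ell^n \times \ell^m \to \ell^n$ is the strictly causal operator such that $\mathbf{F}(\mathbf{x}, \mathbf{u}) = (0, f_1(x_0, u_0), \ldots, f_t(x_{t-1:0}, u_{t-1:0}), \ldots)$. Note that $\mathbf{w} = (x_0, w_1, w_2, \ldots)$ collects all data needed for defining the system evolution over an infinite horizon. As an example, when the system (1) takes the Linear Time Invariant (LTI) form
$$x_t = A x_{t-1} + B u_{t-1} + w_t, \tag{3}$$
the model (2) becomes $\mathbf{x} = (0, A x_0 + B u_0, A x_1 + B u_1, \ldots) + \mathbf{w}$.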
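A minimal NumPy sketch of the operator form (2) in the LTI case (3), using hypothetical random matrices and signals: the strictly causal operator $\mathbf{F}$ is a one-step delay of $A x_t + B u_t$, so simulating (3) and evaluating $\mathbf{x} - \mathbf{F}(\mathbf{x}, \mathbf{u})$ returns the disturbance sequence $\mathbf{w}$ (including $w_0 = x_0$). The helper names `F_lti` and `simulate_lti` are illustrative only.

```python
import numpy as np

def F_lti(A, B, x, u):
    """Strictly causal operator F(x, u) = (0, A x_0 + B u_0, A x_1 + B u_1, ...).

    x has shape (T+1, n) and u has shape (T+1, m); the output has shape (T+1, n).
    """
    out = np.zeros_like(x)
    out[1:] = x[:-1] @ A.T + u[:-1] @ B.T  # row t holds A x_{t-1} + B u_{t-1}
    return out

def simulate_lti(A, B, u, w):
    """Simulate x_t = A x_{t-1} + B u_{t-1} + w_t with x_0 = w_0."""
    x = np.zeros_like(w)
    x[0] = w[0]
    for t in range(1, len(w)):
        x[t] = A @ x[t - 1] + B @ u[t - 1] + w[t]
    return x

rng = np.random.default_rng(0)
n, m, T = 3, 2, 20
A = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)  # hypothetical plant matrices
B = rng.standard_normal((n, m))
u = rng.standard_normal((T + 1, m))                 # arbitrary open-loop inputs
w = rng.standard_normal((T + 1, n))
x = simulate_lti(A, B, u, w)
w_rec = x - F_lti(A, B, x, u)  # operator form (2): x = F(x, u) + w
print(np.allclose(w_rec, w))  # True
```

The identity holds for any inputs, since it only restates (3) in sequence form.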
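We consider disturbances $\mathbf{w}$ with support in $\ell_p^n$ following a random vector distribution $\mathcal{D}$, that is, $\mathbf{w} \sim \mathcal{D}$ and $w_{T:0} \sim \mathcal{D}_{T:0}$ for every $T \in \mathbb{N}$. In order to control the behavior of system (1), we consider nonlinear, state-feedback, time-varying control policies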
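$$\mathbf{u} = \mathbf{K}(\mathbf{x}) = (K_0(x_0), K_1(x_{1:0}), \ldots, K_t(x_{t:0}), \ldots), \tag{4}$$
where $\mathbf{K}: \ell^n \to \ell^m$ is a causal operator to be designed. Note that the controller can be dynamic, as $u_t$ can depend on the whole past history $x_{t:0}$ of the system state. Since for each $\mathbf{w} \in \ell^n$ and $\mathbf{u} \in \ell^m$ the system (1) produces a unique state sequence $\mathbf{x} \in \ell^n$, equation (2) defines a unique transition operator $\mathcal{F}: \ell^m \times \ell^n \to \ell^n$, with $\mathbf{x} = \mathcal{F}(\mathbf{u}, \mathbf{w})$,
which provides an input-to-state model of system (1). Similarly, for each $\mathbf{w} \in \ell^n$ the closed-loop system (1)-(4) produces unique trajectories. Hence, the closed-loop mapping $\mathbf{w} \mapsto (\mathbf{x}, \mathbf{u})$ is well-defined. Specifically, for a system $\mathbf{F}$ and a controller $\mathbf{K}$, we denote the corresponding induced closed-loop operators $\mathbf{w} \mapsto \mathbf{x}$ and $\mathbf{w} \mapsto \mathbf{u}$ as $\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}]$ and $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]$, respectively. Therefore, we have $\mathbf{x} = \mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}](\mathbf{w})$ and $\mathbf{u} = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}](\mathbf{w})$ for all $\mathbf{w} \in \ell^n$.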
Our goal is to synthesize a control policy solving the following problem.
Problem 1 (Performance boosting).
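Assume that $\mathcal{F}$ lies in $\mathcal{L}_p$. Find $\mathbf{K}$ solving the finite-horizon Nonlinear Optimal Control (NOC) problem
$$\min_{\mathbf{K}(\cdot)} \ \mathbb{E}_{w_{T:0}}\left[L(x_{T:0}, u_{T:0})\right] \quad \text{subject to (1), (4)}, \tag{5a}$$
$$\left(\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}], \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]\right) \in \mathcal{L}_p, \tag{5b}$$
where $L$ defines a loss over realized trajectories $x_{T:0}$ and $u_{T:0}$, and the expectation removes the effect of disturbances on the realized values of the loss.²
²Another common choice is to use … instead of the expectation. Other useful choices include …, …, and weighted combinations of all the above. In practice, one can approximate the chosen operator that removes the effect of disturbances from the cost by performing multiple experiments.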
The main feature of (5) is that the cost is optimized over the finite horizon $T$, but under the strict requirement that the closed-loop system is stable when it evolves over the infinite horizon $t \in \mathbb{N}$. In other words, the feedback controller must preserve the stability of $\mathcal{F}$, and its role is to boost the performance of the system in the transient $0 \leq t \leq T$. As will become clear in the sequel, we consider iterative control design algorithms based on gradient descent that are fail-safe, in the sense that they search in sets of controllers that are stability-preserving by design. This guarantees closed-loop stability during the optimization of the policy parameters. Note also that, as is standard in NOC, we do not expect gradient descent to find the globally optimal solution for any initialization; this is generally impossible for problems beyond Linear Quadratic Gaussian (LQG) control, which enjoys convexity of the cost and linearity of the optimal policies [32, 33]. Furthermore, the expected value in (5a) can seldom be computed³ and is approximated by using samples of $w_{T:0}$. Fail-safe design guarantees that, in spite of all these limitations, closed-loop stability is never lost.
³For instance, because it is too costly or the distribution of $\mathbf{w}$ is unknown.
III Unconstrained Parametrization of all Stability-preserving Controllers
As a preliminary step towards fail-safe design for stable systems, we show how to parametrize all stability-preserving policies by using an IMC control architecture [24, 25], depending on an operator $\mathbf{M}$ that can be freely chosen in $\mathcal{L}_p$. Specifically, the block diagram of the proposed control architecture is represented in Figure 1 and includes a copy of the system dynamics, which is used for computing the estimate $\hat{\mathbf{w}}$ of the disturbance $\mathbf{w}$.
We are now in a position to introduce the main result.
Theorem 1.
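Assume that the operator $\mathcal{F}$ is $\ell_p$-stable, i.e. $\mathcal{F}(\mathbf{u}, \mathbf{w}) \in \ell_p^n$ if $(\mathbf{u}, \mathbf{w}) \in \ell_p^m \times \ell_p^n$, and consider the evolution of (2) where $\mathbf{u}$ is chosen as
$$\mathbf{u} = \mathbf{M}\left(\mathbf{x} - \mathbf{F}(\mathbf{x}, \mathbf{u})\right), \tag{6}$$
for a causal operator $\mathbf{M}: \ell^n \to \ell^m$. Let $\mathbf{K}$ be the operator such that $\mathbf{u} = \mathbf{K}(\mathbf{x})$ is equivalent to (6).⁴ The following two statements hold true.
⁴This operator always exists because $\mathbf{F}$ is strictly causal. Hence, $u_t$ can be computed recursively from the past inputs $u_{t-1:0}$ and the states $x_{t:0}$; see formula (11).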
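1) If $\mathbf{M} \in \mathcal{L}_p$, then the closed-loop system is $\ell_p$-stable.
2) If there is a causal policy $\mathbf{C}$ such that $\left(\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{C}], \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{C}]\right) \in \mathcal{L}_p$, then
$$\mathbf{M} = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{C}] \tag{7}$$
gives $\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}] = \mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{C}]$ and $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}] = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{C}]$.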
Proof.
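We prove 1). For compactness, define $\hat{\mathbf{w}} = \mathbf{x} - \mathbf{F}(\mathbf{x}, \mathbf{u})$. As highlighted in [25], since there is no model mismatch between the plant and the model used to define $\hat{\mathbf{w}}$, one has $\hat{\mathbf{w}} = \mathbf{w}$, hence opening the loop. More specifically, from Figure 1 and Equation (2) one has
$$\mathbf{u} = \mathbf{M}(\mathbf{w}), \qquad \mathbf{x} = \mathcal{F}(\mathbf{M}(\mathbf{w}), \mathbf{w}). \tag{8}$$
Therefore, by definition of the closed-loop maps, one has $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}](\mathbf{w}) = \mathbf{M}(\mathbf{w})$ and $\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}](\mathbf{w}) = \mathcal{F}(\mathbf{M}(\mathbf{w}), \mathbf{w})$, $\forall \mathbf{w} \in \ell^n$. When $\mathbf{M} \in \mathcal{L}_p$, one has $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}] \in \mathcal{L}_p$ because $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}] = \mathbf{M}$. Moreover, $\mathcal{F} \in \mathcal{L}_p$ and $\mathbf{M} \in \mathcal{L}_p$ imply that the operator $\mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}]$, defined by the composition of the operators $\mathcal{F}$ and $\mathbf{M}$, is in $\mathcal{L}_p$ as well. This is due to the property that the composition of operators in $\mathcal{L}_p$ is in $\mathcal{L}_p$.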
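We prove 2). Set, for short, $\mathbf{\Phi}^{\mathbf{x}} = \mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}]$, $\mathbf{\Phi}^{\mathbf{u}} = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]$, $\mathbf{\Psi}^{\mathbf{x}} = \mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{C}]$, and $\mathbf{\Psi}^{\mathbf{u}} = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{C}]$. By assumption, one has $\mathbf{\Psi}^{\mathbf{u}} \in \mathcal{L}_p$ and, since $\mathbf{M} = \mathbf{\Psi}^{\mathbf{u}}$, also $\mathbf{M} \in \mathcal{L}_p$. By definition, $\mathbf{\Phi}^{\mathbf{u}}$ is the operator $\mathbf{w} \mapsto \mathbf{u}$ and, from (8) and Figure 1, it coincides with $\mathbf{M}$. Hence
$$\mathbf{\Phi}^{\mathbf{u}} = \mathbf{M} = \mathbf{\Psi}^{\mathbf{u}}. \tag{9}$$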
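It remains to prove that $\mathbf{\Phi}^{\mathbf{x}} = \mathbf{\Psi}^{\mathbf{x}}$. Similar to [22], we proceed by induction. First, we show that $\Phi^{\mathbf{x}}_0 = \Psi^{\mathbf{x}}_0$. Since $x_0 = w_0$ and $\mathbf{F}$ is strictly causal, one has from (1) that the closed-loop map $w_0 \mapsto x_0$ is the identity, irrespective of the controller. Therefore $\Phi^{\mathbf{x}}_0 = \Psi^{\mathbf{x}}_0$. Assume now that, for a positive integer $j$, we have $\Phi^{\mathbf{x}}_{j-1:0}(\mathbf{w}) = \Psi^{\mathbf{x}}_{j-1:0}(\mathbf{w})$ for all $\mathbf{w} \in \ell^n$. Since $(\mathbf{\Phi}^{\mathbf{x}}, \mathbf{\Phi}^{\mathbf{u}})$ and $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}})$ are closed-loop maps, from (2) they verify
$$\Phi^{\mathbf{x}}_j(\mathbf{w}) = f_j\big(\Phi^{\mathbf{x}}_{j-1:0}(\mathbf{w}), \Phi^{\mathbf{u}}_{j-1:0}(\mathbf{w})\big) + w_j, \qquad \Psi^{\mathbf{x}}_j(\mathbf{w}) = f_j\big(\Psi^{\mathbf{x}}_{j-1:0}(\mathbf{w}), \Psi^{\mathbf{u}}_{j-1:0}(\mathbf{w})\big) + w_j. \tag{10}$$
But, from (9), one has $\Phi^{\mathbf{u}}_{j-1:0} = \Psi^{\mathbf{u}}_{j-1:0}$ and, by using the inductive assumption, one obtains $\Phi^{\mathbf{x}}_j = \Psi^{\mathbf{x}}_j$. This implies $\mathbf{\Phi}^{\mathbf{x}} = \mathbf{\Psi}^{\mathbf{x}}$.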
∎
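Statement 2) of Theorem 1 can be checked numerically in the LTI case (3). In this sketch, the matrices $A$, $B$ and the static gain $K$ are hypothetical, and the policy $\mathbf{C}$ is the state feedback $u_t = K x_t$. Its closed-loop map $\mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{C}]$ is the convolution $u_t = \sum_{k=0}^{t} K (A + BK)^k w_{t-k}$; using it as $\mathbf{M}$ in the IMC policy (6) should reproduce the same state and input trajectories, as (7) predicts.

```python
import numpy as np

# Hypothetical stable plant (3) and static stabilizing policy C: u_t = K x_t
A = np.array([[0.5, 0.2, 0.0], [0.0, 0.4, 0.1], [0.1, 0.0, 0.3]])
B = np.array([[1.0], [0.0], [0.5]])
K = np.array([[-0.2, 0.0, -0.1]])
Acl = A + B @ K  # upper triangular, eigenvalues 0.3, 0.4, 0.25

def M_from_C(w_hat_past):
    """M = Phi^u[F, C] as in (7): u_t = sum_{k=0}^{t} K Acl^k w_hat_{t-k}."""
    t = len(w_hat_past) - 1
    return sum(K @ np.linalg.matrix_power(Acl, k) @ w_hat_past[t - k] for k in range(t + 1))

T, n, m = 40, 3, 1
w = np.random.default_rng(0).standard_normal((T + 1, n))

# Closed loop with the original policy C
x_c, u_c = np.zeros((T + 1, n)), np.zeros((T + 1, m))
for t in range(T + 1):
    x_c[t] = w[t] if t == 0 else A @ x_c[t - 1] + B @ u_c[t - 1] + w[t]
    u_c[t] = K @ x_c[t]

# Closed loop with the IMC policy (6); the model copy is F(x, u) = (0, A x_0 + B u_0, ...)
x_m, u_m, w_hat = np.zeros((T + 1, n)), np.zeros((T + 1, m)), np.zeros((T + 1, n))
for t in range(T + 1):
    x_m[t] = w[t] if t == 0 else A @ x_m[t - 1] + B @ u_m[t - 1] + w[t]
    w_hat[t] = x_m[t] if t == 0 else x_m[t] - (A @ x_m[t - 1] + B @ u_m[t - 1])
    u_m[t] = M_from_C(w_hat[: t + 1])

print(np.allclose(x_c, x_m), np.allclose(u_c, u_m))  # True True
```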
Several comments are in order. First, Theorem 1 is about nominal stability only, as there is no model mismatch between the plant model and the one used in the controller. We analyze robust stability in Section IV. Second, it is well known that many IMC architectures are sufficient for preserving stability, both in the linear [24] and the nonlinear [25] case.⁵ It is also known that, in the LTI setting, IMC is necessary for preserving stability [34] and provides an alternative to the Youla-Kučera parametrization [35]. In this respect, Theorem 1 provides a necessary condition for preserving stability also for nonlinear systems. This result is perhaps not surprising, given that necessary and sufficient conditions for stabilizing wide classes of input-output nonlinear models, in the spirit of the Youla-Kučera parametrization, have been derived since the 1980s [27]. However, these controllers are not conceived in the IMC form.
⁵Note, however, that IMC in [25] is developed in terms of continuous-time nonlinear input-output models, for which the effect of process noise is difficult to analyze. Moreover, the control objective there is for the plant output to track a reference signal, which raises the problem of approximating inverses of nonlinear operators. In our work, we instead use discrete-time input-to-state models and analyze the closed-loop maps from process noise to control inputs and system states. Moreover, our goal is to solve optimal control rather than tracking problems.
Following [24, 25], we argue that the IMC structure facilitates the design of performance-boosting policies. Indeed, it is straightforward to deploy controllers using the block-diagram structure shown in Figure 1. In equation form, for a chosen operator , one simply computes the control input as follows:
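$$\hat{w}_t = x_t - f_t(x_{t-1:0}, u_{t-1:0}), \quad t \geq 1, \qquad \hat{w}_0 = x_0, \tag{11a}$$
$$u_t = M_t(\hat{w}_{t:0}). \tag{11b}$$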
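A minimal sketch of the recursion (11), assuming NumPy; the plant map `f`, the operator `M`, and all numbers are hypothetical. The controller keeps a copy of $f_t$ to compute $\hat{w}_t$ and then applies $M_t$ to the history $\hat{w}_{t:0}$. With no model mismatch, the estimate coincides with the true disturbance, which is the "loop opening" used in the proof of Theorem 1.

```python
import numpy as np

def run_imc(f, M, w, m):
    """Closed loop of system (1) under the IMC policy (11).

    f(t, x_past, u_past) returns f_t(x_{t-1:0}, u_{t-1:0}); it is used both as the plant
    and as the controller's internal model (no mismatch). M(t, w_hat_past) returns
    u_t = M_t(w_hat_{t:0}). Both callables are hypothetical placeholders.
    """
    T1, n = w.shape
    x, u, w_hat = np.zeros((T1, n)), np.zeros((T1, m)), np.zeros((T1, n))
    for t in range(T1):
        x[t] = w[t] if t == 0 else f(t, x[:t], u[:t]) + w[t]       # plant (1)
        w_hat[t] = x[t] if t == 0 else x[t] - f(t, x[:t], u[:t])   # (11a)
        u[t] = M(t, w_hat[: t + 1])                                # (11b)
    return x, u, w_hat

def f(t, x_past, u_past):
    # hypothetical stable plant: x_t = 0.5 tanh(x_{t-1}) + 0.1 u_{t-1} + w_t
    return 0.5 * np.tanh(x_past[-1]) + 0.1 * u_past[-1]

def M(t, w_hat_past):
    # hypothetical causal FIR operator (hence in L_p): u_t = -0.8 w_hat_t + 0.3 w_hat_{t-1}
    prev = w_hat_past[-2] if t >= 1 else np.zeros_like(w_hat_past[-1])
    return -0.8 * w_hat_past[-1] + 0.3 * prev

w = np.zeros((51, 2))
w[:5] = np.random.default_rng(1).standard_normal((5, 2))  # finite-energy disturbance
x, u, w_hat = run_imc(f, M, w, m=2)
print(np.allclose(w_hat, w))  # True
```

Because the plant is stable and `M` is a finite impulse response, the state returns to the origin once the disturbance stops.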
Moreover, Theorem 1 highlights that it is sufficient to search in the space of $\mathcal{L}_p$ operators $\mathbf{M}$ for describing all and only performance-boosting policies. While finding a parametrization of all $\mathcal{L}_p$ operators might be prohibitive, we will show in Section V that one can use NNs for describing broad subsets of these operators. Furthermore, the IMC structure lends itself to the development of policies that enjoy a distributed structure (see Section IV).
III-1 The case of LTI systems with nonlinear costs
Consider the linear system (3) and let $z$ denote the time-shift operator, with $z^{-1}\mathbf{x} = (0, x_0, x_1, \ldots)$. When the system is asymptotically stable, the classical Youla parametrization [35] states that all linear state-feedback stabilizing control policies can be written as
$$\mathbf{u} = \left(I + \mathbf{Q} z^{-1} B\right)^{-1} \mathbf{Q} \left(I - z^{-1} A\right) \mathbf{x}, \tag{12}$$
where $\mathbf{Q} \in \mathcal{RH}_\infty$ is the so-called Youla parameter. Here, $\mathcal{RH}_\infty$ denotes the set of stable transfer matrices, that is, the set of matrices whose scalar entries are stable transfer functions. The class of linear control policies is globally optimal for standard LQG problems, and it allows optimizing over $\mathbf{Q}$ using simple pole approximations and convex programming; we refer to [36, 37] for state-of-the-art results. However, nonlinear policies can significantly outperform linear ones when the controller is distributed [38] or the cost function is nonlinear. As an immediate corollary of Theorem 1, and in accordance with the core contribution of [39], we have the following result for linear systems controlled by nonlinear policies.
Corollary 1.
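Consider the linear system (3) and assume that it is asymptotically stable. Then, all and only control policies that make the closed-loop system $\ell_p$-stable are expressed as
$$\mathbf{u} = \mathbf{M}\left(\mathbf{x} - z^{-1}(A\mathbf{x} + B\mathbf{u})\right), \tag{13}$$
where $\mathbf{M} \in \mathcal{L}_p$.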
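Proof. The result follows from Theorem 1 by setting $\mathbf{F}(\mathbf{x}, \mathbf{u}) = z^{-1}(A\mathbf{x} + B\mathbf{u})$, since asymptotic stability of (3) implies that the corresponding transition operator $\mathcal{F}$ lies in $\mathcal{L}_p$. ∎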
III-2 Relationships with [22] and nonlinear SLS
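In [22], we provided a slight generalization of Theorem 1 and the results in Section III-1 by also considering unstable systems for which a pre-stabilizing controller $\mathbf{K}'$ exists, so that the overall policy is
$$\mathbf{u} = \mathbf{K}'(\mathbf{x}) + \mathbf{M}\left(\mathbf{x} - \mathbf{F}(\mathbf{x}, \mathbf{u})\right). \tag{14}$$
By letting $\mathbf{K}' = 0$, and assuming that both $\mathcal{F}$ and $\mathbf{M}$ lie in $\mathcal{L}_p$, Theorem 1 coincides with Theorem 2 in [22]. However, when $\mathbf{K}' \neq 0$, Theorem 2 in [22] highlights that $\mathbf{M} \in \mathcal{L}_p$ may no longer be a necessary condition for closed-loop $\ell_p$-stability, while still being sufficient.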
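Moreover, as highlighted in [22], there is a deep link between Theorem 1 and the SLS parametrization of stabilizing controllers [40, 29]. The idea behind the SLS approach [40, 29] is to circumvent the difficulty of characterizing stabilizing controllers by instead directly designing stable closed-loop maps. Let us define the set of all achievable closed-loop maps for system $\mathbf{F}$ as
$$\mathcal{CL}[\mathbf{F}] = \left\{(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \mid \exists \text{ causal } \mathbf{K} \text{ such that } \mathbf{\Psi}^{\mathbf{x}} = \mathbf{\Phi}^{\mathbf{x}}[\mathbf{F}, \mathbf{K}],\ \mathbf{\Psi}^{\mathbf{u}} = \mathbf{\Phi}^{\mathbf{u}}[\mathbf{F}, \mathbf{K}]\right\}, \tag{15}$$
and the set of all achievable and stable closed-loop maps as
$$\mathcal{CL}_p[\mathbf{F}] = \left\{(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{CL}[\mathbf{F}] \mid (\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{L}_p\right\}. \tag{16}$$
Note that, if $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{CL}[\mathbf{F}]$, then $\mathbf{x} = \mathbf{\Psi}^{\mathbf{x}}(\mathbf{w})$ and $\mathbf{u} = \mathbf{\Psi}^{\mathbf{u}}(\mathbf{w})$ for all $\mathbf{w} \in \ell^n$. Based on Theorem III.3 of [29], and adding the requirement that the closed-loop maps must belong to $\mathcal{L}_p$, we summarize the main SLS result for nonlinear discrete-time systems.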
Theorem 2 (Nonlinear SLS parametrization [29]).
The following two statements hold true.
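1) The set of all achievable and stable closed-loop responses admits the following characterization:
$$\mathcal{CL}_p[\mathbf{F}] = \left\{(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \text{ causal} \mid \text{(17b) and (17c) hold}\right\}, \tag{17a}$$
$$\mathbf{\Psi}^{\mathbf{x}} = \mathbf{F}(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) + \mathbf{I}, \tag{17b}$$
$$(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{L}_p, \tag{17c}$$
where $\mathbf{I}$ denotes the identity operator.
2) For any $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{CL}_p[\mathbf{F}]$, the operator $\mathbf{\Psi}^{\mathbf{x}}$ is invertible and the causal controller
$$\mathbf{K} = \mathbf{\Psi}^{\mathbf{u}} \circ (\mathbf{\Psi}^{\mathbf{x}})^{-1} \tag{18}$$
is the only one that achieves the stable closed-loop responses $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}})$.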
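Theorem 2 clarifies that any policy achieving $\ell_p$-stable closed-loop maps can be described in terms of two causal operators $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}})$ complying with the nonlinear functional equality (17b). Therefore, the NOC problem admits an equivalent Nonlinear SLS (N-SLS) formulation:
$$\min_{\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}} \ \mathbb{E}_{w_{T:0}}\left[L(x_{T:0}, u_{T:0})\right] \quad \text{subject to} \quad \mathbf{x} = \mathbf{\Psi}^{\mathbf{x}}(\mathbf{w}),\ \mathbf{u} = \mathbf{\Psi}^{\mathbf{u}}(\mathbf{w}),\ (\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{CL}_p[\mathbf{F}]. \tag{N-SLS}$$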
According to Theorem 2, the constraint $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}}) \in \mathcal{CL}_p[\mathbf{F}]$ is equivalent to requiring that $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}})$ are causal and verify (17b)-(17c). The constraint (17b) simply defines the operator $\mathbf{\Psi}^{\mathbf{x}}$ in terms of $\mathbf{\Psi}^{\mathbf{u}}$, and it can be computed explicitly because $\mathbf{F}$ is strictly causal. The main challenge is to comply with (17c). Indeed, it is hard to generate $\mathbf{\Psi}^{\mathbf{u}}$ such that the corresponding $\mathbf{\Psi}^{\mathbf{x}}$ satisfies $\mathbf{\Psi}^{\mathbf{x}} \in \mathcal{L}_p$. The paper [29] suggests directly searching over $\ell_p$-stable operators $(\mathbf{\Psi}^{\mathbf{x}}, \mathbf{\Psi}^{\mathbf{u}})$ and abandoning the goal of complying with (17b) exactly. One can then study robust stability when (17b) only holds approximately, as per Theorem IV.2 in [29]. However, with the exception of polynomial systems [41], this way of proceeding may result in conservative control policies or fail to produce a stabilizing controller. Instead, for the case of stable or pre-stabilized systems, Theorem 1 can be seen as a way of parametrizing all stabilizing controllers that completely circumvents the problem of fulfilling (17b)-(17c).
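To illustrate why (17c) is the hard part, the sketch below (NumPy; the unstable scalar plant and both candidate maps $\mathbf{\Psi}^{\mathbf{u}}$ are hypothetical) evaluates $\mathbf{\Psi}^{\mathbf{x}}$ from a given causal $\mathbf{\Psi}^{\mathbf{u}}$ through the recursion implied by (17b), which is computable because $\mathbf{F}$ is strictly causal. A bounded $\mathbf{\Psi}^{\mathbf{u}}$ can still produce an unbounded $\mathbf{\Psi}^{\mathbf{x}}$ when the plant is unstable.

```python
import numpy as np

def psi_x_from_psi_u(f, psi_u, w):
    """Evaluate Psi^x(w) from (17b), i.e. x = F(x, u) + w with u = Psi^u(w).

    Strict causality of F means x_t only needs x_{t-1:0} and u_{t-1:0}.
    f(t, x_past, u_past) is a hypothetical callable for f_t; psi_u(w) must be causal.
    """
    u = psi_u(w)
    x = np.zeros_like(w)
    x[0] = w[0]
    for t in range(1, len(w)):
        x[t] = f(t, x[:t], u[:t]) + w[t]
    return x

f_unstable = lambda t, xp, up: 1.2 * xp[-1] + up[-1]  # hypothetical plant x_t = 1.2 x_{t-1} + u_{t-1} + w_t
w = np.zeros((51, 1))
w[0] = 1.0                                             # unit impulse

x_bad = psi_x_from_psi_u(f_unstable, lambda w: np.zeros_like(w), w)  # Psi^u = 0 is stable ...
x_good = psi_x_from_psi_u(f_unstable, lambda w: -1.2 * w, w)         # ... but this one also stabilizes
print(abs(x_bad[-1, 0]) > 1e3, np.allclose(x_good, w))  # True True
```

Here the second choice corresponds to the feedback $u_t = -1.2 x_t$, which gives $\mathbf{\Psi}^{\mathbf{x}}(\mathbf{w}) = \mathbf{w}$; finding such pairs systematically is exactly what (17c) makes difficult.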
IV Beyond Closed-loop Stability: Handling Model Uncertainty and Distributed Architectures
This section tackles the performance boosting problem (Problem 1) under more intricate real-world constraints beyond just closed-loop stability. Firstly, Theorem 1 suffers from requiring perfect plant knowledge for controller design. In reality, ensuring closed-loop stability despite an imperfect model is crucial. Secondly, control policies in large-scale applications like power grids and traffic systems are inherently distributed. This means they rely solely on local sensor data and communication, posing significant challenges to achieving network-level robustness and stability.
IV-A Robustness against model-mismatch
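Let us denote the nominal model available for design as $\mathbf{F}$ and the real unknown plant as
$$\mathbf{F}_r(\mathbf{x}, \mathbf{u}) = \mathbf{F}(\mathbf{x}, \mathbf{u}) + \mathbf{\Delta}(\mathbf{x}, \mathbf{u}), \tag{19}$$
where $\mathbf{\Delta}: \ell^n \times \ell^m \to \ell^n$ is a strictly causal operator representing the model mismatch. Let $\delta_t$, $t = 1, 2, \ldots$, be the time representation of the mismatch operator, that is, $\mathbf{\Delta}(\mathbf{x}, \mathbf{u}) = (0, \delta_1(x_0, u_0), \ldots, \delta_t(x_{t-1:0}, u_{t-1:0}), \ldots)$. Since for each sequence of disturbances $\mathbf{w}$ and inputs $\mathbf{u}$ the dynamics represented by (1) with $f_t$ replaced by $f_t + \delta_t$ produces a unique state sequence $\mathbf{x}$, the equation
$$\mathbf{x} = \mathbf{F}(\mathbf{x}, \mathbf{u}) + \mathbf{\Delta}(\mathbf{x}, \mathbf{u}) + \mathbf{w} \tag{20}$$
defines again a unique transition operator $\mathcal{F}_r: (\mathbf{u}, \mathbf{w}) \mapsto \mathbf{x}$, which provides an input-to-state model of the perturbed system.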
Here, we show that when the model mismatch can be described by an operator $\mathbf{\Delta}$ with finite gain, we can always design operators $\mathbf{M}$ with sufficiently small $\ell_2$-gain that stabilize the real closed-loop system. More specifically, letting $\gamma(\mathbf{\Delta})$ denote the maximum gain of the model mismatch $\mathbf{\Delta}$, it is possible to design controllers that comply with the following robust version of the stability constraint (5b):
(21) |
This result, which is given in the next theorem, refers to the control scheme in Figure 2.
Theorem 3.
Assume that the mismatch operator in (19) has finite -gain . Furthermore, assume that the operator has finite -gain . Then, for any such that
(22) |
the control policy given by
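$$\hat{\mathbf{w}} = \mathbf{x} - \mathbf{F}(\mathbf{x}, \mathbf{u}), \tag{23a}$$
$$\mathbf{u} = \mathbf{M}(\hat{\mathbf{w}}), \tag{23b}$$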
stabilizes the closed-loop system.
Proof.
We first show that operators and verify
(24) |
This follows by substituting in (20). We now compute the gain of the operator in the right frame of Figure 2:
(25) |
where the first equality follows from (24). Using the definition of -gain for the operator one has , and, by using (25) and , one obtains
The relationship above implies that
(26) |
Next, we plug the upper bound (26) into the inequality to obtain
(27) |
and subsequently, we plug (27) into the inequality to obtain
(28) |
The last step is to verify that the maps and have a finite -gain. This is done by checking that the gains in (27) and (28) are positive values when the gain of is sufficiently small. If (22) holds, we have that , and hence the denominator in (27) is positive. Since the numerator of (27) is always positive, we conclude that the map has an -gain. Similarly for (28), since (22) implies that , we have that both numerator and denominator are positive. This implies that the map has an -gain, as desired. ∎
The robustness condition (22) highlights a trade-off between (i) the degree of tolerable uncertainty in the mismatch between nominal and real dynamics, and (ii) the extent of the set of stabilizing control policies that we are permitted to optimize over. Specifically, (22) ensures that, for any model mismatch $\mathbf{\Delta}$ satisfying the assumptions of Theorem 3, there always exists a range of admissible gains for $\mathbf{M}$ such that the closed loop is stable. This enables one to freely learn over all appropriately gain-bounded operators $\mathbf{M}$. Further note that Theorem 3 is not conservative when $\mathbf{\Delta} = 0$, unlike the classical application of the small-gain theorem [42], which would enforce a small-gain condition on the loop even when $\mathbf{\Delta} = 0$. Indeed, when the model is fully known, the right-hand side of (22) diverges to infinity, allowing the gain of $\mathbf{M}$ to take any finite value, thus imposing no upper bound and therefore recovering the completeness result of Theorem 1.
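The mechanism that Theorem 3 has to control can be seen in simulation. When the plant is $\mathbf{F} + \mathbf{\Delta}$ but the controller (23) only knows $\mathbf{F}$, the estimate becomes $\hat{\mathbf{w}} = \mathbf{w} + \mathbf{\Delta}(\mathbf{x}, \mathbf{u})$, so the loop through $\mathbf{M}$ is no longer open. The sketch below (NumPy; the nominal model, the mismatch, and the static gain used as $\mathbf{M}$ are hypothetical, and the models depend only on the previous step for brevity) checks this identity; it does not evaluate the bound (22).

```python
import numpy as np

def run_mismatched_imc(f_nom, delta, M, w):
    """Real plant x_t = f_nom + delta + w_t, controlled by (23), which only uses f_nom."""
    T1, n = w.shape
    x, u, w_hat, d = (np.zeros((T1, n)) for _ in range(4))
    for t in range(T1):
        if t > 0:
            d[t] = delta(x[t - 1], u[t - 1])                             # mismatch term delta_t
        x[t] = w[t] if t == 0 else f_nom(x[t - 1], u[t - 1]) + d[t] + w[t]
        w_hat[t] = x[t] if t == 0 else x[t] - f_nom(x[t - 1], u[t - 1])  # (23a)
        u[t] = M(w_hat[: t + 1])                                         # (23b)
    return x, u, w_hat, d

f_nom = lambda x, u: 0.5 * np.tanh(x) + 0.2 * u   # hypothetical nominal model
delta = lambda x, u: 0.1 * np.sin(x)              # hypothetical mismatch with gain <= 0.1
M = lambda w_hat_past: -0.5 * w_hat_past[-1]      # hypothetical low-gain operator M

w = np.zeros((61, 1))
w[:3] = np.random.default_rng(5).standard_normal((3, 1))
x, u, w_hat, d = run_mismatched_imc(f_nom, delta, M, w)
print(np.allclose(w_hat - w, d))  # True: the estimate absorbs the mismatch
```

With these small gains the closed loop still converges; a larger gain for `M` feeds more of the mismatch back into the plant, which is the trade-off that (22) quantifies.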
Remark 1 (Robust stability of nonlinear SLS).
The authors of [29] characterize robust stability of nonlinear SLS against mismatch in satisfying the achievability constraint (17b). Specifically, [29] focuses on the scenario where the control policy is the mapping in the form
(29) | ||||
(30) |
for some which are not assumed to perfectly comply with (17b). Accordingly, the authors define a mismatch operator
(31) |
Then, Theorem IV.2 of [29] proves closed-loop stability as long as . Since measures the degree of violation of the achievability constraint rather than the degree of model uncertainty, a robust stability analysis based on verifying tailored to the case may not be straightforward, and it is not attempted in [29]. For this case, instead, Theorem 3 provides an upper bound on the admissible gains for ; this is achieved by exploiting the IMC structure of the policy (23), and bounding the effect of model uncertainty on the closed-loop map for the ground-truth system.
IV-B Distributed controllers for large-scale plants
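When dealing with large-scale cyber-physical systems, one may consider that the plant (1) is composed of a network of dynamically interconnected nonlinear subsystems. To model this scenario, we introduce an undirected coupling graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the nodes $\mathcal{V} = \{1, \ldots, N\}$ represent the subsystems in the network, and the set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ encodes pairs of subsystems that are dynamically interconnected through state variables. Specifically, the dynamics of each subsystem $i \in \mathcal{V}$ is
$$x^i_t = f^i_t(x^{\mathcal{N}_i}_{t-1:0}, u^i_{t-1:0}) + w^i_t, \qquad t = 1, 2, \ldots, \tag{32}$$
where the state and input of each subsystem at time $t$ are denoted by $x^i_t \in \mathbb{R}^{n_i}$ and $u^i_t \in \mathbb{R}^{m_i}$, respectively, and the initial state is $x^i_0 = w^i_0$. In operator form we have
$$\mathbf{x}^i = \mathbf{F}^i(\mathbf{x}^{\mathcal{N}_i}, \mathbf{u}^i) + \mathbf{w}^i, \qquad i \in \mathcal{V}, \tag{33}$$
where $\mathbf{F}^i(\mathbf{x}^{\mathcal{N}_i}, \mathbf{u}^i) = (0, f^i_1(x^{\mathcal{N}_i}_0, u^i_0), \ldots, f^i_t(x^{\mathcal{N}_i}_{t-1:0}, u^i_{t-1:0}), \ldots)$. Note that, by stacking the subsystem dynamics in (32) together, we recover a system in the form (1), where $x_t = \mathrm{col}(x^1_t, \ldots, x^N_t)$, $u_t = \mathrm{col}(u^1_t, \ldots, u^N_t)$, and $w_t = \mathrm{col}(w^1_t, \ldots, w^N_t)$.
When controlling networked systems in the form (33), a common scenario is that the local feedback controller $\mathbf{u}^i$ can only access information made available by its neighbors according to a communication network with the same topology as $\mathcal{G}$. This requirement translates into imposing the following additional constraint on the performance-boosting problem (Problem 1):
$$\mathbf{u}^i = \mathbf{K}^i(\mathbf{x}^{\mathcal{N}_i}), \qquad i \in \mathcal{V}. \tag{34}$$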
The challenge becomes to parametrize only those stabilizing policies that are distributed according to (34). This can be achieved by exploiting the IMC controller architecture (11) in combination with the network sparsity of $\mathbf{F}$ highlighted in (33). Let us consider, for example, the networked plant of Figure 3, where depends on the local disturbance reconstructions only, that is, . In order to reconstruct , agent needs to evaluate the local dynamics ; this, in turn, requires a measurement of the state over time. Repeating this reasoning for the agents and , one obtains an overall control policy whose agent-wise components are computed relying on measurements from neighboring subsystems only, thus complying with (34). We formalize this reasoning in the next proposition.
Proposition 1.
Proof.
The result of Proposition 1 can be extended to more complex cases. First, one can use local operators that, besides , have access to disturbance reconstructions or control variables computed at locations . While these architectures can be beneficial, e.g. for counteracting disturbances affecting other subsystems before they propagate to the subsystem through coupling, they require additional communication channels if . Moreover, one has to use local operators guaranteeing that the whole operator belongs to . To this purpose, in general, it is not enough that because the dependency on and for can induce loop interconnections that can destabilize the closed-loop system. Classes of local operators yielding have been proposed in [43, 44] by using dissipativity theory.
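A sketch of the locality argument behind Proposition 1, assuming NumPy and a hypothetical chain of three scalar subsystems whose local dynamics depend only on the previous step. Each agent reconstructs $\hat{w}^i_t$ from the states of its neighbors $\mathcal{N}_i$ and its own input, and feeds it to a local operator $\mathbf{M}^i$ (here a hypothetical static gain), so no information beyond the coupling graph is needed.

```python
import numpy as np

# Hypothetical chain graph 0 - 1 - 2; each neighbor set N_i includes i itself
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}

def f_local(i, x_nbrs, u_i):
    """Hypothetical local dynamics f^i: uses only neighbors' previous states and own input."""
    coupling = sum(0.1 * x_j for j, x_j in x_nbrs.items() if j != i)
    return 0.5 * np.tanh(x_nbrs[i]) + coupling + u_i

N, T = 3, 30
w = np.zeros((T + 1, N))
w[:3] = np.random.default_rng(3).standard_normal((3, N))
x, u, w_hat = np.zeros((T + 1, N)), np.zeros((T + 1, N)), np.zeros((T + 1, N))
for t in range(T + 1):
    for i in range(N):  # plant (32)
        nb = {j: x[t - 1, j] for j in neighbors[i]} if t > 0 else None
        x[t, i] = w[t, i] if t == 0 else f_local(i, nb, u[t - 1, i]) + w[t, i]
    for i in range(N):  # agent i: local IMC controller using neighbor measurements only
        nb = {j: x[t - 1, j] for j in neighbors[i]} if t > 0 else None
        w_hat[t, i] = x[t, i] if t == 0 else x[t, i] - f_local(i, nb, u[t - 1, i])
        u[t, i] = -0.3 * w_hat[t, i]  # hypothetical local operator M^i acting on w_hat^i
print(np.allclose(w_hat, w))  # True
```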
V Learning to Boost Performance using Unconstrained Optimization
Leveraging the theoretical results of previous sections, we reformulate the performance-boosting problem in a form that facilitates optimizing by automatic differentiation and unconstrained gradient descent. This enables the use of highly flexible cost functions for complex nonlinear optimal control tasks. By design, the proposed approach guarantees closed-loop stability throughout the optimization process. We assess the effectiveness of the proposed methodology in achieving optimal performance through numerical experiments, in Section VI.
V-A IMC-based reformulation of performance boosting
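Based on Theorem 1, Problem 1 can be equivalently reformulated as an optimization over $\mathbf{M} \in \mathcal{L}_p$:
$$\min_{\mathbf{M} \in \mathcal{L}_p} \ \mathbb{E}_{w_{T:0}}\left[L(x_{T:0}, u_{T:0})\right] \tag{35a}$$
$$\text{subject to} \quad x_t = f_t(x_{t-1:0}, u_{t-1:0}) + w_t, \quad x_0 = w_0, \tag{35b}$$
$$u_t = M_t(w_{t:0}), \quad t = 0, 1, \ldots, T. \tag{35c}$$
Indeed, (6) corresponds to (35b)-(35c), since $\hat{\mathbf{w}} = \mathbf{w}$ when there is no model mismatch. If the exact dynamics in (35b) are not known, they must simply be replaced by the nominal model $\mathbf{F}$.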
The reformulation (35) offers significant computational advantages as compared to Problem 1. In the classical linear quadratic case,666That is, when and are linear and is quadratic positive definite. (35) becomes strongly convex in — enabling to use efficient convex optimization for finding a globally optimal solution [45, 40, 46, 47, 36]. In the general nonlinear case, searching over nonlinear operators remains significantly easier than tackling Problem 1 directly. Indeed, the set of controllers complying with (5b) is, in general, difficult to parametrize. This is mainly because, given two stabilizing policies , their convex combinations with and their cascaded composition do not result in stabilizing policies, in general; these issues are very well-known for the special case of linear systems [48, 45]. Hence, it is difficult to parameterize stabilizing policies, for instance, by composing or summing together base stabilizing operators. Instead, thanks to being convex and closed under composition, there exist methods for parametrizing rich subsets of through free parameters , that is, to define operators such that
(36)
This allows turning (35) into an unconstrained optimization problem over .
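The difficulty of combining stabilizing policies noted above already appears for linear state feedback. The sketch below uses hypothetical matrices (not from the paper): for the plant x_{t+1} = A x_t + B u_t with A = 0 and B = I, the closed-loop matrix under u_t = K x_t is K itself, so K is stabilizing if and only if its spectral radius is below 1. Two nilpotent gains are each stabilizing, yet their average is not.

```python
import cmath


def spectral_radius_2x2(m):
    """Spectral radius of a real 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


def convex_combination(k1, k2, lam):
    # lam * K1 + (1 - lam) * K2, entry by entry
    return [[lam * k1[i][j] + (1 - lam) * k2[i][j] for j in range(2)] for i in range(2)]


# With A = 0 and B = I, the closed loop x_{t+1} = K x_t is Schur stable
# if and only if the spectral radius of K is below 1.
K1 = [[0.0, 3.0], [0.0, 0.0]]  # nilpotent: spectral radius 0
K2 = [[0.0, 0.0], [3.0, 0.0]]  # nilpotent: spectral radius 0
K_mid = convex_combination(K1, K2, 0.5)  # [[0, 1.5], [1.5, 0]]

print(spectral_radius_2x2(K1), spectral_radius_2x2(K2), spectral_radius_2x2(K_mid))
# prints 0.0 0.0 1.5
```

By contrast, the convex combination of two operators in the (convex) class of stable operators is again stable, which is what makes free parametrizations of the form (36) workable.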
The last issue to be addressed is the computation of the average in (35a), which, as noted before, is generally intractable. This is usually circumvented by approximating the exact average with its empirical counterpart, obtained using a set of samples drawn from the distribution . One then obtains the finite-dimensional optimization problem:
(37a)
(37b)
(37c)
where and are the inputs and states obtained when the disturbance is applied. While in this work we only consider the empirical cost (37a), the closed-loop performance on out-of-sample noise sequences is investigated in [49].
Finally, we highlight that (37b) and (37c) can be seen as the equations of the layer of a neural network with depth and parametrized by . When , for is sufficiently smooth, the absence of constraints on enables the use of powerful packages, such as TensorFlow [50] and PyTorch [51], leveraging automatic differentiation and backpropagation for optimizing the controller through gradient descent.
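The unrolled structure of (37b)-(37c) can be illustrated with a minimal scalar sketch. All specifics are assumptions for illustration: the plant f is a hypothetical contractive map (so the system is stable before boosting), the operator M is a first-order linear filter whose pole a = 0.99 tanh(θ_a) lies in (-1, 1) for every parameter value (a toy free parametrization in the sense of (36), not a REN), and central finite differences stand in for the automatic differentiation that PyTorch or TensorFlow would provide. Because the model is exact, the reconstructed disturbance coincides with the true one, so every iterate, including a prematurely stopped one, yields a stable closed loop.

```python
import math
import random


def f(x, u):
    # Hypothetical plant: contractive in x (|df/dx| <= 0.8), hence stable with u = 0
    return 0.8 * math.tanh(x) + u


def rollout(theta, w_seq):
    """Unrolled closed loop for one disturbance sequence, in the spirit of (37b)-(37c).

    M is a first-order linear filter driven by the reconstructed disturbance.
    Its pole a = 0.99 * tanh(theta_a) satisfies |a| < 1 for every theta.
    """
    theta_a, b, c, d = theta
    a = 0.99 * math.tanh(theta_a)
    xs, us = [], []
    x_prev = u_prev = xi = 0.0
    for t, w in enumerate(w_seq):
        x = w if t == 0 else f(x_prev, u_prev) + w      # plant, with x_0 = w_0
        w_hat = x if t == 0 else x - f(x_prev, u_prev)  # IMC disturbance reconstruction
        xi = a * xi + b * w_hat                         # internal state of M
        u = c * xi + d * w_hat                          # control input u_t
        xs.append(x)
        us.append(u)
        x_prev, u_prev = x, u
    return xs, us


def empirical_cost(theta, samples, lam=0.1):
    # Sample average of a quadratic loss, replacing the expectation in (35a)
    total = 0.0
    for w_seq in samples:
        xs, us = rollout(theta, w_seq)
        total += sum(x * x + lam * u * u for x, u in zip(xs, us))
    return total / len(samples)


def fd_gradient(theta, samples, h=1e-5):
    # Central finite differences, standing in for automatic differentiation
    grad = []
    for i in range(len(theta)):
        tp, tm = list(theta), list(theta)
        tp[i] += h
        tm[i] -= h
        grad.append((empirical_cost(tp, samples) - empirical_cost(tm, samples)) / (2 * h))
    return grad


rng = random.Random(0)
T, N = 30, 10
# Each sample: random initial condition w_0, then small process noise
samples = [[rng.gauss(2.0, 0.5)] + [rng.gauss(0.0, 0.05) for _ in range(T - 1)]
           for _ in range(N)]

theta = [0.0, 0.1, 0.1, 0.0]
initial = empirical_cost(theta, samples)
for _ in range(100):  # plain gradient descent on the unconstrained parameters
    g = fd_gradient(theta, samples)
    theta = [p - 0.01 * gi for p, gi in zip(theta, g)]
final = empirical_cost(theta, samples)
print(f"empirical cost before: {initial:.3f}, after: {final:.3f}")
```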
V-B Free parameterizations of subsets
As highlighted in Section V-A, the possibility of obtaining effective controllers by solving (37) critically depends on our ability to parametrize operators. The main obstacle is that the space is infinite-dimensional. Hence, for implementation, one usually restricts the search to subsets of described by finitely many parameters. When linear systems are considered, one can search over Finite Impulse Response (FIR) transfer matrices , and then optimize over the finitely many real matrices . Less and less conservative solutions can be obtained by increasing the FIR order . However, the FIR approach limits the search to linear control policies.
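A scalar version of the FIR approach can be sketched as a finite convolution; here the real matrices of the text are replaced, for brevity, by hypothetical scalar coefficients M_0, ..., M_N. For any finite coefficients the operator is stable, since its ℓ2 gain is bounded by the sum of the absolute values of the coefficients; this is why the coefficients can be optimized without constraints, at the price of restricting the search to linear policies.

```python
def fir_operator(coeffs, w_seq):
    """Apply the scalar FIR operator u_t = sum_{k=0}^{N} M_k * w_{t-k} (w_t = 0 for t < 0)."""
    n = len(coeffs)
    out = []
    for t in range(len(w_seq)):
        out.append(sum(coeffs[k] * w_seq[t - k] for k in range(n) if t - k >= 0))
    return out


def l1_gain_bound(coeffs):
    # Upper bound on the l2 gain of the FIR operator: sum of |M_k|
    return sum(abs(c) for c in coeffs)


coeffs = [0.5, -0.2, 0.1]
# The impulse response recovers the coefficients
print(fir_operator(coeffs, [1.0, 0.0, 0.0, 0.0, 0.0]))
```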
Recently, [30, 31, 52] have proposed finite-dimensional DNN approximations of nonlinear operators. In the sequel we briefly review the Recurrent Equilibrium Network (REN) models proposed in [31]. An operator is a REN if the relationship is recursively generated by the following dynamical system:
(38)
where , , (these signals are assumed to be constant in the original REN model [31]), and , the activation function, is applied element-wise. Further, must be piecewise differentiable, with first derivatives restricted to the interval . As noted in [31], RENs subsume many existing DNN architectures. In general, RENs define deep equilibrium network models [53] due to the implicit relationships defining in the second block row of (38). By restricting to be strictly lower-triangular, the value of can be computed explicitly, thus significantly speeding up computations [31]. To give an example of the expressivity of (38), by suitably choosing the size and zero pattern of the matrices in (38), RENs can provide nonlinear systems in the form
where , , , are arbitrary matrices of suitable dimensions and , , are neural networks of depth given by the relations
where and are the layer weights and biases, respectively, and is the NN output.
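The explicit evaluation enabled by a strictly lower-triangular feedthrough matrix can be sketched as follows. We assume, in the spirit of [31], a nonlinear block of the form v = C1 x + D11 s + D12 e + b_v with s = σ(v), where x is the REN state, e its input, and s the neuron outputs; these names and the numbers in the example are hypothetical. Since row i of D11 only involves s_0, ..., s_{i-1}, one forward sweep solves the implicit equation exactly.

```python
import math


def explicit_equilibrium(c1x, d11, d12e, bv, act=math.tanh):
    """Solve s = act(c1x + D11 s + d12e + bv) for strictly lower-triangular D11.

    c1x, d12e, bv: precomputed vectors (C1 x, D12 e, b_v) given as lists.
    Row i of D11 only involves s[0..i-1], so a single forward sweep gives the
    exact solution and no fixed-point iteration is needed.
    """
    n = len(c1x)
    s = [0.0] * n
    for i in range(n):
        v_i = c1x[i] + sum(d11[i][j] * s[j] for j in range(i)) + d12e[i] + bv[i]
        s[i] = act(v_i)
    return s


# Hypothetical 3-neuron example
D11 = [[0.0, 0.0, 0.0],
       [0.7, 0.0, 0.0],
       [-0.4, 1.2, 0.0]]
c1x, d12e, bv = [0.3, -0.1, 0.5], [0.2, 0.0, -0.3], [0.0, 0.1, 0.0]
s = explicit_equilibrium(c1x, D11, d12e, bv)

# Check that s satisfies the implicit (equilibrium) equation
residual = max(
    abs(s[i] - math.tanh(c1x[i] + sum(D11[i][j] * s[j] for j in range(3)) + d12e[i] + bv[i]))
    for i in range(3)
)
print(s, residual)
```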
For an arbitrary choice of and , the map induced by (38) may not lie in . The work [31] provides an explicit smooth mapping from unconstrained training parameters to a matrix defining (38), with the property that the corresponding operator lies in by design when (furthermore, RENs enjoy contractivity, although the theoretical results of this paper do not rely on this property). This approach can be easily generalized by including the vectors , in the set of trainable parameters and assuming for . Recently, free parametrizations of continuous-time operators through RENs and port-Hamiltonian systems have also been proposed in [52] and [54], respectively.
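As a much simpler illustration of what a free parametrization means (and not the REN parametrization of [31]), the sketch below maps an arbitrary, unconstrained matrix W to a state matrix A whose spectral norm is strictly below one, so the resulting linear operator is stable for every parameter value. The Frobenius-norm scaling is an assumption chosen for brevity; it only covers a conservative subset of stable linear systems.

```python
import math
import random


def contractive_matrix(w):
    """Map an unconstrained square matrix W to A = W / sqrt(1 + ||W||_F^2).

    Because ||W||_2 <= ||W||_F, the spectral norm of A is strictly below 1
    for every W, so x_t = A x_{t-1} + B e_t is stable whatever W and B are.
    """
    fro = math.sqrt(sum(v * v for row in w for v in row))
    scale = 1.0 / math.sqrt(1.0 + fro * fro)
    return [[scale * v for v in row] for row in w]


def matvec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


rng = random.Random(1)
n = 4
W = [[rng.gauss(0.0, 10.0) for _ in range(n)] for _ in range(n)]  # arbitrary parameters
A = contractive_matrix(W)

x = [1.0, -2.0, 0.5, 3.0]
norms = [norm(x)]
for _ in range(50):  # free response x_t = A x_{t-1}
    x = matvec(A, x)
    norms.append(norm(x))
# The state norm never increases, whatever W was drawn
print(all(b <= a for a, b in zip(norms, norms[1:])))
# prints True
```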
VI Numerical Experiments: the Magic of the Cost
In this section, we test the flexibility of performance boosting by considering cooperative robotics problems. First, we validate the fail-safe feature of the design approach by showing that closed-loop stability is preserved during and after training, both when the system model is known and when it is uncertain. Second, we exploit the freedom in selecting the cost to include appropriate terms aimed at promoting complex closed-loop behaviors.
In all the examples, we consider two point-mass vehicles, each with position and velocity , for , subject to nonlinear drag forces (e.g., air or water resistance). The discrete-time model for vehicle is
(39)
where is the mass, denotes the force control input, is the sampling time and is a drag function given by , for some . Each vehicle must reach a target position with zero velocity in a stable way. This elementary goal can be achieved by using a base proportional controller
(40)
with and . The overall dynamics in (1) is given by (39)-(40) with , where and is a performance-boosting control input to be designed. As per (1), we consider additive disturbances affecting the system dynamics. Thanks to the use of the prestabilizing controller (40), one can show that .
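The behavior of a single vehicle under the base controller alone can be sketched as below. Since the exact expressions of (39)-(40) are not reproduced here, we assume a forward-Euler point-mass model and a hypothetical dissipative drag law C(v) = b1 v - b2 tanh(v) applied per coordinate; the mass, sampling time, gain, and positions are placeholders, not the values of Appendix A. No performance-boosting input is applied, so the vehicle simply converges to its target.

```python
import math


def drag(v, b1=1.0, b2=0.5):
    # Hypothetical drag law C(v) = b1*v - b2*tanh(v); dissipative when b1 > b2 > 0
    return b1 * v - b2 * math.tanh(v)


def simulate_vehicle(p0, target, m=1.0, ts=0.05, kp=1.0, steps=2000):
    """Forward-Euler point mass with drag, driven by a proportional base controller.

    Assumed forms (placeholders for (39)-(40)):
      p_t = p_{t-1} + ts * v_{t-1}
      v_t = v_{t-1} + (ts / m) * (-C(v_{t-1}) + F_{t-1}),  F = kp * (target - p)
    """
    p, v = list(p0), [0.0, 0.0]
    for _ in range(steps):
        force = [kp * (target[i] - p[i]) for i in range(2)]
        p_next = [p[i] + ts * v[i] for i in range(2)]
        v = [v[i] + ts / m * (-drag(v[i]) + force[i]) for i in range(2)]
        p = p_next
    return p, v


p_final, v_final = simulate_vehicle(p0=[-2.0, -2.0], target=[2.0, 2.0])
print(p_final, v_final)
```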
The goal of the performance-boosting policy is to enforce additional desired behaviors, on top of stability, which are specified in each of the following subsections. In all cases, we parametrize the operator as a REN, see (38). Appendix A presents all the implementation details, such as parameter values and exact definitions of the cost functions. The code to reproduce our examples, as well as various movies, is available in our GitHub repository: https://github.com/DecodEPFL/performance-boosting_controllers.git
VI-A Robust stability preservation during optimization
We consider the mountains scenario in Figure 4, where each vehicle must reach its target position in a stable way while avoiding collisions with the other vehicle and with two grey obstacles. Each agent is represented by a circle whose radius is used for the collision-avoidance specifications. When using the base controller (40), the vehicles successfully reach their targets; however, performance is poor since collisions are not avoided, as shown in Figure 4(a).
We select a loss as the sum of stage costs , that is, with
(41)
where with penalizes the distance of agents from their targets and the control energy, and penalize collisions between agents and with obstacles, respectively.
In order to train the performance-boosting controller, we solve (37), using a REN (38) of dimension . The training data consists of a set of 100 initial positions, i.e., we set and , for , where and denote the and coordinates of the vehicles in the Cartesian plane, respectively. Initial positions are sampled from a Gaussian distribution around the nominal initial condition. Figure 4(b-c) shows the nominal and training initial conditions marked with ‘’ and ‘’, respectively, and three test trajectories after the training of the IMC controller. The trained control policies avoid collisions and achieve optimized trajectories thanks to minimizing (41).
VI-A1 Early stopping of the training
We validate the fail-safe property of our IMC control policies. We consider the mountains scenario as above, but the training process is interrupted before reaching a local minimum such as the one in Figure 4. In particular, we stop the optimization algorithm after 25%, 50%, and 75% of the total number of epochs. The resulting trajectories are shown in Figure 5. We observe that, even though performance is not yet optimized, closed-loop stability is preserved in all cases.
VI-A2 Model mismatch
We test our trained IMC controller in the presence of model mismatch. In particular, we assume that the true vehicles have a mass uncertainty of , and we apply IMC control policies embedding the nominal system with the nominal mass value. Figures 6(a-b) validate the robust -stability of the closed-loop trajectories when the vehicles are lighter and heavier, respectively. Theorem 3 suggests that, in this case, the gain of may be sufficiently low to counteract the effect of model uncertainty. Note, however, that checking the sufficient condition (22) requires computing an upper bound on , a cumbersome task for general nonlinear systems. Nonetheless, Theorem 3 ensures that, in practice, we can always reduce enough to eventually meet (22).
VI-B Boosting for safety and invariance certificates
A challenging task in many control applications is to deal with stringent safety constraints on the state variables. Ideally, one would directly add the constraint that
(42)
in the IMC-based performance-boosting problem (35), where defines a safety region. Unfortunately, (42) generally results in intractable constraints over . Indeed, it may be challenging even to verify that (42) holds for a given , due to the infinite-horizon requirement and the nonlinearities involved. Many state-of-the-art approaches for guaranteeing safety hinge on either predictive safety filters [55, 56] or Control Barrier Functions (CBFs) [57, 58]. Safety filters are used during deployment: they override the control input with a different (suboptimal) control variable when deemed necessary for guaranteeing safety. In contrast, CBFs can be used for safety verification of a given policy, as they allow characterizing as a forward invariant set based on a safety-set-defining function satisfying for all . Certifying the forward invariance of amounts to determining whether is a CBF by verifying certain safety conditions (an exact definition of CBFs for discrete-time systems can be found in [58]; for a more general discussion on CBFs, we refer the reader to [57]). In particular, one can verify that, for any , if there exists an input giving such that
(43)
where , then is a CBF.
While optimizing over such that (42) holds by design remains an open challenge, we aim to promote forward invariant sets by shaping the cost to include soft safety specifications over a horizon of length . In particular, the new cost term penalizes violations of (43) as per
(44) |
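A possible form of such a soft penalty is sketched below. It assumes the standard discrete-time CBF condition h(x_{t+1}) - h(x_t) >= -γ h(x_t) with 0 < γ <= 1, as in [58], and a hinge penalty on its violation; both the exact form of (43)-(44) and the overshoot-type safety function h (mirroring the vertical-overshoot requirement of the mountains scenario) are assumptions, and γ is a hypothetical value.

```python
def cbf_violation_penalty(h_values, gamma=0.5):
    """Hinge penalty on violations of h(x_{t+1}) - h(x_t) >= -gamma * h(x_t).

    h_values[t] = h(x_t) along a trajectory of length T; the penalty is zero
    when the discrete-time CBF condition holds at every step.
    """
    penalty = 0.0
    for h_now, h_next in zip(h_values, h_values[1:]):
        residual = h_next - h_now + gamma * h_now
        penalty += max(0.0, -residual)
    return penalty


def h_overshoot(y, y_target, delta):
    # Hypothetical safety function: h >= 0  <=>  y <= y_target + delta
    return y_target + delta - y


# Vertical positions of one vehicle approaching y_target = 2.0 (allowed overshoot 0.1)
ys = [0.0, 1.0, 1.6, 1.9, 2.05, 2.0]
h = [h_overshoot(y, 2.0, 0.1) for y in ys]
print(cbf_violation_penalty(h))
```

In this usage example h stays nonnegative, yet the penalty is positive: the CBF condition also limits how fast the trajectory may approach the boundary of the safe set, which is what promotes forward invariance rather than mere constraint satisfaction.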
We consider the mountains scenario again and add the requirement that for each vehicle and every , where denotes the -coordinate of each center-of-mass position on the Cartesian plane. In other words, we only allow an overshoot of in the vertical direction with respect to the target position for each vehicle. By defining , we add the term (44) to the loss function (37a). When training without including in the cost, the vehicles violate the constraints, on average, for of the time over 100 runs; typical trajectories are shown in Figure 4. The violation ratio decreases to when is included, as shown in Figure 6(c), where the gray area indicates the unsafe region to be avoided by the vehicles. Note that shaping the cost through is also beneficial if one implements an online safety filter such as [55, 56] during deployment: since penalizing drastically decreases constraint violations of the closed-loop system, the suboptimal online intervention of the safety filter becomes much less frequent.
VI-C Boosting for temporal logic specifications
The success of many policy learning algorithms, e.g., in RL, is highly dependent on the choice of the reward functions for capturing the desired behavior and constraints of an agent. When tasks become complex, specifying loss functions that are the sum over time of stage costs can be restrictive. For instance, consider the case of an agent that must optimally visit a set of locations. A loss function composed of a stage cost summed over time (that is, the one considered in dynamic programming and classical optimal control [59, 3]) cannot easily capture this task, as it would need a priori information about the optimal timings to visit each location. To overcome this problem, one could use more complex loss functions, such as those derived from temporal logic formulations. In particular, truncated linear temporal logic (TLTL) is a specification language leveraging a set of operators defined over finite-time trajectories [60, 61]. It allows incorporating domain knowledge and constraints (in a soft fashion) into the learning process, such as “always avoid obstacles”, “eventually visit location ”, or “do not visit location until visiting location ”. Then, using quantitative semantics, one can automatically transform TLTL formulae into real-valued loss functions that are compositions of and functions over a finite period of time [60, 61].
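The idea of quantitative semantics can be sketched with the usual max/min robustness rules: "eventually" takes the maximum of a predicate's robustness over time, "always" the minimum, and a conjunction the minimum of its operands. This is a minimal sketch in the spirit of [60, 61], not the implementation used in the experiments; the distance traces and radii are hypothetical, and in gradient-based training smooth approximations of min and max are commonly used instead.

```python
def eventually(rho):
    """Robustness of 'eventually phi' over a finite horizon: max over time."""
    return max(rho)


def always(rho):
    """Robustness of 'always phi' over a finite horizon: min over time."""
    return min(rho)


def conj(*values):
    """Robustness of a conjunction: min of the operands' robustness."""
    return min(values)


# Spec: "eventually visit location A" and "always avoid the obstacle".
# Hypothetical distance traces, one value per time step.
dist_to_A = [3.0, 2.0, 0.5, 0.25]
dist_to_obstacle = [2.0, 1.5, 1.125, 2.0]
r_goal, r_obs = 0.5, 1.0

rho_visit_A = eventually([r_goal - d for d in dist_to_A])  # > 0 iff some d < r_goal
rho_avoid = always([d - r_obs for d in dist_to_obstacle])  # > 0 iff every d > r_obs
rho_spec = conj(rho_visit_A, rho_avoid)
loss = -rho_spec  # minimizing the loss maximizes the robustness of the spec
print(rho_visit_A, rho_avoid, rho_spec, loss)
# prints 0.25 0.125 0.125 -0.125
```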
To test the efficacy of TLTL specifications for shaping complex stable closed-loop behavior, we consider the waypoint-tracking scenario, shown in Figure 7, where the two vehicles have to visit a sequence of waypoints while avoiding collisions with each other and with the gray obstacles. The blue vehicle’s goal is to visit , then and then , while the orange vehicle must visit the waypoints in the following order: , and . Following [60], the loss formulation for the orange agent translates into plain English as “Visit then then ; and don’t visit or until visiting ; and don’t visit until visiting ; and if visited , don’t visit again; and if visited , don’t visit again; and always avoid obstacles; and always avoid collisions; and eventually stay at the final goal.” Its mathematical formulation can be found in Appendix A2.
Figure 7 shows the waypoint-tracking scenario before and after training a performance-boosting controller. As described in Section V-B, we use a REN with for approximating the operator . Furthermore, we allow for a time-varying bias of the form , in (38), with for . While the system always starts at the same initial condition, indicated with ‘’, the data consists of disturbance sequences with fixed and as i.i.d. samples drawn from a Gaussian distribution with zero mean and standard deviation of . The results highlight the power of complex costs, here expressed through the TLTL loss function, which promote vehicles visiting the predefined waypoints in the correct order while avoiding collisions with each other and with the obstacles.
VII Conclusion
Embedding safety and stability emerges as a crucial challenge when control systems are equipped with high-performance machine learning components. This work aims to contribute to this rapidly developing field by uncovering the theoretical and computational potential of IMC for safely boosting the performance of closed-loop nonlinear systems with machine learning models such as DNNs.
The results of this work open up several future research directions. First, motivated by the recent results of [49], it would be relevant to apply statistical learning theory to rigorously assess the generalization capabilities of performance-boosting controllers in uncertain environments and over extended timeframes. Second, drawing on insights from [62], integrating extensive RL-based offline learning with real-time adjustments similar to MPC presents a promising approach. Third, within the IMC framework, there is a significant opportunity to develop richer parametrizations of stable dynamical systems in , and to theoretically prove their approximation capabilities. Lastly, building upon [63], it is interesting to explore how learning-based IMC methods could generate new optimization algorithms with formal guarantees for tackling complex optimal control and machine learning tasks.
A Implementation details for the numerical experiments in Section VI
We set and as the parameters for each vehicle , in the model (39) with the pre-stabilizing controller (40). The collision-avoidance radius of each agent is 0.5.
A1 Mountains scenario
As shown in Figure 4, the vehicles start at and , and their goal is to go to the target positions and , respectively. The training data consists of initial positions sampled from a Gaussian distribution around the initial position with a standard deviation of .
Let with . The terms of the cost function (41) are defined as follows:
where and are hyperparameters, denotes the distance between agents and , is a fixed small positive constant ensuring that the loss remains bounded for all distance values, and is a safe distance between the centers of mass of the agents, which we set to 1.2.
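As a hedged illustration of a bounded collision-avoidance term of this type, the sketch below penalizes every pair of agents closer than the safe distance with an inverse-square cost. Only the safe distance 1.2 comes from the text; the inverse-square shape, the weight q, and the value of eps are assumptions, and the exact term used in the experiments is the one defined above for (41).

```python
import math


def collision_penalty(positions, d_safe=1.2, eps=1e-3, q=1.0):
    """Hypothetical pairwise collision-avoidance cost for one time step.

    Every pair of agents whose centers are closer than d_safe contributes
    q / (d + eps)**2; eps > 0 keeps the cost finite even at d = 0, and pairs
    farther apart than d_safe contribute nothing.
    """
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            d = math.dist(positions[i], positions[j])
            if d < d_safe:
                total += q / (d + eps) ** 2
    return total


print(collision_penalty([(0.0, 0.0), (2.0, 0.0)]))  # far apart: 0.0
print(collision_penalty([(0.0, 0.0), (1.0, 0.0)]))  # too close: positive
```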
Motivated by [64], we represent the obstacles based on a Gaussian density function
with mean and covariance with . The term is given by
For the hyperparameters, we set , , and . We use stochastic gradient descent with Adam to minimize the loss function, with a learning rate of . We train for epochs with a batch size of one trajectory.
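The Gaussian representation of the obstacles can be sketched as follows. The isotropic covariance σ²I, the obstacle centers, σ, and the weight q are hypothetical placeholders. Because the density is smooth and positive everywhere, it yields nonzero gradients pushing trajectories away from the obstacle centers, which suits gradient-based training.

```python
import math


def gaussian_density_2d(p, mu, sigma):
    """Isotropic 2D Gaussian density N(p; mu, sigma^2 I)."""
    sq = (p[0] - mu[0]) ** 2 + (p[1] - mu[1]) ** 2
    return math.exp(-sq / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def obstacle_cost(p, centers, sigma, q=1.0):
    """Hypothetical obstacle term: weighted sum of Gaussian bumps, one per obstacle."""
    return q * sum(gaussian_density_2d(p, mu, sigma) for mu in centers)


centers = [(-2.5, 0.0), (2.5, 0.0)]  # hypothetical obstacle centers
print(obstacle_cost((0.0, 0.0), centers, sigma=1.0))  # between the obstacles: small
print(obstacle_cost((2.5, 0.0), centers, sigma=1.0))  # on an obstacle: large
```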
A2 Waypoint-tracking scenario
As shown in Figure 7, the vehicles start at and . The goal points , and are located at , and , respectively. To describe the TLTL loss, let us define, for each vehicle, the following functions of time:
- , for , is the distance between the vehicle and the goal point ;
- , for , is the distance between the vehicle and the obstacle;
- is the distance between the two vehicles;
where , and are the waypoints in the correct visiting order, for each vehicle. Following the notation of [60], the temporal logic form of the cost function, for each vehicle, is
(45)
where are predicates defined in Table I, and and are the radii of the obstacles and vehicles, respectively (note that, in the waypoint-tracking scenario, we do not model the obstacles with a Gaussian density function). The Boolean operators , , and stand for negation (not), disjunction (or), and conjunction (and). The temporal operators , , , and stand for ‘then’, ‘until’, ‘eventually’, and ‘always’. Mathematically, each term can be automatically translated following [60, 61]. For instance, translates into
and translates into
The full mathematical expression of (45), which can be obtained following [60], is implemented in our Github repository.
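As an example of how such translations work, the sketch below evaluates the robustness of an "until" formula φ U ψ under max/min semantics in the spirit of [60]: ψ must hold at some time t, and φ at every earlier time. This is a minimal sketch; the boundary convention (φ required strictly before t) is an assumption, since variants exist, and the "do not visit B until visiting A" example uses hypothetical distance traces and radius.

```python
import math


def until(rho_phi, rho_psi):
    """Robustness of (phi U psi) evaluated from time 0.

    max over t of min(rho_psi[t], min over t' < t of rho_phi[t']).
    """
    best = -math.inf
    prefix_min = math.inf  # min of rho_phi over t' < t (empty prefix -> +inf)
    for r_phi, r_psi in zip(rho_phi, rho_psi):
        best = max(best, min(r_psi, prefix_min))
        prefix_min = min(prefix_min, r_phi)
    return best


def not_b_until_a(dist_a, dist_b, radius=0.5):
    # phi: "not in B" (distance to B exceeds radius); psi: "in A"
    rho_phi = [d - radius for d in dist_b]
    rho_psi = [radius - d for d in dist_a]
    return until(rho_phi, rho_psi)


print(not_b_until_a([2.0, 0.25, 2.0], [2.0, 2.0, 0.25]))  # A visited first: 0.25
print(not_b_until_a([2.0, 2.0, 0.25], [2.0, 0.25, 2.0]))  # B visited first: -0.25
```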
Predicates | Expression |
---|---|
We also add a small regularization term encouraging the vehicles to stay close to their final target point, which reads , with . We use stochastic gradient descent with Adam to minimize the loss function, with a learning rate of . We train for 3000 epochs with a batch size of one trajectory.
References
- [1] A. M. Annaswamy, K. H. Johansson, and G. J. Pappas, “Control for societal-scale challenges: Road map 2030,” IEEE Control Systems Society Publication, 2023.
- [2] S. Sastry, Nonlinear systems: analysis, stability, and control. Springer Science & Business Media, 2013, vol. 10.
- [3] D. P. Bertsekas, “Dynamic programming and optimal control: Vol. I-II,” Belmont, MA: Athena Scientific, 2011.
- [4] L. S. Pontryagin, Mathematical theory of optimal processes. Routledge, 2018.
- [5] J. B. Rawlings, D. Q. Mayne, and M. Diehl, Model predictive control: theory, computation, and design. Nob Hill Publishing Madison, WI, 2017, vol. 2.
- [6] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
- [7] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022.
- [8] J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, no. 47, p. eabc5986, 2020.
- [9] Y. Song, M. Steinweg, E. Kaufmann, and D. Scaramuzza, “Autonomous drone racing with deep reinforcement learning,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 1205–1212.
- [10] E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, “Champion-level drone racing using deep reinforcement learning,” Nature, vol. 620, no. 7976, pp. 982–987, 2023.
- [11] F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” Advances in Neural Information Processing Systems 30, vol. 2, pp. 909–919, 2018.
- [12] M. Zanon and S. Gros, “Safe reinforcement learning using robust MPC,” IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638–3652, 2020.
- [13] M. Jin and J. Lavaei, “Stability-certified reinforcement learning: A control-theoretic perspective,” IEEE Access, vol. 8, pp. 229 086–229 100, 2020.
- [14] T. Parisini and R. Zoppoli, “A receding-horizon regulator for nonlinear systems and a neural approximation,” Automatica, vol. 31, no. 10, pp. 1443–1451, Oct. 1995.
- [15] T. Parisini, M. Sanguineti, and R. Zoppoli, “Nonlinear stabilization by receding-horizon neural regulators,” International Journal of Control, vol. 70, no. 3, pp. 341–362, Jan. 1998.
- [16] A. Levin and K. Narendra, “Control of nonlinear dynamical systems using neural networks. II. Observability, identification, and control,” IEEE Transactions on Neural Networks, vol. 7, no. 1, pp. 30–42, Jan. 1996.
- [17] F. Gu, H. Yin, L. El Ghaoui, M. Arcak, P. Seiler, and M. Jin, “Recurrent neural network controllers synthesis with stability guarantees for partially observed systems,” in AAAI, 2022, pp. 5385–5394.
- [18] P. Pauli, J. Köhler, J. Berberich, A. Koch, and F. Allgöwer, “Offset-free setpoint tracking using neural network controllers,” in Learning for Dynamics and Control. PMLR, 2021, pp. 992–1003.
- [19] R. Wang, N. H. Barbara, M. Revay, and I. Manchester, “Learning over all stabilizing nonlinear controllers for a partially-observed linear system,” IEEE Control Systems Letters, vol. 7, pp. 91–96, 2022.
- [20] R. Wang and I. R. Manchester, “Youla-REN: Learning nonlinear feedback policies with robust stability guarantees,” in 2022 American Control Conference (ACC). IEEE, 2022, pp. 2116–2123.
- [21] L. Furieri, C. L. Galimberti, M. Zakwan, and G. Ferrari-Trecate, “Distributed neural network control with dependability guarantees: a compositional port-Hamiltonian approach,” in Learning for Dynamics and Control Conference. PMLR, 2022, pp. 571–583.
- [22] L. Furieri, C. L. Galimberti, and G. Ferrari-Trecate, “Neural system level synthesis: Learning over all stabilizing policies for nonlinear systems,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 2765–2770.
- [23] N. H. Barbara, R. Wang, and I. R. Manchester, “Learning over contracting and Lipschitz closed-loops for partially-observed nonlinear systems,” in 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023, pp. 1028–1033.
- [24] C. E. Garcia and M. Morari, “Internal model control. A unifying review and some new results,” Industrial & Engineering Chemistry Process Design and Development, vol. 21, no. 2, pp. 308–323, 1982.
- [25] C. G. Economou, M. Morari, and B. O. Palsson, “Internal model control: Extension to nonlinear system,” Industrial & Engineering Chemistry Process Design and Development, vol. 25, no. 2, pp. 403–411, 1986.
- [26] F. Bonassi and R. Scattolini, “Recurrent neural network-based internal model control design for stable nonlinear systems,” European Journal of Control, vol. 65, p. 100632, 2022.
- [27] V. Anantharam and C. A. Desoer, “On the stabilization of nonlinear systems,” IEEE Transactions on Automatic Control, vol. 29, no. 6, pp. 569–572, 1984.
- [28] K. Fujimoto and T. Sugie, “State-space characterization of Youla parametrization for nonlinear systems based on input-to-state stability,” in Proceedings of the 37th IEEE Conference on Decision and Control, vol. 3. IEEE, 1998, pp. 2479–2484.
- [29] D. Ho, “A system level approach to discrete-time nonlinear systems,” in 2020 American Control Conference (ACC). IEEE, 2020, pp. 1625–1630.
- [30] K.-K. K. Kim, E. R. Patrón, and R. D. Braatz, “Standard representation and unified stability analysis for dynamic artificial neural network models,” Neural Networks, vol. 98, pp. 251–262, 2018.
- [31] M. Revay, R. Wang, and I. R. Manchester, “Recurrent equilibrium networks: Flexible dynamic models with guaranteed stability and robustness,” IEEE Transactions on Automatic Control, 2023.
- [32] Y. Tang, Y. Zheng, and N. Li, “Analysis of the optimization landscape of linear quadratic Gaussian (LQG) control,” in Learning for Dynamics and Control. PMLR, 2021, pp. 599–610.
- [33] L. Furieri and M. Kamgarpour, “First order methods for globally optimal distributed controllers beyond quadratic invariance,” in 2020 American Control Conference (ACC). IEEE, 2020, pp. 4588–4593.
- [34] D. E. Rivera, M. Morari, and S. Skogestad, “Internal model control: PID controller design,” Industrial & Engineering Chemistry Process Design and Development, vol. 25, no. 1, pp. 252–265, 1986.
- [35] K. Zhou and J. C. Doyle, Essentials of robust control. Prentice hall Upper Saddle River, NJ, 1998, vol. 104.
- [36] M. W. Fisher, G. Hug, and F. Dörfler, “Approximation by simple poles–part I: Density and geometric convergence rate in hardy space,” IEEE Transactions on Automatic Control, 2023.
- [37] ——, “Approximation by simple poles–part II: System level synthesis beyond finite impulse response,” arXiv preprint arXiv:2203.16765, 2022.
- [38] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “Sparsity invariance for convex design of distributed controllers,” IEEE Transactions on Control of Network Systems, vol. 7, no. 4, pp. 1836–1847, 2020.
- [39] R. Wang, N. H. Barbara, M. Revay, and I. R. Manchester, “Learning over all stabilizing nonlinear controllers for a partially-observed linear system,” IEEE Control Systems Letters, vol. 7, pp. 91–96, 2022.
- [40] Y.-S. Wang, N. Matni, and J. C. Doyle, “A system-level approach to controller synthesis,” IEEE Transactions on Automatic Control, vol. 64, no. 10, pp. 4079–4093, 2019.
- [41] L. Conger, J. S. L. Li, E. Mazumdar, and S. L. Brunton, “Nonlinear system level synthesis for polynomial dynamical systems,” in 2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 3846–3852.
- [42] G. Zames, “On the input-output stability of time-varying nonlinear feedback systems part one: Conditions derived using concepts of loop gain, conicity, and positivity,” IEEE Transactions on Automatic Control, vol. 11, no. 2, pp. 228–238, 1966.
- [43] L. Massai, D. Saccani, L. Furieri, and G. Ferrari-Trecate, “Unconstrained learning of networked nonlinear systems via free parametrization of stable interconnected operators,” arXiv preprint arXiv:2311.13967, 2023.
- [44] D. Saccani, L. Massai, L. Furieri, and G. Ferrari-Trecate, “Optimal distributed control with stability guarantees by training a network of neural closed-loop maps,” arXiv preprint arXiv:2404.02820, 2024.
- [45] D. Youla, H. Jabr, and J. Bongiorno, “Modern Wiener-Hopf design of optimal controllers–Part II: The multivariable case,” IEEE Transactions on Automatic Control, vol. 21, no. 3, pp. 319–338, 1976.
- [46] L. Furieri, Y. Zheng, A. Papachristodoulou, and M. Kamgarpour, “An input–output parametrization of stabilizing controllers: amidst Youla and system level synthesis,” IEEE Control Systems Letters, 2019.
- [47] Y. Zheng, L. Furieri, M. Kamgarpour, and N. Li, “System-level, input–output and new parameterizations of stabilizing controllers, and their numerical computation,” Automatica, vol. 140, p. 110211, 2022.
- [48] M. Fazel, R. Ge, S. Kakade, and M. Mesbahi, “Global convergence of policy gradient methods for the linear quadratic regulator,” in International Conference on Machine Learning. PMLR, 2018, pp. 1467–1476.
- [49] M. G. Boroujeni, C. L. Galimberti, A. Krause, and G. Ferrari-Trecate, “A PAC-Bayesian framework for optimal control with stability guarantees,” arXiv preprint arXiv:2403.17790, 2024.
- [50] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. [Online]. Available: https://www.tensorflow.org/
- [51] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 8024–8035.
- [52] D. Martinelli, C. L. Galimberti, I. R. Manchester, L. Furieri, and G. Ferrari-Trecate, “Unconstrained parametrization of dissipative and contracting neural ordinary differential equations,” in 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023, pp. 3043–3048.
- [53] S. Bai, J. Z. Kolter, and V. Koltun, “Deep equilibrium models,” in Advances in Neural Information Processing Systems (NeurIPS), 2019.
- [54] M. Zakwan and G. Ferrari-Trecate, “Neural distributed controllers with port-Hamiltonian structures,” arXiv preprint arXiv:2403.17785, 2024.
- [55] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, “Learning-based model predictive control: Toward safe learning in control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 269–296, 2020.
- [56] K. P. Wabersich and M. N. Zeilinger, “A predictive safety filter for learning-based control of constrained nonlinear dynamical systems,” Automatica, vol. 129, p. 109597, 2021.
- [57] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European Control Conference (ECC). IEEE, 2019, pp. 3420–3431.
- [58] A. Agrawal and K. Sreenath, “Discrete control barrier functions for safety-critical control of discrete systems with application to bipedal robot navigation.” in Robotics: Science and Systems, vol. 13. Cambridge, MA, USA, 2017, pp. 1–10.
- [59] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, no. 6, pp. 789–814, 2000.
- [60] X. Li, C.-I. Vasile, and C. Belta, “Reinforcement learning with temporal logic rewards,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 3834–3839.
- [61] K. Leung, N. Aréchiga, and M. Pavone, “Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods,” The International Journal of Robotics Research, vol. 42, no. 6, pp. 356–370, 2023.
- [62] D. Bertsekas, Lessons from AlphaZero for optimal, model predictive, and adaptive control. Athena Scientific, 2022.
- [63] A. Martin and L. Furieri, “Learning to optimize with convergence guarantees using nonlinear system theory,” arXiv preprint arXiv:2403.09389, 2024.
- [64] D. Onken, L. Nurbekyan, X. Li, S. W. Fung, S. Osher, and L. Ruthotto, “A neural network approach applied to multi-agent optimal control,” in IEEE European Control Conference (ECC), 2021, pp. 1036–1041.