Safe and Reliable Training
of Learning-Based Aerospace Controllers

Udayan Mandal1, Guy Amir2, Haoze Wu1, Ieva Daukantas3, Fletcher Lee Newell1, Umberto J. Ravaioli4,
Baoluo Meng5, Michael Durling5, Milan Ganai1, Tobey Shim1, Guy Katz2, and Clark Barrett1
1Stanford University, 2The Hebrew University of Jerusalem, 3IT University of Copenhagen, 4Google, 5GE Aerospace Research
   Udayan Mandal Center for AI Safety
Stanford University
Stanford, USA
udayanm@stanford.edu
   Guy Amir School of CS & Engineering
The Hebrew University of Jerusalem
Jerusalem, Israel
guyam@cs.huji.ac.il
   Haoze Wu Center for AI Safety
Stanford University
Stanford, USA
haozewu@stanford.edu
   Ieva Daukantas Department of Computer Science
IT University of Copenhagen
Copenhagen, Denmark
daukantas@itu.dk
   Fletcher Lee Newell Center for AI Safety
Stanford University
Stanford, USA
flnewell@stanford.edu
   Umberto Ravaioli Google
Mountain View, USA
uravaioli@google.com
   Baoluo Meng GE Aerospace Research
Niskayuna, USA
baoluo.meng@ge.com
   Michael Durling GE Aerospace Research
Niskayuna, USA
durling@ge.com
   Kerianne Hobbs Air Force Research Laboratory
US Air Force
Dayton, USA
kerianne.hobbs@afrl.af.mil
   Milan Ganai Department of Computer Science
Stanford University
Stanford, USA
mganai@stanford.edu
   Tobey Shim Department of Data Science
Stanford University
Stanford, USA
tshim24@stanford.edu
   Guy Katz School of CS & Engineering
The Hebrew University of Jerusalem
Jerusalem, Israel
guykatz@cs.huji.ac.il
   Clark Barrett Center for AI Safety
Stanford University
Stanford, USA
barrett@stanford.edu
Abstract

In recent years, deep reinforcement learning (DRL) approaches have generated highly successful controllers for a myriad of complex domains. However, the opaque nature of these models limits their applicability in aerospace systems and other safety-critical domains, in which a single mistake can have dire consequences. In this paper, we present novel advancements in both the training and verification of DRL controllers, which can help ensure their safe behavior. We showcase a design-for-verification approach utilizing $k$-induction and demonstrate its use in verifying liveness properties. In addition, we give a brief overview of neural Lyapunov Barrier certificates and summarize their capabilities on a case study. Finally, we describe several other novel reachability-based approaches which, despite failing to provide guarantees of interest, could be effective for verification of other DRL systems, and could be of further interest to the community.

Index Terms:
AI Safety, Deep Reinforcement Learning, Formal Verification, Deep Neural Network Verification

I Introduction

Deep reinforcement learning (DRL) has gained significant popularity in recent years, reaching state-of-the-art performance in various domains. One such domain is aerospace systems, in which DRL models are under consideration for replacing years-old software by learning to efficiently control airborne platforms and spacecraft. However, although they perform well empirically, DRL systems have an opaque decision-making process, making them challenging to reason about. More importantly, this opacity raises critical questions about safety and security (e.g., How can we ensure that the spacecraft will never violate a velocity constraint? Will it always reach its destination?) which are difficult to answer. These reliability concerns are a significant obstacle to deploying DRL controllers in real-world systems, where even a single mistake cannot be tolerated.

To cope with this urgent need, a myriad of DRL training techniques have been put forth in recent years to enhance the performance of such systems. However, these current approaches suffer from two main drawbacks: (i) they are usually not geared towards improving safety and reliability (which is key in aerospace systems); and (ii) they are heuristic in nature and do not afford any formal guarantees. At the same time, the formal methods community has been developing methods for formally and rigorously assessing the reliability of DRL systems. However, although such methods are useful for identifying whether a system is safe, they are typically not incorporated into the DRL training process, but are rather used only afterwards.

In this work, we begin bridging this gap by proposing a novel design-for-verification approach that can be incorporated during the DRL training process. Our approach both modifies the training loop to be more verification-friendly and utilizes formal verification (in our case, $k$-induction) to ensure the correctness of the training. We also report a summary of our recent efforts to use Neural Lyapunov Barrier certificates [26] to generate DRL agents that not only perform well on large batches of data, but also meet rigorous correctness criteria as measured by state-of-the-art verification tools.

Finally, we introduce additional novel reachability-based approaches for providing safety and liveness guarantees about a DRL system. These approaches are derived from prior work on backward-tube reachability, forward-tube reachability, and abstraction-based reachability methods. Moreover, these approaches all follow a similar paradigm: the reachable space covered by all possible paths from the starting state space is over-approximated using a verification engine, and safety and liveness properties are checked over this over-approximated state space.

To demonstrate the usefulness of our approaches, we apply them to a benchmark satellite-control model developed in collaboration with industry partners (GE Aerospace Research and the U.S. Air Force). We demonstrate that liveness can be verified using our $k$-induction approach. Additionally, as a point of comparison, we showcase that the certificate-based approach is indeed able to generate a controller that provably behaves safely. Notably, the problem setting and controller complexity are beyond those achieved in previous work on formally verified controllers.

The other reachability-based methods fail on this benchmark. However, we believe that these failed attempts: (i) demonstrate the merits of our successful approaches in handling complex, nontrivial properties; (ii) can be of value to the community, by shedding light on vulnerabilities of alternate methods; and (iii) could be potentially successful when applied over different DRL systems.

We view this work as an important step towards the safe and reliable deployment of DRL controllers in real-world systems, especially in the complex domain of avionics. We also hope that our work will motivate further research in neural network verification, DRL safety, and specifically, their role in the important domain of DRL-controlled aerospace systems.

The rest of the paper is organized as follows. In Sec. II, we cover background on deep learning, DRL, and verification, and we also introduce Neural Lyapunov Barrier functions. In Sec. III, we introduce our benchmark problem, a 2D spacecraft docking challenge. We subsequently introduce our $k$-induction technique in Sec. IV, and we present alternative verification approaches in Sec. V (code for these approaches is available at github.com/NeuralNetworkVerification/artifact-dasc-docking). Finally, we conclude in Sec. VI.

II Preliminaries and Related Work

II-A Safety and Liveness

In this paper, we are interested in obtaining DRL controllers that satisfy safety and liveness properties [2] in discrete-time settings.

Safety.

In a sequence satisfying a safety property, a bad state is never reached. For the set of system states $\mathcal{X}$, let $\tau \subseteq \mathcal{X}^{*}$ be the set of potential system trajectories. We say a trajectory $\alpha$ satisfies safety property $P_1$ if and only if each state in $\alpha$ satisfies property $P_1$. More formally:

$$\forall\,\alpha : \alpha \in \tau.\ \forall\,x \in \alpha.\ x \vDash P_1. \tag{1}$$

Finite-length trajectories terminating in a “bad” state (where $P_1$ does not hold) constitute the set of trajectories in violation of the safety property.

Liveness.

On the other hand, a liveness property indicates a good state is eventually reached. A liveness property $P_2$ is satisfied by trajectory $\alpha$ if and only if there exists a state $x$ in $\alpha$ where $P_2$ holds. Defining $\tau^{\infty}$ as the set of infinite-length trajectories, we formally specify liveness property $P_2$ as:

$$\forall\,\alpha : \alpha \in \tau^{\infty}.\ \exists\,x \in \alpha.\ x \vDash P_2. \tag{2}$$

Infinite-length trajectories which contain no “good” states (i.e., no states where $P_2$ holds) constitute the set of trajectories in violation of the liveness property.
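As an illustration of the two definitions, the following minimal Python sketch checks both properties on a single finite trajectory prefix. The function names, the one-dimensional toy trajectory, and the two predicates are hypothetical and chosen only for this example. Note that a finite prefix can refute safety or witness liveness, but a prefix with no good state does not by itself refute liveness, which is defined over infinite trajectories.

```python
from typing import Callable, Iterable, Tuple

State = Tuple[float, ...]


def satisfies_safety(trajectory: Iterable[State], p1: Callable[[State], bool]) -> bool:
    """Eq. (1) for one trajectory: every visited state satisfies P1."""
    return all(p1(x) for x in trajectory)


def reaches_good_state(prefix: Iterable[State], p2: Callable[[State], bool]) -> bool:
    """Eq. (2) for one finite prefix: some visited state satisfies P2."""
    return any(p2(x) for x in prefix)


# Hypothetical 1-D trajectory whose position halves at every step.
trajectory = [(8.0 / 2 ** t,) for t in range(10)]


def within_bounds(x: State) -> bool:
    """Plays the role of the safety predicate P1 (assumed bound)."""
    return abs(x[0]) <= 10.0


def near_origin(x: State) -> bool:
    """Plays the role of the liveness predicate P2 (assumed goal)."""
    return abs(x[0]) <= 0.1


safe = satisfies_safety(trajectory, within_bounds)
live = reaches_good_state(trajectory, near_origin)
```

Both checks quantify over one trajectory only; establishing (1) or (2) for a system requires reasoning over all trajectories, which is what the verification techniques discussed in this paper aim to do.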

II-B DNNs, DNN Verification, and Dynamical Systems.

Deep Learning.

Deep neural networks (DNNs) consist of layers of neurons that perform some (usually nonlinear) transformation of the input [38]. In this paper, we investigate deep reinforcement learning (DRL), where we train a DNN to obtain a policy, which maps states to actions that control a system [54].

DNN Verification.

Given (i) a trained DNN (e.g., a DRL agent) $N$; (ii) a pre-condition $P$ on the DNN’s inputs, limiting the input assignments; and (iii) a post-condition $Q$ on the DNN’s outputs, the goal of DNN verification is to determine whether the property $P(x) \rightarrow Q(N(x))$ holds for every neural network input $x$. In many DNN verifiers (a.k.a. verification engines), this task is equivalently reduced to determining the satisfiability of the formula $P(x) \land \neg Q(N(x))$. If the formula is satisfiable (SAT), then there is an input that satisfies the pre-condition and violates the post-condition, which means the property is violated. On the other hand, if the formula is unsatisfiable (UNSAT), then the property holds. It has been shown [49] that verification of piecewise-linear DNNs is NP-complete. In recent years, the formal methods community has put forth various techniques for verifying and improving DNN reliability [6, 9, 70, 5, 13, 1, 17, 23]. These techniques include SMT-based methods [45, 8, 50, 52], optimization-based methods [55, 30, 68, 15], methods based on abstraction-refinement [32, 59, 10, 65, 58, 31, 22], methods based on shielding [51, 63, 24], and more.
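To make the query $P(x) \rightarrow Q(N(x))$ concrete, the sketch below uses interval bound propagation on a tiny ReLU network. This is a deliberately simple, sound but incomplete technique and is not the algorithm of any verifier cited above; the network weights, the input box standing in for $P$, and the output threshold standing in for $Q$ are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def affine_bounds(weights: List[List[float]], biases: List[float],
                  box: List[Interval]) -> List[Interval]:
    """Propagate an input box through y = W x + b, one output at a time."""
    out = []
    for row, b in zip(weights, biases):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to both interval ends."""
    return [(max(0.0, l), max(0.0, u)) for l, u in box]


def output_bounds(layers: List[Layer], box: List[Interval]) -> List[Interval]:
    """Over-approximate the network outputs for all inputs in the box."""
    for i, (weights, biases) in enumerate(layers):
        box = affine_bounds(weights, biases, box)
        if i < len(layers) - 1:  # no activation after the last layer
            box = relu_bounds(box)
    return box


# Hypothetical network: 2 inputs, 2 hidden ReLU neurons, 1 output.
TOY_LAYERS: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]
P_BOX = [(-1.0, 1.0), (-1.0, 1.0)]  # pre-condition P: each input in [-1, 1]

lo, hi = output_bounds(TOY_LAYERS, P_BOX)[0]
# Post-condition Q: output <= 3. If the upper bound respects Q, then
# P(x) AND NOT Q(N(x)) has no solution, i.e. the query is UNSAT.
property_holds = hi <= 3.0
```

If the computed upper bound exceeded the threshold, this check would be inconclusive rather than a proof of violation, because the intervals over-approximate the true output range; complete verifiers resolve such cases by further case analysis.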

Discrete-Time Dynamical Systems.

We consider discrete-time dynamical systems, particularly systems whose trajectories satisfy the equation:

$$x_{t+1} = f(x_t, u_t), \tag{3}$$

in which the transition function $f$ takes as inputs the current state $x_t \in \mathcal{X}$ and a control $u_t \in \mathcal{U}$ and produces as output the subsequent state $x_{t+1}$. To control these systems, we employ a policy $\pi : \mathcal{X} \rightarrow \mathcal{U}$ that takes in a state $x \in \mathcal{X}$ and outputs a control action $u = \pi(x)$. In DRL, the controller $\pi$ is realized by a trained DNN agent. These learning-based controllers have proven to be effective in many real-world settings including robotics [26], biomedical systems [28], and energy management [44], due to their expressive power and ability to generalize to unseen, complex environments [67].
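The closed-loop behavior defined by (3) together with a policy can be sketched as a simple rollout loop. The scalar dynamics and the proportional policy below are hypothetical stand-ins; in the setting of this paper, the policy would be a trained DNN.

```python
from typing import Callable, List


def rollout(f: Callable[[float, float], float],
            policy: Callable[[float], float],
            x0: float, steps: int) -> List[float]:
    """Iterate x_{t+1} = f(x_t, pi(x_t)) and return the visited states."""
    states = [x0]
    for _ in range(steps):
        u = policy(states[-1])           # u_t = pi(x_t)
        states.append(f(states[-1], u))  # x_{t+1} = f(x_t, u_t)
    return states


def toy_dynamics(x: float, u: float) -> float:
    """Hypothetical scalar system: the control is added to the state."""
    return x + u


def toy_policy(x: float) -> float:
    """Hypothetical proportional controller pushing the state towards 0."""
    return -0.5 * x


trajectory = rollout(toy_dynamics, toy_policy, x0=4.0, steps=3)
```

Here the state is halved at every step, so the rollout visits 4.0, 2.0, 1.0, and 0.5.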

II-C Control Lyapunov Barrier Functions

The problem of verifying safety and liveness properties in a dynamical system can be solved by finding a function $V : \mathcal{X} \rightarrow \mathbb{R}$ with certain properties. Control theory identifies two fundamental types of functions [53].

Lyapunov Functions.

Lyapunov functions, a.k.a. Control Lyapunov functions, capture the energy level at a particular state: over time, energy is dissipated along a trajectory until the system attains zero-energy equilibrium [41]. Lyapunov functions can guarantee asymptotic stability, which ensures the system eventually converges to some goal state (thereby satisfying a liveness property). Lyapunov functions must be (i) equal to $0$ at equilibrium; (ii) strictly positive at all other states; and (iii) monotonically decreasing along system trajectories [19, 18, 36].
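The three conditions can be tested empirically on sampled states, as in the following sketch. The scalar closed-loop system $x_{t+1} = 0.5\,x_t$, the candidate $V(x) = x^2$, and the sampling grid are hypothetical. Passing such a sampled check is only evidence, not a proof; a formal certificate requires the conditions to be verified over the entire state space.

```python
from typing import Callable, Iterable


def check_lyapunov_on_samples(V: Callable[[float], float],
                              step: Callable[[float], float],
                              samples: Iterable[float],
                              equilibrium: float = 0.0,
                              tol: float = 1e-12) -> bool:
    """Check conditions (i)-(iii) on a finite set of sampled states."""
    if abs(V(equilibrium)) > tol:   # (i) V is zero at the equilibrium
        return False
    for x in samples:
        if x == equilibrium:
            continue
        if V(x) <= 0.0:             # (ii) V is strictly positive elsewhere
            return False
        if V(step(x)) >= V(x):      # (iii) V strictly decreases after a step
            return False
    return True


def energy(x: float) -> float:
    """Hypothetical Lyapunov candidate V(x) = x^2."""
    return x * x


def stable_step(x: float) -> float:
    """Hypothetical closed-loop dynamics that contract towards 0."""
    return 0.5 * x


def unstable_step(x: float) -> float:
    """Hypothetical closed-loop dynamics that diverge from 0."""
    return 1.5 * x


grid = [i / 10 for i in range(-50, 51)]
stable_ok = check_lyapunov_on_samples(energy, stable_step, grid)
unstable_ok = check_lyapunov_on_samples(energy, unstable_step, grid)
```

The same candidate passes for the contracting system and fails condition (iii) for the diverging one.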

Barrier Functions.

Barrier functions [4], a.k.a., Control Barrier Functions, guarantee that a system never enters an unsafe region (i.e., a “bad” state) in the state space. This is achieved by setting the function value to be above some threshold for unsafe states and then verifying that the system can never transition to a state where the function is above the threshold [3, 72, 12]. Previous work [60, 61, 69, 75] demonstrates how to obtain Barrier functions for various safety-critical tasks such as pedestrian avoidance, neural radiance field-based obstacle navigation [57], and multi-agent control.

Control Lyapunov Barrier Functions.

Often, it is necessary to ensure both safety and liveness properties simultaneously. In such cases, we can employ a Control Lyapunov Barrier Function (CLBF), which integrates the properties and guarantees of both Control Lyapunov functions and Control Barrier functions [27]. CLBFs can solve reach-while-avoid tasks [29], which we discuss next.

Reach-while-Avoid Tasks.

The goal of Reach-while-Avoid (RWA) tasks is to find a controller $\pi$ for a dynamical system such that every trajectory $\{x_1, x_2, \ldots\}$ produced under this controller (i) never enters an unsafe (“bad”) state; and (ii) eventually enters a goal (“good”) region or state. We can formally define the problem as:

Definition (Reach-while-Avoid Task).
Input: A dynamical system with a set of initial states $\mathcal{X}_I \subseteq \mathcal{X}$, a set of goal states $\mathcal{X}_G \subseteq \mathcal{X}$, and a set of unsafe states $\mathcal{X}_U \subseteq \mathcal{X}$, where $\mathcal{X}_I \cap \mathcal{X}_U = \emptyset$ and $\mathcal{X}_G \cap \mathcal{X}_U = \emptyset$.
Output: A controller $\pi$ such that for every trajectory $\tau = \{x_1, x_2, \ldots\}$ satisfying $x_1 \in \mathcal{X}_I$:

  1. Reach: $\exists\,t \in \mathbb{N}.\ x_t \in \mathcal{X}_G$

  2. Avoid: $\forall\,t \in \mathbb{N}.\ x_t \notin \mathcal{X}_U$

Some solutions for RWA tasks rely on control-theoretic principles. The approach in [27] trains Lyapunov and Barrier certificates to solve RWA tasks. Hamilton-Jacobi (HJ) reachability-based methods [11] have also been employed to solve RWA tasks [34, 43, 66]. Safe DRL is closely connected to RWA, with its goal being to maximize cumulative rewards while minimizing costs along a trajectory [14]. It has been solved with both Lyapunov/Barrier methods [73, 20] and HJ reachability methods [74, 35].

II-D Other Verification Approaches

Reachability Analysis.

Reachability analysis methods aim to define and compute the set of final reachable states and then verify that this set (i) does not include any bad states, and (ii) is contained within the goal region. Reachability methods include forward-tube and backward-tube verification [40], which either propagate states forward from the starting set or backward from the goal set. Other related work in reachability analysis includes hybrid system verifiers [46], growing the set of reachable states over a discrete action space [48], approximating reachable states during forward and backward reachability [39], and reformulating the dynamics of a system for easier reachability verification [37].

Bounded Model Checking and $k$-induction.

Bounded model checking uses a symbolic analysis over $k$ copies of a system to check whether a bad state is reachable in $k$ or fewer steps from the starting set of states. $k$-induction is similar, except that it starts from an arbitrary state and can thus be used to prove that a bad state is never reached. Bounded model checking has been explored in the WhiRL tool [33] using the neural network verifier Marabou [50, 71]. [64] implements another tool for checking adversarial cases and coverage using bounded model checking for artificial neural networks. WhiRL 2.0 [7] adds $k$-induction capabilities to WhiRL.
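The relationship between the bounded check and the inductive step can be illustrated on a tiny explicit-state system. The four-state transition system, the property, and the function names below are hypothetical, and states are enumerated exhaustively purely for readability; tools such as those cited above instead pose both checks as symbolic queries to a verifier.

```python
from typing import Callable, Dict, Iterable


def base_case(init: Iterable[int], step: Callable[[int], int],
              prop: Callable[[int], bool], k: int) -> bool:
    """Bounded check: the first k states of every initial path satisfy prop."""
    for s in init:
        cur = s
        for _ in range(k):
            if not prop(cur):
                return False
            cur = step(cur)
    return True


def inductive_step(states: Iterable[int], step: Callable[[int], int],
                   prop: Callable[[int], bool], k: int) -> bool:
    """From ANY state: k consecutive prop-states imply a prop-state next."""
    for s in states:
        cur, all_good = s, True
        for _ in range(k):
            if not prop(cur):
                all_good = False
                break
            cur = step(cur)
        if all_good and not prop(cur):
            return False  # found a counterexample to induction
    return True


def k_induction(states, init, step, prop, k: int) -> bool:
    """prop holds on all reachable states if both checks succeed."""
    return base_case(init, step, prop, k) and inductive_step(states, step, prop, k)


# Hypothetical system: 0 -> 1 -> 0 is reachable; 2 -> 3 -> 3 is not.
TRANSITIONS: Dict[int, int] = {0: 1, 1: 0, 2: 3, 3: 3}
STATES = list(TRANSITIONS)
INIT = [0]


def step(s: int) -> int:
    return TRANSITIONS[s]


def not_bad(s: int) -> bool:
    """Safety property: the bad state 3 is never visited."""
    return s != 3


proved_with_k1 = k_induction(STATES, INIT, step, not_bad, k=1)
proved_with_k2 = k_induction(STATES, INIT, step, not_bad, k=2)
```

With $k = 1$ the proof fails because the unreachable state 2 satisfies the property but steps into the bad state; with $k = 2$ no path of two consecutive good states leads to the bad state, so the proof succeeds. This is the usual reason for increasing $k$: longer windows rule out spurious counterexamples that start from unreachable states.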

Design-for-Verification.

Design-for-verification broadly encompasses any method which aims to modify the design and training process to make verification easier. The Trainify framework [47] uses a CEGAR-based approach to grow an easily verifiable state space by repeatedly retraining the DRL system. [25] motivates an optimized DRL training approach to reduce the number of safety violations, easing formal verification. This approach was also implemented in Marabou [50, 71].

III 2D Docking Problem

As a motivating case study, we adopt the 2D docking benchmark presented in [62]. The goal is to train a DRL controller to safely navigate a deputy spacecraft to a chief spacecraft within two-dimensional space. The reference frame is defined such that the chief spacecraft is always at the origin $(0, 0)$. The state of the deputy spacecraft is $\boldsymbol{x} = [x, y, \dot{x}, \dot{y}]$, where $(x, y)$ is the position of the spacecraft and $(\dot{x}, \dot{y})$ are the corresponding velocity components.

III-A Dynamics

The system dynamics are defined according to the linearized Clohessy-Wiltshire relative orbital motion equations in a non-inertial Hill’s reference frame [21, 42]. The control input to the system is $\boldsymbol{u} = [F_x, F_y]$, where $F_x$ and $F_y$ are the thrust forces applied to the deputy spacecraft in the $x$ and $y$ directions. We follow [62], setting the deputy spacecraft mass to $m = 12$ kg and the mean motion to $n = 0.001027$ rad/s. The continuous-time state dynamics of the system are given by the following differential equations:

$$\dot{\boldsymbol{x}} = [\dot{x}, \dot{y}, \ddot{x}, \ddot{y}] \tag{4}$$
$$\ddot{x} = 2n\dot{y} + 3n^{2}x + \frac{F_x}{m} \tag{5}$$
$$\ddot{y} = -2n\dot{x} + \frac{F_y}{m} \tag{6}$$

Integration using a discrete time step $T$ yields a closed-form next-state function. Given a state $\boldsymbol{x} = [x, y, \dot{x}, \dot{y}]$ and control inputs $\boldsymbol{u} = [F_x, F_y]$, the spacecraft’s next state $\boldsymbol{x}' = [x', y', \dot{x}', \dot{y}']$ after an elapsed time $T$ is:

$$x' = \left(\frac{2\dot{y}}{n} + 4x + \frac{F_x}{mn^{2}}\right) + \left(\frac{2F_y}{mn}\right)T + \left(-\frac{F_x}{mn^{2}} - \frac{2\dot{y}}{n} - 3x\right)\cos(nT) + \left(-\frac{2F_y}{mn^{2}} + \frac{\dot{x}}{n}\right)\sin(nT) \tag{7}$$
$$y' = \left(-\frac{2\dot{x}}{n} + y + \frac{4F_y}{mn^{2}}\right) + \left(-\frac{2F_x}{mn} - 3\dot{y} - 6nx\right)T - \frac{3F_y}{2m}T^{2} + \left(-\frac{4F_y}{mn^{2}} + \frac{2\dot{x}}{n}\right)\cos(nT) + \left(\frac{2F_x}{mn^{2}} + \frac{4\dot{y}}{n} + 6x\right)\sin(nT) \tag{8}$$
$$\dot{x}' = \left(\frac{2F_y}{mn}\right) + \left(-\frac{2F_y}{mn} + \dot{x}\right)\cos(nT) + \left(\frac{F_x}{mn} + 2\dot{y} + 3nx\right)\sin(nT) \tag{9}$$
$$\dot{y}' = \left(-\frac{2F_x}{mn} - 3\dot{y} - 6nx\right) + \left(-\frac{3F_y}{m}\right)T + \left(\frac{2F_x}{mn} + 4\dot{y} + 6nx\right)\cos(nT) + \left(\frac{4F_y}{mn} - 2\dot{x}\right)\sin(nT) \tag{10}$$
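A closed-form one-step update for the dynamics (5)-(6) can be sketched in Python as follows, assuming the thrust is held constant over the step. The function names, the example state and thrust, and the step length of 1 s used for checking are assumptions made for this illustration. Because the closed form is easy to mistype, the sketch also includes a generic fourth-order Runge-Kutta integrator of (5)-(6) that can serve as an independent numerical reference.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float]   # (x, y, x_dot, y_dot)
Control = Tuple[float, float]               # (F_x, F_y)

M = 12.0        # deputy mass [kg], as in the text
N = 0.001027    # mean motion [rad/s], as in the text


def next_state(state: State, control: Control, T: float,
               m: float = M, n: float = N) -> State:
    """Exact solution of (5)-(6) after time T with constant thrust."""
    x, y, xd, yd = state
    ax, ay = control[0] / m, control[1] / m   # accelerations F/m
    c, s = math.cos(n * T), math.sin(n * T)
    x_new = ((2 * yd / n + 4 * x + ax / n**2) + (2 * ay / n) * T
             + (-ax / n**2 - 2 * yd / n - 3 * x) * c
             + (-2 * ay / n**2 + xd / n) * s)
    y_new = ((-2 * xd / n + y + 4 * ay / n**2)
             + (-2 * ax / n - 3 * yd - 6 * n * x) * T - 1.5 * ay * T**2
             + (-4 * ay / n**2 + 2 * xd / n) * c
             + (2 * ax / n**2 + 4 * yd / n + 6 * x) * s)
    xd_new = ((2 * ay / n) + (-2 * ay / n + xd) * c
              + (ax / n + 2 * yd + 3 * n * x) * s)
    yd_new = ((-2 * ax / n - 3 * yd - 6 * n * x) - 3 * ay * T
              + (2 * ax / n + 4 * yd + 6 * n * x) * c
              + (4 * ay / n - 2 * xd) * s)
    return (x_new, y_new, xd_new, yd_new)


def derivative(state: State, control: Control,
               m: float = M, n: float = N) -> State:
    """Right-hand side of the continuous-time dynamics (4)-(6)."""
    x, y, xd, yd = state
    return (xd, yd,
            2 * n * yd + 3 * n**2 * x + control[0] / m,
            -2 * n * xd + control[1] / m)


def rk4_reference(state: State, control: Control, T: float,
                  substeps: int = 100) -> State:
    """Numerical integration of (4)-(6), used only as a cross-check."""
    h = T / substeps
    s = tuple(state)
    for _ in range(substeps):
        k1 = derivative(s, control)
        k2 = derivative(tuple(a + 0.5 * h * b for a, b in zip(s, k1)), control)
        k3 = derivative(tuple(a + 0.5 * h * b for a, b in zip(s, k2)), control)
        k4 = derivative(tuple(a + h * b for a, b in zip(s, k3)), control)
        s = tuple(a + h / 6 * (p + 2 * q + 2 * r + w)
                  for a, p, q, r, w in zip(s, k1, k2, k3, k4))
    return s


# Hypothetical state and thrust used only to exercise the functions.
example_state: State = (100.0, -50.0, 0.1, -0.2)
example_control: Control = (1.0, -0.5)
closed_form = next_state(example_state, example_control, T=1.0)
reference = rk4_reference(example_state, example_control, T=1.0)
```

Two quick sanity checks are that `next_state` with `T=0` returns the input state, and that `closed_form` and `reference` agree up to numerical tolerance. Since the terms divided by $n^2$ are large and nearly cancel, comparisons should use a tolerance rather than exact equality.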

III-B Liveness: Docking Region

The problem as given in [62] defines a docking region which is a circle of radius $0.5$ meters centered at the origin. The goal is for the deputy spacecraft to eventually enter this region. To simplify the verification query, it is easier to use linear bounds for the goal region, so we use a square centered at the origin with sides of length $0.7$ meters parallel to the axes (note that this square fits inside the docking region of [62]). Formally, our liveness condition is:

$$\forall\,\alpha : \alpha \in \tau^{\infty}.\ \exists\,t.\ |\alpha_t.x| \leq 0.35 \land |\alpha_t.y| \leq 0.35, \tag{11}$$

where $\alpha_t$ is the state at time $t$ in trajectory $\alpha$, and $\alpha_t.x$ and $\alpha_t.y$ are the $x$ and $y$ components of $\alpha_t$.
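The goal predicate inside (11), and the claim that the square fits inside the circular docking region, can be checked with a few lines of Python. The function and constant names are illustrative only.

```python
import math

HALF_SIDE = 0.35        # half of the 0.7 m side length of the goal square
DOCKING_RADIUS = 0.5    # radius of the circular docking region [m]


def in_goal_square(x: float, y: float, half_side: float = HALF_SIDE) -> bool:
    """Linear goal condition: |x| <= 0.35 and |y| <= 0.35."""
    return abs(x) <= half_side and abs(y) <= half_side


# The corners are the points of the square farthest from the origin,
# so the square lies inside the circle iff its corners do.
corner_distance = math.hypot(HALF_SIDE, HALF_SIDE)
square_inside_circle = corner_distance <= DOCKING_RADIUS
```

The corner distance is $0.35\sqrt{2} \approx 0.495$ meters, just under the radius of $0.5$ meters, so reaching the square implies reaching the original docking region.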

III-C Safety: Velocity Threshold

To minimize the risk to both spacecraft, a safety constraint is imposed on the magnitude of the velocity of the deputy spacecraft. The constraint depends on the deputy’s distance from the chief. Formally, [62] requires the following state invariant:

$$\sqrt{\dot{x}^{2} + \dot{y}^{2}} \leq 0.2 + 2n\sqrt{x^{2} + y^{2}} \tag{12}$$

We therefore define the unsafe region to be the negation of (12).
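The invariant (12) and its negation can be evaluated directly on a state, as in the following sketch. The function names and the example states are hypothetical, and SI units (meters and meters per second) are assumed.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float]   # (x, y, x_dot, y_dot)

N = 0.001027   # mean motion [rad/s], as in the text


def velocity_limit(x: float, y: float, n: float = N) -> float:
    """Right-hand side of (12): the allowed speed grows with distance."""
    return 0.2 + 2 * n * math.hypot(x, y)


def is_safe(state: State, n: float = N) -> bool:
    """True iff the state satisfies the velocity invariant (12)."""
    x, y, xd, yd = state
    return math.hypot(xd, yd) <= velocity_limit(x, y, n)


def is_unsafe(state: State, n: float = N) -> bool:
    """Unsafe region: the negation of (12)."""
    return not is_safe(state, n)


# Hypothetical states: slow at the origin, fast at the origin,
# and fast but far away from the chief.
slow_near = is_safe((0.0, 0.0, 0.1, 0.1))
fast_near = is_safe((0.0, 0.0, 0.2, 0.2))
fast_far = is_safe((1000.0, 0.0, 2.0, 1.0))
```

At the origin the limit is $0.2$, so a speed of about $0.28$ is unsafe there, whereas at a distance of $1000$ the limit is about $2.25$ and a speed of about $2.24$ is still allowed.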

Again, we desire to instead use a linear constraint in order to be compatible with our formal tools. We use the Euclidean norm approximation of [16], which approximates the norm by projecting it onto vectors in all different directions and taking the one with the maximum magnitude. We use the two inequalities:

maxi[1,n𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠](u1cos(2(i1)πn𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠)+u2sin(2(i1)πn𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠))u12+u22subscript𝑖1subscript𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠subscript𝑢12𝑖1𝜋subscript𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠subscript𝑢22𝑖1𝜋subscript𝑛𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠superscriptsubscript𝑢12superscriptsubscript𝑢22\displaystyle\begin{split}\max_{i\in[1,n_{\mathit{directions}}]}(u_{1}\cdot% \cos(\frac{2(i-1)\pi}{n_{\mathit{directions}}})+u_{2}\\ \cdot\sin(\frac{2(i-1)\pi}{n_{\mathit{directions}}}))\leq\sqrt{u_{1}^{2}+u_{2}% ^{2}}\end{split}start_ROW start_CELL roman_max start_POSTSUBSCRIPT italic_i ∈ [ 1 , italic_n start_POSTSUBSCRIPT italic_directions end_POSTSUBSCRIPT ] end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ roman_cos ( divide start_ARG 2 ( italic_i - 1 ) italic_π end_ARG start_ARG italic_n start_POSTSUBSCRIPT italic_directions end_POSTSUBSCRIPT end_ARG ) + italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋅ roman_sin ( divide start_ARG 2 ( italic_i - 1 ) italic_π end_ARG start_ARG italic_n start_POSTSUBSCRIPT italic_directions end_POSTSUBSCRIPT end_ARG ) ) ≤ square-root start_ARG italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_CELL end_ROW (13)

and

$$\frac{1}{\cos(\pi/n_{\mathit{directions}})}\max_{i\in[1,n_{\mathit{directions}}]}\left(u_{1}\cdot\cos\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)+u_{2}\cdot\sin\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)\right)\geq\sqrt{u_{1}^{2}+u_{2}^{2}}, \tag{14}$$

where $n_{\mathit{directions}}$ is a positive integer. Larger values of $n_{\mathit{directions}}$ yield more precise approximations. We can simplify this by noting that:

$$\sqrt{u_{1}^{2}+u_{2}^{2}}=\sqrt{|u_{1}|^{2}+|u_{2}|^{2}},$$

and then focusing our search only on vectors in the first quadrant. Assuming $n_{\mathit{directions}}$ is a multiple of 4, we get:

$$\begin{split}\text{under}(u_{1},u_{2})&=\max_{i\in[1,n_{\mathit{directions}}/4+1]}\left(|u_{1}|\cdot\cos\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)+|u_{2}|\cdot\sin\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)\right)\\ &\leq\sqrt{u_{1}^{2}+u_{2}^{2}}\end{split} \tag{15}$$

and

$$\begin{split}\text{over}(u_{1},u_{2})&=\frac{1}{\cos(\pi/n_{\mathit{directions}})}\max_{i\in[1,n_{\mathit{directions}}/4+1]}\left(|u_{1}|\cdot\cos\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)+|u_{2}|\cdot\sin\left(\frac{2(i-1)\pi}{n_{\mathit{directions}}}\right)\right)\\ &\geq\sqrt{u_{1}^{2}+u_{2}^{2}}.\end{split} \tag{16}$$

Using these constraints, we can over-approximate the unsafe region as

$$\text{over}(\dot{x}_{t},\dot{y}_{t})>0.2+2n\cdot\text{under}(x_{t},y_{t}). \tag{17}$$

This is a piece-wise linear constraint. Moreover, both the absolute value function and the maximum function can be easily encoded in neural network verification tools such as Marabou. In our experiments, we use $n_{\mathit{directions}}=400$.
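As a rough illustration, the approximations (15) and (16) and the unsafe-region test (17) can be evaluated numerically as in the Python sketch below. The function names `under`, `over`, and `may_be_unsafe`, and the parameter name `mean_motion` (standing for the constant $n$ in (12)), are assumptions made for this sketch and are not taken from the authors' implementation. A verifier such as Marabou would encode the same maximum and absolute-value terms as constraints rather than evaluate them on concrete numbers.

```python
import math


def under(u1, u2, n_directions=400):
    """Under-approximate the Euclidean norm of (u1, u2), following Eq. (15)."""
    if n_directions <= 0 or n_directions % 4 != 0:
        raise ValueError("n_directions must be a positive multiple of 4")
    a1, a2 = abs(u1), abs(u2)
    # Indices i = 1 .. n/4 + 1 cover the first quadrant (angles 0 .. pi/2).
    return max(
        a1 * math.cos(2 * (i - 1) * math.pi / n_directions)
        + a2 * math.sin(2 * (i - 1) * math.pi / n_directions)
        for i in range(1, n_directions // 4 + 2)
    )


def over(u1, u2, n_directions=400):
    """Over-approximate the Euclidean norm of (u1, u2), following Eq. (16)."""
    return under(u1, u2, n_directions) / math.cos(math.pi / n_directions)


def may_be_unsafe(x, y, x_dot, y_dot, mean_motion, n_directions=400):
    """Evaluate Eq. (17): True whenever the state could violate Eq. (12).

    `mean_motion` is a hypothetical name for the constant n in Eq. (12).
    """
    speed_upper = over(x_dot, y_dot, n_directions)
    distance_lower = under(x, y, n_directions)
    return speed_upper > 0.2 + 2 * mean_motion * distance_lower
```

Because `over` never underestimates the speed and `under` never overestimates the distance, every state that violates (12) is flagged by `may_be_unsafe`, at the cost of also flagging some states that are actually safe.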

III-D DNN Setup

As in [62], we use Ray RLlib's Proximal Policy Optimization (PPO) reinforcement learning algorithm to learn the system dynamics, but we make four important alterations to improve downstream verification, as part of our design-for-verification scheme.

III-D1 Scenario Regions

To improve performance near the docking region, we reduce the docking distance during training from 0.5 meters to 0.25 meters. We also simplify the problem by reducing the initial position of the deputy spacecraft from a radius of 150 meters to only 5 meters. Scaling back up to larger initial positions is part of an ongoing research effort.

III-D2 Speed Observations

We limit the observations of the agent to its $x$ and $y$ positions and respective $\dot{x}$ and $\dot{y}$ velocities, eliminating the agent's observations of its current speed and the distance-dependent velocity constraint described in Equation (12). This makes it less likely that irregular trajectories will be learned because of observations of the safety constraint. As a result, liveness verification becomes easier.

III-D3 Distance Reward

We keep the rewards relating to success or failure, the safety constraint, and delta-v as presented in [62], but we alter the distance change reward to use the $L^{1}$ norm of the position of the deputy (i.e., the Manhattan distance from the deputy to the chief) rather than the nonlinear $L^{2}$ norm. This is to match the induction invariant described in Section IV. To account for the new distance metric and previously-described smaller initial distances, we developed a novel reward function for distance change:

$$R^{d_{new}}_{t}=2\left(e^{-a_{1}d^{m}_{t}}-e^{-a_{1}d^{m}_{t-1}}\right)+2\left(e^{-a_{2}d^{m}_{t}}-e^{-a_{2}d^{m}_{t-1}}\right), \tag{18}$$

where $d^{m}_{i}=\left\lvert x_{i}\right\rvert+\left\lvert y_{i}\right\rvert$ is the Manhattan distance at step $i$, $a_{1}=\frac{\ln(2)}{5}$, and $a_{2}=\frac{\ln(2)}{0.5}$.
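The reward in (18) can be sketched in a few lines of Python. The names `manhattan` and `distance_change_reward` are illustrative assumptions, not identifiers from the authors' code. Note that with these constants each exponential term equals $2^{-d/5}$ and $2^{-d/0.5}$ respectively, so the first term halves every 5 meters and the second every 0.5 meters.

```python
import math

A1 = math.log(2) / 5    # first term halves every 5 m of Manhattan distance
A2 = math.log(2) / 0.5  # second term halves every 0.5 m of Manhattan distance


def manhattan(x, y):
    """L1 norm of the deputy position relative to the chief at the origin."""
    return abs(x) + abs(y)


def distance_change_reward(prev_pos, cur_pos):
    """Distance-change reward of Eq. (18) for one step from prev_pos to cur_pos."""
    d_prev = manhattan(*prev_pos)
    d_cur = manhattan(*cur_pos)
    return (
        2 * (math.exp(-A1 * d_cur) - math.exp(-A1 * d_prev))
        + 2 * (math.exp(-A2 * d_cur) - math.exp(-A2 * d_prev))
    )
```

The reward is positive when the Manhattan distance shrinks, negative when it grows, and zero when it is unchanged; the second term dominates close to the chief, where small changes in distance matter most.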

III-D4 Model Architecture

Our DRL controller should be sufficiently small to keep verification time reasonable and sufficiently large to be able to learn the necessary behavior. We found that reducing the hidden layer widths from 256 neurons to 20 neurons, while maintaining two hidden layers, achieves a good balance between verification time and expressive power. Also, we swap the tanh activation functions for ReLU activation functions since ReLU is supported by most neural network verification tools (such as Marabou).

IV Using $k$-induction For Liveness Guarantees

In this section, we present an approach for scalably verifying a liveness property for the 2D docking problem presented in Section III using $k$-induction. We describe the conceptual approach, the experimental framework, and the results.

IV-A Proving Liveness by $k$-induction

In order to apply $k$-induction, we must find a way to reduce a liveness property to a $k$-inductive property. Typically, this is done by finding a ranking function, a function with a well-founded co-domain, which can be shown to always be decreasing by $k$-induction.

For the spacecraft, an obvious choice for a ranking function is the distance from the deputy to the chief. In order to make the function easier to reason about, we use a linear proxy function for the actual distance, namely the Manhattan distance. Unfortunately, it is not the case that this measure always decreases, as the spacecraft may move away from the target.

Thus, we instead propose a property that ensures the spacecraft eventually starts moving towards the target. The property is expressed as a logical disjunction: after $k$ steps, either the Manhattan distance decreases or the magnitude of the velocity decreases. Again, we approximate the velocity magnitude by the $L^{1}$ norm, the sum of the absolute values of $\dot{x}$ and $\dot{y}$. Formally, if the current state is $(x_{0},y_{0},\dot{x}_{0},\dot{y}_{0})$ and the future state after $k$ steps is $(x',y',\dot{x}',\dot{y}')$, we must show:

$$(|x'|+|y'|)-(|x_{0}|+|y_{0}|)<-\epsilon\quad\bigvee\quad(|\dot{x}'|+|\dot{y}'|)-(|\dot{x}_{0}|+|\dot{y}_{0}|)<-\epsilon, \tag{19}$$

where $\epsilon$ is some positive value.

Proposition 1.

If property (19) holds (for some $k$) for every state, then eventually the spacecraft will be moving towards the goal (i.e., the $L^{1}$ norm of the position will decrease).

Proof.

Suppose, for contradiction, that from some starting state $(x_{0},y_{0},\dot{x}_{0},\dot{y}_{0})$, the spacecraft follows a trajectory that never moves towards the goal, in the sense that the $L^{1}$ norm of the position never decreases. Let $(x_{i},y_{i},\dot{x}_{i},\dot{y}_{i})$ be the state after $i$ time steps. This means that for all $i$, $|x_{i}|+|y_{i}|\leq|x_{i+1}|+|y_{i+1}|$. Let $V_{i}=|\dot{x}_{i}|+|\dot{y}_{i}|$. Because the position norm never decreases, the first disjunct of (19) can never hold, so by (19) we know that for each $V_{i}$, there must be some $k$ such that $V_{i+k}-V_{i}<-\epsilon$. Thus, for any $n$, we can construct a sequence $V_{j_{0}},V_{j_{1}},V_{j_{2}},\dots,V_{j_{n}}$ such that $j_{0}=0$ and $V_{j_{i}}-V_{j_{i+1}}>\epsilon$. Summing these $n$ inequalities gives $V_{j_{n}}<V_{0}-n\epsilon$. If we then take $n>V_{0}/\epsilon$, we get that $V_{j_{n}}<0$, which is impossible because $V_{j_{n}}$ is a sum of absolute values. ∎

Algorithm.

We verify (19) using Algorithm 1. We gradually increase $k$ until the property holds, a maximum of $k=k_{max}$ is reached, or a timeout is exceeded.

Input:  Bounds on state components $x_{0},y_{0},\dot{x}_{0},\dot{y}_{0}$, values for $k_{min},k_{max}$
Output:  If result = UNSAT, then property (19) holds for all states within the defined bounds.
1:  for each $k\in[k_{min},k_{max}]$ do
2:     Verify the negation of the distilled property: $\neg\left(\,(|x'|+|y'|)-(|x_{0}|+|y_{0}|)<-\epsilon\ \bigvee\ (|\dot{x}'|+|\dot{y}'|)-(|\dot{x}_{0}|+|\dot{y}_{0}|)<-\epsilon\,\right)$
3:     if UNSAT then
4:        result = [UNSAT, $k$]
5:        break
6:     else
7:        result = [SAT, $k$, counterexample $k$-step trajectory]
8:     end if
9:  end for
10:  return  result
Algorithm 1 Algorithm for $k$-induction.

Input bounds for the state space can be chosen according to the problem specification. It is also important to note that different $k_{min}$ and $k_{max}$ values can be chosen. In practice, in order to make the verification more tractable, we first split the state space into subregions, then call the algorithm on each subregion. For each subregion of the state space, we explore values of $k$ from $k_{min}$ to $k_{max}$. For each $k$, a neural network verifier is invoked to check if the negation of the property holds after $k$ steps. There are three possible results of the algorithm.

  1. If the negation of the property is satisfiable for each $k$, the algorithm returns SAT along with a counter-example.

  2. If the negation of the property is unsatisfiable for some $k$, this means that the property holds for that value of $k$. In this case, the algorithm returns UNSAT together with the value of $k$ for which unsatisfiability was determined, and verification of the region is complete.

  3. If a predefined timeout is exceeded, the algorithm terminates and a timeout result is returned.
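The loop of Algorithm 1 and its three outcomes can be sketched in Python as follows. This is only a sketch under stated assumptions: `query` is a hypothetical callback standing in for the neural network verifier (it is not Marabou's API), and the choice to continue to the next $k$ after a per-query timeout and to report TIMEOUT only if no $k$ yields UNSAT is one possible reading of the description above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]  # (low, high) for x, y, x_dot, y_dot
# Hypothetical verifier callback: returns ("SAT" | "UNSAT" | "TIMEOUT", counterexample).
Query = Callable[[Bounds, int], Tuple[str, Optional[List]]]


def k_induction(bounds: Bounds, k_min: int, k_max: int, query: Query):
    """Search for a k in [k_min, k_max] for which the negation of (19) is UNSAT."""
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")
    result = None
    timed_out = False
    for k in range(k_min, k_max + 1):
        status, counterexample = query(bounds, k)
        if status == "UNSAT":
            # Property (19) holds for this k on the whole region.
            return ("UNSAT", k, None)
        if status == "TIMEOUT":
            timed_out = True  # assumption: keep trying larger k
            continue
        result = ("SAT", k, counterexample)
    if timed_out:
        # Inconclusive: the caller may subdivide the region and retry.
        return ("TIMEOUT", k_max, None)
    return result
```

Returning the value of $k$ alongside the status mirrors the paper's description and makes it easy to collect per-subregion statistics afterwards.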

Experimental Setup. We use Marabou for the neural network verification step. We set the following parameters for Marabou: verbosity=0, timeoutInSeconds=5000, numWorkers=10, tighteningStrategy="sbt", solveWithMILP=True. Marabou also requires a back-end linear programming engine. We use Gurobi 9.5.

We start with positional bounds of $x,y\in[-25,25]$ and velocity bounds of $\dot{x},\dot{y}\in[-0.2,0.2]$. We initially divide these into 25 subregions by focusing on $5\times 5$ regions in the positional space. A subregion is further subdivided if Algorithm 1 times out. We set $k_{min}$ to 1, $k_{max}$ to 20, and use a timeout of 1.4 hours for each loop iteration (i.e., about 28 hours if all values of $k$ time out).
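The initial partition can be sketched as below, assuming the 25 subregions form a uniform 5-by-5 grid over the positional bounds (so each cell is 10 by 10 in position) while every cell keeps the full velocity bounds. The helper names are hypothetical, and splitting a timed-out cell into four quadrants is an assumption; the text above does not specify how subregions are subdivided.

```python
def split_positions(lo, hi, cells_per_axis):
    """Split the square [lo, hi] x [lo, hi] into a uniform grid of position cells."""
    width = (hi - lo) / cells_per_axis
    edges = [lo + i * width for i in range(cells_per_axis + 1)]
    return [
        ((edges[i], edges[i + 1]), (edges[j], edges[j + 1]))
        for i in range(cells_per_axis)
        for j in range(cells_per_axis)
    ]


def bisect_subregion(subregion):
    """Split one position cell into four quadrants (assumed refinement rule)."""
    (x_lo, x_hi), (y_lo, y_hi) = subregion
    x_mid = (x_lo + x_hi) / 2
    y_mid = (y_lo + y_hi) / 2
    return [
        ((x_lo, x_mid), (y_lo, y_mid)),
        ((x_lo, x_mid), (y_mid, y_hi)),
        ((x_mid, x_hi), (y_lo, y_mid)),
        ((x_mid, x_hi), (y_mid, y_hi)),
    ]


VELOCITY_BOUNDS = ((-0.2, 0.2), (-0.2, 0.2))  # shared by every subregion
initial_subregions = split_positions(-25.0, 25.0, 5)
```

A work-list loop would then run the verification on each cell and push the four quadrants of any cell that times out back onto the list.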

Results. We end up with 71 subregions. For each subregion, Algorithm 1 returns UNSAT. The minimum returned value for $k$ is 1, the maximum is 12, the average is 5, and the median is 3.

Notably, regions close to the goal region are more difficult: they require more subregions and take longer, whereas regions more distant can sometimes be verified without utilizing additional subregions. The minimum runtime (in seconds) for any subregion is 0.02, the maximum is 4295.86, the average is 193.62, and the median is 1.76.

As a sanity check, we validated our results experimentally by running a simulation framework. Starting from randomly sampled points in the state space, we confirmed that the $k$-inductive property holds on the trajectory starting at each point. These checks also succeeded.
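A simulation check of this kind might look like the following sketch. The function `smallest_k` and the `step` argument are hypothetical: `step` stands for one closed-loop step of the controller and dynamics, which is not reproduced here, and the toy `contract` function in the usage lines is not the spacecraft dynamics.

```python
def l1(a, b):
    """L1 norm of a two-component vector."""
    return abs(a) + abs(b)


def smallest_k(state, step, eps, k_max):
    """Return the smallest k <= k_max for which property (19) holds, else None.

    `state` is (x, y, x_dot, y_dot); `step` maps a state to the next state.
    """
    x0, y0, vx0, vy0 = state
    d0, v0 = l1(x0, y0), l1(vx0, vy0)
    current = state
    for k in range(1, k_max + 1):
        current = step(current)
        x, y, vx, vy = current
        distance_decreased = l1(x, y) - d0 < -eps
        speed_decreased = l1(vx, vy) - v0 < -eps
        if distance_decreased or speed_decreased:
            return k
    return None


# Toy usage: a stand-in system that contracts the position by 10% per step.
def contract(s):
    return (0.9 * s[0], 0.9 * s[1], s[2], s[3])


print(smallest_k((1.0, 1.0, 0.0, 0.0), contract, eps=0.3, k_max=20))
```

Such a check can only find violations on the sampled trajectories; unlike the verifier query, it says nothing about states that were not sampled.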

Discussion.

Initially, we applied our approach to the neural network controller described in [62]. The original network topology (two hidden layers with 256 nodes each) resulted in lengthy verification times. Moreover, for many regions, the verification failed: we discovered counter-examples for all tested values of $k$.

(a) Initial neural network.
(b) Retrained neural network.
Figure 1: Design for Verification: An initial controller trajectory compared to a final controller trajectory, with the same initial state. The final controller has a more direct trajectory which is more conducive to verification via $k$-induction.

Figure 1(a) shows a counterexample trajectory from the original neural network. The starting state is $[x=0.5347935396499356,\ y=0.51,\ \dot{x}=0.00038615766226848813,\ \dot{y}=0.00038615766226848813]$. The controller moves steadily away from the goal, and only after many steps turns the spacecraft around to move towards the goal.

Such trajectories provided motivation for the design changes mentioned in Section III-D. In particular, the changes to the reward function strongly incentivize the controller to move towards the goal region. Figure 1(b) shows the trajectory using the verified controller, starting from the same starting state. Note how the spacecraft moves nearly directly towards the goal region.

The successful verification of (19) is not sufficient to establish that the deputy eventually reaches the chief. We would need to establish a second property, namely that once the spacecraft is moving towards its goal, it always gets closer (by at least some $\epsilon$) within $k$ steps. Let $(x_{i},y_{i})$ be the position $i$ steps from some starting position $(x_{0},y_{0})$. This can be formalized with the property:

$$\begin{split}(|x_{1}|+|y_{1}|)-(|x_{0}|+|y_{0}|)<0\implies\\ \exists\,k.\>(|x_{k}|+|y_{k}|)-(|x_{0}|+|y_{0}|)<-\epsilon.\end{split} \tag{20}$$

Formally verifying this property is left to future work.

IV-B An Alternative Approach using Polar Coordinates

Before moving to the Manhattan distance, we explored an alternative approach using polar coordinates, which allows the $L^{2}$ norm to be used directly in the invariant while maintaining linearity. More specifically, if $r$ is the distance to the origin and $\theta$ is the angle from the $x$-axis, then we can write the equivalent of property (19) as:

$$r'-r<-\epsilon\ \vee\ \dot{r}'-\dot{r}<-\epsilon. \tag{21}$$

Note how much simpler property (21) is compared with property (19). However, there remain two challenges: training a polar controller and converting the dynamics to polar coordinates.

Training a controller for the polar system is not straightforward; it requires complex parameter changes, for example, adjusting the learning rate, observation vector order, and the length and normalization constants. However, these challenges are ultimately solvable, and we were able to train a network that takes polar coordinate inputs. The output is still $F_{x}$ and $F_{y}$, as we did not envision changing the physical spacecraft system.

The second challenge proved more difficult. We needed a way to calculate new values of $r$ and $\theta$, given current values of $r$, $\theta$, $\dot{r}$, and $\dot{\theta}$, as well as $F_{x}$ and $F_{y}$. We did not find closed-form solutions in the literature for the Clohessy–Wiltshire equations in polar coordinates. We thus converted equations (7) through (10) to polar coordinates using the standard conversion equations:

$$x=r\cos\theta,\quad y=r\sin\theta,\quad r=\sqrt{x^{2}+y^{2}},\quad\theta=\tan^{-1}\frac{y}{x} \tag{22}$$

We encoded the derivation of the equations directly in Python, which allowed us to confirm in simulation that our polar neural network had behavior similar to that of the original model. However, attempting formal verification with the new dynamics proved difficult. The new dynamics are highly non-linear. We attempted to use the OVERT tool (https://github.com/sisl/OVERT.jl) for the purpose of linearizing $r$ and $\theta$. However, the results were too complex, and the effort was ultimately unsuccessful. It was at this point that we decided to instead use the $L^{1}$ norm and revert to standard rectangular coordinates.
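For concreteness, the coordinate conversions in (22), together with the velocity conversions obtained by differentiating them, can be sketched in Python as follows. This is not the authors' encoding of the polar dynamics; the function names are assumptions, and `math.atan2` is used in place of $\tan^{-1}(y/x)$ so that the angle lands in the correct quadrant.

```python
import math


def to_polar(x, y, x_dot, y_dot):
    """Convert a Cartesian state to (r, theta, r_dot, theta_dot)."""
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("theta is undefined at the origin")
    theta = math.atan2(y, x)  # quadrant-aware version of arctan(y / x)
    r_dot = (x * x_dot + y * y_dot) / r
    theta_dot = (x * y_dot - y * x_dot) / (r * r)
    return r, theta, r_dot, theta_dot


def to_cartesian(r, theta, r_dot, theta_dot):
    """Convert a polar state back to (x, y, x_dot, y_dot)."""
    c, s = math.cos(theta), math.sin(theta)
    x_dot = r_dot * c - r * theta_dot * s
    y_dot = r_dot * s + r * theta_dot * c
    return r * c, r * s, x_dot, y_dot
```

The square root, the arctangent, and the division by $r$ in these conversions give a sense of why the polar dynamics are hard to linearize, particularly near the origin, which is exactly where docking takes place.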

We report this effort here in order to highlight both the potential benefits and pitfalls of using a different coordinate representation. If the dynamics had been more tractable in polar space, this would have been an attractive direction.

V Alternate Verification Approaches

While exploring the $k$-induction approaches described above, we concurrently explored an alternative approach using Neural Lyapunov Barrier certificates. The results of that effort represent the most complete verification results we have obtained to date and are reported in [56]. Here, for convenience, we review that approach at a high level and present some details not reported there. We also discuss several reachability-based approaches, which we also applied to the 2D docking problem, but which were, ultimately, unsuccessful.

V-A RWA Certificates

Definition.

A function $V: \mathcal{X} \mapsto \mathbb{R}$ is an RWA certificate for the task defined in Definition II-C if there exist some $\alpha > \beta \geq \gamma$ and $\epsilon > 0$, such that the following constraints are satisfied.

\begin{align}
\forall\, x \in \mathcal{X}. \quad & V(x) \geq \gamma \tag{23} \\
\forall\, x \in \mathcal{X}_{I}. \quad & V(x) \leq \beta \tag{24} \\
\forall\, x \in \mathcal{X} \setminus \mathcal{X}_{G}. \quad & V(x) \leq \beta \rightarrow V(x) - V(f(x, \pi(x))) \geq \epsilon \tag{25} \\
\forall\, x \in \mathcal{X}_{U}. \quad & V(x) \geq \alpha \tag{26}
\end{align}

Any tuple of values $(\alpha, \beta, \epsilon, \gamma)$ for which these conditions hold is called a witness for the certificate. (These constraints are similar to those in [29] but are specific to discrete-time systems and do not place constraints on a compact safe set, opting to use an unsafe set instead.) RWA certificates provide the following guarantee.

Lemma 1.

If $V$ is an RWA certificate for a dynamical system with witness $(\alpha, \beta, \epsilon, \gamma)$, then for every trajectory $\tau$ starting from a state $x \in \mathcal{X} \setminus \mathcal{X}_{G}$ such that $V(x) \leq \beta$, $\tau$ will eventually contain a state in $\mathcal{X}_{G}$ without ever passing through a state in $\mathcal{X}_{U}$.

We use reinforcement learning to jointly train neural networks for both the controller and the corresponding RWA certificate.

RWA Training Loss.

The training objective for RWA certificates is described below:

\begin{align}
O_{s} &= c_{s} \sum_{i \,|\, x_{i} \in \mathcal{X}_{I}} \frac{\text{ReLU}(\delta_{1} + V(x_{i}) - \beta)}{\sum_{i \,|\, x_{i} \in \mathcal{X}_{I}} 1} \tag{27} \\
O_{d} &= c_{d} \sum_{i \,|\, x_{i} \in \mathcal{X} \setminus (\mathcal{X}_{U} \cup \mathcal{X}_{G}),\, V(x_{i}) < \beta} \frac{\text{ReLU}(\delta_{2} + \epsilon + V(x'_{i}) - V(x_{i}))}{\sum_{i \,|\, x_{i} \in \mathcal{X} \setminus (\mathcal{X}_{U} \cup \mathcal{X}_{G}),\, V(x_{i}) < \beta} 1} \tag{28} \\
O_{u} &= c_{u} \sum_{i \,|\, x_{i} \in \mathcal{X}_{U}} \frac{\text{ReLU}(\delta_{3} - V(x_{i}) + \alpha)}{\sum_{i \,|\, x_{i} \in \mathcal{X}_{U}} 1} \tag{29} \\
O &= O_{s} + O_{d} + O_{u} \tag{30}
\end{align}

Equation (27) penalizes deviations from constraint (24), Equation (28) penalizes deviations from constraint (25), and Equation (29) penalizes deviations from constraint (26). We incorporate parameters $\delta_{1} > 0$, $\delta_{2} > 0$, and $\delta_{3} > 0$, which can be used to tune how strongly the certificate over-approximates adherence to each constraint. Similarly, constants $c_{s}$, $c_{d}$, and $c_{u}$ can be used to tune the relative weight of the three objective terms. The final training objective $O$ in (30) is what the optimizer seeks to minimize, by using stochastic gradient descent (SGD) or other optimization techniques.
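To make the structure of the objective concrete, the following is a minimal NumPy sketch of (27)-(30) evaluated on a batch of sampled states. It is an illustration under stated assumptions, not the authors' training code: the function names, the boolean membership masks, and the default values of the $\delta$ and $c$ parameters are all hypothetical, and a real implementation would use an automatic-differentiation framework so that the objective can be minimized by gradient descent.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def masked_mean(values, mask):
    """Mean of `values` over the entries selected by `mask` (0.0 if none)."""
    count = int(mask.sum())
    return float(values[mask].sum() / count) if count > 0 else 0.0


def rwa_objective(v, v_next, in_init, in_unsafe, in_goal,
                  alpha, beta, eps,
                  deltas=(0.1, 0.1, 0.1), weights=(1.0, 1.0, 1.0)):
    """Evaluate O = O_s + O_d + O_u on a batch.

    v[i] is V(x_i) and v_next[i] is V(x'_i), the certificate value at the
    successor state. The in_* arrays are boolean membership masks. The
    default deltas and weights are placeholder values, not from the paper.
    """
    v = np.asarray(v, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    in_init = np.asarray(in_init, dtype=bool)
    in_unsafe = np.asarray(in_unsafe, dtype=bool)
    in_goal = np.asarray(in_goal, dtype=bool)
    d1, d2, d3 = deltas
    c_s, c_d, c_u = weights

    # (27): initial states should satisfy V(x) <= beta.
    o_s = c_s * masked_mean(relu(d1 + v - beta), in_init)

    # (28): outside the unsafe and goal sets, where V(x) < beta,
    # V should decrease by at least eps in one step.
    descent_set = ~in_unsafe & ~in_goal & (v < beta)
    o_d = c_d * masked_mean(relu(d2 + eps + v_next - v), descent_set)

    # (29): unsafe states should satisfy V(x) >= alpha.
    o_u = c_u * masked_mean(relu(d3 - v + alpha), in_unsafe)

    # (30)
    return o_s + o_d + o_u
```

Each term is an average hinge penalty, so it is zero exactly when every sampled point satisfies the corresponding constraint with margin $\delta_{j}$.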

$\gamma$ lower bound.

It is important to note that the RWA training objective does not explicitly penalize deviations from Equation (23). Instead, because $V$ is implemented as a neural network using floating-point arithmetic, it has only a finite number of possible inputs and outputs, so Equation (23) must hold for some $\gamma$. In practice, we can use Marabou to find $\gamma$ by doing a linear search for the minimum value of $V$: we simply set $\gamma$ to some initial value, say $\alpha$, then repeatedly check $\exists\, x.\; V(x) < \gamma$, updating $\gamma$ with the new value each time the query is satisfiable, and repeating until the query is unsatisfiable.
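The search loop described above can be sketched as follows. This is a minimal illustration only: `query` is a hypothetical stand-in for a verifier call (it is not Marabou's API), and `make_sampled_query` is a toy oracle over a finite list of certificate values that exists purely so the loop can be run.

```python
def find_gamma(query, gamma_init):
    """Linear search for a lower bound gamma on V.

    `query(gamma)` is a hypothetical oracle: it returns the value V(x) of
    some state with V(x) < gamma if one exists (SAT), and None otherwise
    (UNSAT).
    """
    gamma = gamma_init
    while True:
        witness_value = query(gamma)
        if witness_value is None:
            # UNSAT: no state has V(x) < gamma, so constraint (23) holds.
            return gamma
        # SAT: lower gamma to the value found and ask again.
        gamma = witness_value


def make_sampled_query(values):
    """Toy oracle over a finite list of V values (illustration only)."""
    def query(gamma):
        below = [v for v in values if v < gamma]
        return below[0] if below else None
    return query
```

The loop terminates because each satisfiable query strictly decreases $\gamma$ and $V$ takes only finitely many values.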

Sampling from $\mathcal{X}_{U}$ and $\mathcal{X} \setminus \mathcal{X}_{G}$.

While $\mathcal{X}_{I}$ is typically defined as having both upper and lower bounds on state variables, this is not the case for $\mathcal{X}_{U}$, which often has only lower bounds on state variables (this is the case, for example, for the 2D docking problem defined in Section III).

However, during training, we do impose an upper bound on the states sampled from $\mathcal{X}_{U}$. Specifically, if the controller operates over $n$-dimensional states $x = [x_{1}, x_{2}, \ldots, x_{n}]$, we sample points satisfying the following constraints:

\begin{align}
& (x_{1} > p_{1}) \vee (x_{2} > p_{2}) \vee \ldots \vee (x_{n} > p_{n}) \tag{31} \\
& (x_{1} < p_{1} + \gamma_{1}) \wedge (x_{2} < p_{2} + \gamma_{2}) \wedge \ldots \wedge (x_{n} < p_{n} + \gamma_{n}) \tag{32}
\end{align}

Here, (31) represents the (given) lower bounds on the unsafe region $\mathcal{X}_{U}$, and $\gamma_{1}, \ldots, \gamma_{n}$ are chosen to be strictly greater than 0.
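One simple way to draw points satisfying both (31) and (32) is rejection sampling, sketched below. This is an assumption about how such sampling could be done, not a description of the authors' sampler: the function name is hypothetical, and the `lower` argument (a lower limit for the sampling box) is an extra parameter introduced here because (32) only bounds the box from above.

```python
import numpy as np


def sample_unsafe(p, gamma, lower, n_samples, rng):
    """Rejection-sample points from the bounded unsafe region.

    p[j]     : lower bound defining the unsafe region in dimension j (31)
    gamma[j] : positive slack giving the upper bound p[j] + gamma[j] (32)
    lower[j] : lower limit of the sampling box (an assumption of this sketch)
    """
    p, gamma, lower = (np.asarray(a, dtype=float) for a in (p, gamma, lower))
    upper = p + gamma
    accepted, total = [], 0
    while total < n_samples:
        # Uniform candidates in the box [lower, upper), which enforces (32).
        x = rng.uniform(lower, upper, size=(n_samples, p.size))
        # Keep candidates with at least one coordinate above its bound (31).
        keep = x[(x > p).any(axis=1)]
        accepted.append(keep)
        total += len(keep)
    return np.concatenate(accepted)[:n_samples]
```

For example, `sample_unsafe([1.0, 1.0], [0.5, 0.5], [-1.0, -1.0], 200, np.random.default_rng(0))` returns 200 two-dimensional points, each with at least one coordinate above 1.0 and no coordinate above 1.5.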

A similar issue arises when sampling from $\mathcal{X} \setminus \mathcal{X}_{G}$. This can often be solved simply by sampling instead from $\mathcal{X} \setminus (\mathcal{X}_{G} \cup \mathcal{X}_{U})$, as the lower bounds on variables in $\mathcal{X}_{U}$ then create upper bounds for the sampling step.

Masking out $\mathcal{X}_{U}$.

For objective (28), if $x'_{i}$ lies in $\mathcal{X}_{U}$, we replace the actual value of $V(x'_{i})$ with $\alpha$. This is because we learn correct functional behaviors of $\mathcal{X}_{U}$ through objective (29) regardless, and thus using the actual value of $V(x'_{i})$ would lead to unnecessary training effort and excessive penalties.
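The masking step amounts to a single element-wise substitution before the descent term is computed. A minimal sketch, with a hypothetical function name and assuming the successor values and unsafe-membership flags are available as arrays:

```python
import numpy as np


def mask_unsafe_successors(v_next, next_in_unsafe, alpha):
    """Replace V(x'_i) by alpha wherever the successor x'_i is unsafe.

    v_next[i]         : certificate value at the successor state x'_i
    next_in_unsafe[i] : True if x'_i lies in the unsafe set
    """
    v_next = np.asarray(v_next, dtype=float)
    next_in_unsafe = np.asarray(next_in_unsafe, dtype=bool)
    return np.where(next_in_unsafe, alpha, v_next)
```

The masked values would then be used in place of $V(x'_{i})$ when evaluating the descent term, so that the penalty for stepping into the unsafe set does not depend on how well $V$ is already trained there.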

Certificate Warmup.

To improve training, the objective is used to train the certificate $V$ alone for a few iterations, after which training includes both the certificate and the controller. This is done to avoid erratic training of the controller when $V$ has random weights.

RWA Verification.

In order to obtain formal guarantees, we use Marabou to formally verify the constraints in Section V-A. Verification of RWA constraints is generally straightforward, but, as in training, we have to bound $\mathcal{X}_{U}$ and $\mathcal{X} \setminus \mathcal{X}_{G}$ to verify constraints (26) and (25), respectively. Instead of using $\mathcal{X} \setminus \mathcal{X}_{G}$ as the input space for (25), we use $\mathcal{X} \setminus (\mathcal{X}_{G} \cup \mathcal{X}_{U})$, which provides the same guarantees. Moreover, instead of using $\mathcal{X}_{U}$ as the input space for (26), we use the bounded space, call it $\mathcal{X}^{S}_{U}$, used for data sampling. To ensure this provides the same guarantees, we check that no states beyond the upper bound of $\mathcal{X}^{S}_{U}$ are reachable.

Instead of encoding verification as a single property passed to the DNN verifier, verification is partitioned into multiple queries. This is done by partitioning the input space in the original property into equally sized smaller state spaces, over which the same property is checked. This helps avoid unreasonably long verification times that can occur with a large monolithic query.
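The partitioning step can be sketched as follows for a box-shaped input space. This is an illustrative sketch under assumptions: the function names are hypothetical, and `check` is a placeholder for one verifier query over a sub-box (it is not Marabou's API). The number of splits per dimension is a free parameter.

```python
import itertools

import numpy as np


def partition_box(lower, upper, splits):
    """Split the box [lower, upper] into equally sized sub-boxes.

    splits[d] is the number of pieces along dimension d. Returns a list of
    (lo, hi) pairs of NumPy arrays, one pair per sub-box.
    """
    edges = [np.linspace(lo, hi, k + 1)
             for lo, hi, k in zip(lower, upper, splits)]
    boxes = []
    for idx in itertools.product(*(range(k) for k in splits)):
        lo = np.array([edges[d][i] for d, i in enumerate(idx)])
        hi = np.array([edges[d][i + 1] for d, i in enumerate(idx)])
        boxes.append((lo, hi))
    return boxes


def verify_partitioned(lower, upper, splits, check):
    """Run the same property check on every sub-box.

    `check(lo, hi)` is a hypothetical verifier call that returns a
    counterexample, or None if the property holds on the sub-box.
    Returns the list of counterexamples found (empty if verified).
    """
    counterexamples = []
    for lo, hi in partition_box(lower, upper, splits):
        cex = check(lo, hi)
        if cex is not None:
            counterexamples.append(cex)
    return counterexamples
```

Because the sub-boxes cover the original box, the property holds on the whole input space exactly when every sub-query reports no counterexample, and the sub-queries are independent and can be run in parallel.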

Retraining.

If any of the RWA verification checks return counterexamples, these are used to augment the training data set, and then training is done again. This process repeats until no more counterexamples are found. We weight counterexamples more heavily in the objective function (30) (compared to points in the initial training dataset) in order to focus the training on removing the counterexamples.
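The train-verify-retrain loop can be sketched as follows. The callbacks `train` and `verify`, the counterexample weight of 10, and the round limit are all hypothetical choices made for this sketch; the text above only specifies that counterexamples are weighted more heavily and that the loop repeats until none remain.

```python
def retrain_until_verified(train, verify, dataset,
                           cex_weight=10.0, max_rounds=20):
    """Counterexample-guided retraining loop (illustrative sketch).

    train(dataset, weights) -> model    (hypothetical training callback)
    verify(model)           -> list of counterexample states, empty if verified
    Returns (model, rounds_used), or (None, max_rounds) if the round limit
    is reached without a verified model.
    """
    dataset = list(dataset)
    weights = [1.0] * len(dataset)
    for round_idx in range(max_rounds):
        model = train(dataset, weights)
        counterexamples = list(verify(model))
        if not counterexamples:
            return model, round_idx + 1
        # Add counterexamples with a larger weight than the original data.
        dataset += counterexamples
        weights += [cex_weight] * len(counterexamples)
    return None, max_rounds
```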

Results and Analysis.

As shown in prior work [56], RWA certificates can provide liveness and safety guarantees for the 2D spacecraft docking problem defined in Section III. More details and a pointer to the code can be found in [56].

V-B Reachability Analysis Approaches

In this subsection, we discuss approaches based on reachability analysis. While these approaches were ultimately unsuccessful on the case study problem outlined in Section III, we still mention them here, as the reasons for their failure may be of interest, and they may be useful on other problems.

Forward-tube and Backward-tube Reachability.

Forward-tube and backward-tube reachability attempt to generate a path over abstract state spaces (i.e., sets of states) from the starting state space to the goal state space. At each step along the abstract path, we check that every state in the abstract state set meets any safety guarantees.

In forward-tube reachability, a starting set of states $\mathcal{X}^{0}_{F}$ and a step size $k$ are defined. Then, a set of states $\mathcal{X}^{1}_{F}$ is constructed such that all states reachable from $\mathcal{X}^{0}_{F}$ in $k$ steps are contained within $\mathcal{X}^{1}_{F}$. This process is continued, and additional sets of states $\mathcal{X}^{i+1}_{F}$ are constructed, each with the property that they contain the states reachable from $\mathcal{X}^{i}_{F}$ in $k$ steps. If at some point, the constructed set is a subset of the goal region, then the liveness property is ensured. However, it can be very challenging to find a sequence of sets of states $\mathcal{X}^{i}_{F}$ that eventually lead to a subset of the goal region. This was the case for the spacecraft example.
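To illustrate the forward-tube construction, the sketch below propagates axis-aligned boxes through a toy linear closed-loop system $x_{t+1} = A x_{t}$ using interval arithmetic. This is a deliberately simplified stand-in and an assumption of the sketch: in the setting described here, each set would instead be obtained from verifier queries over the neural controller composed with the docking dynamics. All names are hypothetical.

```python
import numpy as np


def step_box(A, lo, hi):
    """Sound box over-approximation of {A x : lo <= x <= hi}."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    new_center, new_radius = A @ center, np.abs(A) @ radius
    return new_center - new_radius, new_center + new_radius


def forward_tube(A, lo, hi, goal_lo, goal_hi, k, max_iters):
    """Build boxes X^0, X^1, ... where X^{i+1} contains every state
    reachable from X^i in k steps. Returns the tube once a box lies
    inside the goal box, or None if that does not happen in max_iters."""
    tube = [(lo, hi)]
    for _ in range(max_iters):
        for _ in range(k):
            lo, hi = step_box(A, lo, hi)
        tube.append((lo, hi))
        if np.all(lo >= goal_lo) and np.all(hi <= goal_hi):
            return tube
    return None
```

Even in this toy setting, the boxes only shrink into the goal because the chosen dynamics are contractive; when the over-approximation grows faster than the true reachable set shrinks, no box ever fits inside the goal region.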

On the other hand, in backward-tube reachability, we start with $\mathcal{X}^{0}_{B}$ set equal to the goal states and define a step size $k$. Then, a set of states $\mathcal{X}^{1}_{B}$ is constructed such that all states reachable from $\mathcal{X}^{1}_{B}$ in $k$ steps are contained within $\mathcal{X}^{0}_{B}$. Again, this process can be repeated until the set of states includes the initial states. A difficulty with this approach is computing a sufficiently large previous set of states at each step.

Grid Reachability.

Grid reachability is a process which first partitions a bounded subset of the state space into cells, then computes a directed graph where each cell is a vertex, and each directed edge $(a, b)$ denotes that vertex $b$ is reachable from vertex $a$ in $k$ steps, for a specific $k$, as shown in Fig. 2. The goal is to show that for all paths constructed from cells in the defined initial state space, a goal region is reachable. However, to ensure liveness, it is also necessary to show that the graph has no cycles and that it is not possible to reach any cells beyond the partitioned state space.

Figure 2: Grid reachability, with a cell navigating towards the docking region (in green)

We applied this technique to the spacecraft example. A challenge is preventing self-cycles in the graph. One strategy for doing this is to construct cells where at least one velocity component never changes sign. It is easy to see that for such cells, the spacecraft cannot remain in the cell forever, so we can ignore self-loops on such cells. For cells containing a velocity sign-change, we use a very narrow velocity range, narrow enough to ensure that the spacecraft leaves the range in $k$ steps. It is also desirable to limit the number of cells reachable from a given cell, to avoid the need to do many reachability checks. This can be ensured by making the cells large enough that it is impossible to cross more than one cell in a single set of $k$ steps.

Let $\mathit{IS}$ be the input space
Let $k$ be the step size
Divide $\mathit{IS}$ into cells $C = c_{0}, c_{1}, \ldots, c_{n}$
Let vertices $V = C$
Initialize edge set $E$ to be the empty set
$i = 0$
for $i \leq n$ do
      Denote set of adjacent cells to $c_{i}$ as $C_{r}$
      Add $c_{i}$ to $C_{r}$ if self-cycles are possible
      for $c_{r} \in C_{r}$ do
            if $c_{r}$ is reachable from $c_{i}$ in $k$ steps then
                  Add directed edge $(c_{i}, c_{r})$ to $E$
      $i = i + 1$
Let $G := (V, E)$
Check for cycles in $G$
if $G$ is acyclic then
      Determine cells $C_{s}$ with no paths leaving input space
      return $C_{s}$ as cells meeting liveness property
Algorithm 2: Applying Grid Reachability
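The graph construction and checks in Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under assumptions rather than the implementation used in this work: `neighbors` and `reachable` are hypothetical callbacks (the latter stands in for a verifier query), and the `OUTSIDE` sink vertex is a modeling device introduced here to represent states that leave the input space.

```python
OUTSIDE = "outside"  # hypothetical sink vertex: states leaving the input space


def build_graph(cells, neighbors, reachable):
    """Edge (a, b) is added when b may be reached from a in k steps.

    neighbors(c): candidate successors of c (adjacent cells, c itself if a
                  self-cycle is possible, OUTSIDE for boundary cells)
    reachable(a, b): hypothetical stand-in for a verifier reachability query
    """
    return {c: {n for n in neighbors(c) if reachable(c, n)} for c in cells}


def is_acyclic(graph):
    """Depth-first search with three colors; a grey successor closes a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(c):
        color[c] = GREY
        for n in graph.get(c, ()):
            state = color.get(n, WHITE)
            if state == GREY or (state == WHITE and not visit(n)):
                return False
        color[c] = BLACK
        return True

    return all(color.get(c, WHITE) != WHITE or visit(c) for c in graph)


def cells_staying_inside(graph):
    """Cells with no path to OUTSIDE (assumes the graph is acyclic)."""
    memo = {}

    def escapes(c):
        if c == OUTSIDE:
            return True
        if c not in memo:
            memo[c] = any(escapes(n) for n in graph.get(c, ()))
        return memo[c]

    return {c for c in graph if not escapes(c)}


def grid_reachability(cells, neighbors, reachable):
    """Return the cells meeting the liveness check, or None if G has a cycle."""
    graph = build_graph(cells, neighbors, reachable)
    if not is_acyclic(graph):
        return None
    return cells_staying_inside(graph)
```

The recursive traversals are adequate for small grids; a fine partition of a four-dimensional state space would call for iterative versions to avoid deep recursion.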
Analysis of Grid Reachability.

We applied grid reachability to a state space with $x, y \in [-10, 10]$ and $\dot{x}, \dot{y} \in [-1.6, 1.6]$ using Algorithm 2. A binary search was conducted using Marabou to determine cell bounds such that cells could only reach adjacent cells. The step size $k$ was chosen to be $1$.

We found a variety of cycles of increasing lengths, even as cells were divided further in an attempt to refine the grid abstraction. Moreover, we found that all cells had paths leaving the input space. We showcase one such trajectory of cells with this behavior in Fig. 3. In this trajectory, we see that for the first three steps, the velocity component ranges are negative, thereby guiding the spacecraft towards the goal region, but there is a path from cell 3 to cell 4 that induces a positive velocity component, allowing the path to diverge.

Figure 3: Spurious trajectory with grid reachability

Ultimately, the grid abstraction does not lend itself well to the liveness task because such spurious paths are difficult to rule out. While further refinement of the grid approach is possible and could eventually yield a workable approach, we determined that the complexity and difficulty were too high, and abandoned it in favor of the certificate approach mentioned earlier.

VI Conclusion

We have presented methods for verifying safety and liveness properties for DRL systems using $k$-induction, Neural Lyapunov Barrier Certificates, and reachability analysis. We explore their effectiveness on a 2D spacecraft docking problem posed in previous work. For this problem, we show how a $k$-induction based approach can be used alongside a design-for-verification training scheme to provide liveness guarantees. We also discuss how Neural Lyapunov Barrier Certificates can be used to provide both liveness and safety guarantees. While reachability analysis ultimately did not provide any formal guarantees, we discuss the approach and its limitations. In future work, we plan to explore scaling these methods to more complex and realistic control systems.

VII Acknowledgements

This work was supported by AFOSR (FA9550-22-1-0227), the Stanford CURIS program, the NSF-BSF program (NSF: 1814369, BSF: 2017662), and the Stanford Center for AI Safety. The work of Amir was further supported by a scholarship from the Clore Israel Foundation. We thank Thomas Henzinger (ISTA), Chuchu Fan (MIT), and Songyuan Zhang (MIT) for useful conversations and advice, which contributed to the success of this project.

References

  • [1] P. Alamdari, G. Avni, T. Henzinger, and A. Lukina. Formal Methods with a Touch of Magic. In Proc. 20th Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 138–147, 2020.
  • [2] B. Alpern and F. Schneider. Recognizing safety and liveness. Distributed Computing, 2:117–126, 09 1987.
  • [3] A. Ames, X. Xu, J. W. Grizzle, and P. Tabuada. Control barrier function based quadratic programs for safety critical systems. Trans. on Automatic Control, 2017.
  • [4] A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada. Control barrier functions: Theory and applications. In European Control Conf., 2019.
  • [5] G. Amir, D. Corsi, R. Yerushalmi, L. Marzari, D. Harel, A. Farinelli, and G. Katz. Verifying Learning-Based Robotic Navigation Systems. In Proc. 29th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 607–627, 2023.
  • [6] G. Amir, O. Maayan, T. Zelazny, G. Katz, and M. Schapira. Verifying Generalization in Deep Learning. In Proc. 35th Int. Conf. on Computer Aided Verification (CAV), pages 438–455, 2023.
  • [7] G. Amir, M. Schapira, and G. Katz. Towards Scalable Verification of Deep Reinforcement Learning. In Proc. 21st Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 193–203, 2021.
  • [8] G. Amir, H. Wu, C. Barrett, and G. Katz. An SMT-Based Approach for Verifying Binarized Neural Networks. In Proc. 27th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 203–222, 2021.
  • [9] G. Amir, T. Zelazny, G. Katz, and M. Schapira. Verification-Aided Deep Ensemble Selection. In Proc. 22nd Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 27–37, 2022.
  • [10] G. Anderson, S. Pailoor, I. Dillig, and S. Chaudhuri. Optimization and Abstraction: a Synergistic Approach for Analyzing Neural Network Robustness. In Proc. 40th ACM SIGPLAN Conf. on Programming Languages Design and Implementations (PLDI), pages 731–744, 2019.
  • [11] S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. In Conf. on Decision and Control, 2017.
  • [12] G. Basile and G. Marro. Controlled and conditioned invariant subspaces in linear system theory. Journal of Optimization Theory and Applications, 3:306–315, 1969.
  • [13] S. Bassan, G. Amir, D. Corsi, I. Refaeli, and G. Katz. Formally Explaining Neural Networks within Reactive Systems. In Proc. 23rd Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), pages 10–22, 2023.
  • [14] L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5:411–444, 2022.
  • [15] R. Bunel, I. Turkaslan, P. Torr, P. Kohli, and P. Mudigonda. A Unified View of Piecewise Linear Neural Network Verification. In Proc. 32nd Conf. on Neural Information Processing Systems (NeurIPS), pages 4795–4804, 2018.
  • [16] J.-T. Camino, C. Artigues, L. Houssin, and S. Mourgues. Linearization of euclidean norm dependent inequalities applied to multibeam satellites design. Computational Optimization and Applications, 73:679–705, 2019.
  • [17] M. Casadio, E. Komendantskaya, M. Daggitt, W. Kokke, G. Katz, G. Amir, and I. Refaeli. Neural Network Robustness as a Verification Property: A Principled Case Study. In Proc. 34th Int. Conf. on Computer Aided Verification (CAV), pages 219–231, 2022.
  • [18] Y.-C. Chang and S. Gao. Stabilizing neural control using self-learned almost lyapunov critics. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1803–1809, 2021.
  • [19] Y.-C. Chang, N. Roohi, and S. Gao. Neural lyapunov control. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [20] Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh. A lyapunov-based approach to safe reinforcement learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 8092–8101. Curran Associates, Inc., 2018.
  • [21] W. Clohessy and R. Wiltshire. Terminal guidance system for satellite rendezvous. Journal of the aerospace sciences, 27(9):653–658, 1960.
  • [22] E. Cohen, Y. Elboher, C. Barrett, and G. Katz. Tighter Abstract Queries in Neural Network Verification. In Proc. 24th Int. Conf. on Logic for Programming, Artificial Intelligence and Reasoning (LPAR), 2023.
  • [23] D. Corsi, G. Amir, G. Katz, and A. Farinelli. Analyzing Adversarial Inputs in Deep Reinforcement Learning, 2024. Technical Report. https://arxiv.org/abs/2402.05284.
  • [24] D. Corsi, G. Amir, A. Rodriguez, C. Sanchez, G. Katz, and R. Fox. Verification-Guided Shielding for Deep Reinforcement Learning, 2024. Technical Report. http://arxiv.org/abs/2406.06507.
  • [25] D. Corsi, E. Marchesini, A. Farinelli, and P. Fiorini. Formal verification for safe deep reinforcement learning in trajectory generation. In 2020 Fourth IEEE International Conference on Robotic Computing (IRC), pages 352–359, 2020.
  • [26] C. Dawson, S. Gao, and C. Fan. Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control. IEEE Transactions on Robotics, 2023.
  • [27] C. Dawson, Z. Qin, S. Gao, and C. Fan. Safe nonlinear control using robust neural lyapunov-barrier functions. In Conference on Robot Learning, pages 1724–1735. PMLR, 2022.
  • [28] J. L. C. B. de Farias and W. M. Bessa. Intelligent control with artificial neural networks for automated insulin delivery systems. Bioengineering, 9(11):664, 2022.
  • [29] A. Edwards, A. Peruffo, and A. Abate. A general verification framework for dynamical and control models via certificate synthesis, 2023.
  • [30] R. Ehlers. Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks. In Proc. 15th Int. Symp. on Automated Technology for Verification and Analysis (ATVA), pages 269–286, 2017.
  • [31] Y. Elboher, E. Cohen, and G. Katz. Neural Network Verification using Residual Reasoning. In Proc. 20th Int. Conf. on Software Engineering and Formal Methods (SEFM), pages 173–189, 2022.
  • [32] Y. Elboher, J. Gottschlich, and G. Katz. An Abstraction-Based Framework for Neural Network Verification. In Proc. 32nd Int. Conf. on Computer Aided Verification (CAV), pages 43–65, 2020.
  • [33] T. Eliyahu, Y. Kazak, G. Katz, and M. Schapira. Verifying learning-augmented systems. Proceedings of the 2021 ACM SIGCOMM 2021 Conference, 2021.
  • [34] J. F. Fisac, M. Chen, C. J. Tomlin, and S. S. Sastry. Reach-avoid problems with time-varying dynamics, targets and constraints. In Proceedings of the 18th international conference on hybrid systems: computation and control, pages 11–20, 2015.
  • [35] M. Ganai, Z. Gong, C. Yu, S. L. Herbert, and S. Gao. Iterative reachability estimation for safe reinforcement learning. In Advances in Neural Information Processing Systems, 2023.
  • [36] M. Ganai, C. Hirayama, Y.-C. Chang, and S. Gao. Learning stabilization control from observations by learning Lyapunov-like proxy models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 2913–2920, 2023.
  • [37] O. Gates, M. Newton, and K. Gatsis. Scalable forward reachability analysis of multi-agent systems with neural network controllers, 2023.
  • [38] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
  • [39] S. Govindaraju and D. Dill. Verification by approximate forward and backward reachability. In 1998 IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, pages 366–370, 1998.
  • [40] A. Gupta and I. Hwang. Safety verification of model based reinforcement learning controllers, 2020.
  • [41] W. Haddad and V. Chellaboina. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. 2008.
  • [42] G. W. Hill. Researches in the lunar theory. American Journal of Mathematics, 1(1):5–26, 1878.
  • [43] K.-C. Hsu, V. Rubies-Royo, C. J. Tomlin, and J. F. Fisac. Safety and liveness guarantees through reach-avoid reinforcement learning. In Proceedings of Robotics: Science and Systems, Virtual, July 2021.
  • [44] T. Huang, S. Gao, and L. Xie. A neural Lyapunov approach to transient stability assessment of power electronics-interfaced networked microgrids. IEEE Transactions on Smart Grid, 13(1):106–118, 2021.
  • [45] X. Huang, M. Kwiatkowska, S. Wang, and M. Wu. Safety Verification of Deep Neural Networks. In Proc. 29th Int. Conf. on Computer Aided Verification (CAV), pages 3–29, 2017.
  • [46] R. Ivanov, J. Weimer, R. Alur, G. J. Pappas, and I. Lee. Verisig: verifying safety properties of hybrid systems with neural network controllers, 2018.
  • [47] P. Jin, J. Tian, D. Zhi, X. Wen, and M. Zhang. Trainify: A CEGAR-driven training and verification framework for safe deep reinforcement learning. In Computer Aided Verification: 34th International Conference, CAV 2022, Haifa, Israel, August 7–10, 2022, Proceedings, Part I, pages 193–218, Berlin, Heidelberg, 2022. Springer-Verlag.
  • [48] K. D. Julian and M. J. Kochenderfer. A reachability method for verifying dynamical systems with deep neural network controllers, 2019.
  • [49] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. In Proc. 29th Int. Conf. on Computer Aided Verification (CAV), pages 97–117, 2017.
  • [50] G. Katz, D. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, D. Dill, M. Kochenderfer, and C. Barrett. The Marabou Framework for Verification and Analysis of Deep Neural Networks. In Proc. 31st Int. Conf. on Computer Aided Verification (CAV), pages 443–452, 2019.
  • [51] B. Könighofer, F. Lorber, N. Jansen, and R. Bloem. Shield Synthesis for Reinforcement Learning. In Proc. Int. Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), pages 290–306, 2020.
  • [52] L. Kuper, G. Katz, J. Gottschlich, K. Julian, C. Barrett, and M. Kochenderfer. Toward Scalable Verification for Safety-Critical Deep Networks, 2018. Technical Report. https://arxiv.org/abs/1801.05950.
  • [53] B. Li, S. Wen, Z. Yan, G. Wen, and T. Huang. A survey on the control Lyapunov function and control barrier function for nonlinear-affine control systems. IEEE/CAA Journal of Automatica Sinica, 10(3):584–602, 2023.
  • [54] Y. Li. Deep Reinforcement Learning: An Overview, 2017. Technical Report. http://arxiv.org/abs/1701.07274.
  • [55] A. Lomuscio and L. Maganti. An Approach to Reachability Analysis for Feed-Forward ReLU Neural Networks, 2017. Technical Report. http://arxiv.org/abs/1706.07351.
  • [56] U. Mandal, G. Amir, H. Wu, I. Daukantas, F. Newell, U. Ravaioli, B. Meng, M. Durling, M. Ganai, T. Shim, G. Katz, and C. Barrett. Formally Verifying Deep Reinforcement Learning Controllers with Lyapunov Barrier Certificates. In Proc. 24th Int. Conf. on Formal Methods in Computer-Aided Design (FMCAD), 2024.
  • [57] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • [58] M. Ostrovsky, C. Barrett, and G. Katz. An Abstraction-Refinement Approach to Verifying Convolutional Neural Networks. In Proc. 20th Int. Symposium on Automated Technology for Verification and Analysis (ATVA), pages 391–396, 2022.
  • [59] P. Prabhakar and Z. Afzal. Abstraction Based Output Range Analysis for Neural Networks, 2020. Technical Report. https://arxiv.org/abs/2007.09527.
  • [60] Z. Qin, T.-W. Weng, and S. Gao. Quantifying safety of learning-based self-driving control using almost-barrier functions. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 12903–12910. IEEE, 2022.
  • [61] Z. Qin, K. Zhang, Y. Chen, J. Chen, and C. Fan. Learning safe multi-agent control with decentralized neural barrier certificates. In ICLR, 2021.
  • [62] U. J. Ravaioli, J. Cunningham, J. McCarroll, V. Gangal, K. Dunlap, and K. L. Hobbs. Safe reinforcement learning benchmark environments for aerospace control systems. In 2022 IEEE Aerospace Conference (AERO), pages 1–20. IEEE, 2022.
  • [63] A. Rodriguez, G. Amir, D. Corsi, C. Sanchez, and G. Katz. Shield Synthesis for LTL Modulo Theories, 2024. Technical Report. http://arxiv.org/abs/2406.04184.
  • [64] L. H. Sena, I. V. Bessa, M. R. Gadelha, L. C. Cordeiro, and E. Mota. Incremental bounded model checking of artificial neural networks in CUDA. In 2019 IX Brazilian Symposium on Computing Systems Engineering (SBESC), pages 1–8, 2019.
  • [65] G. Singh, T. Gehr, M. Puschel, and M. Vechev. An Abstract Domain for Certifying Neural Networks. In Proc. 46th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL), 2019.
  • [66] O. So and C. Fan. Solving stabilize-avoid optimal control via epigraph form and deep reinforcement learning. In Proceedings of Robotics: Science and Systems, 2023.
  • [67] V. Talpaert, I. Sobh, B. R. Kiran, P. Mannion, S. Yogamani, A. El-Sallab, and P. Perez. Exploring applications of deep reinforcement learning for real-world autonomous driving systems, 2019.
  • [68] V. Tjeng, K. Xiao, and R. Tedrake. Evaluating Robustness of Neural Networks with Mixed Integer Programming. In Proc. 7th Int. Conf. on Learning Representations (ICLR), 2019.
  • [69] M. Tong, C. Dawson, and C. Fan. Enforcing safety for vision-based controllers via control barrier functions and neural radiance fields. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 10511–10517. IEEE, 2023.
  • [70] M. Usman, D. Gopinath, Y. Sun, Y. Noller, and C. Pǎsǎreanu. NNrepair: Constraint-based Repair of Neural Network Classifiers, 2021. Technical Report. http://arxiv.org/abs/2103.12535.
  • [71] H. Wu, O. Isac, A. Zeljić, T. Tagomori, M. Daggitt, W. Kokke, I. Refaeli, G. Amir, K. Julian, S. Bassan, et al. Marabou 2.0: A Versatile Formal Analyzer of Neural Networks. In Proc. 36th Int. Conf. on Computer Aided Verification (CAV), 2024.
  • [72] X. Xu, P. Tabuada, J. W. Grizzle, and A. D. Ames. Robustness of control barrier functions for safety critical control. Int. Federation of Automatic Control, 2015.
  • [73] Y. Yang, Y. Jiang, Y. Liu, J. Chen, and S. E. Li. Model-free safe reinforcement learning through neural barrier certificate. IEEE Robotics and Automation Letters, 2023.
  • [74] D. Yu, H. Ma, S. Li, and J. Chen. Reachability constrained reinforcement learning. In International Conference on Machine Learning, pages 25636–25655. PMLR, 2022.
  • [75] H. Yu, C. Hirayama, C. Yu, S. Herbert, and S. Gao. Sequential neural barriers for scalable dynamic obstacle avoidance. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11241–11248. IEEE, 2023.