Conformance Testing For Stochastic Cyber-Physical Systems

Xin Qin¹, Navid Hashemi¹, Lars Lindemann¹, and Jyotirmoy V. Deshmukh¹

¹ Thomas Lord Department of Computer Science, University of Southern California
arXiv:2308.06474v1 [eess.SY] 12 Aug 2023
August 15, 2023
Abstract
Conformance is defined as a measure of distance between the behaviors of two dynamical sys-
tems. The notion of conformance can accelerate system design when models of varying fidelities
are available on which analysis and control design can be done more efficiently. Ultimately, con-
formance can capture distance between design models and their real implementations and thus
aid in robust system design. In this paper, we are interested in the conformance of stochastic
dynamical systems. We argue that probabilistic reasoning over the distribution of distances be-
tween model trajectories is a good measure for stochastic conformance. Additionally, we propose
the non-conformance risk to reason about the risk of stochastic systems not being conformant.
We show that both notions have the desirable transference property, meaning that conformant
systems satisfy similar system specifications, i.e., if the first model satisfies a desirable specifi-
cation, the second model will satisfy (nearly) the same specification. Lastly, we propose how
stochastic conformance and the non-conformance risk can be estimated from data using statistical
tools such as conformal prediction. We present empirical evaluations of our method on an
F-16 aircraft, an autonomous vehicle, a spacecraft, and a Dubins vehicle.
1 Introduction
Cyber-physical systems (CPS) are usually designed using a model-based design (MBD) paradigm.
Here, the designer models the physical parts and the operating environment of the system and then
designs the software used for perception, planning, and low-level control. Such closed-loop systems
are then rigorously tested against various operating conditions, where the quality of the designed
software is evaluated against model properties such as formal design specifications (or other kinds of
quantitative objectives). Examples of such property-based analysis techniques include requirement
falsification [1–5], nondeterministic and statistical verification [6–13], and risk analysis [14, 15].
MBD is a fundamentally iterative process in which the designer continuously modifies the soft-
ware to tune performance or increase safety margins, or change plant models to perform design
space exploration [16], e.g., using model abstraction or simplification [17–19], or to incorporate
new data [20]. Any change to the system model, however, requires repeating the property-based
analyses as many times as the number of system properties. The fundamental problem that we
consider in this paper is that of conformance [21–25]. The notion of conformance is defined w.r.t.
the input-output behavior of a model. Typically, model inputs include exogenous disturbances or
user-inputs to the model, user-controllable design parameters, and initial operating conditions. For
a given input u, let y = S(u) denote the observable behavior of the model S. Furthermore, let
d(y1 , y2 ) be a metric defined over the space of the model behaviors. For deterministic models, two
models S1 , S2 are said to be δ-conformant if for all inputs u it holds that d(y1 , y2 ) < δ where
y1 = S1 (u) and y2 = S2 (u) [22, 23, 25]. This notion of deterministic conformance is useful to reason
about worst-case differences between models. However, most CPS applications use components that
exhibit stochastic behavior; for example, sensors have measurement noise, actuators can have man-
ufacturing variations, and most physical phenomena are inherently stochastic. The central question
that this paper considers is: What is the notion of conformance between two stochastic CPS models?
There are some challenges in comparing stochastic CPS models; even if two models are repeat-
edly excited by the same input, the pair of model behaviors that are observed may be different for
every such simulation. Thus, the observable behavior of a stochastic model is more accurately char-
acterized by a distribution over the space of trajectories. A possible way to compare two stochastic
models is to use measure-theoretic techniques to compare the distance between the trajectory dis-
tributions. A number of divergence measures such as the f-divergences, e.g., the Kullback-Leibler
divergence and the total variation distance, or the Wasserstein distance may look like candidate
tools to compare the trajectory distributions. However, we argue in this paper that a divergence is
not the right notion to use to compare stochastic CPS models. There can be two stochastic models
whose output trajectories are very close using any trajectory space metric, but the divergence be-
tween their trajectory distributions can be infinite. On the other hand, there can be two trajectory
distributions with zero divergence for which the distance between trajectories can be arbitrarily far
apart.
This raises an interesting question: how do we then compare two stochastic models? In this
paper, we argue that probabilistic bounds derived from the distribution of the distances between
model trajectories (excited by the same input) give us a general definition of conformance that
has several advantages, as outlined below. We complement this probabilistic viewpoint by using
risk measures [26] to capture the risk that the distance between model trajectories is large.
First, we show that two stochastic systems that are conformant under our definition inherit the
property of transference [22]. In simple terms, transference is the property that if the first model has
certain logical or quantitative properties, then the second model also satisfies the same (or nearly
same) properties. This property brings several benefits. Consider the scenario where probabilistic
guarantees that a model has certain quantitative properties have been established after an exten-
sive and large number of simulations. Ordinarily, if there were any changes made to the model,
establishing such probabilistic guarantees would require repeating the extensive simulation-based
procedure. However, transference allows us to potentially sample from existing simulations for the
first model and sample a small number of simulations from the modified model to establish stochas-
tic conformance between the models, thereby allowing us to establish probabilistic guarantees on the
second model. We demonstrate examples of such transference w.r.t. quantitative properties arising
from quantitative semantics of temporal logic specifications and control-theoretic cost functions.
Next, we show how we can efficiently compute these probabilistic bounds using the notion of
conformal prediction [27, 28] from statistical learning theory. At a high-level, conformal prediction
involves computing quantiles of the empirical distribution of non-conformity scores over a validation
dataset to obtain prediction intervals at a given confidence threshold.
The contributions of this paper are summarized as follows:
• We define stochastic conformance as a probabilistic bound over the distribution of distances
between model trajectories. We also define the non-conformance risk to detect systems that
are at risk of not being conformant.
• We show that both notions have the desirable transference property, meaning that conformant
systems satisfy similar system specifications.
• We show how stochastic conformance and the non-conformance risk can be estimated using
statistical tools from risk theory and conformal prediction.
2 Problem Statement and Preliminaries

Consider the probability space (Ω, F, P ) where Ω is the sample space, F is a σ-algebra¹ on Ω,
and P : F → [0, 1] is a probability measure. In this paper, our goal is to quantify conformance
of stochastic systems, i.e., systems whose inputs and outputs form a probability space with an
appropriately defined measure. Let the two stochastic systems be denoted by S1 and S2 . The
inputs and outputs of stochastic systems are signals, i.e., functions from a bounded interval of
nonnegative reals known as the time domain T ⊆ R≥0 to a metric space, e.g., Euclidean space with
the standard Euclidean metric. Each stochastic system Si then describes an input-output relation
Si : U × Ω → Y where U and Y denote the sets of all input and output signals. We allow input
signals² to be stochastic, and we use the notation U : T × Ω → Rm to denote a stochastic input
signal.³ Modeling stochastic
systems this way provides great flexibility, and Si can e.g., describe the motion of stochastic hybrid
systems, Markov chains, and stochastic difference equations.
Assume now that we apply the input signal U : T × Ω → Rm to systems S1 and S2 , and let
the resulting output signals be denoted by Y1 : T × Ω → Rn and Y2 : T × Ω → Rn , respectively.
We assume that the functions S1 , S2 , and U are measurable so that the output signals Y1 and
Y2 are well-defined stochastic signals. One can hence think of Y1 and Y2 to be drawn from the
distributions D1 and D2 , respectively, which are functions of the probability space (Ω, F, P ) as well
as the functions S1 , S2 , and U . In this paper, we make no restricting assumptions on the functions
S1 , S2 , and U , and consequently we make no assumptions on the distributions D1 and D2 .
Informal Problem Statement. Let Y1 and Y2 be stochastic output signals of the stochastic
systems S1 and S2 , respectively, under the stochastic input signal U . How can we quantify closeness
of the stochastic systems S1 and S2 under U ? To answer this question, we will explore different
ways of defining system “closeness” of Y1 and Y2 , and we will present algorithms to compute these
stochastic notions of closeness. A subsequent problem that we consider is related to transference of
properties from one system to another system. Particularly, given a signal temporal logic specifica-
tion, can we infer guarantees about the satisfaction of the specification of one system from another
system if the systems are close under a suitable definition of closeness?
2.1 Distance Metrics and Risk Measures

To define a general framework for quantifying closeness of stochastic systems, we will use i) different
signal metrics to capture the distance between individual realizations y1 := Y1 (·, ω) and y2 :=
Y2 (·, ω) of the stochastic signals Y1 and Y2 where ω ∈ Ω is a single outcome, and ii) probabilistic
reasoning and risk measures to capture stochastic conformance and non-conformance, respectively,
under these signal metrics.
¹ A σ-algebra on a set Ω is a nonempty collection of subsets of Ω closed under complement, countable unions, and countable intersections.
² Probability spaces over signals are defined by standard notions of cylinder sets [6].
³ Instead of the probability measure P defined over (Ω, F), we will more generally use the notation Prob so as to be independent of the underlying probability space that we induce, e.g., as a result of transformations via U.
We first equip the set of output signals Y with a function d : Y × Y → R that quantifies distance
between signals. A natural choice of d is a signal metric that results in a metric space (Y, d). We
use general signal metrics such as the metric induced by the Lp signal norm for p ≥ 1. Particularly,
define
$$d_p(y_1, y_2) := \Big( \int_T \|y_1(t) - y_2(t)\|^p \, dt \Big)^{1/p}$$
so that the L∞ norm can also be expressed as
$$d_\infty(y_1, y_2) := \sup_{t \in T} \|y_1(t) - y_2(t)\|.$$
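For sampled signals, these metrics can be approximated numerically. The following is a minimal sketch, assuming uniformly sampled signals with step `dt` and a Riemann-sum approximation of the integral; the function names and the two example signals are hypothetical and not part of the paper.

```python
import math

def pointwise_dist(y1, y2):
    """Euclidean distance between two sampled signals at each sample time."""
    return [math.dist(a, b) for a, b in zip(y1, y2)]

def d_p(y1, y2, dt, p=2.0):
    """Riemann-sum approximation of the L_p signal metric (uniform step dt assumed)."""
    return (sum(e ** p for e in pointwise_dist(y1, y2)) * dt) ** (1.0 / p)

def d_inf(y1, y2):
    """Sup-norm metric: largest pointwise distance over the sampled time domain."""
    return max(pointwise_dist(y1, y2))

# Two hypothetical 2-D signals on T = [0, 1], sampled every dt = 0.01.
dt = 0.01
times = [k * dt for k in range(101)]
y1 = [(math.sin(t), math.cos(t)) for t in times]
y2 = [(a + 0.1, b + 0.1) for a, b in y1]  # constant offset of 0.1 per coordinate

print(d_p(y1, y2, dt, p=1.0), d_p(y1, y2, dt, p=2.0), d_inf(y1, y2))
```

With a constant offset, every pointwise distance equals 0.1·√2, so the sup-norm distance is that value and the L1 distance is that value times the (discretized) length of the time domain.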
It is now easy to see that a signal metric d(Y1 , Y2 ) evaluated over the stochastic signals Y1
and Y2 results in a distribution over distances between realizations of Y1 and Y2 . To reason over
properties of d(Y1 , Y2 ), we will use probabilistic reasoning but we will also consider risk measures [26]
as introduced next.
A risk measure is a function R : F(Ω, R) → R that maps the set of real-valued random variables
F(Ω, R) to the real numbers. Typically, the input of R indicates a cost. There exist various risk
measures that capture different characteristic of the distribution of the cost random variable, such
as the mean or the variance. However, we are particularly interested in tail risk measures that
capture the right tail of the cost distribution, i.e., the potentially rare but costly outcomes.
In this paper, we particularly consider the value-at-risk VaR_β and the conditional value-at-risk
CVaR_β at risk level β ∈ (0, 1). The VaR_β of a random variable Z : Ω → R is defined as
$$VaR_\beta(Z) := \inf\{\alpha \in \mathbb{R} \mid \mathrm{Prob}(Z \le \alpha) \ge \beta\},$$
i.e., VaR_β(Z) captures the 1 − β quantile of the distribution of Z from the right. Note that there is an
obvious connection between value-at-risk and chance constraints, i.e., it holds that Prob(Z ≤ α) ≥ β
is equivalent to VaR_β(Z) ≤ α. The CVaR_β of Z, on the other hand, is defined as
$$CVaR_\beta(Z) := \inf_{\alpha \in \mathbb{R}} \Big( \alpha + (1 - \beta)^{-1} E([Z - \alpha]^+) \Big)$$
where [Z − α]^+ := max(Z − α, 0) and E(·) indicates the expected value. When the function
Prob(Z ≤ α) is continuous (in α), it holds that CVaR_β(Z) = E(Z | Z ≥ VaR_β(Z)), i.e., CVaR_β(Z)
is the expected value of Z conditioned on the outcomes where Z is greater than or equal to VaR_β(Z).
Finally, note that it holds that VaR_β(Z) ≤ CVaR_β(Z), i.e., CVaR_β is more risk sensitive.
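Both risk measures can be estimated from samples of the cost. The sketch below is an illustration under the assumption that the empirical distribution of the samples stands in for the true one; the function names and the cost samples are hypothetical. It uses the fact that the convex, piecewise-linear objective in the variational definition of the conditional value-at-risk is minimized at the empirical value-at-risk.

```python
import math

def var(samples, beta):
    """Empirical VaR_beta: smallest sample alpha with empirical Prob(Z <= alpha) >= beta."""
    z = sorted(samples)
    k = math.ceil(beta * len(z))  # at least k of the n samples must lie below alpha
    return z[max(k, 1) - 1]

def cvar(samples, beta):
    """Empirical CVaR_beta via alpha + E[(Z - alpha)^+] / (1 - beta), evaluated at alpha = VaR_beta."""
    alpha = var(samples, beta)
    excess = sum(max(z - alpha, 0.0) for z in samples) / len(samples)
    return alpha + excess / (1.0 - beta)

costs = [float(i) for i in range(1, 11)]  # hypothetical cost samples 1, 2, ..., 10
print(var(costs, 0.8), cvar(costs, 0.8))
```

For these samples and β = 0.8, the value-at-risk is the eighth-smallest sample, and the conditional value-at-risk coincides with the average of the two largest samples, which illustrates VaR_β(Z) ≤ CVaR_β(Z).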
For our risk transference results that we present later, we will require that R is monotone,
positive homogeneous, and subadditive:
• For two random variables Z, Z ′ ∈ F(Ω, R), the risk measure R is monotone if Z(ω) ≤ Z ′ (ω) for
all ω ∈ Ω implies that R(Z) ≤ R(Z ′ ).
• For a random variable Z ∈ F(Ω, R), the risk measure R is positive homogeneous if, for any
constant H ≥ 0, it holds that R(HZ) = HR(Z).
• For two random variables Z, Z ′ ∈ F(Ω, R), the risk measure R is subadditive if R(Z + Z ′ ) ≤
R(Z) + R(Z ′ ).
We remark that CVaR_β satisfies all three properties [26]. The VaR_β is monotone and positive homogeneous, but it is not subadditive in general.
2.2 System specifications

To express specifications, we use Signal Temporal Logic (STL). Let y : T → Rn be a deterministic
signal, e.g., a realization of the stochastic signal Y . The atomic elements of STL are predicates
that are functions µ : Rn → {True, False}. For convenience, the predicate µ is often defined via a
predicate function h : Rn → R as µ(y(t)) := True if h(y(t)) ≥ 0 and µ(y(t)) := False otherwise.
The syntax of STL is recursively defined as
ϕ ::= True | µ | ¬ϕ′ | ϕ′ ∧ ϕ′′ | ϕ′ UI ϕ′′ (1)
where ϕ′ and ϕ′′ are STL formulas. The Boolean operators ¬ and ∧ encode negations (“not”) and
conjunctions (“and”), respectively. The until operator ϕ′ UI ϕ′′ encodes that ϕ′ has to be true until
ϕ′′ becomes true at some future time within the time interval I ⊆ R≥0 . We derive the operators for
disjunction (ϕ′ ∨ ϕ′′ := ¬(¬ϕ′ ∧ ¬ϕ′′ )), eventually (FI ϕ := ⊤UI ϕ), and always (GI ϕ := ¬FI ¬ϕ).
To determine if a signal y satisfies an STL formula ϕ that is imposed at time t, we can define
the semantics as a relation |=, i.e., (y, t) |= ϕ means that ϕ is satisfied. While the STL semantics
are fairly standard [29], we recall them in Appendix A. Additionally, we can define robust semantics
ρϕ (y, t) ∈ R that indicate how robustly the formula ϕ is satisfied or violated [30,31], see Appendix A.
Larger and positive values of ρϕ (y, t) hence indicate that the specification is satisfied more robustly.
Importantly, it holds that (y, t) |= ϕ if ρϕ (y, t) > 0.
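To make the robust semantics concrete, the following sketch evaluates them for the always and eventually operators over a single predicate on a sampled scalar signal. It assumes the standard quantitative semantics (predicate robustness given by h, minimum for always, maximum for eventually), discrete sample indices as time, and truncation of the window at the end of the signal; all names and the example signal are hypothetical.

```python
def rho_always(h, y, k, a, b):
    """Robustness of G_[a,b] mu at sample k: worst-case predicate value in the window."""
    return min(h(y[j]) for j in range(k + a, min(k + b, len(y) - 1) + 1))

def rho_eventually(h, y, k, a, b):
    """Robustness of F_[a,b] mu at sample k: best-case predicate value in the window."""
    return max(h(y[j]) for j in range(k + a, min(k + b, len(y) - 1) + 1))

y = [0.5, 0.8, 1.2, 0.9, 0.3]      # hypothetical scalar signal


def below_one(v):
    """Predicate function h for the predicate 'y <= 1'."""
    return 1.0 - v


def above_one(v):
    """Predicate function h for the predicate 'y >= 1'."""
    return v - 1.0


r_always = rho_always(below_one, y, 0, 0, 4)          # negative: 'always y <= 1' is violated
r_eventually = rho_eventually(above_one, y, 0, 0, 4)  # positive: 'eventually y >= 1' is satisfied
print(r_always, r_eventually)
```

The two values are negatives of each other, which reflects the derived-operator identity G ϕ = ¬F ¬ϕ from the syntax above.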
3 Conformance for Stochastic Input-Output Systems

Our goal is now to quantify closeness of two stochastic systems S1 and S2 under the input U . We
present our definitions for stochastic conformance and non-conformance risk upfront, and provide
motivation for these afterwards.
Definition 1. Let U : T × Ω → Rm be a stochastic input signal, S1 , S2 : U × Ω → Y be stochastic
systems, and Y1 , Y2 : T × Ω → Rn be stochastic output signals with Y1 := S1 (U, ·) and Y2 := S2 (U, ·).
Further, let ϵ ∈ R be a conformance threshold, δ ∈ (0, 1) be a failure probability, and d : Y × Y → R
be a signal metric. Then, we say that the systems S1 and S2 under the input U are (ϵ, δ)-conformant
if
Prob(d(Y1 , Y2 ) ≤ ϵ) ≥ 1 − δ. (2)
Additionally, let R : F(Ω, R) → R be a risk measure and r ∈ R be a risk threshold. Then, S1 and S2
under the input U are at risk of being r-non-conformant if
R(d(Y1 , Y2 )) > r. (3)
Eq. (2) is referred to as stochastic conformance and Eq. (3) as non-conformance risk. Let us
now motivate and discuss these two definitions. While the definition of conformance in equation
(2) appears natural at first sight, there are at least two competing ways of defining stochastic
conformance. First, as Y1 and Y2 are distributions, it would be possible to define conformance as
D(Y1 , Y2 ) where D is a distance function that measures the difference between two distributions, such
as a divergence (Kullback–Leibler or f -divergence). However, our definition provides an intuitive
interpretation in the signal space where system specifications are typically defined, while it is usually
difficult to provide such an interpretation for divergences between distributions. Additionally, the
divergence between Y1 and Y2 may be unbounded (or zero) even when equation (2) holds (does not
hold).
Proposition 1. There exist stochastic systems S1 and S2 and distance metrics d where equation (2):
i) is satisfied for ϵ > 0 and δ = 0, i.e., w.p. 1, but where the divergence between the systems is
unbounded, and ii) is not satisfied for any given ϵ > 0 and δ ∈ (0, 1), but where the divergence
between the systems is zero.
Proof. Let us first prove i). For simplicity, consider systems S1 and S2 where the stochastic input
and output signals are defined over the time domain T := {t0 , . . . , tT }. Further, for all t ∈ T let
y1 (t) := 0 and y2 (t) := ϵ. Clearly, equation (2) is satisfied, e.g., for d∞ . The distributions D1 and
D2 (joint distributions of Y1 (t) and Y2 (t), respectively) are Dirac distributions centered at 0 and ϵ,
respectively. The Kullback–Leibler divergence between these two distributions is ∞ [32].
Let us now prove ii). Let T consist of a single time point for simplicity so that Y1 and Y2 are
random variables defined over a sample space R. Let D1 and D2 be the same uniform distribution
over [0, a], and let Y1 and Y2 be independent. Clearly, the divergence between D1 and D2 is zero.
We know that the distribution of Y := Y1 − Y2 has support on [−a, a], and that the probability
density function of Y is $p(y) := 1/a - |y|/a^2$. We can now compute
$\mathrm{Prob}(|Y| \le \epsilon) = 2\epsilon/a - \epsilon^2/a^2$ for ϵ ≤ a. Given ϵ > 0 and δ ∈ (0, 1), we
pick a δ̄ ∈ (0, 1) such that δ̄ > δ. We then solve the quadratic equation
$2\epsilon/a - \epsilon^2/a^2 = 1 - \bar\delta$ subject to the constraint that ϵ ≤ a, which gives
$a = \epsilon/(1 - \sqrt{\bar\delta}) = \epsilon(1 + \sqrt{\bar\delta})/(1 - \bar\delta)$. Consequently, we find
that any $a \ge \epsilon(1 + \sqrt{\bar\delta})/(1 - \bar\delta)$ results in
$\mathrm{Prob}(|Y| \le \epsilon) \le 1 - \bar\delta < 1 - \delta$ so that (2) is not satisfied.
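The construction in part ii) can be checked numerically. The following sketch assumes independent Y1 and Y2, the scalar distance |Y1 − Y2|, and illustrative values for ϵ, δ, and δ̄; it compares the closed-form probability with a Monte Carlo estimate. The helper names are hypothetical.

```python
import math
import random

def prob_close(eps, a):
    """Closed form of Prob(|Y1 - Y2| <= eps) for independent Y1, Y2 ~ Uniform[0, a], eps <= a."""
    return 2 * eps / a - (eps / a) ** 2

def min_width(eps, delta_bar):
    """Smallest interval width a for which Prob(|Y1 - Y2| <= eps) <= 1 - delta_bar."""
    return eps * (1 + math.sqrt(delta_bar)) / (1 - delta_bar)

eps, delta, delta_bar = 1.0, 0.1, 0.2   # illustrative values with delta_bar > delta
a = min_width(eps, delta_bar)

rng = random.Random(0)                  # fixed seed for reproducibility
n = 100_000
hits = sum(abs(rng.uniform(0, a) - rng.uniform(0, a)) <= eps for _ in range(n))
estimate = hits / n

print(prob_close(eps, a), estimate, 1 - delta)
```

At the boundary value of a, the closed form equals 1 − δ̄, which lies below 1 − δ, so the two identically distributed systems are not (ϵ, δ)-conformant even though their divergence is zero.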
Another way of defining stochastic conformance was presented in [33] where the authors consider
a task-specific definition of stochastic conformance where satisfaction probabilities are required
to be approximately equal. In other words, two stochastic systems are called c-approximately
probabilistically conformant if |Prob((Y1 , τ ) |= ϕ) − Prob((Y2 , τ ) |= ϕ)| ≤ c. In this definition, it
may happen that two systems are c-approximately probabilistically conformant for a small value of
c, while the systems produce completely different behaviors and individual realizations y1 and y2
are vastly different. In addition to not being task-specific, our definition covers the risk of being
r-non-conformant in equation (3).
Finally, we would like to remark that the definition of conformance in equation (2) is related
to the definition of non-conformance risk in equation (3). In fact, when the risk measure R is the
value-at-risk VaR_β, then we know that
$$VaR_\beta(d(Y_1, Y_2)) > r \;\Leftrightarrow\; \mathrm{Prob}(d(Y_1, Y_2) \le r) < \beta$$
since VaR_β(d(Y1, Y2)) ≤ r is equivalent to Prob(d(Y1, Y2) ≤ r) ≥ β according to Section 2.
Consequently, if β := 1 − δ and r := ϵ then VaR_β(d(Y1, Y2)) > r implies that the systems S1 and S2
under U are not (ϵ, δ)-conformant.
The notion of conformance in Definition 1 is useful when the input U describes internal inputs
such as system parameters (e.g., an unknown mass), exogenous disturbances from known sources, or
initial system conditions. In other words, the distribution of U is known, making U a known unknown.
However, in the case of external inputs that could be manipulated (e.g., user inputs that represent rare
malicious attacks), the input U may be unknown, making U an unknown unknown. We therefore
provide an alternative definition of conformance.
Definition 2. Let U ∈ U be an unknown deterministic input signal, S1, S2 : U × Ω → Y be stochastic
systems, and Y1, Y2 : T × Ω → Rn be stochastic output signals with Y1 := S1(U, ·) and Y2 := S2(U, ·).
Further, let ϵ ∈ R be a conformance threshold, δ ∈ (0, 1) be a failure probability, and d : Y × Y → R
be a signal metric. Then, we say that the systems S1 and S2 are (ϵ, δ)-conformant if
$$\mathrm{Prob}\Big( \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \epsilon \Big) \ge 1 - \delta. \tag{4}$$
Additionally, let R : F(Ω, R) → R be a risk measure and r ∈ R be a risk threshold. Then, we say
that the systems S1 and S2 under the input U are at risk of being r-non-conformant if
$$R\Big( \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \Big) > r. \tag{5}$$
Based on this definition, note that it will be inherently more difficult to verify Definition 2
compared to Definition 1 due to the sup-operator.
4 Transference of System Properties under Conformance

We expect two systems S1 and S2 that are (ϵ, δ)-conformant in the sense of Definitions 1 and 2 to
have similar behaviors with respect to satisfying a given system specification. Therefore, we will
define the notion of transference with respect to a performance function C : Y → R that measures
how well a signal y ∈ Y satisfies this system specification. Towards capturing similarity between S1
and S2 with respect to C, the signal metric d has to be chosen carefully.
Definition 3. Let d : Y × Y → R be a signal metric and C : Y → R be a performance function. Then,
we say that C is Hölder continuous w.r.t. d if there exist constants H, γ > 0 such that, for any two
signals y1, y2 : T → Rn, it holds that
$$|C(y_1) - C(y_2)| \le H \, d(y_1, y_2)^\gamma. \tag{6}$$
A specific example of the performance function C is the robust semantics ρϕ of an STL spec-
ification ϕ. In fact, the robust semantics ρϕ are Hölder continuous w.r.t. the sup-norm d∞ for
constants H = 1 and γ = 1 [34, Lemma 2]. For the convenience of the reader, we state the proof
of [34, Lemma 2] with the notation used in this paper in Appendix B. The robust semantics are also
Hölder continuous w.r.t. the Skorokhod metric, see Appendix C. A commonly used performance
function in control is $C(y) = \int_0^T y(t)^\top y(t) \, dt$, and we note that this choice of C is Hölder continuous
w.r.t. d1 as shown in Appendix D. Finally, note that the Hölder continuity condition in equation
(6) implies that, for any constants c, ϵ ∈ R, it holds that
$$\big( C(y_1) \ge c \,\wedge\, d(y_1, y_2) \le \epsilon \big) \;\Rightarrow\; C(y_2) \ge c - H\epsilon^\gamma. \tag{7}$$
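The Hölder condition (6) and its consequence (7) can be sanity-checked on sampled signals. The sketch below is only an illustration: it assumes a scalar signal, takes C to be the robustness of the specification "always y ≥ 0" (the minimum of the signal), and uses the sup-norm metric with H = 1 and γ = 1; the random signals, the constants c and ϵ, and all names are hypothetical.

```python
import random

def C(y):
    """Robustness of 'always y >= 0' on a sampled scalar signal: the minimum value."""
    return min(y)

def d_inf(y1, y2):
    """Sup-norm distance between two sampled scalar signals."""
    return max(abs(a - b) for a, b in zip(y1, y2))

rng = random.Random(1)          # fixed seed for reproducibility
H, gamma = 1.0, 1.0             # Hölder constants assumed for this choice of C and d
c, eps = -0.9, 0.15             # illustrative constants for implication (7)

worst_gap = float("-inf")       # largest value of |C(y1) - C(y2)| - H d(y1, y2)^gamma
violations = 0                  # number of sampled pairs that contradict (7)
for _ in range(1000):
    y1 = [rng.uniform(-1, 1) for _ in range(50)]
    y2 = [v + rng.uniform(-0.2, 0.2) for v in y1]
    dist = d_inf(y1, y2)
    worst_gap = max(worst_gap, abs(C(y1) - C(y2)) - H * dist ** gamma)
    if C(y1) >= c and dist <= eps and C(y2) < c - H * eps ** gamma:
        violations += 1

print(worst_gap <= 0.0, violations)
```

A non-positive `worst_gap` and zero `violations` are consistent with (6) and (7) on the sampled pairs; this is a numerical check, not a proof.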
4.1 Transference under stochastic conformance

With the definition of C being Hölder continuous w.r.t. d, we can now derive a stochastic transfer-
ence result under stochastic conformance as per Definition 1.
Theorem 1. Let the premises in Definitions 1 and 3 hold. Further, let the systems S1 and S2 under
the input U be (ϵ, δ)-conformant so that equation (2) holds and let C be Hölder continuous w.r.t. d
so that equation (6) holds. Then, it holds that
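$$\mathrm{Prob}\big( C(Y_1) \ge c \big) \ge 1 - \bar\delta \;\Rightarrow\; \mathrm{Prob}\big( C(Y_2) \ge c - H\epsilon^\gamma \big) \ge 1 - \delta - \bar\delta.$$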

Proof. By assumption, it holds that Prob(d(Y1, Y2) ≤ ϵ) ≥ 1 − δ and Prob(C(Y1) ≥ c) ≥ 1 − δ̄ so
that we know that Prob(d(Y1, Y2) > ϵ) ≤ δ and Prob(C(Y1) < c) ≤ δ̄. We can now apply the union
bound over these two events so that
$$\mathrm{Prob}\big( d(Y_1, Y_2) > \epsilon \,\vee\, C(Y_1) < c \big) \le \delta + \bar\delta.$$

From here, we can simply see that
$$\mathrm{Prob}\big( d(Y_1, Y_2) \le \epsilon \,\wedge\, C(Y_1) \ge c \big) \ge 1 - \delta - \bar\delta.$$
Since C is Hölder continuous w.r.t. d, which implies that equation (7) holds, it is easy to conclude
that Prob(C(Y2) ≥ c − Hϵ^γ) ≥ 1 − δ − δ̄.
Theorem 1 tells us that i) (ϵ, δ)-conformance of systems S1 and S2 under U , and ii) Hölder
continuity of the performance function C w.r.t. the metric d enables us to derive a probabilistic
lower bound for the performance of system S2 w.r.t. C from the performance of system S1 .
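As a purely illustrative instance of Theorem 1, suppose that C is the robust semantics of an STL specification ϕ and d is the sup-norm metric, so that H = 1 and γ = 1. The numerical values ϵ = 0.1, δ = 0.05, δ̄ = 0.01, and c = 0.5 below are assumptions chosen for the example and are not taken from the case studies.

```latex
\[
\mathrm{Prob}\big(\rho^{\phi}(Y_1) \ge 0.5\big) \ge 1 - 0.01
\;\Longrightarrow\;
\mathrm{Prob}\big(\rho^{\phi}(Y_2) \ge 0.5 - 1 \cdot 0.1^{1}\big)
= \mathrm{Prob}\big(\rho^{\phi}(Y_2) \ge 0.4\big)
\ge 1 - 0.05 - 0.01 = 0.94 .
\]
```

In words, under these assumed values the robustness margin degrades by at most Hϵ^γ = 0.1, and the two failure probabilities add up.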
We can derive a transference result similar to Theorem 1 when we assume that the systems S1
and S2 are (ϵ, δ)-conformant in the sense of Definition 2 instead of Definition 1.
Theorem 2. Let the premises in Definitions 2 and 3 hold. Further, let the systems S1 and S2 be
(ϵ, δ)-conformant so that equation (4) holds and let C be Hölder continuous w.r.t. d so that equation
(6) holds. Then, it holds that
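$$\mathrm{Prob}\Big( \inf_{U \in \mathcal{U}} C(Y_1) \ge c \Big) \ge 1 - \bar\delta \;\Rightarrow\; \mathrm{Prob}\Big( \inf_{U \in \mathcal{U}} C(Y_2) \ge c - H\epsilon^\gamma \Big) \ge 1 - \delta - \bar\delta.$$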
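Proof. By assumption, it holds that $\mathrm{Prob}(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \epsilon) \ge 1 - \delta$ and
$\mathrm{Prob}(\inf_{U \in \mathcal{U}} C(Y_1) \ge c) \ge 1 - \bar\delta$ so that we know that
$\mathrm{Prob}(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) > \epsilon) \le \delta$ and $\mathrm{Prob}(\inf_{U \in \mathcal{U}} C(Y_1) < c) \le \bar\delta$.
We can now apply the union bound over these two events so that
$$\mathrm{Prob}\Big( \sup_{U \in \mathcal{U}} d(Y_1, Y_2) > \epsilon \,\vee\, \inf_{U \in \mathcal{U}} C(Y_1) < c \Big) \le \delta + \bar\delta.$$
From here, we can simply see that
$$\mathrm{Prob}\Big( \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \epsilon \,\wedge\, \inf_{U \in \mathcal{U}} C(Y_1) \ge c \Big) \ge 1 - \delta - \bar\delta.$$
This equation tells us that, for each U ∈ U, we have
$$\mathrm{Prob}\big( d(Y_1, Y_2) \le \epsilon \,\wedge\, C(Y_1) \ge c \big) \ge 1 - \delta - \bar\delta.$$
Since C is Hölder continuous w.r.t. d, we know that equation (6) holds for each U ∈ U. Consequently,
we can conclude that $\mathrm{Prob}(\inf_{U \in \mathcal{U}} C(Y_2) \ge c - H\epsilon^\gamma) \ge 1 - \delta - \bar\delta$.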
4.2 Transference under non-conformance risk

On the other hand, by considering the notion of r-non-conformance risk, we expect that two systems
S1 and S2 that are not at risk of being r-non-conformant have a similar risk of violating a
specification. Here, we define the risk of violating a specification by following ideas from [14] as
R(−C(Y1)) and R(−C(Y2)).
Theorem 3. Let the premises in Definitions 1 and 3 hold. Further, let the systems S1 and S2
under the input U not be at risk of being r-non-conformant so that equation (3) does not hold (i.e.,
R(d(Y1 , Y2 )) ≤ r) and let C be Hölder continuous w.r.t. d with γ = 1 so that equation (6) holds. If
the risk measure R is monotone, positive homogeneous, and subadditive, it holds that
R(−C(Y2 )) ≤ R(−C(Y1 )) + Hr.
Proof. We can derive the following chain of inequalities
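$$\begin{aligned}
R(-C(Y_2)) &\overset{(a)}{\le} R(-C(Y_1) + H d(Y_1, Y_2)) \\
&\overset{(b)}{\le} R(-C(Y_1)) + R(H d(Y_1, Y_2)) \\
&\overset{(c)}{=} R(-C(Y_1)) + H R(d(Y_1, Y_2)) \\
&\overset{(d)}{\le} R(-C(Y_1)) + H r
\end{aligned}$$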
where (a) follows since C is Hölder continuous w.r.t. d and since R is monotone, (b) follows since
R is subadditive, and (c) follows since R is positive homogeneous, while the inequality (d) follows
since S1 and S2 under U are not at risk of being r-non-conformant, i.e., R(d(Y1 , Y2 )) ≤ r.
This result implies that the risk of system S2 w.r.t. the performance function C is upper bounded
by the risk of system S1 w.r.t. C plus the margin Hr if the systems S1 and S2 are not at risk of being r-non-conformant.
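The bound of Theorem 3 can be illustrated on paired samples. The sketch below assumes that R is the empirical conditional value-at-risk (which is monotone, positive homogeneous, and subadditive on the common sample space of paired draws), that C is the minimum of a scalar signal, and that d is the sup-norm metric so that H = 1 and γ = 1. The two toy systems and all names are hypothetical.

```python
import math
import random

def cvar(samples, beta):
    """Empirical CVaR_beta: variational objective evaluated at the empirical VaR_beta."""
    z = sorted(samples)
    alpha = z[max(math.ceil(beta * len(z)), 1) - 1]
    return alpha + sum(max(v - alpha, 0.0) for v in z) / (len(z) * (1.0 - beta))

rng = random.Random(2)          # fixed seed for reproducibility
beta, H = 0.9, 1.0              # risk level and Hölder constant (gamma = 1 assumed)

neg_c1, neg_c2, dist = [], [], []
for _ in range(2000):
    # Paired realizations: y2 is a perturbed copy of y1 (same underlying outcome).
    y1 = [rng.gauss(1.0, 0.3) for _ in range(20)]
    y2 = [v + rng.gauss(0.0, 0.05) for v in y1]
    neg_c1.append(-min(y1))                                  # -C(Y1)
    neg_c2.append(-min(y2))                                  # -C(Y2)
    dist.append(max(abs(a - b) for a, b in zip(y1, y2)))     # d(Y1, Y2)

r = cvar(dist, beta)            # tightest r such that the systems are not r-non-conformant
lhs = cvar(neg_c2, beta)        # R(-C(Y2))
rhs = cvar(neg_c1, beta) + H * r
print(lhs, rhs, lhs <= rhs)
```

Here r is set to the estimated risk of the distance itself, which is the smallest threshold for which equation (3) does not hold for the empirical distribution.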
We remark that a similar result appeared in our prior work [34]. Here, we present these results
in the more general context of conformance and extend the result as we use general performance
functions C, which additionally requires R to be positive homogeneous. Additionally, we derive a
transference result similar to Theorem 3 when we assume that the systems S1 and S2 are not at
risk of being r-non-conformant in the sense of Definition 2 instead of Definition 1.
Theorem 4. Let the premises in Definitions 2 and 3 hold. Further, let the systems S1 and S2 not be
at risk of being r-non-conformant so that equation (5) does not hold (i.e., R(supU ∈U d(Y1 , Y2 )) ≤ r)
and let C be Hölder continuous w.r.t. d with γ = 1 so that equation (6) holds. If the risk measure
R is monotone, positive homogeneous, and subadditive, it holds that
$$R\Big( -\inf_{U \in \mathcal{U}} C(Y_2) \Big) \le R\Big( -\inf_{U \in \mathcal{U}} C(Y_1) \Big) + H r.$$
Proof. We can derive the following chain of inequalities

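$$\begin{aligned}
R\Big( -\inf_{U \in \mathcal{U}} C(Y_2) \Big) &\overset{(a)}{\le} R\Big( -\inf_{U \in \mathcal{U}} C(Y_1) + H \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \Big) \\
&\overset{(b)}{\le} R\Big( -\inf_{U \in \mathcal{U}} C(Y_1) \Big) + R\Big( H \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \Big) \\
&\overset{(c)}{=} R\Big( -\inf_{U \in \mathcal{U}} C(Y_1) \Big) + H R\Big( \sup_{U \in \mathcal{U}} d(Y_1, Y_2) \Big) \\
&\overset{(d)}{\le} R\Big( -\inf_{U \in \mathcal{U}} C(Y_1) \Big) + H r
\end{aligned}$$
where (a) follows since $-\inf_{U \in \mathcal{U}} C(Y_2) = \sup_{U \in \mathcal{U}} -C(Y_2)$, since C is Hölder continuous w.r.t. d,
and since R is monotone, (b) follows since R is subadditive, and (c) follows since R is positive
homogeneous. The inequality (d) follows since S1 and S2 are not at risk of being r-non-conformant,
i.e., $R(\sup_{U \in \mathcal{U}} d(Y_1, Y_2)) \le r$.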
5 Statistical Estimation of Stochastic Conformance
We propose algorithms to compute stochastic conformance and the non-conformance risk. In prac-
tice, note that one will be limited to discrete-time stochastic systems to apply these algorithms.
5.1 Estimating stochastic conformance

To estimate stochastic conformance, we use conformal prediction which is a statistical tool intro-
duced in [28, 35] to obtain valid uncertainty regions for complex prediction models without making
assumptions on the underlying distribution or the prediction model [27, 36–39]. Let Z, Z^(1), ..., Z^(k)
be k + 1 independent and identically distributed random variables modeling a quantity known as
the nonconformity score. Our goal is to obtain an uncertainty region for Z based on Z^(1), ..., Z^(k),
i.e., the random variable Z should be contained within the uncertainty region with high probability.
Formally, given a failure probability δ ∈ (0, 1), we want to construct a valid uncertainty region over
Z (defined in terms of a value Z̄) that depends on Z^(1), ..., Z^(k) such that Prob(Z ≤ Z̄) ≥ 1 − δ.
By a surprisingly simple quantile argument, see [27, Lemma 1], one can obtain Z̄ to be the
(1 − δ)th quantile of the empirical distribution of the values Z^(1), ..., Z^(k) and ∞. By assuming that
Z^(1), ..., Z^(k) are sorted in non-decreasing order, and by adding Z^(k+1) := ∞, we can equivalently
obtain Z̄ := Z^(p) where p := ⌈(k + 1)(1 − δ)⌉ with ⌈·⌉ being the ceiling function.
We can now use conformal prediction to estimate stochastic conformance as defined in Definition
1 by setting Z := d(Y1, Y2). We therefore assume that we have access to a calibration dataset
Dcal that consists of realizations $y_1^{(i)}$ and $y_2^{(i)}$ from the stochastic signals Y1 ∼ D1 and Y2 ∼ D2,
respectively.
Theorem 5. Let the premises of Definition 1 hold and Dcal be a calibration dataset with datapoints
$(y_1^{(i)}, y_2^{(i)})$ drawn from D1 × D2. Further, define $Z^{(i)} := d(y_1^{(i)}, y_2^{(i)})$ for all i ∈ {1, ..., |Dcal|} and
$Z^{(|D_{cal}|+1)} := \infty$, and assume that the Z^(i) are sorted in non-decreasing order. Then, it holds that
Prob(d(Y1, Y2) ≤ Z̄) ≥ 1 − δ with Z̄ defined as Z̄ := Z^(p) where p := ⌈(|Dcal| + 1)(1 − δ)⌉. Thus,
the systems S1 and S2 under the input U are (ϵ, δ)-conformant if Z̄ ≤ ϵ.
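The check in Theorem 5 amounts to sorting the calibration distances and reading off one order statistic. The following is a minimal sketch under stated assumptions: the two scalar systems in `simulate_pair`, the noise levels, the threshold ϵ, and all function names are hypothetical stand-ins for simulators of S1 and S2 driven by the same stochastic input.

```python
import math
import random

def conformal_bound(scores, delta):
    """Return Z^(p) with p = ceil((k + 1)(1 - delta)) after appending Z^(k+1) := inf."""
    z = sorted(scores) + [math.inf]
    p = math.ceil((len(scores) + 1) * (1.0 - delta))
    return z[p - 1]

def simulate_pair(rng, horizon=50):
    """One joint realization (y1, y2) of two toy scalar systems under the same random input."""
    u = rng.uniform(-1.0, 1.0)              # stochastic input, constant over time here
    x1 = x2 = 0.0
    y1, y2 = [], []
    for _ in range(horizon):
        x1 = 0.90 * x1 + u + rng.gauss(0.0, 0.01)
        x2 = 0.88 * x2 + u + rng.gauss(0.0, 0.01)   # slightly different dynamics
        y1.append(x1)
        y2.append(x2)
    return y1, y2

def d_inf(y1, y2):
    """Sup-norm distance between two sampled scalar signals."""
    return max(abs(a - b) for a, b in zip(y1, y2))

rng = random.Random(3)                      # fixed seed for reproducibility
delta, eps = 0.05, 2.0
scores = [d_inf(*simulate_pair(rng)) for _ in range(500)]   # calibration dataset
z_bar = conformal_bound(scores, delta)
conformant = z_bar <= eps                   # sufficient condition from Theorem 5

# Empirical coverage of the bound on fresh realizations (sanity check only).
fresh = [d_inf(*simulate_pair(rng)) for _ in range(2000)]
coverage = sum(s <= z_bar for s in fresh) / len(fresh)
print(z_bar, conformant, coverage)
```

When the calibration set is too small for the requested δ, the index p points at the appended ∞, so the bound becomes vacuous rather than invalid.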
We see that checking stochastic conformance as defined in Definition 1 is computationally simple

when we have a calibration dataset Dcal . Checking stochastic conformance as defined in Definition 2,
however, is more difficult due to the existence of the sup-operator. To compute this notion of
conformance, we make two assumptions: i) the set U is compact, and ii) for every realization ω ∈ Ω,
the function d(Y1 (·, ω), Y2 (·, ω)) is Lipschitz continuous with Lipschitz constant L. While knowledge
of the Lipschitz constant L would presume knowledge about the closeness of the systems S1 and
S2 , it would only provide a conservative over-approximation. We will, however, not need to know
the Lipschitz constant L and estimate L instead along with probabilistic guarantees.
Our approach is summarized in Algorithms 1 and 2. Algorithm 1 computes Z̄ such that
Prob(supU ∈U d(Y1 , Y2 ) ≤ Z̄ + Lκ) ≥ 1 − δ when L is known and where κ is a gridding parameter,
while Algorithm 2 estimates the Lipschitz constant. We present a description of these algorithms
upfront and state their theoretical guarantees afterwards.
In line 1 of Algorithm 1, we construct a κ-net Ū of U, i.e., we construct a finite set Ū so that for
each U ∈ U there exists a Ū ∈ Ū such that d̄(U, Ū) ≤ κ where d̄ : U × U → R is a metric. For this
purpose, simple gridding strategies can be used as long as the set U has a convenient representation.
Alternatively, randomized algorithms can be used that sample from U [40]. In lines 2-4, we apply
Theorem 5 for each element Ū ∈ Ū. Therefore, we obtain realizations $(y_1^{(i)}, y_2^{(i)})$ from D1 × D2 under
10
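Algorithm 1 Conformance Estimation as per Definition 2
Input: Failure probability $\delta \in (0, 1)$ and grid size $\kappa > 0$
Output: $\bar{Z}$ such that $\text{Prob}(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \bar{Z} + L\kappa) \ge 1 - \delta$
1: Construct κ-net $\bar{\mathcal{U}}$ of $\mathcal{U}$
2: for $\bar{U} \in \bar{\mathcal{U}}$ do
3:     Obtain calibration set $D_{cal}^{\bar{U}}$ consisting of realizations $(y_1^{(i)}, y_2^{(i)})$ under $\bar{U}$
4:     Compute $\bar{Z}_{\bar{U}} := Z^{(p)}$ by applying Theorem 5 but instead using dataset $D_{cal}^{\bar{U}}$
5: $\bar{Z} := \max_{\bar{U} \in \bar{\mathcal{U}}} \bar{Z}_{\bar{U}}$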
Algorithm 2 Lipschitz Constant Estimation of L
Input: Failure probability $\delta_L \in (0, 1)$, grid size $\kappa > 0$, calibration size $K_L > 0$
Output: $\bar{L}$ such that $\text{Prob}(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \bar{Z} + \bar{L}\kappa) \ge 1 - \delta - \delta_L$
1: for i from 1 to $K_L$ do
2:     Sample $(U', U'')$ uniformly from $\mathcal{U} \times \mathcal{U}$
3:     Obtain realizations $(y_1', y_2')$ under $U'$ and $(y_1'', y_2'')$ under $U''$
4:     Compute $L^{(i)} := |d(y_1', y_2') - d(y_1'', y_2'')| / \bar{d}(U', U'')$
5: Compute $\bar{L} := L^{(p)}$ where $p := \lceil (K_L + 1)(1 - \delta_L) \rceil$

Ū (line 3). We then compute Z̄Ū so that Prob(d(Y1 (Ū , ·), Y2 (Ū , ·)) ≤ Z̄Ū ) ≥ 1 − δ (line 4). Finally,
we set Z̄ := maxŪ ∈Ū Z̄Ū (line 5).
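The steps of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: `sample_pair`, `dist`, and the toy scalar systems at the bottom are hypothetical stand-ins for the simulators of S1 and S2 and for the signal metric d, and the κ-net is passed in as a precomputed list.

```python
import math
import random


def conformal_bound(scores, delta):
    """Z^(p) with p := ceil((n + 1)(1 - delta)) and Z^(n+1) := inf (Theorem 5)."""
    ordered = sorted(scores) + [math.inf]
    p = math.ceil((len(scores) + 1) * (1 - delta))
    return ordered[p - 1]


def estimate_z_bar(net, sample_pair, dist, n_cal, delta):
    """Sketch of Algorithm 1: worst conformal bound over a kappa-net.

    net         -- finite list of inputs U_bar covering the input set
    sample_pair -- hypothetical callable U -> (y1, y2), one realization per call
    dist        -- callable implementing the distance d(y1, y2)
    """
    z_bar = -math.inf
    for u_bar in net:                                    # line 2
        scores = [dist(*sample_pair(u_bar))              # line 3
                  for _ in range(n_cal)]
        z_bar = max(z_bar, conformal_bound(scores, delta))  # lines 4-5
    return z_bar


# Toy usage with made-up scalar "systems" (not from the paper).
rng = random.Random(0)
net = [0.25, 0.75]  # a kappa-net of [0, 1] for kappa = 0.25 under |u - u'|


def toy_pair(u):
    return u + rng.gauss(0.0, 0.01), u + 0.1 + rng.gauss(0.0, 0.01)


z_bar = estimate_z_bar(net, toy_pair, lambda a, b: abs(a - b), n_cal=200, delta=0.05)
```

The returned value still has to be inflated by Lκ (or by an estimate of it) before it can be compared against ϵ.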
In Algorithm 2, we compute L̄ such that Prob(L ≤ L̄) ≥ 1 − δL . We uniformly sample control
inputs (U ′ , U ′′ ) (line 2), obtain realizations (y1′ , y2′ ) from D1 × D2 under U ′ and realizations (y1′′ , y2′′ )
from D1 × D2 under U ′′ (line 3), and compute the non-conformity score L(i) (line 4). In line 5, we
obtain an estimate L̄ of the Lipschitz constant L that holds with a probability of 1 − δL over the
randomness introduced in Algorithm 2.
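A possible rendering of Algorithm 2 in Python is sketched below. The callables `sample_input`, `sample_pair`, `dist`, and `input_dist` are hypothetical placeholders for sampling from the input set, simulating both systems, the signal distance d, and the input metric d̄. Resampling coincident inputs is an added safeguard against division by zero that the algorithm itself does not mention.

```python
import math
import random


def estimate_lipschitz(sample_input, sample_pair, dist, input_dist, k_l, delta_l):
    """Sketch of Algorithm 2: conformal estimate L_bar of the Lipschitz constant.

    sample_input -- hypothetical callable returning one input U
    sample_pair  -- hypothetical callable U -> (y1, y2)
    dist         -- distance d between the two realizations
    input_dist   -- metric d_bar between two inputs
    """
    ratios = []
    while len(ratios) < k_l:
        u1, u2 = sample_input(), sample_input()          # line 2
        gap = input_dist(u1, u2)
        if gap == 0.0:
            continue  # assumption: skip coincident inputs to avoid dividing by zero
        d1 = dist(*sample_pair(u1))                      # line 3
        d2 = dist(*sample_pair(u2))
        ratios.append(abs(d1 - d2) / gap)                # line 4
    ordered = sorted(ratios) + [math.inf]
    p = math.ceil((k_l + 1) * (1 - delta_l))             # line 5
    return ordered[p - 1]


# Toy usage: d(y1, y2) = 2u for input u, so every sampled slope equals 2.
rng = random.Random(1)
l_bar = estimate_lipschitz(
    sample_input=lambda: rng.uniform(0.0, 1.0),
    sample_pair=lambda u: (0.0, 2.0 * u),
    dist=lambda a, b: abs(a - b),
    input_dist=lambda a, b: abs(a - b),
    k_l=50,
    delta_l=0.1,
)
```

As with Theorem 5, the estimate is infinite when K_L is too small for the requested δ_L.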
Theorem 6. Let the premises of Definition 2 hold. If the Lipschitz constant $L$ of $d(Y_1(\cdot, \omega), Y_2(\cdot, \omega))$
is known uniformly over $\omega \in \Omega$, then, for a gridding parameter $\kappa > 0$, the output $\bar{Z}$ of Algorithm 1
ensures that
$$\text{Prob}\Big(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \bar{Z} + L\kappa\Big) \ge 1 - \delta.$$
Thus, the systems $S_1$ and $S_2$ are $(\epsilon, \delta)$-conformant if $\bar{Z} + L\kappa \le \epsilon$. Otherwise, let $\delta_L \in (0, 1)$ be a
failure probability, then the output $\bar{L}$ of Algorithm 2 ensures that
$$\text{Prob}\Big(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \bar{Z} + \bar{L}\kappa\Big) \ge 1 - \delta - \delta_L,$$
where Prob is defined over the randomness introduced in Algorithm 2.
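Proof. From line 4 of Algorithm 1 we know that $\text{Prob}(d(Y_1(\bar{U}, \cdot), Y_2(\bar{U}, \cdot)) \le \bar{Z}_{\bar{U}}) \ge 1 - \delta$ for
each $\bar{U} \in \bar{\mathcal{U}}$. Due to Lipschitz continuity, we can conclude that for each $U \in \mathcal{U}$ that is such that
$\bar{d}(U, \bar{U}) \le \kappa$ it holds that
$$\text{Prob}(d(Y_1, Y_2) \le \bar{Z}_{\bar{U}} + L\kappa) \ge 1 - \delta.$$
Since $\bar{\mathcal{U}}$ is a κ-net of $\mathcal{U}$, it follows that $\text{Prob}(\sup_{U \in \mathcal{U}} d(Y_1, Y_2) \le \bar{Z} + L\kappa) \ge 1 - \delta$.
For the second part of the proof, note that from line 5 of Algorithm 2 we know that
$\text{Prob}(L \le \bar{L}) \ge 1 - \delta_L$. We can now union bound over this event and
$\text{Prob}(d(Y_1(\bar{U}, \cdot), Y_2(\bar{U}, \cdot)) \le \bar{Z}_{\bar{U}}) \ge 1 - \delta$ so that
$$\text{Prob}(d(Y_1(\bar{U}, \cdot), Y_2(\bar{U}, \cdot)) \le \bar{Z}_{\bar{U}} \wedge L \le \bar{L}) \ge 1 - \delta - \delta_L.$$
The rest of the proof follows as in the first part.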
5.2 Estimating non-conformance risk

We next briefly summarize how to estimate the value-at-risk and the conditional value-at-risk fol-
lowing standard results such as from [14, 41] and [42], respectively.
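Proposition 2. Let the premises of Definition 1 hold and $D_{cal}$ be a calibration dataset with datapoints
$(y_1^{(i)}, y_2^{(i)})$ drawn from $D_1 \times D_2$. Let $\beta \in (0, 1)$ be a risk level and $\gamma \in (0, 1)$ be a failure
threshold. Define $Z^{(i)} := d(y_1^{(i)}, y_2^{(i)})$ for each $i \in \{1, \dots, |D_{cal}|\}$ and assume that $\text{Prob}(Z \le \alpha)$ is
continuous in $\alpha$. Then,
$$\text{Prob}\big(\underline{VaR}_\beta \le VaR_\beta(d(Y_1, Y_2)) \le \overline{VaR}_\beta\big) \ge 1 - \gamma,$$
where we have
$$\overline{VaR}_\beta := \inf\Big\{\alpha \in \mathbb{R} \;\Big|\; \widehat{\text{Prob}}(Z \le \alpha) - \sqrt{\frac{\ln(2/\gamma)}{2|D_{cal}|}} \ge \beta\Big\}$$
and
$$\underline{VaR}_\beta := \inf\Big\{\alpha \in \mathbb{R} \;\Big|\; \widehat{\text{Prob}}(Z \le \alpha) + \sqrt{\frac{\ln(2/\gamma)}{2|D_{cal}|}} \ge \beta\Big\}$$
with the empirical cumulative distribution function
$\widehat{\text{Prob}}(Z \le \alpha) := \frac{1}{|D_{cal}|} \sum_{i=1}^{|D_{cal}|} I(Z^{(i)} \le \alpha)$ and the indicator function $I$.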
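The value-at-risk bounds of Proposition 2 can be evaluated directly from sorted calibration distances. The sketch below assumes the reading in which the upper bound uses the empirical distribution function shifted down by the slack term and the lower bound uses it shifted up; the function name `var_bounds` is illustrative.

```python
import math


def var_bounds(scores, beta, gamma):
    """Return (lower, upper) bounds on VaR_beta from calibration distances.

    The slack sqrt(ln(2/gamma) / (2 n)) is the term appearing in Proposition 2.
    """
    n = len(scores)
    ordered = sorted(scores)
    slack = math.sqrt(math.log(2.0 / gamma) / (2.0 * n))

    def smallest_alpha(level):
        # inf{alpha : empirical CDF(alpha) >= level}
        if level <= 0.0:
            return -math.inf          # every alpha satisfies the condition
        for k, z in enumerate(ordered, start=1):
            if k / n >= level:        # at least k samples are <= z
                return z
        return math.inf               # no alpha satisfies the condition

    lower = smallest_alpha(beta - slack)  # CDF + slack >= beta
    upper = smallest_alpha(beta + slack)  # CDF - slack >= beta
    return lower, upper
```

For small calibration sets the slack can push β + slack above 1, in which case the upper bound is infinite and more data is needed.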
For estimating CVaRβ (Z), we assume that the random variable d(Y1 , Y2 ) has bounded support,
i.e., that Prob(d(Y1 , Y2 ) ∈ [a, b]) = 1. Note that d(Y1 , Y2 ) is usually bounded from below by a := 0
if d is a metric. To obtain an upper bound, we assume that the distance function saturates at b,
e.g., by clipping values larger than b to b. In practice, this means that realizations that are far apart
already have a large distance, and this distance is capped at b.
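Proposition 3. Let the premises of Definition 1 hold and $D_{cal}$ be a calibration dataset with datapoints
$(y_1^{(i)}, y_2^{(i)})$ drawn from $D_1 \times D_2$. Let $\beta \in (0, 1)$ be a risk level and $\gamma \in (0, 1)$ be a failure
threshold. Define $Z^{(i)} := d(y_1^{(i)}, y_2^{(i)})$ for each $i \in \{1, \dots, |D_{cal}|\}$ and assume that
$\text{Prob}(d(Y_1, Y_2) \in [a, b]) = 1$. Then, it holds that
$$\text{Prob}\big(\underline{CVaR}_\beta \le CVaR_\beta(d(Y_1, Y_2)) \le \overline{CVaR}_\beta\big) \ge 1 - \gamma,$$
where
$$\overline{CVaR}_\beta := \widehat{CVaR}_\beta + \sqrt{\frac{5 \ln(3/\gamma)}{|D_{cal}|(1 - \beta)}}\,(b - a)
\quad\text{and}\quad
\underline{CVaR}_\beta := \widehat{CVaR}_\beta - \sqrt{\frac{11 \ln(3/\gamma)}{|D_{cal}|(1 - \beta)}}\,(b - a),$$
and where the empirical estimate of $CVaR_\beta(Z)$ is
$$\widehat{CVaR}_\beta := \inf_{\alpha \in \mathbb{R}} \Big( \alpha + (|D_{cal}|(1 - \beta))^{-1} \sum_{i=1}^{|D_{cal}|} [Z^{(i)} - \alpha]^+ \Big).$$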
As a consequence of these two propositions, we know that with a probability of 1 − γ the systems S1
and S2 under the input U are at risk of not being conformant if VaRβ ≥ α or CVaRβ ≥ α based
on the risk measure of choice.
Figure 1: (a) Targets in Dubin’s car; (b) CARLA: cross-track error signals for S1 , S2 ; (c) F16: altitude
signals for S1 , S2 ; (d) Spacecraft trajectories. The solid lines refer to Y1 and the dashed lines refer
to Y2 ; in each of the displayed plots, the initial condition for each pair of realizations is the same.
Figure 2: (a) Trajectory distance on validation set; (b) Robustness on validation set, controller 1;
(c) Robustness on validation set, controller 2. Distance and robustness histogram for Dubin’s car
with δ = δ̄ = 0.05. We use CVaR(d) to denote CVaR(d(Y1 , Y2 )). The c1 and ϵ are the values of
conformal prediction on the calibration set of ρϕdubin (Y1 ) and d∞ (Y1 , Y2 ).
6 Case Studies
We now demonstrate the practicality of stochastic conformance and risk analysis through various
case studies. For validation, suppose that we obtain the value Z̄ using a conformal prediction
procedure for a nonconformity score defined by the random variable Z, i.e., such that
Prob(Z ≤ Z̄) ≥ 1 − δ. Then, given a test set Dtest , the validation score is defined as
VS(Z) := |{z ∈ Dtest | z ≤ Z̄}|/|Dtest |.
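The validation score is simply the empirical coverage of the bound Z̄ on held-out data. A minimal Python sketch, with an illustrative function name, is:

```python
def validation_score(test_scores, z_bar):
    """Fraction of test-set scores z that satisfy z <= z_bar."""
    covered = sum(1 for z in test_scores if z <= z_bar)
    return covered / len(test_scores)
```

A score close to or above 1 − δ on the test set indicates that the conformal bound generalizes as expected.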
6.1 Dubin’s car.

Dubin’s car models the motion of a point mass vehicle. The state variables are the x and y position,
θ denotes the steering angle and v the longitudinal velocity. While both θ and v are typically
assumed to be control inputs, we adapt the case study from [43] where the angular velocity ω(t)
at each time t is assumed to be given so that $\theta(t) := T_s \pi + \sum_{i=1}^{t} \omega(i) T_s$ where $T_s := 0.1$s. In
this example, we assume that $\omega(i) := \frac{\pi}{50 T_s}$ for $i \in [1, 25]$, and $\omega(i) := -\frac{\pi}{50 T_s}$ for
$i \in [26, 50]$. The velocity v(t) is provided by a feedback controller. The dynamics are assumed to
have additive white Gaussian noise $\eta^x(t), \eta^y(t) \sim \mathcal{N}(0, 0.005)$. The dynamical equations of
motion are as described below:
$$x(t+1) = x(t) + T_s v(t) \cos(\theta(t)) + \eta^x(t)$$
Distance metric       |Dcal|   ϵ        VS(d(Y1,Y2))   VaR(d(Y1,Y2))   CVaR(d(Y1,Y2))
d∞                    50       0.7825   0.987          0.7183          0.7947
d∞                    1000     0.7163   0.956          0.7148          0.7647
d∞                    2000     0.7122   0.952          0.712           0.7814
d∞                    3000     0.7118   0.952          0.7117          0.7862
dsk (Skorokhod)       50       0.6723   0.953          0.6517          0.7181
dsk (Skorokhod)       1000     0.6722   0.972          0.6711          0.7156
dsk (Skorokhod)       2000     0.6645   0.96           0.6639          0.7106
dsk (Skorokhod)       3000     0.6619   0.952          0.6613          0.7079
d2                    50       2.6086   0.937          2.503           2.612
d2                    1000     2.7339   0.96           2.732           3.048
d2                    2000     2.7071   0.944          2.706           3.044
d2                    3000     2.7238   0.955          2.722           3.0929
Table 1: Effect of calibration set size on the validation score and risk measures. The size of the test
set, i.e., |Dtest |, is 1000. We use the conformal prediction procedure from Section 5 to obtain ϵ as
defined in Definition 1 for δ = 0.05.
$$y(t+1) = y(t) + T_s v(t) \sin(\theta(t)) + \eta^y(t)$$
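To make the model concrete, the following Python sketch simulates one noisy realization of these dynamics. Several parts are assumptions rather than details taken from the case study: the feedback controller is replaced by a hypothetical `velocity` callable (here a constant speed), 0.005 is interpreted as the noise variance, the heading follows the formula for θ(t) as it reads in the text, and the time index runs over t = 0, ..., 49.

```python
import math
import random

TS = 0.1  # sampling time T_s


def omega(i):
    """Prescribed angular velocity: +pi/(50 T_s) for i in [1, 25], then -pi/(50 T_s)."""
    return math.pi / (50 * TS) if i <= 25 else -math.pi / (50 * TS)


def theta(t):
    """Heading theta(t) := T_s * pi + sum_{i=1}^{t} omega(i) * T_s."""
    return TS * math.pi + sum(omega(i) * TS for i in range(1, t + 1))


def simulate(x0, y0, velocity, rng, steps=50, noise_var=0.005):
    """Roll out x(t+1), y(t+1) with additive Gaussian noise.

    velocity -- hypothetical stand-in for the feedback controller, (t, x, y) -> v
    """
    sigma = math.sqrt(noise_var)  # assumption: 0.005 is a variance
    traj = [(x0, y0)]
    for t in range(steps):
        x, y = traj[-1]
        v = velocity(t, x, y)
        x_next = x + TS * v * math.cos(theta(t)) + rng.gauss(0.0, sigma)
        y_next = y + TS * v * math.sin(theta(t)) + rng.gauss(0.0, sigma)
        traj.append((x_next, y_next))
    return traj


rng = random.Random(0)
x0, y0 = rng.uniform(-1.0, 0.0), rng.uniform(-1.0, 0.0)  # initial state from I
trajectory = simulate(x0, y0, lambda t, x, y: 1.0, rng)   # placeholder constant speed
```

Running the same routine with two different `velocity` callables and a shared initial state yields paired realizations from which a distance sample can be computed.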
The two systems that we compare have two different feedback controllers. The first feedback
controller uses the method from [43, 44] and the second controller uses the method from [45]. We
plot a set of sampled trajectories in Fig. 1a. This figure also shows the set of initial states I :=
[−1, 0] × [−1, 0]. The controller aims to ensure that the system trajectory stays within a series of
sets T1 through T50 ; the corresponding STL specification is
$\phi_{dubin} := \bigwedge_{i=1}^{50} F_{[i-1,i]}([x_i \; y_i] \in T_i)$. For
the experiments that follow, we uniformly sampled initial states from I and noise η x , η y from the
described Gaussian distribution.
Effect of calibration set size. In the first experiment, we wish to benchmark the effect of the size of
the calibration set Dcal for various distance metrics. The results are shown in Table 1. The table
shows that with smaller sizes of the calibration set, we get a more conservative ϵ for d∞ (which
translates into a higher validation score). The VaR is almost identical to the value of ϵ at larger
Dcal sizes. We note that the CVaR values change with the value of VaR. A similar trend can be
observed for the Skorokhod distance and the L2 -metric.
Empirical evaluation of transference. We empirically demonstrate that Theorem 1 holds. We use
C(Y ) = ρϕdubin (Y ), i.e., the robust semantics w.r.t. the property ϕdubin , and the L∞ signal metric
d∞ . The results are shown in Table 2. We can see that the predicted lower bound for the robustness
of realizations of Y2 w.r.t. ϕdubin is negative (c1 − ϵ), so it is not possible to conclude that the second
system satisfies ϕdubin with probability greater than 1 − δ − δ̄. However, we note that c2 is indeed
greater than the bound (c1 − ϵ). Similarly, we show that Theorem 3 is also empirically validated by
computing the CVaR values for the first system and the risk measure on d∞ (Y1 , Y2 ). We show the
empirical distributions of d∞ (Y1 , Y2 ), and ρϕdubin (Yi ) for i = 1, 2 in Figure 2.
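As a worked check, reading the first row of Table 2 (δ = 0.2, |Dcal | = 100) and assuming that the three CVaR columns refer to d∞ , −ρ1 , and −ρ2 in that order, the two transference inequalities evaluate to

$$c_1 - \epsilon = 0.31 - 0.59 = -0.28 \le c_2 = 0.21,$$
$$CVaR(-\rho_1) + CVaR(d_\infty) = -0.28 + 0.90 = 0.62 \ge CVaR(-\rho_2) = 0.00.$$

Both inequalities hold with a wide margin, which illustrates that the transferred bounds are valid but conservative in this example.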
Empirical evaluation of Theorem 2. We next apply Algorithms 1 and 2 to this case study. We grid
the initial set of states evenly into 25 cells with a grid size of κ = 0.02. We sample 650 trajectories
Setting              |Dcal|   c1     ϵ      VS(ρ1)   VS(d∞)   c2     Thm. 1 valid?   CVaR(d∞)   CVaR(−ρ1)   CVaR(−ρ2)   Thm. 3 valid?
δ = 0.2, δ̄ = 0.05    100      0.31   0.59   0.95     0.76     0.21   Y               0.90       -0.28       0.00        Y
δ = 0.2, δ̄ = 0.05    3K       0.30   0.60   0.95     0.79     0.20   Y               0.93       -0.27       0.03        Y
δ = 0.1, δ̄ = 0.05    1K       0.30   0.67   0.96     0.92     0.15   Y               0.79       -0.27       0.02        Y
δ = 0.1, δ̄ = 0.05    3K       0.30   0.66   0.95     0.91     0.15   Y               0.81       -0.27       0.03        Y
δ = 0.05, δ̄ = 0.05   2K       0.31   0.71   0.94     0.95     0.11   Y               0.78       -0.27       0.02        Y
δ = 0.05, δ̄ = 0.05   3K       0.30   0.71   0.95     0.95     0.11   Y               0.79       -0.27       0.03        Y
Table 2: Empirical evaluation of transference. Let ρi be short-hand for ρϕdubin (Yi ) for i = 1, 2,
and d∞ be short-hand for d∞ (Y1 , Y2 ). Using Theorem 5, we show Prob(ρ1 ≥ c1 ) > 1 − δ̄, and
Prob(d∞ ≤ ϵ) > 1 − δ. The validation scores for each guarantee on a test set Dtest with 1000 samples
are shown. The value c2 is obtained using Theorem 5 on ρ2 , and we observe that it exceeds c1 − ϵ,
validating Theorem 1. Similarly, we report the CVaR values for −ρ1 and d∞ , and
CVaR(−ρ1 ) + CVaR(d∞ ) ≥ CVaR(−ρ2 ) for all cases, validating Theorem 3.
Case Study   Spec    |Dcal|   |Dtest|   VS(ρ1)   VS(d∞)   ϵ      VaR(d∞)
F-16         ϕgcas   1K       3K        0.95     0.98     200    200
CARLA        ϕcte    700      300       0.94     0.96     1.88   1.87
Satellite    ϕsat    7K       3K        0.96     0.97     0.18   0.18
Table 3: Transference results for various case studies. We use δ = 0.05 and δ̄ = 0.05. As before, ρ1
is used as short-hand for ρϕ (Y1 ) for each spec, and d∞ is used as short-hand for d∞ (Y1 , Y2 ).
on each cell to obtain their calibration sets. Algorithms 1 and 2 give Z̄ = 0.7562 and L̄κ = 0.0687, giving
Z̄ + L̄κ = 0.8249. We then evaluate on two test sets of unseen initial conditions with |Dtest | = 1000
and 2500. The success rates on the test sets are 0.9996 and 1.0, with the goal success rate being 0.9. The
experiments demonstrate the effectiveness of Theorem 6.
6.2 F-16 aircraft.

The F-16 aircraft control system from [46] uses a 13-dimensional non-linear plant model based on
a 6 d.o.f. airplane model, and its dynamics describe force equations, moments, kinematics, and
engine behavior. We alter the original system S1 from [46] to a modified version S2 by changing the
controller gains. We evaluate the performance of the two systems on the ground collision avoidance
scenario with the specification ϕgcas := G[0,T ] (h ≥ 1000) where T is the mission time and h is the
altitude. For data collection, we perform uniform sampling of the initial states. We assume that
the x-center of gravity (xcg) of the aircraft is a stochastic parameter with uniform distribution on
[0, 0.8]. We obtain a calibration set Dcal of size 1000 by uniform sampling of the initial states and
the xcg parameter. We separately sample 3000 signals for Dtest . The results of transference and
risk estimates are shown in Table 3.
Case Study   Spec    |Dcal|   |Dtest|   δ      CVaR(d∞)   CVaR(−ρ1)   CVaR(−ρ2)
F-16         ϕgcas   1K       3K        0.01   200.3      -62.3       -62.3
CARLA        ϕquad   7K       3K        0.01   2.04       -0.31       0.88
Satellite    ϕsat    7K       3K        0.01   0.19       0.0         0.08
Table 4: Empirical validation of risk transference for all case studies. As before, ρi is short-hand
for ρϕ (Yi ), and d∞ is short-hand for d∞ (Y1 , Y2 ). Here, we set the risk level β = δ in each case.
6.3 Autonomous Driving using the CARLA simulator.

CARLA is a high-fidelity simulator for testing of autonomous driving systems [47]. We consider two
learning-based lane-keeping controllers from [14], one being an imitation learning controller (S1 )
and another being a learned barrier function controller (S2 ). We obtain 1000 trajectories from each
controller during a 180 degree left turn, and we use 700 of them for calibration and 300 for testing.
In this data, the initial states (ce , θe ) are drawn uniformly from [−1, 1] × [−0.4, 0.4] where ce is the
deviation from the lane center (cross-track error) and θe is the orientation error. The
STL specification ϕcte := G(|ce | ≤ 2.25) restricts |ce | to be bounded by 2.25. The results are shown
in Table 3.
6.4 Spacecraft Rendezvous

Next, we consider a spacecraft rendezvous problem from [48]. Here, a deputy spacecraft is to
rendezvous with a master spacecraft while staying within a line-of-sight cone. The system is a 4D
model s = [x, y, vx , vy ]⊤ where x, y ∈ R are the relative horizontal and vertical distances between
the two spacecraft and vx , vy ∈ R are the relative horizontal and vertical velocities. There are two
different feedback controllers, using the same control algorithms we used in the Dubin’s car example (i.e.,
the controllers from [43, 44] and [45]). The STL specification is a reach-avoid specification (visually
depicted in Fig. 1d), which requires the system to always stay in the yellow region and eventually
reach the target rectangle T shown: ϕsat := G[1,5] (y, |y|, |vx |, |vy | ≤ −|x|, ymax , vx,max , vy,max ) ∧
F[1,5] s ∈ T . The set of initial states is I = [−0.1, 0.1] × [−0.1, 0.1]. The system is assumed to have
additive Gaussian process noise with zero mean and a diagonal covariance matrix with variances
$10^{-4}, 10^{-4}, 5 \times 10^{-8}, 5 \times 10^{-8}$. We uniformly sample 100 different initial states from I and
100 noise values from the noise distribution. We divide the dataset into Dcal and Dtest with sizes 7K
and 3K respectively. The results are shown in Table 3.
Discussion on results for Transference across case studies. We omit from Table 3 the column that
shows the proportion of Dtest of the realizations of Y2 for which the bound c1 − ϵ exceeds c2 , where
the ci ’s are the conformal bounds on ρi ’s. For all case studies this ratio was either 1.0 or close
to 1.0, establishing the empirical validity of Theorem 1. We also observe that the above results show
that it is feasible to use stochastic conformance in a control improvisation loop, where we want to
change a system controller (perhaps for optimizing a performance objective) while allowing only
some degradation on probabilistic safety guarantees.
7 Related Work
Conformance has found applications in cyber-physical system design [49, 50] as well as in drug
testing and other applications [51–53]. Our work is inspired by existing works for conformance
of deterministic systems, by which we mean systems that are non-stochastic; see [21, 54] for
surveys. The authors in [23–25] considered conformance testing between hybrid systems. To capture
distance between hybrid system trajectories that may exhibit discontinuities, signal metrics were
considered that simultaneously quantify distance in space and time, resembling notions of system
closeness in the hybrid systems literature [55, 56]. For instance, [23] proposes (T, J, (τ, ϵ))-closeness
where τ and ϵ capture both timing distortions and state value mismatches, respectively, and where
T and J quantify limits on the total time and number of discontinuities, respectively. A stronger
notion compared to (T, J, (τ, ϵ))-closeness was proposed in [22] by using the Skorokhod metric. The
benefit of [22] over the other notion is that it preserves the timing structure. All these works derive
transference results with respect to timed linear temporal logic or metric interval temporal logic
specifications.
Conformance of stochastic systems has been less studied. The authors in [57] propose
precision and recall conformance measures based on the notion of entropy of stochastic automata.
The authors in [58] use the Wasserstein distance to quantify distance between two stochastic systems,
which is fundamentally different from our approach. (Bi)simulation relations for stochastic systems
were studied in [59–61]. Such techniques can define behavioral relations for systems [62, 63], and
they can be used to transfer verification results between systems [64]. The authors in [65] utilize such
behavioral relations to verify RL policies between a concrete and an abstract system. We remark
that bisimulations are difficult to compute, see e.g., [66], unlike our approach. Probably closest to
our work is [33]. However, in [33] conformance is task specific, which allows two systems to
be conformant w.r.t. a system specification even when the systems produce completely different
trajectories. Additionally, we consider a worst-case notion of conformance where no information
about the input that excites both stochastic systems is available.
8 Conclusion
We studied conformance of stochastic dynamical systems. Particularly, we defined conformance
between two stochastic systems as probabilistic bounds over the distribution of distances between
model trajectories. Additionally, we proposed the non-conformance risk to reason about the risk of
stochastic systems not being conformant. We showed that both notions have the transference prop-
erty, meaning that conformant systems satisfy similar system specifications. Lastly, we showed how
stochastic conformance and the non-conformance risk can be estimated from data using statistical
tools such as conformal prediction.
Acknowledgments
The authors would like to thank the anonymous reviewers for their feedback. The National Science
Foundation supported this work through the following grants: CAREER award (SHF-2048094),
CNS-1932620, CNS-2039087, FMitF-1837131, CCF-SHF-1932620, the Airbus Institute for Engi-
neering Research, and funding by Toyota R&D and Siemens Corporate Research through the USC
Center for Autonomy and AI.
References
[1] E. Bartocci, J. Deshmukh, A. Donzé, G. Fainekos, O. Maler, D. Ničković, and S. Sankara-
narayanan, “Specification-based monitoring of cyber-physical systems: a survey on theory,
tools and applications,” Lectures on Runtime Verification: Introductory and Advanced Topics,
pp. 135–175, 2018.
[2] Q. Thibeault, J. Anderson, A. Chandratre, G. Pedrielli, and G. Fainekos, “Psy-taliro: A python
toolbox for search-based test generation for cyber-physical systems,” in Formal Methods for
Industrial Critical Systems: 26th International Conference, FMICS 2021, Paris, France, August
24–26, 2021, Proceedings 26. Springer, 2021, pp. 223–231.
[3] T. Akazaki, S. Liu, Y. Yamagata, Y. Duan, and J. Hao, “Falsification of cyber-physical systems
using deep reinforcement learning,” in Formal Methods: 22nd International Symposium, FM
2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 15-17,
2018, Proceedings 22. Springer, 2018, pp. 456–465.
[4] J. V. Deshmukh and S. Sankaranarayanan, “Formal techniques for verification and testing of
cyber-physical systems,” Design Automation of Cyber-Physical Systems, pp. 69–105, 2019.
[5] X. Qin, N. Aréchiga, A. Best, and J. Deshmukh, “Automatic testing with reusable adversarial
agents,” 2021.
[6] C. Baier and J.-P. Katoen, Principles of model checking. MIT press, 2008.
[7] Y. Wang, M. Zarei, B. Bonakdarpour, and M. Pajic, “Statistical verification of hyperproperties
for cyber-physical systems,” ACM Transactions on Embedded Computing Systems (TECS),
vol. 18, no. 5s, pp. 1–23, 2019.
[8] A. Legay, B. Delahaye, and S. Bensalem, “Statistical model checking: An overview,” in Pro-
ceedings of the International conference on runtime verification, St. Julians, Malta, November
2010, pp. 122–135.
[9] A. Legay and M. Viswanathan, “Statistical model checking: challenges and perspectives,” In-
ternational Journal on Software Tools for Technology Transfer, vol. 17, pp. 369–376, 2015.
[10] G. Agha and K. Palmskog, “A survey of statistical model checking,” ACM Transactions on
Modeling and Computer Simulation (TOMACS), vol. 28, no. 1, pp. 1–39, 2018.
[11] A. Abate, A. Edwards, M. Giacobbe, H. Punchihewa, and D. Roy, “Quantitative verification
with neural networks for probabilistic programs and stochastic systems,” 2023.
[12] Y. Zhao and K. Y. Rozier, “Probabilistic model checking for comparative analysis of automated
air traffic control systems,” in 2014 IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), 2014, pp. 690–695.
[13] X. Qin, Y. Xia, A. Zutshi, C. Fan, and J. V. Deshmukh, “Statistical verification of cyber-
physical systems using surrogate models and conformal inference,” in 2022 ACM/IEEE 13th
International Conference on Cyber-Physical Systems (ICCPS), 2022, pp. 116–126.
[14] L. Lindemann, L. Jiang, N. Matni, and G. J. Pappas, “Risk of stochastic systems for temporal
logic specifications,” ACM Transactions on Embedded Computing Systems, vol. 22, no. 3, pp.
1–31, 2023.
[15] P. Akella, A. Dixit, M. Ahmadi, J. W. Burdick, and A. D. Ames, “Sample-based bounds
for coherent risk measures: Applications to policy synthesis and verification,” arXiv preprint
arXiv:2204.09833, 2022.
[16] A. D. Pimentel, “Exploring exploration: A tutorial introduction to embedded systems design
space exploration,” IEEE Design & Test, vol. 34, no. 1, pp. 77–90, 2016.
[17] W. H. Schilders, H. A. Van der Vorst, and J. Rommes, Model order reduction: theory, research
aspects and applications. Springer, 2008, vol. 13.
[18] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas, “Discrete abstractions of hybrid
systems,” Proceedings of the IEEE, vol. 88, no. 7, pp. 971–984, 2000.
[19] C. Belta, B. Yordanov, and E. A. Gol, Formal methods for discrete-time dynamical systems.
Springer, 2017, vol. 15.
[20] A. S. Polydoros and L. Nalpantidis, “Survey of model-based reinforcement learning: Appli-
cations on robotics,” Journal of Intelligent & Robotic Systems, vol. 86, no. 2, pp. 153–173,
2017.
[21] H. Roehm, J. Oehlerking, M. Woehrle, and M. Althoff, “Model conformance for cyber-physical
systems: A survey,” ACM Transactions on Cyber-Physical Systems, vol. 3, no. 3, pp. 1–26,
2019.
[22] J. V. Deshmukh, R. Majumdar, and V. S. Prabhu, “Quantifying conformance using the sko-
rokhod metric,” in Computer Aided Verification: 27th International Conference, CAV 2015,
San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part II 27. Springer, 2015, pp.
234–250.
[23] H. Abbas, B. Hoxha, G. Fainekos, J. V. Deshmukh, J. Kapinski, and K. Ueda, “Conformance
testing as falsification for cyber-physical systems,” arXiv preprint arXiv:1401.5200, 2014.
[24] H. Abbas, H. Mittelmann, and G. Fainekos, “Formal property verification in a conformance
testing framework,” in 2014 Twelfth ACM/IEEE Conference on Formal Methods and Models
for Codesign (MEMOCODE). IEEE, 2014, pp. 155–164.
[25] H. Y. Abbas, Test-based falsification and conformance testing for cyber-physical systems. Ari-
zona State University, 2015.
[26] A. Majumdar and M. Pavone, “How should a robot assess risk? towards an axiomatic theory
of risk in robotics,” in Robotics Research. Springer, 2020, pp. 75–84.
[27] R. J. Tibshirani, R. Foygel Barber, E. Candes, and A. Ramdas, “Conformal prediction under
covariate shift,” in Proceedings of the Conference on Neural Information Processing Systems,
vol. 32, Vancouver, Canada, December 2019.
[28] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic learning in a random world. Springer
Science & Business Media, 2005.
[29] O. Maler and D. Nickovic, “Monitoring temporal properties of continuous signals,” in Proceed-
ings of the Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems,
Grenoble, France, September 2004, pp. 152–166.
[30] A. Donzé and O. Maler, “Robust satisfaction of temporal logic over real-valued signals,” in Pro-
ceedings of the International Conference on Formal Modeling and Analysis of Timed Systems,
Klosterneuburg, Austria, September 2010, pp. 92–106.
[31] G. E. Fainekos and G. J. Pappas, “Robustness of temporal logic specifications for continuous-
time signals,” Theoretical Computer Science, vol. 410, no. 42, pp. 4262–4291, 2009.
[32] R. M. Gray, Entropy and information theory. Springer Science & Business Media, 2011.
[33] Y. Wang, M. Zarei, B. Bonakdarpour, and M. Pajic, “Probabilistic conformance for cyber-
physical systems,” in Proceedings of the ACM/IEEE 12th International Conference on Cyber-
Physical Systems, 2021, pp. 55–66.
[34] M. Cleaveland, L. Lindemann, R. Ivanov, and G. J. Pappas, “Risk verification of stochastic
systems with neural network controllers,” Artificial Intelligence, vol. 313, p. 103782, 2022.
[35] G. Shafer and V. Vovk, “A tutorial on conformal prediction.” Journal of Machine Learning
Research, vol. 9, no. 3, 2008.
[36] A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and
distribution-free uncertainty quantification,” arXiv preprint arXiv:2107.07511, 2021.
[37] M. Fontana, G. Zeni, and S. Vantini, “Conformal prediction: A unified review of theory and
new challenges,” Bernoulli, vol. 29, no. 1, pp. 1 – 23, 2023.
[38] J. Lei, M. G’Sell, A. Rinaldo, R. J. Tibshirani, and L. Wasserman, “Distribution-free predictive
inference for regression,” Journal of the American Statistical Association, vol. 113, no. 523, pp.
1094–1111, 2018.
[39] M. Cauchois, S. Gupta, A. Ali, and J. C. Duchi, “Robust validation: Confident predictions
even when distributions shift,” arXiv preprint arXiv:2008.04267, 2020.
[40] R. Vershynin, High-dimensional probability: An introduction with applications in data science.
Cambridge university press, 2018, vol. 47.
[41] P. Massart, “The tight constant in the dvoretzky-kiefer-wolfowitz inequality,” The annals of
Probability, pp. 1269–1283, 1990.
[42] Y. Wang and F. Gao, “Deviation inequalities for an estimator of the conditional value-at-risk,”
Operations Research Letters, vol. 38, no. 3, pp. 236–239, 2010.
[43] A. P. Vinod and M. M. Oishi, “Affine controller synthesis for stochastic reachability via differ-
ence of convex programming,” in 2019 IEEE 58th Conference on Decision and Control (CDC).
IEEE, 2019, pp. 7273–7280.
[44] K. Lesser, M. Oishi, and R. S. Erwin, “Stochastic reachability for control of spacecraft relative
motion,” in 52nd IEEE Conference on Decision and Control. IEEE, 2013, pp. 4705–4712.
[45] M. P. Vitus and C. J. Tomlin, “On feedback design and risk allocation in chance constrained
control,” in 2011 50th IEEE Conference on Decision and Control and European Control Con-
ference. IEEE, 2011, pp. 734–739.
[46] P. Heidlauf, A. Collins, M. Bolender, and S. Bak, “Verification challenges in f-16 ground collision
avoidance and other automated maneuvers,” in ARCH@ ADHS, 2018, pp. 208–217.
[47] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driv-
ing simulator,” in Proceedings of the Conference on robot learning. Mountain View, California:
PMLR, November 2017, pp. 1–16.
[48] A. P. Vinod, J. D. Gleason, and M. M. Oishi, “Sreachtools: a matlab stochastic reachability
toolbox,” in Proceedings of the 22nd ACM international conference on hybrid systems: compu-
tation and control, 2019, pp. 33–38.
[49] X. Jin, J. V. Deshmukh, J. Kapinski, K. Ueda, and K. Butts, “Benchmarks for model trans-
formations and conformance checking,” in 1st International Workshop on Applied Verification
for Continuous and Hybrid Systems (ARCH), 2014.
[50] H. Araujo, G. Carvalho, M. Mohaqeqi, M. R. Mousavi, and A. Sampaio, “Sound conformance
testing for cyber-physical systems: Theory and implementation,” Science of Computer Pro-
gramming, vol. 162, pp. 35–54, 2018.
[51] R. Dimitrova, M. Gazda, M. R. Mousavi, S. Biewer, and H. Hermanns, “Conformance-based
doping detection for cyber-physical systems,” in Formal Techniques for Distributed Objects,
Components, and Systems: 40th IFIP WG 6.1 International Conference, FORTE 2020, Held
as Part of the 15th International Federated Conference on Distributed Computing Techniques,
DisCoTec 2020, Valletta, Malta, June 15–19, 2020, Proceedings 40. Springer, 2020, pp. 59–77.
[52] S. Biewer, R. Dimitrova, M. Fries, M. Gazda, T. Heinze, H. Hermanns, and M. R. Mousavi,
“Conformance relations and hyperproperties for doping detection in time and space,” arXiv
preprint arXiv:2012.03910, 2020.
[53] R. Dimitrova, M. Gazda, M. R. Mousavi, S. Biewer, and H. Hermanns, “Conformance-based
doping detection for cyber-physical systems,” in Formal Techniques for Distributed Objects,
Components, and Systems, A. Gotsman and A. Sokolova, Eds. Cham: Springer International
Publishing, 2020, pp. 59–77.
[54] N. Khakpour and M. R. Mousavi, “Notions of conformance testing for cyber-physical systems:
Overview and roadmap,” in 26th International Conference on Concurrency Theory (CONCUR
2015). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.
[55] R. Goebel, R. G. Sanfelice, and A. R. Teel, “Hybrid dynamical systems,” IEEE Control Systems,
vol. 29, no. 2, pp. 28–93, 2009.
[56] ——, Hybrid Dynamical Systems: modeling, stability, and robustness, 1st ed. Princeton, NJ:
Princeton University Press, 2012.
[57] S. J. Leemans and A. Polyvyanyy, “Stochastic-aware conformance checking: An entropy-based
approach,” in Advanced Information Systems Engineering: 32nd International Conference,
CAiSE 2020, Grenoble, France, June 8–12, 2020, Proceedings 32. Springer, 2020, pp. 217–233.
[58] S. J. Leemans, A. F. Syring, and W. M. van der Aalst, “Earth movers’ stochastic confor-
mance checking,” in Business Process Management Forum: BPM Forum 2019, Vienna, Aus-
tria, September 1–6, 2019, Proceedings 17. Springer, 2019, pp. 127–143.
[59] A. A. Julius, A. Girard, and G. J. Pappas, “Approximate bisimulation for a class of stochastic
hybrid systems,” in 2006 American Control Conference. IEEE, 2006, pp. 6–pp.
[60] A. A. Julius and G. J. Pappas, “Approximations of stochastic hybrid systems,” IEEE Transac-
tions on Automatic Control, vol. 54, no. 6, pp. 1193–1203, 2009.
[61] S. Haesaert and S. Soudjani, “Robust dynamic programming for temporal logic control of
stochastic systems,” IEEE Transactions on Automatic Control, vol. 66, no. 6, pp. 2496–2511,
2020.
[62] G. Bian and A. Abate, “On the relationship between bisimulation and trace equivalence in
an approximate probabilistic context,” in Foundations of Software Science and Computation
Structures: 20th International Conference, FOSSACS 2017, Held as Part of the European Joint
Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29,
2017, Proceedings 20. Springer, 2017, pp. 321–337.
[63] A. A. Julius and A. Van Der Schaft, “Bisimulation as congruence in the behavioral setting,” in
Proceedings of the 44th IEEE Conference on Decision and Control. IEEE, 2005, pp. 814–819.
[64] K. Zhang and M. Zamani, “Infinite-step opacity of nondeterministic finite transition systems:
A bisimulation relation approach,” in 2017 IEEE 56th Annual Conference on Decision and
Control (CDC). IEEE, 2017, pp. 5615–5619.
[65] F. Delgrange, A. Nowe, and G. A. Pérez, “Wasserstein auto-encoded MDPs: Formal verification
of efficiently distilled RL policies with many-sided guarantees,” arXiv preprint arXiv:2303.12558,
2023.
[66] A. Girard and G. J. Pappas, “Approximate bisimulation: A bridge between computer science
and control theory,” European Journal of Control, vol. 17, no. 5-6, pp. 568–578, 2011.
[67] S. Sadraddini and C. Belta, “Robust temporal logic model predictive control,” in Proceedings of
the 53rd Annual Allerton Conference on Communication, Control, and Computing, Monticello,
IL, September 2015, pp. 772–779.
[68] J. R. Munkres, Topology, 2nd ed. Prentice Hall, 2000.
A Semantics of Signal Temporal Logic

For a signal y : T → Rn , the semantics of an STL formula ϕ that is imposed at time t, denoted by
(y, t) |= ϕ, can be recursively computed based on the structure of ϕ using the following rules:
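\[
\begin{aligned}
(y,\tau) &\models \text{True} && \text{iff} \quad \text{True},\\
(y,\tau) &\models \mu && \text{iff} \quad h(y(\tau)) \ge 0,\\
(y,\tau) &\models \neg\phi && \text{iff} \quad (y,\tau) \not\models \phi,\\
(y,\tau) &\models \phi' \wedge \phi'' && \text{iff} \quad (y,\tau) \models \phi' \text{ and } (y,\tau) \models \phi'',\\
(y,\tau) &\models \phi' U_I \phi'' && \text{iff} \quad \exists \tau'' \in (\tau \oplus I) \cap T \text{ s.t. } (y,\tau'') \models \phi'' \text{ and } \forall \tau' \in (\tau,\tau'') \cap T,\ (y,\tau') \models \phi'.
\end{aligned}
\]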
The robust semantics ρϕ (y, t) provide more information than the semantics (y, t) |= ϕ and
indicate how robustly a specification is satisfied or violated. We first define the predicate robustness
as
\[
\mathrm{dist}^{\mu}(y,t) := \inf_{y' \in \mathrm{cl}(O^{\mu})} \|y(t) - y'\|,
\]
where Oµ := {y ∈ Rn | h(y) ≥ 0} denotes the set of all states that satisfy the predicate µ, cl(·)
denotes the closure of a set, and ∥ · ∥ denotes a vector norm. We can now recursively calculate
ρϕ (y, t) based on the structure of ϕ using the following rules:
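\[
\begin{aligned}
\rho^{\text{True}}(y,\tau) &:= \infty,\\
\rho^{\mu}(y,\tau) &:= \begin{cases} \mathrm{dist}^{\neg\mu}(y,\tau) & \text{if } y(\tau) \in O^{\mu}\\ -\mathrm{dist}^{\mu}(y,\tau) & \text{otherwise,} \end{cases}\\
\rho^{\neg\phi}(y,\tau) &:= -\rho^{\phi}(y,\tau),\\
\rho^{\phi' \wedge \phi''}(y,\tau) &:= \min\big(\rho^{\phi'}(y,\tau), \rho^{\phi''}(y,\tau)\big),\\
\rho^{\phi' U_I \phi''}(y,\tau) &:= \sup_{\tau'' \in (\tau \oplus I) \cap T} \min\Big(\rho^{\phi''}(y,\tau''), \inf_{\tau' \in (\tau,\tau'') \cap T} \rho^{\phi'}(y,\tau')\Big).
\end{aligned}
\]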
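As an illustration of the recursive rules above, the following Python sketch evaluates the robust semantics for scalar, discrete-time signals. It is not taken from the paper: the tuple encoding of formulas, the function name robustness, and the restriction to predicates of the form y ≥ c or y ≤ c (for which the signed distance reduces to y(τ) − c or c − y(τ)) are assumptions made for this example, and the time domain T is assumed to be {0, ..., len(y) − 1}.

```python
import math

# Hypothetical formula encoding for this sketch (nested tuples):
#   ("true",)                 True
#   ("geq", c)                predicate y >= c, i.e. h(y) = y - c
#   ("leq", c)                predicate y <= c, i.e. h(y) = c - y
#   ("not", f)                negation
#   ("and", f1, f2)           conjunction
#   ("until", a, b, f1, f2)   f1 U_[a,b] f2 with integer bounds 0 <= a <= b


def robustness(formula, y, tau):
    """Robust semantics rho^phi(y, tau) of a scalar signal y given as a list."""
    kind = formula[0]
    if kind == "true":
        return math.inf
    if kind == "geq":
        # Signed distance of y(tau) to the set {y >= c}.
        return y[tau] - formula[1]
    if kind == "leq":
        return formula[1] - y[tau]
    if kind == "not":
        return -robustness(formula[1], y, tau)
    if kind == "and":
        return min(robustness(formula[1], y, tau),
                   robustness(formula[2], y, tau))
    if kind == "until":
        _, a, b, f1, f2 = formula
        best = -math.inf  # sup over an empty set
        # tau2 ranges over (tau + [a, b]) intersected with the time domain.
        for tau2 in range(tau + a, min(tau + b, len(y) - 1) + 1):
            # inf over the open interval (tau, tau2); empty interval gives +inf.
            inner = min((robustness(f1, y, t1) for t1 in range(tau + 1, tau2)),
                        default=math.inf)
            best = max(best, min(robustness(f2, y, tau2), inner))
        return best
    raise ValueError(f"unknown formula type: {kind}")


y = [0.0, 1.0, 3.0, 2.0]
eventually = ("until", 0, 3, ("true",), ("geq", 2.0))  # F_[0,3] (y >= 2)
until = ("until", 0, 3, ("leq", 2.5), ("geq", 2.0))    # (y <= 2.5) U_[0,3] (y >= 2)
print(robustness(eventually, y, 0))  # 1.0
print(robustness(until, y, 0))       # 1.0
```

In both cases the supremum is attained at τ′′ = 2, where y(2) − 2 = 1; a positive value indicates that the formula is satisfied with margin 1.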
B Hölder Continuity of Robust Semantics ρϕ w.r.t. d∞

Proposition 4. Let ϕ be an STL specification and let d∞ : Y × Y → R be the L∞ signal metric.
Then, it holds that the robust semantics ρϕ are Hölder continuous in the first argument w.r.t. d∞
with H = γ = 1.
Proof. Let y1 , y2 ∈ Y be two deterministic signals. We would like to show that, for a fixed time
τ ∈ T, it holds that
|ρϕ (y1 , τ ) − ρϕ (y2 , τ )| ≤ d∞ (y1 , y2 ).
The proof is largely based on [34, Lemma 2], which, however, only states that
ρϕ (y1 , τ ) − ρϕ (y2 , τ ) ≤ d∞ (y1 , y2 ). While the other direction follows trivially, we present the full proof
in the notation of this paper for the convenience of the reader. Assume that the STL
formula ϕ is in positive normal form, i.e., that ϕ contains no negations. This assumption is made
without loss of generality since every STL formula ϕ can be
rewritten into a semantically equivalent STL formula in positive normal form [67, Proposition
2]. We proceed by induction on the structure of the formula ϕ.
Predicates. First note that distµ (y, τ ) is a Lipschitz continuous function with Lipschitz
constant one in the sense that |distµ (y1 , τ ) − distµ (y2 , τ )| ≤ ∥y1 (τ ) − y2 (τ )∥, see for instance [68,
Chapter 3]; the same holds for dist¬µ (y, τ ). Accordingly, ρµ (y, τ ) is Lipschitz continuous with Lipschitz
constant one so that |ρµ (y1 , τ ) − ρµ (y2 , τ )| ≤ ∥y1 (τ ) − y2 (τ )∥ ≤ d∞ (y1 , y2 ).
Conjunctions. For conjunctions ϕ′ ∧ ϕ′′ and by the induction assumption, it holds that
|ρ^{ϕ′}(y1 , τ ) − ρ^{ϕ′}(y2 , τ )| ≤ d∞ (y1 , y2 ) and |ρ^{ϕ′′}(y1 , τ ) − ρ^{ϕ′′}(y2 , τ )| ≤ d∞ (y1 , y2 ). Now, it follows that
\[
\begin{aligned}
\rho^{\phi' \wedge \phi''}(y_1,\tau) &= \min\big(\rho^{\phi'}(y_1,\tau), \rho^{\phi''}(y_1,\tau)\big)\\
&\ge \min\big(\rho^{\phi'}(y_2,\tau), \rho^{\phi''}(y_2,\tau)\big) - d_\infty(y_1,y_2)\\
&= \rho^{\phi' \wedge \phi''}(y_2,\tau) - d_\infty(y_1,y_2).
\end{aligned}
\]
Similarly, we can derive the other direction
\[
\begin{aligned}
\rho^{\phi' \wedge \phi''}(y_1,\tau) &= \min\big(\rho^{\phi'}(y_1,\tau), \rho^{\phi''}(y_1,\tau)\big)\\
&\le \min\big(\rho^{\phi'}(y_2,\tau), \rho^{\phi''}(y_2,\tau)\big) + d_\infty(y_1,y_2)\\
&= \rho^{\phi' \wedge \phi''}(y_2,\tau) + d_\infty(y_1,y_2).
\end{aligned}
\]
Consequently, it follows that |ρ^{ϕ′∧ϕ′′}(y1 , τ ) − ρ^{ϕ′∧ϕ′′}(y2 , τ )| ≤ d∞ (y1 , y2 ).
Until. Using the induction assumption and the same reasoning as for conjunctions (which
applies in the same way to the inf and sup operators), it follows that
|ρ^{ϕ′ U_I ϕ′′}(y1 , τ ) − ρ^{ϕ′ U_I ϕ′′}(y2 , τ )| ≤ d∞ (y1 , y2 ).
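The bound of Proposition 4 can be sanity-checked numerically. The sketch below is an illustration only and not part of the paper's evaluation: it assumes scalar discrete-time signals, a fixed example formula ("eventually y ≥ lo" and "always y ≤ hi" over the whole horizon), and hypothetical helper names rho, d_inf, and worst_slack. It draws random signal pairs and records the smallest observed value of d∞(y1, y2) − |ρϕ(y1) − ρϕ(y2)|, which should be nonnegative up to floating-point rounding.

```python
import random


def rho(y, lo, hi):
    """Robustness of 'eventually y >= lo' and 'always y <= hi' over the signal."""
    eventually = max(v - lo for v in y)  # sup over time
    always = min(hi - v for v in y)      # inf over time
    return min(eventually, always)       # conjunction


def d_inf(y1, y2):
    """Sup-norm distance between two equally long scalar signals."""
    return max(abs(a - b) for a, b in zip(y1, y2))


def worst_slack(trials=1000, length=50, seed=0):
    """Smallest observed value of d_inf(y1, y2) - |rho(y1) - rho(y2)|."""
    rng = random.Random(seed)
    slack = float("inf")
    for _ in range(trials):
        y1 = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        y2 = [v + rng.uniform(-0.2, 0.2) for v in y1]
        gap = abs(rho(y1, 0.5, 0.9) - rho(y2, 0.5, 0.9))
        slack = min(slack, d_inf(y1, y2) - gap)
    return slack


# A small tolerance absorbs floating-point rounding.
print(worst_slack() >= -1e-12)
```

Such a check cannot replace the proof, but it is a cheap way to catch implementation errors in a robustness evaluator.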
C Hölder Continuity of C(y) = ρϕ (y) w.r.t. dsk

We recall from [22] the definition of the Skorokhod metric. In the following definition, r : T → T
is a strictly increasing, bijective function known as a retiming function, and R is the space of all
possible retiming functions:
\[
d_{sk}(y_1,y_2) = \inf_{r \in R} \max\big(d_\infty(r(t),t),\ d_\infty(y_1, y_2 \circ r)\big). \tag{8}
\]
For every retiming r(t), the operand of the inf operator is the maximum between the magnitude of
the retiming and the d∞ distance between y1 and the retimed y2 . Skorokhod distance is then the
infimum across all retimings.
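To make the operand of the infimum concrete, the following Python sketch evaluates it for a given candidate retiming. This is a simplified illustration under stated assumptions, not an algorithm for computing the Skorokhod distance: signals are scalar and sampled on a common grid ts, y2 ∘ r is obtained by linear interpolation, both suprema are taken over the grid points only, and the names interp, retiming_cost, identity, and shift are hypothetical. Taking the minimum over a few candidate retimings gives a grid-based estimate from above of the infimum in (8).

```python
def interp(ts, ys, t):
    """Piecewise-linear interpolation of samples (ts, ys) at time t."""
    if t <= ts[0]:
        return ys[0]
    for k in range(1, len(ts)):
        if t <= ts[k]:
            w = (t - ts[k - 1]) / (ts[k] - ts[k - 1])
            return (1.0 - w) * ys[k - 1] + w * ys[k]
    return ys[-1]


def retiming_cost(ts, y1, y2, r):
    """Operand of the inf in (8), with both suprema taken over the grid ts only."""
    time_dev = max(abs(r(t) - t) for t in ts)
    value_dev = max(abs(a - interp(ts, y2, r(t))) for t, a in zip(ts, y1))
    return max(time_dev, value_dev)


ts = [0.0, 0.25, 0.5, 0.75, 1.0]
y1 = [0.0, 0.0, 1.0, 0.0, 0.0]  # spike at t = 0.5
y2 = [0.0, 1.0, 0.0, 0.0, 0.0]  # same spike, but at t = 0.25


def identity(t):
    return t


def shift(t):
    """Piecewise-linear retiming with 0 -> 0, 0.5 -> 0.25, 1 -> 1."""
    return 0.5 * t if t <= 0.5 else 0.25 + 1.5 * (t - 0.5)


candidates = [identity, shift]
estimate = min(retiming_cost(ts, y1, y2, r) for r in candidates)
print(retiming_cost(ts, y1, y2, identity))  # 1.0
print(estimate)                             # 0.5
```

With the identity retiming the cost equals the grid-based d∞ distance of 1, whereas the retiming that aligns the two spikes trades a timing deviation of 0.25 for a smaller value deviation.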
Proposition 5. Let ϕ be an STL specification and let dsk : Y × Y → R be the Skorokhod metric.
If the output signals are Lipschitz-continuous with Lipschitz constant Ky , it holds that the robust
semantics ρϕ are Hölder continuous w.r.t. dsk with H = (1 + Ky ) and γ = 1.
Proof. Let r(t) be the optimal retiming in the definition of the Skorokhod metric, and let α := dsk (y1 , y2 )
for brevity. Under the optimal retiming r, this means that d∞ (r(t), t) ≤ α and d∞ (y1 , y2 ◦ r) ≤ α.
From the second term, we derive
\[
\begin{aligned}
\alpha &\ge d_\infty(y_1, y_2 \circ r)\\
&= \sup_t \|y_1(t) - y_2(r(t))\|\\
&= \sup_t \|(y_1(t) - y_2(t)) - (y_2(r(t)) - y_2(t))\|\\
&\overset{(a)}{\ge} \sup_t \|y_1(t) - y_2(t)\| - \sup_t \|y_2(r(t)) - y_2(t)\|\\
&\overset{(b)}{\ge} d_\infty(y_1,y_2) - K_y \sup_t |r(t) - t|\\
&\overset{(c)}{\ge} d_\infty(y_1,y_2) - K_y \alpha,
\end{aligned}
\]
where we used the reverse triangle inequality in (a), the fact that y2 (t) is Lipschitz continuous with
constant Ky in (b), and that −d∞ (r(t), t) ≥ −α in (c). In other words, α ≥ d∞ (y1 , y2 ) − Ky α, or
\[
(1 + K_y)\, d_{sk}(y_1,y_2) \ge d_\infty(y_1,y_2).
\]
From Proposition 4, we know that |ρϕ (y1 ) − ρϕ (y2 )| ≤ d∞ (y1 , y2 ), so that
\[
|\rho^{\phi}(y_1) - \rho^{\phi}(y_2)| \le (1 + K_y)\, d_{sk}(y_1,y_2).
\]
D Hölder Continuity of C(y) := ∫_0^T y(t)⊤ y(t) dt w.r.t. d1
Proposition 6. Let C(y) := ∫_0^T y(t)⊤ y(t) dt be the performance function and let d1 : Y × Y → R
be the L1 signal metric. If the output signals are of bounded magnitude, i.e., there exists ymax s.t.
∥y(t)∥ < ymax , then the performance function C is Hölder continuous w.r.t. d1 with
H = 2ymax and γ = 1.
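Proof. We can derive the following chain of arguments
\[
\begin{aligned}
C(y_1) - C(y_2) &= \int_0^T \big(y_1(t)^\top y_1(t) - y_2(t)^\top y_2(t)\big)\,dt\\
&= \int_0^T \big(\|y_1(t)\|_2^2 - \|y_2(t)\|_2^2\big)\,dt\\
&= \int_0^T \big(\|y_1(t)\| + \|y_2(t)\|\big)\big(\|y_1(t)\| - \|y_2(t)\|\big)\,dt\\
&\overset{(a)}{\le} 2 y_{max} \int_0^T \|y_1(t) - y_2(t)\|\,dt\\
&\overset{(b)}{\le} 2 y_{max}\, d_1(y_1,y_2),
\end{aligned}
\]
where (a) follows from the reverse triangle inequality together with the boundedness
assumption on the signals, and (b) follows from the definition of the L1 signal metric d1. The other
direction follows analogously, which yields |C(y1 ) − C(y2 )| ≤ 2ymax d1 (y1 , y2 ).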
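The bound of Proposition 6 can also be checked numerically. The following Python sketch is an illustration under stated assumptions rather than the paper's implementation: the integrals are replaced by Riemann sums with step dt, the vector norm is taken to be Euclidean, ymax is taken as the largest sample norm of the two signals, and the names norm, cost, d1, and check_bound are hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))


def cost(y, dt):
    """Riemann-sum approximation of the integral of y(t)^T y(t) over [0, T]."""
    return sum(sum(x * x for x in yt) for yt in y) * dt


def d1(y1, y2, dt):
    """Riemann-sum approximation of the L1 signal distance."""
    return sum(norm([a - b for a, b in zip(p, q)]) for p, q in zip(y1, y2)) * dt


def check_bound(trials=500, steps=100, dim=2, dt=0.01, seed=1):
    """Return True if |C(y1) - C(y2)| <= 2 * y_max * d1(y1, y2) in every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        y1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]
        y2 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]
        y_max = max(norm(yt) for yt in y1 + y2)
        lhs = abs(cost(y1, dt) - cost(y2, dt))
        rhs = 2.0 * y_max * d1(y1, y2, dt)
        if lhs > rhs + 1e-12:  # tolerance for floating-point rounding
            return False
    return True


print(check_bound())
```

Because the chain of inequalities in the proof holds pointwise in t, it carries over to Riemann sums with positive weights, so the check is expected to succeed for any sampled signals.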