Decentralized Asynchronous Crash-resilient Runtime Verification

Authors:

Corentin Travers

Journal of the ACM, Volume 69, Issue 5

Article No.: 34, Pages 1 - 31

https://doi.org/10.1145/3550483

Published: 27 October 2022

Abstract

Runtime verification is a lightweight method for monitoring the formal specification of a system during its execution. It has recently been shown that a given state predicate can be monitored consistently by a set of crash-prone asynchronous distributed monitors observing the system, only if each monitor can emit verdicts taken from a large enough finite set. We revisit this impossibility result in the concrete context of linear-time logic (ltl) semantics for runtime verification, that is, when the correctness of the system is specified by an ltl formula on its execution traces. First, we show that monitors synthesized based on the 4-valued semantics of ltl (rv-ltl) may result in inconsistent distributed monitoring, even for some simple ltl formulas. More generally, given any ltl formula φ, we relate the number of different verdicts required by the monitors for consistently monitoring φ, with a specific structural characteristic of φ called its alternation number. Specifically, we show that, for every k ≥ 0, there is an ltl formula φ with alternation number k that cannot be verified at runtime by distributed monitors emitting verdicts from a set of cardinality smaller than k + 1. On the positive side, we define a family of logics, called distributed ltl (abbreviated as dltl), parameterized by k ≥ 0, which refines rv-ltl by incorporating 2k + 4 truth values. Our main contribution is to show that, for every k ≥ 0, every ltl formula φ with alternation number k can be consistently monitored by distributed monitors, each running an automaton based on a (2 ⌈ k/2 ⌉ +4)-valued logic taken from the dltl family.

1 Introduction

1.1 Context

Runtime verification is a technique where a monitor process determines whether or not the current execution of a system under inspection complies with its formal specification. The state-of-the-art runtime verification methods exhibit the following shortcomings: Either they classically employ a central monitor or they employ several monitors but assume a fault-free setting, where each individual monitor is resilient to failures [9, 13, 14, 21, 25, 26, 27, 31]. Relaxing the latter assumption, that is, handling several monitors subject to failures, poses significant challenges, as these monitors would become unable to agree on the same perspective of the execution, due to the impossibility of consensus [17]. Thus, it is unavoidable that these monitors emit different individual verdicts about the current execution, so that a consistent global verdict with respect to a correctness property can be constructed from these verdicts. Concretely, the two truth values of Boolean logic may be insufficient for allowing each monitor to express a wide spectrum of individual verdicts.

The necessity of using more than just the two truth values of Boolean logic is actually a known fact in the context of runtime verification, even with a single monitor. For instance, the linear temporal logic (ltl) [28] has been one of the most widely used specification languages to express the requirements of computing systems.¹ While ltl is a widely accepted language to reason about infinite execution traces, its three-valued semantics (denoted by ${\rm\small LTL}_3$ ) [8] is a logic on finite execution traces with three truth values in:

\begin{equation*} \mathbb {B}_3= \lbrace \top , \bot , ? \rbrace . \end{equation*}

These truth values respectively express whether, given the finite trace observed so far, an ltl formula is permanently satisfied, or permanently violated, or whether the observation is inconclusive. Likewise, rv-ltl [7] has four truth values in

\begin{equation*} \mathbb {B}_4 = \lbrace \top , \bot , \top _p, \bot _p\rbrace . \end{equation*}

These values respectively identify cases where a finite execution permanently satisfies, permanently violates, presumably satisfies, or presumably violates a given ltl formula. For example, consider a request/acknowledge property, where a request r should be eventually responded to by acknowledgment a, and a should not occur before r. Formally, an ltl formula for the request/acknowledge property is

\begin{equation} \varphi _{\mathit {ra}} = □ (\lnot a \wedge \lnot r) \, \vee \, [(\lnot a \, \mathbin {\mathcal {U}}\, r) \, \wedge \, ◇ a]. \tag{1} \end{equation}

This formula holds if either $□ (\lnot a \wedge \lnot r)$ holds (i.e., there is no request and no acknowledgment), or $(\lnot a \mathbin {\mathcal {U}}r) \wedge (◇ a)$ holds (i.e., a request is made at present or some future state and an acknowledgment is made after this request in the future). In rv-ltl, a finite execution containing r and ending in a (i.e., the request has been acknowledged) yields the truth value “permanently satisfied,” whereas an execution containing only r (i.e., the request has not yet been acknowledged) yields “presumably violated.” Although rv-ltl can monitor $\varphi _{\mathit {ra}}$ in a centralized setting (see Figure 1 for its monitor automaton), it is not powerful enough to monitor a conjunction of two such formulas in a framework of two asynchronous unreliable monitors:

\begin{equation*} \varphi _{\mathit {ra}2} = (□ (\lnot a_1 \wedge \lnot r_1) \, \vee \, [(\lnot a_1 \, \mathbin {\mathcal {U}}\, r_1) \, \wedge \, ◇ a_1])\, \wedge \, (□ (\lnot a_2 \wedge \lnot r_2) \, \vee \, [(\lnot a_2 \, \mathbin {\mathcal {U}}\, r_2) \,\wedge \, ◇ a_2]). \end{equation*}

Indeed, the set of verdicts emitted by the monitors is not sufficient to distinguish executions that satisfy the formula from those that violate it. Intuitively (we will formally establish this result further in the text), this is because each monitor has only a partial view of the system under scrutiny, and after a finite number of rounds of communication among monitors, still too many different perspectives of the global system state remain. For instance, the case where a monitor $M_1$ has observed a partial trace containing only $r_1$ (for which it should output $\bot _p\in \mathbb {B}_4$ ) is distinct from the case where $M_1$ has observed a partial trace containing only $a_1$ . However, $M_1$ should not output $\bot$ in this latter case (of course, it should not output either $\top$ or $\top _p$ ), because it may well be the case that another monitor $M_2$ has observed $r_1$ , yet $M_1$ is not aware of this observation, because of asynchrony and unreliability.

Fig. 1.

In fact, it was recently proved in Reference [20] that even deciding whether a single system state satisfies some given Boolean predicate, using a distributed set of asynchronous crash-prone monitors, requires that the individual verdicts be taken from a set whose size depends on the predicate under scrutiny. Although this size cannot exceed the number n of monitors, it is proved that, for any $k \in [0, n]$ , there are Boolean predicates on system states that require verdicts taken from a set of size at least $k+1$ . A matching upper bound is also presented in Reference [20]. In this article, we extend the preliminary results in Reference [20] to the setting of distributed monitoring of execution traces whose correctness is expressed by ltl formulas, and we provide distributed monitors defined in terms of finite automata corresponding to multi-valued logics.

1.2 Our Results

In this article, we propose a framework for distributed fault-tolerant runtime verification, where the monitors are asynchronous and subject to crashes. A monitor that crashes stops executing its code and does nothing afterwards. To this end, we introduce a multi-valued temporal logic. This new logic is a refinement of rv-ltl. More specifically, we propose a family of $(2k+4)$ -valued logics, denoted by dltl, for distributed ltl. In particular, ${\rm\small DLTL}$ with $k=0$ coincides with rv-ltl. The syntax of ${\rm\small DLTL}$ is identical to that of ltl, and its semantics is based, as is that of rv-ltl, on both fltl [24] and ${\rm\small LTL}_3$ [8], which are two ltl-based finite trace semantics for runtime monitoring. For each $k \ge 0$ , the kth instance of the family dltl has $2k + 4$ truth values

\begin{equation*} \mathbb {B}_{2k+4} = \lbrace \top , \bot , \top _0, \bot _0, \top _1, \bot _1, \dots , \top _k, \bot _k\rbrace . \end{equation*}

The index i of a logical value intuitively represents a degree of certainty that the formula is satisfied ( $\top _i$ ) or not ( $\bot _i$ ). In a nutshell, we characterize the formulas that can be distributedly monitored at runtime in ${\rm\small DLTL}_k$ , but cannot be distributedly monitored in ${\rm\small DLTL}_{k-1}$ .

More specifically, our first contribution (Theorem 5.2) is a lower bound on the cardinality of the set of values used by each monitor for expressing its local verdict. We revisit the result in Reference [20] and show that this lower bound can be expressed in terms of a particular characteristic of the ltl formula under consideration, called its alternation number. Roughly, the alternation number of an ltl formula $\varphi$ is the maximum number of times, taken over all finite traces $\alpha = \alpha _0\alpha _1 \cdots \alpha _n$ , that the valuation of $\varphi$ can alternate in the finite semantics of ltl. In other words, the alternation number of $\varphi$ is the maximum number of times $\varphi$ can change its truth value in fltl by gaining more and more information about the truth values of the atomic propositions characterizing the current system’s global state $\alpha _n$ . As opposed to Reference [20], this number of changes depends not only on the current state $\alpha _n$ of the system, but also on the sequence of preceding states (i.e., those in $\alpha$ ). We show that, for every $k\ge 0$ , there is an ltl formula $\varphi$ with alternation number k that cannot be distributedly monitored by monitors emitting verdicts from a set of cardinality smaller than $k+1$ .

Our second contribution (Theorem 6.5) is a concrete mechanism for fault-tolerant distributed runtime verification. Each monitor gets a partial view of the system’s global state, communicates with the other monitors, and then emits a verdict in ${\rm\small DLTL}$ using $2\lceil k/2 \rceil +4$ truth values, where k is the alternation number of the ltl formula under scrutiny. The sets of verdicts collectively provided by the monitors are in one-to-one correspondence with the rv-ltl verdicts that would be computed by a centralized monitor with a full view of the system. In view of our lower bound, our algorithm is essentially optimal in terms of the number of verdicts emitted by the distributed monitors (up to a small additive constant). Our mechanism is concrete in the sense that we present a monitor construction algorithm that generates a finite-state Moore machine that, for any ltl formula $\varphi$ , computes the alternation number k of $\varphi$ , and constructs the ${\rm\small DLTL}$ automaton that makes it possible to distributedly monitor $\varphi$ using $2\lceil k/2 \rceil +4$ logical values.

We emphasize that we do not make an assumption on whether the system under scrutiny is centralized or distributed. In fact, this has no impact on our results and, hence, the type of the system is abstracted away.

We note that there is a long literature on what is monitorable. The classic definition [29] is that an ltl formula is monitorable if any prefix can be extended to some other finite prefix that evaluates to a permanently false or true verdict. In this sense, all safety and co-safety formulas are monitorable. However, not all monitorable formulas are either safety or co-safety. A liveness formula such as $□ ◇ p$ is not monitorable, intuitively because one cannot observe p infinitely often within a finite prefix at run time. Having said this, the above notion of monitorability is not relevant to our results in this article. First, observe that the request/acknowledgment formula is neither safety nor co-safety but is monitorable. The issue here is that, even for such a formula, rv-ltl is not sufficient to consistently monitor the formula due to the partial observability of the monitors.

1.3 Related Work

While there has been significant progress in sequential monitoring in the past decade, there has been less work devoted to distributed monitoring. Lattice-theoretic centralized and decentralized online predicate detection in distributed systems has been studied in References [13, 25]. This line of work does not address monitoring properties with temporal requirements. This shortcoming is partially addressed in References [27, 30], but for offline monitoring. In Reference [31], the authors design a method for monitoring safety properties in distributed systems using the past-time linear temporal logic (pltl). In such a work, however, the valuation of some predicates and properties may be overlooked. This is because monitors gain knowledge about the state of the system by piggybacking on the existing communication among processes. That is, if processes rarely communicate, then monitors exchange little information and, hence, some violations of properties may remain undetected. These techniques, however, assume perfect monitors that are not subject to faults.

Runtime monitoring of ltl formulas for synchronous distributed systems where processes share a single global clock has been studied in References [9, 14]. In Reference [10], the authors introduce parallel algorithms for runtime verification of sequential programs. Our work is inspired by the research line initiated in References [18, 19, 20]. Reference [19] pioneered the investigation of distributed decision in the context of asynchronous fault-tolerant distributed computing and characterized the Boolean predicates on system states that can be distributedly monitored with verdicts chosen from sets of two or three values. The follow-up contribution [20] extended this characterization to verdicts chosen from a set of k values, for any $k\ge 2$ , and Reference [18] analyzed the specific case of monitoring the Boolean predicates on system states corresponding to checking the correctness of k-set agreement tasks.

1.4 Organization

The rest of the article is organized as follows: Section 2 presents the preliminary concepts. We introduce our model of computation for distributed monitoring in Section 3. Then, in Section 4, we show why the power of rv-ltl is insufficient to deal with fault-tolerant distributed monitoring. The notion of alternation number is presented in Section 5, while its impact on the design of dltl is discussed in Section 6. Finally, we make concluding remarks and discuss future work in Section 7.

2 Background

We recall basic concepts related to ltl and its finite semantics for runtime verification.

2.1 Linear Temporal Logic (LTL)

Let $\mathsf {AP}$ be a set of atomic propositions and $\Sigma = 2^{\mathsf {AP}}$ be the set of all possible states. A trace is a sequence $s_0s_1\cdots$ , where $s_i\in \Sigma$ for every $i\ge 0$ . We denote by $\Sigma ^*$ (respectively, $\Sigma ^\omega$ ) the set of all finite (respectively, infinite) traces. We denote the empty trace by $\epsilon$ . For a finite trace $\alpha = s_0s_1\cdots s_k$ , $|\alpha |$ denotes its length, that is, its number of states, i.e., $k+1$ . Also, for $\alpha = s_0s_1\cdots s_k$ , by $\alpha ^i$ , we mean the suffix $s_is_{i+1}\cdots s_k$ of $\alpha$ .

The syntax and semantics of linear temporal logic (ltl) [28] are defined for infinite traces. The syntax is defined by the following grammar:

\begin{equation*} \varphi ::= \mathit {p}\; \mid \; \lnot \varphi \;\mid \; \varphi \,\vee \,\varphi \; \mid \; ◯ \varphi \; \mid \; \varphi \,\mathbin {\mathcal {U}}\, \varphi , \end{equation*}

where $\mathit {p}\in \mathsf {AP}$ , and where $◯$ and $\mathbin {\mathcal {U}}$ are the “next” and “until” temporal operators. We view other propositional and temporal operators as abbreviations, that is, $\mathsf {true}= p \vee \lnot p$ , $\mathsf {false}= \lnot \mathsf {true}$ , $\varphi \rightarrow \psi =\lnot \varphi \vee \psi$ , $\varphi \wedge \psi = \lnot (\lnot \varphi \vee \lnot \psi)$ , $◇ \varphi = \mathsf {true}\, \mathbin {\mathcal {U}}\, \varphi$ (finally $\varphi$ ), and $□ \varphi = \lnot ◇ \lnot \varphi$ (globally $\varphi$ ).

The infinite-trace semantics of ltl is defined as follows: Let $\sigma = s_0s_1s_2\dots \in \Sigma ^\omega$ , let $i \ge 0$ , and let $\models$ denote the satisfaction.
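
The clauses defining $\sigma , i \models \varphi$ are the standard inductive ones for the grammar above. As a reminder, and only as a sketch in the notation of this section (the bound indices $j$ and $k$ are introduced here for illustration), they can be written as:

\begin{equation*} \begin{array}{lcl} \sigma , i \models \mathit {p} & \text{iff} & \mathit {p}\in s_i \\ \sigma , i \models \lnot \varphi & \text{iff} & \sigma , i \not\models \varphi \\ \sigma , i \models \varphi _1 \vee \varphi _2 & \text{iff} & \sigma , i \models \varphi _1 \;\text{or}\; \sigma , i \models \varphi _2 \\ \sigma , i \models ◯ \varphi & \text{iff} & \sigma , i+1 \models \varphi \\ \sigma , i \models \varphi _1 \, \mathbin {\mathcal {U}}\, \varphi _2 & \text{iff} & \exists k \ge i: (\sigma , k \models \varphi _2) \;\wedge \; (\forall j \in [i,k): \sigma , j \models \varphi _1) \end{array} \end{equation*}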

Also, $\sigma \models \varphi$ holds if and only if $\sigma , 0 \models \varphi$ holds. For instance, the request/acknowledgment ltl formula in Equation (1) specifies that, first, if a request r is emitted, then such a request should eventually be acknowledged by a, and, second, an acknowledgment happens only in response to a request.

2.2 Logics for Runtime Verification

In the context of runtime verification, the semantics of ltl is not fully appropriate, as it is defined over infinite traces. Before we delve into the details, we note that many distributed programs are non-terminating (e.g., databases, internet services, blockchains, web servers, content delivery). However, the goal of runtime monitoring is to evaluate the health of a system by only observing finite behaviors of the system. In some cases, the monitor is able to issue a verdict that generalizes to any infinite extension (e.g., permanently false and true verdicts). In this sense, the monitor can inspect the health of a program regardless of whether it is terminating or non-terminating.

2.2.1 Finite LTL.

Finite ltl ( fltl ) [24] allows us to reason about finite traces for verifying properties at runtime. The syntax of fltl is identical to that of ltl. The semantics of fltl for both atomic propositions and Boolean operators are identical to those of ltl. fltl employs two truth values to evaluate a formula with respect to a finite trace, denoted by $\mathbb {B}_2 = \lbrace \bot ,\top \rbrace$ . We now recall the semantics of fltl for the temporal operators. Let $\varphi$ , $\varphi _1$ , and $\varphi _2$ be ltl formulas, let $\alpha = s_0s_1 \cdots s_{n}$ be a non-empty finite trace, and let $\models _F$ denote satisfaction in fltl. We have:

\begin{equation*} \left[\alpha \models _F ◯ \, \varphi \right] = {\left\lbrace \begin{array}{ll} [\alpha ^1 \models _F \varphi ] & \text{if}\; \alpha ^1 \ne \epsilon \\ \bot & \text{otherwise,} \end{array}\right.} \end{equation*}

and

\begin{equation*} \left[\alpha \models _F \varphi _1 \, \mathbin {\mathcal {U}}\, \varphi _2\right] = {\left\lbrace \begin{array}{ll}\top & \text{if}\; \exists k \in [0,n]: ([\alpha ^k \models _F \varphi _2] = \top) \;\wedge \; (\forall \ell \in [0,k), [\alpha ^\ell \models _F \varphi _1] = \top)\\ \bot & \text{otherwise.} \end{array}\right.} \end{equation*}

To illustrate the difference between ltl and fltl, consider formula $\varphi = ◇ \mathit {p}$ and finite trace $\alpha =s_0s_1\cdots s_{n}$ . If $\mathit {p}\in s_i$ for some $i \in [0, n]$ , then we have $[\alpha \models _F \varphi ] = \top$ . However, if $\mathit {p}\notin s_i$ for every $i \in [0,n]$ , then $[\alpha \models _F \varphi ] = \bot$ , and this holds even if $\alpha$ is extended to another finite sequence including a state where $\mathit {p}$ holds.
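
The fltl clauses above translate almost directly into a recursive evaluator. The following is a minimal sketch, not the construction used in this article: formulas are encoded as nested tuples, a state is the set of atomic propositions that hold in it, and the names `fltl`, `eventually`, and the constant `("true",)` are conventions assumed here for illustration.

```python
from typing import FrozenSet, Sequence, Tuple, Union

State = FrozenSet[str]        # a state is the set of propositions that hold in it
Formula = Union[str, Tuple]   # "p", ("true",), ("not", f), ("or", f, g), ("next", f), ("until", f, g)


def fltl(trace: Sequence[State], phi: Formula) -> bool:
    """Return True iff [trace |=_F phi] is top; the trace must be non-empty."""
    if len(trace) == 0:
        raise ValueError("FLTL is evaluated on non-empty finite traces")
    if isinstance(phi, str):                 # atomic proposition
        return phi in trace[0]
    op = phi[0]
    if op == "true":                         # assumed shorthand for p or not p
        return True
    if op == "not":
        return not fltl(trace, phi[1])
    if op == "or":
        return fltl(trace, phi[1]) or fltl(trace, phi[2])
    if op == "next":                         # bottom when the suffix alpha^1 is empty
        return len(trace) > 1 and fltl(trace[1:], phi[1])
    if op == "until":
        for k in range(len(trace)):
            if fltl(trace[k:], phi[2]):      # phi_2 holds at k, phi_1 held at every l < k
                return True
            if not fltl(trace[k:], phi[1]):  # phi_1 fails before phi_2 was seen
                return False
        return False                         # the trace ended without phi_2
    raise ValueError(f"unknown operator: {op!r}")


def eventually(phi: Formula) -> Formula:
    """Finally phi, encoded as true U phi."""
    return ("until", ("true",), phi)


no_p, with_p = frozenset(), frozenset({"p"})
print(fltl([no_p, no_p], eventually("p")))          # False
print(fltl([no_p, no_p, with_p], eventually("p")))  # True
```

The two printed values reproduce the example in the text: the verdict for $◇ \mathit {p}$ is $\bot$ on a trace without $\mathit {p}$, although a one-state extension flips it to $\top$.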

2.2.2 Three-valued Semantics for LTL.

As illustrated in the previous subsection, fltl ignores the possible future extensions of finite traces when evaluating a formula. Three-valued ltl ( ${\rm\small LTL}_3$ ) [8] also evaluates ltl formulas for finite traces, but with an eye on possible extensions. In ${\rm\small LTL}_3$ , the set of truth values is $\mathbb {B}_3=\lbrace \top ,\bot , ?\rbrace$ , where $\top$ (respectively, $\bot$ ) denotes that the formula is permanently satisfied (respectively, violated), no matter how the current trace extends, and “?” denotes an unknown verdict, i.e., there exists an extension that falsifies the formula, and another extension that satisfies it. Let $\alpha \in \Sigma ^{*}$ be a non-empty finite trace. The truth value of an ${\rm\small LTL}_3$ formula $\varphi$ with respect to $\alpha$ , denoted by $[\alpha \models _3 \varphi ]$ , is defined as follows:

\begin{equation*} \left[\alpha \models _3 \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~~~\forall \sigma \in \Sigma ^\omega : \alpha \sigma \models \varphi \\ \bot & \text{if }~~~~~\forall \sigma \in \Sigma ^\omega : \alpha \sigma \not\models \varphi \\ \hspace{1.5pt}?& \text{otherwise}. \end{array}\right.} \end{equation*}

For example, consider formula $\varphi = □ p$ and a finite trace $\alpha =s_0s_1\cdots s_{n}$ . If $\mathit {p}\not\in s_i$ for some $i \in [0,n]$ , then $[\alpha \models _3 \varphi ] = \bot$ . That is, the formula is permanently violated. Now, consider formula $\varphi = ◇ p$ and a finite trace $\alpha =s_0s_1\cdots s_{n}$ . If $\mathit {p}\not\in s_i$ for all $i \in [0,n]$ , then $[\alpha \models _3 \varphi ] = ?$ . This is because there exist infinite extensions to $\alpha$ that can satisfy or violate $\varphi$ in the infinite semantics of ltl.

Definition 2.1.

The ${\rm\small LTL}_3$ monitor for a formula $\varphi$ is the unique deterministic finite-state machine

\begin{equation*} \mathcal {M} = (\Sigma , Q, q_0, \delta ,\lambda), \end{equation*}

where Q is the set of states, $q_0$ is the initial state, $\delta : Q \times \Sigma \rightarrow Q$ is the transition function, and $\lambda : Q \rightarrow \mathbb {B}_3$ is a function such that

\begin{equation*} \lambda \big (\delta (q_0, \alpha)\big) = \left[\alpha \models _3 \varphi \right] \end{equation*}

for every finite trace $\alpha \in \Sigma ^*$ .

For example, Figure 2 shows the monitor automaton for formula $\varphi = a \, \mathbin {\mathcal {U}}\, b$ . The function $\lambda$ for this automaton is as follows: $\lambda (q_{0}) =\ ?, \lambda (q_\bot) = \bot$ , and $\lambda (q_\top) = \top$ .
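
Definition 2.1 can be made concrete on this example. The sketch below encodes a three-state Moore machine for $a \, \mathbin {\mathcal {U}}\, b$ with the output function $\lambda$ given above. The transition table is derived here from the semantics of the until operator and is assumed, not checked, to match Figure 2; the verdict strings `"top"`, `"bot"`, and `"?"` are illustrative names for $\top$, $\bot$, and ?.

```python
from typing import FrozenSet, Iterable

State = FrozenSet[str]   # set of atomic propositions that hold in the state

# Output function lambda of the Moore machine.
LAMBDA = {"q0": "?", "q_top": "top", "q_bot": "bot"}


def delta(q: str, s: State) -> str:
    """Transition function for a U b (q_top and q_bot are sinks)."""
    if q != "q0":
        return q
    if "b" in s:         # b observed while a held so far: permanently satisfied
        return "q_top"
    if "a" in s:         # still waiting for b
        return "q0"
    return "q_bot"       # neither a nor b: permanently violated


def verdict(trace: Iterable[State]) -> str:
    """Run the machine from q0 and output lambda of the state reached."""
    q = "q0"
    for s in trace:
        q = delta(q, s)
    return LAMBDA[q]


a, b, empty = frozenset({"a"}), frozenset({"b"}), frozenset()
print(verdict([a, a]))         # ?
print(verdict([a, a, b]))      # top
print(verdict([a, empty, b]))  # bot
```

Once a sink state is reached, the verdict never changes, which is exactly what "permanently" satisfied or violated means in ${\rm\small LTL}_3$.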

Fig. 2.

2.2.3 Four-valued Semantics for LTL (RV-LTL).

The four-valued logic rv-ltl [7] refines the truth value “?” into $\bot _p$ and $\top _p$ . That is, its set of verdicts is $\mathbb {B}_4=\lbrace \top ,\top _p,\bot _p,\bot \rbrace$ . More specifically, evaluation of a formula in rv-ltl agrees with ${\rm\small LTL}_3$ if the verdict is $\bot$ or $\top$ . Otherwise, (i.e., when the verdict in ${\rm\small LTL}_3$ is ?), rv-ltl utilizes fltl to compute a more refined truth value. Let $\alpha \in \Sigma ^{*}$ be a finite trace. The truth value of an rv-ltl formula $\varphi$ with respect to $\alpha$ , denoted by $[\alpha \models _4 \varphi ]$ , is defined as follows:

\begin{equation*} \left[\alpha \models _{4} \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~~~[\alpha \models _3 \varphi ] = \top \\ \bot & \text{if }~~~~~[\alpha \models _3 \varphi ] = \bot \\ \top _p & \text{if }~~~~~[\alpha \models _3 \varphi ] = \;\hspace{1.5pt}?\; \wedge \; [\alpha \models _F \varphi ] = \top \\ \bot _p & \text{if }~~~~~[\alpha \models _3 \varphi ] = \;\hspace{1.5pt}?\; \wedge \; [\alpha \models _F \varphi ] = \bot \\ \end{array}\right.} . \end{equation*}
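
The case distinction above is a simple combination of the two earlier semantics. As a minimal sketch, assuming the ${\rm\small LTL}_3$ verdict is given as one of the strings `"top"`, `"bot"`, `"?"` and the fltl verdict as a Boolean (both encodings are choices made here for illustration):

```python
def rv_ltl(ltl3_verdict: str, fltl_verdict: bool) -> str:
    """Combine an LTL3 verdict and an FLTL verdict into an RV-LTL verdict."""
    if ltl3_verdict not in ("top", "bot", "?"):
        raise ValueError(f"not an LTL3 verdict: {ltl3_verdict!r}")
    if ltl3_verdict != "?":
        return ltl3_verdict                        # conclusive: agree with LTL3
    return "top_p" if fltl_verdict else "bot_p"    # inconclusive: refine with FLTL


# Finally p on a trace without p: LTL3 gives "?", FLTL gives bottom.
print(rv_ltl("?", False))   # bot_p
```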

Definition 2.2.

The rv-ltl monitor of a formula $\varphi$ is the unique deterministic finite-state machine

\begin{equation*} \mathcal {M} = (\Sigma , Q, q_0, \delta , \lambda), \end{equation*}

where Q is the set of states, $q_0$ is the initial state, $\delta : Q \times \Sigma \rightarrow Q$ is the transition function, and $\lambda : Q \rightarrow \mathbb {B}_4$ is a function such that

\begin{equation*} \lambda (\delta (q_0, \alpha)) = \left[\alpha \models _4 \varphi \right] \end{equation*}

for every finite trace $\alpha \in \Sigma ^*$ .

An algorithm that takes as input an ltl formula and constructs as output the rv-ltl monitor is described in Reference [8]. For example, Figure 1 shows the rv-ltl monitor for the request/acknowledgment formula in Equation (1).

Remark. We note that the sizes of rv-ltl and ${\rm\small LTL}_3$ monitors are exponential in the size of the input ltl formula. However, since the size of formulas is typically small, the size of corresponding monitors after determinization and minimization is not expected to be large (usually a handful of states).

3 Distributed Fault-tolerant Monitoring

In this section, we present a general computation model for asynchronous distributed fault-tolerant monitoring.

3.1 General Objective

Throughout the rest of the article, the system under inspection produces a finite trace $\alpha = s_0s_1\cdots s_k$ , and is inspected with respect to an ltl formula $\varphi$ by a set ${\mathcal {M}}= \lbrace M_1, M_2, \dots , M_n\rbrace$ of monitors. The monitors run asynchronously and are subject to crash failures. When a monitor crashes, it stops functioning, i.e., does not perform any computation step, and will never recover. For the sake of simplifying the presentation, we assume that the monitors exchange information by atomic read/write accesses to a shared memory. Indeed, our focus is to measure the impact of distributed monitoring, not to deal with the subtleties of complex communication media, and, hence, we choose the wait-free distributed computing model, which is well understood [6]. Moreover, this model is known to be equivalent, with respect to task computability, to the message-passing model, under the weak assumption that fewer than half the monitors can crash [3].

To compare the power and limitations of distributed monitoring with those of centralized monitoring, we assume that the monitors perform their observation of the system, their computation, and their emission of verdicts reflecting their vision of the current trace $\alpha =s_0s_1\cdots s_k$ w.r.t. some ltl formula $\varphi$ , before the trace is extended to $\alpha s_{k+1}$ . In other words, the distributed monitors have time to observe, compute, and output in between any two global steps of the system execution. This allows us to compare the behavior of the distributed monitor with the behavior of a centralized event-triggered monitor observing the global execution of the system.

Informally, we aim at designing distributed monitors whose outputs make it possible to infer the verdicts that would be produced by a centralized monitor on the same execution trace. Specifically, we will compare our distributed monitors with a centralized monitor producing verdicts in rv-ltl. That is, assuming that the distributed monitors choose their verdicts from a set V, they must be able to map the sets of verdicts produced by the monitors to the truth values in $\mathbb {B}_4=\lbrace \top ,\top _p,\bot _p,\bot \rbrace$ produced by a (centralized) rv-ltl automaton monitoring the system, and this mapping

\begin{equation*} \mu : 2^V\rightarrow \mathbb {B}_4 \end{equation*}

must guarantee the soundness condition that, for every finite trace $\alpha$ , if the distributed monitors produce a set $m \in 2^V$ of verdicts for $\alpha$ , then

\begin{equation} \mu (m) = [\alpha \models _4 \varphi ]. \tag{2} \end{equation}

Note that m is a set of verdicts. Indeed, each monitor observes and maintains only a partial view of the system, and so two monitors may have different perspectives on the correctness of the system. Moreover, since the monitors run asynchronously, different read/write interleavings are possible, where each interleaving may lead to a different collective set m of verdicts emitted by the monitors for the same system state.

In the remainder of the section, we formally specify distributed fault-tolerant monitoring.

3.2 LTL on Partial Traces

In the centralized setting, recall from Section 2 that a state of the system is an element of $2^\mathsf {AP}$ . We will use the notation $\lbrace \mathsf {true},\mathsf {false}\rbrace ^{|\mathsf {AP}|}$ , specifying which atomic propositions are satisfied, and which ones are not satisfied in a given state. However, in a distributed setting, each monitor in $\mathcal {M}$ has only a partial view of the system under inspection, and it may be able to observe the truth values of only a subset of atomic propositions, so the values of the remaining propositions are unknown to the monitor. This leads us to the definition of partial states, and partial traces (see also References [11, 12]). We fix the notation $s[p]$ to denote the “value” of proposition p in state s (i.e., from the set $\lbrace \mathsf {true}, \mathsf {false}\rbrace$ ). We use the same notation for partial states and propositions.

Definition 3.1.

Let $\widehat{\Sigma }=\lbrace \mathsf {true}, \mathsf {false}, \natural \rbrace ^{|\mathsf {AP}|}$ where $\natural$ denotes an unknown value. A partial state is an element of $\widehat{\Sigma }$ , and a partial trace is an element of $\widehat{\Sigma }^*\cup \widehat{\Sigma }^\omega$ . Given a partial state $\hat{s}$ , a state s is a completion of $\hat{s}$ if, for every $\mathit {p}\in \mathsf {AP}$ , $s[\mathit {p}]\in \lbrace \mathsf {true}, \mathsf {false}\rbrace$ , and

\begin{equation*} (\hat{s}[p]\ne \natural) \; \Rightarrow \; (s[p]=\hat{s}[p]). \end{equation*}

A trace $\alpha$ is a completion of a partial trace $\hat{\alpha }$ if $|\alpha |=|\hat{\alpha }|$ and, for every $i\ge 0$ , the ith state of $\alpha$ is a completion of the ith partial state of $\hat{\alpha }$ .
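
Definition 3.1 can be illustrated by enumerating completions explicitly. In the sketch below, a partial state is a dictionary from proposition names to `True`, `False`, or `None`, where `None` plays the role of $\natural$; this encoding and the function names are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional

PartialState = Dict[str, Optional[bool]]   # None stands for the unknown value
FullState = Dict[str, bool]


def completions_of_state(partial: PartialState) -> Iterator[FullState]:
    """Yield every state that agrees with `partial` on all known propositions."""
    props = sorted(partial)
    choices = [(True, False) if partial[p] is None else (partial[p],) for p in props]
    for values in product(*choices):
        yield dict(zip(props, values))


def cmpl(partial_trace: List[PartialState]) -> Iterator[List[FullState]]:
    """Yield every trace of the same length completing `partial_trace` state by state."""
    per_state = [list(completions_of_state(s)) for s in partial_trace]
    for states in product(*per_state):
        yield list(states)


partial_trace = [{"a": True, "r": None}, {"a": None, "r": None}]
print(len(list(cmpl(partial_trace))))   # 8
```

A partial trace with u unknown entries in total has $2^u$ completions (here $2^3 = 8$), so explicit enumeration is only practical for small examples.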

We denote by $\mathsf {cmpl}(\hat{\alpha })$ the set of all traces $\alpha$ completing the partial trace $\hat{\alpha }$ . Then, for every finite partial trace $\hat{\alpha }$ , we set

\begin{equation} \left[\hat{\alpha } \models _3 \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~~~\forall \alpha \in \mathsf {cmpl}(\hat{\alpha }), \forall \sigma \in \Sigma ^\omega : \alpha \sigma \models \varphi \\ \bot & \text{if }~~~~~\forall \alpha \in \mathsf {cmpl}(\hat{\alpha }), \forall \sigma \in \Sigma ^\omega : \alpha \sigma \not\models \varphi \\ \hspace{1.5pt}?& \text{otherwise}. \end{array}\right.} \tag{3} \end{equation}
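
Equation (3) says that a partial trace is conclusive only when all of its completions are conclusive in the same direction. The following sketch checks this by brute force for the single formula $a \, \mathbin {\mathcal {U}}\, b$: `ltl3_a_until_b` is a hand-written ${\rm\small LTL}_3$ evaluator for that formula only, and the dictionary encoding with `None` for $\natural$ is an assumption made here, not part of the framework.

```python
from itertools import product
from typing import Dict, List, Optional

PartialState = Dict[str, Optional[bool]]   # None stands for the unknown value


def ltl3_a_until_b(trace: List[Dict[str, bool]]) -> str:
    """LTL3 verdict of a U b on a complete finite trace."""
    for s in trace:
        if s["b"]:
            return "top"     # b reached while a held at every earlier state
        if not s["a"]:
            return "bot"     # a failed before b was seen
    return "?"


def completions(partial_trace: List[PartialState]) -> List[List[Dict[str, bool]]]:
    """All complete traces that agree with the known values of the partial trace."""
    per_state = []
    for s in partial_trace:
        props = sorted(s)
        choices = [(True, False) if s[p] is None else (s[p],) for p in props]
        per_state.append([dict(zip(props, v)) for v in product(*choices)])
    return [list(t) for t in product(*per_state)]


def ltl3_partial(partial_trace: List[PartialState]) -> str:
    """Equation (3): top or bot only if every completion yields that verdict."""
    verdicts = {ltl3_a_until_b(t) for t in completions(partial_trace)}
    if verdicts == {"top"}:
        return "top"
    if verdicts == {"bot"}:
        return "bot"
    return "?"


print(ltl3_partial([{"a": None, "b": True}]))    # top
print(ltl3_partial([{"a": True, "b": None}]))    # ?
print(ltl3_partial([{"a": False, "b": False}]))  # bot
```

Quantifying over completions first and over infinite extensions second is equivalent to requiring the same conclusive ${\rm\small LTL}_3$ verdict on every completion, which is what the set comparison implements.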

When a state s is reached in a finite trace, each monitor in ${\mathcal {M}}$ takes a sample from s, which results in obtaining a partial state. In a sample, if the value of an atomic proposition is known, then the sampled value is consistent with state s, so the actual state is a completion of any of its samples.

Definition 3.2.

A sample of a state $s\in \Sigma$ is a partial state $\hat{s}\in \widehat{\Sigma }$ such that, for every $\mathit {p}\in \mathsf {AP}$ ,

\begin{equation*} (\hat{s}[\mathit {p}]\ne \natural) \; \Rightarrow \; (\hat{s}[\mathit {p}]=s[\mathit {p}]). \end{equation*}

We assume that two monitors M and $M^{\prime }$ cannot take inconsistent samples. That is, if $\hat{s}$ and $\hat{s}^{\prime }$ are two samples of a state s by monitors M and $M^{\prime }$ , respectively, then we assume that, for every $\mathit {p}\in \mathsf {AP}$ ,

\begin{equation*} (\hat{s}[p] \ne \hat{s}^{\prime }[p]) \; \Rightarrow \; (\hat{s}[p] = \natural \, \vee \, \hat{s}^{\prime }[p] = \natural). \end{equation*}

We say that a set of monitors covers a state s if the collection of partial views of these monitors covers the values of all atomic propositions in s. Formally, a set $\mathcal {M}$ of monitors satisfies state coverage for a state s if, for every $\mathit {p}\in \mathsf {AP}$ , there exists $M \in \mathcal {M}$ whose sample $\hat{s}$ satisfies $\hat{s}[p] \ne \natural$ . Unfortunately, distributed monitoring with monitors subject to crash failures is subject to an important limitation: State coverage cannot be guaranteed. Indeed, even if it is guaranteed that $\mathcal {M}$ initially satisfies state coverage, the presence of crashes may result in this property no longer being true during the course of execution of the system. This follows from the fact that $\mathcal {M}^{\prime } = \lbrace M_i \mid i \in I\rbrace$ may not satisfy state coverage for $I\subset [1,n]$ , even if $\mathcal {M} = \lbrace M_i \mid i\in [1,n]\rbrace$ satisfies state coverage, because the monitors $M_i$ , where $i\in [1,n]\setminus I$ , have crashed.
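
The sample, consistency, and state-coverage conditions are all pointwise checks over the atomic propositions. The sketch below spells them out under the assumption that states and samples are dictionaries over the same proposition names, with `None` standing for $\natural$; the function names and the two-monitor scenario are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

PartialState = Dict[str, Optional[bool]]   # None stands for the unknown value


def is_sample(partial: PartialState, state: Dict[str, bool]) -> bool:
    """Definition 3.2: every known value of the sample agrees with the state."""
    return all(partial[p] is None or partial[p] == state[p] for p in state)


def consistent(s1: PartialState, s2: PartialState) -> bool:
    """Two samples may differ on p only if at least one of them does not know p."""
    return all(s1[p] == s2[p] or s1[p] is None or s2[p] is None for p in s1)


def covers(samples: List[PartialState], ap: Iterable[str]) -> bool:
    """State coverage: every proposition is known to at least one sample."""
    return all(any(s[p] is not None for s in samples) for p in ap)


AP = ["r1", "a1"]
state = {"r1": True, "a1": False}
m1 = {"r1": True, "a1": None}    # sample taken by a first monitor
m2 = {"r1": None, "a1": False}   # sample taken by a second monitor

print(is_sample(m1, state) and is_sample(m2, state))  # True
print(consistent(m1, m2))                             # True
print(covers([m1, m2], AP))                           # True
print(covers([m1], AP))                               # False
```

The last line mirrors the limitation described above: if the second monitor crashes, the surviving sample no longer covers `a1`.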

Since state coverage cannot be guaranteed, one must also specify the correctness of partial traces in fltl so monitors can emit non-trivial verdicts even on partial traces. In this article, we do so via an extrapolation function, which associates a Boolean value with each atomic proposition, even if its truth value is unknown.

Definition 3.3.

An extrapolation function is a function $\mathbf {x}=(\mathbf {x}_\mathit {p})_{\mathit {p}\in \mathsf {AP}}$ , where

\begin{equation*} \mathbf {x}_p: \lbrace \mathsf {true}, \mathsf {false}, \natural \rbrace \rightarrow \lbrace \mathsf {true}, \mathsf {false}\rbrace \end{equation*}

satisfies $\mathbf {x}_p(\mathsf {true})=\mathsf {true}$ and $\mathbf {x}_p(\mathsf {false})=\mathsf {false}$ .

Given an extrapolation function $\mathbf {x}$ , for every finite (partial) trace $\hat{\alpha }=\hat{s}_0\hat{s}_1\cdots \hat{s}_k$ , we define

\begin{equation} [\hat{\alpha } \models _{F,\mathbf {x}} \varphi ] := [\mathbf {x}(\hat{s}_0)\mathbf {x}(\hat{s}_1) \cdots \mathbf {x}(\hat{s}_k) \models _F \varphi ]. \end{equation}

(4)
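As an illustration of Equation (4), the following sketch extrapolates every partial state of a trace before handing the resulting trace to a finite-trace evaluator. The encoding of $\natural$ as `None`, the helper names, and the evaluator `eventually_a` (the usual finite-trace reading of $◇ a$ ) are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Optional

PartialState = Dict[str, Optional[bool]]
State = Dict[str, bool]


def make_extrapolation(defaults: Dict[str, bool]) -> Callable[[PartialState], State]:
    """Build x = (x_p): known values are kept, unknown ones get defaults[p]."""
    def extrapolate(partial: PartialState) -> State:
        return {p: defaults[p] if v is None else v for p, v in partial.items()}
    return extrapolate


def eval_partial(trace: List[PartialState],
                 extrapolate: Callable[[PartialState], State],
                 eval_finite: Callable[[List[State]], bool]) -> bool:
    """Equation (4): evaluate the formula on the extrapolated trace."""
    return eval_finite([extrapolate(s) for s in trace])


def eventually_a(trace: List[State]) -> bool:
    """Hypothetical evaluator: 'eventually a' holds if a is true in some state."""
    return any(s["a"] for s in trace)


extrapolate = make_extrapolation({"r": False, "a": False})
partial_trace = [{"r": True, "a": None}, {"r": None, "a": True}]
print(eval_partial(partial_trace, extrapolate, eventually_a))
```

By construction, `extrapolate` leaves known values untouched, which corresponds to the two constraints of Definition 3.3.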

In the following, we assume that all the monitors in $\mathcal {M}$ use the same extrapolation function $\mathbf {x}$ . Note that, once ${\rm\small LTL}_3$ and fltl have both been extended to partial traces, the extension of rv-ltl to partial traces directly follows:

\begin{equation*} \left[\hat{\alpha } \models _{4} \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~~~[\hat{\alpha } \models _3 \varphi ] = \top \\ \bot & \text{if }~~~~~[\hat{\alpha } \models _3 \varphi ] = \bot \\ \top _p & \text{if }~~~~~[\hat{\alpha } \models _3 \varphi ] = \;\hspace{1.5pt}?\; \wedge \; [\hat{\alpha } \models _{F,\mathbf {x}} \varphi ] = \top \\ \bot _p & \text{if }~~~~~[\hat{\alpha } \models _3 \varphi ] = \;\hspace{1.5pt}?\; \wedge \; [\hat{\alpha } \models _{F,\mathbf {x}} \varphi ] = \bot \\ \end{array}\right.} . \end{equation*}
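The case distinction above translates directly into code. The sketch below assumes that the ${\rm\small LTL}_3$ and fltl verdicts have already been computed by some other means; the enumeration names are chosen for this example.

```python
from enum import Enum


class V3(Enum):
    """Verdicts of the 3-valued semantics."""
    TOP = "top"
    BOT = "bot"
    UNKNOWN = "?"


class V4(Enum):
    """Verdicts of the 4-valued semantics."""
    TOP = "top"
    BOT = "bot"
    TOP_P = "top_p"
    BOT_P = "bot_p"


def rv_ltl(ltl3: V3, fltl: bool) -> V4:
    """Combine a 3-valued verdict with a finite-trace Boolean verdict."""
    if ltl3 is V3.TOP:
        return V4.TOP
    if ltl3 is V3.BOT:
        return V4.BOT
    # The 3-valued verdict is inconclusive: fall back on the finite-trace verdict.
    return V4.TOP_P if fltl else V4.BOT_P


print(rv_ltl(V3.UNKNOWN, True), rv_ltl(V3.BOT, True))
```

The finite-trace verdict is consulted only when the 3-valued verdict is inconclusive.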

Having extended ${\rm\small LTL}_3$ and fltl to partial traces, we can therefore refine our objective by revisiting Equation (2), rephrased as

\begin{equation*} \mu (m) = [\hat{\alpha } \models _4 \varphi ], \end{equation*}

where $\hat{\alpha }$ is the partial trace of an actual trace $\alpha =s_0s_1s_2\cdots s_k$ , defined as the sequence of partial states $\hat{s}_i$ of $s_i$ resulting from the unions of all the samples of $s_i$ taken by the monitors, ${\mathcal {M}}= \lbrace M_1, M_2, \dots , M_n\rbrace$ , and m is the set of verdicts returned by the monitors after having observed $s_k$ .

Remark. The choice of the extrapolation function $\mathbf {x}$ used to extend fltl to partial traces has no impact on our setting. Therefore, in the following, for simplifying the notations, and for the sake of improving readability, we shall no longer use the “ $\hat{~}$ ” symbol for distinguishing traces from partial traces, and we shall no longer specify extrapolation using $\mathbf {x}$ . The reader must solely remember that, from this point on, any mention of ${\rm\small LTL}_3$ refers to the semantics of Equation (3), and any mention of fltl refers to the semantics of Equation (4).

3.3 A Generic Algorithm for Distributed Monitoring

3.3.1 Wait-free Computing.

Each monitor is a process, and the monitors run in the standard asynchronous read/write shared memory model [6]. Each monitor runs at its own speed, which may vary over time, and may fail by crashing (i.e., halt and never recover). We assume no bound on the number of monitors that can crash, and thus a monitor never “waits” for another monitor, since this may cause a livelock (a process waiting for an event that will never occur). This model of computation is thus often referred to as wait-free shared memory computing. Every monitor that does not crash is required to output, i.e., in the context of this article, to emit a monitoring verdict. A distributed algorithm in this setting consists, for each process, of a bounded sequence of read/write accesses to the shared memory, at the end of which an output is produced, i.e., a verdict is emitted. If the number of possible inputs is bounded (which is the case in the setting of monitoring an ltl formula as every state is of bounded size), then the lengths of such read/write sequences are bounded. We thus assume, without loss of generality, that each monitor accesses the shared memory a fixed arbitrarily large number of times before emitting a verdict (see Reference [22] for more details).

3.3.2 Wait-free Snapshots.

Consider an array $\mathsf {SM}$ of single-writer/multi-reader registers, where process (monitor) $M_i$ can write to $\mathsf {SM}[i]$ , and can read the register $\mathsf {SM}[j]$ of any other process $M_j$ . Programming with such an array can be significantly simplified by using snapshot operations instead. A process $M_i$ can still write only to $\mathsf {SM}[i]$ , but it can read the entire array $\mathsf {SM}$ in a single atomic snapshot operation. If it were possible to stop all other processes temporarily, to allow $M_i$ to read all registers one by one, then $M_i$ could obtain a snapshot of $\mathsf {SM}$ . However, in a wait-free system, this is not allowed.

Remarkably, it is possible to implement a snapshot operation wait-free, allowing all other processes to continue executing their operations, possibly even writing and reading concurrently. Many wait-free atomic snapshot implementations have been proposed, on top of read/write registers, e.g., References [1, 2, 5, 23]. Furthermore, implementations of snapshots on top of a message passing system have also been proposed [4, 15]. Such implementations have high computation cost, but the main purpose of this article is to study feasibility, not efficiency. Our algorithms could be implemented by simply replacing the snapshot operation by the read/write algorithm implementing the snapshots (or potentially even the implementation on top of a message passing system as mentioned above), without compromising the correctness of the results in the rest of the article.

Thus, for the sake of simplifying the presentation, all our algorithms use atomic snapshot operations. That is, we assume that a monitor can acquire the entire memory $\mathsf {SM}$ in a single atomic “global read” instruction. A view of the shared memory $\mathsf {SM}$ is merely the result of a snapshot.

Using snapshots does not artificially strengthen the power of distributed monitoring, but considerably simplifies the presentation of the algorithms and their analysis. Indeed, snapshots are ordered by inclusion, because they return the contents of the shared memory that existed at some point in time, between the invocation of the snapshot operation, and the moment the operation returns. Thus, two snapshot operations may return the same view if they took effect simultaneously. Otherwise, one returns a view at some point in time, and the other a view of the contents of the shared memory at a later time. In this sense, we have the following statement:

Lemma 3.4 (Attiya et al. [5]).

The snapshots are ordered by inclusion, i.e., for any two monitors $M_i$ and $M_j$ , and any two snapshots of these monitors returning two views $w_i$ and $w_j$ of the shared memory, we have either $w_i\subseteq w_j$ or $w_i \supseteq w_j$ .
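The inclusion property can be checked exhaustively on a small instance. The following sketch assumes that writes and snapshots are atomic steps executed one at a time, and that a view is summarized by the set of monitors whose write precedes the snapshot; it is a sequential simulation for three hypothetical monitors, not an implementation of a wait-free snapshot object.

```python
from itertools import combinations, permutations

MONITORS = (0, 1, 2)


def is_valid(schedule):
    """Each monitor must write before it takes its snapshot."""
    return all(schedule.index(("write", i)) < schedule.index(("snapshot", i))
               for i in MONITORS)


def run(schedule):
    """Execute atomic steps in order; a view is the set of writes already done."""
    written, views = set(), {}
    for kind, i in schedule:
        if kind == "write":
            written.add(i)
        else:
            views[i] = frozenset(written)
    return views


def views_form_chain(views):
    """Check that any two views are comparable by inclusion."""
    return all(a <= b or b <= a for a, b in combinations(views.values(), 2))


steps = [(kind, i) for i in MONITORS for kind in ("write", "snapshot")]
schedules = [s for s in permutations(steps) if is_valid(s)]
print(all(views_form_chain(run(s)) for s in schedules))
```

Since the set of completed writes only grows over time, every interleaving yields views that are totally ordered by inclusion.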

3.3.3 A Generic RV Algorithm.

As mentioned earlier, RV is concerned with verifying finite traces. Distributed monitoring works as follows: Let $s_0s_1s_2\cdots s_k$ be a finite trace under scrutiny. We perform a sequence of phases, where each phase $j \in [0, k]$ consists in evaluating the correctness of the trace $s_0s_1\dots s_j$ . That is, at phase j, each monitor receives a sample from state $s_j$ , which forms its input, then performs a fixed number R of accesses to the shared memory, after which it produces its verdict regarding the trace $s_0s_1\cdots s_j$ . We now describe this process in more detail.

Each monitor $M_i \in \mathcal {M}$ , where $i\in [1, n]$ , is provided with a local memory, $\mathsf {lm}_i$ . The shared memory is denoted by $\mathsf {SM}$ . For the sake of establishing a strong lower bound, we consider protocols that are not subject to any constraints in terms of how much data can be stored, and how much data can be transferred at once during a read (snapshot) or a write. In other words, we consider full knowledge protocols [22]. (Note, however, that our upper bound will be shown efficient in terms of both memory storage and bandwidth utilization).

Both the shared memory and the local memories are organized in levels, where, for every $j \in [0,k]$ , both the jth level $\mathsf {SM}[j]$ and $\mathsf {lm}_i[j]$ , $i \in [1,n]$ , store data used when considering state $s_j$ of the monitored trace. Moreover, the jth level of the shared memory is organized in $R\cdot n$ registers, where n is the number of monitors, and R denotes the number of rounds of read/write instructions. Specifically, $\mathsf {SM}[j][r,i]$ stores data written by $M_i$ during its rth write. Similarly, the jth level of the local memory of $M_i$ is organized in $R+2$ registers, where $\mathsf {lm}_i[j][0]$ stores the sample of $s_j$ by $M_i$ , and, for $1\le r \le R$ , $\mathsf {lm}_i[j][r]$ stores data extracted by $M_i$ from the shared memory during its rth read. (An extra register $\mathsf {lm}_i[j][R+1]$ is used for synchronization, as explained below.) We assume that all variables are initialized to $\natural$ .

Each monitor $M_i \in \mathcal {M}$ , $i\in [1, n]$ , runs Algorithm 1, which we detail next. First, before sampling $s_j$ , each monitor takes a snapshot of the shared memory. This is to make sure that all the monitors share the same information about the partial trace resulting from the global observation of $s_0s_1\cdots s_{j-1}$ . Indeed, recall that it is assumed that all non-faulty monitors sample, compute, and emit their verdict in between every two consecutive steps of the system. Thus, when $M_i$ starts considering $s_j$ , all non-faulty monitors have emitted their verdict about $s_0s_1\dots s_{j-1}$ . In particular, the values of all the atomic propositions of $s_{j-1}$ that are covered by the set of non-faulty monitors have been written in shared memory when $M_i$ samples $s_j$ . The instructions performed in Lines 1 and 2 allow $M_i$ to get all such values. As a consequence, for any two monitors $M_i$ and $M_{i^{\prime }}$ monitoring $s_0s_1\dots s_j$ , it holds that

\begin{equation*} \forall p \in [0, j-1], \;\; \mathsf {lm}_i[p][R+1] = \mathsf {lm}_{i^{\prime }}[p][R+1]. \end{equation*}

That is, they agree on $s_0s_1\cdots s_{j-1}$ .

For any given new state $s_j$ , monitor $M_i$ takes a sample from state $s_j$ (cf. Line 3), which is stored in local memory $\mathsf {lm}_i[j][0]$ , at the 0th level. (Recall that the value of an atomic proposition in a sample is either $\mathsf {true}$ , $\mathsf {false}$ , or $\natural$ .) After sampling, each monitor $M_i$ executes a sequence of write/snapshot actions (cf. Lines 5 and 6) for some a priori known number of times R. More precisely, in Line 5, at the rth iteration, $M_i$ atomically writes all its knowledge accumulated so far, i.e., during the $r-1$ previous rounds of read/write instructions. This knowledge is stored at the rth level of the shared memory, in the register dedicated to data from monitor $M_i$ . In Line 6, $M_i$ reads all the registers in $\mathsf {SM}[j]$ , and copies them into $\mathsf {lm}_i[j][r]$ , in a single atomic step.

The R iterations of the for-loop allow $M_i$ to collect information about the current state $s_j$ . After R iterations, the for-loop ends, and $M_i$ emits a verdict based on all the knowledge accumulated in its local memory. For our lower bound, we impose no restriction on the way this verdict is computed. However, for our upper bound, this verdict will be computed solely based on evaluating $\varphi$ on the partial trace accumulated by $M_i$ . Note that, even for a large R, $M_i$ may still not be aware of all the atomic propositions of $s_j$ , simply because the monitors that were covering these atomic propositions may be slow and may not yet have reported their samples in the shared memory. Also note that there is no point in waiting for the slow monitors, since it may well be the case that they have actually crashed, and waiting for them would yield a livelock.

A distributed-monitoring algorithm is an instantiation of the generic algorithm depicted in Algorithm 1. A concrete example of such an instantiation is provided in Section 4. Note that the generic Algorithm 1 takes full advantage of the total power of distributed wait-free computing.
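The write/snapshot loop described above can be sketched as a sequential simulation of a single phase. This is only an illustration under several assumptions: the initial synchronization with previous levels is omitted, indices are 0-based, the interleaving is given explicitly as a list of monitor indices (one entry per atomic step), and the decision function `visible_writers` is a hypothetical placeholder for the verdict computation.

```python
def run_phase(samples, R, schedule, decide):
    """Simulate one phase: each monitor alternates R writes and R snapshots."""
    n = len(samples)
    SM = [[None] * n for _ in range(R)]               # SM[r][i]: r-th write of M_i
    lm = [[samples[i]] + [None] * R for i in range(n)]  # lm[i][0] is the sample
    done = [0] * n                                     # atomic steps completed
    for i in schedule:
        step = done[i]
        if step >= 2 * R:
            continue                                   # M_i has already finished
        r = step // 2
        if step % 2 == 0:
            # Full-information write: everything M_i has learned so far.
            SM[r][i] = tuple(lm[i][: r + 1])
        else:
            # Atomic snapshot of the whole level of the shared memory.
            lm[i][r + 1] = tuple(tuple(row) for row in SM)
        done[i] += 1
    # A monitor that did not complete its 2R steps is treated as crashed.
    return [decide(lm[i]) if done[i] == 2 * R else None for i in range(n)]


def visible_writers(local_memory):
    """Hypothetical decision: which first-round writes appear in the last view."""
    first_round = local_memory[-1][0]
    return {j for j, reg in enumerate(first_round) if reg is not None}


print(run_phase(["sample_0", "sample_1"], 1, [0, 0, 1, 1], visible_writers))
print(run_phase(["sample_0", "sample_1"], 1, [0, 0, 1], visible_writers))
```

In the first run, the first monitor completes before the second one starts and therefore sees only its own write; in the second run, the second monitor stops after its write and emits no verdict, which models a crash.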

3.4 Statement of the Problem

For any state $s_j$ , when a set of monitors execute Algorithm 1, different interleavings, and hence different sets of verdicts, are possible. Global consistency is the property that makes it possible to map the set of verdicts of the distributed monitors to the verdict of a centralized monitor whose view of the states is identical to the cumulated views of the monitors. More specifically, given a state $s_j$ , the cover of $s_j$ is the partial state $\hat{s}_j$ such that, for every $\mathit {p}\in \mathsf {AP}$ , $\hat{s}_j[\mathit {p}]\ne \natural$ if and only if $\mathit {p}$ is in the sample of $s_j$ by some non-faulty monitor $M_i$ . From this point on, any reference to an execution trace $\alpha =s_0s_1\cdots s_j$ actually refers to the sequence of states covered by the monitors.

A monitor trace for an execution trace $\alpha =s_0s_1\cdots s_k$ is a sequence $m = m_0m_1\cdots m_k$ , where, for every $j\in [0,k]$ , $m_j \subseteq V$ for some verdict set V, and each element of each $m_j$ is the verdict of some monitor $M_i \in \mathcal {M}$ emitted when considering state $s_j$ . Let $\varphi$ be an ltl formula, and let $\alpha =s_0s_1\cdots s_k$ be a finite (partial) trace corresponding to the sequence of (partial) states covered by the monitors.

Definition 3.5.

A monitor trace $m=m_0m_1\dots m_k$ with verdict set V satisfies global consistency for $\alpha$ with interpretation

\begin{equation*} \mu : 2^V \rightarrow \mathbb {B}_4 \end{equation*}

if, for every $0\le j \le k$ , if no monitors crash between the time when the system enters state $s_j$ and the time when the system leaves state $s_j$ , then

\begin{equation*} \mu (m_j) = [s_0s_1\cdots s_j\models _4 \varphi ]. \end{equation*}

Note that $\mathit {p}\in \mathsf {AP}$ might be in the sample of a monitor observing the system in state $s_j$ , but this monitor may crash before reporting this sample to the shared memory, or may report this sample to the shared memory before crashing but do so too late for any other monitor to see it (because asynchrony and failures prevent any monitor from waiting for any other monitor). This is why global consistency is required to hold only if no monitors crash when monitoring state $s_j$ .

Definition 3.6.

Let $\mathcal {A}$ be an instantiation of Algorithm 1 for an ltl formula $\varphi$ with verdict set V. Algorithm $\mathcal {A}$ is sound for rv-ltl if there exists a function $\mu : 2^V \rightarrow \mathbb {B}_4$ such that, for every finite (partial) trace $\alpha \in \Sigma ^*$ covered by the monitors and for every monitor trace m produced by $\mathcal {A}$ for $\alpha$ , m satisfies global consistency for $\alpha$ with interpretation $\mu$ .
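Definition 3.5 can be read as a simple check over a monitor trace. The sketch below assumes that verdicts are encoded as strings, that the expected rv-ltl verdict of each prefix is given, and that a Boolean flag records, for each state, whether the corresponding phase was crash-free; the interpretation `toy_mu` is hypothetical.

```python
def satisfies_global_consistency(monitor_trace, expected, mu, crash_free):
    """Check mu(m_j) == expected[j] for every crash-free phase j."""
    return all(
        mu(frozenset(m_j)) == expected[j]
        for j, m_j in enumerate(monitor_trace)
        if crash_free[j]
    )


def toy_mu(verdicts):
    """Hypothetical interpretation over the two verdicts 'Tp' and 'Fp'."""
    return "Fp" if "Fp" in verdicts else "Tp"


monitor_trace = [{"Tp"}, {"Tp", "Fp"}]
expected = ["Tp", "Fp"]
print(satisfies_global_consistency(monitor_trace, expected, toy_mu, [True, True]))
```

Phases during which a monitor crashes are skipped, which reflects the fact that global consistency is required to hold only for crash-free phases.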

The problem. Given an ltl formula $\varphi$ , design an instantiation $\mathcal {A}$ of Algorithm 1 that correctly monitors $\varphi$ , with monitors emitting verdicts picked from a small set V of values.

In particular, is any ltl formula $\varphi$ correctly distributedly monitorable using $\mathbb {B}_4$ as verdict set for the monitors? The next section shows that the answer to this question is negative. However, further ahead in the text, it will be shown that, for every ltl formula $\varphi$ , there is a distributed algorithm that correctly monitors $\varphi$ with verdicts picked from the set of logical values of a multi-valued logic extending rv-ltl, whose cardinality is related neither to $|\mathsf {AP}|$ nor to $|\mathcal {M}|$ , but to a specific characteristic of the formula $\varphi$ .

4 Distributed Monitoring Using Rv-ltl

In this section, we pursue two goals. First, in Section 4.1, we modify Algorithm 1, so each monitor emits a verdict in $\mathbb {B}_4$ , that is, a truth value of rv-ltl. This yields Algorithm 2, which we describe in detail. Then, in Section 4.2, we provide a concrete example of how distributed monitors can successfully monitor an ltl formula using Algorithm 2. In Section 4.3, we address our second goal and show that Algorithm 2 cannot soundly monitor every ltl formula. In Section 5, we generalize this negative result to an impossibility result for fault-tolerant monitoring.

4.1 Distributed Monitoring with Verdicts in RV-LTL

As in the generic case, the local memory $\mathsf {lm}_i$ of monitor $M_i$ is organized in levels, one for each state of the monitored trace. The same holds for the shared memory. For every $k\ge 0$ , $\mathsf {lm}_i[k]$ stores a partial state, i.e., an $|\mathsf {AP}|$ -dimensional vector with values in $\lbrace \mathsf {true}, \mathsf {false},\natural \rbrace$ . For every $k\ge 0$ , and every $i\in [1,n]$ , $\mathsf {SM}[k][i]$ stores a partial state, i.e., $\mathsf {SM}[k][i][\mathit {p}]\in \lbrace \mathsf {true}, \mathsf {false},\natural \rbrace$ stores the value in $s_k$ of the atomic proposition $\mathit {p}\in \mathsf {AP}$ , as written by monitor $M_i$ . Every monitor $M_i$ also uses an auxiliary storage variable $\mathsf {lm}^{\prime }_i$ for local computation, which has the same format as one level of the shared memory, i.e., $\mathsf {lm}^{\prime }_i$ stores one partial state for each monitor $M_j$ , $j\in [1,n]$ . Again, we assume that all variables are initialized to $\natural$ .

Algorithm 2 proceeds as follows: As in Algorithm 1, Lines 1–5 allow all non-faulty monitors observing $s_k$ to share the same information about the partial trace resulting from the global observation of $s_0s_1\cdots s_{k-1}$ . That is, for any monitor $M_i$ sampling $s_k$ in Line 6, it holds that

\begin{equation*} \mathsf {lm}_i[0]\mathsf {lm}_i[1]\cdots \mathsf {lm}_i[k-1] = s_0s_1\cdots s_{k-1}. \end{equation*}

Let us now focus on the core of the algorithm. In Line 6, the monitor takes a sample of the current state $s_k$ . This sample gives $M_i$ the value of some atomic propositions $\mathit {p}\in \mathsf {AP}$ , in which case $\mathsf {lm}_i[k][\mathit {p}]\in \lbrace \mathsf {true},\mathsf {false}\rbrace$ , but $M_i$ may not become aware of some other atomic propositions $\mathit {p}^{\prime }\in \mathsf {AP}$ , in which case $\mathsf {lm}_i[k][\mathit {p}^{\prime }]=\natural$ . Then, only one round of the generic algorithm is run. That is, $M_i$ writes its partial view of $s_k$ (Line 7), and takes a snapshot of the shared memory (Line 8) with the objective of getting the values of atomic propositions of $s_k$ that it is missing in its view. If there is indeed such a proposition $\mathit {p}$ in its snapshot, then $M_i$ adds this value in its partial view of $s_k$ , in Line 11.

For emitting its verdict, monitor $M_i$ evaluates trace $\mathsf {lm}_i[0]\cdots \mathsf {lm}_i[k]$ in rv-ltl, that is, its verdict is the truth value in $\mathbb {B}_4$ equal to:

\begin{equation*} [\mathsf {lm}_i[0]\mathsf {lm}_i[1]\cdots \mathsf {lm}_i[k] \models _4 \varphi ]. \end{equation*}

Algorithm 2 is probably the most natural way of providing fault-tolerant distributed monitoring. However, as we show in Section 4.3, rv-ltl is far from sufficient: even simple ltl formulas cannot be consistently monitored by distributed monitors emitting verdicts in rv-ltl.
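The single write/snapshot round at the core of Algorithm 2 can be sketched as follows. The sketch is a sequential simulation of one phase under simplifying assumptions: the synchronization with previous levels is omitted, $\natural$ is encoded as `None`, the interleaving is given as a list of monitor indices, and `evaluate` is a placeholder for the rv-ltl evaluation of the accumulated partial trace.

```python
def algorithm2_phase(samples, schedule, evaluate):
    """One phase: each monitor writes its sample, snapshots, merges, evaluates."""
    n = len(samples)
    props = list(samples[0])
    SM = [dict.fromkeys(props) for _ in range(n)]  # one register per monitor
    lm = [dict(s) for s in samples]                # local view after sampling
    steps = [0] * n
    verdicts = [None] * n                          # None: no verdict emitted
    for i in schedule:
        if steps[i] == 0:
            SM[i] = dict(lm[i])                    # write the partial view
        elif steps[i] == 1:
            snapshot = [dict(reg) for reg in SM]   # atomic snapshot
            for reg in snapshot:
                for p in props:
                    # Fill in values that are missing from the local view.
                    if lm[i][p] is None and reg[p] is not None:
                        lm[i][p] = reg[p]
            verdicts[i] = evaluate(lm[i])
        steps[i] += 1
    return verdicts


samples = [{"r": True, "a": None}, {"r": None, "a": True}]
# Passing `dict` as the evaluator simply returns a copy of each final view.
print(algorithm2_phase(samples, [0, 0, 1, 1], dict))
print(algorithm2_phase(samples, [0, 1, 0, 1], dict))
```

With the first schedule, the first monitor takes its snapshot before the second one writes and keeps a partial view, whereas the second monitor obtains the complete state. With the second schedule, both monitors obtain the complete state.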

4.2 A Positive Example for Distributed Monitoring Using RV-LTL

Let $\mathcal {M} = \lbrace M_1, M_2\rbrace$ , and let us consider monitoring the aforementioned request-acknowledgment formula

\begin{equation*} \varphi _{\mathit {ra}} = □ (\lnot a \wedge \lnot r) \, \vee \, ((\lnot a \, \mathbin {\mathcal {U}}\, r) \wedge ◇ a). \end{equation*}

We represent a (partial) state in a finite trace for $\varphi _{\mathit {ra}}$ as a vector

\begin{equation*} s = \begin{pmatrix} r \\ a \end{pmatrix} , \end{equation*}

where the propositions range over $\lbrace \mathsf {true},\mathsf {false},\natural \rbrace$ . Let us assume that the unknown value $\natural$ of an atomic proposition is extrapolated to $\mathsf {false}$ (we will show that the choice of extrapolation does not matter). Using a central monitor, evaluation in rv-ltl should return the following verdicts:

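\begin{equation*} \begin{array}{c|c|c|c|c|} \begin{pmatrix} r \\ a \end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {true}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\end{pmatrix} \\ \hline \mbox{verdict} & \top _p & \bot _p & \bot & \top \\ \hline \end{array} \ , \end{equation*}

where each column represents a trace of length one (i.e., a single state). In a distributed setting, a monitor may observe the following corresponding partial states and return verdicts in rv-ltl: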

\begin{equation*} \begin{array}{c|c|c|c|c|c|c|c|c|c|} \begin{pmatrix} r \\ a \end{pmatrix} & \begin{pmatrix} \natural \\ \natural \end{pmatrix} & \begin{pmatrix} \natural \\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \natural \\ \mathsf {true}\end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \natural \end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {true}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \natural \end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\end{pmatrix} \\ \hline \mbox{verdict} & \top _p & \top _p & \bot & \top _p & \top _p & \bot & \bot _p & \bot _p & \top \\ \hline \end{array} \ . \end{equation*}

Thanks to Lemma 3.4, the sets of possible verdicts returned by a collection of distributed monitors observing the system are, for the four possible scenarios:

\begin{equation*} \begin{array}{c|c|c|c|c|} \begin{pmatrix} r \\ a \end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {false}\end{pmatrix} & \begin{pmatrix} \mathsf {false}\\ \mathsf {true}\end{pmatrix} & \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\end{pmatrix} \\ \hline \mbox{verdict sets} & \lbrace \top _p\rbrace & \lbrace \bot _p\rbrace \mbox{ or } \lbrace \top _p,\bot _p\rbrace & \lbrace \bot \rbrace \mbox{ or } \lbrace \top _p,\bot \rbrace & \lbrace \top \rbrace \mbox{ or } \lbrace \top ,\top _p\rbrace \mbox{ or } \lbrace \top ,\bot _p\rbrace \mbox{ or } \lbrace \top ,\bot \rbrace \\ \hline \end{array} \ . \end{equation*}

Let us define the following interpretation function: For every non-empty $m\subseteq \mathbb {B}_4=\lbrace \top ,\bot ,\top _p,\bot _p\rbrace$ ,

\begin{equation*} \mu (m) = \left\lbrace \!\! \begin{array}{ll} \top & \mbox{if $\top \in m$} \\ \bot & \mbox{if $\top \notin m$ and $\bot \in m$} \\ \bot _p & \mbox{if $m \cap \lbrace \top ,\bot \rbrace =\emptyset $ and $\bot _p\in m$} \\ \top _p & \mbox{otherwise.} \\ \end{array} \right. \end{equation*}

With such an interpretation function, we do have

\begin{equation*} \mu (m) = [s \models _4 \varphi _{\mathit {ra}}], \end{equation*}

as desired. This analysis can be extended to traces, and to monitor traces, establishing that Algorithm 2 correctly monitors $\varphi _{\mathit {ra}}$ in rv-ltl.
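For traces consisting of a single state, the claim can be checked mechanically. The sketch below hard-codes the verdict table given above for $\varphi _{\mathit {ra}}$ , encodes $\natural$ as `None` and the four verdicts as strings, and enumerates every chain of views ordered by inclusion whose largest element is the complete state. It assumes state coverage and no crashes, so that some monitor obtains the complete state.

```python
from itertools import combinations, product

# Verdict of each partial state (r, a), copied from the table above.
VERDICT = {
    (None, None): "Tp", (None, False): "Tp", (None, True): "F",
    (False, None): "Tp", (False, False): "Tp", (False, True): "F",
    (True, None): "Fp", (True, False): "Fp", (True, True): "T",
}


def mu(m):
    """Interpretation function defined above."""
    if "T" in m:
        return "T"
    if "F" in m:
        return "F"
    if "Fp" in m:
        return "Fp"
    return "Tp"


def leq(u, v):
    """u is a partial view of v: every known value of u agrees with v."""
    return all(x is None or x == y for x, y in zip(u, v))


def chains_ending_at(state):
    """All sets of views, pairwise comparable, that contain the full state."""
    lower = [v for v in product(*[(None, x) for x in state]) if v != state]
    for k in range(len(lower) + 1):
        for subset in combinations(lower, k):
            if all(leq(u, v) or leq(v, u) for u, v in combinations(subset, 2)):
                yield subset + (state,)


consistent = all(
    mu({VERDICT[v] for v in chain}) == VERDICT[state]
    for state in product((True, False), repeat=2)
    for chain in chains_ending_at(state)
)
print(consistent)
```

The restriction to chains comes from Lemma 3.4: two incomparable views cannot both be returned in the same phase.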

4.3 A Counterexample to Distributed Monitoring Using RV-LTL

Let $\mathcal {M} = \lbrace M_1, M_2\rbrace$ and let us consider the ltl formula for two requests and two acknowledgments:

4.3.1 Negative Example of Monitoring φ_ra2.

Figure 3 shows a concrete finite trace $\alpha$ and its corresponding monitor trace resulting from running Algorithm 2, where f stands for $\mathsf {false}$ , and t stands for $\mathsf {true}$ (in this example, too, $\natural$ is extrapolated to $\mathsf {false}$ ). It also shows the content of the local memories of two monitors $M_1$ and $M_2$ monitoring $\alpha$ , as well as their individual evaluations of $\varphi _{\mathit {ra}2}$ with respect to the observed trace. For instance, for $s_0$ , let:

\begin{equation*} \mbox{sample}_1(s_0)= \begin{pmatrix} \mathsf {true}\\ \natural \\ \mathsf {false}\\ \mathsf {false}\end{pmatrix} \;\;\;\; \mbox{sample}_2(s_0) = \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\\ \natural \\ \mathsf {false}\end{pmatrix}, \end{equation*}

where each vector shows the value of propositions $r_1$ , $a_1$ , $r_2$ , and $a_2$ . Then, when $M_1$ and $M_2$ perform the write-snapshot instructions of Lines 7 and 8 of Algorithm 2, Figure 3 illustrates an execution in which $M_1$ does not get any new information ( $M_1$ took the snapshot before $M_2$ wrote), while $M_2$ gets the partial trace sampled by $M_1$ . As a result,

\begin{equation*} \mathsf {lm}_1[0] = \begin{pmatrix} \mathsf {true}\\ \natural \\ \mathsf {false}\\ \mathsf {false}\end{pmatrix}\;\;\; \mathsf {lm}_2[0] = \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\\ \mathsf {false}\\ \mathsf {false}\end{pmatrix}. \end{equation*}

It follows that $M_1$ emits

\begin{equation*} \bot _p= \big [ \mathsf {lm}_1[0] \models _4 \varphi _{\mathit {ra}2}\big ], \end{equation*}

while $M_2$ emits

\begin{equation*} \top _p= \big [ \mathsf {lm}_2[0] \models _4 \varphi _{\mathit {ra}2}\big ]. \end{equation*}

Since $[s_0 \models _4 \varphi _{\mathit {ra}2}]=\top _p$ , it must be the case that the set of verdicts $m_0=\lbrace \top _p,\bot _p\rbrace$ is interpreted as $\top _p$ , i.e.,

\begin{equation*} \mu (m_0)=\top _p. \end{equation*}

Fig. 3.

A contradiction arises when $M_1$ and $M_2$ observe $s_0s_1s_2$ . Indeed, in this case, too, the set of verdicts emitted by the monitors can be $m_2=m_0=\lbrace \top _p,\bot _p\rbrace$ for some interleaving of the write-snapshot instructions, so $\mu (m_2)=\mu (m_0)=\top _p$ . However, $[s_0s_1s_2 \models _4 \varphi _{\mathit {ra}2}]=\bot _p$ . Therefore, we get

\begin{equation*} \mu (m_2)\ne [s_0s_1s_2 \models _4 \varphi _{\mathit {ra}2}]. \end{equation*}

That is, Algorithm 2 does not correctly monitor $\varphi _{\mathit {ra}2}$ .

4.3.2 Negative Result on Monitoring a Single State for φ_ra2.

We show that Algorithm 2 does not even correctly monitor $\varphi _{\mathit {ra}2}$ on a single state. Figure 4 shows different execution interleavings of monitors $M_1$ and $M_2$ when running Algorithm 2 from two different states

\begin{equation*} s_0 = \lbrace r_1, a_1\rbrace , \end{equation*}

and

\begin{equation*} s_0^{\prime } = \lbrace r_1, a_1, r_2\rbrace . \end{equation*}

Again, let us represent a state in a partial trace for $\varphi _{\mathit {ra}2}$ as a vector

\begin{equation*} s = \begin{pmatrix} r_1 \\ a_1 \\ r_2 \\ a_2 \end{pmatrix} \end{equation*}

with entries in $\lbrace \mathsf {true},\mathsf {false},\natural \rbrace$ . In case of $s_0$ , after executing Line 6 of Algorithm 2, monitors’ samples consist of

\begin{equation*} \mathsf {lm}_1[0] = \begin{pmatrix} \mathsf {true}\\ \natural \\ \mathsf {false}\\ \mathsf {false}\end{pmatrix}, \;\; \mbox{and} \;\; \mathsf {lm}_2[0] = \begin{pmatrix} \mathsf {true}\\ \mathsf {true}\\ \natural \\ \mathsf {false}\end{pmatrix}. \end{equation*}

Likewise, for state $s^{\prime }_0$ , Figure 4 shows different local snapshots by $M_1$ and $M_2$ . The verdict depends on the different interleavings of write/snapshot. In Figure 4, $M_1, M_2$ (respectively, $M_2, M_1$ ) denotes the case where monitor $M_1$ (respectively, $M_2$ ) executes write-snapshot instructions (Lines 7–8 of Algorithm 2) before monitor $M_2$ (respectively, $M_1$ ) does, and $M_1|| M_2$ denotes the case where monitors $M_1$ and $M_2$ execute their write-snapshot actions concurrently.

Fig. 4.

Figure 4 shows that rv-ltl is unable to consistently monitor $\varphi _{\mathit {ra}2}$ . More precisely, observe that, in the figure, the shaded collective verdicts $m_0$ and $m^{\prime }_0$ , for trace $s_0$ and trace $s^{\prime }_0$ , respectively, are identical, both equal to $\lbrace \bot _p, \top _p\rbrace$ , while $[s_0 \models _4 \varphi _{\mathit {ra}2}] \ne [s^{\prime }_0 \models _4 \varphi _{\mathit {ra}2}]$ . Specifically, let us consider the following scenarios:

Scenario 1:

Starting from state $s_0$ with $M_1, M_2$ interleaving, we have $[\mathsf {lm}_1[0] \models _4 \varphi _{\mathit {ra}2}] = \bot _p$ and $[\mathsf {lm}_2[0] \models _4 \varphi _{\mathit {ra}2}] = \top _p$ . That is, the collective set of local verdicts is $m_0 = \lbrace \bot _p, \top _p\rbrace$ .

Scenario 2:

Starting from state $s^{\prime }_0$ , with $M_2, M_1$ interleaving, we have $[ \mathsf {lm}^{\prime }_1[0] \models _4 \varphi _{\mathit {ra}2}] = \bot _p$ and $[\mathsf {lm}^{\prime }_2[0] \models _4 \varphi _{\mathit {ra}2}] = \top _p$ . That is, the collective set of local verdicts is $m^{\prime }_0 = \lbrace \bot _p, \top _p\rbrace$ .

Therefore, although the valuations of $\varphi _{\mathit {ra}2}$ for the two finite traces $s_0$ and $s^{\prime }_0$ are different in rv-ltl (i.e., $\top _p$ and $\bot _p$ , respectively), the collective sets of verdicts emitted by monitors $M_1$ and $M_2$ in the above two scenarios are identical (i.e., $\lbrace \bot _p, \top _p\rbrace$ ). That is,

\begin{equation*} [s_0 \models _4 \varphi _{\mathit {ra}2}] \ne [s^{\prime }_0 \models _4 \varphi _{\mathit {ra}2}], \end{equation*}

but $\mu (m_0) = \mu (m^{\prime }_0)$ for any $\mu$ , and, thus, $\varphi _{\mathit {ra}2}$ is not correctly monitored, even on traces consisting of a single state.
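The argument boils down to the observation that an interpretation is a function and therefore cannot map the same set of verdicts to two different values. The following sketch makes this explicit; the verdicts are encoded as strings, and the two observations are the ones of Scenarios 1 and 2, taken as given rather than recomputed from the formula.

```python
def find_conflict(observations):
    """Return a verdict set that would need two different interpretations."""
    required = {}
    for verdicts, expected in observations:
        key = frozenset(verdicts)
        if key in required and required[key] != expected:
            return key, required[key], expected
        required.setdefault(key, expected)
    return None  # no conflict: an interpretation may exist for these observations


observations = [
    ({"Fp", "Tp"}, "Tp"),  # Scenario 1: state s_0, expected verdict top_p
    ({"Fp", "Tp"}, "Fp"),  # Scenario 2: state s'_0, expected verdict bot_p
]
print(find_conflict(observations))
```

A conflict is reported for these two observations, so no interpretation $\mu$ can satisfy global consistency in both scenarios.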

We summarize the discussions in this section by the following:

Property 4.1.

Not all ltl formulas can be consistently monitored by a 1-round distributed monitor with traces in rv-ltl. In particular, the ltl formula $\varphi _{\mathit {ra}2}$ cannot be monitored by a 1-round distributed monitor with traces in rv-ltl, even on traces consisting of a single state, even if monitors satisfy state coverage, and even if no monitors crash during the execution.

The above results raise several questions. Do they hold only because Algorithm 2 does not perform sufficiently many communication rounds? Do they hold because the monitors exchange only partial states? Do they hold because the four possible individual verdicts are interpreted as logical values in $\mathbb {B}_4$ ? In the next section, we answer all these questions negatively: even the full-information Algorithm 1 cannot distributedly monitor the ltl formula $\varphi _{\mathit {ra}2}$ with a verdict set of cardinality 4, independently of its number of rounds $R\ge 1$ .

5 Distributed Monitoring Requires Large Verdict Sets

In this section, we introduce a parameter that will be shown to have a strong impact on distributed monitoring, namely, the alternation number of an ltl formula. In particular, in this section, we show that, for every $k\ge 0$ , there is an ltl formula $\varphi$ with alternation number k that cannot be distributedly monitored by monitors emitting verdicts from a set of cardinality smaller than $k+1$ . This lower bound is an adaptation of the lower bound in Reference [20], which deals with states whose correctness is specified by Boolean logic, to execution traces whose correctness is specified by linear temporal logic. In the next section, we shall show that the alternation number also essentially determines an upper bound on the number of truth values needed to ensure consistency in distributed monitoring, using truth values from a properly defined multi-valued logic.

5.1 Alternation Number

Let $\alpha \in \Sigma ^*$ be a finite trace, and let $\alpha ^{\prime }$ be the longest proper prefix of $\alpha$ , i.e., $\alpha =\alpha ^{\prime } s$ , where $\alpha ^{\prime } \in \Sigma ^*$ and $s\in \Sigma$ . Let $\varphi$ be an ltl formula. We define the alternation number of $\varphi$ with respect to $\alpha$ , denoted by $\mathsf {altern}(\varphi , \alpha)$ , as follows. For full generality, we define it not only for traces but also for partial traces. That is, in state s, proposition $\mathit {p}\in \mathsf {AP}$ can be $\mathsf {true}$ , $\mathsf {false}$ , or unknown ( $\natural$ ). Given two partial states s and $s^{\prime }$ , we set

\begin{equation*} s^{\prime }\prec s \end{equation*}

if the following two conditions hold:

• $\forall \mathit {p}\in \mathsf {AP}: (s^{\prime }[\mathit {p}]\in \lbrace \mathsf {true}, \mathsf {false}\rbrace \; \Rightarrow \; s[\mathit {p}]=s^{\prime }[\mathit {p}])$ ;

• $\exists \mathit {p}\in \mathsf {AP}: (s^{\prime }[\mathit {p}]=\natural \; \wedge \; s[\mathit {p}]\in \lbrace \mathsf {true},\mathsf {false}\rbrace)$ .

We denote by $s^\natural$ the partial state in which all atomic propositions are unknown.

Definition 5.1.

The alternation number of an ltl formula $\varphi$ with respect to a finite partial trace $\alpha =\alpha ^{\prime } s$ with $\alpha ^{\prime }\in \Sigma ^*$ and $s\in \Sigma$ , denoted by $\mathsf {altern}(\varphi , \alpha)$ , is the maximum integer $\ell \ge 0$ , such that there exists a sequence of partial states $s_0s_1\cdots s_\ell$ with $s_0=s^\natural$ , $s_\ell =s$ , and, for every $i \in \lbrace 0,1,\dots ,\ell -1\rbrace$ ,

\begin{equation*} (s_i\prec s_{i+1}) \;\; \wedge \;\; ([\alpha ^{\prime } s_{i} \models _F \varphi ] \ne [\alpha ^{\prime } s_{i+1} \models _F \varphi ]). \end{equation*}

The alternation number of an ltl formula $\varphi$ is $\mathsf {altern}(\varphi) = \max \lbrace \mathsf {altern}(\varphi , \alpha) \mid \alpha \in \Sigma ^* \rbrace$ .

It directly follows from this definition that, for any ltl formula $\varphi$ , its alternation number is bounded by its number of atomic propositions, i.e.,

\begin{equation*} \mathsf {altern}(\varphi)\le |\mathsf {AP}|. \end{equation*}

However, the alternation number can be much smaller than the number of atomic propositions. For instance,

\begin{equation*} \varphi = x_1 \wedge x_2 \wedge \dots \wedge x_t \end{equation*}

satisfies $|\mathsf {AP}|=t$ and $\mathsf {altern}(\varphi)=1$ (assuming that the evaluation of a partial trace is performed by replacing all $\natural$ by $\mathsf {false}$ ). Let us consider a few examples.

•

$\mathsf {altern}(□ p) = 1$ , since once p is $\mathsf {false}$ , the formula can never evaluate to $\top$ .

•

$\mathsf {altern}(□ (r \rightarrow ◇ a)) = 2$ , as witnessed by the partial states

\begin{equation*} {r \choose a} = {\natural \choose \natural } {\mathsf {true}\choose \natural } {\mathsf {true}\choose \mathsf {true}} , \end{equation*}

which evaluate to $\top ,\bot ,\top$ , respectively, in fltl, when we extrapolate all $\natural$ to $\mathsf {false}$ .

•

$\mathsf {altern}(\varphi _{\mathit {ra}}) = \mathsf {altern}\big (□ (\lnot a \wedge \lnot r) \; \vee \, [(\lnot a \, \mathbin {\mathcal {U}}\, r) \,\wedge \, ◇ a]\big) = 2$ with

\begin{equation*} {r \choose a} = {\natural \choose \natural } {\natural \choose \mathsf {true}} {\mathsf {true}\choose \mathsf {true}} , \end{equation*}

which evaluate to $\top ,\bot ,\top$ , respectively, in fltl, when we extrapolate all $\natural$ to $\mathsf {false}$ .

•

$\mathsf {altern}(\varphi _{\mathit {ra}_2}) = 4$ with

\begin{equation*} \left(\begin{array}{c} r_1 \\ a_1 \\ r_2 \\ a_2 \end{array} \right) = \left(\begin{array}{c} \natural \\ \natural \\ \natural \\ \natural \end{array} \right) \left(\begin{array}{c} \mathsf {true}\\ \natural \\ \natural \\ \natural \end{array} \right) \left(\begin{array}{c} \mathsf {true}\\ \mathsf {true}\\ \natural \\ \natural \end{array} \right) \left(\begin{array}{c} \mathsf {true}\\ \mathsf {true}\\ \mathsf {true}\\ \natural \end{array} \right) \left(\begin{array}{c} \mathsf {true}\\ \mathsf {true}\\ \mathsf {true}\\ \mathsf {true}\end{array} \right) , \end{equation*}

which evaluate to $\top ,\bot ,\top ,\bot ,\top$ , respectively, in fltl, when we extrapolate all $\natural$ to $\mathsf {false}$ .
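As a sanity check on the first three examples above, the following Python sketch computes $\mathsf {altern}(\varphi , \alpha)$ by brute force for traces made of a single state (that is, $\alpha ^{\prime }=\epsilon$ ), with every $\natural$ extrapolated to $\mathsf {false}$ as in the text. The encoding of partial states as tuples, the formula combinators, and all function names are illustrative assumptions and are not taken from the article. Since only single-state traces are explored, the result is in general only a lower bound on $\mathsf {altern}(\varphi)$ ; for the formulas below it coincides with the values stated above.

```python
from functools import lru_cache
from itertools import combinations, product

# A partial state is a tuple over a fixed proposition order; each entry is
# True, False, or None (None stands for the unknown value).

def strictly_below(state):
    """Yield every partial state obtained by forgetting some known entries."""
    known = [i for i, v in enumerate(state) if v is not None]
    for size in range(1, len(known) + 1):
        for forget in combinations(known, size):
            yield tuple(None if i in forget else v for i, v in enumerate(state))

# Finite-trace formulas are functions (trace, position) -> bool.
def ap(p): return lambda tr, i: tr[i][p]
def neg(f): return lambda tr, i: not f(tr, i)
def conj(f, g): return lambda tr, i: f(tr, i) and g(tr, i)
def disj(f, g): return lambda tr, i: f(tr, i) or g(tr, i)
def always(f): return lambda tr, i: all(f(tr, j) for j in range(i, len(tr)))
def eventually(f): return lambda tr, i: any(f(tr, j) for j in range(i, len(tr)))
def until(f, g):
    return lambda tr, i: any(
        g(tr, j) and all(f(tr, k) for k in range(i, j))
        for j in range(i, len(tr)))

def altern_wrt(phi, props, prefix, state):
    """Alternation number of phi w.r.t. prefix + [state], following Definition 5.1.

    Assumption: `prefix` is a list of fully known states (dicts prop -> bool).
    """
    def value(s):
        # Extrapolate unknown entries to False, then evaluate at position 0.
        last = {p: bool(v) for p, v in zip(props, s)}
        return phi(prefix + [last], 0)

    bottom = (None,) * len(props)

    @lru_cache(maxsize=None)
    def longest(s):
        # Longest alternating chain from the all-unknown state to s; -1 if none.
        if s == bottom:
            return 0
        best = -1
        for lower in strictly_below(s):
            if value(lower) != value(s) and longest(lower) >= 0:
                best = max(best, longest(lower) + 1)
        return best

    return max(longest(state), 0)

def altern_single_state(phi, props):
    """Maximum of altern_wrt over all single-state partial traces."""
    return max(altern_wrt(phi, props, [], s)
               for s in product((None, True, False), repeat=len(props)))

r, a = ap("r"), ap("a")
always_p = always(ap("p"))
request_ack = always(disj(neg(r), eventually(a)))
phi_ra = disj(always(conj(neg(a), neg(r))), conj(until(neg(a), r), eventually(a)))

print(altern_single_state(always_p, ["p"]))          # 1
print(altern_single_state(request_ack, ["r", "a"]))  # 2
print(altern_single_state(phi_ra, ["r", "a"]))       # 2
```

The recursion enumerates every chain $s^\natural \prec \dots \prec s$ explicitly, so its cost is exponential in $|\mathsf {AP}|$ ; it is meant for checking small examples only.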

5.2 The Impact of Alternation Number on Distributed Monitoring

The following result extends Property 4.1 to any distributed monitoring algorithm. It also extends the lower bound in Reference [20] to execution traces whose correctness is specified by means of linear temporal logic.

Theorem 5.2.

For every $k \ge 0$ , there is an ltl formula $\varphi$ with $\mathsf {altern}(\varphi)=2k$ that cannot be correctly monitored by $n \gt 2k$ distributed monitors using verdict set V if $|V| \le \mathsf {altern}(\varphi)$ .

6 Multi-valued Ltl for Consistent Distributed Monitoring

In this section, we introduce a novel multi-valued logic, called dltl for distributed ltl, and we relate this logic to the notion of alternation number. We establish our main result in this section. That is, we show that, for every $\ell \ge 0$ and for every ltl formula $\varphi$ with alternation number $\ell$ , there are distributed monitors using a verdict set of cardinality $2\lceil \ell /2 \rceil +4$ that correctly monitor $\varphi$ , where each monitor uses an automaton for evaluating $\varphi$ in ${\rm\small DLTL}$ , i.e., dltl with all truth values in

\begin{equation*} \mathbb {B}_{2\lceil \ell /2 \rceil +4}=\lbrace \top ,\bot ,\top _0,\bot _0,\dots ,\top _{\lceil \ell /2 \rceil },\bot _{\lceil \ell /2 \rceil } \rbrace , \end{equation*}

which can be automatically synthesized from $\varphi$ .

6.1 Semantics of DLTL

6.1.1 Definition.

dltl is directly motivated by distributed monitoring. In some sense, dltl extends rv-ltl to more than four logical values with an eye on the alternation number. However, as opposed to rv-ltl, which is motivated by refining the uncertainty regarding what could occur in the future, dltl is motivated by refining the uncertainty caused by asynchrony and failures.

For instance, let us consider a monitor M running Algorithm 2, and assume that M eventually collected a partial state s after having sampled a trace $\alpha$ with $|\alpha |=1$ , and after having exchanged information with other monitors. Let us assume that $[s \models _3 \varphi ] = \; ?$ and $[s \models _F \varphi ] = \top$ . In rv-ltl, such a monitor M would output $\top _p$ as verdict, by Line 12 of Algorithm 2. The objective of dltl is to refine such a verdict by providing a level of certainty. Indeed, it may well be the case that some other monitor $M^{\prime }$ collected a partial state $s^{\prime }\prec s$ , with $[s^{\prime } \models _3 \varphi ] = \; ?$ and $[s^{\prime } \models _F \varphi ] = \bot$ , yielding a verdict $\bot _p$ from that monitor. With rv-ltl verdicts, i.e., verdicts in $\lbrace \top ,\top _p,\bot _p,\bot \rbrace$ , the set of verdicts emitted by these two monitors M and $M^{\prime }$ would be $\lbrace \top _p,\bot _p\rbrace$ , while the $\top _p$ verdict emitted by M is somehow more relevant than the verdict $\bot _p$ emitted by $M^{\prime }$ , because M has more information about the system than $M^{\prime }$ . The objective of dltl is that M emits a verdict $\top _i$ while $M^{\prime }$ emits a verdict $\bot _j$ , with $i\gt j$ , where i and j are non-negative integers reflecting the degree of certainty of the verdicts. That is, a verdict $\top _i$ is viewed as more certain than a verdict $\bot _j$ whenever $i\gt j$ .

Choosing the right level of certainty at which a verdict must be emitted is at the core of the definition of dltl below.

Definition 6.1.

Let $\alpha = \alpha ^{\prime } s$ be a finite partial trace in $\Sigma ^{*}$ , i.e., $s\in \Sigma =\lbrace \mathsf {true}, \mathsf {false}, \natural \rbrace$ , and $\alpha ^{\prime } \in \Sigma ^*$ . The truth value in dltl of an ltl formula $\varphi$ with respect to $\alpha$ , denoted by $[\alpha \models _{D} \varphi ]$ , is defined as follows:

\begin{equation*} \left[\alpha \models _{D} \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~[\alpha \models _4 \varphi ] = \top \\ \bot & \text{if }~~~[\alpha \models _4 \varphi ] = \bot \\ \top _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] = \top _0)\\ \bot _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}}\; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] = \bot _0)\\ \top _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] = \bot _{i-1}) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j\lt i : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] \in \lbrace \top _j, \bot _j\rbrace \cup \lbrace \top _i\rbrace) \\ \bot _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] = \top _{i-1}) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j \lt i : [\alpha ^{\prime }s^{\prime } \models _D \varphi ] \in \lbrace \top _j, \bot _j\rbrace \cup \lbrace \bot _i\rbrace) \end{array}\right.} . \end{equation*}

For $\ell \ge 0$ , ${\rm\small DLTL}_\ell$ is the restriction of dltl, with all truth values in $\mathbb {B}_{2\ell +4}=\lbrace \top ,\bot ,\top _0,\bot _0,\dots ,\top _\ell ,\bot _\ell \rbrace$ .
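To make the recursion in Definition 6.1 concrete, here is a minimal Python sketch that evaluates the last state of a partial trace in dltl, given an oracle for the rv-ltl value. The oracle `rv4`, the tuple encoding of partial states (with `None` for $\natural$ ), the value encoding `(sign, level)`, and the toy `parity_oracle` are all hypothetical and serve only to exercise the case analysis; they are not taken from the article.

```python
from functools import lru_cache
from itertools import combinations

def strictly_below(state):
    """Yield every partial state obtained by forgetting some known entries."""
    known = [i for i, v in enumerate(state) if v is not None]
    for size in range(1, len(known) + 1):
        for forget in combinations(known, size):
            yield tuple(None if i in forget else v for i, v in enumerate(state))

def dltl_value(rv4, state):
    """Evaluate `state` following the six cases of Definition 6.1.

    rv4(s) must return "T", "F", "Tp", or "Fp" (the rv-ltl value of the trace
    ending in s). The result is (sign, level); level is None for the definite
    values and an integer i for the indexed values T_i / F_i.
    """
    @lru_cache(maxsize=None)
    def val(s):
        v4 = rv4(s)
        if v4 in ("T", "F"):
            return (v4, None)
        sign, other = ("T", "F") if v4 == "Tp" else ("F", "T")
        below = [val(t) for t in strictly_below(s)]
        # Level 0: every strictly smaller state already has this level-0 value.
        if all(b == (sign, 0) for b in below):
            return (sign, 0)
        # Level i > 0: some smaller state has the opposite sign at level i - 1,
        # and every smaller state has a level below i or the value (sign, i).
        candidates = {b[1] + 1 for b in below if b[0] == other and b[1] is not None}
        for i in sorted(candidates):
            if all(b == (sign, i) or (b[1] is not None and i > b[1]) for b in below):
                return (sign, i)
        raise ValueError("no case of Definition 6.1 applies to this state")

    return val(state)

def parity_oracle(state):
    # Hypothetical oracle: "Tp" when an even number of entries is known to be True.
    return "Tp" if sum(v is True for v in state) % 2 == 0 else "Fp"

print(dltl_value(parity_oracle, (None, None, None)))  # ('T', 0)
print(dltl_value(parity_oracle, (True, True, True)))  # ('F', 3)
```

With this toy oracle, the chain of states with zero, one, two, and three known-true entries alternates three times, and the level computed for the last state is 3.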

Hence, in the case discussed above of two monitors M and $M^{\prime }$ having collected the partial states s and $s^{\prime }$ , respectively, with $s^{\prime }\prec s$ , M can evaluate s in dltl instead of rv-ltl, leading it to output a verdict $\top _i$ , while evaluating $s^{\prime }$ in dltl leads $M^{\prime }$ to output a verdict $\bot _j$ , with $i\gt j$ . Indeed, the existence of $s^{\prime }$ demonstrates that there exists a partial state $s^{\prime }\prec s$ such that $[s^{\prime }\models _F \varphi ]\ne [s\models _F \varphi ]$ , so M emits a verdict with more certainty than $M^{\prime }$ . The level i is actually the length of the longest sequence $s_0\prec s_1\prec \dots \prec s_i$ where $s_i=s$ , such that, for every $j\in \lbrace 0,\dots ,i-1\rbrace$ , we have $[s_j\models _F \varphi ]\ne [s_{j+1}\models _F \varphi ]$ . Formally, we have the following:

Lemma 6.2.

Let $\alpha \ne \epsilon$ be a finite partial trace. The alternation number of an ltl formula $\varphi$ with respect to $\alpha$ satisfies

\begin{equation*} \mathsf {altern}(\varphi ,\alpha) = {\left\lbrace \begin{array}{ll} 0 & \text{if }~~~[\alpha \models _D \varphi ]\in \lbrace \top ,\bot \rbrace \\ \ell & \text{if }~~~[\alpha \models _D \varphi ] \in \lbrace \bot _\ell , \top _\ell \rbrace \; \text{for some $\ell \ge 0$} \end{array}\right.} . \end{equation*}

Proof.

Let $\varphi$ be an ltl formula, and let $\alpha \ne \epsilon$ be a finite partial trace. Also, let $\alpha =\alpha ^{\prime } s$ with $\alpha ^{\prime }\in \Sigma ^*$ and $s\in \Sigma$ . If $[\alpha \models _D \varphi ]\in \lbrace \top ,\bot ,\top _0,\bot _0\rbrace$ , then $\mathsf {altern}(\varphi ,\alpha)=0,$ because the value of $[\alpha ^{\prime }s^{\prime } \models _F \varphi ]$ is the same for all $s^{\prime } \preceq s$ , and thus there are no alternations. The rest of the proof is by induction on $\ell$ . Let $\ell \gt 0$ , assume that the lemma holds for every $j\lt \ell$ , and let us show that it holds for $\ell$ . If $[\alpha \models _D \varphi ] = \top _\ell$ , then let $s^{\prime }\prec s$ be such that $[\alpha ^{\prime }s^{\prime } \models _D \varphi ] = \bot _{\ell -1}$ . By induction, we get that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s^{\prime })=\ell -1$ . Moreover, $[\alpha ^{\prime }s^{\prime } \models _F \varphi ]=\bot$ , and $[\alpha ^{\prime }s \models _F \varphi ]=\top$ , with $s^{\prime } \prec s$ . It follows that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\ge \ell$ . Moreover, $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\le \ell$ , because for every $s^{\prime } \prec s$ with a different fltl value than $s$ , $[\alpha ^{\prime }s^{\prime } \models _D \varphi ] \in \lbrace \top _j, \bot _j\rbrace$ for some $j\lt \ell$ , which implies by induction that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s^{\prime })=j\lt \ell$ . It follows that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s) = \ell$ , as claimed. The proof for the case $[\alpha \models _D \varphi ] = \bot _\ell$ is analogous.□

6.1.2 Reducing the Number of Logical Values in DLTL.

Lemma 6.2 provides the intuition that, using dltl, distributed monitoring of an ltl formula with alternation number $\ell \ge 0$ could be done using verdicts in $\mathbb {B}_{2\ell +4}=\lbrace \top ,\bot ,\top _0,\bot _0,\dots ,\top _\ell ,\bot _\ell \rbrace$ , i.e., using $2\ell +4$ logical values. While we shall prove in the next section that this is indeed the case, one can reduce the number of logical values by a factor of 2. Indeed, let us revisit the case of request-acknowledgment. As we have seen in Section 5, $\mathsf {altern}(\varphi _{\mathit {ra}})=2$ , and, as we have seen in Section 4.2, monitoring $\varphi _{\mathit {ra}}$ using rv-ltl can be done using verdicts in $\mathbb {B}_4=\lbrace \top ,\bot ,\top _p,\bot _p\rbrace$ . Instead, Lemma 6.2 suggests that using dltl would require eight values. This is because dltl defines the relative certainty of verdicts $\bot _i$ and $\top _j$ only for $i\ne j$ . One can halve the number of logical values in dltl by imposing an arbitrary order also between the certainties of $\bot _i$ and $\top _i$ . This yields two variants of dltl, respectively, called ${\rm\small DLTL}^+$ and ${\rm\small DLTL}^-$ , depending on whether one assigns $\top _i$ more certainty than $\bot _i$ , or $\top _i$ less certainty than $\bot _i$ , respectively. More formally, these logics are defined as follows:

Definition 6.3.

Let $\alpha = \alpha ^{\prime } s$ be a finite partial trace in $\Sigma ^{*}$ , i.e., $s\in \Sigma =\lbrace \mathsf {true}, \mathsf {false}, \natural \rbrace$ , and $\alpha ^{\prime } \in \Sigma ^*$ . The truth value in ${\rm\small DLTL}^+$ of an ltl formula $\varphi$ with respect to $\alpha$ , denoted by $[\alpha \models _{D^+} \varphi ]$ , is defined as follows:

\begin{equation*} \left[\alpha \models _{D^+} \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~[\alpha \models _4 \varphi ] = \top \\ \bot & \text{if }~~~[\alpha \models _4 \varphi ] = \bot \\ \top _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _0,\bot _0\rbrace) \\ \bot _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}} \; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] = \bot _0)\\ \top _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _i, \bot _i\rbrace) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j\le i : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _j, \bot _j\rbrace) \\ \bot _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] = \top _{i-1}) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j \lt i : [\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _j, \bot _j\rbrace \cup \lbrace \bot _i\rbrace) \end{array}\right.} . \end{equation*}

Similarly, the truth value in ${\rm\small DLTL}^-$ of an ltl formula $\varphi$ with respect to $\alpha$ , denoted by $[\alpha \models _{D^-} \varphi ]$ , is defined as follows:

\begin{equation*} \left[\alpha \models _{D^-} \varphi \right] = {\left\lbrace \begin{array}{ll} \top & \text{if }~~~[\alpha \models _4 \varphi ] = \top \\ \bot & \text{if }~~~[\alpha \models _4 \varphi ] = \bot \\ \top _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] = \top _0) \\ \bot _0 & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}} \; \wedge \; (\forall s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] \in \lbrace \top _0,\bot _0\rbrace)\\ \top _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \top _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] = \bot _{i-1}) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j \lt i : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] \in \lbrace \top _j, \bot _j\rbrace \cup \lbrace \top _i\rbrace) \\ \bot _i \; \;\; \mbox{$i\gt 0$} & \text{if }~~~[\alpha \models _4 \varphi ] = \bot _{{p}} \; \wedge \; (\exists s^{\prime } \prec s : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] \in \lbrace \top _i, \bot _i\rbrace) \\ & \ \wedge \; (\forall s^{\prime } \prec s, \exists j \le i : [\alpha ^{\prime }s^{\prime } \models _{D^-} \varphi ] \in \lbrace \top _j, \bot _j\rbrace) \end{array}\right.} . \end{equation*}

It follows from these definitions that ${\rm\small DLTL}^+$ induces the following order between the logical values:

\begin{equation*} \bot _0 \lt \top _0 \lt \bot _1 \lt \top _1 \lt \dots \lt \top _{i-1} \lt \bot _i \lt \top _i \lt \bot _{i+1} \lt \dots \end{equation*}

while ${\rm\small DLTL}^-$ induces

\begin{equation*} \top _0 \lt \bot _0 \lt \top _1 \lt \bot _1 \lt \dots \lt \bot _{i-1} \lt \top _i \lt \bot _i \lt \top _{i+1} \lt \dots \end{equation*}

The following lemma illustrates the gain in terms of the number of logical values with respect to the alternation number, in comparison with Lemma 6.2. Recall that $s^\natural$ denotes the partial state in which none of the atomic propositions is known.

Lemma 6.4.

Let $\alpha =\alpha ^{\prime }s$ , with $\alpha ^{\prime }\in \Sigma ^*$ and $s\in \Sigma$ , be a finite partial trace. The alternation number of an ltl formula $\varphi$ with respect to $\alpha$ satisfies the following two equalities:

\begin{equation*} \mathsf {altern}(\varphi ,\alpha) = {\left\lbrace \begin{array}{ll} 0 & \text{if }~~~~~[\alpha \models _{D^+} \varphi ]\in \lbrace \top ,\bot \rbrace \\ 2\ell +1 & \text{if }~~~~([\alpha \models _{D^+}\varphi ] = \top _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot) \\ 2\ell & \text{if }~~~\big (([\alpha \models _{D^+}\varphi ] = \top _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top) \big)\\ & \ \vee \; \big (([\alpha \models _{D^+}\varphi ] = \bot _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot) \big) \\ 2\ell -1 & \text{if }~~~~([\alpha \models _{D^+}\varphi ] = \bot _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top) \end{array}\right.} , \end{equation*}

\begin{equation*} \mathsf {altern}(\varphi ,\alpha) = {\left\lbrace \begin{array}{ll} 0 & \text{if }~~~~~[\alpha \models _{D^-} \varphi ]\in \lbrace \top ,\bot \rbrace \\ 2\ell +1 & \text{if }~~~~([\alpha \models _{D^-}\varphi ] = \bot _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top) \\ 2\ell & \text{if }~~~\big (([\alpha \models _{D^-}\varphi ] = \bot _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot) \big)\\ & \ \vee \; \big (([\alpha \models _{D^-}\varphi ] = \top _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top) \big) \\ 2\ell -1 & \text{if }~~~~([\alpha \models _{D^-}\varphi ] = \top _\ell) \; \wedge \; ([\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot) \end{array}\right.} . \end{equation*}

Proof.

Let $\varphi$ be an ltl formula, and let $\alpha =\alpha ^{\prime }s$ be a finite partial trace. We first consider the statement for ${\rm\small DLTL}^+$ . If $[\alpha \models _{D^+} \varphi ]\in \lbrace \top ,\bot \rbrace$ , then $\mathsf {altern}(\varphi ,\alpha)=0,$ because the value of $[\alpha ^{\prime }s^{\prime } \models _F \varphi ]$ is the same for all $s^{\prime } \preceq s$ , and thus there are no alternations. From this point on, we assume that $[\alpha \models _{D^+} \varphi ]\notin \lbrace \top ,\bot \rbrace$ . The rest of the proof is by induction on $\ell$ , where the reasoning below applies both to the base case $\ell =0$ , and to the inductive case for $\ell \ge 1$ . Let $\ell \ge 0$ .

If $[\alpha \models _{D^+} \varphi ] = \top _\ell$ , then let $s^{\prime }\prec s$ be such that $[\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _\ell , \bot _\ell \rbrace$ , and $s^{\prime }$ is minimal for this property, i.e., for every $s^{\prime \prime }\prec s^{\prime }$ , we have $[\alpha ^{\prime }s^{\prime \prime }\models _{D^+} \varphi ] \notin \lbrace \top _\ell , \bot _\ell \rbrace$ . Minimality implies that $[\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] = \bot _\ell$ . Thus, let $s^{\prime \prime }\prec s^{\prime }$ be such that $[\alpha ^{\prime }s^{\prime \prime } \models _{D^+} \varphi ] = \top _{\ell -1}$ . By induction, we get that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s^{\prime \prime })=2\ell -1$ or $2\ell -2$ , depending on whether $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot$ or $\top$ , respectively. Moreover, $[\alpha ^{\prime }s^{\prime \prime } \models _F \varphi ]=\top$ , $[\alpha ^{\prime }s^{\prime } \models _F \varphi ]=\bot$ , and $[\alpha ^{\prime }s \models _F \varphi ]=\top$ , with $s^{\prime \prime }\prec s^{\prime } \prec s$ . It follows that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\ge 2\ell +1$ if $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot$ , and $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\ge 2\ell$ if $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top$ . Moreover, $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)$ cannot be strictly greater than these respective bounds because, for every $s^{\prime }\prec s$ , there exists $j\le \ell$ such that $[\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _j, \bot _j\rbrace$ , which implies that $\varphi$ cannot alternate more than $2\ell +1$ (respectively, $2\ell$ ) times with respect to $\alpha$ when $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot$ (respectively, $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top$ ).

If $[\alpha \models _{D^+} \varphi ] = \bot _\ell$ , then let $s^{\prime }\prec s$ be such that $[\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] = \top _{\ell -1}$ . By induction, we get that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s^{\prime })=2\ell -2$ or $2\ell -1$ , depending on whether $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top$ or $\bot$ , respectively. Moreover, $[\alpha ^{\prime }s^{\prime } \models _F \varphi ]=\top$ , and $[\alpha ^{\prime }s \models _F \varphi ]=\bot$ , with $s^{\prime } \prec s$ . It follows that $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\ge 2\ell$ if $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot$ , and $\mathsf {altern}(\varphi ,\alpha ^{\prime }s)\ge 2\ell -1$ if $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top$ . Moreover, since, for every $s^{\prime }\prec s$ , there exists $j \lt \ell$ such that $[\alpha ^{\prime }s^{\prime } \models _{D^+} \varphi ] \in \lbrace \top _j, \bot _j\rbrace$ , it follows that $\varphi$ cannot alternate more than $2\ell$ (respectively, $2\ell -1$ ) times with respect to $\alpha$ when $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \bot$ (respectively, $[\alpha ^{\prime }s^\natural \models _{F}\varphi ] = \top$ ).

This completes the proof for ${\rm\small DLTL}^+$ . The proof for ${\rm\small DLTL}^-$ is analogous and thus omitted.□
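The case analysis of Lemma 6.4 for ${\rm\small DLTL}^+$ can be read as a lookup table. The Python sketch below encodes it and checks, for small values, that an alternation number of at most $L$ never requires a level above $\lceil L/2 \rceil$ . The function name and the encoding of truth values as `(sign, level)` pairs are illustrative assumptions.

```python
def altern_from_dltl_plus(sign, level, bottom_is_true):
    """Alternation number given a DLTL+ value, following Lemma 6.4.

    sign is "T" or "F"; level is None for the definite values and the index
    of an indexed value otherwise; bottom_is_true is the fltl value of the
    trace in which the last state is replaced by the all-unknown state.
    """
    if level is None:
        return 0
    if sign == "T":
        result = 2 * level if bottom_is_true else 2 * level + 1
    else:
        result = 2 * level - 1 if bottom_is_true else 2 * level
    if 0 > result:
        # F_0 together with a true all-unknown state cannot occur.
        raise ValueError("combination excluded by the definition of DLTL+")
    return result

def max_level_needed(max_altern):
    """Largest level among the combinations whose alternation number is at most max_altern."""
    best = 0
    for level in range(max_altern + 2):
        for sign in ("T", "F"):
            for bottom in (True, False):
                try:
                    value = altern_from_dltl_plus(sign, level, bottom)
                except ValueError:
                    continue
                if max_altern >= value:
                    best = max(best, level)
    return best

print(altern_from_dltl_plus("T", 1, False))  # 3
print(max_level_needed(3))                   # 2
```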

As shown in Section 5.1, we have $\mathsf {altern}(\varphi _{\mathit {ra}})= 2$ with the sequence

\begin{equation*} {r \choose a} = {\natural \choose \natural }, {\natural \choose \mathsf {true}}, {\mathsf {true}\choose \mathsf {true}} , \end{equation*}

which evaluate to $\top ,\bot ,\top$ , respectively, in fltl (assuming every atomic proposition with value $\natural$ is extrapolated to $\mathsf {false}$ ). Also, we have seen in Section 4.2 that $\varphi _{\mathit {ra}}$ can be distributedly monitored using rv-ltl. For this, we used an interpretation function $\mu$ that returns $\bot _p$ when applied to the set $\lbrace \top _p,\bot _p\rbrace$ . This can be put in correspondence with using ${\rm\small DLTL}_0^-$ , with an interpretation function $\mu$ that simply returns the logical value with highest certainty in ${\rm\small DLTL}^-$ , i.e., $\bot _0$ for the set $\lbrace \top _0,\bot _0\rbrace$ . We use this type of interpretation function in our main theorem, stated in the next section.
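An interpretation function of this kind can be sketched in a few lines of Python. The sketch covers only the indexed values $\top _i$ and $\bot _i$ , ranked according to the two orders displayed above; how the definite values $\top$ and $\bot$ enter the interpretation is not fixed by this passage, so they are deliberately left out. The names `rank_plus`, `rank_minus`, and `mu` are hypothetical.

```python
def rank_plus(verdict):
    """Position in the DLTL+ order: F0, T0, F1, T1, ... (increasing certainty)."""
    sign, level = verdict
    return 2 * level + (1 if sign == "T" else 0)

def rank_minus(verdict):
    """Position in the DLTL- order: T0, F0, T1, F1, ... (increasing certainty)."""
    sign, level = verdict
    return 2 * level + (1 if sign == "F" else 0)

def mu(verdicts, rank):
    """Return the verdict of highest certainty in a non-empty set of indexed verdicts."""
    return max(verdicts, key=rank)

collected = {("T", 0), ("F", 0)}
print(mu(collected, rank_minus))  # ('F', 0)
print(mu(collected, rank_plus))   # ('T', 0)
```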

6.2 Monitorability and Monitor Synthesis for DLTL

We now have all the ingredients to present our main result.

Theorem 6.5.

For every $\ell \ge 0$ , and for every ltl formula $\varphi$ with $\mathsf {altern}(\varphi)=\ell$ , there are distributed monitors using verdict set $\mathbb {B}_{2\lceil \ell /2 \rceil +4}=\lbrace \bot ,\top ,\bot _0,\top _0,\dots ,\bot _{\lceil \ell /2 \rceil },\top _{\lceil \ell /2 \rceil }\rbrace$ that correctly monitor $\varphi$ . Each monitor uses an automaton for evaluating $\varphi$ in ${\rm\small DLTL}_{\lceil \ell /2 \rceil }^+$ , which can be automatically synthesized from $\varphi$ .
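To see the sizes involved, the short Python sketch below builds the verdict set of Theorem 6.5 and compares its cardinality, $2\lceil \ell /2 \rceil +4$ , with the $2\ell +4$ values suggested by Lemma 6.2. The string labels and function names are illustrative assumptions.

```python
def verdict_set(max_level):
    """Definite values T, F plus the indexed values T_i, F_i for i = 0..max_level."""
    values = ["T", "F"]
    for i in range(max_level + 1):
        values += [f"T{i}", f"F{i}"]
    return values

def size_dltl(altern):
    # One pair of indexed values per level 0..altern.
    return len(verdict_set(altern))

def size_dltl_plus(altern):
    # Levels 0..ceil(altern / 2), using integer arithmetic for the ceiling.
    return len(verdict_set((altern + 1) // 2))

for altern in range(5):
    print(altern, size_dltl(altern), size_dltl_plus(altern))
# 0 4 4
# 1 6 6
# 2 8 6
# 3 10 8
# 4 12 8
```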

Remarks. One significance of Theorem 6.5 is that safety formulas can be efficiently monitored with only four truth values. In general, formulas with only one temporal operator need this many truth values to be consistently monitored. We should also mention that the size of a dltl monitor is the size of its corresponding rv-ltl monitor times $\ell$ (one rv-ltl monitor per alternation). Finally, it is worth pointing out that the number of logical values used by the monitors in Theorem 6.5 can be further reduced, but by an additive term only, under some specific conditions, including the following scenarios.

•

Let us consider an ltl formula $\varphi$ , with $\mathsf {altern}(\varphi)=\ell$ odd. Let us also assume that, for every finite trace $\alpha$ such that there exists a sequence $s_0\prec s_1 \prec \dots \prec s_\ell$ of partial states satisfying $[\alpha s_i\models _F \varphi ]\ne [\alpha s_{i+1}\models _F \varphi ]$ for every $i\in \lbrace 0,\dots ,\ell -1\rbrace$ , we have $[\alpha s^\natural \models _F \varphi ]=\bot$ . Then the number of truth values used by ${\rm\small DLTL}^+$ is not $2\lceil \ell /2\rceil +4$ but only $2\lfloor \ell /2\rfloor +4$ . Similarly, if $[\alpha s^\natural \models _F \varphi ]=\top$ for every finite trace $\alpha$ such that there exists a sequence $s_0\prec s_1 \prec \dots \prec s_\ell$ of partial states satisfying $[\alpha s_i\models _F \varphi ]\ne [\alpha s_{i+1}\models _F \varphi ]$ for every $i\in \lbrace 0,\dots ,\ell -1\rbrace$ , then, using ${\rm\small DLTL}^-$ instead of ${\rm\small DLTL}^+$ yields using only $2\lfloor \ell /2\rfloor +4$ truth values, instead of $2\lceil \ell /2\rceil +4$ .

•

Let us consider an ltl formula $\varphi$ , with $\mathsf {altern}(\varphi)=\ell$ even. Let us also assume that, for every finite trace $\alpha$ such that there exists a sequence $s_0\prec s_1 \prec \dots \prec s_\ell$ of partial states satisfying $[\alpha s_i\models _F \varphi ]\ne [\alpha s_{i+1}\models _F \varphi ]$ for every $i\in \lbrace 0,\dots ,\ell -1\rbrace$ , we have

\begin{equation*} [\alpha s^\natural \models _F \varphi ]=\top ,\; \mbox{and} \; \; [\alpha s_\ell \models _3 \varphi ]=\top \end{equation*}

for all such sequences (note that the evaluation of $\alpha s_\ell$ is performed in ${\rm\small LTL}_3$ ). An example of such a situation is $\varphi _{\mathit {ra}}$ . Its alternation number is 2, and, consistently with the sequence given in Section 5.1, every sequence $s_0\prec s_1 \prec s_2$ alternating twice satisfies $[\alpha s_0 \models _F \varphi _{\mathit {ra}}]=\top$ with $s_0= ({{\natural } \atop {\natural }})$ , and $[\alpha s_2 \models _3 \varphi _{\mathit {ra}}]=\top$ with $s_2=({{\mathsf {true}} \atop {\mathsf {true}}})$ . In such a scenario, the truth values of highest certainty, $\top _{\ell /2}$ and $\bot _{\ell /2}$ , can be discarded whenever using ${\rm\small DLTL}^-$ instead of ${\rm\small DLTL}^+$ , saving two truth values. That is, one can restrict the truth values to be in $\mathbb {B}_{\ell +2}=\lbrace \top ,\bot ,\top _0,\bot _0,\dots ,\top _{\ell /2-1},\bot _{ \ell /2-1}\rbrace$ . In the particular case of $\varphi _{\mathit {ra}}$ , one can therefore restrict the truth values to be in $\mathbb {B}_{4}=\lbrace \top ,\bot ,\top _0,\bot _0\rbrace$ , as was previously established in Section 4.2.

7 Conclusion

We have established a tight (up to a small additive constant) bound on the cardinality of the set of verdicts from which a collection of asynchronous crash-prone monitors picks their individual verdicts for monitoring an ltl formula $\varphi$ in a distributed manner. This cardinality is related to the alternation number, $\mathsf {altern}(\varphi)$ , of the formula. We showed that, for every $\ell \ge 0$ , every ltl formula $\varphi$ with $\mathsf {altern}(\varphi)=\ell$ can be monitored by distributed monitors with verdicts in $\mathbb {B}_{2\lceil \ell /2 \rceil +4}=\lbrace \bot ,\top ,\bot _0,\top _0,\dots ,\bot _{\lceil \ell /2 \rceil },\top _{\lceil \ell /2 \rceil }\rbrace$ , and each verdict results from evaluating the observed partial trace in the multi-valued logic ${\rm\small DLTL}^+$ . The bound on the size of the verdict set is (almost) tight, in the sense that, for every $\ell \ge 0$ , there exists an ltl formula $\varphi$ with $\mathsf {altern}(\varphi)=\ell$ such that, for every set V with $|V|\le \ell$ , $\varphi$ cannot be monitored by distributed monitors with verdicts in V.

To establish these results, we impose two restrictions. First, we assume that all operations performed by the distributed monitors (sampling the current state, exchanging information with the other monitors, and producing the verdict) can be performed between two changes of states by the monitored system. Second, we specify distributed monitoring by imposing global consistency of the set $m_k$ of verdicts with respect to the centralized evaluation of the actual trace $s_0s_1\cdots s_k$ in rv-ltl, by requiring equality $\mu (m_k)=[s_0s_1\dots s_k \models _4 \varphi ]$ between the interpretation $\mu (m_k)$ and the evaluation of $s_0s_1\cdots s_k$ in rv-ltl, only for verdicts produced in the absence of crashes during the monitoring of $s_k$ . This latter restriction appears natural, and perhaps even unavoidable, because, otherwise, the distributed monitors and the centralized monitor deal with different traces, which are inherently incomparable. However, it might be desirable to relax the former restriction, because such an assumption might not always be satisfied in practice, in particular by rapidly evolving systems. Getting rid of this assumption seems, however, challenging, as one would have to deal not only with issues caused by asynchrony between monitors with different partial views of the same state, but also with issues caused by asynchrony between monitors with partial views of different states. Reconciliation of such views looks difficult. Nevertheless, this opens a challenging but rewarding direction for future work.

Another challenging problem in the context of fault-tolerant distributed monitoring is to consider other types of faults, namely, Byzantine faults. These faults may arbitrarily change the output of individual monitors, i.e., their verdicts. It is unclear how a collection of faulty monitors that may misrepresent their partial view of the system can be transformed into a sound single verdict that a correct centralized monitor would produce. This problem can also open an entirely new line of research to deal with distributed monitoring in the presence of faults and security attacks.

Footnote

We refer the reader to Reference [16], where the author formalized 54 commonly used requirements as ltl formulas. We also note that the area of runtime verification mainly focuses on specification languages that are trace-based. This is due to the fact that at runtime, monitors can realistically observe only a finite execution trace. The semantics of temporal logics such as CTL is based on computation trees and is not suitable for runtime monitoring.

References

[1]

Yehuda Afek, Hagit Attiya, Danny Dolev, Eli Gafni, Michael Merritt, and Nir Shavit. 1993. Atomic snapshots of shared memory. J. ACM 40, 4 (1993), 873–890.

[2]

James Aspnes, Hagit Attiya, Keren Censor-Hillel, and Faith Ellen. 2015. Limited-use atomic snapshots with polylogarithmic step complexity. J. ACM 62, 1 (2015), 3:1–3:22.

[3]

Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. 1995. Sharing memory robustly in message-passing systems. J. ACM 42, 1 (1995), 124–142. DOI:

Digital Library

Google Scholar

[4]

Hagit Attiya, Sweta Kumari, Archit Somani, and Jennifer L. Welch. 2022. Store-collect in the presence of continuous churn with application to snapshots and lattice agreement. Inf. Computat. (2022), 104869. DOI:

Digital Library

Google Scholar

[5]

Hagit Attiya and Ophir Rachman. 1998. Atomic snapshots in $O(n \log n)$ operations. SIAM J. Comput. 27, 2 (1998), 319–340. DOI:

Digital Library

Google Scholar

[6]

Hagit Attiya and Jennifer Welch. 2004. Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley.

Digital Library

Google Scholar

[7]

A. Bauer, M. Leucker, and C. Schallhart. 2010. Comparing LTL semantics for runtime verification. J. Logic Computat. 20, 3 (2010), 651–674.

Digital Library

Google Scholar

[8]

A. Bauer, M. Leucker, and C. Schallhart. 2011. Runtime verification for LTL and TLTL. ACM Trans. Softw. Eng. Methodol. 20, 4 (2011), 14.

Digital Library

Google Scholar

[9]

A. K. Bauer and Y. Falcone. 2012. Decentralised LTL monitoring. In Proceedings of the 18th International Symposium on Formal Methods (FM). 85–100.

Crossref

Google Scholar

[10]

S. Berkovich, B. Bonakdarpour, and S. Fischmeister. 2013. GPU-based runtime verification. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1025–1036.

Digital Library

Google Scholar

[11]

Glenn Bruns and Patrice Godefroid. 1999. Model checking partial state spaces with 3-valued temporal logics. In Proceedings of the 11th International Conference on Computer Aided Verification (CAV). 274–287. DOI:

Crossref

Google Scholar

[12]

Glenn Bruns and Patrice Godefroid. 2000. Generalized model checking: Reasoning about partial state spaces. In Proceedings of the 11th International Conference on Concurrency Theory (CONCUR). 168–182. DOI:

Crossref

Google Scholar

[13]

H. Chauhan, V. K. Garg, A. Natarajan, and N. Mittal. 2013. A distributed abstraction algorithm for online predicate detection. In Proceedings of the 32nd IEEE Symposium on Reliable Distributed Systems (SRDS). 101–110.

Digital Library

Google Scholar

[14]

C. Colombo and Y. Falcone. 2014. Organising LTL monitors over distributed systems with a global clock. In Proceedings of the 14th International Conference on Runtime Verification (RV). 140–155.

Crossref

Google Scholar

[15]

Carole Delporte-Gallet, Hugues Fauconnier, Sergio Rajsbaum, and Michel Raynal. 2018. Implementing snapshot objects on top of crash-prone asynchronous message-passing systems. IEEE Trans. Parallel Distrib. Syst. 29, 9 (2018), 2033–2045. DOI:

Digital Library

Google Scholar

[16]

M. B. Dwyer, G. S. Avrunin, and J. C. Corbett. 1999. Patterns in property specifications for finite-state verification. In Proceedings of the International Conference on Software Engineering (ICSE). 411 –420.

Digital Library

Google Scholar

[17]

M. J. Fischer, N. A. Lynch, and M. S. Peterson. 1985. Impossibility of distributed consensus with one faulty processor. J. ACM 32, 2 (1985), 373–382.

Digital Library

Google Scholar

[18]

Pierre Fraigniaud, Sergio Rajsbaum, Matthieu Roy, and Corentin Travers. 2014. The opinion number of set-agreement. In Proceedings of the 18th International Conference on Principles of Distributed Systems (OPODIS). 155–170.

Crossref

Google Scholar

[19]

Pierre Fraigniaud, Sergio Rajsbaum, and Corentin Travers. 2013. Locality and checkability in wait-free computing. Distrib. Comput. 26, 4 (2013), 223–242. DOI:

Digital Library

Google Scholar

[20]

P. Fraigniaud, S. Rajsbaum, and C. Travers. 2014. On the number of opinions needed for fault-tolerant run-time monitoring in distributed systems. In Proceedings of the 5th International Conference on Runtime Verification (RV). 92–107.

Crossref

Google Scholar

[21]

R. Ganguly, A. Momtaz, and B. Bonakdarpour. 2020. Distributed runtime verification under partial asynchrony. In Proceedings of the 24th International Conference on Principles of Distributed Systems (OPODIS). 20:1–20:17.

Google Scholar

[22]

M. H. Herlihy, D. Kozlov, and S. Rajsbaum. 2013. Distributed Computing Through Combinatorial Topology. Morgan Kaufmann-Elsevier.

Digital Library

Google Scholar

[23]

Michiko Inoue and Wei Chen. 1994. Linear-time snapshot using multi-writer multi-reader registers. In Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG). 130–140. DOI:

Crossref

Google Scholar

[24]

Z. Manna and A. Pnueli. 1995. Temporal Verification of Reactive Systems - Safety. Springer.

Crossref

Google Scholar

[25]

N. Mittal and V. K. Garg. 2005. Techniques and applications of computation slicing. Distrib. Comput. 17, 3 (2005), 251–277.

Digital Library

Google Scholar

[26]

M. Mostafa and B. Bonakdarpour. 2015. Decentralized runtime verification of LTL specifications in distributed systems. In Proceedings of the 29th International Parallel and Distributed Processing Symposium (IPDPS). 494–503.

Digital Library

Google Scholar

[27]

V. A. Ogale and V. K. Garg. 2007. Detecting temporal logic predicates on distributed computations. In Proceedings of the 21st International Symposium on Distributed Computing (DISC). 420–434.

Crossref

Google Scholar

[28]

A. Pnueli. 1977. The temporal logic of programs. In Proceedings of the Symposium on Foundations of Computer Science (FOCS). 46–57.

Digital Library

Google Scholar

[29]

A. Pnueli and A. Zaks. 2006. PSL model checking and run-time verification via testers. In Proceedings of the 14th International Symposium on Formal Methods (FM). 573–586.

Digital Library

Google Scholar

[30]

A. Sen and V. K. Garg. 2004. Detecting temporal logic predicates in distributed programs using computation slicing. In Principles of Distributed Systems. Springer, 171–183.

Crossref

Google Scholar

[31]

K. Sen, A. Vardhan, G. Agha, and G. Rosu. 2004. Efficient decentralized monitoring of safety in distributed systems. In Proceedings of the 26th International Conference on Software Engineering (ICSE). 418–427.

Digital Library

Google Scholar

Cited By

Castañeda A, Rajsbaum S (2024) Recent Advances on Principles of Concurrent Data Structures. Communications of the ACM 67:8 (45-46). Online publication date: 11-Jul-2024. https://dl.acm.org/doi/10.1145/3653290

Ganguly R, Kazemloo S, Bonakdarpour B (2024) Crash-Resilient Decentralized Synchronous Runtime Verification. IEEE Transactions on Dependable and Secure Computing 21:3 (1017-1031). Online publication date: May-2024. https://doi.org/10.1109/TDSC.2023.3265566

Danielsson L, Sánchez C (2023) Decentralized Stream Runtime Verification for Timed Asynchronous Networks. IEEE Access 11 (84091-84112). Online publication date: 2023. https://doi.org/10.1109/ACCESS.2023.3298329

Index Terms

Decentralized Asynchronous Crash-resilient Runtime Verification
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
2. Theory of computation
  1. Logic
    1. Verification by model checking
  2. Models of computation
    1. Concurrency
      1. Distributed computing models

Published In

Journal of the ACM Volume 69, Issue 5

October 2022

420 pages

ISSN:0004-5411

EISSN:1557-735X

DOI:10.1145/3563903

Editor:
Venkatesan Guruswami
University of California, Berkeley, United States

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2022

Online AM: 10 August 2022

Accepted: 01 July 2022

Revised: 14 February 2022

Received: 01 September 2020

Published in JACM Volume 69, Issue 5

Qualifiers

Research-article
Refereed

Funding Sources

NSF
ANR
UNAM-PAPIIT
