1 Introduction

The emergence of 5G technology has inspired a massive wave of research and development in science and technology in the era of IoT, where communication between computing devices has become significantly faster, with lower latency and power consumption. The power of this modern communication technology influences and benefits all aspects of Cyber-Physical Systems (CPSs) such as smart grids, smart homes, intelligent transportation and smart cities. In particular, the study of autonomous vehicles has become an increasingly popular research field in both academic and industrial transportation applications. Automotive crashes pose significant financial and life-threatening risks, and there is an urgent need for advanced and scalable methods that can efficiently verify a distributed system of autonomous vehicles.

Over the last two decades, many methods have been developed to conduct reachability analysis and safety verification of CPS, such as the approaches proposed in [1, 4, 10, 11, 13, 15, 18]; however, applying these techniques to real-time distributed CPS remains a major challenge. This is because (1) all existing techniques are computationally intensive and are usually too slow to be used in real time, and (2) these techniques target the safety verification of a single CPS, and therefore they cannot be applied efficiently to a distributed CPS where clock mismatches and communication between agents (i.e., individual systems) are essential concerns. Since future autonomous vehicle systems will work in a distributed manner that involves effective communication between agents, there is an urgent need for an approach that can provide formal guarantees of the safety of distributed CPS in real time. More importantly, the safety information should be defined based on the agents’ local clocks to allow these agents to perform “intelligent actions” to escape from upcoming dangerous circumstances. For example, if an agent A knows based on its local clock that it will collide with an agent B in the next 5 s, it should perform an action such as stopping or quickly finding a safe path to avoid the collision.

In this paper, we propose a decentralized real-time safety verification approach for a distributed CPS with multiple agents. We are particularly interested in two types of safety properties. The first one is a local safety property, which specifies the local constraints on the operation of an agent. For example, each agent is only allowed to move within a specific region, must not hit any obstacles, and its velocity needs to be limited to a specific range. This type of property does not require information about other agents and can be verified locally at run-time. The second one is a global safety property, for which we want to check whether any potential collision can occur between the agents.

Our decentralized real-time safety verification approach works as follows. Each agent locally and periodically computes its local reachable set from the current local time to the next T seconds, and then encodes and broadcasts its reachable set information to the others via a communication network. When an agent receives a reachable set message, it immediately decodes the message to read the reachable set information of the sender, and then performs peer-to-peer collision checking based on its current state and the reachable set of the sender. Additionally, the local safety property of the agent is verified simultaneously with the reachable set computation process at run-time. The proposed verification approach relies on the underlying assumption that all agents are time-synchronized to some level of accuracy. This assumption is reasonable, as it can be achieved by using existing time synchronization protocols such as the Network Time Protocol (NTP). Our approach has successfully verified in real time the local safety properties and collision occurrences for a group of quadcopters conducting a search mission.

2 Problem Formulation

In this paper, we consider a distributed CPS with N agents that can communicate with each other via an asynchronous communication channel.

Communication Model. The communication between agents is implemented by the actions of sending and receiving messages over an asynchronous communication channel. We formally model this communication as a single automaton, \(\mathit{Channel}\), which stores the set of in-flight messages that have been sent but are yet to be delivered. When an agent sends a message m, it invokes a send(m) action. This action adds m to the in-flight set. At any arbitrary time, the \(\mathit{Channel}\) chooses a message in the in-flight set and either delivers it to its recipient or removes it from the set. All messages are assumed to be unique and each message contains its sender and recipient identities. Let M be the set of all possible messages used in communication between agents. The sets of messages sent and received by agent i are denoted by \(M_{i,*}\) and \(M_{*,i}\), respectively.
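As an illustration of the communication model, the following is a minimal Python sketch of a channel automaton that keeps an in-flight set and non-deterministically delivers or drops messages. The class name `Channel`, the tuple layout of a message, and the `drop_probability` parameter are assumptions introduced for this sketch; they are not part of the formal model.

```python
import random


class Channel:
    """Hypothetical model of the asynchronous channel automaton."""

    def __init__(self, seed=None):
        self.in_flight = set()   # messages sent but not yet delivered
        self.delivered = []      # (recipient, message) pairs in delivery order
        self._rng = random.Random(seed)

    def send(self, sender, recipient, payload, msg_id):
        # Each message is unique (msg_id) and carries sender and recipient identities.
        message = (msg_id, sender, recipient, payload)
        self.in_flight.add(message)
        return message

    def step(self, drop_probability=0.0):
        """Pick an arbitrary in-flight message, then deliver it or remove it."""
        if not self.in_flight:
            return None
        message = self._rng.choice(sorted(self.in_flight))
        self.in_flight.remove(message)
        if self._rng.random() < drop_probability:
            return ("drop", message)
        self.delivered.append((message[2], message))
        return ("deliver", message)
```

In this sketch the random choice stands in for the non-determinism of the automaton; a verification tool would instead explore all possible choices.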

Agent Model. The \(i^{th}\) agent is modeled as a hybrid automaton [12, 22] defined by the tuple \(\mathcal{A}_i = \langle V_i, A_i, \mathcal{D}_i, \mathcal{T}_i \rangle \), where:

  (a) \(V_i\) is a set of variables consisting of the following: (i) a set of continuous variables \(X_i\) including a special variable \( clk _i\) which records the agent’s local time, and (ii) a set of discrete variables \(Y_i\) including the special variable \({ msghist }_i\) that records all sent and received messages. A valuation \(\mathbf{v}_i\) is a function that associates each \(v_i \in V_i\) to a value in its type. We write \(val{(V_i)}\) for the set of all possible valuations of \(V_i\). We abuse notation and also use \(\mathbf{v}_i\) to denote a state of \(\mathcal{A}_i\), which is a valuation of all variables in \(V_i\). The set \(Q_i \mathrel {{\mathop {=}\limits ^{\scriptscriptstyle \Delta }}}val(V_i)\) is called the set of states.

  (b) \(A_i\) is a set of actions consisting of the following subsets: (i) a set \(\{ send _i(m) \ | \ m \in M_{i,*} \}\) of send actions (i.e., output actions), (ii) a set \(\{ receive _i(m) \ | \ m \in M_{*,i} \}\) of receive actions (i.e., input actions), and (iii) a set \(H_i\) of other, ordinary actions.

  (c) \(\mathcal{D}_i \subseteq val(V_i) \times A_i \times val(V_i)\) is called the set of transitions. For a transition \((\mathbf{v}_i, a_i, \mathbf{v}_i') \in \mathcal{D}_i\), we write \(\mathbf{v}_i \mathrel {{\mathop {\rightarrow }\limits ^{a_i}}} \mathbf{v}_i'\) for short. (i) If \(a_i = send _i(m)\) or \(receive _i(m)\), then all the components of \(\mathbf{v}_i\) and \(\mathbf{v}_i'\) are identical except that m is added to \({ msghist }\) in \(\mathbf{v}_i'\). That is, the agent’s other states remain the same on message sends and receives. Furthermore, for every state \(\mathbf{v}_i\) and every receive action \(a_i\), there must exist a \(\mathbf{v}_i'\) such that \(\mathbf{v}_i \mathrel {{\mathop {\rightarrow }\limits ^{a_i}}} \mathbf{v}_i'\), i.e., the automaton must have well-defined behavior for receiving any message in any state. (ii) If \(a_i \in H_i\), then \(\mathbf{v}_i.{ msghist }= \mathbf{v}_i'.{ msghist }\).

  (d) \(\mathcal{T}_i\) is a collection of trajectories for \(X_i\). Each trajectory of \(X_i\) is a function mapping an interval of time \([0,t], t \ge 0\) to \(val{(V_i)}\), following a flow rate that specifies how a real variable \(x_i \in X_i\) evolves over time. We denote the duration of a trajectory as \(\tau _{dur}\), which is the right end-point t of the interval.

Agent Semantics. The behavior of each agent can be defined based on the concept of an execution, which is a particular run of the agent. Given an initial state \(\mathbf{v}^0_i\), an execution \(\alpha _i\) of an agent \(A_i\) is a sequence of states starting from \(\mathbf{v}^0_i\), defined as \(\alpha _i = \mathbf{v}^0_i, \mathbf{v}^1_i, \ldots \), where for each index j in the sequence, the state update from \(\mathbf{v}^j_i\) to \(\mathbf{v}^{j+1}_i\) is either a transition or a trajectory. A state \(\mathbf{v}^j_i\) is reachable if there exists an execution that ends in \(\mathbf{v}^j_i\). We denote by \(\mathsf{Reach}(A_i)\) the reachable set of agent \(A_i\).

System Model. The formal model of the complete system, denoted as \(\mathit{System}\), is a network of hybrid automata obtained by the parallel composition of the agents’ models and the communication channel. Formally, we can write \(\mathit{System} \triangleq \mathcal{A}_1 \Vert \ldots \Vert \mathcal{A}_N \Vert \mathit{Channel}\). Informally, the agent \(\mathcal{A}_i\) and the communication channel are synchronized through sending and receiving actions. When the agent \(A_i\) sends a message \(m \in M_{i,j}\) to the agent \(A_j\), it triggers the \(send _i(m)\) action. At the same time, this action is synchronized in the \(\mathit{Channel}\) automaton by putting the message m in the in-flight set. After that, the \(\mathit{Channel}\) will trigger (non-deterministically) the \(receive _j(m)\) action. This action is synchronized in the agent \(A_j\) by putting the message m into \({ msghist }_j\).

In this paper, we investigate two real-time safety verification problems for distributed cyber-physical systems as defined in the following.

Problem 1

(Local safety verification in real-time). The real-time local safety verification problem is to compute online the reachable set \(\mathsf{Reach}(A_i)\) of the agent and verify whether it satisfies the local safety property, i.e., to check whether \(\mathsf{Reach}(A_i) \cap \mathcal {U}_i = \emptyset \), where \(\mathcal {U}_i \triangleq C_ix_i \le d_i, x_i \in X_i\) is the unsafe set of the agent.

Problem 2

(Decentralized real-time collision verification). The decentralized real-time collision verification problem is to reason in real time whether an agent \(A_i\) will collide with other agents from its current local time \(t_c^i\) to a computable, safe time instant in the future \(T_{safe}\), based on (i) the clock mismatches, and (ii) the reachable set messages exchanged between agents. Formally, we require that \(\forall ~ t_c^i \le t \le T_{safe}, d_{ij}(t) \ge l\), where \(d_{ij}(t)\) is the distance between agents \(A_i\) and \(A_j\) at time t on the local clock of agent \(A_i\), and l is the allowable safe distance between agents.

3 Real-Time Local Safety Verification

The first important step in our approach is that each agent \(A_i\) computes its forward reachable set of states from the current local time \(t^i\) to \(t^i + T\), which is denoted by \(\mathcal {R}_i[t^i, t^i + T]\). Since many variables used in the agent model are irrelevant to safety verification, we only need to compute the reachable set of the states related to the agent’s physical dynamics (the so-called motion dynamics), which are defined by a nonlinear ODE \(\dot{x}_i = f(x_i, u_i)\), where \(x_i \in \mathbb {R}^n\) is the state vector and \(u_i \in \mathbb {R}^m\) is the control input vector. The agent can switch from one mode to another via discrete transitions, and the control law may be different in each mode. When the agent computes its reachable set, the only information it needs is its current set of states \(x_i(t^i)\) and the current control input \(u_i(t^i)\). It should be clarified that although the control law may differ among modes, the control signal \(u_i\) is updated with the same control period \(T^i_c\). Consequently, \(u_i\) is a constant vector in each control period.

Assuming that the agent’s current time is \(t^i_j = j \times T^i_c\), we obtain the current state of the agent \(x_i\) using its local sensors and GPS. Note that the local sensors and GPS can only provide the information of interest to some accuracy; therefore, the actual state of the agent lies in a set, \(x_i \in I_i\). The control signal \(u_i\) is computed based on the state \(x_i\) and a reference signal, e.g., a set point denoting where the agent needs to go, and the computed control signal is then applied to the actuator to control the motion of the agent. From the current set of states \(I_i\) and the control signal \(u_i\), we can compute the forward reachable set of the agent for the next T seconds, i.e., over \([t^i_j, t^i_j + T]\). This reachable set computation needs to be completed within an amount of time \(T^i_{runtime} < T^i_c\), because if \(T^i_{runtime} \ge T^i_c\), a new \(u_i\) will already have been computed. The control period \(T^i_c\) is chosen based on the agent’s motion dynamics; thus, to control an agent with fast dynamics, the control period \(T^i_c\) needs to be sufficiently small. This is the source of the requirement that the allowable run-time for the reachable set computation be small.

To compute the reachable set of an agent in real time, we use the well-known face-lifting method [3, 6] and a hyper-rectangle to represent the reachable set. This method is useful for short-time reachability analysis of real-time systems. It allows users to define an allowable run-time \(T^i_{runtime}\), uses no dynamic data structures or recursion, and does not depend on complex external libraries, unlike other reachability analysis methods. More importantly, the accuracy of the reachable set computation can be iteratively improved based on the remaining allowable run-time.

Algorithm 3.1. Real-time reachability analysis for one agent.

Algorithm 3.1 describes the real-time reachability analysis for one agent. The algorithm works as follows. The time period \([t^i, t^i + T]\) is divided into M steps, so that the reach time step is \(h_i = T/M\). Using the reach time step and the current set \(I_i\), the face-lifting method performs a single-face-lifting operation. The results of this step are a new reachable set and a remaining reach time \(T^i_{remainReachTime} < T\). This step is called iteratively until the reachable set for the whole time period of interest \([t^i, t^i + T]\) is constructed completely, i.e., until the remaining reach time is equal to zero. With the reach time step size \(h_i\) defined above, the face-lifting algorithm may finish in less time than the allowable run-time \(T^i_{runtime}\) specified by the user. In that case, there is a remaining run-time \(T^i_{remainRunTime} < T^i_{runtime}\) that can be used to re-invoke the face-lifting algorithm with a smaller reach time step size, for example \(h_i / 2\). By doing this, the conservativeness of the reachable set can be iteratively reduced. The core step of the face-lifting method is the single-face-lifting operation; we refer the reader to [3] for further details. As mentioned earlier, the local safety property of each agent can be verified at run-time simultaneously with the reachable set computation process. Precisely, let \(\mathcal {U}_i \triangleq C_ix_i \le d_i\) be the unsafe region of the \(i^{th}\) agent; the agent is said to be safe from \(t^i\) to \(t^i + t \le t^i + T\) if \(\mathcal {R}_i[t^i, t^i + t] \cap \mathcal {U}_i = \emptyset \). Since the reachable set \(\mathcal {R}_i[t^i, t^i + t]\) is given by the face-lifting method at run-time, the local safety verification problem for each agent can be solved at run-time.
Since Algorithm 3.1 computes an over-approximation of the reachable set of each agent over a short time interval, it guarantees the soundness of the result, as described in the following lemma.

Lemma 1

[3, 6]. The real-time reachability analysis algorithm is sound, i.e., the computed reachable set contains all possible trajectories of agent \(A_i\) from \(t^i\) to \(t^i + T\).
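To make the structure of this procedure more concrete, the following Python sketch propagates a box forward in time, checks a velocity bound while the set is being built, and halves the step size while run-time remains. It is a deliberately simplified illustration and not the face-lifting algorithm of [3]: it assumes hypothetical one-axis dynamics \(\dot{p} = v\), \(\dot{v} = u\) with constant u, and the names `lift_step`, `reach`, `anytime_reach` and `v_max` are introduced only for this example.

```python
import time


def lift_step(box, u, h):
    """Advance a box ((p_lo, p_hi), (v_lo, v_hi)) by h seconds for p' = v, v' = u.

    Returns (tube, end): `tube` over-approximates all states during [0, h],
    `end` over-approximates the states at time h.
    """
    (p_lo, p_hi), (v_lo, v_hi) = box
    # Velocity bounds valid over the whole step (u is constant in a control period).
    v_tube = (v_lo + min(0.0, u * h), v_hi + max(0.0, u * h))
    # Each position face moves outward by at most h times the extreme velocity.
    p_tube = (p_lo + h * min(0.0, v_tube[0]), p_hi + h * max(0.0, v_tube[1]))
    end = ((p_lo + h * v_tube[0], p_hi + h * v_tube[1]),
           (v_lo + u * h, v_hi + u * h))
    return (p_tube, v_tube), end


def reach(box, u, T, steps, v_max):
    """Return (tubes, safe) over [0, T]; unsafe means some tube may reach v > v_max."""
    h = T / steps
    tubes, safe = [], True
    for _ in range(steps):
        tube, box = lift_step(box, u, h)
        tubes.append(tube)
        # The local safety check runs while the reachable set is being built.
        if tube[1][1] > v_max:
            safe = False
    return tubes, safe


def anytime_reach(box, u, T, v_max, runtime_s, steps=10):
    """Re-run with half the step size while allowable run-time remains."""
    deadline = time.monotonic() + runtime_s
    result = reach(box, u, T, steps, v_max)
    while time.monotonic() < deadline:
        steps *= 2
        result = reach(box, u, T, steps, v_max)
    return result, steps
```

For the box \([0,1] \times [0,2]\) with u = 1 and T = 2, the position upper bound is about 7.2 with 10 steps and about 7.1 with 20 steps, against an exact value of 7, which illustrates how a smaller step reduces conservativeness. Unlike a real-time implementation, this sketch does not abort a refinement pass that overruns the deadline.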

4 Decentralized Real-Time Collision Verification

Our collision verification scheme is based on the reachable set messages exchanged between agents. In every control period \(T_c\), each agent executes the real-time reachability analysis algorithm to check whether it is locally safe and to obtain its current reachable set with respect to its current control input. When the current reachable set is available, the agent encodes it in a message, broadcasts this message to its cooperative agents, and listens for incoming messages sent by these agents. When a reachable set message arrives, the agent immediately decodes the message to construct the current reachable set of the sender and then performs peer-to-peer collision detection. The process of computing, encoding, transferring and decoding the reachable set, along with collision checking, is illustrated in Fig. 1 based on the agent’s local clock.

Fig. 1. Timeline for reachable set computing, encoding, transferring, decoding and collision checking.

Let \(t^i_{rs}\), \(t^i_e\), \(t^i_{tf}\), \(t^i_d\), and \(t^i_c\) respectively be the instants that we compute, encode, transfer, decode the reachable set and do collision checking on the agent \(A_i\). Note that these time instants are based on the agent \(A_i\)’s local clock. The actual run-times are defined as follows.

$$\begin{aligned} \begin{aligned}&\tau ^i_{rs} = t^i_e - t^i_{rs}, \%~\textit{reachable set computation~time}, \\&\tau ^i_e = t^i_{tf} - t^i_{e}, \%~\textit{encoding time}, \\&\tau ^i_{tf} \approx t^j_d - t^i_{tf}, \%~\textit{transferring time}, \\&\tau ^i_{d} = t^i_c - t^i_d, \%~\textit{decoding time}. \end{aligned} \end{aligned}$$

Note that we do not know the exact transfer time \(\tau ^i_{tf}\) since it depends on two different local clocks. The above transfer time formula describes its approximate value when the mismatch between the two local clocks is neglected. The actual reachable set computation time is close to the allowable run-time chosen by the user, i.e., \(\tau ^i_{rs} \approx T^i_{runtime}\). We will see later that the encoding and decoding times are fairly small in comparison with the transferring time, i.e., \(\tau ^i_e \approx \tau ^i_d \ll \tau ^i_{tf}\). All of these run-times provide useful information for selecting an appropriate control period \(T_c\) for an agent. However, for the purpose of collision checking, we only need to consider the time instant \(t^i_{rs}\) at which an agent starts computing its reachable set and the time instant \(t^i_c\) at which it checks for collisions.

A reachable set message contains three pieces of information: (i) the reachable set, which is a list of intervals, (ii) the time period (based on the local clock) in which this reachable set is valid, i.e., the start time \(t^i_{rs}\) and the end time \(t^i_{rs} + T\), and (iii) the time instant at which this message is sent. Based on the timing information of the reachable set and the time-synchronization errors, an agent can examine whether or not a received reachable set contains information about the future behavior of the sending agent, which is what makes it useful for collision checking. The usefulness of the reachable sets used in collision checking is defined as follows.
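A reachable set message with these three pieces of information could be represented as in the following Python sketch. The field names, the added `sender` field, and the choice of JSON as the wire format are assumptions made for illustration only; the text above does not prescribe an encoding.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReachSetMessage:
    sender: str        # hypothetical identifier of the sending agent
    intervals: tuple   # one (low, high) pair per state dimension
    t_start: float     # t_rs on the sender's local clock
    t_end: float       # t_rs + T on the sender's local clock
    t_sent: float      # sender's local time when the message was sent


def encode(msg):
    """Serialize a message to bytes (JSON is an assumed wire format)."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(data):
    """Rebuild a message from bytes, restoring intervals as tuples."""
    fields = json.loads(data.decode("utf-8"))
    fields["intervals"] = tuple(tuple(iv) for iv in fields["intervals"])
    return ReachSetMessage(**fields)
```

Because the payload is only a handful of intervals and three time stamps, encoding and decoding are cheap compared with network transfer.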

Fig. 2. Useful reachable set.

Definition 1

(Useful reachable sets). Let \(\delta _i\) and \(\delta _j\) respectively be the time-synchronization errors of agents \(A_i\) and \(A_j\) with respect to the virtual global time t, i.e., \(t - \delta _i \le t^i \le t + \delta _i\) and \(t - \delta _j \le t^j \le t + \delta _j\), where \(t^i\) and \(t^j\) are the current local times of \(A_i\) and \(A_j\) respectively. The reachable set \(\mathcal {R}_i[t^i_{rs}, t^i_{rs} + T]\) of agent \(A_i\) and the reachable set \(\mathcal {R}_j[t^j_{rs}, t^j_{rs} + T]\) of agent \(A_j\) that are available at agent \(A_i\) at time \(t^i_c\) are useful for checking collision between \(A_i\) and \(A_j\) if:

$$\begin{aligned} \begin{aligned}&t^i_c< t^j_{rs} + T - \delta _i - \delta _j, \\&t^i_c < t^i_{rs} + T. \end{aligned} \end{aligned}$$
(1)

Assume that we are at a time instant where the agent \(A_i\) checks whether a collision occurs, i.e., the current local time is \(t^i_c\). Note that agents \(A_i\) and \(A_j\) are synchronized to the global time with errors \(\delta _i\) and \(\delta _j\) respectively. The reachable set \(\mathcal {R}_j[t^j_{rs}, t^j_{rs} + T]\) is useful if it contains information about the future behavior of agent \(A_j\) as seen by agent \(A_i\) on its local clock. This is guaranteed if \(t^j_{rs} + T - \delta _j > t^i_c + \delta _i\), which is the first condition in (1). Additionally, the current reachable set of agent \(A_i\) contains information about its future behavior if \(t^i_c < t^i_{rs} + T\), as depicted in Fig. 2. We can see that if \(t^i_c > t^j_{rs} + T + \delta _i + \delta _j\), then the reachable set of \(A_j\) contains only past information, and thus it is useless for checking collision. One interesting case is when \( t^j_{rs} + T - \delta _i - \delta _j< t^i_c < t^j_{rs} + T + \delta _i + \delta _j\). In this case, we do not know whether the received reachable set is useful or not.
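The three cases discussed above can be written as a small classification function. The Python sketch below follows condition (1) and the two remaining cases; the function name and the labels `"useful"`, `"stale"` and `"undetermined"` are assumptions of this example, as is the choice to report an expired local reachable set as `"stale"`.

```python
def classify_reach_sets(t_c_i, t_rs_i, t_rs_j, T, delta_i, delta_j):
    """Classify the reachable sets of A_i and A_j at A_i's local time t_c_i."""
    own_valid = t_c_i < t_rs_i + T   # second condition in (1)
    slack = delta_i + delta_j        # worst-case clock mismatch between the agents
    if own_valid and t_c_i < t_rs_j + T - slack:
        return "useful"
    if not own_valid or t_c_i > t_rs_j + T + slack:
        return "stale"
    # The received set may or may not still cover A_j's future behavior.
    return "undetermined"
```

For instance, with T = 2 s and both synchronization errors equal to 3 ms, a set computed by \(A_j\) at local time 10.0 is useful at \(t^i_c = 10.3\), while at \(t^i_c = 11.999\) its status cannot be determined.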

Remark 1

We note that the proposed approach does not rely on Lamport’s happens-before relation [17] to compute the local reachable set of each agent. If an agent does not receive reachable set messages from the others before a requested time-stamp expires, it still calculates its local reachable set based on its current state and the state information of other agents in the messages it received previously. In other words, our method does not require the reachable set of each agent to be computed according to the ordering of the events (sending or receiving a message) in the system, but only relies on the local clock period and the time-synchronization errors between agents. Such an implementation ensures that the computation can be accomplished in real time and is not affected by message transmission delays.

Algorithm 4.2. Peer-to-peer collision checking.

The peer-to-peer collision checking procedure depicted in Algorithm 4.2 works as follows: when a new reachable set message arrives, the receiving agent decodes the message and checks the usefulness of the received reachable set and of its own current reachable set. Then, the agent combines its current reachable set and the received reachable set to compute the minimum possible distance between the two agents. If this distance is larger than the allowable threshold l, there is no collision between the two agents up to a known time instant in the future, i.e., \(T_{safe}\).

Lemma 2

The decentralized real-time collision verification algorithm is sound.

Proof

From Lemma 1, we know that the received reachable set \(\mathcal {R}_j[t^j_{rs}, t^j_{rs} + T]\) contains all possible trajectories of the agent \(A_j\) from \(t^j_{rs}\) to \(t^j_{rs} + T\). Also, the current reachable set of the agent \(A_i\), \(\mathcal {R}_i[t^i_{rs}, t^i_{rs} + T]\), contains all possible trajectories of the agent from \(t^i_{rs}\) to \(t^i_{rs} + T\). If those reachable sets are useful, then they contain all possible trajectories of the two agents from \(t_c^i\) to the future time \(T_{safe} = min(t^j_{rs} + T - \delta _i - \delta _j, t^i_{rs} + T)\), based on the clock of agent \(A_i\). Therefore, the minimum distance \(d_{min}\) between the two agents computed from the two reachable sets is no larger than any possible distance between them in the time interval \([t^i_{c},T_{safe}]\). Consequently, the collision-free guarantee is sound in the time interval \([t^i_{c},T_{safe}]\).
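The check used in this argument can be sketched in a few lines of Python, assuming that each reachable set is an axis-aligned box over the position coordinates only. The function names and the convention of returning `None` when the sets are not useful are assumptions of this illustration, not details taken from Algorithm 4.2.

```python
import math


def min_box_distance(box_a, box_b):
    """Smallest Euclidean distance between two axis-aligned boxes."""
    gaps = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(box_a, box_b):
        # Gap along this axis; zero when the projections overlap.
        gaps.append(max(0.0, a_lo - b_hi, b_lo - a_hi))
    return math.sqrt(sum(g * g for g in gaps))


def check_collision(box_i, box_j, t_c_i, t_rs_i, t_rs_j, T, delta_i, delta_j, l):
    """Return (collision_free, t_safe) on A_i's clock, or None if not useful."""
    t_safe = min(t_rs_j + T - delta_i - delta_j, t_rs_i + T)
    if t_c_i >= t_safe:
        return None   # condition (1) fails, so no guarantee can be given
    return min_box_distance(box_i, box_j) > l, t_safe
```

Because both boxes over-approximate the true positions, a computed distance above l implies that every pair of actual trajectories stays more than l apart until \(T_{safe}\); a distance at or below l only indicates a possible collision.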

Fig. 3. Distributed search application using quadcopters.

5 Case Study

The decentralized real-time safety verification for distributed CPS proposed in this paper is implemented in Java as a package called drreach. This package is currently integrated as a library in StarL, which is a novel platform-independent framework for programming reliable distributed robotics applications on Android [19]. StarL is specifically suitable for controlling a distributed network of robots over WiFi since it provides many useful functions and sophisticated algorithms for distributed applications. In our approach, we use the reliable communication network of StarL which is assumed to be asynchronous and peer-to-peer. There may be message dropouts and transmission delays; however, every message that an agent tries to send is eventually delivered with some time guarantees. All experimental results of our approach are reproducible and available online at: https://github.com/trhoangdung/starl/tree/drreach.

We evaluate the proposed approach via a distributed search application using quadcopters, in which each quadcopter executes a search mission provided by users as a list of way-points, as depicted in Fig. 3. The quadcopters follow the way-points to search for some specific objects. For safety reasons, they are required to work only in a specific region defined by users. In this case study, the quadcopters are controlled to operate at the same constant altitude. Our experiments indicate that the proposed approach scales well, as it works for different numbers of quadcopters. In this section, we present the experimental results for the distributed search application with eight quadcopters.

The first step in our approach is locally computing the reachable set of each quadcopter using the face-lifting method. The quadcopter has nonlinear motion dynamics given in Eq. 2, in which \(\theta \), \(\phi \), and \(\psi \) are the pitch, roll, and yaw angles, \(f = \varSigma _{i=1}^4T_i\) is the sum of the propeller forces, m is the mass of the quadcopter and \(g = 9.81\, \mathrm{m/s}^2\) is the gravitational acceleration constant. As the quadcopter is set to operate at a constant altitude, we have \(\ddot{z} = 0\), which yields the following constraint: \(f = \frac{mg}{\cos (\theta )\cos (\phi )}\). Let \(v_x\) and \(v_y\) be the velocities of a quadcopter along the x- and y-axes. Using the constraint on the total force, the motion dynamics of the quadcopter can be rewritten as a 4-dimensional nonlinear ODE, as depicted in Eq. 3.

Eqs. (2) and (3). Quadcopter motion dynamics.
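As a quick numerical check of the constant-altitude constraint, the following Python sketch computes the total thrust \(f = mg / (\cos \theta \cos \phi )\) and confirms that its vertical component balances the weight. It assumes the common vertical-axis model \(m\ddot{z} = f\cos \theta \cos \phi - mg\), which is consistent with the stated constraint; the function name and the sample mass are hypothetical.

```python
import math


def hover_thrust(mass, pitch, roll, g=9.81):
    """Total thrust that keeps the altitude constant for given pitch and roll (radians)."""
    return mass * g / (math.cos(pitch) * math.cos(roll))


# Example: a 1.5 kg quadcopter tilted by 10 degrees in pitch and 5 degrees in roll.
pitch, roll = math.radians(10.0), math.radians(5.0)
f = hover_thrust(1.5, pitch, roll)
vertical_component = f * math.cos(pitch) * math.cos(roll)
```

The thrust grows as the vehicle tilts, since only the fraction \(\cos \theta \cos \phi \) of it acts against gravity.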

A PID controller is designed to control the quadcopter to move from its current position to the desired way-points. Details about the controller parameters can be found in the available source code. The PID controller has a control period of \(T_c = 200\) ms. In every control period, the control inputs pitch \((\theta )\) and roll \((\phi )\) are computed based on the current position of the quadcopter and the current target position (i.e., the current way-point it needs to go to). Using the control inputs, the current positions and velocities given by GPS and the motion dynamics of the quadcopter, the real-time reachable set computation algorithm (Algorithm 3.1) is executed inside the controller. This algorithm computes the reachable set of a quadcopter from its current local time to the next \(T = 2\) s. The allowable run-time for this algorithm is \(T_{runtime} = 10\) ms. The local safety property is verified by the real-time reachable set computation algorithm at run-time. The computed reachable set is then encoded and sent to the other quadcopters. When a reachable set message arrives, the quadcopter decodes the message to reconstruct the current reachable set of the sender. The GPS error is assumed to be \(2\%\). The time-synchronization error between the quadcopters is \(\delta = 3\) ms. We want to verify in real time: (1) the local safety property of each quadcopter; (2) collision occurrence. The local safety property is defined by \(v_x \le 500\), i.e., the velocity of any quadcopter along the x-axis must not exceed 500 m/s. Collisions are checked using the minimum allowable distance between two arbitrary quadcopters, \(d_{min} = 100\).

Fig. 4. A sample of events.

Fig. 5. One sample of the reachable sets of eight quadcopters in \([0, 2\,\mathrm{s}]\) time interval and their interval hulls.

Figure 4 presents a sample of the sequence of events happening in the distributed search application. One can see that each quadcopter can determine, based on its local clock, whether it is collision-free up to some known time in the future. In addition, the local safety property can also be verified at run-time. For example, in the figure, quadcopter 1 receives a reachable set message from quadcopter 0 which is valid from 17:29:49.075 to 17:29:51.074 on quadcopter 0’s clock. After decoding this message and taking into account the time-synchronization error \(\delta \), quadcopter 1 determines that the received reachable set message is useful for checking collision for the next 1.645 s of its clock. After checking collision, quadcopter 1 knows that it will not collide with quadcopter 0 in the next 1.645 s (based on its clock).

It should be noted that we can intuitively verify the collision occurrences by observing the intermediate reachable sets of all quadcopters and their interval hulls. The intermediate reachable sets of the quadcopters in every \([0, 2\,\mathrm{s}]\) time interval computed by the real-time reachable set computation algorithm (i.e., Algorithm 3.1) are shown in Fig. 5. The zoom plot within the figure presents the reachable sets of the quadcopters over a very short time interval. We note that the intermediate reachable set of a quadcopter is represented as a list of hyper-rectangles and is used for verifying the local safety property at run-time. The reachable set that is sent to another quadcopter is the interval hull of these hyper-rectangles. The intermediate reachable set cannot be transferred via a network since it is very large (i.e., hundreds of hyper-rectangles). The interval hull of all hyper-rectangles contained in the intermediate reachable set covers all possible trajectories of a quadcopter in the time interval \([0, 2\,\mathrm{s}]\). Therefore, it can be used for safety verification. One may question why we use the interval hull instead of the convex hull of the reachable set, since the former yields a more conservative result. The reason is that we want to perform the safety verification online, and computing the convex hull of hundreds of hyper-rectangles is a time-consuming operation. Therefore, in the real-time setting, the interval hull operation is a suitable solution. From the figure, we can see that the interval hulls of the reachable sets of all quadcopters do not intersect with each other. Therefore, there is no collision occurrence (in the next 2 s of global time).
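The interval hull operation and the intersection test described here are inexpensive to compute, as the following Python sketch shows. It assumes each hyper-rectangle is given as a tuple of (low, high) pairs, one per dimension; the function names are introduced for this example only.

```python
def interval_hull(boxes):
    """Smallest axis-aligned box containing every box in a non-empty list."""
    dims = len(boxes[0])
    return tuple(
        (min(b[k][0] for b in boxes), max(b[k][1] for b in boxes))
        for k in range(dims)
    )


def boxes_intersect(a, b):
    """True if two axis-aligned boxes overlap in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))
```

The hull is computed in a single pass over the list, so its cost grows linearly with the number of hyper-rectangles, whereas the message that is sent always contains just one box.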

Table 1. The average encoding time \(\tau _e\), decoding time \(\tau _d\), transferring time \(\tau _{tf}\), collision checking time \(\tau _c\) and total verification time VT of the quadcopters.

Since we implement the decentralized real-time safety verification algorithm inside the quadcopter’s controller, it is important to analyze whether or not the verification procedure affects the control performance of the controller. To reason about this, we measure the average encoding, decoding, transferring and collision checking times for all quadcopters using 100 samples which are presented in Table 1. We note that the transferring time \(\tau _{tf}\) is the average time for one message transferred from other quadcopters to the \(i^{th}\) quadcopter. It can be seen that the encoding, decoding and collision checking times at each quadcopter constitute a tiny amount of time. The total verification time is the sum of the reachable set computation, encoding, transferring, decoding and collision checking times. Note that the allowable runtime for reachable set computation algorithm is specified by users as \(T_{runtime} = 10\) ms. Therefore, the (average) total time for the safety verification procedure on each quadcopter is \(VT_i = T_{runtime} + \tau _e^i + (N-1)\times (\tau _{tf}^i + \tau _{d}^i + \tau _{c}^i)\), where \(i = 1, 2, \ldots , N\), and N is the number of quadcopters. As shown in the Table, the (average) total verification time for each quadcopter is small (\({<}30\) ms), compared to the control period \(T_c = 200\) ms. Besides, from the experiment, we observe that the computation time for the control signal of the PID controller \(\tau _{control}^i\) (not presented in the table) is also small, i.e., from 5 to 10 ms. Since \(VT_i + \tau _{control}^i < T_c/4 = 50\) ms, we can conclude that the verification procedure does not affect the control performance of the controller.
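The verification time formula and the timing check above can be evaluated directly. In the Python sketch below, the per-message timing values are hypothetical placeholders chosen only to exercise the formula; they are not the measurements reported in Table 1, and the function names are introduced for this example.

```python
def verification_time(t_runtime, tau_e, tau_tf, tau_d, tau_c, n_agents):
    """VT_i = T_runtime + tau_e + (N - 1) * (tau_tf + tau_d + tau_c), in ms."""
    return t_runtime + tau_e + (n_agents - 1) * (tau_tf + tau_d + tau_c)


def fits_control_period(vt, tau_control, t_c, fraction=1.0):
    """Check VT_i + tau_control < fraction * T_c (all times in ms)."""
    return vt + tau_control < fraction * t_c


# Hypothetical timings (ms) for N = 8 agents; not taken from Table 1.
vt = verification_time(t_runtime=10.0, tau_e=0.5, tau_tf=2.0,
                       tau_d=0.3, tau_c=0.2, n_agents=8)
ok = fits_control_period(vt, tau_control=10.0, t_c=200.0, fraction=0.25)
```

With these placeholder values the total is 28 ms, and adding a 10 ms control computation stays below \(T_c/4 = 50\) ms. Since the formula is linear in N, the same function can be used to explore how many agents fit within a given control period.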

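Interestingly, from the verification time formula, we can estimate the range of the number of agents that the decentralized real-time verification procedure can deal with. The idea is that, in each control period \(T_c\), after computing the control signal, the remaining time budget \(T_c - \tau _{control}\) can be used for verification, i.e., \(VT_i \le T_c - \tau _{control}\). Let \(\bar{\tau }_e, \bar{\tau }_{tf}, \bar{\tau }_d, \bar{\tau }_c\) (\(\underline{\tau }_e, \underline{\tau }_{tf}, \underline{\tau }_d, \underline{\tau }_c\)) be the maximum (minimum) encoding, transferring, decoding and collision checking times on a quadcopter, and let \(\bar{\tau }_{control}\) (\(\underline{\tau }_{control}\)) be the maximum (minimum) control signal computation time for each control period \(T_c\). Then the number of agents that the decentralized real-time safety verification procedure can deal with (under the assumption that the communication network works well) satisfies the following constraint:

\[ 1 + \frac{T_c - \bar{\tau }_{control} - T_{runtime} - \bar{\tau }_e}{\bar{\tau }_{tf} + \bar{\tau }_d + \bar{\tau }_c} \;\le\; N \;\le\; 1 + \frac{T_c - \underline{\tau }_{control} - T_{runtime} - \underline{\tau }_e}{\underline{\tau }_{tf} + \underline{\tau }_d + \underline{\tau }_c} \qquad (4) \]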

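Let us consider our case study. From Table 1, we take the maximum and minimum encoding, transferring, decoding and collision checking times (in ms). Also, we assume that the maximum control signal computation time is \(\bar{\tau }_{control} = 10\) ms and the minimum is \(\underline{\tau }_{control} = 5\) ms. The number of quadcopters that our verification approach can theoretically deal with is then estimated as \(64 \le N \le 168\).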
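The estimate follows from requiring that verification and control fit in one control period, \(VT + \tau _{control} \le T_c\), and solving for N. A minimal Python sketch of this rearrangement is given below. The function name `max_agents`, the rounding down to an integer, and all timing inputs other than \(T_c\), \(T_{runtime}\) and the 5 to 10 ms control time are assumptions. Because the inputs are placeholders rather than the Table 1 measurements, the output is not expected to reproduce the range reported above.

```python
import math


def max_agents(t_c: float, t_runtime: float, tau_control: float,
               tau_e: float, tau_tf: float, tau_d: float, tau_c: float) -> int:
    """Largest N with T_runtime + tau_e + (N-1)*(tau_tf+tau_d+tau_c) <= T_c - tau_control."""
    per_message = tau_tf + tau_d + tau_c                 # cost of one received message (ms)
    budget = t_c - tau_control - t_runtime - tau_e       # time left for incoming messages (ms)
    return 1 + math.floor(budget / per_message)


T_C, T_RUNTIME = 200.0, 10.0  # stated in the text (ms)

# Hypothetical worst-case (maximum) timings give the pessimistic end of the range.
n_low = max_agents(T_C, T_RUNTIME, tau_control=10.0,
                   tau_e=1.0, tau_tf=2.0, tau_d=0.5, tau_c=0.5)

# Hypothetical best-case (minimum) timings give the optimistic end of the range.
n_high = max_agents(T_C, T_RUNTIME, tau_control=5.0,
                    tau_e=0.5, tau_tf=1.0, tau_d=0.25, tau_c=0.25)
```

With these placeholder inputs the sketch yields `n_low = 60` and `n_high = 124`. The per-message cost appears in the denominator, so the estimate is most sensitive to the transferring time, which depends on network conditions.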

6 Related Work

Our work is inspired by the static and dynamic analysis of timed distributed traces [8] and the real-time reachability analysis for verified simplex design [3]. The former proposes a sound method for constructing a global reachable set for a distributed CPS based on the recorded traces and time synchronization errors of participating agents. The global reachable set is then used to verify a global property using Z3 [7]. This method can be considered a centralized analysis in which the reachable set of the whole system is constructed and verified by one analyzer. Such a verification approach is offline, which is fundamentally different from our approach, as we deal with online verification in a decentralized manner. Our real-time verification method borrows the face-lifting technique developed in [3] and applies it to a distributed CPS.

Another interesting approach to real-time monitoring for linear systems was recently published in [5]. In this work, the authors proposed an approach that combines offline and online computation to decide whether a given plant model has entered an uncontrollable state, i.e., a state from which no control strategy can prevent the plant from entering the unsafe region. This method is useful for a single real-time CPS, but not for a distributed CPS with multiple agents.

Additionally, there have been other significant works on verifying distributed CPS. The authors of [9, 23, 24] presented real-time software for distributed CPS but did not perform safety verification of the individual components or of the whole system. The works presented in [2, 14, 16] can be used to verify distributed CPS, but they do not consider the real-time aspect. An interesting work proposed in [21] can formally model and verify a distributed car control system against several safety objectives, such as collision avoidance for an arbitrary number of cars. However, it does not address the verification problem of distributed CPS in a real-time manner. The novelty of our approach is that it can over-approximate, with high precision and in real-time, the reachable set of each agent whose dynamics are non-linear.

The work most closely related to our scheme was recently introduced in [20]. The authors proposed an online verification method using reachability analysis that can guarantee safe motion of mobile robots with respect to walking pedestrians modeled as hybrid systems. This work utilizes the CORA toolbox [1] to perform reachability analysis, while our work uses a face-lifting technique. However, it does not consider the time elapsed for encoding, transferring and decoding the reachable set messages exchanged between agents, which plays an important role in distributed systems.

7 Conclusion and Future Work

We have proposed a decentralized real-time safety verification method for distributed cyber-physical systems. By utilizing the timing information and the reachable set information from exchanged reachable set messages, a sound guarantee about the safety of the whole system is obtained for each participant based on its local time. Our method has been successfully applied to a distributed search application using quadcopters built upon the StarL framework. The main benefit of our approach is that it allows participants to take advantage of formal guarantees available locally in real-time to perform intelligent actions in dangerous situations. This work is a fundamental step in dealing with real-time safe motion/path planning for distributed robots. For future work, we seek to deploy this method on a real platform and extend it to distributed CPS with heterogeneous agents, where the agents can have different motion dynamics and thus different control periods. In addition, the scalability of the proposed method can be improved by exploiting parallel processing, i.e., each agent handles multiple reachable set messages and checks for collisions in parallel.