research-article

Open access

Satisfiability Modulo Ordering Consistency Theory for SC, TSO, and PSO Memory Models

Authors:

Hongyu Fan,

Zhihang Sun,

Fei HeAuthors Info & Claims

ACM Transactions on Programming Languages and Systems, Volume 45, Issue 1

Article No.: 6, Pages 1 - 37

https://doi.org/10.1145/3579835

Published: 03 March 2023 Publication History

All formats PDF

Abstract

Automatically verifying multi-threaded programs is difficult because of the vast number of thread interleavings, a problem aggravated by weak memory consistency. Partial orders can help with verification because they can represent many thread interleavings concisely. However, there is no dedicated decision procedure for solving partial-order constraints.

In this article, we propose a novel ordering consistency theory for concurrent program verification that is applicable not only under sequential consistency, but also under the TSO and PSO weak memory models. We further develop an efficient theory solver, which checks consistency incrementally, generates minimal conflict clauses, and includes a custom propagation procedure. We have implemented our approach in a tool, called Zord, and have conducted extensive experiments on the SV-COMP 2020 ConcurrencySafety benchmarks. Our experimental results show a significant improvement over the state-of-the-art.

1 Introduction

Shared-memory multi-threaded programs are commonly used in modern computing systems. The number of interleavings of a concurrent program makes its verification very hard in practice. It is highly desirable to develop techniques to alleviate the execution explosion problem of concurrent program verification.

A memory consistency model (for short, memory model) [5] restricts the execution order of shared-memory accesses from different threads. It determines what value(s) a read access can return. The sequential consistency(SC) model [42] forces memory accesses in each thread to follow the program order of instructions. To make full use of hardware resources and improve the efficiency of multi-threaded program execution, weak memory models allow certain memory access orders to be relaxed. Compared to SC, a weak memory model allows more concurrent behaviors and further aggravates the execution explosion problem. This article examines two weak memory models, i.e., total store order (TSO) [47] and partial store order (PSO) [58].

Bounded model checking (BMC) [15, 17] is a verification technique that is particularly efficient in bug-finding and has been widely adopted by most verification tools. BMC sets upper bounds for loops and recursive functions to obtain a bounded program and then uses SMT solvers to verify its correctness. A promising technique for handling multi-threaded programs with BMC is to use partial orders to represent the happens-before relation between shared-memory access events [9, 10]. In this way, one can achieve a compact representation of the vast number of interleaving behaviors of multi-threaded programs.

The standard approach (e.g., in References [8, 9, 10, 51, 61]) for solving partial order constraints is based on integer difference logic. Each event is associated with an integer-valued clock, and event orders are represented as differences among these clock variables. Then, the partial order constraints can be solved by the decision procedure of integer difference logic. There are two problems with this approach. First, it determines a clock value for each event, which goes a little bit too far, because we only care about the events’ order, not their exact clock values. Second, there is an important axiom (axiom 3 in section 4) in reasoning about multi-threaded programs, which defines the derivation rule for the so-called from-read orders. Existing approaches [8, 9, 10, 28, 51] encode all possible from-read constraints, irrespective of whether they are actually needed for verification. This method yields numerous from-read constraints, which significantly increases the burden on the solver and worsens its performance.

In this article, we propose a new and novel ordering consistency \(\mathcal {T}_{{\it ord}}\) theory (see section 4) and elaborate on its theory solver (see section 5) for multi-threaded program verification. Using this method, we no longer need to specify all possible from-read orders in the encoding formula. One direct benefit is the significant reduction in the size of the encoding formula. Another benefit is the on-demand deduction of from-read orders. With a specialized theory propagation procedure (see section 5.5), a from-read order is derived only when the relevant variables get assigned. In this way, we avoid the generation of massive useless from-read constraints.

We develop an efficient theory solver for \(\mathcal {T}_{{\it ord}}\) and integrate it into the DPLL(T) framework [26]. Given a partial assignment, the solver judges whether this assignment is consistent with the theory axioms, which can further be reduced to detecting cycles on a so-called event graph. In particular, we use an incremental consistency checking algorithm (see section 5.2) that utilizes the previously computed results and attains better efficiency. We also devise a conflict clause generation algorithm (see section 5.3) for finding the minimal reasons for inconsistency. The complexity of this algorithm is linear in the number of conflict clauses and the number of edges in the event graph.

Last but not least, inspired by the idea of unit clause propagation, we propose a novel technique for theory propagation. We attempt to find the so-called unit edges and use these edges to enforce values of some unassigned variables (see section 5.5.1). With this technique, the decision iterations of DPLL(T) are greatly reduced, and the whole performance is significantly improved.

We have implemented the proposed approach in CBMC [41] and Z3 [21] and conducted experiments on 1,061 benchmarks in the ConcurrencySafety category of SV-COMP 2020. We compare our approach with state-of-the-art concurrent verification tools, including CBMC [9, 41], Lazy-CSeq [36], Lazy-SMA [54], CPA-Seq [13, 14], and Dartagnan [27], under SC, TSO, and PSO, respectively:

•

Under SC,¹ our approach solves 38, 119, and 897 more cases than CBMC, CPA-Seq, and Dartagnan, respectively, and 6 less cases than Lazy-CSeq; counting on both-solved cases, our approach runs 2.44 \(\times\) , 90.04 \(\times\) , 139.47 \(\times\) , and 7.20 \(\times\) faster, consumes 20.8%, 99.6%, 99.0%, and 94.5% less memory, than CBMC, CPA-Seq, Dartagnan, and Lazy-CSeq, respectively.

•

Under TSO,² our approach solves 47, 925, and 533 more cases than CBMC, Dartagnan, and Lazy-SMA, respectively; counting on both-solved cases, our approach is 2.47 \(\times\) , 174.18 \(\times\) , 4.49 \(\times\) faster, and consumes 31.8%, 98.2%, 92.6% less memory than CBMC, Dartagnan, and Lazy-SMA, respectively.

•

Under PSO, our approach solves 50, 890, and 273 more cases than CBMC, Dartagnan, and Lazy-SMA, respectively; counting on both-solved cases, our approach runs 2.44 \(\times\) , 163.43 \(\times\) , 11.29 \(\times\) faster, and consumes 31.0%, 98.4%, 93.2% less memory, than CBMC, Dartagnan, and Lazy-SMA, respectively.

We have also compared our approach with state-of-the-art stateless model checking (SMC) tools, namely, Nidhugg [2, 4] and GenMC [40] on nine benchmarks from the Nidhugg suite. Experimental results show that as the program’s scale (measured by the number of traces) increases, our approach is superior to these tools in most cases.

In summary, our main contributions are:

•

We propose a new ordering consistency theory \(\mathcal {T}_{{\it ord}}\) for multi-threaded program verification under SC, TSO, and PSO memory models.

•

We elaborate on an efficient theory solver for \(\mathcal {T}_{{\it ord}}\) , which realizes incremental consistency checking, minimal conflict clause generation, and specialized theory propagation to improve the efficiency of SMT solving.

•

We implement our approach in CBMC and Z3. Experimental results on SV-COMP concurrent benchmarks demonstrate orders of magnitude improvements of our method over state-of-the-art verification tools.

This article is an extended and revised version of a previous conference paper [32]. Compared to Reference [32], this article makes the following new contributions: First, only the SC memory model is supported in Reference [32]; in this article, we extend our approach to support the TSO and PSO memory models. Second, the \(\mathcal {T}_{{\it ord}}\) theory in Reference [32] assumes no atomicity constraints. Noting that atomic operations are commonly specified in concurrent programs, this article extends \(\mathcal {T}_{{\it ord}}\) to support atomicity constraints. This extension leads to a new atomicity relation \(\approx\) and adaptations in \(\mathcal {T}_{{\it ord}}\) and the \(\mathcal {T}_{{\it ord}}\) -solver (see section 4). Third, this article also reports two new sets of experimental results (see section 6) for evaluating our approach under TSO and PSO memory models.

The rest of the article is organized as follows: section 2 introduces some background knowledge. section 3 demonstrates our symbolic encoding of multi-threaded programs. section 4 proposes the new \(\mathcal {T}_{{\it ord}}\) theory. section 5 develops a theory solver for \(\mathcal {T}_{{\it ord}}\) . We report experimental results in section 6 and discuss related work in section 7. section 8 concludes this article.

2 Preliminaries

2.1 Notions

In first-order logic, a term is a variable, a constant, or an n-ary function applied to n terms; an atom is \(\bot\) , \(\top\) , or an n-ary predicate applied to n terms; a literal is an atom or its negation. A first-order formula is built from literals using Boolean connectives and quantifiers. A model M consists of a non-empty object set \(\mathit {dom}(M)\) , called the domain of M, an assignment that maps each variable to an object in \(\mathit {dom}(M)\) , and an interpretation for each constant, function and predicate, respectively. A formula \(\Phi\) is satisfiable if there exists a model M, \(M \models \Phi\) ; \(\Phi\) is valid if for any model M, \(M \models \Phi\) .

A first-order theory \(\mathcal {T}\) is defined by a signature and a set of axioms. The signature consists of constant symbols, function symbols, and predicate symbols allowed in \(\mathcal {T}\) ; the axioms prescribe the intended meanings of these symbols. A \(\mathcal {T}\) -model is a model that satisfies all axioms of \(\mathcal {T}\) . A formula \(\Phi\) is \(\mathcal {T}\) -satisfiable if there exists a \(\mathcal {T}\) -model M so \(M \models \Phi\) ; \(\Phi\) is \(\mathcal {T}\) -valid if it is satisfied by all \(\mathcal {T}\) -models.

2.2 Satisfiability Modulo Theory and DPLL(T)

The satisfiability modulo theories (SMT) problem [11, 21, 22] is a decision problem for formulas in some combination of first-order background theories. A theory solver is required for each background theory \(\mathcal {T}\) , called \(\mathcal {T}\) -solver, with which the \(\mathcal {T}\) -satisfiability of any conjunction of literals in \(\mathcal {T}\) can be determined.

DPLL(T) is the standard framework for solving SMT instances. It extends the classical DPLL algorithm [20, 45] with dedicated theory solvers. fig. 1 shows a high-level overview of DPLL(T). Given an SMT formula \(\Psi\) , DPLL(T) first replaces each atom with a fresh Boolean variable. This process is called Boolean abstraction, because the resulting formula, denoted by \(\mathcal {B}(\Psi)\) , is an over-approximation of the original formula \(\Psi\) with respect to satisfiability. The satisfiability of \(\mathcal {B}(\Psi)\) can be determined by a SAT solver. If \(\mathcal {B}(\Psi)\) is unsatisfiable, then so is \(\Psi\) ; but the reverse may not hold. If \(\mathcal {B}(\Psi)\) is satisfiable and M is the satisfying model returned by the SAT solver, then we need to go ahead to check whether M is consistent with the underlying first-order theories.

Fig. 1.

A theory solver can be integrated with DPLL(T) in an online or offline scheme. Let M be the current (partial) assignment to \(\mathcal {B}(\Psi)\) . In the online scheme, \(\mathcal {T}\) -solver checks \(\mathcal {T}\) -consistency of M as long as M is updated (even when M is a partial assignment); in the offline scheme, consistency checking is involved only when M is a satisfying model of \(\mathcal {B}(\Psi)\) . If M is \(\mathcal {T}\) -inconsistent, then \(\mathcal {T}\) -solver attempts to generate a conflict clause and adds it to the clause set to prevent the solver from repeating the same inconsistency in the future. A typical theory solver also supports theory propagation, which deduces values of unassigned literals by theory axioms. Our method is integrated with DPLL(T) in an online scheme.

2.3 Concurrent Execution as Partial Order

A multi-threaded program comprises multiple threads running in parallel. It contains a set of variables that can be divided into local variables accessible to a specific thread only and shared variables accessible to all threads. Our approach extends the framework of Alglave et al. [9], which models executions of multi-threaded programs using partial order.

An event is either a read or a write access to a shared variable and has the following attributes:

•

\(\mathit {type}(e)\) : the type of e, i.e., \(\mathtt {W}\) if e is a write access, and \(\mathtt {R}\) if e is a read access,

•

\(\mathit {addr}(e)\) : the memory address that e accesses,

•

\(\mathit {guard}(e)\) : the guard condition on which e is enabled.

Let \(\mathbb {E}\) be the set of all events. There are some relations over events in \(\mathbb {E}\) . The program order(PO) relation \(\prec _{{\it po}}\) is a total order of events from the same processor. The atomicity relation \(\approx\) is an equivalence relation such that \(e_1\approx e_2\) iff \(e_1\) and \(e_2\) are contained in the same atomic operation. The write serialization(WS) relation \(\prec _{{\it ws}}\) is a total order of writes with the same address. The read-from(RF) relation \(\prec _{{\it rf}}\) links a write event \(e_1\) (with \(\mathit {type}(e_1)=\mathtt {W}\) ) to a read event \(e_2\) (with \(\mathit {type}(e_2)=\mathtt {R}\) ), so \(e_2\) reads the value written by \(e_1\) . Moreover, given a pair of write events \(e_1\) , \(e_2\) (with \(\mathit {type}(e_1)=\mathit {type}(e_2)=\mathtt {W}\) ) and a read event \(e_3\) (with \(\mathit {type}(e_3)=\mathtt {R}\) ) so \(e_1\prec _{{\it ws}}e_2\) and \(e_1\prec _{{\it rf}}e_3\) , we know that \(e_1\) happens before \(e_2\) , and \(e_3\) reads from \(e_1\) . To ensure that \(e_3\) does not read from \(e_2\) , \(e_3\) must happen before \(e_2\) ; we call such relation the from-read(FR) relation \(\prec _{{\it fr}}\) .

A weak memory model allows the order of certain pairs of memory access events to be relaxed. In this article, we focus on total store order(TSO) [47] and partial store order(PSO) [58]: the former relaxes the write-to-read program orders, and the latter further relaxes the write-to-write program orders. Formally, consider a program order \(e_1\prec _{{\it po}}e_2\) , TSO relaxes this order if \(\mathit {type}(e_1) = \mathtt {W}\) , \(\mathit {type}(e_2) = \mathtt {R}\) and \(\mathit {addr}(e_1) \ne \mathit {addr}(e_2)\) , PSO relaxes this order if \(\mathit {type}(e_1) = \mathtt {W}\) and \(\mathit {addr}(e_1) \ne \mathit {addr}(e_2)\) . Note that PSO can relax more program orders than TSO. We use \(\prec _{{\it ppo}}\) to represent the preserved program order(PPO) relation after relaxation. Especially, \(\prec _{{\it ppo}}\) is identical to \(\prec _{{\it po}}\) under SC.

A concurrent execution of the multi-threaded program can be represented as a set of partial orders over access events. Note that \(\prec _{{\it ppo}}\) is determined by the program and the architecture, while \(\prec _{{\it ws}}, \prec _{{\it rf}}\) and \(\prec _{{\it fr}}\) are specified by executions. A concurrent execution is valid if \(\prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}}\) is consistent with \(\prec _{{\it ppo}}\) and \(\approx\) ; that is, there is a linearization of events on this execution that respects \(\prec _{{\it ppo}}\) and the accesses contained in each atomic operation are executed consecutively, not interrupted by other accesses.

Lemma 1 ([50]).

The relation \(\prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}}\) is consistent with the partial order \(\prec _{{\it ppo}}\) and the equivalence relation \(\approx\) , if and only if:

(1)

\((\prec _{{\it ppo}}\cup \prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}})\cap \approx\) has no cycles, and

(2)

all cycles in \(\prec _{{\it ppo}}\cup \prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}}\cup \approx\) are contained in \(\approx\) .

If there is no equivalence relation in the program, then \(\prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}}\) is consistent with \(\prec _{{\it ppo}}\) iff \(\prec _{{\it ppo}}\cup \prec _{{\it rf}}\cup \prec _{{\it ws}}\cup \prec _{{\it fr}}\) has no cycles. An execution is correct if it satisfies the correctness condition. An incorrect execution is also called a counterexample. A program is correct iff it does not contain any valid counterexample.

3 Symbolic Encoding of Multi-threaded Programs

In this section, we use a simple example to introduce our symbolic encoding, discuss its differences with other approaches, and establish its correctness.

3.1 Symbolic Encoding

Consider the program in Figure 2(a), which contains three threads, i.e., main, \(t_1\) , and \(t_2\) . Our goal is to verify that m and n cannot be both equal to 1 at the end of the execution.

Fig. 2.

We first convert the original program to its static single assignments (SSA) form [19], shown in Figure 2(b), where each occurrence (no matter write or read) of each shared variable is replaced with a fresh copy of this variable. A similar SSA transformation procedure is adopted in References [9, 51, 61].

SSA Variables and Access Events.

Given an SSA variable \(x_i\) , we write \((\!| x_i|\!)\) for its corresponding access event. Especially, we write \((\!| x_i|\!) ^w\) for a write access and \((\!| x_i|\!) ^r\) for a read access. With respect to the attributes, we have \(\mathit {type}((\!| x_i|\!) ^w) = \mathtt {W}\) , \(\mathit {type}((\!| x_i|\!) ^r) = \mathtt {R}\) , and \(\mathit {addr}((\!| x_i|\!) ^w) = \mathit {addr}((\!| x_i|\!) ^r) = x\) .

Considering x in the program (Figure 2(a)), there are five accesses to this variable. Five fresh variables, i.e., \(x_1, x_2, x_3, x_4, x_5\) , are introduced in the SSA form (Figure 2(b)). Note that \(x_1\) , \(x_5\) represent write accesses and \(x_2\) , \(x_3\) , \(x_4\) represent read accesses; their corresponding events are represented as \((\!| x_1|\!) ^w\) , \((\!| x_2|\!) ^r\) , \((\!| x_3|\!) ^r\) , \((\!| x_4|\!) ^r\) , and \((\!| x_5|\!) ^w\) , respectively.

Value Assignment Encoding.

Value assignments of variables in each thread can be encoded by directly interpreting SSA statements. The encoding \(\rho _{{\it va}}^{t_1}\) of thread \(t_1\) ’s value assignment is:

\begin{equation*} (x_2 = 1 \rightarrow m_3 = 1) \wedge (\lnot (x_2 = 1) \rightarrow m_4 = x_3) \wedge (y_2 = x_4 + 1) . \end{equation*}

In a similar way, we get encodings \(\rho _{{\it va}}^{t_2}\) and \(\rho _{{\it va}}^{\it main}\) for value assignments of threads \(t_2\) and main, respectively. The value assignment encoding of the whole program is:

\begin{equation*} \rho _{{\it va}}:= \rho ^{t_1}_{{\it va}}\wedge \rho ^{t_2}_{{\it va}} \wedge \rho ^{\it main}_{{\it va}} . \end{equation*}

Error Condition.

We use \(\rho _{{\it err}}\) to encode the error condition of the program. The error condition of the example program is

\begin{equation*} \rho _{{\it err}}:= (m_2 = 1) \wedge (n_2 = 1). \end{equation*}

Note that \(\rho _{{\it va}}\wedge \rho _{{\it err}}\) is not sufficient for verifying the program’s correctness. A satisfying model of \(\rho _{{\it va}}\wedge \rho _{{\it err}}\) does not necessarily represent a valid execution.

Considering the example program in fig. 2, a satisfying model of \(\rho _{{\it va}}\wedge \rho _{{\it err}}\) is:

\[\begin{multline*} \lbrace x_1\mapsto 0, x_2\mapsto 1, m_3\mapsto 1, x_4\mapsto 0, y_2\mapsto 1, y_3\mapsto 1, n_3\mapsto 1, y_5\mapsto 0, x_5\mapsto 1, m_2\mapsto 1, \ldots \rbrace . \end{multline*}\]

The execution corresponding to this model is, however, invalid. Let us consider the variable x. Recall that \((\!| x_1|\!) ^w\) and \((\!| x_5|\!) ^w\) are write accesses, \((\!| x_2|\!) ^r\) and \((\!| x_4|\!) ^r\) are read accesses. By \(x_1=x_4\ne x_5\) , \((\!| x_4|\!) ^r\) must read from \((\!| x_1|\!) ^w\) , and \((\!| x_4|\!) ^r\) must happen before \((\!| x_5|\!) ^w\) (otherwise, \((\!| x_4|\!) ^r\) should read from \((\!| x_5|\!) ^w\) instead). Then, \((\!| x_2|\!) ^r\) should also happen before \((\!| x_5|\!) ^w\) (since \((\!| x_2|\!) ^r\) happens before \((\!| x_4|\!) ^r\) ), and therefore should also read from \((\!| x_1|\!) ^w\) , which contradicts the fact that \(x_1 \ne x_2\) .

The main reason is that \(\rho _{{\it va}}\ \wedge \ \rho _{{\it err}}\) does not restrict the order of memory accesses. In the following, we formulate order constraints of concurrent executions.

Program Order Constraints.

We use \(\rho _{{\it po}}\) to encode the program order (PO) constraints, which represents the natural order of access events in each thread. The program orders of thread \(t_1\) , \(t_2\) , and main are

\begin{equation*} \begin{split} \rho _{{\it po}}^{t_1} &:= (\!| x_2|\!) ^r\prec _{{\it po}}(\!| m_3|\!) ^w\prec _{{\it po}}(\!| x_3|\!) ^r\prec _{{\it po}}(\!| m_4|\!) ^w\prec _{{\it po}}(\!| x_4|\!) ^r\prec _{{\it po}}(\!| y_2|\!) ^w, \\ \rho _{{\it po}}^{t_2} &:= (\!| y_3|\!) ^r\prec _{{\it po}}(\!| n_3|\!) ^w\prec _{{\it po}}(\!| y_4|\!) ^r\prec _{{\it po}}(\!| n_4|\!) ^w\prec _{{\it po}}(\!| y_5|\!) ^r\prec _{{\it po}}(\!| x_5|\!) ^w, \\ \rho _{{\it po}}^{\it main}&:= (\!| x_1|\!) ^w\prec _{{\it po}}(\!| y_1|\!) ^w\prec _{{\it po}}(\!| m_1|\!) ^w\prec _{{\it po}}(\!| n_1|\!) ^w\prec _{{\it po}}(\!| m_2|\!) ^r\prec _{{\it po}}(\!| n_2|\!) ^r. \end{split} \end{equation*}

Moreover, since \(t_1\) and \(t_2\) are child threads of main, all events in \(t_1\) and \(t_2\) should happen between the invocations to pthread_create and pthread_join, respectively. As a result, we have the following program order constraint:

\begin{equation*} \rho _{{\it po}}^{\it spawn}:= (\!| n_1|\!) ^w\prec _{{\it po}}(\!| x_2|\!) ^r \wedge (\!| n_1|\!) ^w\prec _{{\it po}}(\!| y_3|\!) ^r \wedge (\!| y_2|\!) ^w\prec _{{\it po}}(\!| m_2|\!) ^r \wedge (\!| x_5|\!) ^w\prec _{{\it po}}(\!| m_2|\!) ^r. \end{equation*}

Let \(\rho _{{\it po}}\) be the conjunction of the above program order constraints, i.e.,

\begin{equation*} \rho _{{\it po}}:= \rho _{{\it po}}^{t_1} \wedge \rho _{{\it po}}^{t_2} \wedge \rho _{{\it po}}^{\it main}\wedge \rho _{{\it po}}^{\it spawn}. \end{equation*}

Note that some program orders are relaxed under weak memory models. Taking thread \(t_1\) as an example, TSO relaxes \((\!| m_3|\!) ^w \prec _{{\it po}}(\!| x_3|\!) ^r\) and \((\!| m_4|\!) ^w \prec _{{\it po}}(\!| x_4|\!) ^r\) (write-to-read program orders), and PSO relaxes these two program orders plus \((\!| m_4|\!) ^w \prec _{{\it po}}(\!| y_2|\!) ^w\) (write-to-write program order). In the following, we use \(\rho _{{\it ppo}}\) to represent the preserved program order(PPO) constraint. Note that no program order is relaxed under SC, thus \(\rho _{{\it ppo}}\equiv \rho _{{\it po}}\) for SC.

Atomicity Constraints.

We use \(\rho _{\approx }\) to represent the atomicity constraints. Intuitively, \(e_1 \approx e_2\) means that \(e_1\) and \(e_2\) are in the same atomic operation, so they are executed indivisibly, not interrupted by other accesses. There is no atomic operation specified in the example program in fig. 2 (to simplify other discussions in this article). Assuming that the last statement in thread \(t_1\) , i.e., \(y_2 = x_4 + 1\) , is declared as an atomic operation,³ we have \((\!| x_4|\!) ^r \approx (\!| y_2|\!) ^w\) .

Read-from Variables and Constraints.

Note that a read event \((\!| x_i|\!) ^r\) reads a value written by a write event to the same address. Let \(\pi ((\!| x_i|\!) ^r)\) be the set of write accesses that \((\!| x_i|\!) ^r\) may read from. Because of thread interactions, \(\pi ((\!| x_i|\!) ^r)\) may contain write accesses in other threads. Consider the read event \((\!| x_2|\!) ^r\) in the example program:

\begin{equation*} \pi ((\!| x_2|\!) ^r) = \lbrace (\!| x_1|\!) ^w, (\!| x_5|\!) ^w\rbrace . \end{equation*}

For each write event \((\!| x_j|\!) ^w\in \pi ((\!| x_i|\!) ^r)\) , we define a Boolean variable \(rf_{j,i}^x\) , called a read-from(RF) variable, to specify whether \((\!| x_i|\!) ^r\) reads its value from \((\!| x_j|\!) ^w\) . For each RF variable, we have the following constraints:

•

RF-Val constraint: if \(rf_{j,i}^x\) is \(\mathit {true}\) , \((\!| x_i|\!) ^r\) and \((\!| x_j|\!) ^w\) are enabled and their values are equal, i.e.,

\begin{equation*} rf_{j,i}^x \rightarrow \mathit {guard}((\!| x_i|\!) ^r) \wedge \mathit {guard}((\!| x_j|\!) ^w) \wedge (x_i = x_j); \end{equation*}

•

RF-Ord constraint: if \(rf_{j,i}^x\) is \(\mathit {true}\) , the write event \((\!| x_j|\!) ^w\) must happen before the read event \((\!| x_i|\!) ^r\) , i.e.,

\begin{equation*} rf_{j,i}^x \rightarrow (\!| x_j|\!) ^w \prec _{{\it rf}}(\!| x_i|\!) ^r; \end{equation*}

•

RF-Some constraint: if the read event \((\!| x_i|\!) ^r\) is enabled, it must obtain its value from a certain write event in \(\pi ((\!| x_i|\!) ^r)\) , i.e.,

\begin{equation*} \mathit {guard}((\!| x_i|\!) ^r) \rightarrow \bigvee _{(\!| x_j|\!) ^w\in \pi ((\!| x_i|\!) ^r)} rf_{j,i}^x . \end{equation*}

In the following, we use \(\rho _{{\it rf-val}}\) , \(\rho _{{\it rf-ord}}\) , and \(\rho _{{\it rf-some}}\) to represent the conjunctions of RF-Val, RF-Ord, and RF-Some constraints over all RF variables, respectively. Considering the read event \((\!| x_2|\!) ^r\) in the example program, we have:

\begin{equation*} \begin{split} \rho _{{\it rf-val}}&:= (rf_{1,2}^x \rightarrow (x_2 = x_1)) \wedge (rf_{5,2}^x \rightarrow (x_2 = x_5)), \\ \rho _{{\it rf-ord}}&:= (rf_{1,2}^x \rightarrow (\!| x_1|\!) ^w \prec _{{\it rf}}(\!| x_2|\!) ^r) \wedge (rf_{5,2}^x \rightarrow (\!| x_5|\!) ^w \prec _{{\it rf}}(\!| x_2|\!) ^r), \\ \rho _{{\it rf-some}}&:= rf_{1,2} \vee rf_{5,2} . \end{split} \end{equation*}

Write-serialization Variables and Constraints.

For each variable x, let \(\gamma (x)\) be the set of write accesses to x. We need to determine a total order among all enabled write accesses in \(\gamma (x)\) . To this end, for each pair of write accesses \((\!| x_i|\!) ^w\) , \((\!| x_j|\!) ^w\) in \(\gamma (x)\) , we define a Boolean variable \(ws_{i,j}^x\) , called a write-serialization(WS) variable, to represent whether \((\!| x_i|\!) ^w\) happens before \((\!| x_j|\!) ^w\) .

For each WS variable, we have the following constraints:

•

WS-Cond constraint: if \(ws_{i,j}^x\) is \(\mathit {true}\) , then both \((\!| x_i|\!) ^w\) and \((\!| x_j|\!) ^w\) are enabled, i.e.,

\begin{equation*} ws_{i,j}^x \rightarrow \mathit {guard}((\!| x_i|\!) ^w) \wedge \mathit {guard}((\!| x_j|\!) ^w); \end{equation*}

•

WS-Ord constraint: if \(ws_{i,j}^x\) is \(\mathit {true}\) , then the write event \((\!| x_i|\!) ^w\) must happen before \((\!| x_j|\!) ^w\) , i.e.,

\begin{equation*} ws_{i,j}^x \rightarrow (\!| x_i|\!) ^w \prec _{{\it ws}}(\!| x_j|\!) ^w ; \end{equation*}

•

WS-Some constraint: if \((\!| x_i|\!) ^w\) and \((\!| x_j|\!) ^w\) are enabled, then one must happen before the other, i.e.,

\begin{equation*} \mathit {guard}((\!| x_i|\!) ^w) \wedge \mathit {guard}((\!| x_j|\!) ^w) \rightarrow ws_{i,j}^x \vee ws_{j,i}^x . \end{equation*}

In the following, we use \(\rho _{{\it ws-cond}}\) , \(\rho _{{\it ws-ord}}\) , and \(\rho _{{\it ws-some}}\) to represent the conjunctions of WS-Cond, WS-Ord, and WS-Some constraints over all WS variables, respectively. Considering two write events \((\!| m_1|\!) ^w\) and \((\!| m_3|\!) ^w\) in the example program, we have:

\begin{equation*} \begin{split} \rho _{{\it ws-cond}}&:= (ws_{1,3}^m \rightarrow (x_2 = 1)) \wedge (ws_{3,1}^m \rightarrow (x_2 = 1)), \\ \rho _{{\it ws-ord}}&:= (ws_{1,3}^m \rightarrow (\!| m_1|\!) ^w \prec _{{\it ws}}(\!| m_3|\!) ^w) \wedge (ws_{3,1}^m \rightarrow (\!| m_3|\!) ^w \prec _{{\it ws}}(\!| m_1|\!) ^w), \\ \rho _{{\it ws-some}}&:= (x_2 = 1) \rightarrow ws_{1,3}^m \vee ws_{3,1}^m . \end{split} \end{equation*}

From-read Constraints.

Considering one read access \((\!| x_i|\!) ^r\) and two write accesses \((\!| x_j|\!) ^w\) , \((\!| x_k|\!) ^w\) to the same variable x, if \((\!| x_j|\!) ^w\) happens before \((\!| x_k|\!) ^w\) and \((\!| x_i|\!) ^r\) reads from \((\!| x_j|\!) ^w\) , then \((\!| x_i|\!) ^r\) must also happen before \((\!| x_k|\!) ^w\) ; otherwise, \((\!| x_k|\!) ^w\) is closer than \((\!| x_j|\!) ^w\) to \((\!| x_i|\!) ^r\) , and \((\!| x_i|\!) ^r\) should read from \((\!| x_k|\!) ^w\) instead of \((\!| x_j|\!) ^w\) . Formally, this rule can be formulated as the following from-read(FR) constraint:

\begin{equation*} rf_{j,i}^x \wedge ws_{j,k}^x \rightarrow (\!| x_i|\!) ^r \prec _{{\it fr}}(\!| x_k|\!) ^w. \end{equation*}

Let \(\rho _{{\it fr}}\) denote the conjunction of all FR constraints.

Most existing techniques [9, 51, 56] for concurrent program verification include all from-read constraints in their encoding formulas. This is a safe choice to ensure the SMT encoding’s correctness. However, it is not practical. For each FR constraint \(rf_{j,i}^x \wedge ws_{j,k}^x \rightarrow (\!| x_i|\!) ^r \prec _{{\it fr}}(\!| x_k|\!) ^w\) , only when both \(rf_{j,i}^x\) and \(ws_{j,k}^x\) are evaluated \(\mathit {true}\) do we need to consider \((\!| x_i|\!) ^r\prec _{{\it fr}}(\!| x_k|\!) ^w\) . At the beginning of SMT solving, all RF and WS variables are unassigned—no FR orders need to be considered. For most of the time, only a small portion of these FR constraints take effect. Maintaining such a large set of (unnecessary for most of the time) constraints is expensive for the SMT solver. As a result, the efficiency of SMT solving degenerates (see section 6.3 for more details).

The Whole Encoding Formula.

The whole encoding formula for a program is:

\begin{equation} \Psi := \Phi _{\it ssa}\wedge \Phi _{\it ord}, \end{equation}

(1)

where

\begin{equation} \Phi _{\it ssa}= \rho _{{\it va}}\wedge \rho _{{\it err}}\wedge \rho _{{\it rf-val}}\wedge \rho _{{\it rf-some}}\wedge \rho _{{\it ws-cond}}\wedge \rho _{{\it ws-some}} \end{equation}

(2)

represents the data and control flow of the program, and

\begin{equation} \Phi _{\it ord}= \rho _{{\it ppo}}\wedge \rho _{\approx }\wedge \rho _{{\it ws-ord}}\wedge \rho _{{\it rf-ord}} \end{equation}

(3)

represents the ordering constraints⁴ of the program.

Note that \(\rho _{{\it fr}}\) is excluded from our encoding formula. Instead of adding all FR constraints into the SMT formula, we prefer adding them during SMT solving in an “online” schema—an FR order is derived and activated only when the corresponding RF and WS variables are evaluated \(\mathit {true}\) .

Let \(X_{\it ssa}\) , \(X_{{\it rf}}\) , and \(X_{{\it ws}}\) be the sets of SSA, RF, and WS variables, respectively. RF and WS variables are also called ordering variables. The formula \(\Phi _{\it ssa}\) is over \(X_{\it ssa}\ \cup \ X_{{\it rf}}\ \cup \ X_{{\it ws}}\) , and \(\Phi _{\it ord}\) is over \(X_{{\it rf}}\ \cup \ X_{{\it ws}}\) . Actually, \(\Phi _{\it ord}\) is a “pure” formula that contains only ordering variables and ordering literals; \(\Phi _{\it ssa}\) is a formula that does not include any ordering literal. To decide the satisfiability of \(\Phi _{\it ssa}\) , we can use any existing solver that supports a sufficiently rich fragment of first-order logic. To decide the satisfiability of \(\Phi _{\it ord}\) , we intend to develop a dedicated theory solver.

3.2 Constraints Generation

In this section, we detail the algorithms for generating PPO, atomicity, RF, and WS constraints under various memory models, respectively.

3.2.1 Preserved Program Order and Atomicity Constraints.

Algorithm 1 shows the pseudo code for generating the preserved program order \(\rho _{{\it ppo}}\) and atomicity constraints \(\rho _{\approx }\) . The algorithm starts to process each thread \(P_i\) from line 2. Let \(\mathbb {E}_i\) be the list of events in thread \(P_i\) , sorted by their occurrences in \(P_i\) (line 3). The key challenge here is to avoid encoding transitive closure to obtain a minimal number of constraints. More specifically, an order \((e_i, e_j)\) is added to \(\prec _{{\it ppo}}\) only if this order cannot be implied by the transitive closure of the current \(\prec _{{\it ppo}}\) , i.e., there does not exist an event \(e_k\) such that both \((e_i, e_k)\) and \((e_k, e_j)\) are in the current \(\prec _{{\it ppo}}\) . To this end, we define a set \({\it succ\_set}\) (initially \(\lbrace e\rbrace\) ) for each event e in \(\mathbb {E}_i\) (line 4) to keep all successor events of e in the current \(\prec _{{\it ppo}}\) .

Algorithm 1 traverses all events in \(\mathbb {E}_i\) in reverse order (line 5); and from line 6, the inner loop traverses all subsequent events \(e^{\prime }\) of e in the list \(\mathbb {E}_i\) from near to far. The order \((e, e^{\prime })\) must be in the transitive closure of \(\prec _{{\it po}}\) . If \(e\prec _{{\it po}}e^{\prime }\) is preserved under mm (line 7), then we need go further to check whether \(e^{\prime }\) is already in the \({\it succ\_set}\) of e (line 8)—if it is, then the order \((e, e^{\prime })\) is implied by the current \(\rho _{{\it ppo}}\) , i.e., there exists another event \(e^{\prime \prime }\) such that both \(e \prec _{{\it ppo}}e^{\prime \prime }\) and \(e^{\prime \prime } \prec _{{\it ppo}}e^{\prime }\) are in the current \(\rho _{{\it ppo}}\) (note that the \({\it succ\_set}\) of \(e^{\prime \prime }\) should be updated before the outer loop traverses e); otherwise, \(e\prec _{{\it ppo}}e^{\prime }\) is explicitly added to \(\rho _{{\it ppo}}\) (line 9), and the \({\it succ\_set}\) of e should be extended with \({\it succ\_set}\) of \(e^{\prime }\) (line 10). Moreover, if e and \(e^{\prime }\) belong to the same atomic operation, then \(e\approx e^{\prime }\) is added to \(\rho _{\approx }\) (line 12). Finally, each child thread is spawned by thread_create and stopped by thread_join. All events of this child thread must happen after the call of thread_create and before the call of thread_join; we use Interthread_Order_Generation (line 13) to encode such ordering constraints between the main and child threads.

Algorithm 2 determines whether the program order \(e\prec _{{\it po}}e^{\prime }\) is preserved under the memory model mm. If \(\mathit {addr}(e) = \mathit {addr}(e^{\prime })\) , i.e., they access the same memory address; or \(e\approx e^{\prime }\) , i.e., they belong to the same atomic operation; or \((e, e^{\prime })\) is preserved by fences [9],⁵ then this program order must be preserved. There are only two situations that the program order \(e\prec _{{\it po}}e^{\prime }\) are relaxed: (1) \(type(e) = \mathtt {W}\) and \(tyep(e^{\prime }) = \mathtt {R}\) (i.e., write-to-read program order) under TSO (lines 2 and 3), and (2) \(type(e) = \mathtt {W}\) (i.e., write-to-read/write program order) under PSO (lines 4 and 5).

The key challenge of Algorithm 1 is to avoid redundant transitive closure of \(\prec _{{\it ppo}}\) to obtain a small number of constraints. Let \(\rho _{{\it ppo}}\) be the output of Algorithm 1. We first prove that \(\rho _{{\it ppo}}\) is sufficient for representing all orders in \(\prec _{{\it ppo}}\) .

Lemma 2.

\(\forall (e, e^{\prime })\in \ \prec _{{\it ppo}}\) , it can be deduced from \(\rho _{{\it ppo}}\) .

Proof.

By \(e\prec _{{\it ppo}}e^{\prime }\) , the order \((e, e^{\prime })\) is preserved under the memory model. If \(e^{\prime }\notin e.{\it succ\_set}\) , by line 9 of Algorithm 1, the order \((e,e^{\prime })\) is explicitly encoded in \(\rho _{{\it ppo}}\) , the lemma holds for this case. If \(e^{\prime }\in e.{\it succ\_set}\) , then we prove that \(e \prec _{{\it ppo}}e^{\prime }\) can be deduced from \(\rho _{{\it ppo}}\) by induction:

•

Suppose there is only one pair of events \((e_1, e_2)\in \ \prec _{{\it ppo}}\) , \(e_1 \prec _{{\it ppo}}e_2\) must be in \(\rho _{{\it ppo}}\) (base case).

•

Suppose there is a set of events \(e_1, e_2, \ldots , e_n\) ( \(n \gt 2\) ) following their occurrence in \(P_i\) and \(\forall e_i, i\in {2, \ldots , n}. (e_1, e_i)\in \ \prec _{{\it ppo}}\) and \(e_i\in e_1.{\it succ\_set}\) , it can be deduced from \(\rho _{{\it ppo}}\) (induction hypothesis).

•

Then, for PO-successor of \(e_n\) , named \(e_{n+1}\) , s.t. \((e_1, e_{n+1})\in \ \prec _{{\it ppo}}\) and \(e_{n+1}\in e_1.{\it succ\_set}\) , we show that \((e_1, e_{n+1})\in \ \prec _{{\it ppo}}\) can be deduced from \(\rho _{{\it ppo}}\) . If \(e_1 \prec _{{\it ppo}}e_{n+1}\) is in \(\rho _{{\it ppo}}\) , then the inductive step trivially holds. Otherwise, since \(\bigcup _{i=2}^n \ e_i.{\it succ\_set}\subseteq e_1.{\it succ\_set}\) , let j be the maximal i s.t. \(e_{n+1} \in e_j.{\it succ\_set}\) , we have the PPO constraint \(e_j \prec _{{\it ppo}}e_{n+1}\) in \(\rho _{{\it ppo}}\) . Meanwhile, by induction hypothesis, there is a set of PPO constraints in \(\rho _{{\it ppo}}\) (abbreviated as S), which implies \((e_1, e_j)\in \ \prec _{{\it ppo}}\) using transitivity. Therefore, \(S\ \cup \ \lbrace e_j\prec _{{\it ppo}}e_{n+1}\rbrace\) deduces \((e_1, e_{n+1})\in \ \prec _{{\it ppo}}\) (inductive step).

Therefore, the above lemma holds.□

Next, we prove that \(\rho _{{\it ppo}}\) is minimal for representing \(\prec _{{\it ppo}}\) . For simplicity, we write \(c_{e,e^{\prime }}\) for an ordering constraint between e and \(e^{\prime }\) , \(\rho _{{\it ppo}}\) can be regarded as a set of such constraints.

Lemma 3.

\(\forall c_{e,e^{\prime }} \in \rho _{{\it ppo}}\) , \(c_{e,e^{\prime }}\) cannot be deduced from \(\rho _{{\it ppo}}\backslash \lbrace c_{e, e^{\prime }}\rbrace\) .

Proof.

We prove this lemma by contradiction. Assume \(e \prec _{{\it ppo}}e^{\prime }\) can be derived from \(\rho _{{\it ppo}}\backslash \lbrace c_{e,e^{\prime }}\rbrace\) , and let \(c_{e_1, e_2}\) , \(c_{e_2, e_3}, \ldots , c_{e_{n-1}, e_n}\) where \(e_1=e\) , \(e_n=e^{\prime }\) , is the set of constraints in \(\rho _{{\it ppo}}\backslash \lbrace c_{e,e^{\prime }}\rbrace\) that form the derive path \(e_1 \prec _{{\it ppo}}e_2 ... \prec _{{\it ppo}}e_n\) . By line 10 of Algorithm 1, \(e_{i+1}.{\it succ\_set}\) is integrated into \(e_i.{\it succ\_set}\) for \(i \in {1,2,\ldots ,n-1}\) . Therefore, \(e_n\) is in \(e_1.{\it succ\_set}\) and Algorithm 1 skips to encode \(c_{e,e^{\prime }}\) , which contradicts to the premise of \(c_{e,e^{\prime }} \in \rho _{{\it ppo}}\) .□

Let \(\rho _{\approx }\) be another output of Algorithm 1. It can be concluded from Algorithm 1 that atomicity constraints are generated for each pair of PO-adjacent events in the same atomic operation. The following lemma shows that \(\rho _{\approx }\) is sufficient and minimal for representing all atomicity orders in \(\approx\) . The proof is trivial and is omitted.

Lemma 4.

\(\forall (e,e^{\prime })\in \approx\) , it can be deduced from \(\rho _{\approx }\) ; \(\forall c_{e,e^{\prime }}\in \rho _{\approx }\) , it cannot be deduced from \(\rho _{\approx }\setminus \lbrace c_{e,e^{\prime }}\rbrace\) .

3.2.2 Read-from and Write-serialization Constraints.

For each shared variable x, we first obtain the set \(\mathit {reads}\) (or \(\mathit {writes}\) ) of all read (or write) accesses to x. Basically, a read r in \(\mathit {reads}\) may get its value from any write w in \(\mathit {writes}\) , except when \(r \prec _{{\it po}}w\) (since r cannot read from a later issued write). We then generate read-from constraints for all such pairs of read and write events.

Moreover, for each subset \(\lbrace w_1, w_2\rbrace \subseteq \mathit {writes}\) , either \(w_1\prec _{{\it ws}}w_2\) , or \(w_2 \prec _{{\it ws}}w_1\) . Especially, if \(w_1\) , \(w_2\) belong to the same thread and \(w_1 \prec _{{\it po}}w_2\) , we use a Boolean variable ws to imply \(w_1 \prec _{{\it ws}}w_2\) ; meanwhile, \(w_2 \prec _{{\it ws}}w_1\) can never hold (otherwise, there forms a cycle \(w_1 \prec _{{\it po}}w_2 \prec _{{\it ws}}w_1\) ). If \(w_1, w_2\) are from different threads, then we use two Boolean variables ws and \(ws^{\prime }\) for representing these two cases, respectively. We generate write-serialization constraints with these WS variables.

3.3 Comparison to Other Approaches

Our encoding is built on References [9, 10, 51, 56, 61]. Compared to their encoding formulas, the most significant difference is that our encoding does not include FR constraints, which has already been discussed in the preceding section (also see section 6.3 for experimental results).

Second, the way we model ordering constraints is also different. The existing techniques (e.g., References [9, 28, 51]) use integer-valued clocks to model the time of occurrence for each event and use differences between clock values to model the order among events. Moreover, they use clock equalities to represent the atomicity of events in the same atomic operation. Then, they can rely on the integer difference logic to solve ordering constraints. However, note here, we do not need to compute the exact occurrence time of each event, but only their order. We thus intend to develop a dedicated solver for ordering consistency theory.

Finally, compared to the encoding in Reference [9], the generation of each RF and WS constraint is slightly different, i.e, in our encoding, if an RF or a WS variable is assigned \(\mathit {true}\) , the two related events must both be enabled, while the encoding in Reference [9] has no such requirement. As a result, the derivation of FR orders with our encoding need not consider guard conditions anymore. This change is quite important, since the guard conditions often involve arithmetic computation and data structures, which can hardly be handled by a dedicated theory solver for order constraints.

3.4 Correctness of the Encoding

Given a concurrent program P and a memory model mm, the symbolic encoding procedure outputs a formula \(\Psi := \Phi _{\it ssa}\wedge \Phi _{\it ord}\) . Even though our encoding is slightly different from that in Reference [9], we can prove that \(\Psi \wedge \rho _{{\it fr}}\) is equi-satisfiable with the encoding in Reference [9]; we thus have the following theorem:

Theorem 1.

The formula \(\Psi \wedge \rho _{{\it fr}}\) is satisfiable iff there is a valid counterexample in the program.

4 Ordering Consistency Theory

This section presents our ordering consistency theory. We first introduce its definition, then discuss a data structure that is useful for its reasoning.

4.1 Theory Definition

The theory of ordering consistency \(\mathcal {T}_{{\it ord}}\) has the signature

\begin{equation*} \Sigma _{\it ord}: \lbrace e_1, e_2, \ldots , \prec _{{\it ppo}}, \prec _{{\it ws}}, \prec _{{\it rf}}, \prec _{{\it fr}}, \approx \rbrace , \end{equation*}

where

•

\(e_1, e_2, \ldots\) are constants, intended to represent the access events in \(\mathbb {E}\) ,

•

\(\prec _{{\it ppo}}, \prec _{{\it ws}}, \prec _{{\it rf}}, \prec _{{\it fr}}\) are binary predicates, intended to represent different orders among access events, and

•

\(\approx\) is a binary predicate, intended to represent the atomicity relation of the program.

A \(\Sigma _{\it ord}\) -atom is either a Boolean variable or a predicate \(e_1\circ e_2\) , where \(\circ \in \lbrace \prec _{{\it ppo}}, \prec _{{\it rf}}, \prec _{{\it ws}}\) , \(\prec _{{\it fr}}\) , \(\approx \rbrace\) . A \(\Sigma _{\it ord}\) -formula is constructed from \(\Sigma _{\it ord}\) -atoms using Boolean connectives. Recall the ordering constraints \(\rho _{{\it ppo}}\) , \(\rho _{{\it rf-ord}}\) , \(\rho _{{\it ws-ord}}\) , and \(\rho _{\approx }\) (in section 3). They are all Boolean combinations of \(\Sigma _{\it ord}\) -atoms, and thus are \(\Sigma _{\it ord}\) -formulas; the formula \(\Phi _{\it ord}= \rho _{{\it ppo}}\wedge \rho _{{\it rf-ord}}\wedge \rho _{{\it ws-ord}}\wedge \rho _{\approx }\) is also a \(\Sigma _{\it ord}\) -formula.

Each predicate symbol in \(\Sigma _{\it ord}\) defines a binary relation over \(\mathbb {E}\) . We use the same symbol for a predicate and the binary relation it defines. Now, we discuss the axioms of \(\mathcal {T}_{{\it ord}}\) .

Axiom 1 (Partial Order).

Predicates \(\prec _{{\it ppo}},\prec _{{\it ws}},\prec _{{\it rf}},\prec _{{\it fr}}\) in \(\Sigma _{\it ord}\) represent partial orders, and

•

\(\prec _{{\it ws}}\) , \(\prec _{{\it rf}}\) , \(\prec _{{\it fr}}\) are over accesses to the same memory address;

•

\(\forall e_1, e_2. \,\, e_1\prec _{{\it ws}}e_2 \rightarrow \mathit {type}(e_1) = \mathit {type}(e_2) = \mathtt {W}\) ;

•

\(\forall e_1, e_2. \,\, e_1\prec _{{\it rf}}e_2 \rightarrow \mathit {type}(e_1) = \mathtt {W}\wedge \mathit {type}(e_2) = \mathtt {R}\) ;

•

\(\forall e_1, e_2. \,\, e_1\prec _{{\it fr}}e_2 \rightarrow \mathit {type}(e_1) = \mathtt {R}\wedge \mathit {type}(e_2) = \mathtt {W}\) .

Axiom 2 (Equivalence Relation).

The predicate \(\approx\) represents an equivalence relation.

Axiom 3 (.

FR Derivation) For any two write events \(e_1, e_2\in \mathbb {E}\) and a read event \(e_3\in \mathbb {E}\) with \(addr(e_1)=addr(e_2)=addr(e_3)\) , \(\mathit {type}(e_1) = \mathit {type}(e_2) = \mathtt {W}\) , and \(\mathit {type}(e_3) = \mathtt {R}\) , we have:

\begin{equation*} e_1 \prec _{{\it rf}}e_3 \wedge e_1 \prec _{{\it ws}}e_2 \Rightarrow e_3 \prec _{{\it fr}}e_2. \end{equation*}

Axiom 4 (Consistency).

The union \(\prec _{{\it ws}}\cup \prec _{{\it rf}}\cup \prec _{{\it fr}}\) needs to be consistent with \(\prec _{{\it ppo}}\) and \(\approx\) .

The above axioms define the intended semantics of \(\prec _{{\it ppo}}\) , \(\prec _{{\it ws}}\) , \(\prec _{{\it rf}}\) , \(\prec _{{\it fr}}\) , \(\approx\) , as we understand them in the preceding sections. Note that axiom 4 should hold after any number of applications of axiom 3, i.e., after deriving any number of \(\prec _{{\it fr}}\) orders.

4.2 Event Graph

Let \(\alpha : X_{{\it rf}}\cup X_{{\it ws}}\rightarrow \lbrace \mathit {true}, \mathit {false}, \mathit {unassigned}\rbrace\) be the current assignment to ordering variables, and \(\prec _{\alpha }\) the set of induced RF, WS, and FR orders by \(\alpha\) . Let \(\prec\) be the union \(\prec _{{\it ppo}}\cup \approx \cup \prec _{\alpha }\) . The event set \(\mathbb {E}\) and the order set \(\prec\) can be represented as a graph, called an event graph, where events are represented as nodes and orders as edges.⁶

At the beginning of SMT solving, all ordering variables are unassigned; no RF or WS edges are drawn on the event graph. Since FR orders are derived from RF and WS orders, there are no FR edges, either. Therefore, only PPO and atomicity edges present in the graph at that moment. The event set \(\mathbb {E}\) , the preserved program order PPO, and the atomicity constraints make up the skeleton of the event graph. Later, along with variable assignments, more edges are added to the graph.

According to the axioms of \(\mathcal {T}_{{\it ord}}\) , on each edge addition, we need to check whether this new edge leads to a cycle. If this is the case, then we say the current variable assignment is invalid—we then need to analyze the event graph to find the inconsistency reason. Otherwise, if there is no cycle, then we go ahead to apply axiom 3 to derive FR edges. Note that if any FR edge is derived, then we need to check the consistency of the current variable assignment again.

We associate each edge with a Boolean expression, called derivation reason (abbreviated as reason), to indicate what this edge is derived from.

•

The reason for a PPO or an atomicity edge is \(\mathit {true}\) , for this edge always presents in the graph.

•

The reason for an RF or a WS edge is the corresponding ordering variable, for this edge is directly derived from this variable.

•

The reason for an FR edge is the conjunction of reasons for the RF edge and the WS edge that derive this FR edge.

The concept of derivation reason can be lifted to a path. Let \(e_1 \prec e_2 \prec \cdots \prec e_n\) be a path on the event graph; the reason for this path is the conjunction of reasons for each edge it passes, i.e.,

\begin{equation*} reason(e_1\prec e_2\prec \cdots \prec e_n) = \bigwedge _{i = 1}^{n - 1} reason(e_i \prec e_{i + 1}). \end{equation*}

Figure 3 shows several event graphs that may occur during SMT solving of the program in fig. 2. To differentiate event types, we use grey and white nodes to represent write and read events, respectively. Preserved program orders are drawn as solid lines, while others are drawn as dashed lines. In the beginning, the event graph contains only PPO edges, as shown in Figure 3(a). After some assignments, more edges are added to the graph; Figure 3(b) shows an updated event graph of Figure 3(a) during SMT solving, in which \((\!| n_1|\!) ^w \prec _{{\it rf}}(\!| n_2|\!) ^r\) and \((\!| n_1|\!) ^w \prec _{{\it ws}}(\!| n_4|\!) ^w\) derive \((\!| n_2|\!) ^r \prec _{{\it fr}}(\!| n_4|\!) ^w\) . Finally, the red dashed edges in Figure 3(c) form a cycle, indicating a \(\mathcal {T}_{{\it ord}}\) -inconsistent execution.

Fig. 3.

An event graph may be different under weak memory models. Figure 3(a) shows an event graph under SC; Figures 3(d) and 3(e) show event graphs of the same program under TSO and PSO, respectively. Edges of these three graphs are quite different—some edges are deleted and some are added. However, in comparison with their transitive closures (the PPO relation is transitive), the graphs of TSO and PSO always contain less program orders. For example, considering the edge \((\!| x_2|\!) ^r \prec _{{\it ppo}}(\!| x_3|\!) ^r\) added in both Figures 3(d) and 3(e), the corresponding program order is also contained in the transitive closure of Figure 3(a). In contrast, for edge \((\!| m_3|\!) ^w\prec _{{\it ppo}}(\!| x_3|\!) ^r\) , which is deleted in Figures 3(d) and 3(e), the corresponding program order is also relaxed.

Similar structures to our event graph were defined in References [9, 51, 61]. Note that the events discussed in this article can hold symbolic values, and thus our event graph is actually a “symbolic” event graph. In Reference [9], a so-called symbolic event structure is defined, which, however, is used to depict program order only. Moreover, the event order graph defined in Reference [61] represents a counterexample instead of a program. In Reference [51], an interference skeleton is defined, which equals the skeleton of our event graph.

5 THEORY SOLVER FOR 𝒯_ord

In this section, we present \(\mathcal {T}_{{\it ord}}\) -solver with emphasis on algorithms for consistency checking, conflict clause generation, and theory propagation.

5.1 Overview

Figure 4 shows an overview of \(\mathcal {T}_{{\it ord}}\) -solver. Each time an ordering variable is assigned in the SAT solver, \(\mathcal {T}_{{\it ord}}\) -solver performs consistency checking (see section 5.2) to detect if a cycle exists after the corresponding edge addition to the event graph.

Fig. 4.

If the current assignment is \(\mathcal {T}_{{\it ord}}\) -consistent, then \(\mathcal {T}_{{\it ord}}\) -solver proceeds to: (1) determine values of unassigned literals by using axioms of \(\mathcal {T}_{{\it ord}}\) (called unit-edge propagation; see section 5.5.1), and (2) deduce all possible FR edges with respect to the assignment (called from-read propagation; see section 5.5.2). If any unassigned literal is assigned, then this assignment should be returned to the SAT solver; if any FR edge is deduced, then the consistency checking needs to be invoked again.

If the current assignment is \(\mathcal {T}_{{\it ord}}\) -inconsistent, then \(\mathcal {T}_{{\it ord}}\) -solver computes conflict clauses (called conflict clause generation; see section 5.3) to record the inconsistency reason, returns it to DPLL(T) to prevent the solver from going down the same path in the future.

5.2 Consistency Checking

Each time an ordering variable is assigned \(\mathit {true}\) , \(\mathcal {T}_{{\it ord}}\) -solver inserts the corresponding edge into the event graph and performs consistency checking. Consistency checking can be reduced to cycle detection on the event graph.

Let \(\prec _{{\it ppo}}\) be the preserved program order of the program, \(\approx\) the atomicity relation of the program, and \(\prec _{\alpha }\) the set of induced RF, WS, and FR orders by the current assignment \(\alpha\) . The basic routine for consistency checking is as follows:

•

If \(\prec _{{\it ppo}}\cup \prec _{\alpha }\cup \approx\) has no cycles, then so does its subset \((\prec _{{\it ppo}}\cup \prec _{\alpha })\cap \approx\) . Then, both conditions of lemma 1 are satisfied. By lemma 1, \(\prec _{\alpha }\) is consistent with \(\prec _{{\it ppo}}\) and \(\approx\) .

•

If \(\prec _{{\it ppo}}\cup \prec _{\alpha }\cup \approx\) has a cycle that is not contained in \(\approx\) , then this cycle must involve at least two equivalence classes of \(\approx\) , indicating that condition (2) of lemma 1 is violated. By lemma 1, \(\prec _{\alpha }\) is not consistent with \(\prec _{{\it ppo}}\) and \(\approx\) .

•

Otherwise, all cycles of \(\prec _{{\it ppo}}\cup \prec _{\alpha }\cup \approx\) are contained in \(\approx\) , i.e., condition (2) of lemma 1 is satisfied. We continue to check:

—

If \((\prec _{{\it ppo}}\cup \prec _{\alpha })\cap \approx\) has cycles, then condition (1) of lemma 1 is violated. By lemma 1, \(\prec _{\alpha }\) is not consistent with \(\prec _{{\it ppo}}\) and \(\approx\) .

—

Otherwise, by lemma 1, \(\prec _{\alpha }\) is consistent with \(\prec _{{\it ppo}}\) and \(\approx\) .

The following theorem ensures the correctness of consistency checking:

Theorem 2.

The above consistency checking procedure finds no cycle if \(\prec _{\alpha }\) is consistent with \(\prec _{{\it ppo}}\) and \(\approx\) ; otherwise, it can find cycle (s).

The efficiency of consistency checking is critical for the overall performance, because the model checker performs many consistency checks. First, DPLL(T) must perform many assignments to the ordering variables to reason about the complicated thread interactions, each of which leads to a consistency check. Second, from-read propagation generates further checks whenever it inserts edges into the event graph.

Previous works [9, 28] perform a fresh cycle detection on each consistency check, which is inefficient. We found it better to perform cycle detection incrementally for two reasons. First, the event graph must be acyclic before an edge addition—otherwise, it must have been recognized in the previous consistency checking. Therefore, we can reuse the topological order of the previous acyclic graph and try to build a new acyclic graph incrementally. Second, incremental cycle detection has been shown to be efficient for sparse graphs [12], and the event graph is typically sparse: \(\prec _{{\it ppo}}\) only relates events within the same thread (as well as thread creating/joining), while \(\prec _{\alpha }\) only relates events that access the same shared variable.

5.2.1 Incremental Cycle Detection.

We employ an incremental cycle detection (ICD) algorithm [7, 12] to check \(\mathcal {T}_{{\it ord}}\) -consistency. ICD algorithms are based on the topological order in directed graphs (including the event graph). A topological order exists in a directed graph iff the graph is acyclic: Each node in the graph is labeled with an integer-valued level such that for any edge, say, from node \(e_i\) to node \(e_j\) , the level of \(e_i\) (written \(lv(e_i)\) ) is smaller than that of \(e_j\) (written \(lv(e_j)\) ). Once a new edge is inserted into the event graph, the algorithm reuses the previous topological order and attempts to compute a new topological order incrementally. If a new topological order is computed, then the graph is acyclic; therefore, the current assignment is \(\mathcal {T}_{{\it ord}}\) -consistent. Otherwise, the algorithm finds a cycle; therefore, a \(\mathcal {T}_{{\it ord}}\) -inconsistency is reported.

We employ an ICD algorithm for sparse graphs [12]. This algorithm relaxes the topological order into a pseudo-topological order such that for any edge from node \(e_i\) to node \(e_j\) , \(lv(e_i) \le lv(e_j)\) . The pseudo-topological order can handle atomicity by assigning the same integer-valued level to all events in an atomic block. For each node e, except of its pseudo-topological level \(lv(e)\) , the ICD algorithm also keeps \(in(e)\) and \(out(e)\) , where \(in(e)\) stores e’s incoming edges whose start nodes have the same level as e, and \(out(e)\) stores e’s outgoing edges. During the process of inserting an edge into the graph, the pseudo-topological levels, outgoing edges sets, and incoming edges sets are incrementally updated, and these processes are described in detail in the following text.

We show the basic routine of the employed ICD algorithm below. When adding an edge, say, \(e_i \prec e_j\) , into the event graph, we first check whether \(lv(e_i) \lt lv(e_j)\) . If so, then \(e_j\) cannot reach \(e_i\) ; thus, no cycle exists, and the previous pseudo-topological order is valid; therefore, \(\mathcal {T}_{{\it ord}}\) -solver can safely insert this edge and update \(out(e_i)\) . Otherwise, \(lv(e_i) \ge lv(e_j)\) , we need to check whether there is a path from \(e_j\) to \(e_i\) . If it does exist, this path, together with \(e_i \prec e_j\) , forms a cycle, indicating that the current edge insertion causes \(\mathcal {T}_{{\it ord}}\) -inconsistency. Otherwise, \(e_j\) cannot reach \(e_i\) (i.e., no cycle exists), the ICD algorithm reports the current edge insertion being \(\mathcal {T}_{{\it ord}}\) -consistent. Meanwhile, the previous pseudo-topological order and the related in/out sets are updated in the search process.

The standard approach to finding a path from \(e_j\) to \(e_i\) is to search forward exhaustively from the outgoing edges of \(e_j\) , e.g., Tarjan’s strongly connected component(SCC) algorithm [53]. However, this approach totally costs \(\mathcal {O}(m^2)\) time to construct a graph with m edges. In contrast, the employed ICD algorithm [12] in \(\mathcal {T}_{{\it ord}}\) -solver performs cycle detection based on an elaborate two-way search, i.e., backward search and forward search. Bender et al. [12] prove that by setting a threshold \(\Delta = min\lbrace n^{\frac{2}{3}}, m^{\frac{1}{2}}\rbrace\) to bound the maximum number of steps in the backward search, the employed ICD algorithm can achieve \(\mathcal {O}(m\ \times \ min\lbrace n^{\frac{2}{3}}, m^{\frac{1}{2}}\rbrace)\) time complexity in constructing a graph with n nodes and m edges. Since thread interleaving may cause numerous edge insertions, the employed ICD algorithm can achieve higher efficiency than Tarjan’s SCC algorithm. In section 6.3, we report the comparison results between the ICD algorithm and Tarjan’s SCC algorithm.

We detail the two-way search below. When adding an edge, say, \(e_i \prec e_j\) , into the event graph and \(lv(e_i) \ge lv(e_j)\) , before the forward search for a path from \(e_j\) to \(e_i\) , a backward search starts from \(e_i\) first. Recall that \(in(e_i)\) only stores \(e_i\) ’s incoming edges \(e\prec e_i\) with \(lv(e) = lv(e_i)\) . Following the in sets, we search for nodes that can reach \(e_i\) and have the same level of \(e_i\) backwardly. We use a set \(\mathbb {B}\) to record the visited nodes during the backward search. The backward search ends in any of the following situations:

•

If \(e_j\) is visited, then the algorithm reports a cycle and returns.

•

If the backward search completes before visiting \(\Delta\) edges, then the ICD algorithm runs out of the edges that should be traversed in the backward search and fails to find a path from \(e_j\) to \(e_i\) . Since \(lv(e_i) \ge lv(e_j)\) , there are two situations:

—

If \(lv(e_i) = lv(e_j)\) , then no cycle exists. We prove this by contradiction. Assume there is a path from \(e_j\) to \(e_i\) . Since \(lv(e_i) = lv(e_j)\) , all nodes along the path should have the same level, and the backward search should be able to find \(e_j\) . But the backward search completes without finding \(e_j\) , which is a contradiction. Therefore, there is no path from \(e_j\) to \(e_i\) . The algorithm thus reports acyclic and returns.

—

If \(lv(e_i) \gt lv(e_j)\) , then for each visited \(e_b \in \mathbb {B}\) , \(lv(e_b) = lv(e_i) \gt lv(e_j)\) . In this case, the algorithm does not know if there is a path from \(e_j\) to \(e_b\) —so we set \(lv(e_j) := lv(e_i)\) and attempt to compute a new pseudo-topological order by invoking a forward search from \(e_j\) .

•

If \(\Delta\) edges are visited, we stop backward search, set \(lv(e_j) := lv(e_i) + 1\) , and then attempt to compute a new pseudo-topological order by invoking a forward search from \(e_j\) .⁷

The forward search explores nodes reachable from \(e_j\) . Among outgoing edges in \(out(e_j)\) , we consider only edges \(e_j \prec e_k\) such that \(lv(e_j) \ge lv(e_k)\) :

•

if \(lv(e_j) = lv(e_k)\) , then add \(e_j\) to \(in(e_k)\) ;

•

if \(lv(e_j) \gt lv(e_k)\) , then set \(lv(e_k) := lv(e_j)\) , clear \(in(e_k)\) , and then add \(e_j\) to \(in(e_k)\) .

After \(e_k\) is visited, we continue to check the outgoing edges of \(e_k\) in the same way. Nodes visited in the forward search are stored in a set \(\mathbb {F}\) . We then check if \(\mathbb {F} \cap \mathbb {B}\) produces an empty set. If not, then any node \(e\in \mathbb {F}\cap \mathbb {B}\) witnesses a cycle, composed of the path segment from \(e_j\) to e (by \(e\in \mathbb {F}\) ), the segment from e to \(e_i\) (by \(e\in \mathbb {B}\) ), and the inserted edge \(e_i\prec e_j\) . Otherwise, if the forward search completes with \(\mathbb {F} \cap \mathbb {B} = \emptyset\) , then we confirm the absence of any cycle.

Note that even if we skip the backward search but only perform an exhaustive forward search, the algorithm is still correct but degenerates into a fresh cycle detection on each edge insertion. From this perspective, the key insight of the employed ICD algorithm is to integrate a backward search—bounded by a “magic” threshold \(\Delta = min\lbrace n^{\frac{2}{3}}, m^{\frac{1}{2}}\rbrace\) —to limit the forward search. The following lemma is proved in Reference [12], which is crucial for analyzing the complexity of the ICD algorithm:

Lemma 5 ([12]).

For a graph with n nodes and m edges, with the ICD algorithm, no node’s pseudo-topological level is greater than \(\Delta + 2\) .

Complexity of the ICD algorithm can be analyzed as follows: First, the backward search of a single edge insertion visits at most \(\Delta\) edges, so the backward search during m edge insertions visits at most \(m \times \Delta\) edges. Second, the forward search is invoked when the algorithm updates the pseudo-topological level of a node, which, according to lemma 5, happens at most \(\Delta + 2\) times per edge during the whole m edge insertions. Therefore, forward search during m edge insertions visits at most \(m \times (\Delta + 2)\) edges. Finally, each edge takes \(\mathcal {O}(1)\) time to visit. Thus, the ICD algorithm’s complexity is \(\mathcal {O}(m \times \Delta)\) .

Example.

Figure 5 shows parts of nodes and edges of a graph, where the digit alongside each node represents the pseudo-topological level of that node. Assume \(\Delta\) is currently 3, and the edge \(e_3 \prec e_4\) (the red dashed arrow in fig. 6) is inserted.

Fig. 5.

Fig. 6.

The algorithm first checks whether \(lv(e_3)\lt lv(e_4)\) —the result is not, thus the backward search is invoked from \(e_3\) . The backward search visits \(e_1 \prec e_3\) , \(e_2 \prec e_3\) , and \(e_1 \prec e_2\) (marked as red solid arrows in fig. 6) in turn; and then stops (since the number of visited edges reaches \(\Delta\) ), and sets \(lv(e_4) = lv(e_3) + 1 = 2\) . The set \(\mathbb {B} = \left\lbrace e_1, e_2, e_3 \right\rbrace\) records nodes visited in the backward search.

Then, the forward search starts from \(e_4\) . Note that \(lv(e_4)\) is now 2, equaling \(lv(e_5)\) , the forward search visits \(e_4 \prec e_5\) and adds \(e_4\) to \(in(e_5)\) ; moreover, since \(lv(e_4) \lt lv(e_6)\) , the forward search ignores \(e_4 \prec e_6\) . The set \(\mathbb {F} = \left\lbrace e_4, e_5 \right\rbrace\) records nodes visited in the forward search. Finally, we confirm there is no cycle, since \(\mathbb {B} \cap \mathbb {F} = \emptyset\) .

5.3 Conflict Clause Generation

If a \(\mathcal {T}_{{\it ord}}\) -inconsistency occurs, then we need to find the inconsistency reason and report it to the SAT solver. To find the inconsistency reason, it is sufficient to consider critical cycles [50]. Formally, a cycle is critical if it is simple (i.e., no duplicate nodes) and has no chords in \(\prec _{{\it ppo}}\cup \approx\) [50]. Recall that \(\mathcal {T}_{{\it ord}}\) -solver records a derivation reason for each edge, and the derivation reason of a path can be calculated accordingly (see section 4.2). The derivation reason of critical cycles can be returned as the inconsistency reason. To ease the following discussions, we call the edges corresponding to PPO and atomicity orders static edges (they are fixed during SMT solving), and the edges corresponding to RF, WS, and FR orders induced edges (they are induced by variable assignments).

When a \(\mathcal {T}_{{\it ord}}\) -inconsistency occurs, the event graph may contain many critical cycles; we prefer those with the shortest width (defined as the number of induced edges on the cycle). Their derivation reasons contain the minimal number of ordering literals and can be used to prune more search space. If there are multiple critical cycles with the shortest width, then we generate them all.

An important fact is that the event graph must be acyclic before the current edge insertion. Therefore, the newly added edge should present in all cycles. Let \(e_i\prec e_j\) be the newly added edge; the conflict clause generation needs to find all derivation reasons of \(e_j \prec ^+ e_i\) with the shortest width.

Let \(\mathbb {E}_{j-i}\) be the set of nodes that occur on any path of \(e_j \prec ^+ e_i\) , including \(e_j\) and \(e_i\) themselves. For each \(e_n\in \mathbb {E}_{j-i}\) , denote \(reasons(e_n)\) the set of all derivation reasons of \(e_j \prec ^+ e_n\) with the shortest width. We compute \(reasons(e_n)\) in the following routine:

Step 1 (Subgraph construction).

We first construct subgraph \(\mathbb {E}_{j-i}\) . Remember that in consistency checking, set \(\mathbb {B}\) contains visited nodes from incoming edges of \(e_i\) and set \(\mathbb {F}\) contains visited nodes from outgoing edges of \(e_j\) . Actually, for each node in \(\mathbb {B}\) or \(\mathbb {F}\) , we also record its parents (e.g., if edges \(e_1 \prec e_3\) and \(e_2 \prec e_3\) are visited in the forward search, then \(e_3\) ’s parents are \(\lbrace e_1, e_2\rbrace\) ). In consistency checking, once a cycle is detected when visiting a node (assumed to be \(e_k\) ), then \(e_k\) must be in both \(\mathbb {B}\) and \(\mathbb {F}\) . We can find all nodes on path \(e_j \prec ^+ e_i\) by tracking back to \(e_k\) ’s parents; we add these nodes to \(\mathbb {E}_{j-i}\) . We construct this subgraph of the event graph by removing all nodes other than \(\mathbb {E}_{j-i}\) and deleting induced edges that have a chord in \(\prec _{{\it ppo}}\cup \approx\) (e.g., \((\!| n_1|\!) ^w \prec _{{\it rf}}(\!| n_2|\!) ^r\) in Figure 3(b)).

Step 2 (Iterative solving).

We traverse the subgraph in topological order, starting from the outgoing edges of \(e_j\) . Let \(e_n\) be the current node to be visited. Without loss of generality, when there are multiple edges waiting to be visited, we first visit PPO edges. There are two situations:

•

When visiting PPO edge \(e_p \prec _{{\it ppo}}e_n\) , we append \(reasons(e_p)\) to \(reasons(e_n)\) . However, if node \(e_n\) has been visited once (so \(width(e_n)\) has been calculated) and \(width(e_p) \gt width(e_n)\) , then we skip this visit.

•

We visit induced edges only when no PPO edges are waiting to be visited. When visiting induced edge \(e_p \prec e_n\) (either \(\prec _{{\it rf}}\) , \(\prec _{{\it ws}}\) , or \(\prec _{{\it fr}}\) ), we append \(reasons(e_p)\) to \(reasons(e_n)\) and add \(reason(e_p \prec e_n)\) to each newly appended reason. However, if node \(e_n\) has been visited once and \(width(e_p) \gt width(e_n) -1\) , then we skip this visit.

In this traversal order, we can always find the shortest width of a node at its first visit, i.e., we first visit all nodes whose \(width = 0\) , then all nodes whose \(width = 1\) , and so on.

Formally, suppose the predecessors of \(e_n\) are \(e_{p1}, \ldots , e_{pa}\) , \(e_{q1}, \ldots , e_{qb}\) where \(e_{pi} \prec _{{\it ppo}}e_n\) and \(e_{qj} \prec e_n\) (induced edges). According to the procedure described above, we have:

\begin{equation*} width(e_n) = min\lbrace width(e_{p1}), \ldots , width(e_{pa}), width(e_{q1}) + 1, \ldots , width(e_{qb}) + 1\rbrace , \end{equation*}

so \(\forall e_{pi} \,\, width(e_{pi}) \ge width(e_n)\) and \(\forall e_{qj} \,\, width(e_{qj}) \ge width(e_n) - 1\) .

Denote \(SP(e_n)\) the set of shortest predecessors of \(e_n\) such that the paths \(e_j\prec ^+ SP(e_n) \prec e_n\) have the shortest width. We lift \(\wedge\) operator to sets, and compute \(reasons(e_n)\) as

\begin{equation*} \bigcup _{e_p\in SP(e_n)} reasons(e_p)\wedge reason(e_p\prec e_n). \end{equation*}

After the traversal, \(reasons(e_i)\) records the set of shortest derivation reasons of \(e_j\prec ^+ e_i\) . A path in \(e_j\prec ^+ e_i\) and \(e_i\prec e_j\) forms a cycle. We append \(reason(e_i \prec e_j)\) to each reason in \(reasons(e_i)\) and return them as conflict clauses.

Example.

Consider the event graph in fig. 7. Let \(e_3 \prec _{{\it fr}}e_1\) be the newly added edge. The consistency checking reports \(\mathcal {T}_{{\it ord}}\) -inconsistency, i.e., cycle is detected in the event graph.

Fig. 7.

•

First, we construct the subgraph by keeping all nodes that appear on any path from \(e_1 \prec ^+ e_3\) . There are two paths: \(e_1 \prec _{{\it ppo}}e_2 \prec _{{\it rf}}e_3\) and \(e_1\prec _{{\it rf}}e_4\prec _{{\it fr}}e_5\prec _{{\it ppo}}e_2\prec _{{\it rf}}e_3\) ; therefore, \(e_6\) and \(e_7\) are removed.

•

Second, we traverse the subgraph in topological order, starting from the outgoing edges of \(e_1\) . Note that width can be computed as the number of \(\prec _{{\it rf}}, \prec _{{\it ws}}\) , and \(\prec _{{\it fr}}\) edges in the cycle; Figures 8 and 9 show two cycles with \(width=2\) and \(width=4\) , respectively. Since we try to find cycle(s) with the shortest width, we return the conflict clause: The conjunction of ordering literals that imply \(e_2 \prec _{{\it rf}}e_3\) and \(e_3 \prec _{{\it fr}}e_1\) .

Fig. 8.

Fig. 9.

We show the correctness and complexity of our conflict clause generation algorithm by the following theorems.

Theorem 3.

The above conflict clause generation algorithm finds all conflict clauses with the shortest width.

Proof.

Using mathematical induction, we prove that for each node \(e_n\) with predecessors \(e_{p1}, \ldots , e_{pa}\) , \(e_{q1}, \ldots , e_{qb}\) where \(e_{pi} \prec _{{\it ppo}}e_n\) and \(e_{qi} \prec e_n\) (induced edges), \(reasons(e_n)\) contains all reasons for \(e_j \prec ^+ e_n\) with the shortest width, where soundness means there is no shorter reason for \(e_j \prec ^+ e_n\) ; and completeness means there is no reason for \(e_j \prec ^+ e_n\) with the same width but not contained in \(reasons(e_n)\) . Both soundness and completeness trivially hold for \(e_j\) , for the graph is acyclic before the edge addition.

Soundness: Assume there exists a path \(path_{err}\) from \(e_j\) to \(e_n\) corresponding to \(reason_{err}\) of \(e_j \prec ^+ e_n\) whose \(width_{err} \lt width(e_n)\) , If its last step is \(e_{pi} \prec e_n\) , then \(reasons(e_{pi})\) must include this \(reason_{err}\) , which contradicts \(width(e_{pi}) \ge width(e_n)\) ; otherwise, if the last step is \(e_{qj} \prec e_n\) , then \(reasons(e_{qj})\) must include \(reason_{err}\) without the last reason(whose width is \(width_{err} - 1\) ), which contradicts \(width(e_{qj}) + 1 \ge width(e_n)\) . Thus, we prove the soundness of n from the soundness of its predecessors.

Completeness: Assume there exists a path \(path_{err}\) from \(e_j\) to \(e_n\) corresponding to \(reason_{err}\) of \(e_j \prec ^+ e_n\) whose width is \(width(e_n)\) but not included in \(reasons(e_n)\) . If the last step of \(path_{err}\) is \(e_{pi} \prec e_n\) , then \(reason_{err}\) is also a reason of \(e_j \prec ^+ e_{pi}\) , thus, by soundness of \(e_{pi}\) , \(width(e_{pi}) \le width(e_n)\) , which gives \(width(e_{pi}) = width(e_n)\) . By completeness of \(e_{pi}\) , \(reason_{err}\) is included in \(reasons(e_{pi})\) and should be collected by \(reasons(e_n)\) , which gives a conflict; otherwise, if the last step is \(e_{qj} \prec e_n\) , then \(reason_{err}\) without the last reason is a reason of \(e_j \prec ^+ e_{qj}\) , thus \(width(e_{qj}) \le width(e_n) - 1\) , which gives \(width(e_{qj}) = width(e_n) - 1\) . Then, \(reason_{err}\) without the last reason is included in \(reasons(e_{qj})\) and should be appended to the last reason and collected by \(reasons(e_n)\) , which also gives a conflict. Thus, we prove the completeness of \(e_n\) .

The graph has a topological order, since it is acyclic. By mathematical induction, we prove that the algorithm is sound and complete.□

Theorem 4.

The time complexity of our conflict clause generation algorithm is \(\mathcal {O}(c \times m^{\prime })\) , where c is the number of computed conflict clauses and \(m^{\prime }\) is the number of edges in the constructed subgraph.

Proof.

A reason is a conjunction of literals, usually implemented as a vector or set. To reduce the time cost of copying reasons, when a reason is copied from \(reasons(d)\) to \(reasons(e)\) , we only copy its pointer. When a \(new\_reason\) is constructed by appending r to an existing reason, we represent the \(new\_reason\) with r appending to the pointer of reason.

In this manner, any reason whose last element is r is constructed when visiting nodes e where \(reason(d \prec e) = r\) . Thus, each reason is constructed only once and we can give each reason a unique ID to distinguish different reasons, i.e., it is easy to identify and remove duplicate reason when computing \(reasons(e)\) . \(reasons(e)\) has at most c elements, so for each edge \(e_x \prec e_y\) in the subgraph, at most c reasons are copied from \(reasons(e_x)\) to \(reason(e_y)\) . As shown above, either copy or construction of reasons cost \(\mathcal {O}(1)\) time. Thus, the whole algorithm takes \(\mathcal {O}(c \times m^{\prime })\) time.□

5.4 Backtracking

After the SAT solver being noticed of the conflict clauses, it backtracks to a previous state to get rid of the current inconsistency. On this occasion, \(\mathcal {T}_{{\it ord}}\) -solver also needs to restore the event graph to the correlating previous state, i.e., to remove all edges added after this state. Each edge deletion takes \(\mathcal {O}(1)\) time. Moreover, \(\mathcal {T}_{{\it ord}}\) -solver needs to restore the pseudo-topological levels, which is necessary to preserve ICD’s complexity analyzed in Reference [12]. The restoration of each node also takes \(\mathcal {O}(1)\) time. Therefore, in the worst case (all edges are removed), \(\mathcal {T}_{{\it ord}}\) -solver costs \(\mathcal {O}(m+n)\) time on backtracking an event graph with n nodes and m edges. In practice, the time complexity is often much smaller, since usually only a small portion of edges are removed.

5.5 Theory Propagation

We employ theory propagation to derive more ordering constraints. In detail, \(\mathcal {T}_{{\it ord}}\) -solver deduces values of unassigned literals by unit-edge propagation and derives from-read orders by from-read propagation.

5.5.1 Unit-edge Propagation.

For ease of implementation, we pre-create an edge for each ordering variable in \(X_{{\it rf}}\cup X_{{\it ws}}\) . Each of these edges has two states, active and inactive (initially inactive). Only active edges are present in the event graph. An inactive edge is activated when the corresponding ordering variable is set to \(\mathit {true}\) . An active edge is inactivated if the corresponding ordering variable is unassigned (due to backtracking of DPLL(T)).

Let \(e_i \prec e_j\) be an inactive edge for an ordering variable v. It is a unit edge if there already exists a path from \(e_j\) to \(e_i\) in the event graph. In other words, once the ordering variable v is assigned \(\mathit {true}\) , a cycle \(e_i \prec e_j \prec ^+ e_i\) forms. To prevent this cycle, v must be set to false. In this way, we deduce the value of an unassigned variable. We call this unit-edge propagation.

Unit-edge propagation is performed after incremental cycle detection. Let \(\mathbb {B}\) and \(\mathbb {F}\) be the node sets obtained in the backward and forward search of ICD, respectively. For any node \(e_b\in \mathbb {B}\) and any node \(e_f\in \mathbb {F}\) , there must be a path from \(e_b\) to \(e_f\) that passes the newly added edge. We enumerate each such node pair and check if \((e_f, e_b)\) corresponds to an inactive edge; if it does, then the corresponding inactive edge is a unit edge.

Figure 3(c) shows a cycle led by assignments \(rf_{2,3}^y\mapsto \mathit {true}\) and \(rf_{5,2}^x\mapsto \mathit {true}\) . Assuming that \(rf_{5,2}^x\mapsto \mathit {true}\) is assigned first: the edge \((\!| x_5|\!) ^w \prec _{{\it rf}}(\!| x_2|\!) ^r\) is added; then there forms a path from \((\!| y_3|\!) ^r\) to \((\!| x_5|\!) ^w\) (by PPO edges), to \((\!| x_2|\!) ^r\) (by this added edge), and to \((\!| y_2|\!) ^w\) (by PPO edges). According to our unit-edge propagation, \((\!| y_2|\!) ^w \prec _{{\it rf}}(\!| y_3|\!) ^r\) is a unit edge, so the value of \(rf_{2,3}^y\) is enforced to false. In this way, our unit-edge propagation can prevent the \(\mathcal {T}_{{\it ord}}\) -inconsistency shown in Figure 3(c).

5.5.2 From-read Propagation.

FR constraints are not included in our encoding formula. We depend on \(\mathcal {T}_{{\it ord}}\) -solver to deduce FR orders.

When adding an RF edge \((\!| x_i|\!) ^w \prec _{{\it rf}}(\!| x_j|\!) ^r\) , \(\mathcal {T}_{{\it ord}}\) -solver seeks outgoing WS edges of node \((\!| x_i|\!) ^w\) . For each of such edges, say, \((\!| x_i|\!) ^w \prec _{{\it ws}}(\!| x_k|\!) ^w\) , \(\mathcal {T}_{{\it ord}}\) -solver derives \((\!| x_j|\!) ^r \prec _{{\it fr}}(\!| x_k|\!) ^w\) and instantly adds it to the event graph. Similarly, when adding a WS edge \((\!| x_i|\!) ^w \prec _{{\it ws}}(\!| x_j|\!) ^w\) , \(\mathcal {T}_{{\it ord}}\) -solver seeks outgoing RF edges of \((\!| x_i|\!) ^w\) , say, \((\!| x_i|\!) ^w \prec _{{\it rf}}(\!| x_k|\!) ^r\) , and derives \((\!| x_k|\!) ^r \prec _{{\it fr}}(\!| x_j|\!) ^w\) .

Figure 3(b) shows an example of from-read propagation, where \((\!| n_1|\!) ^w \prec _{{\it ws}}(\!| n_4|\!) ^w\) is added prior to \((\!| n_1|\!) ^w \prec _{{\it rf}}(\!| n_2|\!) ^r\) . During the addition of \((\!| n_1|\!) ^w \prec _{{\it ws}}(\!| n_4|\!) ^w\) , since \((\!| n_1|\!) ^w\) has no outgoing RF edges yet, from-read propagation obtains nothing. Then, while adding \((\!| n_1|\!) ^w \prec _{{\it rf}}(\!| n_2|\!) ^r\) , there is an outgoing WS edge \((\!| n_1|\!) ^w \prec _{{\it ws}}(\!| n_4|\!) ^w\) from \((\!| n_1|\!) ^w\) . By from-read propagation, we deduce \((\!| n_2|\!) ^r \prec _{{\it fr}}(\!| n_4|\!) ^w\) .

In \(\mathcal {T}_{{\it ord}}\) theory solving, only unit-edge propagation interacts with the outer SAT solver, since it can propagate values of some Boolean variables. In contrast, when from-read propagation deduces a new edge, the theory solver inserts the edge and calls the consistency checking procedure, all inside the \(\mathcal {T}_{{\it ord}}\) -solver.

Example.

Let us consider how to verify the example program in fig. 2 using \(\mathcal {T}_{{\it ord}}\) -solver integrated with DPLL(T). Given the encoding formula \(\Psi\) , DPLL(T) first applies unit-clause propagation and theory propagation to make as many as possible deductions.

Before deciding any literal, DPLL(T) assigns PPO order constraints \(\mathit {true}\) for they occur alone in the CNF-formulated input SMT formula. Unit-edge propagation assigns:

\begin{equation*} \begin{split} &ws_{5,1}^x, ws_{2,1}^y, ws_{3,1}^m, ws_{4,1}^m, ws_{4,3}^m, ws_{3,1}^n, ws_{4,1}^n, ws_{4,3}^n \end{split} \end{equation*}

to false. Then, \(ws_{1,5}^x\) , \(ws_{1,2}^y\) are deduced to be \(\mathit {true}\) by DPLL(T). Ordering variables related to variable m and n are not derived for some guards not necessarily hold.

Assuming DPLL(T) chooses \(rf_{3,2}^m\) and decides its value to \(\mathit {true}\) , we perform deduction as follows:

\begin{equation*} \begin{array}{rll}rf_{3,2}^m &\Longrightarrow x_2 = 1 &(\text{Guard holds})\\ &\Longrightarrow rf_{5,2}^x &(\text{RF-Val, RF-Some})\\ &\Longrightarrow \lnot rf_{2,3}^y \wedge \lnot rf_{2,4}^y \,\,\,\,\,\,\,\,\, &(\text{Unit-edge})\\ &\Longrightarrow rf_{1,3}^y \wedge \lnot rf_{2,4}^y &(\text{RF-Some})\\ &\Longrightarrow y_3 = 0 \wedge \lnot rf_{2,4}^y &(\text{RF-Val})\\ &\Longrightarrow rf_{1,4}^y &(\text{RF-Some})\\ &\Longrightarrow y_4 = 0 &(\text{RF-Val}).\end{array} \end{equation*}

Then, we decide from which write ( \((\!| n_1|\!) ^w\) , \((\!| n_3|\!) ^w\) , or \((\!| n_4|\!) ^w\) ) the access \((\!| n_2|\!) ^r\) obtains its value. First, \((\!| n_3|\!) ^w\) is excluded from consideration, since its guard condition ( \(y_3 = 1\) ) conflicts with the current assignment ( \(y_3 = 0\) ). Second, \((\!| n_2|\!) ^r\) is also excluded, since the values of \(n_2\) (equals 1) and \(n_1\) (equals 0) are not equal. Third, if \((\!| n_2|\!) ^r\) reads from \((\!| n_4|\!) ^w\) , then \(n_2 = n_4 = y_4 = 1\) , conflicting with the current assignment ( \(y_4 = 0\) ), too. Therefore, the deduction from \(rf_{3,2}^m = \mathit {true}\) gets to a contradiction.

Then, DPLL(T) backtracks and assigns \(rf_{3,2}^m\) to \(\mathit {false}\) , i.e., \((\!| m_2|\!) ^r\) cannot read from \((\!| m_3|\!) ^w\) . Note that \((\!| m_2|\!) ^r\) also cannot read from \((\!| m_1|\!) ^w\) for \(m_2 \ne m_1\) . Thus, \(rf_{4,2}^m\) is the only choice. The subsequent deduction is as follows:

\begin{equation*} \begin{array}{rll}rf_{4,2}^m &\Longrightarrow x_3 = m_4 = m_2 = 1 \,\,\,\,\,\, &(\text{RF-Val}) \\ &\Longrightarrow \lnot rf_{1,3}^x &(\text{RF-Val})\\ &\Longrightarrow rf_{5,3}^x &(\text{RF-Some})\\ &\Longrightarrow \lnot rf_{2,3}^y \wedge \lnot rf_{2,4}^y &(\text{Unit-edge}).\end{array} \end{equation*}

Now the deduction gets to the same point as in the third line of the first deduction procedure. The same as in the first deduction, it also leads to a contradiction.

From the deduction procedures above, we conclude that the encoding formula for the program is unsatisfiable. Therefore, the safety property of this program holds.

6 Experimental Evaluation

This section introduces the implementation of our approach and reports the comparative results with some state-of-the-art verification tools.

6.1 Implementation and Setup

We implemented our techniques on top of CBMC [41] and Z3 [21]. CBMC is powerful and flexible bounded model checker for C/C++ programs, and Z3 is a well-known and widely adopted SMT solver. In our implementation, CBMC and Z3 act as the front and back ends, responsible for generating and solving SMT formulas, respectively. We enhance CBMC by consulting our \(\mathcal {T}_{{\it ord}}\) -theory, and extend Z3 with our \(\mathcal {T}_{{\it ord}}\) -solver. We follow the same strategy as in Reference [59] for loop unrolling. All generated SMT formulas are in the SMT-LIB-v2.6 format. In the following, we call our implementation Zord.⁸

All experiments were conducted on a computer with an Intel(R) Core(TM) i7-8700 CPU and 32 GB DDR4 memory. The operating system is ArchLinux-5.11.10. The time limit for each verification task is 900 seconds.

6.2 Experiment on SV-COMP Benchmarks

We collect benchmarks from the ConcurrencySafety category of SV-COMP 2020.⁹ This category is divided into 11 sub-categories, namely, ldv-races (12), pthread (38), atomic (11), C-DAC (4), complex (5), divine (16), driver-races (21), ext (53), lit (11), nondet (6), and wmm (898), where the number adhered to each sub-category represents the number of programs it contains. There are 14 programs in divine sub-category that cannot be compiled by CBMC and are thus excluded from the benchmark set. In total, we get 1,061 test cases.

We compare Zord with the following tools:

•

CBMC ¹⁰: a tool that implements the standard partial-order-based verification algorithms [9], with Z3 as the underlying SMT solver.

•

Lazy-CSeq ¹¹: a tool that verifies concurrent programs under SC memory model using the lazy sequentialization schema [36, 37].

•

Lazy-SMA ¹²: a tool that verifies concurrent programs under TSO and PSO memory models using the lazy sequentialization schema [37, 54].

•

CPA-Seq ¹³: a configurable program verification platform [13, 14] with sequentially combined analysis strategies.

•

Dartagnan ¹⁴: a bounded model checker [27] for concurrent program verification under various memory models.

Results under SC.

The experimental results under SC memory model are summarized in table 1. The first column #Solved shows the number of cases successfully solved by each tool. The following columns list results for both-solved cases (cases that can be correctly verified by both Zord and the tool being compared): Num gives the number of cases solved by Zord and the baseline tool; True and False show the number of solved cases that satisfy and violate the desired properties, respectively; CPU_time and Memory show the time and memory consumption of the baseline tool and Zord, respectively.

Table 1.

Tool	#Solved	#Both-solved
Tool	#Solved	Num	True	False	CPU_time (s)(-/Zord)	Memory (GB)(-/Zord)
Zord	1,051	-	-	-	-	-
CBMC	1,013	1,013	224	789	3,091/1,255	7.87/6.21
CPA-Seq	932	930	165	765	33,771/363	1,321.46/5.41
Dartagnan	154	154	130	24	6,572/51	72.33/0.69
Lazy-CSeq	1,057	1,050	248	802	14,066/1,943	115.45/6.55

Table 1. Results on 1,061 SV-COMP Benchmarks under SC

In total, Zord solves 38 more cases than CBMC, 119 more cases than CPA-Seq, and 897 more cases than Dartagnan. The only exception is Lazy-CSeq, which solves 6 more cases than ours. Considering that Lazy-CSeq is a highly optimized tool (winner of the ConcurrencySafety category of SV-COMP 2020), this result is acceptable. Considering the both-solved cases, Zord is 2.44 \(\times\) faster than CBMC, 90.04 \(\times\) faster than CPA-Seq, 139.47 \(\times\) faster than Dartagnan, and 7.20 \(\times\) faster than Lazy-CSeq. Meanwhile, Zord uses 20.8% less memory than CBMC, 99.6% less memory than CPA-Seq, 99.0% less memory than Dartagnan, and 94.5% less memory than Lazy-CSeq.

We notice that programs in wmm sub-category are all very small ones with instrumentations to model the weak memory semantics. One may not consider them as representative concurrent programs. table 2 lists the summary results with wmm sub-category excluded. In total, Zord solves 38, 113, and 122 more cases than CBMC, CPA-Seq, and Dartagnan, respectively, and 6 less cases than Lazy-CSeq. Counting on the both-solved non-wmm cases, Zord is 2.76 \(\times\) , 11.06 \(\times\) , 157.47 \(\times\) , and 6.24 \(\times\) faster and uses 21.5%, 97.5%, 99.6%, and 97.7% less memory than CBMC, CPA-Seq, Dartagnan, and Lazy-CSeq, respectively.

Table 2.

Tool	#Solved	#Both-solved
Tool	#Solved	Num	True	False	CPU_time (s)(-/Zord)	Memory (GB)(-/Zord)
Zord	153	-	-	-	-	-
CBMC	115	115	80	35	2,731/989	1.94/1.53
CPA-Seq	40	38	23	15	1,135/102	30.03/0.77
Dartagnan	31	31	7	24	2,968/21	54.77/0.23
Lazy-CSeq	159	152	104	48	10,528/1,693	72.64/1.66

Table 2. Results on 163 SV-COMP Benchmarks under SC (i.e., with wmm Sub-category Excluded)

Figure 10 compares Zord with CBMC on the CPU time of each verification task. A point below (or above) the diagonal represents a case that Zord is superior (inferior) to CBMC. Programs in wmm sub-category are all simple, so their accumulated CPU time by Zord and CBMC are \(269 \,\mathrm{s}\) and \(338 \,\mathrm{s}\) , respectively; we only draw a single point in the figure to represent the whole wmm sub-category. There is a cluster of points at the bottom left of fig. 10, which indicates that these cases are solved extremely fast by both tools, and CBMC even solves slightly faster on some tasks. This is because either these tasks are trivial or counterexamples occur at a low depth. When the cases become complex, our method starts to show its strength.

Fig. 10.

Figures 11 and 12 compare Zord with Lazy-CSeq, and Zord with CPA-Seq (blue points) and Dartagnan (orange points) on each case, respectively. These results conform to those in table 1. Zord is remarkably superior to these three tools in most cases. Among all cases, Zord outperforms Dartagnan. Only in 14 and 3 cases is Zord inferior to Lazy-CSeq and CPA-Seq, respectively.

Fig. 11.

Fig. 12.

Results under TSO and PSO.

Because CPA-Seq has no configuration for weak memory models, we compare Zord with CBMC, Dartagnan, and Lazy-SMA. The experimental results under TSO and PSO are summarized in Tables 3 and 4, respectively. The meaning of each column is the same as table 1.

Table 3.

Tool	#Solved (TSO)	#Both-solved (TSO)
Tool	#Solved (TSO)	Num	True	False	CPU_time (s)(-/Zord)	Memory (GB)(-/Zord)
Zord	1,048	-	-	-	-	-
CBMC	1,001	1,000	217	783	3,428/1,387	13.55/9.27
Dartagnan	123	123	92	31	5,946/37	88.34/1.69
Lazy-SMA	515	515	153	362	992/219	32.53/2.42

Table 3. Summary Results on 1,061 SV-COMP Benchmarks under TSO

Table 4.

Tool	#Solved (PSO)	#Both-solved (PSO)
Tool	#Solved (PSO)	Num	True	False	CPU_time (s)(-/Zord)	Memory (GB)(-/Zord)
Zord	1,049	-	-	-	-	-
CBMC	999	998	97	901	4,576/1,877	11.44/7.93
Dartagnan	159	159	15	144	7,689/48	87.63/1.56
Lazy-SMA	776	776	57	719	2,741/245	28.66/1.98

Table 4. Summary Results on 1,061 SV-COMP Benchmarks under PSO

In TSO, Zord solves 47 more cases than CBMC, 925 more cases than Dartagnan, and 533 more cases than Lazy-SMA. Note that the number of cases Lazy-SMA solves under the weak memory model is much less than Lazy-CSeq under SC. This is because Lazy-CSeq’s support for the weak memory model (called Lazy-SMA) has not been updated and maintained in the follow-up, and the performance is worse than the competition version (for SC). Considering both-solved cases, Zord is 2.47 \(\times\) , 174.18 \(\times\) , and 4.49 \(\times\) faster than CBMC, Dartagnan, and Lazy-SMA, respectively. Meanwhile, Zord uses 31.8%, 98.2%, and 92.6% less memory than CBMC, Dartagnan, and Lazy-SMA, respectively.

In PSO, Zord solves 50, 890, and 273 more cases than CBMC, Dartagnan, and Lazy-SMA, respectively. Considering both-solved cases, Zord is 2.44 \(\times\) , 163.43 \(\times\) , and 11.29 \(\times\) faster than CBMC, Dartagnan, and Lazy-SMA, respectively. Meanwhile, Zord uses 31.0%, 98.4%, and 93.2% less memory than CBMC, Dartagnan, and Lazy-SMA, respectively.

From the statistics in Tables 3 and 4, as the memory model changes from SC to TSO and PSO, all the \(\mathit {false}\) tasks in SC are still \(\mathit {false}\) in TSO and PSO, whereas some \(\mathit {true}\) tasks flip to \(\mathit {false}\) . From the experimental results, relaxing some ordering constraints in TSO and PSO causes more safety property violations, especially when allowing the reordering of two write events that access different addresses.

Tables 5 and 6 list the summary results with wmm sub-category excluded under TSO and PSO, respectively. In total, Zord solves 33, 114, and 111 more cases in TSO, and 44, 121, and 111 more cases in PSO, than CBMC, Dartagnan, and Lazy-SMA, respectively. Counting on both-solved cases, in TSO, our approach is 2.77 \(\times\) , 211.80 \(\times\) , faster and consumes 53.5%, 98.2% less memory than CBMC and Dartagnan; in PSO, our approach runs 2.71 \(\times\) , 212.28 \(\times\) faster and consumes 8.0%, 99.4% less memory than CBMC and Dartagnan. Once we exclude the programs in wmm sub-category, Lazy-SMA only solves—39 simple cases in TSO and 40 simple cases in PSO. Considering these both-solved cases, Zord is slightly inferior to Lazy-SMA. However, Zord consumes 97.6% and 97.8% less memory than Lazy-SMA.

Table 5.

Tool	#Solved (TSO)	#Both-solved (TSO)
Tool	#Solved (TSO)	Num	True	False	CPU_time (s)(-/Zord)	Memory (GB)(-/Zord)
Zord	150	-	-	-	-	-
CBMC	117	116	73	43	3,121/1,119	3.893/1.82
Dartagnan	36	36	15	21	3,155/16	68.79/0.55
Lazy-SMA	39	39	27	12	22/29	11.21/0.26

Table 5. Summary Results on 163 SV-COMP Benchmarks under TSO (with wmm Sub-category Excluded)

Table 6.

Tool	#Solved (PSO)	#Both-solved (PSO)
Tool	#Solved (PSO)	Num	True	False	CPU_time (s) (-/Zord)	Memory (GB) (-/Zord)
Zord	151	-	-	-	-	-
CBMC	107	106	54	52	4,339/1,608	1.74/1.60
Dartagnan	30	30	15	15	7,656/46	62.88/0.34
Lazy-SMA	40	40	21	19	18/25	9.20/0.21

Table 6. Summary Results on 163 SV-COMP Benchmarks under PSO (with wmm Sub-category Excluded)

Figure. 13 compares Zord with CBMC on the CPU time of each verification task under TSO (blue) and PSO (orange). A point below (or above) the diagonal represents a case that Zord is superior (inferior) to CBMC. Similar to fig. 10, programs in wmm sub-category are all simple so their accumulated CPU time by Zord and CBMC are— \(278 \,\mathrm{s}\) and \(333 \,\mathrm{s}\) in TSO, \(272 \,\mathrm{s}\) and \(306 \,\mathrm{s}\) in PSO, respectively; we only draw a single point for each memory model in the figure to represent the whole wmm sub-category. From fig. 13, our approach is superior to CBMC in most cases, since most of the points are below the diagonal.

Fig. 13.

Figures 14 and 15 compare Zord with Dartagnan and Lazy-SMA on each case under TSO (blue) and PSO (orange). These results are consistent with those in Tables 3 and 4. Zord significantly outperforms Dartagnan among all cases and is superior to Lazy-SMA in most cases. In only 7 and 5 cases is Zord inferior to Lazy-SMA under TSO and PSO, respectively.

Fig. 14.

Fig. 15.

6.3 Experiment on Strategies of 𝒯_ord-solver

This experiment evaluates strategies of \(\mathcal {T}_{{\it ord}}\) -solver by testing their effects on the whole performance of program verification. Experiment settings are identical to 6.2, and we set the memory model to SC for brevity.

Effects of FR Propagation.

This experiment compares the effect of the front-end FR constraints propagation. Zord does not encode FR constraints into the SMT formula. Instead, the \(\mathcal {T}_{{\it ord}}\) -solver conducts the derivation of FR constraints. An alternative strategy, called Zord \(^-\) , forces the front-end to encode all FR constraints into the SMT formula and lets \(\mathcal {T}_{{\it ord}}\) -solver skip their derivation.

Figure 16 shows the time comparison between Zord \(^-\) and Zord on each verification task. In most cases, Zord is more efficient than Zord \(^-\) . table 7 summarizes the comparison result of Zord \(^-\) and Zord. The columns Total_time, Encoding_time, and Solving_time show the total CPU time, encoding time, and constraint solving time, respectively. The column SMT_size lists the size of the generated SMT formulas. In the Ratio row, we compute the ratio of Zord \(^-\) to Zord: A ratio greater than 1.0 indicates Zord outperforming Zord \(^-\) on the selected measurement.

Fig. 16.

Table 7.

Tool	#Solved	#Both-solved
Tool	#Solved	Num	Total_time (s)	Encoding_time (s)	SMT_size (GB)	Solving_time (s)
Zord \(^-\)	1,049	1,049	3,090.5	642.0	6.53	2,448.4
Zord	1,051	1,049	2,040.5	491.9	2.28	1,548.6
Ratio	-	-	1.52 \(\times\)	1.31 \(\times\)	2.86 \(\times\)	1.58 \(\times\)

Table 7. Summary Results of FR Propagation

Among 1,061 available cases, Zord solves 1,051 cases, and Zord \(^-\) solves 1,049 cases. Zord \(^-\) has two more timeout cases, whereas Zord solves them within \(95.4 \,\mathrm{s}\) and \(248.3 \,\mathrm{s}\) , respectively. On 1,049 both-solved cases, Zord \(^-\) spends 3,090.5 s while Zord spends 2,040.5 s—these cases only take Zord 66.0% time as much as Zord \(^-\) . Since Zord ignores FR constraints when generating the SMT formula, Zord encodes a smaller formula than Zord \(^-\) . The size of the encoding formula of Zord \(^-\) and Zord is \(6.53 \,{\rm GB}\) and \(2.28 \,{\rm GB}\) , respectively—a 65.1% reduction for Zord over Zord \(^-\) . In more detail, in 439 of 1,061 available cases, the size of generated encoding formula by Zord is less than 60% of that by Zord \(^-\) . Besides, Zord reduces the encoding time by \(150.1 \,\mathrm{s}\) compared to Zord \(^-\) .

Note that the encoding time of Zord \(^-\) and Zord only accounts for 20.8% and 24.1% of total time, respectively. In contrast, constraint solving takes up most of the verification time. Compared to Zord \(^-\) , Zord reduces the solving time by \(899.8 \,\mathrm{s}\) ; Zord shows promising improvement in constraints solving. In more detail, Zord \(^-\) uses unit clause propagation at the SAT level to assign FR constraints, whereas \(\mathcal {T}_{{\it ord}}\) -solver performs from-read propagation on the event graph to derive FR constraints with \(\mathcal {O}(1)\) complexity—our application of the from-read axiom is more efficient than Zord \(^-\) . The statistics in table 7 also show that the overall efficiency improvement mainly comes from the FR axiom in constraint solving, not just from the simplified encoding. In summary, it is more efficient to pass the derivation of FR constraints to \(\mathcal {T}_{{\it ord}}\) -solver than to encode all FR constraints into the SMT formula.

Effects of Unit-edge Propagation.

This experiment evaluates the effect of unit-edge propagation. Unit-edge propagation enables \(\mathcal {T}_{{\it ord}}\) -solver to search for unit edges and propagate their derivation reasons to false as many as possible. Note that unit-edge propagation only affects the efficiency of theory propagation but not its correctness.

An alternative strategy is to disable unit-edge propagation, signed as Zord \(^{^{\prime }}\) . fig. 17 shows time comparison between Zord and Zord \(^{^{\prime }}\) on each verification task. Among 1,061 available cases, Zord \(^{^{\prime }}\) verifies 1,048 cases correctly, and Zord succeeds in 1,051 cases—by applying unit-edge propagation, we solve 3 more cases within \(93.0 \,\mathrm{s}\) , \(606.3 \,\mathrm{s}\) , and \(245.0 \,\mathrm{s}\) , respectively. On 1,048 both-solved cases, Zord \(^{^{\prime }}\) spends \(1590.2 \,\mathrm{s}\) and Zord spends \(1389.3 \,\mathrm{s}\) —the time reduction is 12.6%. Moreover, compared to Zord \(^{^{\prime }}\) , Zord reduces memory consumption, the number of decisions, propagations, and conflicts to 97.5%, 84.4%, 90.1%, and 79.0%, respectively. Zord reports 15.6% fewer decisions and 21% fewer conflicts than Zord \(^{^{\prime }}\) . Therefore, we confirm that unit-edge propagation helps DPLL(T) avoid possible cycles in advance and make fewer decisions to reduce the search space. In summary, unit-edge propagation is an obvious optimization for \(\mathcal {T}_{{\it ord}}\) -solver.

Fig. 17.

Effects of ICD Algorithm.

We also implement Tarjan’s non-incremental cycle detection algorithm in our \(\mathcal {T}_{{\it ord}}\) -solver and compare it with the ICD algorithm. fig. 18 shows the CPU time of SMT solving with these two algorithms on each verification task. In small cases (< \(1 \,\mathrm{s}\) ), the performance is similar, whereas in most complicated cases, the ICD algorithm is more efficient than Tarjan’s algorithm. Considering both-solved cases, the total CPU time with Tarjan’s algorithm is \(4813 \,\mathrm{s}\) , and that with ICD is \(2368 \,\mathrm{s}\) —SMT solving with the ICD algorithm is 2.03 \(\times\) times faster than that with Tarjan’s algorithm. For an event graph with n nodes and m edges, the complexity of Tarjan’s’ algorithm is \(\mathcal {O}(m\ \times \ (n\ +\ m))\) and that of ICD is \(\mathcal {O}(m\ \times \ min\lbrace n^{\frac{2}{3}}, m^{\frac{1}{2}}\rbrace)\) . Generally, the event graph of a multi-threaded program is sparse ( \(m \lt \lt n^2\) ). As programs become complex, incremental cycle detection starts to show its efficacy.

Fig. 18.

6.4 Comparison with Stateless Model Checking

Stateless model checking (SMC) [29] is another successful technique for multi-threaded program verification. It systematically explores all interleaving traces of the concurrent program. Meanwhile, various partial order reduction techniques [1, 16, 35, 62] are developed to alleviate the path space explosion problem. In this experiment, we compare Zord with two state-of-the-art SMC tools:

•

Nidhugg ¹⁵: an SMC tool that implements the reads-from equivalence exploration algorithm [4] in SC (rfsc) and supports various weak memory models [2, 3].

•

GenMC ¹⁶: an SMC tool that implements an optimal algorithm to build execution graphs dynamically [40]. GenMC supports various access modes in Repaired C/C++11 memory model (RC11) [39].

Originally, we wanted to perform this experiment on SV-COMP benchmarks. However, Nidhugg and GenMC currently do not support many library functions used in SV-COMP benchmarks, e.g., abort(), strcmp(), strcpy(), and some of __VERIFIER_*. As a result, among the 1,061 SV-COMP verification tasks, Nidhugg and GenMC only solve 45 and 38 cases, respectively.

Therefore, we decide to use benchmarks from Nidhugg.¹⁷ The original benchmark set contains 83 multi-threaded C programs, many of which do not contain assertions, since they are prepared for testing Partial Order Reduction(POR) techniques [1, 2]. We select benchmarks using the following rules: (1) gcc-compilable, (2) contain at least one assertion, (3) parameterizable, and (4) verifiable by Nidhugg.

Finally, we get nine examples with parameter N:

•

CO-2+2W(N) and float_r(N) are two examples that check the safety of atomic read and atomic write in a multi-threaded environment with N threads.

•

airline(N) implements a ticket sale example from Reference [35], where N is the number threads.

•

fib_bench(N) implements a concurrent version of Fibonacci number; szymanski(N) implements Szymaski’s critical section algorithm; lamport(N) implements the Lamport’s algorithm; cir_buf(N) implements a circular buffer. The above all four programs using two threads, where N determines their unrolling bound of loops.

•

parker(N) implements the Parker lock algorithm using one thread. N is used in a assume statement.

•

account(N) depicts a bank account system [35]. It contains N deposit threads and one withdraw thread.

Given that Zord is implemented on top of CBMC, we also make a comparison with CBMC on Nidhugg benchmarks. In general, we compare Zord with CBMC and Nidhugg under SC, TSO, and PSO, respectively. Moreover, since GenMC does not currently support TSO and PSO, we make a comparison with GenMC ¹⁸ under SC.

Results.

Table 8 details the experimental results, where the Traces column lists the number of traces explored by GenMC. There is an obvious positive correlation between the time efficiency of stateless model checking tool and the number of program paths it searches. If a verification result is returned within 900 seconds, then the number reported in Traces is a measurement of the size of the trace space. For each verification tool (e.g., Nidhugg), we show its verification time under SC, TSO, and PSO, respectively. The abbreviation TO means timeout, i.e., no verification result returned within 900 seconds.

Table 8.

Files	Traces	Nidhugg			GenMC	CBMC			Zord
Files	Traces	SC	TSO	PSO	SC	SC	TSO	PSO	SC	TSO	PSO
CO-2+2W(5)	5	0.10	13.22	13.86	0.02	0.35	6.04	5.89	0.31	0.22	0.17
CO-2+2W(15)	15	0.12	TO	TO	0.03	33.89	494.10	557.24	1.35	1.35	1.34
CO-2+2W(25)	25	0.10	TO	TO	0.04	322.61	TO	TO	6.57	7.31	7.29
float_r(10)	11	0.08	TO	TO	0.03	3.182	287.63	344.37	0.21	0.18	0.14
float_r(50)	51	0.11	TO	TO	0.04	TO	TO	TO	1.68	2.12	2.44
float_r(100)	101	0.30	TO	TO	0.10	TO	TO	TO	4.90	TO	TO
airline(3)	27	0.09	2.23	2.36	0.03	0.19	0.21	0.21	0.22	0.22	0.22
airline(7)	\(8.2 \times 10^5\)	161.21	TO	TO	28.03	3.73	51.56	40.32	0.32	0.36	0.24
airline(9)	-	TO	TO	TO	TO	20.03	245.40	114.06	0.51	0.41	0.25
fib_bench(4)	\(3.4\times 10^4\)	2.74	1.18	1.59	0.72	31.02	112.36	121.62	0.35	0.35	0.38
fib_bench(5)	\(5.3 \times 10^5\)	37.53	21.21	18.80	12.98	88.29	558.70	511.04	0.82	0.61	1.17
fib_bench(6)	\(8.1\times 10^6\)	537.98	283.31	242.18	264.65	TO	TO	TO	1.53	1.66	1.55
szymanski(2)	-	4.07	2.96	2.73	TO	15.41	13.13	7.81	1.69	1.38	1.29
szymanski(4)	-	153.92	106.65	92.96	TO	258.15	29.65	16.62	6.14	5.11	4.77
szymanski(6)	-	TO	TO	TO	TO	792.42	63.34	61.62	12.52	7.89	8.63
lamport(2)	\(1.7 \times 10^4\)	2.46	3.82	3.48	1.18	TO	TO	TO	4.15	2.09	1.33
lamport(6)	\(1.3\times 10^6\)	282.10	286.89	256.74	65.42	TO	TO	TO	379.21	163.39	139.99
lamport(10)	-	TO	TO	TO	TO	TO	TO	TO	TO	TO	TO
cir_buf(5)	252	0.19	0.16	0.18	0.06	11.19	27.97	35.78	1.70	0.17	0.18
cir_buf(9)	\(4.9 \times 10^4\)	25.90	16.69	18.33	11.84	176.43	213.89	TO	52.73	69.10	62.54
cir_buf(13)	-	TO	TO	TO	TO	TO	TO	TO	897.44	TO	TO
parker(12)	\(7.0 \times 10^4\)	2.94	2.05	1.99	15.58	114.63	125.47	114.41	0.33	0.35	0.32
parker(20)	\(4.8 \times 10^5\)	19.21	11.75	10.98	89.56	TO	TO	TO	0.36	0.35	0.37
parker(28)	\(1.7 \times 10^6\)	52.60	38.21	36.50	458.29	TO	TO	TO	0.38	0.37	0.36
account(5)	1	0.01	0.01	0.09	0.01	0.67	0.72	0.66	0.25	3.34	0.05
account(15)	1	0.01	0.09	0.09	0.02	87.14	688.33	843.31	43.30	98.28	6.44
account(25)	1	0.13	0.34	0.08	0.04	TO	TO	TO	TO	TO	TO

Table 8. Experiment Results on Nidhugg Benchmarks

In CO-2+2W(N) and float_r(N), there is no branching statement; the main thread contains only one read event; and each child thread contains one visible atomic write. As a result, the number of traces in each program is equal (or nearly equal) to the number of child threads (i.e., N). Compared to SMC tools, Nidhugg is efficient under SC, since it employs the rfsc strategy to reduce equivalent traces. However, as N increases, Nidhugg fails to verify these programs under TSO and PSO. In contrast, Zord is obviously faster than Nidhugg under TSO and PSO. GenMC verifies these programs within a few tenths of a second. This is understandable, since the largest float_r(100) contains 101 traces only. GenMC only explores \(N + c\) (c is a non-negative integer and \(c\ll N\) ) traces and returns instantly. The time consumption of CBMC and Zord increase gradually on the increment of N because the size of the encoding formula grows accordingly. However, Zord is obviously faster than CBMC under both SC, TSO, and PSO. Our ordering theory and elaborated theory solver help Zord achieve higher efficiency.

Airline(N), fib_bench(N), and szymanski(N) contain branching statements, with N representing either the number of threads or the unrolling bound of loops. As N increases, Nidhugg and GenMC slow down rapidly. Basically, the verification time of Nidhugg and GenMC is proportional to the number of traces in the program. From table 8, Zord is significantly superior to CBMC, Nidhugg, and GenMC.

In lamport(N) and cir_buffer(N), Zord is slightly inferior to Nidhugg and GenMC. The main reason is that these two examples contain numerous array operations. As a result, the array theory solver is involved in SMT solving, which is not as efficient as the bit-vector solver. Nevertheless, Zord is the only tool that solves cir_buffer(13) under SC.

In parker(N), a _parker procedure is invoked N times. As N increases, Nidhugg and GenMC slow down rapidly, in line with the fast growth of traces. In contrast, Zord verifies each program in less than 0.4 second, and the time does not notably change as N increases. The main reasons are: (1) this example contains only one thread, and (2) Zord encodes the path condition of each memory event into the SMT formula. As a result, increasing N has no influence on the size of the SMT formula. Moreover, Zord significantly outperforms CBMC under three memory models.

In program account(N), Nidhugg obviously outperforms others. Note that all variations of this example are buggy. It happens that Nidhugg finds the safety violation by exploring one trace. In account(5), the performance of Zord, CBMC, and GenMC is similar; but as N increases to 15, Zord is obviously faster than CBMC and GenMC.

Summary.

First, as N increases, Zord outperforms CBMC significantly, demonstrating the efficiency of our \(\mathcal {T}_{{\it ord}}\) -theory solver. Second, SMC is suitable for programs with simple branches, while Zord is more suitable for complex programs. Third, arrays and complex data structures can slow down the SMT solving. In general, as the scale (measured by the number of traces) of the program increases, our approach is superior to these tools in most cases.

6.5 Threats to Validity and Limitations

The main threats to validity are whether the performance improvements are due to our tactic and whether our implementation and experiments are credible.

First, since we added support for \(\mathcal {T}_{{\it ord}}\) in CBMC and elaborated on its theory solver in Z3, we compare Zord with them instantly. The improvements over CBMC must come from our tactic. Second, the experimental results of from-read generation and unit-edge propagation are consistent with the theoretical analysis, which confirms that the improvements are indeed from these strategies. Third, we compare the ICD algorithm to Tarjan’s SCC algorithm during theory solving. The above three aspects show that the performance improvements are due to our tactic.

Implementation of \(\mathcal {T}_{{\it ord}}\) in CBMC is simple and clear, and the implementation of \(\mathcal {T}_{{\it ord}}\) -solver is loosely coupled with the overall framework of Z3. Benchmarks are collected from the ConcurrencySafety category of SV-COMP 2020 and Nidhugg. Many studies perform their experiments on these benchmarks to demonstrate the effectiveness of their method. Finally, we performed an extensive comparison with CBMC, Lazy-CSeq, Lazy-SMA, CPA-Seq, Dartagnan, and two SMC techniques implemented in Nidhugg and GenMC. The detailed results and analysis of these tools show that Zord is correct and competitive. We are thus confident in the effectiveness of Zord.

Another threat to the validity is whether our approach can be generalized to other SMT solvers than Z3. We model multi-threaded programs and encode them into SMT formulas, which are independent of the SMT solver. If we follow the axioms of multi-threaded program verification and implement these rules into other SMT solvers, then they are also suitable for solving these SMT formulas. Therefore, our approach can be implemented in other DPLL(T)-based SMT solvers.

The main limitations of our approach are summarized below. First, our method is designed for verifying safety properties. We need to support more modeling techniques and elaborate on domain-specific solving strategies for other concurrent verification problems (e.g., data race detection). Second, our method can not handle the different memory order modes for atomic access and acquire/release operations. Third, our method only supports SC, TSO, and PSO memory models. Other mainstream weak memory models with richer semantics and operations need to be supported in the future. Finally, we use CBMC’s original pointer analysis component. It is developed based on the classical Steensgaard-style algorithm [52], which is flow-insensitive and is applicable for symbolic encoding in TSO and PSO. However, the may-alias analysis in handling pointers may cause spurious counterexamples. Therefore, more advanced strategies are required for analyzing pointers in multi-threaded programs.

7 Related Work

There is much research on improving the availability and efficiency of multi-threaded program verification by constraint solving. In this section, we discuss representative and related techniques in recent years.

Multi-threaded program verification has been comprehensively studied in recent years. Alglave et al. [9, 10] use a partial order relation on memory events to represent possible executions caused by thread interleaving. All partial order constraints are encoded into a formula. If no counterexample is found, then the multi-threaded program is safe. Otherwise, we need to check that the counterexample is not spurious, i.e., that the corresponding execution does not form a cycle. The above technique inspires our method. We propose a new theory to denote order constraints of memory events. Then, we elaborate on the theory solver of \(\mathcal {T}_{{\it ord}}\) in Z3 [21]. It can benefit from optimizations of Z3 inherently and is fully compatible with DPLL(T) framework. By utilizing axioms of \(\mathcal {T}_{{\it ord}}\) , \(\mathcal {T}_{{\it ord}}\) -solver is responsible for verifying whether the current order constraints are valid, i.e., whether the current order is acyclic. In general, we develop a new partial order theory to extend the basic idea of References [9, 10] with SMT-based constraint solving.

Horn et al. [33, 34] provide a different approach for representing and solving partial order constraints. They propose a novel partial-string theory where order constraints are represented as partial strings and operators on partial strings are defined accordingly.

\(\mathcal {T}_{{\it ord}}\) -solver’s decision procedure is closely related to incremental cycle detection (ICD), a mature algorithm with a long research history. Based on online topological ordering [41], research on ICD begins with Reference [43]. When an edge is inserted, it considers updating the topological order by reusing the previous order instead of calculating a new one. The authors propose a method that takes amortized \(\mathcal {O}(n)\) time for each edge insertion in a graph with n nodes, faster than the \(\mathcal {O}(n+m)\) offline algorithm (with m representing the number of edges). Later works cited in References [6, 7, 12, 31, 38, 48] aim at proposing new cycle detection algorithms; some work better on sparse graphs (assuming \(m = \mathcal {O}(n)\) ) and some better on dense graphs (assuming \(m = \mathcal {O}(n^2)\) ).

To the best of our knowledge, the fastest algorithms are proposed by Bender et al. [12]. They propose two \(\mathcal {O}(min\lbrace m^{1/2}\) , \(n^{2/3}\rbrace m)\) algorithms for sparse graphs and one \(\mathcal {O}(n^{2}log(n))\) algorithm for dense graphs. Among them, we apply the two-way-search algorithm for sparse graphs using pseudo-topological order in our approach. This algorithm achieves \(\mathcal {O}(min\lbrace m^{1/2}, n^{2/3}\rbrace)\) for each edge insertion by using pseudo-topological order and setting a well-designed limit to the order. However, it can only find a strongly connected component when a cycle occurs. Therefore, we extend the algorithm to search for all critical cycles with the shortest width.

Nieuwenhuis et al. [46] design an SMT implementation of difference logic, where literal \(x\ -\ y \le k\) is represented with an edge from x to y with weight k and \(x - y \lt k\) represented with an edge with weight \(k - \epsilon\) where \(\epsilon = 1\) in integer difference logic and \(\epsilon\) can be computed in linear time in real difference logic [49]. After adding an edge from x to y, a traverse of outgoing edges from y and a traverse of incoming edges from x are performed to check whether this edge addition forms a cycle. If a cycle on which weights of edges sum negative is found, then an inconsistency occurs. Their difference logic solver’s behavior after each edge addition is similar to ours but far from incremental. It does not maintain a topological order, so it cannot make use of previous information. Thus, each traverse is exhaustive and takes \(\mathcal {O}(n + m)\) time in a graph with n nodes and m edges.

Ge et al. [28] propose another method to solve order constraints. They only define one sort of partial order relation, i.e., the “happens-before” relation, and unite other sorts of partial order relations to “happens-before.” Then, they check if there is a cycle among ordering constraints. In this manner, their theory cannot handle the intricate relation between different sorts of partial orders for concurrent program verification, which, in our understanding, is very important for the efficiency of SMT solving. In comparison, our method can model various order relations to formulate a multi-threaded program’s possible executions. After each edge addition, their method calls Tarjan’s algorithm to find cycles. We notice the high frequency of edge addition and deletion while solving an SMT formula with order constraints. Instead, we develop a new ordering consistency (OC) theory specifically for multi-threaded program verification and elaborate on its theory solver with incremental cycle detection to achieve higher efficiency. Using incremental cycle detection, m additions in a graph with n nodes take only \(\mathcal {O}(m \times min\lbrace n^{\frac{2}{3}}, m^{\frac{1}{2}}\rbrace)\) time, better than Tarjan’s algorithm that spends \(\mathcal {O}(m \times (n + m))\) .

Multi-threaded program verification is complicated because of the uncertainty caused by thread interleaving. Too many possible execution paths may result in the state explosion problem. The most efficient techniques to alleviate this problem include but are not limited to bounded model checking [15, 17, 25, 27, 55], partial order reduction [1, 24, 57], abstraction refinement [23, 30, 51], and stateless model checking [2, 3, 39].

Moreover, to improve the availability of multi-threaded program verification, many researchers combine BMC with other techniques. Inverso et al. [37] propose a new approach named Lazy Sequentialization, which transforms a multi-threaded program into a sequential program under a specific unrolling bound and execution round. This method simulates thread interactions and is efficient in bug-finding. Yin et al. [59, 61] develop a SAT-based verification framework called scheduling constraints-based abstraction refinement (SCAR) for verifying multi-threaded programs under several memory models. They ignore the order constraints at first; instead, they add conflict clauses in the iterations of abstraction refinement to enhance the formula. They use the transitive closure to check the consistency of order relation, which achieves high efficiency. Cordeiro et al. [18] combine BMC with Satisfiable Modulo Theory (SMT). They encode all possible executions and utilize theory conflict to abstract thread interleaving. They implemented their tactics in ESCBMC, an SMT-based verification tool.

Stateless model checking (SMC) [29] checks correctness by enumerating all possible execution traces. Since thread interleaving results in numerous traces, traces inducing the same order between conflicting events are sorted into an equivalence class, namely, a Mazurkiewicz trace [44]. Mazurkiewicz traces can be further weakened to equivalence classes of traces with identical read-from relations [4]. Efficient traversal algorithms are developed to discover all Mazurkiewicz traces consistent with the memory model and prune inconsistent ones. Popular stateless model checkers, e.g., Nidhugg [2], GenMC [40], RCMC [39], DC-DPOR [16], are based on this technique.

Weak memory models are another issue for multi-threaded program verification, since they bring intra-threaded uncertainty, making concurrent program verification even harder. Lots of groundbreaking research extends their approach designed for SC to weak memory models. Tomasco [54] extends Lazy Sequentialization to TSO and PSO (called Lazy-SMA). They replace memory access with operations on a shared memory abstraction and verify their validity under SC. Yin [60] extends their SCAR technique under SC to TSO and PSO by relaxing ordering constraints between neighboring events. Gavrilenko et al. [27] combine BMC with relation analysis and employ the backend SMT solver to verify concurrent programs under weak memory models. In this article, we propose a new modeling technique and extend our \(\mathcal {T}_{{\it ord}}\) -theory and \(\mathcal {T}_{{\it ord}}\) -solver to weak memory models (e.g., TSO, PSO). We performed extensive experiments with CBMC, Lazy-SMA, Dartagnan, and several state-of-the-art SMC tools (e.g., Nidhugg, GenMC) under various memory models. The experimental results illustrate that our approach is competitive and efficient.

8 Conclusion and Future Work

In this article, we presented a novel SMT-based approach for verifying multi-threaded programs. We proposed a dedicated ordering consistency theory for multi-threaded program verification under SC, TSO, and PSO. We also elaborated on its theory solver, which realizes incremental consistency checking, minimal conflict clause generation, and specialized theory propagation to improve the efficiency of SMT solving. We implemented our techniques on top of CBMC and Z3 and conducted experiments on SV-COMP benchmarks and Nidhugg benchmarks to evaluate its effectiveness and efficiency. The experimental results show that our approach has significant improvements over the state-of-the-art verification techniques.

Our approach is designed for multi-threaded program verification using SMT. We are planning to extend our approach to SAT-based framework in the future.

Footnotes

Lazy-SMA does not support SC and is excluded from the experiment under SC.

CPA-Seq and Lazy-CSeq do not support TSO and PSO and are excluded from the corresponding experiments.

SV-COMP benchmarks use atomic_begin and atomic_end to declare atomic operations.

⁴

The atomicity constraints can be regarded as bi-directed ordering constraints.

⁵

A concurrent program can use fence instructions to prevent non-SC behaviors.

⁶

Equivalence relations are regarded as undirected edges.

⁷

If the graph contains no cycles, then we can always construct a new topological order at the end of the search.

⁸

https://thufv.github.io/research/zord.html.

⁹

https://sv-comp.sosy-lab.org/2020/.

¹⁰

CBMC-v5.52.0: https://github.com/diffblue/cbmc/releases/tag/cbmc-5.52.0.

¹¹

Lazy-CSeq-v2.1: https://github.com/omainv/cseq/releases/tag/SVCOMP2021.

¹²

Lazy-SMA: http://users.ecs.soton.ac.uk/gp4/cseq/fmcad16.zip.

¹³

CPA-Seq-v2.0: https://gitlab.com/sosy-lab/sv-comp/archives-2021/raw/svcomp21/2021/cpa-seq.zip.

¹⁴

Dartagnan-v2.0.7: https://gitlab.com/sosy-lab/sv-comp/archives-2021/raw/svcomp21/2021/dartagnan.zip.

¹⁵

Nidhugg-rfsc: https://github.com/nidhugg/nidhugg/tree/master/benchmarks.

¹⁶

GenMC v8.0 https://github.com/MPI-SWS/genmc/releases/tag/v0.8.

¹⁷

Nidhugg benchmarks: https://github.com/nidhugg/nidhugg/tree/master/benchmarks,commit6af46b8.

¹⁸

GenMC has two configurations: modification order (MO) and writes-before (WB). The latter is much superior to the former [40]. In our experiments, GenMC is configured using WB.

References

[1]

Parosh Abdulla, Stavros Aronis, Bengt Jonsson, and Konstantinos Sagonas. 2014. Optimal dynamic partial order reduction. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’14). Association for Computing Machinery, New York, NY, 373–384. DOI:DOI:

Digital Library

Google Scholar

[2]

Parosh Aziz Abdulla, Stavros Aronis, Mohamed Faouzi Atig, Bengt Jonsson, Carl Leonardsson, and Konstantinos Sagonas. 2015. Stateless model checking for TSO and PSO. In Tools and Algorithms for the Construction and Analysis of Systems, Christel Baier and Cesare Tinelli (Eds.). Springer Berlin, 353–367. DOI:

Digital Library

Google Scholar

[3]

Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and Carl Leonardsson. 2016. Stateless model checking for POWER. In Computer Aided Verification, Swarat Chaudhuri and Azadeh Farzan (Eds.). Springer International Publishing, Cham, 134–156. DOI:

Crossref

Google Scholar

[4]

Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, Magnus Lång, Tuan Phong Ngo, and Konstantinos Sagonas. 2019. Optimal stateless model checking for reads-from equivalence under sequential consistency. Proc. ACM Program. Lang. 3, OOPSLA (Oct.2019), 29 pages. DOI:DOI:

Digital Library

Google Scholar

[5]

Sarita V. Adve and Kourosh Gharachorloo. 1996. Shared memory consistency models: A tutorial. Computer 29, 12 (1996), 66–76. DOI:

Digital Library

Google Scholar

[6]

Deepak Ajwani and Tobias Friedrich. 2007. Average-case analysis of online topological ordering. In Proceedings of the 18th International Conference on Algorithms and Computation (ISAAC’07). Springer-Verlag, Berlin, 464–475. DOI:

Crossref

Google Scholar

[7]

Deepak Ajwani, Tobias Friedrich, and Ulrich Meyer. 2008. An O(N2.75) algorithm for incremental topological ordering. ACM Trans. Algor. 4, 4 (Aug.2008). DOI:DOI:

Digital Library

Google Scholar

[8]

Jade Alglave, Daniel Kroening, Vincent Nimal, and Daniel Poetzl. 2014. Don’t sit on the fence. In Computer Aided Verification, Armin Biere and Roderick Bloem (Eds.). Springer International Publishing, Cham, 508–524. DOI:

Digital Library

Google Scholar

[9]

Jade Alglave, Daniel Kroening, and Michael Tautschnig. 2013. Partial orders for efficient bounded model checking of concurrent software. In Computer Aided Verification, Natasha Sharygina and Helmut Veith (Eds.). Springer Berlin, 141–157. DOI:

Crossref

Google Scholar

[10]

Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. 2010. Fences in weak memory models. In Computer Aided Verification, Tayssir Touili, Byron Cook, and Paul Jackson (Eds.). Springer Berlin, 258–272. DOI:

Digital Library

Google Scholar

[11]

Clark Barrett and Cesare Tinelli. 2018. Satisfiability Modulo Theories. Springer International Publishing, Cham, 305–343. DOI:DOI:

Crossref

Google Scholar

[12]

Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Robert E. Tarjan. 2015. A new approach to incremental cycle detection and related problems. ACM Trans. Algor. 12, 2 (Dec.2015). DOI:DOI:

Digital Library

Google Scholar

[13]

Dirk Beyer, Thomas A. Henzinger, and Grégory Théoduloz. 2007. Configurable software verification: Concretizing the convergence of model checking and program analysis. In Computer Aided Verification, Werner Damm and Holger Hermanns (Eds.). Springer Berlin, 504–518. DOI:

Crossref

Google Scholar

[14]

Dirk Beyer and M. Erkan Keremoglu. 2011. CPAchecker: A tool for configurable software verification. In Computer Aided Verification, Ganesh Gopalakrishnan and Shaz Qadeer (Eds.). Springer Berlin, 184–190. DOI:

Crossref

Google Scholar

[15]

Armin Biere, Alessandro Cimatti, Edmund Clarke, and Yunshan Zhu. 1999. Symbolic model checking without BDDs. In Tools and Algorithms for the Construction and Analysis of Systems, W. Rance Cleaveland (Ed.). Springer Berlin, 193–207. DOI:

Crossref

Google Scholar

[16]

Marek Chalupa, Krishnendu Chatterjee, Andreas Pavlogiannis, Nishant Sinha, and Kapil Vaidya. 2017. Data-centric dynamic partial order reduction. Proc. ACM Program. Lang. 2, POPL (Dec.2017). DOI:DOI:

Digital Library

Google Scholar

[17]

Edmund Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. 2001. Bounded model checking using satisfiability solving. Form. Methods Syst. Des. 19, 1 (July2001), 7–34. DOI:DOI:

Digital Library

Google Scholar

[18]

Lucas Cordeiro and Bernd Fischer. 2011. Verifying multi-threaded software using SMT-based context-bounded model checking. In Proceedings of the 33rd International Conference on Software Engineering (ICSE’11). Association for Computing Machinery, New York, NY, 331–340. DOI:DOI:

Digital Library

Google Scholar

[19]

Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13 (1991), 451–490. DOI:

Digital Library

Google Scholar

[20]

Martin Davis, George Logemann, and Donald Loveland. 1962. A machine program for theorem-proving. Commun. ACM 5, 7 (July1962), 394–397. DOI:DOI:

Digital Library

Google Scholar

[21]

Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and Jakob Rehof (Eds.). Springer Berlin, 337–340. DOI:

Crossref

Google Scholar

[22]

Leonardo De Moura and Nikolaj Bjørner. 2011. Satisfiability modulo theories: Introduction and applications. Commun. ACM 54, 9 (Sept.2011), 69–77. DOI:DOI:

Digital Library

Google Scholar

[23]

Thomas Dinsdale-Young, Mike Dodds, Philippa Gardner, Matthew J. Parkinson, and Viktor Vafeiadis. 2010. Concurrent abstract predicates. In ECOOP 2010—Object-Oriented Programming, Theo D’Hondt (Ed.). Springer Berlin, 504–528. DOI:

Crossref

Google Scholar

[24]

Cormac Flanagan and Patrice Godefroid. 2005. Dynamic partial-order reduction for model checking software. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’05). Association for Computing Machinery, New York, NY, 110–121. DOI:DOI:

Digital Library

Google Scholar

[25]

Malay K. Ganai and Aarti Gupta. 2008. Efficient modeling of concurrent systems in BMC. In Model Checking Software, Klaus Havelund, Rupak Majumdar, and Jens Palsberg (Eds.). Springer Berlin, 114–133. DOI:

Digital Library

Google Scholar

[26]

Harald Ganzinger, George Hagen, Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2004. DPLL(T): Fast decision procedures. In Computer Aided Verification, Rajeev Alur and Doron A. Peled (Eds.). Springer Berlin, 175–188.

Google Scholar

[27]

Natalia Gavrilenko, Hernán Ponce-de León, Florian Furbach, Keijo Heljanko, and Roland Meyer. 2019. BMC for weak memory models: Relation analysis for compact SMT encodings. In Computer Aided Verification, Isil Dillig and Serdar Tasiran (Eds.). Springer International Publishing, Cham, 355–365. DOI:

Crossref

Google Scholar

[28]

Cunjing Ge, Feifei Ma, Jeff Huang, and Jian Zhang. 2016. SMT solving for the theory of ordering constraints. In Languages and Compilers for Parallel Computing, Xipeng Shen, Frank Mueller, and James Tuck (Eds.). Springer International Publishing, Cham, 287–302. DOI:

Digital Library

Google Scholar

[29]

Patrice Godefroid. 1997. Model checking for programming languages using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’97). Association for Computing Machinery, New York, NY, 174–186. DOI:DOI:

Digital Library

Google Scholar

[30]

Ashutosh Gupta, Corneliu Popeea, and Andrey Rybalchenko. 2011. Predicate abstraction and refinement for verifying multi-threaded programs. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’11). Association for Computing Machinery, New York, NY, 331–344. DOI:DOI:

Digital Library

Google Scholar

[31]

Bernhard Haeupler, Telikepalli Kavitha, Rogers Mathew, Siddhartha Sen, and Robert E. Tarjan. 2012. Incremental cycle detection, topological ordering, and strong component maintenance. ACM Trans. Algor. 8, 1 (Jan.2012). DOI:DOI:

Digital Library

Google Scholar

[32]

Fei He, Zhihang Sun, and Hongyu Fan. 2021. Satisfiability modulo ordering consistency theory for multi-threaded program verification. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI’21). Association for Computing Machinery, New York, NY, 1264–1279. DOI:DOI:

Digital Library

Google Scholar

[33]

Alex Horn and Jade Alglave. 2014. Concurrent Kleene Algebra of Partial Strings. arxiv:1407.0385 [cs.LO].

Google Scholar

[34]

Alex Horn and Daniel Kroening. 2015. On Partial Order Semantics for SAT/SMT-based Symbolic Encodings of Weak Memory Concurrency. arxiv:1504.00037 [cs.LO].

Google Scholar

[35]

Jeff Huang. 2015. Stateless model checking concurrent programs with maximal causality reduction. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15). Association for Computing Machinery, New York, NY, 165–174. DOI:DOI:

Digital Library

Google Scholar

[36]

Omar Inverso, Truc L. Nguyen, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato. 2015. Lazy-CSeq: A context-bounded model checking tool for multi-threaded C-Programs. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE’15). IEEE Press, 807–812. DOI:DOI:

Digital Library

Google Scholar

[37]

Omar Inverso, Ermenegildo Tomasco, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato. 2014. Bounded model checking of multi-threaded C programs via lazy sequentialization. In Computer Aided Verification, Armin Biere and Roderick Bloem (Eds.). Springer International Publishing, Cham, 585–602. DOI:

Digital Library

Google Scholar

[38]

Irit Katriel and Hans L. Bodlaender. 2006. Online topological ordering. ACM Trans. Algor. 2, 3 (July2006), 364–379. DOI:DOI:

Digital Library

Google Scholar

[39]

Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. 2017. Effective stateless model checking for C/C++ concurrency. Proc. ACM Program. Lang. 2, POPL (Dec.2017). DOI:DOI:

Digital Library

Google Scholar

[40]

Michalis Kokologiannakis, Azalea Raad, and Viktor Vafeiadis. 2019. Model checking for weakly consistent libraries. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’19). Association for Computing Machinery, New York, NY, 96–110. DOI:DOI:

Digital Library

Google Scholar

[41]

Daniel Kroening and Michael Tautschnig. 2014. CBMC—C bounded model checker. In Tools and Algorithms for the Construction and Analysis of Systems, Erika Ábrahám and Klaus Havelund (Eds.). Springer Berlin, 389–391. DOI:

Crossref

Google Scholar

[42]

Leslie Lamport. 1979. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Comput. Archit. Lett. 28, 9 (1979), 690–691. DOI:

Digital Library

Google Scholar

[43]

Alberto Marchetti-Spaccamela, Umberto Nanni, and Hans Rohnert. 1996. Maintaining a topological order under edge insertions. Inform. Process. Lett. 59, 1 (1996), 53–58. DOI:DOI:

Digital Library

Google Scholar

[44]

Antoni Mazurkiewicz. 1987. Trace theory. In Petri Nets: Applications and Relationships to Other Models of Concurrency, W. Brauer, W. Reisig, and G. Rozenberg (Eds.). Springer Berlin, 278–324.

Google Scholar

[45]

Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. 2001. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Annual Design Automation Conference (DAC’01). Association for Computing Machinery, New York, NY, 530–535. DOI:DOI:

Digital Library

Google Scholar

[46]

Robert Nieuwenhuis, Albert Oliveras, and Cesare Tinelli. 2006. Solving SAT and SAT modulo theories: From an abstract Davis-Putnam-Logemann-Loveland procedure to DPLL(T). J. ACM 53, 6 (Nov.2006), 937–977. DOI:DOI:

Digital Library

Google Scholar

[47]

Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A better X86 memory model: X86-TSO. In Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics (TPHOLs’09). Springer-Verlag, Berlin, 391–407. DOI:DOI:

Digital Library

Google Scholar

[48]

David J. Pearce and Paul H. J. Kelly. 2007. A dynamic topological sort algorithm for directed acyclic graphs. ACM J. Exp. Algor. 11 (Feb.2007), 1.7–es. DOI:DOI:

Digital Library

Google Scholar

[49]

Alexander Schrijver. 1986. Theory of Linear and Integer Programming. John Wiley & Sons, Inc.

Digital Library

Google Scholar

[50]

Dennis Shasha and Marc Snir. 1988. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst. 10, 2 (Apr.1988), 282–312. DOI:DOI:

Digital Library

Google Scholar

[51]

Nishant Sinha and Chao Wang. 2011. On interference abstractions. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’11). Association for Computing Machinery, New York, NY, 423–434. DOI:DOI:

Digital Library

Google Scholar

[52]

Bjarne Steensgaard. 1996. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’96). Association for Computing Machinery, New York, NY, 32–41. DOI:DOI:

Digital Library

Google Scholar

[53]

R. Tarjan. 1971. Depth-first search and linear graph algorithms. In Proceedings of the 12th Annual Symposium on Switching and Automata Theory (SWAT’71). 114–121. DOI:DOI:

Digital Library

Google Scholar

[54]

Ermenegildo Tomasco, Truc L. Nguyen, Omar Inverso, Bernd Fischer, Salvatore La Torre, and Gennaro Parlato. 2016. Lazy sequentialization for TSO and PSO via shared memory abstractions. In Proceedings of the Conference on Formal Methods in Computer-Aided Design (FMCAD’16). 193–200. DOI:DOI:

Crossref

Google Scholar

[55]

Chao Wang, HoonSang Jin, Gary D. Hachtel, and Fabio Somenzi. 2004. Refining the SAT decision ordering for bounded model checking. In Proceedings of the 41st Annual Design Automation Conference (DAC’04). Association for Computing Machinery, New York, NY, 535–538. DOI:DOI:

Digital Library

Google Scholar

[56]

Chao Wang, Sudipta Kundu, Malay Ganai, and Aarti Gupta. 2009. Symbolic predictive analysis for concurrent programs. In FM 2009: Formal Methods, Ana Cavalcanti and Dennis R. Dams (Eds.). Springer Berlin, 256–272. DOI:

Digital Library

Google Scholar

[57]

Chao Wang, Zijiang Yang, Vineet Kahlon, and Aarti Gupta. 2008. Peephole partial order reduction. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and Jakob Rehof (Eds.). Springer Berlin, 382–396. DOI:

Crossref

Google Scholar

[58]

CORPORATE SPARC International, Inc. 1994. The SPARC Architecture Manual (Version 9). Prentice-Hall, Inc.

Digital Library

Google Scholar

[59]

Liangze Yin, Wei Dong, Wanwei Liu, Yunchou Li, and Ji Wang. 2018. YOGAR-CBMC: CBMC with scheduling constraint based abstraction refinement. In Tools and Algorithms for the Construction and Analysis of Systems, Dirk Beyer and Marieke Huisman (Eds.). Springer International Publishing, Cham, 422–426. DOI:

Crossref

Google Scholar

[60]

Liangze Yin, Wei Dong, Wanwei Liu, and Ji Wang. 2018. Scheduling constraint based abstraction refinement for weak memory models. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE’18). Association for Computing Machinery, New York, NY, 645–655. DOI:DOI:

Digital Library

Google Scholar

[61]

L. Yin, W. Dong, W. Liu, and J. Wang. 2020. On scheduling constraint abstraction for multi-threaded program verification. IEEE Trans. Softw. Eng. 46, 5 (May2020), 549–565. DOI:DOI:

Crossref

Google Scholar

[62]

Naling Zhang, Markus Kusano, and Chao Wang. 2015. Dynamic partial order reduction for relaxed memory models. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15). Association for Computing Machinery, New York, NY, 250–259. DOI:DOI:

Digital Library

Google Scholar

Cited By

View all

Luo JXiao LZhu HSu Z(2024)Trace and Algebraic Semantics for Partial Store Order Memory Model2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC61105.2024.00348(2171-2176)Online publication date: 2-Jul-2024
https://doi.org/10.1109/COMPSAC61105.2024.00348
Iorga DWickerson JDonaldson A(2023)Simulating Operational Memory Models Using Off-the-Shelf Program Analysis ToolsIEEE Transactions on Software Engineering10.1109/TSE.2023.332605649:12(5084-5102)Online publication date: 24-Oct-2023
https://dl.acm.org/doi/10.1109/TSE.2023.3326056

Index Terms

Satisfiability Modulo Ordering Consistency Theory for SC, TSO, and PSO Memory Models
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Formal software verification
2. Theory of computation
  1. Logic
    1. Logic and verification

Recommendations

Interference relation-guided SMT solving for multi-threaded program verification
PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Concurrent program verification is challenging due to a large number of thread interferences. A popular approach is to encode concurrent programs as SMT formulas and then rely on off-the-shelf SMT solvers to accomplish the verification. In most existing ...
Satisfiability modulo ordering consistency theory for multi-threaded program verification
PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

Analyzing multi-threaded programs is hard due to the number of thread interleavings. Partial orders can be used for modeling and analyzing multi-threaded programs. However, there is no dedicated decision procedure for solving partial-order constraints. ...
Partial-coherence abstractions for relaxed memory models
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

We present an approach for automatic verification and fence inference in concurrent programs running under relaxed memory models. Verification under relaxed memory models is a hard problem. Given a finite state program and a safety specification, ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Programming Languages and Systems

ACM Transactions on Programming Languages and Systems Volume 45, Issue 1

March 2023

274 pages

ISSN:0164-0925

EISSN:1558-4593

DOI:10.1145/3572862

Editor:
Jan Vitek
Northeastern University, USA

Issue’s Table of Contents

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 March 2023

Online AM: 17 January 2023

Accepted: 02 December 2022

Revised: 17 October 2022

Received: 24 October 2021

Published in TOPLAS Volume 45, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National Key Research and Development Program of China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
1,156
Total Downloads

Downloads (Last 12 months)678
Downloads (Last 6 weeks)66

Reflects downloads up to 04 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Luo JXiao LZhu HSu Z(2024)Trace and Algebraic Semantics for Partial Store Order Memory Model2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)10.1109/COMPSAC61105.2024.00348(2171-2176)Online publication date: 2-Jul-2024
https://doi.org/10.1109/COMPSAC61105.2024.00348
Iorga DWickerson JDonaldson A(2023)Simulating Operational Memory Models Using Off-the-Shelf Program Analysis ToolsIEEE Transactions on Software Engineering10.1109/TSE.2023.332605649:12(5084-5102)Online publication date: 24-Oct-2023
https://dl.acm.org/doi/10.1109/TSE.2023.3326056

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Abstract

1 Introduction

2 Preliminaries

2.1 Notions

2.2 Satisfiability Modulo Theory and DPLL(T)

2.3 Concurrent Execution as Partial Order

3 Symbolic Encoding of Multi-threaded Programs

3.1 Symbolic Encoding

SSA Variables and Access Events.

Value Assignment Encoding.

Error Condition.

Program Order Constraints.

Atomicity Constraints.

Read-from Variables and Constraints.

Write-serialization Variables and Constraints.

From-read Constraints.

The Whole Encoding Formula.

3.2 Constraints Generation

3.2.1 Preserved Program Order and Atomicity Constraints.

3.2.2 Read-from and Write-serialization Constraints.

3.3 Comparison to Other Approaches

3.4 Correctness of the Encoding

4 Ordering Consistency Theory

4.1 Theory Definition

4.2 Event Graph

5 THEORY SOLVER FOR 𝒯ord

5.1 Overview

5.2 Consistency Checking

5.2.1 Incremental Cycle Detection.

Example.

5.3 Conflict Clause Generation

Step 1 (Subgraph construction).

Step 2 (Iterative solving).

Example.

5.4 Backtracking

5.5 Theory Propagation

5.5.1 Unit-edge Propagation.

5.5.2 From-read Propagation.

Example.

6 Experimental Evaluation

6.1 Implementation and Setup

6.2 Experiment on SV-COMP Benchmarks

Results under SC.

Results under TSO and PSO.

6.3 Experiment on Strategies of 𝒯ord-solver

Effects of FR Propagation.

Effects of Unit-edge Propagation.

Effects of ICD Algorithm.

6.4 Comparison with Stateless Model Checking

Results.

Summary.

6.5 Threats to Validity and Limitations

7 Related Work

8 Conclusion and Future Work

Footnotes

References

Cited By

Index Terms

Recommendations

Interference relation-guided SMT solving for multi-threaded program verification

Satisfiability modulo ordering consistency theory for multi-threaded program verification

Partial-coherence abstractions for relaxed memory models

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

5 THEORY SOLVER FOR 𝒯_ord

6.3 Experiment on Strategies of 𝒯_ord-solver