1 Introduction

Software changes frequently during its life-cycle: developers fix bugs, adapt existing features, or add new features. In agile development, software construction is an intrinsically incremental process. Every change to a working system carries the risk of introducing a new defect. Since software failures are often costly and may even endanger human lives, finding potential failures and ensuring their absence is an integral part of software development.


However, running a full verification after each change is inadequate: Changes rarely affect the complete program behavior. For example, consider program absSum (Fig. 1, middle). If the assignment of program variable r is changed in the else-branch at location 5 (\(\texttt {absSum}^\texttt {mod}\), Fig. 1, right), only program executions that take that else-branch show different behavior. Program executions that take the if-branch (highlighted in gray) are not affected by the change. This is typical for program changes: A modified program \(P'\) exhibits some new or changed program executions compared to an original program P, but some executions also stay the same (Fig. 1, left). To ensure the safety of \(P'\), it is sufficient to inspect only the changed behavior \(\mathrm {ex}(P')\setminus \mathrm {ex}(P)\).

Fig. 1.

Relation between program executions of original and modified program (left) and an example: Program absSum (middle) and its modified version \(\texttt {absSum}^\texttt {mod}\) (right). The modification at location 5 is shown in blue. Program parts unaffected by the modification are highlighted in gray.

Many incremental verification approaches [39, 40] use this insight: Regression-test selection [62] executes only those tests of a test suite that are relevant w.r.t. the change, and incremental formal verification techniques adapt existing proofs [33, 49, 53, 54], reuse intermediate results [16, 59], or skip the exploration of unchanged behavior [21, 47, 60, 61]. However, these techniques (a) each focus on one fixed verification approach, (b) require a strong coupling between the original verification approach and the incremental technique, and (c) require an initial, full verification run. Often, this inflexibility makes such an approach impractical.

As an alternative, we define the concept of difference verification with conditions: Given the original and the changed software, difference verification with conditions first identifies all executions that are affected by changes and encodes them in a condition, an exchange format already known from conditional model checking [10]—we call this first part diffCond. Then, a conditional verifier uses that condition to verify only the changed program behavior. For this step, any existing off-the-shelf verifier can be turned into a conditional verifier with the reducer-based approach [13].

Difference verification with conditions allows us to (a) use varying verification approaches for incremental verification, (b) automatically turn any existing verifier into an incremental verifier, and (c) skip an initial, costly verification run.

Contributions. We make the following contributions:

  • We propose difference verification with conditions, which is an incremental verification approach that combines existing tools and approaches.

  • We provide the algorithm diffCond, an integral part of difference verification with conditions, which outputs a description of the modified execution paths in an exchangeable condition format. We also prove its correctness.

  • We implemented diffCond in the verification framework CPAchecker and combined it with existing verifiers to construct difference verifiers.

  • To study the effectiveness and efficiency of difference verification with conditions, we performed an extensive evaluation on more than 10 000 C programs.

  • diffCond and all our data are available for replication and to construct further difference verifiers (see Sect. 7).

Fig. 2.

CFA of absSum (Fig. 1), CFA of \(\texttt {absSum}^\texttt {mod}\), and a condition that describes the common executions of both programs, as created by our approach

2 Background

Programs. For ease of presentation, we consider imperative programs with deterministic control-flow, which execute statements from a set Ops. Our implementation supports C programs. Following literature [8, 9, 30], we model programs as control-flow automata.

Definition 1

A control-flow automaton (CFA) \(P=(L,\ell _0,G)\) consists of

  • a set L of program locations with initial location \(\ell _0\in L\), and

  • a set \(G\subseteq L\times Ops\times L\) of control-flow edges.

CFA P is deterministic if \((\ell ,op,\ell '),(\ell ,op,\ell '')\in G ~\Rightarrow ~ \ell '=\ell ''\).

Figure 2 shows the CFA of the example program absSum from Fig. 1. A sequence \(\ell _0{\mathop {\rightarrow }\limits ^{op_1}}\ell _1\dots {\mathop {\rightarrow }\limits ^{op_n}}\ell _n\) is a syntactical path through CFA \(P = (L, \ell _0, G)\) if \(\forall i \in [1, n]: (\ell _{i-1}, op_i, \ell _i) \in G\). We rely on standard operational semantics and model a program state by a pair of (1) the program counter, whose value refers to a program location in the CFA, and (2) a concrete data state c, whose shape we do not further specify [8]. We denote the set of all concrete data states by C. The function \(sp_{op} : C \rightarrow 2^C\) describes the possible effects of operation \(op \in Ops\) on concrete data state \(c \in C\). Based on this, a sequence \((\ell _0,c_0){\mathop {\rightarrow }\limits ^{op_1}}(\ell _1, c_1)\dots {\mathop {\rightarrow }\limits ^{op_n}}(\ell _n, c_n)\) is a program path through CFA \(P=(L, \ell _0,G)\) if \(\ell _0{\mathop {\rightarrow }\limits ^{op_1}}\ell _1\dots {\mathop {\rightarrow }\limits ^{op_n}}\ell _n\) is a syntactical path through P and \(\forall i \in [1, n]: c_i \in sp_{op_i}(c_{i-1})\). We denote the set of all program paths by \(\mathrm {paths}(P)\). Program executions are derived from program paths: if \(p=(\ell _0,c_0){\mathop {\rightarrow }\limits ^{op_1}}(\ell _1,c_1)\dots {\mathop {\rightarrow }\limits ^{op_n}}(\ell _n,c_n)\) is a program path, then \(\mathrm {ex}(p)=c_0{\mathop {\rightarrow }\limits ^{op_1}}c_1\dots {\mathop {\rightarrow }\limits ^{op_n}}c_n\) is a program execution. The executions of a program P are defined as \(\mathrm {ex}(P):=\{\mathrm {ex}(p)\mid p\in \mathrm {paths}(P)\}\).
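To make these definitions concrete, the following C sketch models a CFA as a plain edge list and follows a sequence of operations along the control flow. The toy CFA is only loosely modeled on absSum (Fig. 2), and all names are illustrative; this is merely an illustration of Def. 1 and of syntactical paths, not the representation used by any actual verifier.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A control-flow edge (l, op, l') of a CFA P = (L, l0, G),
 * with operations represented as plain strings.              */
struct Edge { int src; const char *op; int dst; };

/* Toy CFA loosely modeled on absSum (Fig. 2): branch on a < 0,
 * then join; locations 0..4, initial location l0 = 0.         */
static const struct Edge G[] = {
    {0, "[a < 0]",  1}, {0, "[!(a < 0)]",    2},
    {1, "a = -a",   3}, {2, "r = a*(a+1)/2", 3},
    {3, "return r", 4},
};
static const int NG = sizeof G / sizeof G[0];

/* A sequence l0 -op1-> ... -opn-> ln is a syntactical path iff every
 * triple (l_{i-1}, op_i, l_i) is an edge of G.  Since the CFA is
 * deterministic, the successor for an op is unique if it exists, so
 * we can simply follow the edges; -1 signals a non-path.            */
static int follow(int l0, const char *ops[], int n) {
    int l = l0;
    for (int i = 0; i < n; i++) {
        bool found = false;
        for (int j = 0; j < NG; j++)
            if (G[j].src == l && strcmp(G[j].op, ops[i]) == 0) {
                l = G[j].dst;
                found = true;
                break;
            }
        if (!found) return -1;
    }
    return l;
}

int main(void) {
    const char *ops[] = { "[a < 0]", "a = -a", "return r" };
    printf("end location: %d\n", follow(0, ops, 3));   /* prints 4 */
    return 0;
}
```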

Conditions. A condition describes which program executions were already verified, e.g., in a previous verification run. We use automata to represent conditions and use accepting states to identify already verified executions [13].

Definition 2

A condition \(A=(Q,\delta ,q_0,F)\) consists of:

  • a finite set Q of states,

  • a transition relation \(\delta \subseteq Q\times Ops\times Q\) ensuring \(\forall (q, op, q')\in \delta : q\!\in \! F \Rightarrow q'\!\in \! F\),

  • the initial state \(q_0\in Q\), and a set \(F\subseteq Q\) of accepting states.

The goal of absSum (left program in Fig. 2) is to compute \(r=\sum _{i=0}^{|a|} i\). However, the original program is buggy: In location \(\ell _5\), it must compute the product of a and \(a+1\), not the sum. The fixed program is shown in the middle of Fig. 2—the fix is highlighted in blue. The original and the modified version of the program only differ in the else-branch. If we assume that the original program was already verified, we know that program executions passing through the if-branch have already been verified and do not need to be considered during a reverification. In contrast, executions that pass through the else-branch and reach the modified statement must be verified. The condition shown on the right of Fig. 2 encodes this insight. Program executions that pass through the if-branch (\(a < 0\)) lead to the accepting state \(q_2\)—we say they are covered by the condition. In contrast, program executions that pass through the else-branch (\(\lnot (a < 0)\)) never reach \(q_2\)—they are not covered by the condition and must be analyzed.
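For the else-branch, where \(\lnot (a<0)\) holds and hence \(|a|=a\), computing half the product of a and \(a+1\) is justified by the Gauss sum formula; for example, for \(a=3\):

\[ \sum _{i=0}^{a} i \;=\; \frac{a\,(a+1)}{2}, \qquad a=3:\quad 0+1+2+3 \;=\; 6 \;=\; \frac{3\cdot 4}{2}. \]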

Definition 3

A condition \(A=(Q,\delta ,q_0,F)\) covers an execution \(\pi ={c_0{\mathop {\rightarrow }\limits ^{op_1}}c_1\dots {\mathop {\rightarrow }\limits ^{op_n}}c_n}\) if there exist an index \(k\in [0, n]\) and a run \(\rho =q_0{\mathop {\rightarrow }\limits ^{op_1}}q_1\dots {\mathop {\rightarrow }\limits ^{op_k}}q_k\), s.t. \(q_k\in F\) and \(\forall i \in [1, k]: (q_{i-1},op_i,q_i)\in \delta \).
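As a small illustration of Def. 3, the following C sketch runs a condition over the operation sequence of an execution and reports whether the execution is covered. The transitions are illustrative stand-ins in the spirit of Fig. 2c, and the sketch assumes a deterministic condition, although \(\delta \) may in general be a relation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A condition transition (q, op, q') from delta. */
struct Trans { int src; const char *op; int dst; };

/* Toy condition: the if-branch leads to accepting state 2; no
 * transition exists for the else-branch, so executions through it
 * are never covered.  Transitions are illustrative only.          */
static const struct Trans delta[] = {
    {0, "r = 0",   1},
    {1, "[a < 0]", 2},          /* state 2 is accepting */
};
static const int ND = sizeof delta / sizeof delta[0];

static bool accepting(int q) { return q == 2; }   /* F = {2} */

/* Def. 3: an execution c0 -op1-> ... -opn-> cn is covered iff some
 * run q0 -op1-> ... -opk-> qk with k <= n ends in an accepting state. */
static bool covered(const char *ops[], int n) {
    int q = 0;                                    /* q0 */
    for (int k = 0; ; k++) {
        if (accepting(q)) return true;            /* qk in F, k <= n   */
        if (k == n) return false;                 /* execution exhausted */
        bool moved = false;
        for (int j = 0; j < ND; j++)
            if (delta[j].src == q && strcmp(delta[j].op, ops[k]) == 0) {
                q = delta[j].dst; moved = true; break;
            }
        if (!moved) return false;                 /* no run extends prefix */
    }
}

int main(void) {
    const char *thru_if[]   = { "r = 0", "[a < 0]",    "a = -a" };
    const char *thru_else[] = { "r = 0", "[!(a < 0)]", "r = a"  };
    printf("if-branch covered:   %d\n", covered(thru_if, 3));   /* 1 */
    printf("else-branch covered: %d\n", covered(thru_else, 3)); /* 0 */
    return 0;
}
```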

Next, we introduce a simple and efficient way to systematically compute a condition that covers the common executions of an original and a modified program.

3 Component diffCond for Modular Construction

The ultimate goal of difference verification with conditions is to speed up the reverification of modified programs. To achieve this goal, we aim at ignoring unmodified program behavior during verification. Conditions are a well-suited format for describing the unmodified program behavior. However, to benefit from difference verification with conditions, the construction of such conditions must be efficient, i.e., consume only a small portion of the overall execution time of the verification. Therefore, we use a syntactic approach, diffCond (Alg. 1), whose running time is linear in the size of the modified program.

Algorithm 1. The diffCond algorithm

diffCond gets as input the original program P and the modified program \(P'\). In lines 1 to 11, diffCond traverses the modified and the original program in parallel, stops the traversal where the two programs differ, and remembers the differing edges of the modified program.

It uses three data structures: Set \(E \subseteq L \times L' \times Ops \times L \times L'\) stores all compared edges \((\ell _1, op, \ell _2)\) and \((\ell _1', op, \ell _2')\) that are equal in both programs. These edges are called standard edges. They are stored in the composite form \(((\ell _1, \ell _1'), op, (\ell _2, \ell _2'))\). Set \(D \subseteq L \times L' \times Ops \times L'\) stores all edges \((\ell _1', op, \ell _2')\) of the modified program \(P'\) that represent a change from the original program P at \(\ell _1\), called difference edges. They are stored in the form \(((\ell _1, \ell _1'), op, \ell _2')\). Set \(\textit{waitlist}\subseteq L \times L'\) stores all pairs of program locations \((\ell _1, \ell _1')\) for which a program path with the same syntactic structure exists in P and \(P'\), and for which no outgoing edges have been considered yet. Initially, E and D are empty—no edges have been checked so far—and the algorithm starts at the two initial program locations, i.e., \(\textit{waitlist}= \{(\ell _0, \ell _0')\}\) (lines 1 and 2). As long as \(\textit{waitlist}\) contains program locations, the algorithm picks one of them, here denoted \((\ell _1, \ell _1')\) (line 4). It considers all outgoing edges \((\ell _1', op, \ell _2')\) of \(\ell _1'\) in the modified program. If the same operation op does not exist at any outgoing edge of \(\ell _1\), it is considered to be changed, and the difference edge \(((\ell _1, \ell _1'), op, \ell _2')\) is stored in D before continuing with the next pair in \(\textit{waitlist}\). However, if the same operation op exists at an outgoing edge \((\ell _1, op, \ell _2)\), it is considered to be equal, and the standard edge \(((\ell _1, \ell _1'), op, (\ell _2, \ell _2'))\) is stored in E. In addition, if the pair \((\ell _2, \ell _2')\) of locations has not been encountered before (line 10), it is added to the waitlist for further exploration. In effect, diffCond explores the syntactical composition of the original and the modified program. Figure 3 shows the graph built from the edges in E (black) and D (blue and dashed) when executing diffCond on absSum and \(\texttt {absSum}^\texttt {mod}\).
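The following self-contained C sketch mirrors this parallel traversal (lines 1 to 11). Data structures and names are simplified placeholders with fixed capacities, not CPAchecker code; it merely illustrates how E and D are computed for deterministic CFAs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX 256   /* capacity assumption of this sketch */

/* A control-flow edge (l, op, l') of a CFA. */
struct Edge { int src; const char *op; int dst; };

/* Composite edges over location pairs: standard edges (E) connect two
 * pairs; difference edges (D) keep only the target location of P'.   */
struct CEdge { int l1, l1p; const char *op; int l2, l2p; };

/* Parallel traversal of P and P' (lines 1-11 of Alg. 1): collect the
 * standard edges E and the difference edges D, assuming deterministic
 * CFAs whose locations are numbered 0..MAX-1 with initial location 0. */
static void diff_edges(const struct Edge *G, int nG,            /* P  */
                       const struct Edge *Gp, int nGp,          /* P' */
                       struct CEdge *E, int *nE,
                       struct CEdge *D, int *nD) {
    static bool seen[MAX][MAX];
    int wl[MAX][2], nwl = 0;                 /* waitlist of pairs */
    memset(seen, 0, sizeof seen);
    wl[nwl][0] = 0; wl[nwl][1] = 0; nwl++;   /* (l0, l0')         */
    seen[0][0] = true;
    *nE = *nD = 0;
    while (nwl > 0) {
        nwl--;                               /* pop a pair (l1, l1')  */
        int l1 = wl[nwl][0], l1p = wl[nwl][1];
        for (int j = 0; j < nGp; j++) {      /* outgoing edges of l1' */
            if (Gp[j].src != l1p) continue;
            int match = -1;                  /* same op outgoing at l1? */
            for (int i = 0; i < nG; i++)
                if (G[i].src == l1 && strcmp(G[i].op, Gp[j].op) == 0) {
                    match = i; break;
                }
            if (match < 0) {                 /* changed: difference edge */
                D[(*nD)++] = (struct CEdge){l1, l1p, Gp[j].op, -1, Gp[j].dst};
            } else {                         /* equal: standard edge     */
                int l2 = G[match].dst, l2p = Gp[j].dst;
                E[(*nE)++] = (struct CEdge){l1, l1p, Gp[j].op, l2, l2p};
                if (!seen[l2][l2p]) {        /* explore new pair further */
                    seen[l2][l2p] = true;
                    wl[nwl][0] = l2; wl[nwl][1] = l2p; nwl++;
                }
            }
        }
    }
}

int main(void) {
    /* P:  0 -"x = 0"-> 1 -"y = x"-> 2;   P' changes the second edge. */
    struct Edge G[]  = { {0, "x = 0", 1}, {1, "y = x",     2} };
    struct Edge Gp[] = { {0, "x = 0", 1}, {1, "y = x + 1", 2} };
    static struct CEdge E[MAX], D[MAX];
    int nE, nD;
    diff_edges(G, 2, Gp, 2, E, &nE, D, &nD);
    printf("standard edges: %d, difference edges: %d\n", nE, nD); /* 1, 1 */
    return 0;
}
```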

Fig. 3.

Parallel composition of absSum and \(\texttt {absSum}^\texttt {mod}\) as computed by diffCond

To compute the condition, we first determine the condition’s states. Lines 12 to 18 compute all nodes that can reach a successor of a difference edge. Figure 3 highlights these nodes in green. Nodes that are not discovered in lines 12–18 cannot lead to a difference edge and, thus, not to different program behavior. Consequently, undiscovered nodes that are successors of nodes discovered in lines 12–18 become accepting states (line 23). Figure 3 highlights these nodes in gray (only node \((\ell _2, \ell '_2)\)). The union of the discovered and the accepting states becomes the set of condition states. To complete the construction, we use the pair \((\ell _0, \ell _0')\) of initial program locations as the initial state and add to the transition relation all transitions from E and D that connect condition states. Figure 2c shows the condition created from Fig. 3.

Finally, note that lines 19–21 handle the special case that the set D of difference edges is empty, which results in \(Q=\emptyset \) in line 19. The set D is empty if the original and the modified program differ only in the names of their program locations or if the modified program is empty (\((\ell '_0,\cdot ,\cdot )\notin G'\)). In both cases, all executions of the modified program are covered by the executions of the original program. As a result, the returned condition covers all executions: its only state is both the initial and an accepting state, and the condition has no transitions.
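The remaining steps (lines 12 to 27) can be sketched in the same spirit: a backward fixed point over the standard edges determines the condition states, and their undiscovered successors become accepting states. Again, this is an illustrative sketch with simplified data structures (operation labels are omitted for brevity), not the actual implementation.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAXL 64   /* capacity assumption of this sketch */

/* Composite edges as computed by the parallel traversal: standard
 * edges connect two pairs; for difference edges only the source pair
 * matters here, so the target fields are unused (-1).               */
struct CEdge { int l1, l1p; int l2, l2p; };

/* Lines 12-27 of Alg. 1, sketched: compute which pairs belong to the
 * condition states Q (they can reach a difference edge) and which
 * become accepting states F (successors of Q-nodes that cannot).    */
static void condition_states(const struct CEdge *E, int nE,
                             const struct CEdge *D, int nD,
                             bool inQ[MAXL][MAXL], bool inF[MAXL][MAXL]) {
    for (int i = 0; i < MAXL; i++)
        for (int j = 0; j < MAXL; j++)
            inQ[i][j] = inF[i][j] = false;
    for (int k = 0; k < nD; k++)      /* sources of difference edges  */
        inQ[D[k].l1][D[k].l1p] = true;
    bool changed = true;              /* backward fixed point over E  */
    while (changed) {
        changed = false;
        for (int k = 0; k < nE; k++)
            if (inQ[E[k].l2][E[k].l2p] && !inQ[E[k].l1][E[k].l1p]) {
                inQ[E[k].l1][E[k].l1p] = true;
                changed = true;
            }
    }
    /* Line 23: successors of condition states that cannot themselves
     * reach a difference edge become accepting states.               */
    for (int k = 0; k < nE; k++)
        if (inQ[E[k].l1][E[k].l1p] && !inQ[E[k].l2][E[k].l2p])
            inF[E[k].l2][E[k].l2p] = true;
}

int main(void) {
    /* Composition in the spirit of Fig. 3: pair (0,0) branches to
     * (1,1) (if-branch) and (2,2) (else-branch); the difference edge
     * leaves (2,2).  If D were empty, lines 19-21 of Alg. 1 would
     * instead return a condition that covers every execution.        */
    struct CEdge E[] = { {0,0, 1,1}, {0,0, 2,2} };
    struct CEdge D[] = { {2,2, -1,-1} };
    static bool inQ[MAXL][MAXL], inF[MAXL][MAXL];
    condition_states(E, 2, D, 1, inQ, inF);
    printf("(0,0) in Q: %d  (2,2) in Q: %d  (1,1) in F: %d\n",
           inQ[0][0], inQ[2][2], inF[1][1]);      /* prints 1  1  1 */
    return 0;
}
```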

The purpose of algorithm diffCond is to compute a condition that enables skipping unchanged behavior during the reverification of a modified program. To keep this reverification sound, the produced condition must not cover executions that do not occur in the original program. The following theorem states this property of algorithm diffCond.

Theorem 1

Let \(P=(L,\ell _0,G)\) and \(P'=(L',\ell '_0,G')\) be two CFAs. diffCond\((P,P')\) does not cover any execution from \(\mathrm {ex}(P')\setminus \mathrm {ex}(P)\).

Proof

Assume \(\mathrm {ex}(P')\setminus \mathrm {ex}(P)\ne \emptyset \). Hence, \({\textsc {diffCond}}(P,P') = (Q,\delta ,q_0, F)\) is returned in line 27. Let \(A = (Q,\delta ,q_0,F)\), let \(\pi =c_0{\mathop {\rightarrow }\limits ^{op_1}}c_1\dots {\mathop {\rightarrow }\limits ^{op_n}}c_n\in \mathrm {ex}(P')\setminus \mathrm {ex}(P)\), and let \(\rho =q_0{\mathop {\rightarrow }\limits ^{op_1}}q_1\dots {\mathop {\rightarrow }\limits ^{op_k}}q_k\) be a run through A, s.t. \(0\le k\le n\) and \(\forall 1\le i\le k: (q_{i-1},op_i,q_i)\in \delta \). By construction, (1) \(q_0\notin F\), (2) \(\forall 1\le i<k: (q_{i-1}, op_i, q_i)\in E\wedge q_i\notin F\), and (3) \((q_{k-1},op_k,q_k)\in E\cup D\). We need to show that \(q_k\notin F\). The case \(k=0\) follows from (1).

Next, consider the case \(k=n\). If \((q_{k-1},op_k,q_k)\in E\), then by construction there exists a syntactical path \(sp=\ell _0{\mathop {\rightarrow }\limits ^{op_1}}\ell _1\dots {\mathop {\rightarrow }\limits ^{op_n}}\ell _n\) in P, and due to the program semantics, \(\pi \in \mathrm {ex}(P)\). Since \(\pi \in \mathrm {ex}(P')\setminus \mathrm {ex}(P)\), we infer \((q_{k-1},op_k,q_k)\in D\) and thus \(q_k\notin F\).

Finally, consider the case \(k<n\). If \((q_{k-1},op_k,q_k)\in D\), we infer \(q_k\notin F\). So assume \((q_{k-1},op_k,q_k)\in E\). By construction, there exist a syntactical path \(sp=\ell _0{\mathop {\rightarrow }\limits ^{op_1}}\ell _1\dots {\mathop {\rightarrow }\limits ^{op_k}}\ell _k\) in program P and a syntactical path \(sp'=\ell '_0{\mathop {\rightarrow }\limits ^{op_1}}\ell '_1\dots {\mathop {\rightarrow }\limits ^{op_k}}\ell '_k\) in program \(P'\), s.t. \(\forall 0\le i\le k: q_i=(\ell _i,\ell '_i)\). Let \(\ell _0{\mathop {\rightarrow }\limits ^{op_1}}\ell _1\dots {\mathop {\rightarrow }\limits ^{op_k}}\ell _k{\mathop {\rightarrow }\limits ^{op_{k+1}}}\ell _{k+1}\dots {\mathop {\rightarrow }\limits ^{op_m}}\ell _m\) be an extension of the syntactical path sp s.t. \(m=n\) or \((\ell _m, op_{m+1},\cdot )\notin G\). Due to the program semantics and \(\pi \in \mathrm {ex}(P')\setminus \mathrm {ex}(P)\), we conclude \(k\le m<n\). Due to the program semantics, \(P'\) being deterministic, and \(\pi \in \mathrm {ex}(P')\), there exists an extension \(\ell '_0{\mathop {\rightarrow }\limits ^{op_1}}\ell '_1\dots {\mathop {\rightarrow }\limits ^{op_k}}\ell '_k{\mathop {\rightarrow }\limits ^{op_{k+1}}}\ell '_{k+1}\dots {\mathop {\rightarrow }\limits ^{op_m}}\ell '_m\) of the syntactical path \(sp'\). By construction, \(\forall 1\le i\le m: ((\ell _{i-1},\ell '_{i-1}), op_i,(\ell _i,\ell '_i))\in E\) and there exists \(((\ell _{m},\ell '_{m}), op_{m+1},\cdot )\in D\). Hence, \(\forall 0\le i\le m: (\ell _i,\ell '_i)\in Q\setminus F\). Since \(q_k=(\ell _k,\ell '_k)\) and \(k\le m\), we conclude \(q_k\notin F\). \(\square \)

Theoretical Limitations. The effectiveness of difference verification with conditions depends on the amount of program code potentially affected by a change, which is determined by the diffCond component. diffCond only excludes program parts that cannot be syntactically reached from a program change. Therefore, difference verification is ineffective if initial variable assignments at the very beginning of the program or global declarations change. Moreover, the structure of a program strongly influences the effectiveness of difference verification. For example, programs like \(\texttt {absSum}^\infty \) (Fig. 4) that mainly consist of a single dominating loop are problematic. Program \(\texttt {absSum}^\infty \) is similar to absSum, but has an additional, outer loop that dominates the program. When location \(\ell _7\) is changed in \(\texttt {absSum}^\infty \), difference verification with conditions can only exclude the if-branch for the very first iteration of the outer loop. Thereafter, the change in location \(\ell _7\) may propagate into the if-branch.

Fig. 4.

Example program \(\texttt {absSum}^\infty \) with loop dominating the whole program

In contrast, difference verification with conditions can be effective on programs that allow the exclusion of program parts, e.g., if the program is modular and, thus, consists of multiple, loosely coupled parts. Examples for modularity are the strategy design pattern, object-oriented software, or software applications with multiple program features.

When designing our experiments, we take these limitations of difference verification with conditions into account. Before we get to the experiments, we describe the modular composition of the diffCond component with a verifier, which yields a difference verifier.

4 Modular Combinations with Existing Verifiers

The diffCond algorithm can be combined with any off-the-shelf conditional verifier [10] to produce a difference verifier in a modular way. The goal of a difference verifier is to verify only modified program paths. To this end, it first uses diffCond to discover potentially modified program paths and then runs a conditional verifier to explore only those paths identified by diffCond. Figure 5 shows the construction template for difference verification with conditions. diffCond gets the original and the modified program as input and encodes the modified paths in a condition. The constructed condition is forwarded to a conditional verifier, which uses the condition to restrict its analysis of the modified program to those paths that are not covered by the condition (i.e., the modified paths). Based on this template, we can construct difference verifiers from arbitrary conditional verifiers. Moreover, we can construct difference verifiers from non-conditional verifiers by using the concept of reducer-based conditional verifiers [13]. The idea of a reducer-based conditional verifier is shown on the right of Fig. 5. To turn an arbitrary verifier into a conditional one, a reducer-based conditional verifier puts a preprocessor (called reducer) in front of the verifier. The reducer gets a program and a condition and outputs a new, residual program that represents the program paths not covered by the condition. A full verification of this residual program is then equivalent to a conditional verification of the original program with the given condition. However, note that existing reducers are designed for model checkers and do not necessarily work with other verification technologies like deductive verifiers.
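The construction template can be summarized as plain function composition. In the following C sketch, all types and functions are hypothetical stand-ins (the real components are separate tools exchanging files), and the stub bodies exist only to make the sketch compile.

```c
#include <stdio.h>

/* Hypothetical stand-ins: in reality these are separate tools. */
typedef struct { const char *source;    } Program;
typedef struct { const char *automaton; } Condition;
typedef enum { PROOF, BUG, UNKNOWN } Verdict;

/* diffCond: encode the modified paths of p_mod w.r.t. p_orig. */
static Condition diff_cond(Program p_orig, Program p_mod) {
    (void)p_orig; (void)p_mod;
    return (Condition){"condition over modified paths"};     /* stub */
}
/* Reducer: residual program with the paths not covered by c. */
static Program reduce(Program p_mod, Condition c) {
    (void)c;
    return (Program){p_mod.source};                           /* stub */
}
/* Any off-the-shelf, non-conditional verifier. */
static Verdict verify(Program p) { (void)p; return UNKNOWN; } /* stub */

/* Difference verifier built from a non-conditional verifier:
 * diffCond -> reducer -> full verification of the residual program. */
static Verdict diff_verify(Program p_orig, Program p_mod) {
    Condition c      = diff_cond(p_orig, p_mod);
    Program residual = reduce(p_mod, c);
    return verify(residual);
}

int main(void) {
    Program p = {"absSum.c"}, p2 = {"absSum_mod.c"};
    printf("verdict: %d\n", (int)diff_verify(p, p2));
    return 0;
}
```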

Fig. 5.

diffCond + conditional verifier = difference verifier

In this paper, we transform three verifiers into difference verifiers: CPA-Seq, UAutomizer, and Predicate. The first two are the best-performing verifiers of SV-COMP 2020 [5], and the third is a predicate-abstraction approach. We use the off-the-shelf verifiers CPA-Seq and UAutomizer as non-conditional verifiers and thus add a reducer, while we use Predicate as a conditional verifier. Since a difference verifier can be built from any off-the-shelf verifier, we can also combine difference verification with other incremental verification techniques. As an example, we use precision reuse [16]. This technique is implemented in CPAchecker [16] and UAutomizer [49] and can be used with the previously mentioned approaches. Next, we explain the technologies of the selected verifiers.

CPA-Seq uses several different strategies from the CPAchecker verification framework [6, 11, 14]. CPA-Seq first analyzes different features of the program under verification. The program features considered are: recursion, concurrency, occurrence of loops, and occurrence of complex data types like pointers and structs. Based on these features, CPA-Seq uses one of five different verification techniques (cf. [6]). For non-recursive, non-concurrent programs with a non-trivial control flow, CPA-Seq uses a sequential combination of four different analyses: It uses value analysis with and without Counterexample-guided Abstraction Refinement (CEGAR) [24], a predicate analysis similar to Predicate, and k-induction with invariant generation [7]. Invariants are generated by numerical and predicate analyses and are forwarded to the k-induction analysis.

UAutomizer is the automata-based approach from the Ultimate verification framework [29, 31]. It uses a CEGAR approach to successively refine an over-approximation of the error paths, which is given in the form of automata. In each refinement step, a generalization of an infeasible error path is excluded from the over-approximation. The generalization of the error path is described by a Floyd-Hoare automaton [31], which assigns Boolean formulas over predicates to its states. The predicates are obtained via interpolation along the infeasible error path [43].

Predicate is the predicate-abstraction approach from the CPAchecker framework [14] with adjustable-block encoding (ABE) [15]. ABE is instructed to abstract at loop heads only. CEGAR together with lazy refinement [34] and interpolation [32] determines the necessary set of predicates.

PrecisionReuse is a competitive incremental approach that avoids recomputing the required abstraction level [16]. The idea is to start with the abstraction level determined in a previous verification run. To this end, it stores and reuses the precision, which describes the abstraction level, e.g., the set of predicates to be tracked. We use the version as implemented in CPAchecker.

5 Evaluation

We systematically evaluate our proposed approach along the following claims:

Claim 1

Difference verification with conditions can be more effective than a full verification. Evaluation Plan: For all verifiers, we compare the number of tasks solved by difference verification with conditions and by the pure verifier.

Claim 2

Difference verification with conditions is more effective when using multiple verifiers. Evaluation Plan: We compare the number of tasks solved by each difference verifier with the union of tasks solved by all difference verifiers.

Claim 3

Difference verification with conditions can be more efficient than a full verification. Evaluation Plan: For all verifiers, we compare the run time of difference verification with conditions and of the pure verifier.

Claim 4

The run time of difference verification with conditions is dominated by the run time of the verifier. Evaluation Plan: We relate the time for verification to the time required by the diffCond algorithm and the reducer.

Claim 5

Difference verification with conditions can complement existing incremental verification approaches. Evaluation Plan: We compare the results of difference verification with conditions with the results of precision reuse [16], a competitive incremental verification approach.

Claim 6

Combining difference verification with conditions with existing incremental verification approaches can be beneficial. Evaluation Plan: We compare the results of difference verification with the results of a combination of difference verification with conditions and precision reuse.

5.1 Experiment Setup

Computing Environment. We performed all experiments on machines with an Intel Xeon E3-1230 v5 CPU, 3.4 GHz, with 8 cores each, and 33 GB of memory, running Ubuntu 18.04 with Linux kernel 4.15. We limited each analysis run to 15 GB of memory, a time limit of 900 s, and 4 CPU cores. To enforce these limits, we ran our experiments with BenchExec  [17], version 2.3.

Verifiers. For our experiments, we use the software verifiers CPA-Seq [6, 14] and UAutomizer [29, 31] as submitted to SV-COMP 2020, and CPAchecker [14, 15] in revision 32864. CPA-Seq and UAutomizer are used as verifiers. CPAchecker provides the verifier Predicate, but also the new diffCond component and the Reducer component for reducer-based conditional verification. The difference verifier based on Predicate is realized as a single run. In contrast, the difference verifiers based on CPA-Seq and UAutomizer are realized as a composition of two separate runs. The first run executes the diffCond algorithm followed by the reducer to generate the residual program. It is executed only once per task, i.e., the same residual programs are given to CPA-Seq and UAutomizer. In a second run, CPA-Seq and UAutomizer, respectively, verify the residual program. To deal with the residual programs, we increased the Java stack size for CPA-Seq and UAutomizer.

Existing Incremental Verifier. We use Predicate with precision reuse [16].

Verification Tasks. We use verification tasks from the public repository sv-benchmarks (tag svcomp20), the largest and most diverse well-established collection of verification tasks. Since difference verification with conditions is an incremental verification approach, we require different program versions. We searched the benchmark repository for programs that come with multiple versions and for which at least one version is hard to solve, i.e., at least one of the three considered verifiers takes more than 100 s to verify that version, but is successful. From these programs, we arbitrarily picked the following: eca05 and eca12 (event-condition-action systems, 10 versions each), gcd (greatest-common-divisor computation, 4 versions), newton (approximation of sine, 24 versions), pals (leader election, 26 versions), sfifo (second-chance FIFO replacement, 5 versions), softflt (a software implementation of floats, 5 versions), square (square-root computation, 8 versions), and token (a communication protocol, 28 versions). Unfortunately, all of these programs are specialized implementations with a single purpose. Thus, their implementation is strongly coupled, and any reasonable program change affects the complete program. As explained before, this prohibits effective difference verification with conditions.

Fig. 6.

Conceptual example of combination of verification tasks

To get benchmark tasks that instead contain independent program parts, we create new combinations from the selected programs. We choose two programs, e.g., eca05 and token. We then combine these two programs according to the following scheme: We create a new program with all declarations and definitions of both original programs, but a new main function. This new main function calls the main function of one of the two original programs, chosen nondeterministically. Name clashes are resolved via renaming. Figure 6 shows the conceptual structure of each program created through this combination. For our experiments, we consider the following combinations of programs: (1) eca05+token, (2) gcd+newton, (3) pals+eca12, (4) sfifo+token, (5) square+softflt. To obtain different versions of our combinations, we replace one of the two program parts with a different version of that part. For example, to get a different version of the original program eca05+token, we change the version of the eca05 part or the token part, but never both.
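As an illustration, a combination of eca05 and token could look as sketched below. The function names main_eca05 and main_token are hypothetical placeholders for the renamed original entry points, and __VERIFIER_nondet_int is the usual sv-benchmarks primitive for nondeterministic input, resolved by the verifier rather than linked against a definition.

```c
extern int __VERIFIER_nondet_int(void);

/* All declarations and definitions of eca05, with clashes renamed. */
int main_eca05(void) { /* ... body of the original eca05 main ... */ return 0; }

/* All declarations and definitions of token, with clashes renamed. */
int main_token(void) { /* ... body of the original token main ... */ return 0; }

/* New main: calls one of the two original entry points. */
int main(void) {
    if (__VERIFIER_nondet_int()) {
        return main_eca05();   /* changing the eca05 part affects this branch */
    } else {
        return main_token();   /* while the token part stays untouched */
    }
}
```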

With this procedure, we get a large number of different versions of our program combinations. For our evaluation, we consider each pair (O, N) of versions of a program combination that fulfills the following two conditions: (1) N reflects a change, i.e., the two versions differ. (2) Version O, version N, or both are bug-free. This ensures that verification and difference verification can only find the same bugs. With this construction of benchmark tasks for incremental verification, we get a total of 10 426 tasks that we use in our experiments.

5.2 Experimental Results

Claim 1

(Difference verification with conditions more effective). Table 1 gives an overview of our experimental results. Each column represents one task set. The rows refer to verifiers, i.e., pure verifiers (X) and difference verifiers (\(X^\varDelta \)). The last two rows are the union of the results of all three verifiers. For each task set and verifier, the table lists the number of tasks for which the verifier finds a proof, the number of tasks for which it finds a bug, and the number of tasks for which only the difference verifier gives a conclusive answer. It also shows the number of tasks that cannot be solved. Neither the pure nor the difference verifiers reported incorrect results.

Table 1. Experimental results for Predicate, CPA-Seq, and UAutomizer as pure verifiers (X) and difference verifiers (\(X^\varDelta \)), showing how many correct tasks and how many tasks with a bug are solved, how many tasks are solved only by the difference verifier, and how many tasks are too hard to solve

The table shows that for each verifier there exist task sets on which the number of solved correct tasks is higher for the difference verifier. Moreover, for most task sets there exist tasks that only the difference verifier can solve. This shows that our new difference verification with conditions can be more effective.

Difference verification with conditions is not always more effective. Especially, \({\small {\textsc {CPA}}} \hbox {-}{\small {\textsc {Seq}}} ^\varDelta \) and \({\small {\textsc {UAutomizer}}} ^\varDelta \) sometimes perform worse. For example, \({\small {\textsc {CPA}}} \hbox {-}{\small {\textsc {Seq}}} ^\varDelta \) finds significantly fewer bugs than CPA-Seq for eca05+token. The reason for this is the residual program constructed by the reducer, which is necessary to turn CPA-Seq into the required conditional verifier. The created residual programs, on which the off-the-shelf verifiers run, have a different structure than the original program: they make heavy use of goto statements and deeply nested branching structures. While semantically equivalent, this can have unexpected effects on analyses: In the case of the tasks in eca05+token, CPA-Seq was not able to detect required information about loops and thus aborted its verification. Note that this is not a direct issue of difference verification with conditions, but an orthogonal one. To fix the problem, verification tools must be improved to better deal with the generated residual programs, or the structure of the residual programs must be improved. Despite the problem with residual programs, difference verification can solve many tasks that a full verification run cannot solve.

Since Predicate is already a conditional model checker, \({\small {\textsc {Predicate}}} ^\varDelta \) does not suffer from the residual-program problem. Thus, the effectiveness of difference verification with conditions becomes even more obvious when comparing Predicate with \({\small {\textsc {Predicate}}} ^\varDelta \). For the first three task sets, \({\small {\textsc {Predicate}}} ^\varDelta \) solves all tasks that Predicate solves plus a significant number of additional tasks that Predicate cannot solve. For the last two task sets, \({\small {\textsc {Predicate}}} ^\varDelta \) fails to solve a few tasks that Predicate can solve. However, \({\small {\textsc {Predicate}}} ^\varDelta \) still solves more tasks in total. One reason is that the predicate abstraction used by Predicate may compute different predicates when the explored state space looks different (due to a slightly different exploration), which may result in a more expensive abstraction. For some tasks, these different predicates are less suited to solve the task and thus require more time, which results in the analysis hitting the time limit. Typically, we observe this phenomenon when Predicate is expensive already (in our experiments, when it takes at least 700 s). While difference verification may produce worse results for complicated tasks with large changes, \({\small {\textsc {Predicate}}} ^\varDelta \) is still more effective than Predicate in all categories.

Fig. 7.

CPU time (in s) of full verification vs. difference verification, per task

Claim 2

(Better with several verifiers). To study the usefulness of using several verifiers in difference verification, we look at the tasks solved by the three difference verifiers together. We observe that \({\small {\textsc {Predicate}}} ^\varDelta \) solves the most tasks in all task sets except for pals+eca12, in which \({\small {\textsc {CPA}}} \hbox {-}{\small {\textsc {Seq}}} ^\varDelta \) is better. Moreover, when looking at All\(^\varDelta \), which takes the union of all results, we observe that for eca05+token there exist multiple tasks without a property violation that cannot be solved by the best difference verifier of this task set (\({\small {\textsc {Predicate}}} ^\varDelta \)), but that are solved by another difference verifier. Thus, difference verification is more effective when using several verifiers.

Claim 3

(Difference verification with conditions more efficient). We compare the run times of the verifiers with the run times of the difference verifiers. For all three verifiers, the scatter plots in Fig. 7 show the CPU time required to check a task without (x-axis) and with difference verification (y-axis). If a task was not solved because the verifier either ran out of resources or encountered an error, we assume the maximum CPU time of 900 s. Figures 7a and 7b compare the two non-conditional verifiers CPA-Seq and UAutomizer, for which we use the reducer-based approach to conditional verification. For a significant number of tasks (below the diagonal), the difference verifier is faster than the respective verifier CPA-Seq or UAutomizer, and the tasks on the right edge can be solved only by the difference verifier. There are also tasks for which difference verification is slower (above the diagonal). Note that the problem is the residual program, not our approach: for example, many tasks located at the upper edge do not represent timeouts of the difference verification, but failures of the verifier caused by the structure of the residual program. Figure 7c compares the conditional verifier Predicate. For the majority of tasks, the CPU time required by \({\small {\textsc {Predicate}}} ^\varDelta \) is equal to or less than the time required by Predicate (tasks below the diagonal). Moreover, there are only few tasks for which \({\small {\textsc {Predicate}}} ^\varDelta \) is slower than Predicate (tasks above the diagonal). The reason for this slow-down is most likely the computation of worse predicates (see Claim 1). To sum up, difference verification with conditions can successfully increase efficiency.

Claim 4

(Verifier dominates run time). We aim to show that the diffCond component and the residual-program construction (in the reducer-based approach to constructing conditional verifiers) require negligible run time compared to the complete verification. Figure 8a shows, for each task verified with \({\small {\textsc {CPA}}} \hbox {-}{\small {\textsc {Seq}}} ^\varDelta \) or \({\small {\textsc {UAutomizer}}} ^\varDelta \), the CPU time required by the full verification run (x-axis) and the CPU time of that run spent on diffCond plus the reducer (y-axis). The time required by diffCond + reducer does not depend on the run time of the verifier, and it is below 60 s for all tasks.

Fig. 8.

CPU time (in s) of (a) full difference-verification runs and the time spent for the two diff. components diffCond + reducer, (b) Predicate with precision reuse (\({\small {\textsc {Predicate}}} ^{\varvec{\circlearrowright }}\) ) vs. Predicate with difference verification (\({\small {\textsc {Predicate}}} ^\varDelta \) ), and (c) \({\small {\textsc {Predicate}}} ^\varDelta \) vs. \({\small {\textsc {Predicate}}} ^\varDelta \) with precision reuse (\({\small {\textsc {Predicate}}} ^{\varDelta \varvec{\circlearrowright }}\) )

Claim 5

(Difference verification with conditions complementary). To show that difference verification with conditions complements existing incremental verification, we need to compare it against an existing incremental approach. Among existing approaches that are (1) available as a replication artifact and (2) able to run on verification tasks from sv-benchmarks, we identified two, both based on precision reuse: one implemented in CPAchecker [16] and one in Ultimate [49]. We use the one in CPAchecker. Figure 8b shows the CPU time of precision reuse with Predicate, called \({\small {\textsc {Predicate}}} ^{\varvec{\circlearrowright }}\) (x-axis), against our difference verification with Predicate, called \({\small {\textsc {Predicate}}} ^\varDelta \) (y-axis). Many tasks are solved efficiently by both techniques (large cluster in the lower left). For the remaining hard tasks, difference verification is often faster than precision reuse, or precision reuse cannot even solve the task (points below the diagonal and on the right edge). This shows that difference verification with conditions can improve on precision reuse for a significant number of tasks. It can thus complement existing incremental techniques.

Claim 6

(Combinations sometimes beneficial). We combined difference verification with conditions with precision reuse, called \({\small {\textsc {Predicate}}} ^{\varDelta \varvec{\circlearrowright }}\). Figure 8c shows that this combination is rarely faster than difference verification \({\small {\textsc {Predicate}}} ^\varDelta \) alone. In the worst case, the combination even slows down the verification, because precision reuse tracks previously used predicates from the beginning, while difference verification detects the necessary predicates lazily. The resulting more precise abstraction leads to more, sometimes unnecessary computations. Nevertheless, the combination can solve 29 tasks that neither Predicate, its difference verifier, nor precision reuse can solve alone. Thus, while a combination of the two incremental techniques is not beneficial in general, it can be.

5.3 Threats to Validity

External Validity. (1) Our benchmark tasks might not represent real program changes, and thus, our results might not transfer to practice. However, we built our tasks from a well-established collection of software-verification problems that are considered relevant by the verification community. Moreover, many of the combined programs implement known algorithms (greatest common divisor, Newton approximation of the sine function, Taylor expansion of the square root) or are derived from real applications (OpenSSL, SystemC designs, leader election). Also, our combination scheme is not uncommon in practice; such combination patterns result, e.g., from implementing the strategy pattern. Finally, our task set contains pairs of programs whose only difference is a bug fix that eliminates the reachability of the __VERIFIER_error() call. We believe that similar fixes are done in practice to eliminate bugs. (2) We compared our approach with only a single existing approach for incremental verification, and this comparison is restricted to a single verifier. Our observations may not apply to different incremental verification approaches or different verifiers. The same holds for the combination of difference verification with orthogonal, incremental verification approaches.

Internal Validity. (3) The implementation of the diffCond algorithm may contain bugs and, thus, produce conditions that also exclude modified paths. We would expect that such a bug also excludes error paths. Since we never observed false proofs, we consider this unlikely. (4) Difference verification with CPA-Seq and UAutomizer could appear better simply because we separated the verification from the execution of diffCond + Reducer and granted both runs a limit of 900 s. However, the sum of the two times is always below 900 s for all correctly solved tasks.

6 Related Work

Equivalence Checking. Regression verification [27, 28, 55, 56], SymDiff [23], UC-Klee [48], and other approaches [4, 26] check whether the input-output behavior of the original and modified method or program is the same. Differential assertion checking [38] inspects whether the original and modified program trigger the same assertions when given the same inputs. Equivalence checking does not need to be restricted to a simple yes or no answer. Semantic Diff [35] reports all dependencies between variables and input values that occur either in the original or modified program. Conditional equivalence [37] infers under which input assumption the original and modified program produce the same output. Over-approximation of the differences between the original and modified program was also investigated [45]. Differential symbolic execution [46] compares function summaries and constructs a delta that describes the input values on which the summaries are unequal. Partition-based regression verification [19] splits the program input space into inputs on which original and modified program behave equivalently and those on which the two programs are unequal. Equivalence checking is not directly tailored to property verification, but determining when the original and modified programs may behave differently is similar to the goal of the diffCond algorithm.

Result Adaption. Incremental data-flow analysis [51], Reviser [3], and IncA [57, 58] adapt the existing data-flow solution to program modifications. Similarly, incremental abstract interpretation [52] adapts the solution of the abstract interpreter. Incremental model checking in the modal-\(\mu \) calculus [54] adapts a previous fixed point and restarts the fixed-point iteration. Other approaches [18, 20] model data-flow analysis and verification as computation of attributed parse trees. A change results in an update of the attributed parse tree. Extreme model checking [33] reuses valid parts of the abstract reachability graph (ARG) and resumes the state-space exploration from those nodes with invalid successors. Incremental state-space exploration [41] reuses a previous state-space graph to prune the current exploration. HiFrog [1] and eVolCheck [25] implement an approach that reuses function summaries and recomputes invalid summaries [53]. UAutomizer adapts a previous trace abstraction [49], a set of Floyd-Hoare automata that describe infeasible error paths, to reuse it on the modified program. While result adaption uses the same verification technique for original and modified program, our approach may use different techniques.

Reusing Intermediate Results. Green [59], GreenTrie [36], and Recal [2] support the reuse of constraint proofs. Similarly, iSaturn [44] supports the reuse of SAT results of Boolean constraints that are identical. Precision reuse [16] reuses the precision of an abstraction, e.g., which variables or predicates to track, from a previous verification run. These approaches are orthogonal to our approach. In the experiments, we even combined precision reuse [16] with our approach.

Skipping Unaffected Verification Steps. Regression model checking [60] stops the exploration of a state as soon as no program change can be reached from that state. Directed incremental [47, 50] and memoized [61] symbolic execution restrict the exploration to paths that may be affected by the program change. Additionally, memoized symbolic execution does not check constraints as long as the path prefix is unchanged. The Dafny verifier rechecks methods affected by a change, reusing unchanged verification conditions [42]. iCoq [21, 22] detects and only rechecks those Coq proofs that are affected by a change in the Coq project. These ideas are similar to ours but are tailored to specific techniques.

7 Conclusion

Software is frequently changed during development, and verification techniques must cope with verifying nearly the same software again and again. To construct efficient incremental verifiers from off-the-shelf components, we introduced difference verification with conditions, which steers an arbitrary existing verifier to reverify only the changed parts of a program. Compared to existing techniques, our approach is tool-agnostic and can be used with arbitrary algorithms for change analysis. We provide an implementation of a change analysis as a reusable component, which we combined with three existing verifiers. In a thorough evaluation on more than 10 000 tasks, we showed the effectiveness and efficiency of difference verification with conditions.

Data Availability Statement. diffCond and all our data are available for replication and for constructing further difference verifiers on our supplementary web page and in a replication package on Zenodo [12].