email: {helmut.seidl, julian.erhard, sarah.tilscher, m.schwarz}@tum.de
Non-Numerical Weakly Relational Domains
Abstract
The weakly relational domain of Octagons offers a decent compromise between precision and efficiency for numerical properties. Here, we are concerned with the construction of non-numerical relational domains. We provide a general construction of weakly relational domains, which we exemplify with an extension of constant propagation by disjunctions. Since for the resulting domain of 2-disjunctive formulas, satisfiability is NP-complete, we provide a general construction for a further, more abstract weakly relational domain where the abstract operations of restriction and least upper bound can be efficiently implemented.
In the second step, we consider a relational domain that tracks conjunctions of inequalities between variables, and between variables and constants for arbitrary partial orders of values. Examples are sub(multi)sets, as well as prefix, substring or scattered substring orderings on strings. When the partial order is a lattice, we provide precise polynomial algorithms for satisfiability, restriction, and the best abstraction of disjunction. Complementary to the constructions for lattices, we find that, in general, satisfiability of conjunctions is NP-complete. We therefore again provide polynomial abstract versions of restriction, conjunction, and join. By using our generic constructions, these domains are extended to weakly relational domains that additionally track disjunctions.
For all our domains, we indicate how abstract transformers for assignments and guards can be constructed.
Keywords:
weakly relational domains, 2-decomposable relational domains, 2-disjunctive constants, directed domains
1 Introduction
Relational analyses have been observed to be indispensable for verifying intricate program properties. In particular, this is the case when for the purpose of verification, ghost variables have been introduced which must be related to program variables. Termination may be verified by introducing a ghost loop counter, which can be proven bounded by a relational domain relating it to the actual bounded iteration variable Albert et al. (2014). The validity of string operations on null-terminated strings as employed, e.g., in the programming language C, may be verified by introducing ghost variables for the length of a buffer as well as for tracking the position of the null byte in the buffer Dor et al. (2001). It also has been observed that monolithic relational domains such as the polyhedra abstract domain Cousot and Halbwachs (1978) scale badly to larger programs. Therefore, weakly relational domains have been proposed which can only express simple relational properties, but have the potential to scale better Miné (2004). Examples of weakly relational numerical properties are the Two Variables Per Inequality domain Simon et al. (2002), or domains given by a finite set of linear templates Sankaranarayanan et al. (2005). The most prominent example of a template numerical domain is the Octagon domain Miné (2001, 2006) which allows tracking upper and lower bounds not only of program variables but also of sums and differences of two program variables. One such octagon abstract relation could, e.g., be given by the conjunction
Octagons thus can be considered as a mild extension of the non-relational domain of Intervals for program variables, and a variety of efficient algorithms have been provided Bagnara et al. (2008, 2009); Chawdhary et al. (2019); Schwarz and Seidl (2023). Here, we are concerned with constructing non-numerical abstract domains.
For that, we provide a general technique to construct from every relational domain a weakly relational domain. As one instance of the general construction, we consider 2-disjunctive constants as mentioned in Schwarz et al. (2023). This weakly relational domain allows, e.g., to relate the names of functions with function pointers as in the formula
Since satisfiability of formulas from that domain turns out to be NP-complete, we provide a further mild abstraction, again for arbitrary relational domains, to provide us with a weakly relational domain where all required operations become tractable.
Another family of relational non-numerical domains has been introduced by Arceri et al. (2022). Based on a partial order of values, conjunctions of ordering constraints for program variables are considered. They observe that analyses of prefixes or the substring relation could be helpful for programs in programming languages supporting high-level operations on strings. Here, we study this kind of directed domains in greater detail. For conjunctions of inequalities over some partial order , we extend the constraints from Arceri et al. (2022) by allowing for variables both lower and upper bounds from . For arbitrary partial orders, though, we find that then satisfiability is NP-complete. Partial orders that are lattices form a notable exception. An instance of this are subsets of some universe or multisets. For lattices, we show that satisfiability is decidable in polynomial time. Moreover, we provide polynomial constructions both for restriction as well as the optimal join operation. Turning to general partial orders of values, we thus cannot hope for polynomial algorithms. Therefore, we provide a meaningful abstraction so that both abstract restriction as well as join is again polynomial. This family of relational domains is already weakly relational. Still, our generic constructions can be applied to obtain more expressive weakly relational domains that additionally support disjunctions at a limited amount of extra costs.
The paper is organized as follows: Section 2 provides background definitions on relational domains. It formally introduces our notion of weakly relational domains and provides a general construction of weakly relational domains. Section 3 is dedicated to disjunctive constants. When applying the generic construction from the last section to this relational domain, the weakly relational domain of 2-disjunctive constants is obtained. Here, we prove that satisfiability for these formulas still is NP-complete. Therefore, a generic abstraction technique is presented so that, when applied to disjunctive constants, normalization, projection, as well as least upper bounds all turn out to be polynomial time.
Finally, abstract transformers for assignments as well as guards are derived. Section 4 then introduces directed domains which do not track equalities but inequalities over a partial order of values. While the first subsection provides polynomial constructions for the case that the partial order for values is a lattice, the second subsection is concerned with arbitrary partial orders as value domain. Since satisfiability, in general, turns out to be NP-complete, again a polynomial abstraction is provided. In a further subsection, we indicate how the generic constructions from the preceding sections provide us with weakly relational domains that additionally support disjunctions of inequalities. We exemplify the resulting domains with conjunctions and disjunctions of inequalities over the integers. In the penultimate subsection, dedicated abstract transformers are constructed for assignments, while the last subsection discusses the treatment of guards. Section 5 summarizes the contributions and sketches further directions of research.
2 Weakly Relational Domains
Let us recall basic definitions for relational domains. We mostly follow the notation used in previous work Schwarz et al. (2023), where the notion of -decomposability has been introduced. Let be some finite set of variables. A relational domain maintains relations between variables in . We require that a relational domain is a bounded lattice, i.e., has a partial order , a least element , a greatest element , as well as binary operators for the greatest lower bound (meet) and the least upper bound (join) . We do not demand relational domains to be complete lattices, i.e., to provide for every subset of elements a least upper bound: the polyhedral domain, e.g., is not complete Cousot and Halbwachs (1978). However, we demand that a relational domain supports the following monotonic operations:
where and are from some expression and condition language, respectively.
The abstract transformers for basic actions of programs are given by these functions. Restricting a relation to a subset of variables amounts to forgetting all information about variables in . Thus, we require that
(1) |
A restriction to some set therefore is an idempotent operation. We remark that from these axioms it follows that and for any . Given that there is some relation describing all states satisfying the condition , the transformation for the guard can be described by
(2) |
– at least, if there is a concretization function such that
(3) |
i.e., the binary meet operation is precise.
Example 1
For numerical variables, a variety of such relational domains have been proposed, e.g., (conjunctions of) affine equalities Karr (1976); Müller-Olm and Seidl (2004, 2007) or affine inequalities Cousot and Halbwachs (1978). For affine equalities or inequalities, restriction to a subset of of variables corresponds to the geometric projection onto the subspace defined by , combined with arbitrary values for variables . ∎
One way to tackle the high cost of relational domains is to track the relationships not between all variables, but only between subclusters of variables. We call such domains Weakly Relational Domains.
For a subset , let be the set of all abstract values from that contains only information on those variables in . For any collection of clusters of variables, a relation can be approximated by a meet of relations from since for every ,
(4) |
holds, as holds for each . In fact, the right-hand side of (4) is the best approximation of by some meet over abstract relations with , i.e., with , since
holds for all .
Schwarz et al. (2023) have introduced -decomposable relational domains. These are domains where the full value can be recovered from the restrictions of to all clusters from the set of non-empty clusters of variables of size at most . Furthermore, Schwarz et al. (2023) ask for binary least upper bounds to be determined by computing within these clusters only. More precisely, this amounts to requiring the following two properties
(5) | |||||
(6) |
to hold for all abstract relations . The most prominent example of a -decomposable domain is the octagon domain Miné (2001) – either over rationals or integers, while affine equalities or affine inequalities are examples of domains that are not -decomposable.
Any relational domain , however, which satisfies (6) gives rise to a 2-decomposable domain of its 2-cluster approximations.
For , let denote the approximation of by the meet of its restrictions to clusters . Let denote the subset of of all abstract relations of the form , where the ordering is inherited from . In particular, as well as from are also in .
Theorem 2.1
Assume that is an abstract relational domain which satisfies (6). Then the following holds:
-
1.
for all conjunctions with , i.e., all such conjunctions are contained in .
-
2.
For , the abstract relation , as provided by , is in .
-
3.
The binary least upper bound operation in exists and is given by
-
4.
For , the best approximation to the restriction of onto some subset of variables is given by
-
5.
the partial order with the given binary greatest lower and least upper bounds is a 2-decomposable relational domain.
Proof
For a proof of statement (1), we first observe that for each ,
by monotonicity and idempotence of restriction. Thus,
where the first inequality follows from Eq. 4. Thus, statement (1) follows.
For a proof of statement (2), consider elements . Then
Now, we claim that for every ,
To prove the claim, we argue that
and the claim follows. So far, we have proven that
for some , . Then, statement (2) follows from statement (1).
For a proof of statement (3), we note that any upper bound of in is also an upper bound of in . Therefore, the least upper bound of in is given by . We calculate:
and statement (3) follows.
The best approximation of in is given by . Thus, we have
i.e., it can be determined by applying the restriction onto variables from for each cluster separately. This implies statement (4).
Statement (5) is an immediate consequence of statements (3) and (4). ∎
The polyhedral domain, e.g., satisfies (6). Applied to the polyhedral relational domain, the construction from Theorem 2.1 results in the domain of affine inequalities with at most two variables per inequality Simon et al. (2002).
According to Theorem 2.1, every value from the -decomposable relational domain can be represented as the meet of its restrictions to -clusters, i.e., by the collection . We call this representation normal, and an algorithm that computes it normalization. Consider now an arbitrary collection with with . Then always holds, while equality need not hold. In the Octagon domain over the rationals or the integers, the normal representation of an octagon value corresponds to its closure as introduced in previous work Miné (2001); Bagnara et al. (2008). While for rational Octagons, closure in cubic time was already proposed by Miné (2001), it is much more recent that a corresponding algorithm was provided for the case when constraints are interpreted over integers Bagnara et al. (2008, 2009).
Subsequently, we introduce non-numerical weakly relational domains and provide polynomial algorithms for these.
3 Disjunctive Constants
Constant propagation relies on a domain that maintains conjunctions of atomic propositions where is a program variable and is from a finite set of possible values. In the following, we consider a (mild) generalization of this domain where also disjunctions of at most two atomic propositions are allowed.
Assume we are given a finite set representing possible values for variables from . We consider propositions of the form for which correspond to the disjunction of atomic propositions . Thus, the proposition for some can be understood as an atomic proposition of a multi-valued propositional logic where serves as the set of logical values of the propositional variable Beckert et al. (2000). Every monotonic Boolean combination of propositions with , represents a function defined by
Let denote the complete lattice of all equivalence classes of formulas where the ordering is semantic implication. The least element in this ordering can be represented by the empty disjunction or (false), while the greatest element is equivalent to the empty conjunction or (true). Each formula has an equivalent CNF as well as an equivalent DNF where each clause (conjunction) contains at most one proposition for every variable . Converting into DNF allows checking satisfiability and computing the restriction onto a subset of variables. A formula for is obtained from a DNF for where each conjunction contains at most one proposition for each variable by the following steps: First, every conjunction which contains for some is removed. From each remaining conjunction, then every proposition with is removed. It follows that is distributive, i.e., commutes with binary least upper bounds.
For an arbitrary , computing an equivalent DNF is an exponential-time operation. The same holds if all restrictions are computed via this normal form. Let denote the 2-decomposable domain obtained from according to Theorem 2.1. The lattice consists of all elements which can be represented as conjunctions of clauses with at most two propositions per clause. According to Theorem 2.1, the least upper bound operation for can be realized by a clusterwise disjunction. In particular, it does not coincide with logical disjunction – but is an over-approximation of it.
Example 2
Let and . Then both and are from , but their disjunction is not. In fact, the least upper bound in for
is . ∎
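The gap between logical disjunction and the clusterwise least upper bound can be illustrated by a small sketch. It rests on assumptions that are not taken from the paper: formulas over three variables are represented semantically by their sets of models, the value set has two hypothetical constants `a` and `b`, and the names `project` and `cluster_join` are chosen for illustration only. The join restricts the union of the models to every 2-cluster and then takes the meet of these restrictions.

```python
from itertools import combinations, product

VALUES = ("a", "b")                      # hypothetical value set
ALL = set(product(VALUES, repeat=3))     # all assignments to (x, y, z)

def project(models, i, j):
    """Restriction of a set of models to the 2-cluster of positions i and j."""
    return {(m[i], m[j]) for m in models}

def cluster_join(m1, m2):
    """Clusterwise join: meet of the 2-cluster restrictions of the union."""
    union = m1 | m2
    clusters = list(combinations(range(3), 2))
    projections = {c: project(union, *c) for c in clusters}
    return {m for m in ALL
            if all((m[i], m[j]) in projections[(i, j)] for i, j in clusters)}

# f1 is (x = a or y = a), f2 is (z = a); both are 2-disjunctive.
f1 = {m for m in ALL if m[0] == "a" or m[1] == "a"}
f2 = {m for m in ALL if m[2] == "a"}

disjunction = f1 | f2          # excludes only the assignment (b, b, b)
join = cluster_join(f1, f2)    # every 2-cluster restriction is unconstrained
```

In this instance the logical disjunction has seven models, whereas the clusterwise join admits all eight assignments, i.e., it is a strict over-approximation.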
3.1 Approximating 2-disjunctive Conjunctions
Any CNF over some set of variables of bounded size can, in polynomial time, be transformed into a DNF . Each DNF over two distinct variables can be brought into the canonical normal form
(7) |
for some . Conjunction and disjunction of two such normal forms then correspond to intersection and union of the respective subsets of .
For arbitrary sets of variables, though, it is non-trivial even to decide whether a given conjunction is different from .
Theorem 3.1
To decide for a formula whether or not is satisfiable, i.e., different from , is NP-complete.
Proof
Since a satisfying assignment for can be guessed and then checked in polynomial time, satisfiability of is in NP. NP-hardness, on the other hand, follows by a reduction from 3-colorability of graphs Beckert et al. (2000). We illustrate the reduction with an example.
Example 3
For , consider the formula
where is given by
Then is satisfiable iff the undirected graph has a 3-coloring. In the given example, the graph
cannot be colored by three colors. Therefore, is equivalent to . ∎
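The idea of the reduction can be sketched in a few lines. The encoding below is an assumption made for illustration and need not coincide with the formula of Example 3: for every edge and every color, one clause with two propositions states that at least one endpoint takes a different color. The complete graph on four nodes serves as an assumed stand-in for a graph without a 3-coloring, and satisfiability is checked by brute force, which is exponential in the number of nodes.

```python
from itertools import combinations, product

COLORS = ("r", "g", "b")  # hypothetical names for the three colors

def coloring_formula(edges):
    """2-disjunctive CNF: for each edge (u, v) and color c, the clause
    (x_u in COLORS - {c}) or (x_v in COLORS - {c})."""
    clauses = []
    for u, v in edges:
        for c in COLORS:
            rest = frozenset(COLORS) - {c}
            clauses.append(((u, rest), (v, rest)))
    return clauses

def satisfiable(clauses, nodes):
    """Brute-force check over all assignments of colors to nodes."""
    for values in product(COLORS, repeat=len(nodes)):
        sigma = dict(zip(nodes, values))
        if all(any(sigma[x] in allowed for x, allowed in clause)
               for clause in clauses):
            return True
    return False

k4_nodes = [0, 1, 2, 3]
k4 = coloring_formula(combinations(k4_nodes, 2))    # complete graph on 4 nodes
triangle = coloring_formula([(0, 1), (1, 2), (0, 2)])
```

Under this encoding, `satisfiable(triangle, [0, 1, 2])` holds, while `satisfiable(k4, k4_nodes)` does not.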
Exact normalization (as defined in Section 2) of a relation represented by some 2-CNF thus, in general, may be difficult to compute. Instead of giving dedicated further abstraction techniques, we prefer to provide for an arbitrary relational domain , a general construction to approximate the 2-decomposable domain further by a 2-decomposable domain . This construction is based on approximate normalization.
Assume that an element in is given by the meet where is the collection with (). According to Theorem 2.1, for all . As we have seen for 2-disjunctive constants, however, exact normalization of , i.e., the values may be hard to compute precisely. For an approximate normalization, we introduce a constraint system in unknowns with the constraints
(8) |
This constraint system has already been considered for the normalization of -projective domains Schwarz and Seidl (2023). As all right-hand sides are monotonic, the constraint system has a greatest solution – whenever each is a complete lattice.
In case that there is a greatest solution , holds for all , since is also a solution of the system (8). Then we call the collection the approximate normal form of the collection . Here, we are not only interested in the existence of a greatest solution of (8) but also that it can be effectively computed. For that, we consider the sets of values possibly occurring during some fixpoint iteration for a particular collection .
Let be the least collection of sets such that
-
•
;
-
•
If then also ;
-
•
If and , then
for all .
The sets collect the potential iterates occurring during greatest fixpoint iteration of (8). By construction, each set has a greatest element, namely, , and is closed under binary . For the termination of Kleene fixpoint iteration for (8), it suffices for each set to have a least element – whose collection then coincides with the greatest solution of (8). This observation is summarized in the following proposition.
Proposition 1
The following two statements are equivalent:
-
1.
For each , has a least element;
-
2.
The constraint system (8) has a greatest solution which can be attained by Kleene fixpoint iteration.
Proof
Assume that for each , there is a least element . We claim that is the greatest solution of (8). Since for each , is a lower bound to all elements in , all constraints of (8) are satisfied. Therefore, is a solution. By induction on the definition of the sets , any other solution consists of lower bounds of these sets, i.e., – implying our claim. To conclude statement (2), it remains to prove that the greatest solution can be reached by Kleene iteration. For every , is an element of the set , and therefore, has arrived there after finitely many applications of the inductive rule of their definitions. Let be an upper bound to these numbers for all . Then, Kleene iteration for the constraint system (8) will also reach these values after at most iterations.
For the reverse direction, assume that Kleene iteration for the greatest solution of (8) terminates after iterations with a collection . By induction on the number of rounds, we find each value attained for , , after rounds, is an element of . Therefore, for all . It remains to prove that is also a lower bound of . To show this, we again proceed by induction, this time on the number of applications of the inductive rule for the construction of the , and prove that for all and any value added to some set in the th step, it holds that . Therefore, is a lower bound to for all , and statement (1) follows. ∎
If all operations on abstract relations for clusters of size at most 3 are constant time and the heights of all are bounded by , then the greatest solution of the constraint system (8) can be computed in time polynomial in and the number of variables.
We call a relational domain 2-nice, if the statements of Proposition 1 are satisfied for each collection with .
Let us instantiate this construction to 2-disjunctive constants. First, we note that the relational domain is finite and thus, in particular, 2-nice. Let denote a collection with for all . Assume that consists of variables, and let be the number of constants occurring in any of the . According to the normal form (7), the lattice has height at most if consists of a single variable, and height bounded by if is a two-element set. Since there are clusters, fixpoint iteration will terminate after updates. ∎
Due to NP-hardness of satisfiability, we cannot expect the greatest solution of the constraint system for 2-disjunctive constants to always return the exact normal form. For the formula from Example 3, e.g., it returns for each pair , ,
– which is different from .
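The incompleteness of approximate normalization can be reproduced with a small sketch. It makes several assumptions that are not taken from the paper: the relation for a 2-cluster is stored as a set of value pairs, single-variable clusters are omitted, and the constraints are read in a path-consistency style, where the relation between two variables is refined through every third variable. The names `oriented` and `approx_normalize` are hypothetical.

```python
from itertools import combinations

def oriented(rel, x, y):
    """Relation between x and y as pairs (value of x, value of y)."""
    if (x, y) in rel:
        return rel[(x, y)]
    return {(b, a) for (a, b) in rel[(y, x)]}

def approx_normalize(variables, values, rel):
    """Greatest fixpoint iteration: shrink each 2-cluster relation until it is
    implied by the relations of the two clusters sharing a third variable."""
    rel = {k: set(v) for k, v in rel.items()}
    changed = True
    while changed:
        changed = False
        for (x, y) in list(rel):
            for z in variables:
                if z in (x, y):
                    continue
                rxz, rzy = oriented(rel, x, z), oriented(rel, z, y)
                keep = {(a, b) for (a, b) in rel[(x, y)]
                        if any((a, c) in rxz and (c, b) in rzy for c in values)}
                if keep != rel[(x, y)]:
                    rel[(x, y)] = keep
                    changed = True
    return rel

def all_different(variables, values):
    """Every pair of variables must take distinct values."""
    distinct = {(a, b) for a in values for b in values if a != b}
    return {p: set(distinct) for p in combinations(variables, 2)}

# Four pairwise distinct variables over three values: unsatisfiable, yet stable.
k4 = approx_normalize(range(4), range(3), all_different(range(4), range(3)))
# Three pairwise distinct variables over two values: detected as empty.
tri = approx_normalize(range(3), range(2), all_different(range(3), range(2)))
```

For four pairwise distinct variables over three values, every cluster keeps all six pairs although no satisfying assignment exists. For three pairwise distinct variables over two values, the iteration empties all clusters.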
For a relational domain , we call a collection with for all , stable if it is a solution of the constraint system (8) with . We remark that stability of implies that, if for some , then for all other as well. Now we introduce for a relational domain the domain of all stable collections. The ordering on the domain is defined by if for all when and . Thus, whenever .
Abstract join as well as abstract restriction for then is modeled along the definitions of join and restriction for , but refers to the representation as solution to the constraint system (8). For , in , we define the abstract join by
while for , and , we define abstract restriction by
where the latter equality follows since for , . We have:
Proposition 2
Proof
For the first statement, let and . As the ordering on is componentwise, it suffices to prove that is again in , i.e., the collection is a solution of the constraints in (8). For this, we calculate:
for all variables . From that, the statement follows.
To prove the second statement, we must verify that the collection satisfies all constraints in (8). Indeed, we find by monotonicity,
for all , and the claim follows. The final statement then follows from the definition. ∎
Elements of are collections . For every , we can consider elements as elements of as well by assuming that represents the stable collection .
According to Proposition 2, both joins and restrictions can be computed componentwise. As a consequence, we find:
Theorem 3.2
For a 2-nice relational domain which satisfies (6), the domain is a 2-decomposable relational domain. ∎
Fig. 1 shows the abstract relational domains , and together with the mappings between them.
According to Theorem 3.2, the domain of abstract 2-disjunctive constants is indeed 2-decomposable. The given construction provides us with polynomial algorithms for least upper bound, greatest lower bound, and projection.
3.2 Assignments
Let us return to the relational domain of 2-disjunctive constants and indicate how abstract transformers for assignments can be tailored. For 2-disjunctive constants, we only consider right-hand sides where is either (unknown value), or of the form where is a set of constants and are variables. The concrete semantics of such an assignment is given by
Generalizing the corresponding abstract semantics for (copy) constant propagation, we define the logic transformer for by
Proposition 3
-
1.
The logic transformer is precise, i.e.,
(9)
In particular, it is distributive and commutes with .
-
2.
The logic transformer is precise, if the logic transformers for , , are.
Thus, we have reduced the construction of logic transformers for assignments to restriction and the construction of logic transformers for variable-variable assignments . For , the assignment is the identity, i.e., we set . Therefore, assume that is different from , and assume that . Let denote the set of constants so that equals . Let denote the conjunction of all formulas for with . Let denote the formula obtained from by renaming each occurrence of the variable with . Then we define
Let denote the formula returned by that transformer for . Intuitively, our definition means for , that , i.e., is preserved while additionally, , , and for , .
Proposition 4
The logic transformer is precise, i.e.,
(10) |
holds. ∎
The same construction allows us to construct abstract logic transformers – only that the least upper bound operation and projection of must be replaced by the corresponding operations of . The abstract transformer then, however, is only sound and no longer precise, since the projection operation of may return for an abstract relation whose concretization is empty an abstract relation with a non-empty concretization. Accordingly, Eq. 9 and Eq. 10 may be violated.
3.3 Guards
It remains to provide the semantics of guards. Again, we first consider the domain of 2-disjunctive formulas (modulo logical equivalence), ordered by implication. We consider positive guards of the form , and conversely, negative guards of the form . Positive guards thus can directly be expressed in . Thus we set
(11) |
Negative guards, on the other hand, cannot be directly expressed in – at least if there are unknown constant values beyond the finite universe . To deal with this, we introduce a dedicated fresh symbol with the understanding that represents any value . The property then can equivalently be represented by
allowing us to deal with such co-finite sets of possible values in the same way as we did for finite sets of values alone.
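A minimal sketch of this treatment of guards follows, under stated assumptions: propositions for a single variable are represented as sets of admissible symbols, the fresh symbol is written `OTHER`, and the function names are hypothetical.

```python
OTHER = "•"  # hypothetical fresh symbol standing for any value outside the universe

def positive_guard(allowed):
    """x in B: the admissible symbols are exactly those in B."""
    return frozenset(allowed)

def negative_guard(excluded, universe):
    """x not in B: all known values outside B, plus the fresh symbol."""
    return frozenset(set(universe) - set(excluded)) | {OTHER}

def holds(value, proposition, universe):
    """Check a concrete value against a proposition; unknown values map to OTHER."""
    symbol = value if value in universe else OTHER
    return symbol in proposition

universe = {"f", "g", "h"}
not_f = negative_guard({"f"}, universe)
```

Here `not_f` consists of `g`, `h`, and the fresh symbol, so a value outside the universe satisfies the negative guard while `f` does not.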
4 Directed Relational Domains
Let us now consider inequalities between variables and constants instead of plain equalities, and abandon disjunctions for the moment. We will, however, add disjunctions again in the end. Thus for now, we just consider finite conjunctions of inequalities of the form
for variables and constant values . As usual, we consider conjunctions only up to semantic equivalence. We call inequalities of the form lower bound constraints, and a lower bound for . Analogously for upper bounds. Inequalities of the form are called variable constraints.
Assume we are given a partial order (po), i.e., a set partially ordered by some relation . Examples of partial orders of interest are
- Subsets.
-
The set of all subsets of some finite universe where the ordering is subset inclusion ;
- Integers.
-
The set of integers equipped with the natural ordering ;
- Multisets.
-
Multisets, i.e., the set of all mappings from elements in to their multiplicities ordered by multiset inclusion .
- Strings.
-
The set of all strings for some finite alphabet . Several partial orderings are of interest:
-
–
the prefix ordering ; e.g., ;
-
–
the substring ordering , e.g., ;
-
–
the scattered substring ordering , e.g., .
-
–
Much more expressive constraints on strings have been studied, e.g., in Chen et al. (2018); Day et al. (2023); Abdulla et al. (2019); Ganesh et al. (2011). In particular, for a fragment containing the prefix ordering, decision procedures are known based on (synchronous) multi-tape finite automata Yu et al. (2011). Due to their expressiveness, these techniques come with a considerable computational effort. Instead, we follow Arceri et al. (2022) where basic relational domains are considered for reasoning about variables of string type, sets (of characters), or integers (lengths of strings). Their analyses relate program variables only according to some partial order, and also consider lower bounds. Here, these considerations are complemented by taking upper bounds into account as well and, eventually, by adding disjunctions.
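The three string orderings listed above can be stated as simple predicates. The following sketch uses hypothetical function names and plain Python strings as values.

```python
def is_prefix(s, t):
    """Prefix ordering: s is an initial segment of t."""
    return t.startswith(s)

def is_substring(s, t):
    """Substring ordering: s occurs contiguously in t."""
    return s in t

def is_scattered_substring(s, t):
    """Scattered substring ordering: s is obtained from t by deleting letters."""
    remaining = iter(t)
    # Each membership test consumes `remaining` up to the matched letter,
    # so the letters of s must occur in t in the same order.
    return all(ch in remaining for ch in s)
```

For instance, `ab` is a prefix of `abc`, `bc` is a substring but not a prefix of `abcd`, and `ac` is a scattered substring but not a substring of `abc`.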
A mapping is a model of (relative to ), written as , if , and
-
•
(in ) for each constraint in ;
-
•
(in ) for each constraint in ; and
-
•
(in ) for each constraint in .
Let denote all finite conjunctions over modulo semantic equivalence where the ordering on is semantic implication. As before, normal forms of conjunctions will be considered up to reordering of atomic propositions. Thus, syntactic equality of conjunctions here means equality of the respective sets of propositions. Let denote a finite conjunction where is the set of values occurring in as lower or upper bounds. To provide a first normal form for , we proceed in two steps. First, we determine the transitive closure on the set of the constraints provided by . In case that for where does not hold in , then is unsatisfiable and therefore represented by the dedicated element . If this is not the case, let denote the conjunction of all inequalities where and either or or both are in .
In the second step, when , we remove all redundant constraints. These are constraints of the form
-
•
for , as these constraints hold vacuously;
-
•
for and if there is also a constraint with , i.e., there is a stricter lower bound;
-
•
for and if there is also a constraint with , i.e., there is a stricter upper bound.
Additionally, we set to whenever for some variable ,
-
•
there is no lower bound in for the set of upper bounds provided for by ; or
-
•
there is no upper bound in for the set of lower bounds provided for by .
Assume, e.g., that is given by
where we consider the prefix order on strings. Since cannot be prefixes of the same string, this conjunction is considered equivalent to .
Let us denote the resulting conjunction by and call it the 0-normal form of . Assuming that comparisons of values as well as checks for common lower or upper bounds are constant-time operations, 0-normal forms can be computed in polynomial time.
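The two steps of the construction can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: constraints are pairs `(l, r)` read as "l is below r", variables are wrapped in a hypothetical `Var` tuple, constants must not themselves be tuples, and the partial order is supplied through the oracles `leq`, `has_upper`, and `has_lower`. The result `None` stands for the unsatisfiable conjunction.

```python
from collections import namedtuple
from itertools import combinations

Var = namedtuple("Var", "name")  # hypothetical wrapper marking variables

def is_var(t):
    return isinstance(t, Var)

def zero_normal_form(constraints, leq, has_upper, has_lower):
    # Step 1: transitive closure of the given inequalities.
    closure = set(constraints)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            break
        closure |= derived
    # Derived inequalities between constants must hold in the partial order.
    if any(not is_var(a) and not is_var(b) and not leq(a, b) for a, b in closure):
        return None
    kept = {(a, b) for (a, b) in closure if (is_var(a) or is_var(b)) and a != b}

    def lowers(x):
        return {a for (a, y) in kept if y == x and not is_var(a)}

    def uppers(x):
        return {b for (y, b) in kept if y == x and not is_var(b)}

    def strictly_below(a, b):
        return a != b and leq(a, b)

    # Step 2: drop bounds for which a stricter bound exists.
    reduced = set()
    for a, b in kept:
        if not is_var(a) and any(strictly_below(a, a2) for a2 in lowers(b)):
            continue
        if not is_var(b) and any(strictly_below(b2, b) for b2 in uppers(a)):
            continue
        reduced.add((a, b))
    # Bounds of each variable must admit a common upper or lower bound.
    for x in {t for c in kept for t in c if is_var(t)}:
        if not has_upper(lowers(x)) or not has_lower(uppers(x)):
            return None
    return reduced

# Oracles for the prefix ordering on strings.
def prefix_leq(a, b):
    return b.startswith(a)

def prefix_has_upper(strings):
    # A common extension exists iff the strings are pairwise comparable.
    return all(prefix_leq(a, b) or prefix_leq(b, a)
               for a, b in combinations(strings, 2))

def prefix_has_lower(strings):
    return True  # the empty string is a prefix of every string

x, y = Var("x"), Var("y")
clash = zero_normal_form({("ab", x), ("ac", x)},
                         prefix_leq, prefix_has_upper, prefix_has_lower)
chain = zero_normal_form({("a", x), ("ab", x), (x, y), (y, "abc")},
                         prefix_leq, prefix_has_upper, prefix_has_lower)
```

In the first call, the two lower bounds of `x` are incomparable prefixes, so the result is `None`. In the second call, the weaker lower bound `a` is dropped and the bounds `ab` and `abc` are propagated to both variables.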
4.1 Lattice Domains
An important special case is when is a lattice, i.e., a po where every two elements both have a least upper bound and a greatest lower bound .
Example 4
The po ordered by subset inclusion is a complete lattice and thus, in particular, a lattice. The integers with the natural ordering are another example of a lattice, this time without least or greatest element. Yet another example are multisets: this lattice has a least, but no greatest element.
The po of strings ordered by the prefix relation is not a lattice. provides a least element , as well as greatest lower bounds, namely, the maximal common prefix, but does not have least upper bounds to all pairs of strings. There is, for example, no upper bound to abc and abd in . ∎
When is a lattice, we can provide a dedicated normal form which, however, may now use constants from which did not occur in before. Assume now that is the 0-normal form of . If has a least element , we add the vacuous constraint to every variable . Likewise, if has a greatest element , we add the constraint .
If is different from , we subsequently simplify further by replacing for each variable ,
-
•
the set of upper bound constraints occurring in , if it is non-empty and consists of , with the single constraint ;
-
•
the set of lower bound constraints in , if it is non-empty and consists of , with the single constraint .
Let us denote the resulting formula by and call it the 1-normal form of . The 1-normal form of can be computed in polynomial time as well – given that comparisons as well as pairwise least upper bounds and greatest lower bounds in are constant time. We have:
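The additional simplification step for lattices can be sketched for a single variable. In this hedged illustration, the lattice operations are passed in as hypothetical parameters `join` and `meet`, and `None` signals that no bound of the respective kind is present.

```python
from functools import reduce

def one_normal_bounds(lowers, uppers, join, meet):
    """Collapse all lower bounds into their least upper bound and
    all upper bounds into their greatest lower bound."""
    lower = reduce(join, lowers) if lowers else None
    upper = reduce(meet, uppers) if uppers else None
    return lower, upper

# Subsets of a universe: join is union, meet is intersection.
set_bounds = one_normal_bounds(
    [frozenset({1}), frozenset({2})],
    [frozenset({1, 2, 3}), frozenset({1, 2, 4})],
    join=lambda a, b: a | b,
    meet=lambda a, b: a & b,
)

# Integers with the natural ordering: join is max, meet is min.
int_bounds = one_normal_bounds([1, 3], [7, 5], join=max, meet=min)
```

For the subset lattice, the two lower bounds collapse to the set containing 1 and 2, and so do the two upper bounds. For the integers, the remaining bounds are 3 and 5.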
Theorem 4.1
Assume that the po is a lattice. Then the following holds:
1. A conjunction is satisfiable over iff .
2. For arbitrary conjunctions over , iff .
Satisfiability as well as implication are decidable in polynomial time. ∎
Proof
If , then cannot be satisfiable since each of the simplification steps preserves the set of satisfying assignments. So, assume that is syntactically different from . Let be the variable assignment which maps each variable to its lower bound if it exists, and otherwise to some fixed element which is less than or equal to any other lower bound mentioned in . Then all single variable constraints are satisfied as well as, by transitivity, all constraints occurring in . Therefore, – implying that is satisfiable. From this, statement (1) follows.
To prove statement (2), consider conjunctions both in 1-normal form. If these syntactically coincide, then obviously also holds. For the reverse direction, we prove that if are distinct, then they cannot be equivalent. From that, the assertion follows. If one of them equals and the other does not, then by statement (1), they cannot be equivalent. Therefore, assume that both are satisfiable and thus, different from . We consider all cases in which the may differ.
Lower bounds.
First, assume that there are constraints , , for some variable in where is different from . Assume w.l.o.g. that holds. Let denote the set consisting of together with variables where has a constraint . Let denote some assignment with . Then we construct a variable assignment such that but by
Then still . But since , it follows that does not satisfy and thus it does not model .
If there is a constraint in , but no lower bound constraint for in , then there is some value different from so that holds. This value allows us to construct an analogous distinguishing assignment where we use instead of .
Upper bounds.
First, assume that there are constraints , , for some variable in where is different from . W.l.o.g., assume that . Let denote the subset consisting of together with all unknowns where has a constraint . Let denote some assignment with . Then we construct a variable assignment by:
Then still holds. But since , does not satisfy .
If there is a constraint in , but no upper bound constraint for in , we introduce a value which is different from with , and construct an analogous distinguishing assignment only that we use instead of .
Variable constraints.
Assume that, w.l.o.g., has a constraint for which does not occur in where we assume that for every variable both lower and upper bounds are provided by iff they are provided by and that, whenever they are provided, they agree. Consider again the set of together with all variables with constraints , and the set of together with all variables with constraints occurring in . Since does not occur in , .
Let denote an assignment with . First assume that has constraints and . From not occurring in , it follows that . Now we construct an assignment by:
Then , while and . As , does not fulfill the constraint from .
If no upper bound of is provided, we choose some value strictly larger than , and define a variable assignment by for , and otherwise. Then . In order to additionally satisfy , we would have – which is impossible.
Likewise, if no lower bound of is provided, we choose some value strictly less than , and define a variable assignment by for , and otherwise. Then . In order to additionally satisfy , we would have – which again is impossible.
∎
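The witness used in the proof of statement (1) can be illustrated for the lattice of integers. The sketch below assumes that its input is already normalized (transitively closed, with bounds propagated along variable constraints, and not equal to the unsatisfiable formula). The default value for variables without a lower bound is chosen here as the minimum of all constants mentioned, which is an assumption of this sketch that keeps it below every bound. All names are hypothetical.

```python
def witness(variables, lower, upper):
    """Map each variable to its lower bound, or to a fixed small default."""
    constants = list(lower.values()) + list(upper.values())
    default = min(constants) if constants else 0
    return {x: lower.get(x, default) for x in variables}

def satisfies(sigma, lower, upper, var_leq):
    """Check an assignment against bound and variable constraints."""
    return (
        all(a <= sigma[x] for x, a in lower.items())
        and all(sigma[x] <= b for x, b in upper.items())
        and all(sigma[x] <= sigma[y] for x, y in var_leq)
    )

# Hypothetical normalized conjunction: 1 <= x <= y <= 5 and z <= 7.
variables = ["x", "y", "z"]
lower = {"x": 1, "y": 1}
upper = {"x": 5, "y": 5, "z": 7}
var_leq = {("x", "y")}

sigma = witness(variables, lower, upper)
ok = satisfies(sigma, lower, upper, var_leq)
```

The check `satisfies` is only a sanity test of the constructed assignment; the sketch does not perform normalization itself.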
For lattices, therefore, the construction of normal forms allows deciding satisfiability as well as semantic implication. From our examples, sets, integers, and multisets are lattices. Strings, ordered by the prefix relation, on the other hand, do not form a lattice. This po, however, is bounded-complete. Recall that a po is bounded-complete if every subset which has some upper bound, also has a least upper bound. When is bounded-complete, then we at least know that
• every non-empty subset has a greatest lower bound; and
• has a least element .
Thus, every formula over a bounded-complete po which provides some upper bound to every variable can also be brought into 1-normal form. Let us call such conjunctions bounded. We obtain:
Proposition 5
Given a po that is bounded-complete, the following holds:
1. A bounded conjunction is satisfiable over iff .
2. For arbitrary bounded conjunctions over , iff . ∎
When we drop the extra assumption that conjunctions are bounded, Proposition 5 need no longer hold.
Example 5
For prefixes of strings, consider the conjunction
This formula is semantically equivalent to
although the formulas are syntactically different.
Even without upper bounds, not all implications can be inferred via transitive closure alone. Again for prefixes of strings, consider
The first four constraints imply that , which, by the last constraint, implies that must hold as well. ∎
For a conjunction and a subset of variables, let yield if equals , and otherwise, yield the conjunction of all constraints in that only uses variables from .
For conjunctions in 1-normal form and different from , we define the abstract join as the conjunction of the following constraints:
• all constraints , , which occur both in and ;
• all constraints , , where occurs in ;
• all constraints , , where occurs in .
Then we have:
Theorem 4.2
Assume that is a lattice.
1. If is a conjunction in 1-normal form, then for every subset , is given by where the latter conjunction is again in 1-normal form.
2. For in 1-normal form, is the least upper bound of in .
3. The domain is a 2-decomposable relational domain. ∎
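The abstract join for lattices defined before Theorem 4.2 can be sketched in Python as follows, under one reading of the definition: variable constraints are intersected, upper bounds are combined by the lattice join, and lower bounds by the lattice meet. The representation (a triple of a set of variable pairs and two dictionaries of bounds) and all names are assumptions of this sketch. Inputs are expected to be in 1-normal form already and satisfiable.

```python
def abstract_join(c1, c2, join, meet):
    """Join two normalized conjunctions over a lattice.

    Each conjunction is a triple (var_leq, upper, lower):
      var_leq: set of pairs (x, y) standing for "x is below y"
      upper:   dict variable -> single upper-bound constant
      lower:   dict variable -> single lower-bound constant
    """
    v1, u1, l1 = c1
    v2, u2, l2 = c2
    var_leq = v1 & v2                      # constraints both sides agree on
    upper = {x: join(u1[x], u2[x]) for x in u1.keys() & u2.keys()}
    lower = {x: meet(l1[x], l2[x]) for x in l1.keys() & l2.keys()}
    return var_leq, upper, lower

# Illustration over the integers (join = max, meet = min).
psi1 = ({("x", "y")}, {"x": 3, "y": 3}, {"x": 0, "y": 0})
psi2 = ({("x", "y"), ("y", "x")}, {"x": 5, "y": 5}, {"x": 4, "y": 4})
joined = abstract_join(psi1, psi2, join=max, meet=min)
```

In this example the join retains the constraint between x and y in one direction only, together with the bounds 0 and 5 for both variables. A bound is dropped when only one argument provides it, which corresponds to the case where the other argument leaves the variable unbounded on that side.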
While statement (1) of Theorem 4.2 remains true also for bounded conjunctions over a bounded-complete po, the least upper bound of two bounded conjunctions need no longer be bounded, as the least upper bounds of the respective upper bounds need not exist. For the prefix ordering on , e.g., we have
i.e., all information about upper bounds is lost.
4.2 The General Case
For general (even finite) partial orders, the dedicated constructions for lattices cannot be directly applied. Already the problem of determining whether or not a conjunction is satisfiable turns out to be surprisingly difficult. Assume that elements in can be represented and compared in polynomial time. Then we find:
Theorem 4.3
The problem of determining for a given partial order and a conjunction , whether is satisfiable over , is NP-complete.
Proof
Since a satisfying assignment for a conjunction can be guessed in polynomial time, it remains to prove the hardness part. For that, consider the problem of 3-colorability of an undirected finite graph . Let be an enumeration of the vertices in . Then, we construct a partial order consisting of the elements
where the partial ordering of is the least partial order satisfying
For , we define a conjunction in the variables , by
Both and can be constructed from in polynomial time. Moreover, it holds that iff for some coloring with . It follows that is satisfiable iff has a 3-coloring. In summary, we obtain a polynomial time reduction from the problem of 3-colorability of undirected finite graphs to satisfiability of finite conjunctions over some partial order. This concludes the proof. ∎
For general partial orders , however, we still may rely on the 0-normal form and otherwise perform the same constructions as we did for lattices with the 1-normal form. Thus, we define an abstract ordering by
(12) |
Let us denote the resulting abstract domain by . We have:
Theorem 4.4
For an arbitrary po , the following holds:
1. If a conjunction is satisfiable over then .
2. For all conjunctions , implies that .
∎
For arbitrary po , we define the abstract projection in the same way as for conjunctions over a lattice – only that we now rely on formulas in 0-normal form. For such a formula the projection onto a subset of variables is again defined by removing all constraints mentioning variables not in .
It is for the abstract join operation that we must find a more general definition, since least upper bounds or greatest lower bounds of sets of values in are no longer at hand. Assume that are in 0-normal form and different from . Then, we define the abstract join as the conjunction of the following constraints
• all constraints , , which occur both in and ;
• all constraints , , where occurs in for and ;
• all constraints , , where occurs in for and .
This definition essentially amounts to keeping those ordering constraints between variables on which and agree, and keeping a lower or upper bound only if it is more liberal than a corresponding bound of the other formula.
Example 6
For the po with the substring ordering, consider the formulas
Then, according to our definition,
∎
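The abstract join for general partial orders can be sketched in Python for the substring ordering. The sketch follows the informal description above: a bound from one formula is kept only if it is more liberal than some bound for the same variable in the other formula. Since 0-normal forms may carry several bounds per variable, bounds are stored as sets. The representation, the names, and the input formulas are hypothetical and are not those of Example 6; the code does not normalize its inputs.

```python
def substring_leq(s, t):
    # s is below t in the substring ordering iff s occurs contiguously in t.
    return s in t

def liberal_bounds(b1, b2, more_liberal):
    """Keep each bound that is more liberal than some bound on the other side."""
    result = {}
    for x in b1.keys() & b2.keys():
        kept = {b for b in b1[x] if any(more_liberal(b, o) for o in b2[x])}
        kept |= {b for b in b2[x] if any(more_liberal(b, o) for o in b1[x])}
        if kept:
            result[x] = kept
    return result

def abstract_join_po(c1, c2, leq):
    """Each conjunction is (var_leq, upper, lower) with sets of bounds per variable."""
    v1, u1, l1 = c1
    v2, u2, l2 = c2
    # An upper bound b is more liberal than o if o is below b.
    upper = liberal_bounds(u1, u2, lambda b, o: leq(o, b))
    # A lower bound a is more liberal than o if a is below o.
    lower = liberal_bounds(l1, l2, lambda a, o: leq(a, o))
    return v1 & v2, upper, lower

psi1 = ({("x", "y")}, {"x": {"xabcx"}}, {"x": {"ab"}, "y": {"ab"}})
psi2 = ({("x", "y")}, {"x": {"abc"}}, {"x": {"b"}, "y": {"b"}})
joined = abstract_join_po(psi1, psi2, substring_leq)
```

Here the upper bound "xabcx" survives because it contains "abc", and the lower bound "b" survives because it is a substring of "ab"; the stricter bounds "abc" and "ab" are dropped.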
With these definitions, the binary operation returns the least upper bound of its arguments w.r.t. the ordering . Moreover, turns into a 2-decomposable relational domain as well.
Theorem 4.5
For every po , is a 2-decomposable relational domain. ∎
4.3 Directed Domains with Disjunctions
Subsequently, we extend the relational domain for lattices (resp. for arbitrary po’s) with disjunctions. This extension corresponds to the disjunctive completion of (resp. ) Cousot and Cousot (1992). The elements of the resulting relational domain are disjunctions of normal form conjunctions (1-normal forms if is a lattice, and 0-normal forms in general) where for , the restriction of the disjunction is defined as the disjunction of the restrictions of the normal form conjunctions contained in . By definition, restrictions therefore are distributive. Let (resp. ) denote the resulting relational abstract domains. If is infinite, these relational domains have infinite strictly ascending chains, and therefore must also have strictly descending chains of unbounded length. For the lattice , e.g., there are even infinite strictly descending chains, e.g.,
Nonetheless, we have:
Proposition 6
1. For every po , is 2-nice.
2. For every lattice , is 2-nice.
Proof
Let denote an arbitrary collection with . Consider an arbitrary formula from the set . It consists of disjunctions of conjunctions each of which may only mention variables from or constants occurring in any of the . Since the number of these formulas is finite, statement (1) follows.
The proof of the second statement is analogous – only that the occurring constants now may also be finite meets of constants occurring in upper-bound constraints of the initial collection or finite joins of constants occurring in lower-bound constraints. Still, the number of possible formulas remains finite. ∎
Due to Proposition 6, the construction from Section 3 can be applied, resulting in the 2-decomposable relational domains (in case of lattices ) and (for arbitrary pos).
We exemplify the construction for the lattice of integers, i.e., for . One-variable properties expressible in this lattice are disjunctions of interval constraints such as
Two-variable properties expressible in this lattice are, e.g.,
Arbitrary elements in can be understood as representations of conjunctions of such properties.
Assume that we are given a collection with – which is not yet stable, and we would like to determine the corresponding stable collection by performing a fixpoint iteration to determine the greatest solution of Eq. 8. During that iteration, we only need to consider upper and lower bounds for each variable which have already occurred in the formulas . Therefore, the length of each intermediate formula is bounded by a polynomial in the input, and each unknown is updated only polynomially often. As a consequence, the operations abstract join, abstract meet, and abstract projection for are all polynomial. For an arbitrary lattice or po , we may proceed analogously. Efficiency of the fixpoint iteration, though, remains to be checked separately for every .
4.4 Assignments
Let us turn to the construction of abstract transformers for assignments. We only describe these for the relational domains and , respectively. We first consider three simple cases: assignments of unknown values; assignments of constants; and copying one variable into the other.
(13) |
for and with . Again, we realize the assignment of unknown values by restriction. For assigning constants and variables, we remark that equality can be expressed via a pair of inequalities.
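The three simple assignment forms can be sketched in Python as follows. This is an illustration under assumptions, not the paper's definition: a conjunction is represented as a set of pairs (s, t) standing for "s is below t", terms are tagged tuples, the input is assumed to be normalized so that dropping constraints realizes restriction, the unsatisfiable formula is not modelled, and the result is not renormalized. All names are hypothetical.

```python
def var(name):
    return ("var", name)

def const(value):
    return ("const", value)

def forget(psi, x):
    """Restriction: drop every constraint that mentions variable x."""
    return {(s, t) for (s, t) in psi if var(x) not in (s, t)}

def assign_unknown(psi, x):
    # x := ?  is realized by restriction alone.
    return forget(psi, x)

def assign_term(psi, x, t):
    # x := t for a constant or another variable t:
    # equality is expressed as a pair of inequalities.
    if t == var(x):
        raise ValueError("right-hand side must not be the assigned variable")
    return forget(psi, x) | {(t, var(x)), (var(x), t)}

psi = {(var("x"), var("y")), (const(1), var("y"))}
after_unknown = assign_unknown(psi, "x")
after_const = assign_term(psi, "x", const(7))
after_copy = assign_term(psi, "x", var("y"))
```

After the copy, the formula relates x and y in both directions while the bound on y is preserved; a subsequent normalization step would propagate that bound to x.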
Individual partial orders, though, may support further forms of right-hand sides in assignments. Subsequently, we enumerate more general forms of assignments for sets and for the prefix, substring, and scattered substring partial orders on strings.
Sets.
For sets, we consider right-hand sides of the form or for with . We define
Thus, we obtain after the assignment as new upper (lower) bounds of in terms of the variables and . An analogous construction can also be applied to multisets. We remark that the given right-hand sides do not entail that the equalities and , respectively, hold after the assignments.
Prefixes.
In this case, right-hand sides of interest are concatenations of a constant or variable, possibly followed by some further value, i.e., are of the form for either in , or in , with “?” again denoting unknown input. We define
i.e., we only obtain information about lower bounds for after the assignment but lose all information about upper bounds.
Substrings.
Again, we consider right-hand sides which are concatenations of constants or variables with further values. These now are of the form (). We define
For scattered substrings, we proceed similarly. In both cases, no information is obtained for upper bounds to the left-hand side variable after the assignment.
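The order-specific assignments can be sketched along the same lines. The code below reflects one reading of the text, since the precise definitions are not reproduced here: a union yields its arguments as lower bounds of the left-hand side, an intersection yields them as upper bounds, and a prefix concatenation with unknown input yields its first operand as a lower bound. The representation and all names are assumptions of this sketch, and the left-hand side is assumed not to occur on the right.

```python
def var(name):
    return ("var", name)

def const(value):
    return ("const", value)

def forget(psi, x):
    """Drop every constraint that mentions variable x."""
    return {(s, t) for (s, t) in psi if var(x) not in (s, t)}

def assign_union(psi, x, y, z):
    # x := y ∪ z  (sets): both operands become lower bounds of x.
    return forget(psi, x) | {(var(y), var(x)), (var(z), var(x))}

def assign_intersection(psi, x, y, z):
    # x := y ∩ z  (sets): both operands become upper bounds of x.
    return forget(psi, x) | {(var(x), var(y)), (var(x), var(z))}

def assign_prefix(psi, x, head):
    # x := head · ?  (prefix order): head is a lower bound of x,
    # and nothing is known about upper bounds.
    return forget(psi, x) | {(head, var(x))}

psi = {(var("x"), var("z")), (const("ab"), var("y"))}
after_union = assign_union(psi, "x", "y", "z")
after_inter = assign_intersection(psi, "x", "y", "z")
after_prefix = assign_prefix(psi, "x", var("y"))
```

As remarked in the text for sets, the resulting constraints only bound the left-hand side and do not state that it equals the union or intersection.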
So far, we have assumed that the right-hand side does not contain the variable from the left-hand side. If occurs in , we split the assignment into the sequence
for some fresh variable tmp, i.e., first store the value of the right-hand side in tmp whose value only then is assigned to the left-hand side variable .
These abstract transformers for the relational domains (resp. ) are readily lifted to corresponding transformers for the weakly relational domains (resp. ).
4.5 Guards and Negated Inequalities
Let us now turn to a treatment of guards for the directed domain where is a lattice. The case for (when is not a lattice) is analogous.
A condition which consists of an inequality for being variables or constants already represents an abstract relation. Therefore, Eq. 2 can be used to define the abstract effect of .
If the condition is a negated inequality , this is not immediately possible. Assume that the variables occurring in all occur in . Now consider an arbitrary element . In particular, , i.e., for conjunctions all using variables from only. In this case, we define
Thus, the negated inequality allows improving the abstract relation by possibly removing those conjuncts from which contradict .
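The effect of a negated inequality can be sketched in Python as a filter on the conjuncts of a disjunction. This is a simplified illustration under assumptions: a disjunction is a list of conjunctions, each a set of pairs (s, t) standing for "s is below t"; conjunctions are assumed to be in normal form, so that an implied inequality between the two guard operands occurs syntactically; and the contradiction check is purely syntactic. All names are hypothetical.

```python
def var(name):
    return ("var", name)

def assume_not_leq(disjunction, s, t):
    """Keep only the conjuncts that do not entail "s is below t".

    A conjunct containing the constraint (s, t) contradicts the
    negated guard and is therefore removed.
    """
    return [conj for conj in disjunction if (s, t) not in conj]

disjunction = [
    {(var("x"), var("y"))},                          # x below y
    {(var("y"), var("x"))},                          # y below x
    {(var("x"), var("y")), (var("y"), var("x"))},    # x equals y
]
refined = assume_not_leq(disjunction, var("x"), var("y"))
```

Only the second conjunct is compatible with the negated guard, so the refined disjunction consists of that conjunct alone. Conjuncts that neither contain nor contradict the guard are kept unchanged, which is why the text speaks of possibly removing conjuncts.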
5 Conclusion
We considered a construction of 2-decomposable relational domains from arbitrary relational domains and exemplified this construction by deriving 2-disjunctive constants from the relational domain of disjunctive constants. For 2-disjunctive constants, it turned out that normalization is prohibitively expensive. Therefore, we provided a second general construction of 2-decomposable relational domains, now based on greatest solutions of constraint systems, which – in the case of disjunctive constants – results in a 2-decomposable domain where the operations join, meet, and restriction are polynomial.
In the second part, we then considered directed domains as conjunctions of inequalities over lattices or general partial orders. For lattices, we provided the 1-normal form for a syntactic characterization of semantic equivalence. We showed that the resulting domain is 2-decomposable and provided precise polynomial algorithms for 1-normalization, projection, join, and meet. For arbitrary partial orders, we use a weaker form of normalization for constructing a weaker 2-decomposable relational domain, for which we again provided polynomial algorithms, now for 0-normalization, projection, join, and meet. Only in the very last step, we added disjunctions by applying the general construction of 2-decomposable domains based on approximate normalization from the previous section. Both for 2-disjunctive constants and for directed domains, we indicated how transfer functions for assignments and guards can be constructed.
Our results can be extended in several directions. In the case of constants, one may, e.g., additionally, track equalities as well as disequalities between variables; likewise for directed domains, an extensive study of the impact of negated inequalities could be of interest. Here, we only studied lattice operations and transfer functions. Directed domains, though, may have infinite strictly ascending chains. Therefore, tailored widening and narrowing operators are of interest when these domains are employed for practical static analysis.
Acknowledgements.
This work has been supported by Shota Rustaveli National Science Foundation of Georgia under the project FR-21-7973 and by Deutsche Forschungsgemeinschaft (DFG) – 378803395/2428 ConVeY.
References
- Abdulla et al. (2019) Abdulla, P.A., Atig, M.F., Diep, B.P., Holík, L., Janku, P.: Chain-free string constraints. In: Chen, Y., Cheng, C., Esparza, J. (eds.) Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Taipei, Taiwan, October 28-31, 2019, Proceedings, Lecture Notes in Computer Science, vol. 11781, pp. 277–293. Springer (2019). URL https://doi.org/10.1007/978-3-030-31784-3_16
- Albert et al. (2014) Albert, E., Arenas, P., Genaim, S., Puebla, G., Román-Díez, G.: Conditional termination of loops over heap-allocated data. Sci. Comput. Program. 92, 2–24 (2014). URL https://doi.org/10.1016/j.scico.2013.04.006
- Arceri et al. (2022) Arceri, V., Olliaro, M., Cortesi, A., Ferrara, P.: Relational string abstract domains. In: Finkbeiner, B., Wies, T. (eds.) Verification, Model Checking, and Abstract Interpretation - 23rd International Conference, VMCAI 2022, Philadelphia, PA, USA, January 16-18, 2022, Proceedings, Lecture Notes in Computer Science, vol. 13182, pp. 20–42. Springer (2022). URL https://doi.org/10.1007/978-3-030-94583-1_2
- Bagnara et al. (2008) Bagnara, R., Hill, P.M., Zaffanella, E.: An improved tight closure algorithm for integer octagonal constraints. In: Logozzo, F., Peled, D.A., Zuck, L.D. (eds.) Verification, Model Checking, and Abstract Interpretation, pp. 8–21. Springer Berlin Heidelberg, Berlin, Heidelberg (2008)
- Bagnara et al. (2009) Bagnara, R., Hill, P.M., Zaffanella, E.: Weakly-relational shapes for numeric abstractions: improved algorithms and proofs of correctness. Formal Methods Syst. Des. 35(3), 279–323 (2009). URL https://doi.org/10.1007/s10703-009-0073-1
- Beckert et al. (2000) Beckert, B., Hähnle, R., Manyà, F.: The 2-sat problem of regular signed CNF formulas. In: 30th IEEE International Symposium on Multiple-Valued Logic, ISMVL 2000, Portland, Oregon, USA, May 23-25, 2000, Proceedings, pp. 331–336. IEEE Computer Society (2000). URL https://doi.org/10.1109/ISMVL.2000.848640
- Chawdhary et al. (2019) Chawdhary, A., Robbins, E., King, A.: Incrementally closing octagons. Formal Methods Syst. Des. 54(2), 232–277 (2019). URL https://doi.org/10.1007/s10703-017-0314-7
- Chen et al. (2018) Chen, T., Chen, Y., Hague, M., Lin, A.W., Wu, Z.: What is decidable about string constraints with the replaceall function. Proc. ACM Program. Lang. 2(POPL), 3:1–3:29 (2018). URL https://doi.org/10.1145/3158091
- Cousot and Cousot (1992) Cousot, P., Cousot, R.: Abstract interpretation frameworks. Journal of logic and computation 2(4), 511–547 (1992)
- Cousot and Halbwachs (1978) Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Aho, A.V., Zilles, S.N., Szymanski, T.G. (eds.) Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, USA, January 1978, pp. 84–96. ACM Press (1978). URL https://doi.org/10.1145/512760.512770
- Day et al. (2023) Day, J.D., Ganesh, V., Grewal, N., Manea, F.: On the expressive power of string constraints. Proc. ACM Program. Lang. 7(POPL), 278–308 (2023). URL https://doi.org/10.1145/3571203
- Dor et al. (2001) Dor, N., Rodeh, M., Sagiv, S.: Cleanness checking of string manipulations in C programs via integer analysis. In: Cousot, P. (ed.) Static Analysis, 8th International Symposium, SAS 2001, Paris, France, July 16-18, 2001, Proceedings, pp. 194–212. Springer, LNCS 2126 (2001). URL https://doi.org/10.1007/3-540-47764-0_12
- Ganesh et al. (2011) Ganesh, V., Minnes, M., Solar-Lezama, A., Rinard, M.: What is decidable about strings? (2011)
- Karr (1976) Karr, M.: Affine relationships among variables of a program. Acta Informatica 6, 133–151 (1976). URL https://doi.org/10.1007/BF00268497
- Miné (2001) Miné, A.: The octagon abstract domain. In: WCRE ’01, p. 310. IEEE Computer Society (2001). URL https://doi.org/10.1109/WCRE.2001.957836
- Miné (2004) Miné, A.: Weakly relational numerical abstract domains. (domaines numériques abstraits faiblement relationnels). Ph.D. thesis, École Polytechnique, Palaiseau, France (2004). URL https://tel.archives-ouvertes.fr/tel-00136630
- Miné (2006) Miné, A.: The octagon abstract domain. Higher Order Symbol. Comput. 19(1), 31–100 (2006). URL https://doi.org/10.1007/s10990-006-8609-1
- Müller-Olm and Seidl (2004) Müller-Olm, M., Seidl, H.: Precise interprocedural analysis through linear algebra. In: Jones, N.D., Leroy, X. (eds.) Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2004, Venice, Italy, January 14-16, 2004, pp. 330–341. ACM (2004). URL https://doi.org/10.1145/964001.964029
- Müller-Olm and Seidl (2007) Müller-Olm, M., Seidl, H.: Analysis of modular arithmetic. ACM Trans. Program. Lang. Syst. 29(5), 29 (2007). URL https://doi.org/10.1145/1275497.1275504
- Sankaranarayanan et al. (2005) Sankaranarayanan, S., Sipma, H.B., Manna, Z.: Scalable analysis of linear systems using mathematical programming. In: Cousot, R. (ed.) Verification, Model Checking, and Abstract Interpretation, LNCS, vol. 3385, pp. 25–41. Springer, Berlin, Heidelberg (2005)
- Schwarz et al. (2023) Schwarz, M., Saan, S., Seidl, H., Erhard, J., Vojdani, V.: Clustered relational thread-modular abstract interpretation with local traces. In: Wies, T. (ed.) Programming Languages and Systems - 32nd European Symposium on Programming, ESOP 2023, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2023, Paris, France, April 22-27, 2023, Proceedings, Lecture Notes in Computer Science, vol. 13990, pp. 28–58. Springer (2023). URL https://doi.org/10.1007/978-3-031-30044-8_2
- Schwarz and Seidl (2023) Schwarz, M., Seidl, H.: Octagons revisited. In: Hermenegildo, M.V., Morales, J.F. (eds.) Static Analysis, pp. 485–507. Springer Nature Switzerland, Cham (2023)
- Simon et al. (2002) Simon, A., King, A., Howe, J.M.: Two variables per linear inequality as an abstract domain. In: Leuschel, M. (ed.) Logic Based Program Synthesis and Transformation, 12th International Workshop, LOPSTR 2002, Madrid, Spain, September 17-20, 2002, Revised Selected Papers, LNCS, vol. 2664, pp. 71–89. Springer (2002). URL https://doi.org/10.1007/3-540-45013-0_7
- Yu et al. (2011) Yu, F., Bultan, T., Hardekopf, B.: String abstractions for string verification. In: Groce, A., Musuvathi, M. (eds.) Model Checking Software - 18th International SPIN Workshop, Snowbird, UT, USA, July 14-15, 2011. Proceedings, Lecture Notes in Computer Science, vol. 6823, pp. 20–37. Springer (2011). URL https://doi.org/10.1007/978-3-642-22306-8_3