Automated Reasoning in ALCQ via SMT

Volker Haarslev∗, Roberto Sebastiani+, and Michele Vescovi+
∗ CSE, Concordia University, Montreal, haarslev@cse.concordia.ca
+ DISI, Università di Trento, {rseba, vescovi}@disi.unitn.it

Abstract. Reasoning techniques for qualified number restrictions (QNRs) in Description Logics (DLs) have been investigated in the past, but they mostly do not make use of the arithmetic knowledge implied by QNRs. In this paper we propose and investigate a novel approach for concept satisfiability in acyclic ALCQ ontologies. It is based on the idea of encoding an ALCQ ontology into a formula in Satisfiability Modulo the Theory of Costs (SMT(C)), which is a specific and computationally much cheaper subcase of Linear Arithmetic under the Integers, and of exploiting the power of modern SMT solvers to compute every concept-satisfiability query on a given ontology. We implemented and tested our approach, which includes a very effective individuals-partitioning technique, on a wide set of synthesized benchmark formulas, comparing the approach with the main state-of-the-art DL reasoners available. Our empirical evaluation confirms the potential of the approach.

1 Introduction

Description logics (DLs) form one of the major foundations of the semantic web and its web ontology language (OWL). In fact, OWL 2, a recent W3C recommendation, is a syntactic variant of a very expressive DL that supports reasoning with so-called qualified number restrictions (QNRs). A sound and complete calculus for reasoning with the DL ALCQ, which adds QNRs to the basic DL ALC, was first proposed in [9]. For example, this calculus decides the satisfiability of an ALCQ concept (≥5 s.C ⊓ ≥5 s.D ⊓ ≤2 s.E) by trying to find a model with fillers for the role s such that at least 5 fillers are instances of C, at least 5 fillers are instances of D, and at most 2 fillers are instances of E.
It satisfies the at-least restrictions by creating 10 fillers for s, 5 of which are instances of C and 5 of which are instances of D. A concept choose rule non-deterministically assigns E or ¬E to these fillers. In case the at-most restriction (≤2 s.E) is violated, a merge rule non-deterministically merges pairs of fillers for s that are instances of E [9]. Searching for a model in such an arithmetically uninformed way can become very inefficient, especially when bigger numbers occur in QNRs or several QNRs interact. To the best of our knowledge this calculus still serves as reference in most tableau-based OWL reasoners (e.g., Pellet [15], FaCT++ [16]) for implementing reasoning about QNRs. The only exception is Racer [7], where conceptual QNR reasoning is based on an algebraic approach [8] that integrates integer linear programming with DL tableau methods.

  ⊥^I = ∅,   ⊤^I = Δ^I,   (¬C)^I = Δ^I \ C^I,
  (C ⊓ D)^I = C^I ∩ D^I,   (C ⊔ D)^I = C^I ∪ D^I,
  (∃r.C)^I = {x ∈ Δ^I | there exists y ∈ Δ^I s.t. (x, y) ∈ r^I and y ∈ C^I},
  (∀r.C)^I = {x ∈ Δ^I | for all y ∈ Δ^I, if (x, y) ∈ r^I then y ∈ C^I},
  (≥n r.C)^I = {x ∈ Δ^I | |FIL(r, x) ∩ C^I| ≥ n},
  (≤m r.C)^I = {x ∈ Δ^I | |FIL(r, x) ∩ C^I| ≤ m},
  C ⊑ D is satisfied iff C^I ⊆ D^I

Fig. 1: Syntax and semantics of ALCQ (n ≥ 1 and m ≥ 0).

The work presented in this paper was inspired by two recent novel approaches, combined with the progress in satisfiability modulo theory (SMT) solving techniques. First, [13,14] explored the idea of performing automated reasoning tasks in DLs by encoding problems into Boolean formulas and by exploiting the power of modern SAT techniques. In particular, the experiments in [13] showed that, in practice and despite the theoretical worst-case complexity limits, this approach could handle most or all of the ALC satisfiability problems that the other approaches could handle, with performance comparable with, and often better than, that of state-of-the-art tools.
Second, a revised and extended algebraic approach was presented for SHQ [6] and SHOQ [4]. These approaches represent knowledge about interacting QNRs as systems of linear inequations where numerical variables represent cardinalities of sets of domain elements (e.g., role fillers) divided into mutually disjoint decompositions. On a set of synthetic QNR benchmarks these algebraic approaches demonstrated a superior performance for most test cases [6,5].

The main idea of this paper is thus to encode an ALCQ ontology into a formula in Satisfiability Modulo the Theory of Costs (SMT(C)) [3], which is a specific and computationally much cheaper subcase of Linear Arithmetic under the Integers (LA(Z)), and to exploit the power of modern SMT solvers to compute every concept-satisfiability query on a given ontology. We have implemented and tested our approach (called ALCQ2SMT_C), which includes a very effective individuals-partitioning technique, on a wide set of synthesized benchmark formulas and compared it with the main state-of-the-art OWL reasoners. Our empirical evaluation demonstrates the potential of our approach and, compared with the tested OWL reasoners, shows significantly better performance on benchmarks having multiple/balanced sources of complexity.

2 Background

2.1 The Description Logic ALCQ

The logic ALCQ extends the well-known logic ALC by adding qualified number restrictions (QNRs). In more detail, the concept descriptions in ALCQ (namely Ĉ, D̂, ...) are inductively defined through the constructors listed in Figure 1, starting from the non-empty and pair-wise disjoint sets of concept names N_C (denoted by the letters A, B, C, ...) and role names N_R (denoted by the letters r, s, ...). It allows for negations, conjunctions/disjunctions, existential/universal restrictions and, indeed, QNRs. An ALCQ TBox (or ontology) is a finite set of general concept inclusion (GCI) axioms as defined in Figure 1.
Given a TBox T, we denote with BC_T the set of the basic concepts for T, i.e. the smallest set of concepts containing: (i) the top and the bottom concepts ⊤ and ⊥; (ii) all the concepts of T in the form C and ¬C where C is a concept name in N_C. We denote the basic concepts in BC_T with the letters C, D, ... (thus, C may represent a concept ¬C′ with C′ ∈ BC_T), whilst we use Ĉ, D̂, ... for complex concepts, i.e. Ĉ, D̂ ∉ BC_T. Our approach is currently restricted to acyclic (or unfoldable) TBoxes. We call a TBox T acyclic if there exist no cyclic dependencies between its concept names, i.e., named concepts are neither directly nor indirectly defined in terms of themselves through the axioms in T.

Semantics. The semantics of ALCQ is defined in terms of interpretations. An interpretation I is a pair I = (Δ^I, ·^I), where Δ^I is the domain (i.e. a non-empty set of individuals), and ·^I is the interpretation function, which maps each concept name (atomic concept) A ∈ N_C to a set A^I ⊆ Δ^I and each role name (atomic role) r to a binary relation r^I ⊆ Δ^I × Δ^I. Figure 1 defines the inductive extensions of ·^I to arbitrary concept descriptions, where n ≥ 1 and m ≥ 0 are integer values and FIL(r, x) is the set of the r-fillers of the individual x ∈ Δ^I for the role r ∈ N_R, defined as FIL(r, x) = {y ∈ Δ^I | (x, y) ∈ r^I}. An interpretation I is a model of a given TBox T if and only if the conditions given in Figure 1 are respected for every axiom in T; when this is the case, the TBox T is said to be consistent. A concept Ĉ is said to be satisfiable wrt. T if and only if there exists a model I of T with Ĉ^I ≠ ∅, i.e. there exists an individual x ∈ Δ^I that is an instance of Ĉ, i.e. such that x ∈ Ĉ^I.

Normal Form. We assume wlog. that all ALCQ concept descriptions are in negation normal form (NNF), i.e. negation signs only occur in front of concept names (see [17] for details).
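The semantics of Figure 1 can be checked mechanically on a small finite interpretation. The following Python sketch is only an illustration under stated assumptions: the tuple-based concept syntax, the names `fillers` and `extension`, and the sample interpretation are hypothetical and do not come from the paper or from any reasoner. It computes C^I for NNF concepts and evaluates the introductory concept (≥5 s.C ⊓ ≥5 s.D ⊓ ≤2 s.E).

```python
# Hypothetical concept syntax (not from the paper):
#   "A"                        concept name
#   ("not", C), ("and", C, D), ("or", C, D)
#   ("all", r, C), ("atleast", n, r, C), ("atmost", m, r, C)
TOP, BOTTOM = "TOP", "BOTTOM"


def fillers(roles, r, x):
    """FIL(r, x) = {y | (x, y) in r^I}."""
    return {y for (a, y) in roles.get(r, set()) if a == x}


def extension(concept, domain, names, roles):
    """Return the set C^I for the interpretation (domain, names, roles)."""
    def ev(c):
        return extension(c, domain, names, roles)

    if concept == TOP:
        return set(domain)
    if concept == BOTTOM:
        return set()
    if isinstance(concept, str):
        return set(names.get(concept, ()))
    op = concept[0]
    if op == "not":
        return set(domain) - ev(concept[1])
    if op == "and":
        return ev(concept[1]) & ev(concept[2])
    if op == "or":
        return ev(concept[1]) | ev(concept[2])
    if op == "all":
        _, r, c = concept
        ext = ev(c)
        return {x for x in domain if fillers(roles, r, x) <= ext}
    if op in ("atleast", "atmost"):
        _, k, r, c = concept
        ext = ev(c)
        if op == "atleast":
            return {x for x in domain if len(fillers(roles, r, x) & ext) >= k}
        return {x for x in domain if len(fillers(roles, r, x) & ext) <= k}
    raise ValueError(f"unknown constructor: {op!r}")


# Individual 0 has five s-fillers, each an instance of both C and D, none of E.
domain = range(6)
names = {"C": {1, 2, 3, 4, 5}, "D": {1, 2, 3, 4, 5}, "E": set()}
roles = {"s": {(0, y) for y in range(1, 6)}}
query = ("and", ("atleast", 5, "s", "C"),
         ("and", ("atleast", 5, "s", "D"), ("atmost", 2, "s", "E")))
print(extension(query, domain, names, roles))
```

In this interpretation five shared fillers are enough to witness the concept, which is exactly the kind of overlap that an arithmetically uninformed search discovers only by merging fillers.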
For ease of exposition, we further restrict our attention to those ALCQ TBoxes in which all axioms are in the following normal form:

  C ⊑ D,   ⊓_i C_i ⊑ D,   C ⊑ ⊔_i D_i,   ℜr.C ⊑ D,   C ⊑ ℜr.D   (1)

with ℜ ∈ {∀, ≥n, ≤m} s.t. n, m ≥ 1, and C, C_i, D, D_i ∈ BC_T.¹ Every given TBox T can be turned into a normalized TBox T′ (where all concept descriptions in T′ are in NNF) that is a conservative extension of T by introducing new concept names. The transformation of a TBox T into T′ can be done in linear time, and the size of T′ is linear wrt. the size of T. We call every non-conjunctive and non-disjunctive concept description occurring in the concept inclusions of T′ a normal concept of a normalized TBox T′; we call NC_T′ the set of all the normal concepts of T′. For more details we refer the reader to [17].

¹ In particular, we avoid redundant existential and at-most restrictions, which are replaced by their equivalents: ∃r.C ⟹ ≥1 r.C and ≤0 r.C ⟹ ∀r.nnf(¬C).

2.2 Satisfiability Modulo Theory with Cost Functions

Satisfiability Modulo (the) Theory T, SMT(T), is the problem of deciding the satisfiability of a (typically) ground formula under a background theory T. Most state-of-the-art SMT solvers are based on the lazy SMT schema: in a nutshell, a SAT solver is used to search for a truth assignment µ to the atomic subformulas of the input ground formula ϕ, s.t. µ tautologically entails ϕ and µ is found consistent in T by the T-solver. (We refer the reader to, e.g., [12] for details and further references.)

The work in [3] addresses the problem of the satisfiability in some theory T of a formula ϕ augmented with a set of cost functions {cost_1, ..., cost_N} s.t., for every i:

  cost_i = Σ_{j=1}^{N_i} if-then-else(A_ij, c_ij, 0),   lb_i < cost_i ≤ ub_i,   (2)

A_ij being Boolean atoms occurring in ϕ, and N_i, lb_i, ub_i, c_ij being integer values ≥ 0. (Intuitively, in (2) cost_i = Σ_j A_ij · c_ij s.t. A_ij ∈ {0, 1}.) The problem can be encoded into SMT(T ∪ LA(Z)).
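As a worked illustration of (2), the sketch below computes the range of values that a single cost function can still take under a partial truth assignment to its atoms, and compares that range with the bounds lb < cost ≤ ub. This is plain arithmetic on (2) written for this exposition; the function names and the three-way status are hypothetical and are not the solver of [3].

```python
def cost_range(terms, assignment):
    """terms: list of (atom, c) pairs of one cost function, as in (2).
    assignment: dict atom -> bool for the atoms decided so far.
    Returns (lo, hi), the smallest and largest value the cost can still reach."""
    lo = sum(c for atom, c in terms if assignment.get(atom) is True)
    hi = sum(c for atom, c in terms if assignment.get(atom) is not False)
    return lo, hi


def bounds_status(terms, assignment, lb, ub):
    """Check lb < cost <= ub against the reachable range (a sound range test,
    not a complete decision procedure: 'undecided' only means the range
    overlaps the admissible interval)."""
    lo, hi = cost_range(terms, assignment)
    if lo > ub or hi <= lb:
        return "conflict"      # no completion can satisfy the bounds
    if lo > lb and hi <= ub:
        return "satisfied"     # every completion satisfies the bounds
    return "undecided"


terms = [("a", 1), ("b", 1), ("c", 1)]
print(bounds_status(terms, {}, lb=1, ub=2))
print(bounds_status(terms, {"a": True, "b": True, "c": True}, lb=1, ub=2))
```

With unit costs, as in the example, the cost simply counts how many atoms are true, which is the way the encoding of Section 3 uses cost functions to count role fillers.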
However, [3] pointed out that encoding the problem into SMT(T ∪ LA(Z)) is inefficient, since it does not fully exploit the fact that the values of cost_i derive deterministically from the truth values of all the A_ij's. They proposed instead a specific theory of costs C, which is much simpler and computationally much cheaper than LA(Z), and developed a specific, very fast T-solver for C. In a nutshell, C consists of: (i) a collection of integer variables cost_1, ..., cost_N, that we call cost variables, denoting the output of the cost functions in (2); (ii) an interpreted predicate BC ("bound cost") s.t. BC(cost_i, c) is true iff cost_i is upper-bounded by the integer value c (i.e., iff cost_i ≤ c); (iii) an interpreted predicate IC ("incur cost") s.t. IC(cost_i, c_ij, j) is true if the j-th element of the sum (2) is c_ij, false if it is 0. Thus, ϕ is satisfiable in T under the cost constraints (2) iff the formula

  ϕ ∧ ⋀_{i=1}^{N} ( BC(cost_i, ub_i) ∧ ¬BC(cost_i, lb_i) ∧ ⋀_{j=1}^{N_i} (A_ij ↔ IC(cost_i, c_ij, j)) )   (3)

is satisfiable in T ∪ C. A specific T-solver for C works simply by adding the value c_ij [resp. 0] to the current minimum value of cost_i and 0 [resp. c_ij] to its current maximum when IC(cost_i, c_ij, j) (i.e. A_ij) is assigned to true [resp. false], and by checking if such minimum [resp. maximum] value of cost_i is smaller than or equal to ub_i [resp. greater than or equal to lb_i]. We refer the reader to [3] for details and further references.

3 Concept Satisfiability via SMT with Costs

3.1 Encoding ALCQ into SMT(C)

The encoding we propose simulates the construction of an interpretation I by introducing new individuals, assigning individuals to the interpretations of concepts in T, and counting their occurrences in the interpretations. We uniquely represent individuals in Δ^I by means of labels σ, represented as non-empty sequences of positive integer values and role names in N_R. A label σ can be either the label 1 or in the form σ′.r.n, with σ′ another label, r ∈ N_R and n ≥ 1.
With a small abuse of notation, hereafter we may say "the individual σ" meaning "the individual labeled by σ". Moreover, we call instantiated concept a pair ⟨σ, C⟩, s.t. σ ∈ Δ^I and C is an ALCQ normal concept of T, representing the fact that σ is an instance of C in the interpretation I, i.e. σ ∈ C^I. We define A_⟨ , ⟩ as an injective function which maps an instantiated concept ⟨σ, C⟩, s.t. C is not in the form ¬C′, into a Boolean variable A_⟨σ, C⟩ that we call concept variable. The so-called concept literal L_⟨σ, C⟩ denotes ¬A_⟨σ, C′⟩ if C is in the form ¬C′, and A_⟨σ, C⟩ otherwise. The truth value of L_⟨σ, C⟩ states whether the instantiation relation between σ and C [resp. ¬C] holds, i.e. if ⟨σ, C⟩ [resp. ⟨σ, ¬C⟩] is an existing instantiated concept in I. We conventionally assume that A_⟨σ, ⊥⟩ is ⊥. Notice also that ⟨σ, ⊤⟩ means σ ∈ Δ^I, i.e. that if A_⟨σ, ⊤⟩ is assigned to true then σ exists in Δ^I. We informally say that σ (meaning ⟨σ, ⊤⟩) or ⟨σ, C⟩ is "enabled" when the respective literal is assigned to true.

We define indiv as a function which maps an instantiated concept ⟨σ, ℜr.C⟩, such that ℜ ∈ {≥n, ≤m} and C is a basic concept (since we are considering concepts in normal form), into a cost variable indiv^C_{σ.r} in the Theory of Costs, that we call individuals cost variable. Notice that indiv is not injective, since the same cost variable indiv^C_{σ.r} is "shared" among all the instantiated concepts which refer both to the same σ and to QNRs involving the same r and C. However, notice also that ⟨σ, ℜr.C⟩ and ⟨σ, ℜr.¬C⟩ are mapped to different cost variables. The final value of the individuals cost variable indiv^C_{σ.r} represents the number of individuals which are in relation with the individual σ via the role r and are in the interpretation of C; in other words, the final value of indiv^C_{σ.r} exactly represents the cardinality of FIL(r, σ) ∩ C^I.
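The bookkeeping just described (labels, the injective map to concept variables, concept literals, and the non-injective map indiv) can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the tuple representation of labels and restrictions, the DIMACS-style integer literals, and the names `child`, `VariableMap` and `indiv` are assumptions made here, not the data structures of the actual encoder.

```python
from itertools import count

ROOT = (1,)  # the label "1"


def child(label, role, n):
    """Build the label sigma.r.n from sigma, a role name and n >= 1."""
    return label + (role, n)


def label_str(label):
    return ".".join(str(part) for part in label)


class VariableMap:
    """Injective map from instantiated concepts <sigma, C> to Boolean variables."""

    def __init__(self):
        self._ids = count(1)
        self._concept_vars = {}

    def concept_var(self, label, concept):
        key = (label, concept)
        if key not in self._concept_vars:
            self._concept_vars[key] = next(self._ids)
        return self._concept_vars[key]

    def literal(self, label, concept):
        """Concept literal: the negated variable of C' when C has the form ("not", C')."""
        if isinstance(concept, tuple) and concept[0] == "not":
            return -self.concept_var(label, concept[1])
        return self.concept_var(label, concept)

    @staticmethod
    def indiv(label, restriction):
        """Cost variable for <sigma, (kind, n, r, C)>: it depends only on
        sigma, r and C, so at-least and at-most restrictions share it."""
        _kind, _n, role, concept = restriction
        return (label, role, concept)


vm = VariableMap()
x = child(ROOT, "r", 3)
print(label_str(x), vm.literal(x, "C"), vm.literal(x, ("not", "C")))
```

Identifying a cost variable with the triple (σ, r, C) makes the sharing explicit: dropping the restriction kind and its number from the key is what makes the map non-injective.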
Our encoding works by means of the following principles:
– GCIs are represented via Boolean implications between instantiated concepts.
– Every at-least restriction ⟨σ, ≥n r.C⟩ is handled by introducing exactly n individuals σ.r.i associated to C. The existence of individuals is forced by binding each of them to an incur cost of value 1 for indiv^C_{σ.r}, and then fixing a lower bound for indiv^C_{σ.r}.
– When both at-least and at-most restrictions coexist wrt. σ, the encoding allows for sharing individuals separately introduced by distinct at-least restrictions. At-most restrictions are handled by fixing upper bounds for the respective cost variables.
– It mimics the construction of a labeled tableau, with the difference of the sharing of individuals described above, which generalizes the merging of pairs of fillers to satisfy at-most QNRs.

Definition 1 (ALCQ2SMT_C(T) encoding). Let T be an acyclic ALCQ TBox in normal form. Wlog., we represent every axiom Ĉ ⊑ D̂ of T as ⊓_i Ĉ_i ⊑ ⊔_j D̂_j where i, j ≥ 1 and i = 1 (resp. j = 1), with Ĉ_1 (resp. D̂_1) a normal concept, for every normal form (1) except for the second (resp. the third) one. The SMT(C) encoding ALCQ2SMT_C(T) for T is defined as the sextuple ⟨Σ^T, I^T_−, I^T_+, A_⟨ , ⟩, indiv, ϕ^T⟩, where:
– Σ^T is the set of all the possible individuals introduced;
– I^T_−, I^T_+ represent respectively the set of the implicant (i.e. left-side) and implied (i.e. right-side) instantiated concepts that must be encoded according to their side;
– A_⟨ , ⟩ and indiv are the functions defined above;
– ϕ^T is a CNF formula on propositional- and C-literals encoding T into SMT(C). We represent ϕ^T as the set of its clauses.²

² For better readability we often represent the clauses of ϕ^T as implications.

The sets Σ^T, I^T_−, I^T_+ and ϕ^T are incrementally defined as the minimum sets s.t.:

1. Initialization. 1 ∈ Σ^T, ⟨1, ⊤⟩ ∈ I^T_−, ⟨1, ⊤⟩ ∈ I^T_+ and (A_⟨1, ⊤⟩) ∈ ϕ^T.

2. Axioms initialization. If Ĉ ⊑ D̂ ∈ T, then {⟨1, C_i⟩ | Ĉ = ⊓_i C_i} ⊆ I^T_−.

3. Axioms expansion. If σ ∈ Σ^T, ⊓_i C_i ⊑ ⊔_j D_j ∈ T, {⟨σ, C_i⟩ | Ĉ = ⊓_i C_i} ⊆ I^T_− ∪ I^T_+, then
  {⟨σ, D_j⟩ | D̂ = ⊔_j D_j} ⊆ I^T_+,
  (⋀_i L_⟨σ, C_i⟩) → (⋁_j L_⟨σ, D_j⟩) ∈ ϕ^T.   (4)

4. Handle left-side QNRs. If σ ∈ Σ^T, ⟨σ, ℜ′r.C′⟩ ∈ I^T_+ with ℜ′ ∈ {≥n′, ≤m′, ∀}, then
  {⟨σ, ℜr.C⟩ | ℜr.C ⊑ D̂ ∈ T} ⊆ I^T_−,   ℜ ∈ {≥n, ≤m, ∀}.

5. At-least restrictions: introduce individuals. If σ ∈ Σ^T, ⟨σ, ≥n r.C⟩ ∈ I^T_+, then
  {σ.r.k^C_i | i = 1, ..., n} ⊂ Σ^T,
  {⟨σ.r.k^C_i, C⟩ | i = 1, ..., n} ∪ {⟨σ.r.k^C_i, ⊤⟩ | i = 1, ..., n} ⊂ I^T_−,
  {IC(indiv^C_{σ.r}, 1, k^C_i) → L_⟨σ.r.k^C_i, C⟩ | i = 1, ..., n} ⊂ ϕ^T,   (5)
  {IC(indiv^C_{σ.r}, 1, k^C_i) → A_⟨σ.r.k^C_i, ⊤⟩ | i = 1, ..., n} ⊂ ϕ^T,   (6)
where k^C_1 ≥ 1, k^C_{i+1} = k^C_i + 1 and k^C_i ≠ k^D_j for every ⟨σ, ≥n′ r.D⟩ ∈ I^T_+ with C ≠ D and i = 1, ..., n, j = 1, ..., n′. We assume consecutive values for all the σ.r.j.³

³ Hence, either k^C_1 = 1 or k^C_1 = k^D_{n′} + 1 for some ⟨σ, ≥n′ r.D⟩ ∈ I^T_+.

6. At-least restrictions: fix lower bounds. If σ ∈ Σ^T, ⟨σ, ≥n r.C⟩ ∈ I^T_+, then
  ((A_⟨σ, ≥n r.C⟩ ∧ A_⟨σ, ⊤⟩) → ¬BC(indiv^C_{σ.r}, n − 1)) ∈ ϕ^T,   (7)
if σ ∈ Σ^T, ⟨σ, ≥n r.C⟩ ∈ I^T_−, then
  ((¬BC(indiv^C_{σ.r}, n − 1) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ≥n r.C⟩) ∈ ϕ^T.   (8)

7. Coexisting at-least/at-most: sharing individuals. If σ ∈ Σ^T, ⟨σ, ≤m r.E⟩ ∈ I^T_+, ⟨σ, ≥n r.C⟩ ∈ I^T_+, ⟨σ, ≥n′ r.D⟩ ∈ I^T_+, with C ≠ D, then
  {⟨σ.r.k^C_i, D⟩ | i = 1, ..., n} ∪ {⟨σ.r.k^D_i, C⟩ | i = 1, ..., n′} ⊂ I^T_−,
  {IC(indiv^D_{σ.r}, 1, k^C_i) → L_⟨σ.r.k^C_i, D⟩ | i = 1, ..., n} ∪ {IC(indiv^C_{σ.r}, 1, k^D_i) → L_⟨σ.r.k^D_i, C⟩ | i = 1, ..., n′} ⊂ ϕ^T,   (9)
  {IC(indiv^D_{σ.r}, 1, k^C_i) → A_⟨σ.r.k^C_i, ⊤⟩ | i = 1, ..., n} ∪ {IC(indiv^C_{σ.r}, 1, k^D_i) → A_⟨σ.r.k^D_i, ⊤⟩ | i = 1, ..., n′} ⊂ ϕ^T.   (10)

8. At-most restrictions: count individuals. If σ ∈ Σ^T, ⟨σ, ≤m r.C⟩ ∈ I^T_+, then
  {⟨σ.r.j, C⟩ | σ.r.j ∈ Σ^T} ⊂ I^T_−,
  {(L_⟨σ.r.j, C⟩ ∧ A_⟨σ.r.j, ⊤⟩) → IC(indiv^C_{σ.r}, 1, j) | σ.r.j ∈ Σ^T} ⊂ ϕ^T.   (11)

9. At-most restrictions: fix upper bounds. If σ ∈ Σ^T, ⟨σ, ≤m r.C⟩ ∈ I^T_+, then
  ((A_⟨σ, ≤m r.C⟩ ∧ A_⟨σ, ⊤⟩) → BC(indiv^C_{σ.r}, m)) ∈ ϕ^T,   (12)
if σ ∈ Σ^T, ⟨σ, ≤m r.C⟩ ∈ I^T_−, then
  ((BC(indiv^C_{σ.r}, m) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ≤m r.C⟩) ∈ ϕ^T.   (13)

10. Universal restrictions. If σ ∈ Σ^T, ⟨σ, ∀r.C⟩ ∈ I^T_+, then
  {⟨σ.r.j, C⟩ | σ.r.j ∈ Σ^T} ⊂ I^T_−,
  {((A_⟨σ, ∀r.C⟩ ∧ A_⟨σ.r.j, ⊤⟩) → L_⟨σ.r.j, C⟩) | σ.r.j ∈ Σ^T} ⊂ ϕ^T,   (14)
if σ ∈ Σ^T, ⟨σ, ∀r.C⟩ ∈ I^T_−, then
  ((BC(indiv^{¬C}_{σ.r}, 0) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ∀r.C⟩) ∈ ϕ^T.   (15)

Importantly, for the purposes of the encoding, left-side at-most (and universal) restrictions behave as right-side at-least restrictions, and vice versa. Thus, for instance, the instantiated concept ⟨σ, ≤n−1 r.C⟩ ∈ I^T_− [resp. ⟨σ, ∀r.¬C⟩ ∈ I^T_−, with n = 1] must be handled by the encoding as if it were the instantiated concept ⟨σ, ≥n r.C⟩ ∈ I^T_+. In order to simplify the exposition, in Definition 1 and afterwards, we generically refer to at-least/at-most restrictions (respectively to the instantiated concepts ⟨σ, ≥n r.C⟩/⟨σ, ≤m r.C⟩) meaning the right-side ones, but implicitly including left-side at-most (or universal)/at-least restrictions, respectively. The interested reader can find in [17] the complete ALCQ2SMT_C encoding and some encoding examples.

The following facts concerning ALCQ2SMT_C hold. (We refer the reader to [17] for the formal proofs.)

Theorem 1. An ALCQ acyclic TBox T in normal form is consistent if and only if the SMT(C)-formula ϕ^T of ALCQ2SMT_C(T) (Definition 1) is satisfiable.

Theorem 2. Given an ALCQ acyclic TBox T in normal form and the encoding ALCQ2SMT_C(T) = ⟨Σ^T, I^T_−, I^T_+, A_⟨ , ⟩, indiv, ϕ^T⟩ of Definition 1, every C ∈ BC_T is satisfiable wrt. T iff ϕ^T ∧ L_⟨1, C⟩ is satisfiable.

We remark on some facts about the encoding of Definition 1:
– Point 4. is necessary to force the encoding of axioms having on the left-hand side restrictions wrt.
the role r, when other restrictions wrt. r are involved. Axioms of this kind can create cycles in TBoxes (we remark that our encoding ensures termination for acyclic TBoxes).
– In all the clauses of type (5), (6), (9), (10) and (11), every IC-literal has cost value 1 and the same index as the bound individual. This ensures that IC-literals referring to distinct individuals/cost variables are represented by distinct atoms.
– Owing to the theory C, clauses (7) and (12) are those that concretely ensure the numerical satisfiability of both at-least and at-most restrictions. In order to be satisfied: (i) a clause of type (7) forces some IC-literals to be assigned to true (thus (5), (9) work in only one direction); (ii) a clause of type (12), instead, bounds the number of IC-literals that can be enabled (motivating the opposite direction of (11)). Clauses (8) and (13) instead enforce the application of an axiom having a left-side QNR if it is numerically satisfied.

Notice that if, for the same σ, r and C, more than one restriction satisfies the conditions of point 5. with different values of n (n∗ being the highest of these values), then exactly n∗ new individuals and n∗ clauses (5) and (6) are in ϕ^T. In contrast, one distinct clause (7) is in ϕ^T for every different value of n (the same holds for the clauses (12) in case of different values of m wrt. the same σ, r and C).

ALCQ with general TBoxes has the finite tree model property [11], thus every satisfiable ALCQ concept is satisfiable in a finite interpretation (in this case of worst-case exponential size) which has the shape of a tree. Intuitively, the individuals in Σ^T form a super-tree of all such models. Let N represent the sum of the values occurring in the QNRs of T: a very coarse upper bound to the cardinality of Σ^T is Θ(|T|^N); in fact, the number of nested restrictions is bounded by the number of axioms of T, while N bounds the number of branches in the tree for every nesting level.
The size of ϕ^T is, instead, bounded by Θ(|T|^{2N}), because for every individual and every concept of T a fixed number of clauses can be introduced. In [17] we define a terminating queue-based algorithm building ALCQ2SMT_C by means of expansion rules which mimic Definition 1. Since we are restricted to acyclic TBoxes, our encoding algorithm is guaranteed to terminate even without introducing blocking techniques [1]; in particular, the proposed algorithm is polynomial in the size of the SMT(C) formula produced.

4 Partitioning Individuals

One potential drawback of the basic ALCQ2SMT_C is the high number of individuals introduced, which is linear wrt. the values occurring in the at-least restrictions. This number can increase exponentially when nested restrictions must be encoded, significantly impacting the size and the hardness of the resulting SMT(C) formula. However, similarly to the hybrid approach of [6,4], we can cope with this problem by encoding groups of individuals having identical properties (instead of using single ones) and by using only one "proxy" individual as representative of the group. We aim at partitioning the individuals introduced in Definition 1 on the basis of the following considerations:⁴
– Individuals are naturally pre-partitioned in groups wrt. r and the predecessor σ.
– If, given σ, r, no at-most restriction exists, all the fillers σ.r.k^C_i referring to one at-least restriction can be represented by one single proxy individual.
– Otherwise, the Σ_j n_j distinct individuals introduced by some ⟨σ, ≥n_j r.C_j⟩ can still be partitioned, but the partitioning must allow for representing possible intersections between the C_j^I.

⁴ Notice that here we present a different partitioning that avoids the a-priori exponential number of partitions in [6,4] (wrt. the number of coexisting QNRs). In our case, we consider the whole set of individuals necessary to trivially satisfy all the coexisting at-least restrictions; then, only on the basis of the numbers involved in QNRs, we compute a partitioning of such a set, where the target of our approach is to decide which partitions of individuals belong to a concept interpretation.

In the latter case not all possible cardinalities of the intersections must be considered. Instead, it is sufficient to distinguish between the empty intersection and some "limit" cases depending on the values occurring in the QNRs. To sum up, given σ, r, we can compute a partitioning of the individuals referring to σ and r by taking into account the values of the restrictions which concern σ and r.

Example 1. Suppose that it is necessary to encode the restrictions ⟨σ, ≥10 r.C⟩ and ⟨σ, ≥1000 r.D⟩. The basic ALCQ2SMT_C encoding would introduce 1010 distinct individuals. Applying the idea explained above, instead, we could divide these 1010 individuals into, e.g., three partitions of respectively 10, 990 and again 10 individuals. If, for example, also ⟨σ, ≤1005 r.⊤⟩ must be encoded, then the last 10 individuals could be further divided into two distinct partitions. This partitioning allows for representing the cases in which 0, 5, 10, 15, 20, 990, 995, 1000, 1005 or 1010 of these individuals exist in Δ^I (being part or not of C^I and/or D^I). Even if not exhaustive, these combinations are enough to represent the significant cases concerning satisfiability.

4.1 Smart Partitioning

In order to handle partitions of individuals we extend ALCQ2SMT_C with cumulative labels and proxy individuals. Given a normal/cumulative label σ′ and a role r, a cumulative label σ′.r.(i → j) represents a group of consecutive individuals by means of the range of integer values i → j, with i ≤ j; thus it represents a set of individuals whose cardinality is j − i + 1. When i = j we can write either σ′.r.(i → i) or σ′.r.i. With a small abuse of notation, in the following we call proxy individual any σ.r.(i → j), meaning both: (i) the cumulative label representing the set of individuals σ.r.i, σ.r.i+1, ..., σ.r.j and (ii) that σ.r.(i → j) can be one/any of these individuals acting as proxy for all the other individuals of the set.

The idea is to compute a "smart" partitioning of the individuals to be encoded into ALCQ2SMT_C. With "smart" we mean a "safe but as small as possible" partitioning, i.e. with "a small" number of partitions but "safely" preserving the semantics of the problem, so that the cardinality of the computed partitions allows for representing every relevant case wrt. satisfiability. We formally define our smart partitioning:

Definition 2. Let T be an acyclic ALCQ TBox in normal form. Given ALCQ2SMT_C(T) (Definition 1), σ ∈ Σ^T and r ∈ N_R we define the arrays:⁵

  𝒩^≥_{σ.r} =def {n_i | ⟨σ, ≥n_i r.C_i⟩ ∈ I^T_+}_i ⁶   and   𝒩^≤_{σ.r} =def {m_j | ⟨σ, ≤m_j r.D_j⟩ ∈ I^T_+}_j.

⁵ With array we mean that equal n_i [resp. m_j] values repeat in 𝒩^≥_{σ.r} [resp. 𝒩^≤_{σ.r}] as many times as they occur in the involved QNRs.

From 𝒩^≥_{σ.r} and 𝒩^≤_{σ.r}, respectively, we define the integer values:

  N^≥_{σ.r} =def Σ_{n_i ∈ 𝒩^≥_{σ.r}} n_i   and   N^≤_{σ.r} =def Σ_{m_j ∈ 𝒩^≤_{σ.r}} m_j.

We define the set P_{σ.r} =def P^≥_{σ.r} ∪ P^≤_{σ.r} as the smart partitioning for the N^≥_{σ.r} individuals of Σ^T in the form σ.r.k, where:

  P^≥_{σ.r} =def {n_S | S ∈ 2^{𝒩^≥_{σ.r}}, n_S = Σ_{n_k ∈ S} n_k}   and   P^≤_{σ.r} =def {m_S | S ∈ 2^{𝒩^≤_{σ.r}}, m_S = Σ_{m_k ∈ S} m_k}.⁷

Finally, we define p_i ∈ P_{σ.r} as the i-th sorted element of P_{σ.r}, so that p_i < p_{i+1}. We have in particular: p_1 = 0 and p_{|P_{σ.r}|} = max{N^≥_{σ.r}, N^≤_{σ.r}}.

As P_{σ.r}, P^≥_{σ.r}, P^≤_{σ.r} are sets, equal values are uniquely represented in them. Given σ, r, and assuming that each partition includes consecutive individuals among σ.r.1, ..., σ.r.N^≥_{σ.r}, then P_{σ.r} is the set containing the indexes of the last individual of all the partitions. Hence, partitions can be represented by the proxy individuals σ.r.(p_{j−1} + 1 → p_j), with j > 1. For instance, notice that the partitionings shown in Example 1 are computed in accordance with Definition 2.
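The computation in Definition 2 can be made concrete with a short Python sketch. It is an illustration only, not the algorithm of [17]: the function names are hypothetical, and the subset sums are accumulated incrementally instead of enumerating the power set explicitly. The input lists play the role of the two arrays of values for a fixed σ and r.

```python
def subset_sums(values):
    """All sums n_S over sub-arrays S of `values` (the empty S gives 0)."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return sums


def smart_partitioning(at_least, at_most):
    """Return the proxy ranges (p_{j-1} + 1, p_j) induced by the sorted
    union of the subset sums of the at-least and at-most values."""
    p = sorted(subset_sums(at_least) | subset_sums(at_most))
    return [(lo + 1, hi) for lo, hi in zip(p, p[1:])]


# Example 1: >=10 r.C and >=1000 r.D, first alone, then with <=1005 r.TOP.
print(smart_partitioning([10, 1000], []))
print(smart_partitioning([10, 1000], [1005]))
```

On the values of Example 1 the first call yields three ranges of 10, 990 and 10 individuals, and adding the at-most value 1005 splits the last range into two ranges of 5 individuals each.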
We remark that Definition 2 defines a safe 8 partitioning, in fact: – it takes into account all the values in QNRs for σ, r; – it considers all the possible sums of the values ni [resp. mj ] for all the at-least [resp. at-most] restrictions, which allows for representing all the possible lower-bounds [resp. upper-bounds] in case of disjoint concept interpretations; ≥ ≤ – the union of Pσ.r , Pσ.r represents the combination of lower- and upper-bounds; – by sorting all the possible sums and by using the distance between these values (a partition ranges from pj−1 + 1 to pj ) as the cardinality of the partitions, it allows for representing all the possible intersecting concept interpretations. We remark that partitioning makes our approach independent from the magnitude/offset of the values occurring in QNRs. 4.2 Exploit Smart Partitioning in ALCQ2SM TC Using partitions and proxy individuals does not affect the ALCQ2SM TC encoding, because the Theory of Costs allows for arbitrary incur costs. We can enhance Definition 1 by taking advantage of smart partitioning as follows. First we assume that the T T sets Σ T , I− , I− and the functions Ah , i , indiv are defined consistently with the use of proxy individuals. Second, assuming to compute the partitioning Pσ.r of Definition 2 for every σ, r, we modify ALCQ2SM TC as follows: 6 7 8 T T T , while hσ, ≤ni −1r.Ci i ∈ I− must be considered like hσ, ≤0r.¬Ci i ∈ I− hσ, ∀r.Ci i ∈ I− T must be considered like hσ, ≥ni r.Ci i ∈ I+ and vice versa. Being 2X the power set for the set/array X. I.e. it preserves the semantics of the problem wrt. satisfiability. – The n clauses of the types (5) and (6) at point 5. are replaced by the following: T { IC(indivC σ.r , costj , idxj ) → Lhσproxyj , Ci | pj ∈ Pσ.r , 0 < pj ≤ n } ⊂ ϕ , T { IC(indivC σ.r , costj , idxj ) → Ahσproxyj , ⊤i | pj ∈ Pσ.r , 0 < pj ≤ n } ⊂ ϕ , costj = pj −p(j−1) , idxj = k1C +p(j−1) , σproxyj = σ.r.k1C +p(j−1) → k1C +pj −1. – Clauses (9), (10) at point 7. 
are modified accordingly. – The clauses (11) defined at point 8. must take into account the use of proxy individuals and of incur costs potentially bigger than 1. Hence they are replaced by: T {(Lhσ.r.(i → j), Ci ∧Ahσ.r.(i → j), ⊤i ) → IC(indivC σ.r , j−i+1, i) | σ.r.(i → j) ∈ Σ }. – Clauses (14) at point 10. are modified in the same way, handling proxy individuals. We make the following observations: – If, for σ, r, the conditions of point 7. of Definition 1 do not hold (e.g. no at-most restriction exists), then an even more efficient partitioning requires only the following two clauses for every hσ, ≥nr.Ci: T C IC(indivC σ.r , n, k1 ) → Lhσ.r.(k1C→k1C+n−1), Ci , ∈ ϕ , C T IC(indivC σ.r , n, k1 ) → Ahσ.r.(k1C→k1C+n−1), ⊤i ∈ ϕ . – Otherwise, if the conditions of point 7. hold, then ϕT contains all the clauses: { IC(indivC σ.r , pj −pj−1 , pj−1 +1) → Lhσ.r.(pj−1+1→ pj ), Ci | pj ∈ Pσ.r , j > 1} ∪ { IC(indivC σ.r , pj −pj−1 , pj−1 +1) → Ahσ.r.(pj−1+1→ pj ), ⊤i | pj ∈ Pσ.r , j > 1} for every hσ, ≥nr.Ci, as consequence of point 5. and of the sharing of (proxy) individuals performed at point 7. An exponential-time algorithm computing the smart partitioning Pσ.r (Definition 2) for every given individual σ and the role r is presented in [17]. Taken as input ≥ ≤ the arrays Nσ.r and Nσ.r , the algorithm is shown to have worst-case complexity ≥ ≤ max |Nσ.r |,|Nσ.r | O(2 ). 5 Empirical Evaluation We have implemented the encoder ALCQ2SMT in C++; smart partitioning (§4) can be enabled optionally (denoted with S.P. hereafter). In combination with ALCQ2SMT, we have used M ATH S AT (v. 3.4.1) [2] that is the SMT-solver including the Theory of Costs [3]. We have evaluated the effectiveness of our novel approach by performing an empirical test session on about 700 synthesized9 and parameterized ALCQ-concept 9 Due to lack of space we refer the reader to Section 6.1 in [6] for a discussion on why real-world ontologies are not yet available as suitable QNR benchmarks. 
satisfiability problems adapted from [6], plus more. In order to compare with the available state-of-the-art reasoners we have executed the following tools on every test case: FACT++ (v. v1.4.0) [16], P ELLET (v. 2.1.1) [15], and R ACER (RacerPro 1-9-0) [7]. All the results presented in this section have been obtained on a 64bit biprocessor dual-core IntelXeon2.66GHz machine, with 16GB of RAM. We set a 1000 seconds timeout for every concept satisfiability query. We also fixed a bound of 1GB of disk space for the SMT(C) encoding output from ALCQ2SMT. When reporting the results for one ALCQ2SMT+M ATH S AT configuration (either including S.P. or not), every CPU time reported is the sum of both the ALCQ2SMT encoding and the M ATH S AT solving time (both including the loading and parsing of the respective input problem). 10 Importantly, with all test problems, all tools under examination (including both the variants of ALCQ2SMT+M ATH S AT) agreed on the expected un/satisfiability results when terminating within the timeout, with the exception of P ELLET which incorrectly returned “sat” on some nested_restr_unsat problems. Test Description. For our empirical evaluation, we have made use of synthesized test cases adapted from those in [6]. The benchmark problems of [6] focus on concept expressions containing only QNRs and define different sets of indexed problems, increasingly stressing on different sources of complexity at the increase of the index i. Since the values occurring in QNRs are one of the sources of complexity which can strongly affect the performance of reasoning for some tools, wrt. the original test cases of [6] we further parameterized such values making them depend on a parameter n. Below we list the sources of complexity of the reasoning in ALCQconsidered in [6] with the relative test set names, the ranges of the indexes i and the values chosen for n in our empirical evaluation: 11 1. 
the size of the values occurring in QNRs (test cases: incr_lin_sat/unsat_i with i = 1–100; incr_exp_sat/unsat_i with i = 1–6; satisfiable/unsatisfiable);
2. the number of QNRs (test cases: restr_num_i(n) with i = 1–100, n = 1, 5, 50; satisfiable);
3. the effect of backtracking (number of disjoint concepts) (test cases: backtracking_i(n) with i = 1–20, n = 1, 2, 3, 10; unsatisfiable);
4. the ratio between the number of at-least restrictions and the number of at-most restrictions (test cases: restr_ratio_i(n) with i = 0–14, n = 1, 5; satisfiable);
5. the satisfiability versus the unsatisfiability of the input concept expression (test cases: sat_unsat_i(n) with i = 1, 2, 4, 6, …, 24, n = 1, 10; half-and-half).

For the sake of fairness of the comparison, we introduced two novel groups of problems which we believe can stress the main limitations of our approach wrt. the competitors. These groups stress two sources of complexity which were not considered in [6]:
6. the variability of the values occurring in QNRs, i.e. a unique value occurs in every restriction (test cases: var_restr_num_i(n) with i = 1–100, n = 100; satisfiable);
7. the number of nested QNRs (test cases: nested_restr_sat/unsat_i(n) with i = 1–20, n = 5, 50; satisfiable/unsatisfiable).

For a much more detailed description and the exact TBoxes we refer the reader to [6,17].

Footnote 10: To make the experiments reproducible, all the plots in full size, the tools, the problems, and the results are available at http://disi.unitn.it/~rseba/cade11/tests.tar.gz.
Footnote 11: The benchmarks of [6] are defined for SHQ, but we have adapted them to ALCQ by flattening all the role hierarchies to the only role r. The value of n originally used in [6] is underlined.

Results. We compare ALCQ2SMT+MathSAT against the other state-of-the-art reasoners Racer, FaCT++ and Pellet. In Figures 2 and 3 we plot, as representative, the results in the most challenging test cases for every benchmark category.
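As a reading aid for the plots, the measurement protocol stated in this section (a 1000-second timeout per query, a 1 GB bound on the encoding, and CPU time reported as encoding time plus solving time) can be sketched as follows. This is only an illustrative sketch: the names `classify_run` and `Outcome`, the status labels, the decimal reading of 1 GB, and the choice to apply the timeout to the summed time are assumptions made here, not details taken from the actual test scripts.

```python
from dataclasses import dataclass

TIMEOUT_S = 1000.0           # per-query timeout stated in the text
MAX_ENCODING_BYTES = 10**9   # 1 GB bound on the encoding (decimal GB is an assumption)


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record for one concept-satisfiability query."""
    status: str        # "solved", "timeout", or "encoding-too-large"
    cpu_time_s: float  # encoding time + solving time, capped at the timeout


def classify_run(encode_s: float, solve_s: float, encoding_bytes: int) -> Outcome:
    """Combine the measurements of one query into a single reported outcome."""
    if encoding_bytes > MAX_ENCODING_BYTES:
        # Assumption: when the size bound is hit, the solver is not run at all.
        return Outcome("encoding-too-large", encode_s)
    total = encode_s + solve_s
    if total > TIMEOUT_S:
        # Assumption: the timeout applies to the sum of both phases.
        return Outcome("timeout", TIMEOUT_S)
    return Outcome("solved", total)
```

Under these assumptions, a query whose encoding takes 0.5 s and whose solving takes 2.0 s is reported at 2.5 s, whereas a query whose encoding exceeds the size bound is never passed to the solver.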
(More plots and all the detailed results can be found in [17].) We notice the following facts about ALCQ2SMT+MathSAT S.P.:
– in all tests, it performs uniformly much better than plain ALCQ2SMT+MathSAT;
– together with Racer, it is the best performer in the incr_lin and incr_exp (sat/unsat) categories (Fig. 2 rows 1,2), solving all problems in negligible time;
– in the nested_restr_sat category it is the worst performer, but in nested_restr_unsat it performs better than FaCT++ and Pellet (Fig. 2 row 3) (footnote 12);
– together with FaCT++, it is the best performer in the restr_num_i(5) and restr_ratio_i(5) categories (Fig. 3 rows 1,2 left), solving all problems in negligible time;
– it is the absolute best scorer in the restr_num_i(50) test set (Fig. 3 row 1 right);
– in the var_restr_num category it performs worse than FaCT++ and Pellet, but better than Racer (footnote 13) (Fig. 3 row 2 right);
– together with Racer, it is the best performer in the sat_unsat category (Fig. 3 row 3 left), solving all problems in negligible time;
– it is the worst performer on the backtracking problems (Fig. 3 row 3 right).

Looking into the data we notice a few more facts. First, the size of the encoded problems of ALCQ2SMT (for both the basic and the S.P. variants) never exceeds the 1GB file-size limit, except for the nested_restr test cases with i ≥ 5 and i ≥ 7, respectively. (In fact, nested QNRs exponentially affect the size of our encoding.) In general, the encoded problems present a very low number of cost variables, which depends on the number of QNRs in the input problem, and a possibly huge number of Boolean variables and clauses, which depends on the number of (proxy) individuals introduced. Second, in the vast majority of the input problems the encoding time required by ALCQ2SMT is negligible (≤ $10^{-2}$ s); with S.P. it is significant only with the nested_restr and var_restr_num benchmarks (still ≤ 4 s for the hardest problems).

Discussion. The performances of ALCQ2SMT+MathSAT S.P. wrt.
other state-of-the-art reasoners range from some cases where it is much less efficient (backtracking, nested_restr_sat and var_restr_num) up to problems in which it significantly outperforms other tools (incr, sat_unsat and restr_num). Notice that we have specifically designed the nested_restr and var_restr_num problems in order to enforce the exponentiality of the encoding and to maximally inhibit smart partitioning, respectively. The backtracking problems, instead, have been designed in [6] to test the capability of performing dependency-directed backtracking [10]. Since the encoding is decoupled from the search, our approach cannot benefit from this optimization. ALCQ2SMT+MathSAT S.P. instead performs extremely well in those benchmarks presenting multiple/balanced sources of complexity. Moreover, the size of the encoding is not the main complexity issue for our approach, which is very effective also on large or really complex problems (e.g., MathSAT scales up to problems with more than $10^{5}$ Boolean variables and clauses, and $10^{4}$ cost variables, in the nested_restr benchmarks). Finally, smart partitioning is extremely effective, being able to drastically (and exponentially) reduce the size of the output SMT(C) problems, up to three orders of magnitude in the extreme restr_num and nested_restr cases wrt. basic ALCQ2SMT (it also has an exponential impact on the number of cost variables in the case of nested QNRs). The empirical evaluation clearly confirms that partitioning makes our approach independent of the magnitude/offset of the values occurring in QNRs.

Footnote 12: Notice that Pellet gave wrong “sat” results on nested_restr_unsat problems.
Footnote 13: Racer’s implementation of the algebraic approach [8] is best-case exponential wrt. the number of QNRs.

Fig. 2: Tools comparison (curves: Racer, FaCT++, Pellet, ALCQ2SMT+MathSAT, ALCQ2SMT+MathSAT S.P.). From left to right: sat, unsat problems; 1st row: incr_lin; 2nd row: incr_exp; 3rd row: nested_restr, n = 5. X axis: test case index; Y axis: CPU time (sec).

Fig. 3: Tools comparison (curves: Racer, FaCT++, Pellet, ALCQ2SMT+MathSAT, ALCQ2SMT+MathSAT S.P.). From left to right: 1st row: restr_num_i(5), restr_num_i(50); 2nd row: restr_ratio_i(5), var_restr_num_i(100); 3rd row: sat_unsat_i(10), backtracking_i(10). X axis: test case index; Y axis: CPU time (sec).

References
1. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. 2003.
2. R. Bruttomesso, A. Cimatti, A. Franzén, A. Griggio, and R. Sebastiani. The MathSAT 4 SMT Solver. In Proc. of CAV’08, LNCS.
Springer, 2008.
3. A. Cimatti, A. Franzén, A. Griggio, R. Sebastiani, and C. Stenico. Satisfiability Modulo the Theory of Costs: Foundations and Applications. In TACAS, LNCS. Springer, 2010.
4. J. Faddoul and V. Haarslev. Algebraic Tableau Reasoning for the Description Logic SHOQ. Journal of Applied Logic, Special Issue on Hybrid Logics, 8(4), 2010.
5. J. Faddoul and V. Haarslev. Optimizing Algebraic Tableau Reasoning for SHOQ: First Experimental Results. In Proc. of DL, Waterloo, Canada, May 4-7, 2010.
6. N. Farsiniamarj and V. Haarslev. Practical reasoning with qualified number restrictions: A hybrid abox calculus for the description logic SHQ. J. AI Communications, 23(2-3), 2010.
7. V. Haarslev and R. Möller. RACER System Description. In Proc. of IJCAR’01, LNAI. Springer, 2001.
8. V. Haarslev, M. Timmann, and R. Möller. Combining Tableaux and Algebraic Methods for Reasoning with Qualified Number Restrictions. In Proc. of DL’2001, 2001.
9. B. Hollunder and F. Baader. Qualifying Number Restrictions in Concept Languages. In Proc. of KR, Boston (USA), 1991.
10. I. Horrocks, U. Sattler, and S. Tobies. Practical reasoning for very expressive description logics. Logic Journal of the IGPL, 8(3), 2000.
11. C. Lutz, C. Areces, I. Horrocks, and U. Sattler. Keys, Nominals, and Concrete Domains. Journal of Artificial Intelligence Research, JAIR, 23(1), 2005.
12. R. Sebastiani. Lazy Satisfiability Modulo Theories. Journal on Satisfiability, Boolean Modeling and Computation, JSAT, 3, p. 141–224, 2007.
13. R. Sebastiani and M. Vescovi. Automated Reasoning in Modal and Description Logics via SAT Encoding: the Case Study of K(m)/ALC-satisfiability. JAIR, 35(1), 2009.
14. R. Sebastiani and M. Vescovi. Axiom Pinpointing in Lightweight Description Logics via Horn-SAT Encoding and Conflict Analysis. In Proc. of CADE-22, LNCS. Springer, 2009.
15. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2), 2007.
16.
D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In Proc. of IJCAR’06, LNAI. Springer, 2006.
17. M. Vescovi, R. Sebastiani, and V. Haarslev. Automated Reasoning on TBoxes with Qualified Number Restrictions via SMT. Tech. Rep. DISI-11-001, Università di Trento, April 2011. Available at http://disi.unitn.it/~rseba/cade11/techrep.pdf.