Automated Reasoning in ALCQ via SMT
Volker Haarslev∗, Roberto Sebastiani+, and Michele Vescovi+
∗ CSE, Concordia University, Montreal, haarslev@cse.concordia.ca
+ DISI, Università di Trento, {rseba, vescovi}@disi.unitn.it
Abstract. Reasoning techniques for qualified number restrictions (QNRs) in Description Logics (DLs) have been investigated in the past but they mostly do not
make use of the arithmetic knowledge implied by QNRs. In this paper we propose and investigate a novel approach for concept satisfiability in acyclic ALCQ
ontologies. It is based on the idea of encoding an ALCQ ontology into a formula in Satisfiability Modulo the Theory of Costs (SMT(C)), which is a specific
and computationally much cheaper subcase of Linear Arithmetic under the Integers, and to exploit the power of modern SMT solvers to compute every concept-satisfiability query on a given ontology. We implemented and tested our approach,
which includes a very effective individuals-partitioning technique, on a wide set
of synthesized benchmark formulas, comparing the approach with the main state-of-the-art DL reasoners available. Our empirical evaluation confirms the potential
of the approach.
1 Introduction
Description logics (DLs) form one of the major foundations of the semantic web and
its web ontology language (OWL). In fact, OWL 2, a recent W3C recommendation, is a
syntactic variant of a very expressive DL that supports reasoning with so-called qualified number restrictions (QNRs). A sound and complete calculus for reasoning with the
DL ALCQ that adds QNRs to the basic DL ALC was first proposed in [9]. For example,
this calculus decides the satisfiability of an ALCQ concept (≥5 s.C ⊓ ≥5 s.D ⊓ ≤2 s.E)
by trying to find a model with fillers for the role s such that at least 5 fillers are instances
of C, at least 5 fillers are instances of D, and at most 2 fillers are instances of E. It satisfies the at-least restrictions by creating 10 fillers for s, 5 of which are instances of
C and 5 are instances of D. A concept choose rule non-deterministically assigns E or
¬E to these fillers. In case the at-most restriction (≤2 s.E) is violated a merge rule
non-deterministically merges pairs of fillers for s that are instances of E [9]. Searching
for a model in such an arithmetically uninformed way can become very inefficient especially when bigger numbers occur in QNRs or several QNRs interact. To the best of
our knowledge this calculus still serves as reference in most tableau-based OWL reasoners (e.g., Pellet [15], FaCT++ [16]) for implementing reasoning about QNRs. The
only exception is Racer [7] where conceptual QNR reasoning is based on an algebraic
approach [8] that integrates integer linear programming with DL tableau methods.
The work presented in this paper was inspired by two recent novel approaches,
combined with the progress in satisfiability modulo theory (SMT) solving techniques.
⊥^I = ∅, ⊤^I = ∆^I, (¬C)^I = ∆^I \ C^I, (C ⊓ D)^I = C^I ∩ D^I, (C ⊔ D)^I = C^I ∪ D^I,
(∃r.C)^I = {x ∈ ∆^I | there exists y ∈ ∆^I s.t. (x, y) ∈ r^I and y ∈ C^I},
(∀r.C)^I = {x ∈ ∆^I | for all y ∈ ∆^I s.t. (x, y) ∈ r^I then y ∈ C^I},
(≥n r.C)^I = {x ∈ ∆^I | |FIL(r, x) ∩ C^I| ≥ n},
(≤m r.C)^I = {x ∈ ∆^I | |FIL(r, x) ∩ C^I| ≤ m},
C ⊑ D is satisfied iff C^I ⊆ D^I.
Fig. 1: Syntax and semantics of ALCQ (n ≥ 1 and m ≥ 0).
First, [13,14] explored the idea of performing automated reasoning tasks in DLs by
encoding problems into Boolean formulas and by exploiting the power of modern SAT
techniques. In particular, the experiments in [13] showed that, in practice and despite
the theoretical worst-case complexity limits, this approach could handle most or all
the ALC satisfiability problems which the other approaches could also handle, with
performance comparable with, and often better than, that of state-of-the-art tools. Second, a revised and extended algebraic approach was presented for SHQ
[6] and SHOQ [4]. These approaches represent knowledge about interacting QNRs as
systems of linear inequations where numerical variables represent cardinalities of sets
of domain elements (e.g., role fillers) divided into mutually disjoint decompositions. On
a set of synthetic QNR benchmarks these algebraic approaches demonstrated a superior
performance for most test cases [6,5].
The main idea of this paper is thus to encode an ALCQ ontology into a formula in
Satisfiability Modulo the Theory of Costs (SMT(C)) [3], which is a specific and computationally much cheaper subcase of Linear Arithmetic under the Integers (LA(Z)), and
to exploit the power of modern SMT solvers to compute every concept-satisfiability
query on a given ontology. We have implemented and tested our approach (called
ALCQ2SMT_C), which includes a very effective individuals-partitioning technique, on a
wide set of synthesized benchmark formulas and compared it with the main state-of-the-art
OWL reasoners. Our empirical evaluation demonstrates the potential of our approach
and, compared with the tested OWL reasoners, shows significantly better performance in the case of benchmarks having multiple/balanced sources of complexity.
2 Background
2.1 The Description Logic ALCQ
The logic ALCQ extends the well-known logic ALC by adding qualified number restrictions (QNRs). In more detail, the concept descriptions in ALCQ (namely Ĉ, D̂, . . .)
are inductively defined through the constructors listed in Figure 1, starting from the
non-empty and pairwise disjoint sets of concept names N_C (denoted by the letters
A, B, C, . . .) and role names N_R (denoted by the letters r, s, . . .). ALCQ allows for negations, conjunctions/disjunctions, existential/universal restrictions and, indeed, QNRs.
An ALCQ TBox (or ontology) is a finite set of general concept inclusion (GCI) axioms
as defined in Figure 1.
Given a TBox T, we denote with BC_T the set of the basic concepts for T, i.e. the
smallest set of concepts containing: (i) the top and the bottom concepts ⊤ and ⊥; (ii) all
the concepts of T in the form C and ¬C where C is a concept name in N_C. We denote
the basic concepts in BC_T with the letters C, D, . . . (thus, C may represent a concept
¬C′ with C′ ∈ BC_T), whilst we use Ĉ, D̂, . . . for complex concepts, i.e. Ĉ, D̂ ∉ BC_T.
Our approach is currently restricted to acyclic (or unfoldable) TBoxes. We call a TBox
T acyclic if there exist no cyclic dependencies between its concept names, i.e., named
concepts are defined neither directly nor indirectly in terms of themselves through the
axioms in T.
Semantics. The semantics of ALCQ is defined in terms of interpretations. An interpretation I is a pair I = (∆^I, ·^I), where ∆^I is the domain (i.e. a non-empty set
of individuals), and ·^I is the interpretation function which maps each concept name
(atomic concept) A ∈ N_C to a set A^I ⊆ ∆^I and maps each role name (atomic role)
r to a binary relation r^I ⊆ ∆^I × ∆^I. In Figure 1 the inductive extensions of ·^I to
arbitrary concept descriptions are defined, where n ≥ 1 and m ≥ 0 are integer values
and FIL(r, x) is the set of the r-fillers of the individual x ∈ ∆^I for the role r ∈ N_R,
defined as FIL(r, x) = {y ∈ ∆^I | (x, y) ∈ r^I}. An interpretation I is a model
of a given TBox T if and only if the conditions given in Figure 1 are respected for every
axiom in T; when this is the case, the TBox T is said to be consistent. A concept Ĉ is
said to be satisfiable wrt. T if and only if there exists a model I of T with Ĉ^I ≠ ∅, i.e.
there exists an individual x ∈ ∆^I that is an instance of Ĉ, i.e. such that x ∈ Ĉ^I.
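To make the semantics of Figure 1 concrete, the following Python sketch evaluates concept descriptions over a small finite interpretation. The tuple-based concept encoding and the names `fillers` and `extension` are assumptions introduced only for this illustration; they are not notation or code from the paper.

```python
# Hypothetical tuple encoding of ALCQ concepts (an assumption of this sketch):
#   ("top",), ("bot",), ("name", "A"), ("not", C), ("and", C, D), ("or", C, D),
#   ("some", r, C), ("all", r, C), ("atleast", n, r, C), ("atmost", m, r, C)

def fillers(roles, r, x):
    """FIL(r, x): all y such that (x, y) belongs to the interpretation of r."""
    return {y for (a, y) in roles.get(r, set()) if a == x}


def extension(concept, domain, names, roles):
    """Set of individuals of `domain` that are instances of `concept`."""
    def ext(c):
        return extension(c, domain, names, roles)

    op = concept[0]
    if op == "top":
        return set(domain)
    if op == "bot":
        return set()
    if op == "name":
        return set(names.get(concept[1], set()))
    if op == "not":
        return set(domain) - ext(concept[1])
    if op == "and":
        return ext(concept[1]) & ext(concept[2])
    if op == "or":
        return ext(concept[1]) | ext(concept[2])
    if op == "some":
        c = ext(concept[2])
        return {x for x in domain if fillers(roles, concept[1], x) & c}
    if op == "all":
        c = ext(concept[2])
        return {x for x in domain if fillers(roles, concept[1], x) <= c}
    if op in ("atleast", "atmost"):
        bound, r, c = concept[1], concept[2], ext(concept[3])
        if op == "atleast":
            return {x for x in domain if len(fillers(roles, r, x) & c) >= bound}
        return {x for x in domain if len(fillers(roles, r, x) & c) <= bound}
    raise ValueError(f"unknown constructor: {op}")


# A four-element interpretation: individual 0 has three s-fillers.
domain = {0, 1, 2, 3}
names = {"C": {1, 2}, "E": {3}}
roles = {"s": {(0, 1), (0, 2), (0, 3)}}

query = ("and", ("atleast", 2, "s", ("name", "C")), ("atmost", 1, "s", ("name", "E")))
result = extension(query, domain, names, roles)  # only individual 0 qualifies
```

Since `result` is non-empty, this interpretation witnesses the satisfiability of the query concept (wrt. the empty TBox). A reasoner has to search for such an interpretation, whereas the sketch only checks a given one.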
Normal Form. We assume wlog. that all ALCQ concept descriptions are in negative
normal form (NNF), i.e. negation signs only occur in front of concept names (see [17]
for details). Then, for the sake of an easier exposition, we restrict our attention to those
ALCQ TBoxes in which all axioms are in the following normal form:
C ⊑ D,    ⊓_i C_i ⊑ D,    C ⊑ ⊔_i D_i,    ℜr.C ⊑ D,    C ⊑ ℜr.D        (1)
with ℜ ∈ {∀, ≥n, ≤m} s.t. n, m ≥ 1, and C, C_i, D, D_i ∈ BC_T. 1 Every given TBox
T can be turned into a normalized TBox T′ (where all concept descriptions in T′ are
in NNF) that is a conservative extension of T by introducing new concept names. The
transformation of a TBox T into T′ can be done in linear time, and the size of T′ is
linear wrt. the size of T. We call every non-conjunctive and non-disjunctive concept
description occurring in the concept inclusions of T′ a normal concept of the normalized
TBox T′; we call NC_{T′} the set of all the normal concepts of T′. For more details we
refer the reader to [17].
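The NNF assumption can be illustrated with a short Python sketch that pushes negations inward using the standard dualities, including those between at-least and at-most restrictions. The tuple encoding of concepts and the function name `nnf` are assumptions made for this example; the normalization into form (1) described in [17] is not reproduced here.

```python
# Hypothetical tuple encoding of concepts (an assumption of this sketch):
#   ("top",), ("bot",), ("name", "A"), ("not", C), ("and", C, D), ("or", C, D),
#   ("some", r, C), ("all", r, C), ("atleast", n, r, C), ("atmost", m, r, C)

def nnf(c):
    """Push negations inward until they only precede concept names."""
    op = c[0]
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("some", "all"):
        return (op, c[1], nnf(c[2]))
    if op in ("atleast", "atmost"):
        return (op, c[1], c[2], nnf(c[3]))
    if op != "not":
        return c  # top, bot, or a concept name
    d = c[1]
    dop = d[0]
    if dop == "name":
        return c  # negated concept name: already in NNF
    if dop == "top":
        return ("bot",)
    if dop == "bot":
        return ("top",)
    if dop == "not":
        return nnf(d[1])  # double negation
    if dop == "and":
        return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
    if dop == "or":
        return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
    if dop == "some":
        return ("all", d[1], nnf(("not", d[2])))
    if dop == "all":
        return ("some", d[1], nnf(("not", d[2])))
    if dop == "atleast":  # not(>= n r.C) is equivalent to <= (n-1) r.C, assuming n >= 1
        return ("atmost", d[1] - 1, d[2], nnf(d[3]))
    if dop == "atmost":   # not(<= m r.C) is equivalent to >= (m+1) r.C
        return ("atleast", d[1] + 1, d[2], nnf(d[3]))
    raise ValueError(f"unknown constructor: {dop}")


# not(A and >=3 r.(not not B))  becomes  (not A) or <=2 r.B
example = nnf(("not", ("and", ("name", "A"),
                       ("atleast", 3, "r", ("not", ("not", ("name", "B")))))))
```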
2.2 Satisfiability Modulo Theory with Cost Functions
Satisfiability Modulo (the) Theory T, SMT(T), is the problem of deciding the satisfiability of a (typically) ground formula under a background theory T. Most state-of-the-art
SMT solvers are based on the lazy SMT schema: in a nutshell, a SAT solver is used to
search for a truth assignment µ to the atomic subformulas of the input ground formula
ϕ, s.t. µ tautologically entails ϕ and µ is found consistent in T by the T -solver. (We
refer the reader to, e.g., [12] for details and further references.)
1 In particular, we avoid redundant existential and at-most restrictions, which are replaced by their
equivalents: ∃r.C =⇒ ≥1r.C and ≤0r.C =⇒ ∀r.nnf(¬C).
The work in [3] addresses the problem of the satisfiability in some theory T of a
formula ϕ augmented with a set of cost functions {cost1 , ..., costN } s.t., for every i:
cost_i = Σ_{j=1}^{N_i} if-then-else(A_ij, c_ij, 0),    lb_i < cost_i ≤ ub_i,        (2)
A_ij being Boolean atoms occurring in ϕ, and N_i, lb_i, ub_i, c_ij being integer values ≥ 0.
(Intuitively, in (2) cost_i = Σ_j A_ij · c_ij s.t. A_ij ∈ {0, 1}.) The problem can be encoded
into SMT(T ∪ LA(Z)). However, [3] remarked on the inefficiency of such a solution, which
does not fully exploit the fact that the values of cost_i derive deterministically from the
truth values of all the A_ij's. They proposed instead a specific theory of costs C, which is
much simpler and computationally much cheaper than LA(Z), and developed a specific
very fast T-solver for C. In a nutshell, C consists of: (i) a collection of integer variables
cost_1, . . . , cost_N, that we call cost variables, denoting the output of the cost functions
in (2); (ii) an interpreted predicate BC “bound cost” s.t. BC(cost_i, c) is true iff cost_i is
upper-bounded by the integer value c (i.e., iff cost_i ≤ c); (iii) an interpreted predicate
IC “incur cost” s.t. IC(cost_i, c_ij, j) is true if the j-th element of sum (2) is c_ij, false if
it is 0. Thus, ϕ is satisfiable in T under the cost constraints (2) iff the formula
ϕ ∧ ⋀_{i=1}^{N} ( BC(cost_i, ub_i) ∧ ¬BC(cost_i, lb_i) ∧ ⋀_{j=1}^{N_i} (A_ij ↔ IC(cost_i, c_ij, j)) )        (3)
is satisfiable in T ∪ C. A specific T-solver for C works simply by adding the value c_ij
[resp. 0] to the current minimum value of cost_i and 0 [resp. c_ij] to its current maximum
when IC(cost_i, c_ij, j) (i.e. A_ij) is assigned to true [resp. false], and by checking whether such
minimum [resp. maximum] value of cost_i is smaller than or equal to ub_i [resp. greater than or
equal to lb_i]. We refer the reader to [3] for details and further references.
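The bound-tracking idea behind such a solver can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the solver of [3]: the class name `CostVariable` and its methods are hypothetical, the maximum is computed as the sum of all incur costs not yet assigned to false (one possible bookkeeping), and the lower bound is checked strictly, following lb_i < cost_i in (2).

```python
class CostVariable:
    """Reachable range of one cost variable under a partial truth assignment.

    Hypothetical helper for illustration; `costs` maps each index j to c_ij.
    """

    def __init__(self, costs, lb, ub):
        self.costs = dict(costs)
        self.lb = lb
        self.ub = ub
        self.assigned = {}  # index j -> truth value given to IC(cost, c_ij, j)

    def assign(self, j, value):
        self.assigned[j] = bool(value)

    def min_cost(self):
        # Only costs whose IC-literal is already true are certainly incurred.
        return sum(c for j, c in self.costs.items() if self.assigned.get(j) is True)

    def max_cost(self):
        # Every cost not yet ruled out (true or unassigned) may still be incurred.
        return sum(c for j, c in self.costs.items() if self.assigned.get(j) is not False)

    def consistent(self):
        # Some value with lb < cost <= ub must still be reachable.
        return self.min_cost() <= self.ub and self.max_cost() > self.lb


# Three unit costs with 1 < cost <= 2, i.e. exactly two of them must be incurred.
cv = CostVariable({1: 1, 2: 1, 3: 1}, lb=1, ub=2)
cv.assign(1, True)
cv.assign(2, True)
ok_before = cv.consistent()   # minimum 2, maximum 3: still within bounds
cv.assign(3, True)
ok_after = cv.consistent()    # minimum 3 exceeds the upper bound 2
```

In a lazy SMT loop, a failed `consistent()` check would correspond to a theory conflict reported back to the SAT solver.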
3 Concept Satisfiability via SMT with Costs
3.1 Encoding ALCQ into SMT(C)
The encoding we propose simulates the construction of an interpretation I by introducing new individuals, assigning individuals to the interpretations of concepts in T,
and counting their occurrences in the interpretations. We uniquely represent individuals
in ∆^I by means of labels σ, represented as non-empty sequences of positive integer
values and role names in N_R. A label σ can be either the label 1 or in the form σ′.r.n,
with σ′ another label, r ∈ N_R and n ≥ 1. With a small abuse of notation, hereafter we
may say “the individual σ” meaning “the individual labeled by σ”. Moreover, we call
instantiated concept a pair ⟨σ, C⟩, s.t. σ ∈ ∆^I and C is an ALCQ normal concept of T,
representing the fact that σ is an instance of C in the interpretation I, i.e. σ ∈ C^I.
We define A_⟨,⟩ as an injective function which maps an instantiated concept ⟨σ, C⟩
s.t. C is not in the form ¬C′ into a Boolean variable A_⟨σ, C⟩ that we call concept variable. The so-called concept literal L_⟨σ, C⟩ denotes ¬A_⟨σ, C′⟩ if C is in the form ¬C′,
A_⟨σ, C⟩ otherwise. The truth value of L_⟨σ, C⟩ states whether the instantiation relation
between σ and C [resp. ¬C] holds, i.e. if ⟨σ, C⟩ [resp. ⟨σ, ¬C⟩] is an existing instantiated concept in I. We conventionally assume that A_⟨σ, ⊥⟩ is ⊥. Notice also that ⟨σ, ⊤⟩
means σ ∈ ∆^I, i.e. that if A_⟨σ, ⊤⟩ is assigned to true then σ exists in ∆^I. We informally say that σ (meaning ⟨σ, ⊤⟩) or ⟨σ, C⟩ is “enabled” when the respective literal is
assigned to true.
We define indiv as a function which maps an instantiated concept ⟨σ, ℜr.C⟩, such that
ℜ ∈ {≥n, ≤m} and C is a basic concept (since we are considering concepts in normal
form), into a cost variable indiv^C_{σ.r} in the Theory of Costs, which we call individuals
cost variable. Notice that indiv is not injective, since the same cost variable indiv^C_{σ.r}
is “shared” among all the instantiated concepts which refer both to the same σ and to
QNRs involving the same r and C. However, notice also that ⟨σ, ℜr.C⟩ and ⟨σ, ℜr.¬C⟩
are mapped to different cost variables. The final value of the individuals cost variable
indiv^C_{σ.r} represents the number of individuals which are in relation with the individual σ
via the role r and are in the interpretation of C; in other words, the final value of indiv^C_{σ.r}
is exactly the cardinality of FIL(r, σ) ∩ C^I.
Our encoding works by means of the following principles:
– GCIs are represented via Boolean implications between instantiated concepts.
– Every at-least restriction ⟨σ, ≥nr.C⟩ is handled by introducing exactly n individuals σ.r.i associated to C. The existence of individuals is forced by binding each
of them to an incur cost of value 1 for indiv^C_{σ.r}, and then fixing a lower bound for
indiv^C_{σ.r}.
– When both at-least and at-most restrictions coexist wrt. σ, the encoding allows for
sharing individuals separately introduced by distinct at-least restrictions. At-most
restrictions are handled by fixing upper bounds for the respective cost variables.
– It mimics the construction of a labeled tableau, with the difference that the sharing
of individuals described above generalizes the merging of pairs of fillers to
satisfy at-most QNRs.
Definition 1 (ALCQ2SMT_C(T) encoding). Let T be an acyclic ALCQ TBox in normal form. Wlog., we represent every axiom Ĉ ⊑ D̂ of T as ⊓_i Ĉ_i ⊑ ⊔_j D̂_j where
i, j ≥ 1 and i = 1 (resp. j = 1), with Ĉ_1 (resp. D̂_1) a normal concept, for every
normal form (1) except for the second (resp. the third) one. The SMT(C) encoding
ALCQ2SMT_C(T) for T is defined as the sextuple ⟨Σ^T, I^T_−, I^T_+, A_⟨,⟩, indiv, ϕ^T⟩,
where:
– Σ^T is the set of all the possible individuals introduced;
– I^T_−, I^T_+ represent respectively the set of the implicant (i.e. left-side) and implied
(i.e. right-side) instantiated concepts that must be encoded according to their
side;
– A_⟨,⟩ and indiv are the functions defined above;
– ϕ^T is a CNF formula on propositional- and C-literals encoding T into SMT(C).
We represent ϕ^T as the set of its clauses. 2
The sets Σ^T, I^T_−, I^T_+ and ϕ^T are incrementally defined as the minimum sets s.t.:
1. Initialization. 1 ∈ Σ^T, ⟨1, ⊤⟩ ∈ I^T_−, ⟨1, ⊤⟩ ∈ I^T_+ and (A_⟨1, ⊤⟩) ∈ ϕ^T.
2. Axioms initialization. If Ĉ ⊑ D̂ ∈ T, then { ⟨1, C_i⟩ | Ĉ = ⊓_i C_i } ⊆ I^T_−.
2 For better readability we often represent the clauses of ϕ^T as implications.
3. Axioms expansion. If σ ∈ Σ^T, ⊓_i C_i ⊑ ⊔_j D_j ∈ T, { ⟨σ, C_i⟩ | Ĉ = ⊓_i C_i } ⊆ I^T_− ∪ I^T_+, then
{ ⟨σ, D_j⟩ | D̂ = ⊔_j D_j } ⊆ I^T_+,
( ⋀_i L_⟨σ, C_i⟩ ) → ( ⋁_j L_⟨σ, D_j⟩ ) ∈ ϕ^T.        (4)
4. Handle left-side QNRs. If σ ∈ Σ^T, ⟨σ, ℜ′r.C′⟩ ∈ I^T_+ with ℜ′ ∈ {≥n′, ≤m′, ∀},
then
{ ⟨σ, ℜr.C⟩ | ℜr.C ⊑ D̂ ∈ T } ⊆ I^T_−, ℜ ∈ {≥n, ≤m, ∀}.
5. At-least restrictions: introduce individuals. If σ ∈ Σ^T, ⟨σ, ≥nr.C⟩ ∈ I^T_+, then
{ σ.r.k_i^C | i = 1, . . . , n } ⊂ Σ^T,
{ ⟨σ.r.k_i^C, C⟩ | i = 1, . . . , n } ∪ { ⟨σ.r.k_i^C, ⊤⟩ | i = 1, . . . , n } ⊂ I^T_−,
{ IC(indiv^C_{σ.r}, 1, k_i^C) → L_⟨σ.r.k_i^C, C⟩ | i = 1, . . . , n } ⊂ ϕ^T,        (5)
{ IC(indiv^C_{σ.r}, 1, k_i^C) → A_⟨σ.r.k_i^C, ⊤⟩ | i = 1, . . . , n } ⊂ ϕ^T,        (6)
where k_1^C ≥ 1, k_{i+1}^C = k_i^C + 1 and k_i^C ≠ k_j^D for every ⟨σ, ≥n′r.D⟩ ∈ I^T_+ with
C ≠ D and i = 1, ..., n, j = 1, ..., n′. We assume consecutive values for all the
σ.r.j. 3
6. At-least restrictions: fix lower bounds. If σ ∈ Σ^T, ⟨σ, ≥nr.C⟩ ∈ I^T_+, then
((A_⟨σ, ≥nr.C⟩ ∧ A_⟨σ, ⊤⟩) → ¬BC(indiv^C_{σ.r}, n − 1)) ∈ ϕ^T,        (7)
if σ ∈ Σ^T, ⟨σ, ≥nr.C⟩ ∈ I^T_−, then
((¬BC(indiv^C_{σ.r}, n − 1) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ≥nr.C⟩) ∈ ϕ^T.        (8)
7. Coexisting at-least/at-most: sharing individuals. If σ ∈ Σ^T, ⟨σ, ≤mr.E⟩ ∈ I^T_+,
⟨σ, ≥nr.C⟩ ∈ I^T_+, ⟨σ, ≥n′r.D⟩ ∈ I^T_+, with C ≠ D, then
{ ⟨σ.r.k_i^C, D⟩ | i = 1, . . . , n } ∪ { ⟨σ.r.k_i^D, C⟩ | i = 1, . . . , n′ } ⊂ I^T_−,
{ IC(indiv^D_{σ.r}, 1, k_i^C) → L_⟨σ.r.k_i^C, D⟩ | i = 1, . . . , n } ∪
{ IC(indiv^C_{σ.r}, 1, k_i^D) → L_⟨σ.r.k_i^D, C⟩ | i = 1, . . . , n′ } ⊂ ϕ^T,        (9)
{ IC(indiv^D_{σ.r}, 1, k_i^C) → A_⟨σ.r.k_i^C, ⊤⟩ | i = 1, . . . , n } ∪
{ IC(indiv^C_{σ.r}, 1, k_i^D) → A_⟨σ.r.k_i^D, ⊤⟩ | i = 1, . . . , n′ } ⊂ ϕ^T.        (10)
8. At-most restrictions: count individuals. If σ ∈ Σ^T, ⟨σ, ≤mr.C⟩ ∈ I^T_+, then
{ ⟨σ.r.j, C⟩ | σ.r.j ∈ Σ^T } ⊂ I^T_−,
{ (L_⟨σ.r.j, C⟩ ∧ A_⟨σ.r.j, ⊤⟩) → IC(indiv^C_{σ.r}, 1, j) | σ.r.j ∈ Σ^T } ⊂ ϕ^T.        (11)
3 Hence, either k_1^C = 1 or k_1^C = k_{n′}^D + 1 for some ⟨σ, ≥n′r.D⟩ ∈ I^T_+.
9. At-most restrictions: fix upper bounds. If σ ∈ Σ^T, ⟨σ, ≤mr.C⟩ ∈ I^T_+, then
((A_⟨σ, ≤mr.C⟩ ∧ A_⟨σ, ⊤⟩) → BC(indiv^C_{σ.r}, m)) ∈ ϕ^T,        (12)
if σ ∈ Σ^T, ⟨σ, ≤mr.C⟩ ∈ I^T_−, then
((BC(indiv^C_{σ.r}, m) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ≤mr.C⟩) ∈ ϕ^T.        (13)
10. Universal restrictions. If σ ∈ Σ^T, ⟨σ, ∀r.C⟩ ∈ I^T_+, then
{ ⟨σ.r.j, C⟩ | σ.r.j ∈ Σ^T } ⊂ I^T_−,
{ ((A_⟨σ, ∀r.C⟩ ∧ A_⟨σ.r.j, ⊤⟩) → L_⟨σ.r.j, C⟩) | σ.r.j ∈ Σ^T } ⊂ ϕ^T,        (14)
if σ ∈ Σ^T, ⟨σ, ∀r.C⟩ ∈ I^T_−, then
((BC(indiv^{¬C}_{σ.r}, 0) ∧ A_⟨σ, ⊤⟩) → A_⟨σ, ∀r.C⟩) ∈ ϕ^T.        (15)
Importantly, for the purposes of the encoding, left-side at-most (and universal) restrictions behave as right-side at-least restrictions, and vice versa. Thus, for instance, the
instantiated concept ⟨σ, ≤n−1 r.C⟩ ∈ I^T_− [resp. ⟨σ, ∀r.¬C⟩ ∈ I^T_−, with n =def 1] must
be handled by the encoding as if it were the instantiated concept ⟨σ, ≥nr.C⟩ ∈ I^T_+.
In order to simplify the exposition, in Definition 1 and afterwards, we generically refer to at-least/at-most restrictions (respectively to the instantiated concepts ⟨σ, ≥nr.C⟩/
⟨σ, ≤mr.C⟩) meaning the right-side ones, but implicitly including left-side at-most (or
universal)/at-least restrictions, respectively. The interested reader can find in [17] the
complete ALCQ2SMT_C encoding and some encoding examples.
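As a rough illustration of points 5, 6, 8 and 9 of Definition 1, the following Python sketch prints the clauses generated for one right-side at-least and one right-side at-most restriction on the same σ, r and C. It is not the ALCQ2SMT implementation: the function names, the textual clause syntax (`A[...]`, `L[...]`, `indiv[σ.r][C]`) and the explicit list of filler indexes are assumptions of this example.

```python
def at_least_clauses(sigma, n, r, c, k1=1):
    """Clauses (5), (6) and (7) for a right-side restriction (sigma, >= n r.C)."""
    iv = f"indiv[{sigma}.{r}][{c}]"
    clauses = []
    for k in range(k1, k1 + n):  # consecutive indexes k1, ..., k1 + n - 1
        filler = f"{sigma}.{r}.{k}"
        clauses.append(f"IC({iv}, 1, {k}) -> L[{filler}, {c}]")  # (5)
        clauses.append(f"IC({iv}, 1, {k}) -> A[{filler}, T]")    # (6)
    clauses.append(
        f"(A[{sigma}, atleast {n} {r}.{c}] & A[{sigma}, T]) -> not BC({iv}, {n - 1})"
    )  # (7)
    return clauses


def at_most_clauses(sigma, m, r, c, filler_indexes):
    """Clauses (11) and (12) for a right-side restriction (sigma, <= m r.C)."""
    iv = f"indiv[{sigma}.{r}][{c}]"
    clauses = []
    for j in filler_indexes:  # every sigma.r.j already introduced
        filler = f"{sigma}.{r}.{j}"
        clauses.append(f"(L[{filler}, {c}] & A[{filler}, T]) -> IC({iv}, 1, {j})")  # (11)
    clauses.append(
        f"(A[{sigma}, atmost {m} {r}.{c}] & A[{sigma}, T]) -> BC({iv}, {m})"
    )  # (12)
    return clauses


lower = at_least_clauses("1", 2, "s", "C")
upper = at_most_clauses("1", 1, "s", "C", [1, 2])
```

With these inputs, enabling both restrictions on individual 1 requires `not BC(indiv[1.s][C], 1)` and `BC(indiv[1.s][C], 1)` at the same time, so the clash between ≥2 s.C and ≤1 s.C surfaces directly on the bounds of the shared cost variable.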
The following facts concerning ALCQ2SMT_C hold. (We refer the reader to [17]
for the formal proofs.)
Theorem 1. An ALCQ acyclic TBox T in normal form is consistent if and only if the
SMT(C)-formula ϕ^T of ALCQ2SMT_C(T) (Definition 1) is satisfiable.
Theorem 2. Given an ALCQ acyclic TBox T in normal form and the encoding
ALCQ2SMT_C(T) = ⟨Σ^T, I^T_−, I^T_+, A_⟨,⟩, indiv, ϕ^T⟩ of Definition 1, every C ∈ BC_T
is satisfiable wrt. T iff ϕ^T ∧ L_⟨1, C⟩ is satisfiable.
We remark on some facts about the encoding of Definition 1:
– Point 4. is necessary to force the encoding of axioms having on the left-hand side
restrictions wrt. the role r, when other restrictions wrt. r are involved. Such kind of
axioms can create cycles in TBoxes (we remark that our encoding ensures termination for acyclic TBoxes).
– In all the clauses of type (5), (6), (9), (10) and (5), (11), every IC-literal has cost
value 1 and the same index of the bound individual. This ensures that IC-literals
referring to distinct individuals/cost variables are represented by distinct atoms.
– Due to the theory C, clauses (7) and (12) are those concretely ensuring the numerical satisfiability of both at-least and at-most restrictions. In order to be satisfied: (i)
a clause of type (7) forces some IC-literals to be assigned to true (thus (5), (9) work
in only one direction); (ii) a clause of type (12), instead, bounds the number of IC-literals that can be enabled (motivating the opposite direction of (11)). Clauses (8)
and (13) instead enforce the application of an axiom having a left-side QNR if it
is numerically satisfied.
Notice that if, for the same σ, r and C, more than one restriction satisfies the conditions
of point 5. with different values of n (being n∗ the highest of these values), then only
exactly n∗ new individuals and n∗ clauses (5) and (6) are in ϕT . In contrast, one distinct
clause (7) is in ϕT for every different value of n (the same holds for the clauses (12) in
case of different values of m wrt. the same σ, r and C).
ALCQ with general TBoxes has the finite tree model property [11], thus every
satisfiable ALCQ concept is satisfiable in a finite interpretation (in this case of worst-case exponential size) which has the shape of a tree. Intuitively the individuals in Σ^T
form a super-tree of all such models. Let N represent the sum of the values occurring
in the QNRs of T : a very coarse upper bound to the cardinality of Σ T is Θ(|T |N ), in
fact the number of nested restrictions is bounded by the number of axioms of T while
N bounds the number of branches in the tree for every nesting level. The size of ϕT
is, instead, bounded by Θ(|T |2N ) because for every individual and every concept of
T a fixed number of clauses can be introduced. In [17] we define a terminating queue-based algorithm building ALCQ2SMT_C by means of expansion rules which mimic
Definition 1. Since we are restricted to acyclic TBoxes, it is ensured that our encoding
algorithm terminates even without introducing blocking techniques [1]; in particular,
the proposed algorithm is polynomial in the size of the SMT(C) formula produced.
4 Partitioning Individuals
One potential drawback of the basic ALCQ2SMT_C is the high number of individuals
introduced, which is linear wrt. the values occurring in the at-least restrictions. This number can increase exponentially when nested restrictions must be encoded, significantly
affecting the size and the hardness of the resulting SMT(C) formula. However,
similarly to the hybrid approach of [6,4], we can cope with this problem by encoding
groups of individuals having identical properties (instead of using single ones) and by
using only one “proxy” individual as representative of the group. We aim at partitioning
the individuals introduced in Definition 1 on the basis of the following considerations: 4
– Individuals are naturally pre-partitioned in groups wrt. r and the predecessor σ.
– If, given σ, r, no at-most restriction exists, all the fillers σ.r.k_i^C referring to one
at-least restriction can be represented by one single proxy individual.
– Otherwise, the Σ_j n_j distinct individuals introduced by some ⟨σ, ≥n_j r.C_j⟩ can
still be partitioned, but the partitioning must allow for representing possible intersections between the C_j^I.
4 Notice that here we present a different partitioning that avoids the a-priori exponential number
of partitions in [6,4] (wrt. the number of coexisting QNRs). In our case, we consider the whole
set of individuals necessary to trivially satisfy all the coexisting at-least restrictions; then, only
on the basis of the numbers involved in QNRs, we compute a partitioning of such a set, where
the target of our approach is to decide which partitions of individuals belong to a concept
interpretation.
In the latter case not all possible cardinalities of the intersections must be considered.
Instead, it is sufficient to distinguish between the empty intersection and some “limit”
cases depending on the values occurring in the QNRs. To sum up, given σ, r, we can
compute a partitioning of the individuals referring to σ and r by taking into account the
values of the restrictions which concern σ and r.
Example 1. Suppose that it is necessary to encode the restrictions ⟨σ, ≥10r.C⟩ and
⟨σ, ≥1000r.D⟩. The basic ALCQ2SMT_C encoding would introduce 1010 distinct individuals. Applying the idea explained above, instead, we could divide these 1010 individuals into, e.g., three partitions of respectively 10, 990 and again 10 individuals. If,
for example, ⟨σ, ≤1005r.⊤⟩ must also be encoded, then the last 10 individuals could be
further divided into two distinct partitions. This partitioning allows for representing the
cases in which 0, 5, 10, 15, 20, 990, 995, 1000, 1005 or 1010 of these individuals exist
in ∆^I (being part or not of C^I and/or D^I). Even if not exhaustive, these combinations
are enough to represent the significant cases concerning satisfiability.
4.1 Smart Partitioning
In order to handle partitions of individuals we extend ALCQ2SMT_C with cumulative
labels and proxy individuals. Given a normal/cumulative label σ′ and a role r, a
cumulative label σ′.r.(i → j) represents a group of consecutive individuals by means
of the range of integer values i → j, with i ≤ j; thus it represents a set of individuals
whose cardinality is j − i + 1. When i = j we can write either σ′.r.(i → i) or
σ′.r.i. With a small abuse of notation, in the following we call proxy individual any
σ.r.(i → j), meaning both: (i) the cumulative label representing the set of individuals
σ.r.i, σ.r.i+1, . . . , σ.r.j and (ii) that σ.r.(i → j) can be one/any of these individuals
acting as proxy for all the other individuals of the set.
The idea is to compute a “smart” partitioning of the individuals to be encoded into
ALCQ2SMT_C. With “smart” we mean a “safe but as small as possible” partitioning,
i.e. with “a small” number of partitions but “safely” preserving the semantics of the
problem, so that the cardinalities of the computed partitions allow for representing every
relevant case wrt. satisfiability. We formally define our smart partitioning:
Definition 2. Let T be an acyclic ALCQ TBox in normal form. Given
ALCQ2SMT_C(T) (Definition 1), σ ∈ Σ^T and r ∈ N_R, we define the arrays: 5
𝒩^≥_{σ.r} =def { n_i | ⟨σ, ≥n_i r.C_i⟩ ∈ I^T_+ }_i    and    𝒩^≤_{σ.r} =def { m_j | ⟨σ, ≤m_j r.D_j⟩ ∈ I^T_+ }_j . 6
From 𝒩^≥_{σ.r} and 𝒩^≤_{σ.r}, respectively, we define the integer values:
N^≥_{σ.r} =def Σ_{n_i ∈ 𝒩^≥_{σ.r}} n_i    and    N^≤_{σ.r} =def Σ_{m_j ∈ 𝒩^≤_{σ.r}} m_j .
We define the set P_{σ.r} =def P^≥_{σ.r} ∪ P^≤_{σ.r} as the smart partitioning for the N^≥_{σ.r} individuals
of Σ^T in the form σ.r.k, where:
P^≥_{σ.r} =def { n_S | S ∈ 2^{𝒩^≥_{σ.r}}, n_S = Σ_{n_k ∈ S} n_k }    and
P^≤_{σ.r} =def { m_S | S ∈ 2^{𝒩^≤_{σ.r}}, m_S = Σ_{m_k ∈ S} m_k }. 7
Finally, we define p_i ∈ P_{σ.r} as the i-th sorted element of P_{σ.r}, so that p_i < p_{i+1}. We have
in particular: p_1 = 0 and p_{|P_{σ.r}|} = max{N^≥_{σ.r}, N^≤_{σ.r}}.
5 With array we mean that equal n_i [resp. m_j] values repeat in 𝒩^≥_{σ.r} [resp. 𝒩^≤_{σ.r}] as many times
as they occur in the involved QNRs.
As P^≥_{σ.r}, P^≤_{σ.r} and P_{σ.r} are sets, equal values are uniquely represented in them. Given
σ and r, and assuming that each partition includes consecutive individuals among
σ.r.1, ..., σ.r.N^≥_{σ.r}, P_{σ.r} is the set containing the indexes of the last individual
of all the partitions. Hence, partitions can be represented by the proxy individuals
σ.r.(p_{j−1} + 1 → p_j), with j > 1. For instance, notice that the partitionings shown in
Example 1 are computed in accordance with Definition 2. We remark that Definition 2
defines a safe 8 partitioning, in fact:
– it takes into account all the values in QNRs for σ, r;
– it considers all the possible sums of the values ni [resp. mj ] for all the at-least [resp.
at-most] restrictions, which allows for representing all the possible lower-bounds
[resp. upper-bounds] in case of disjoint concept interpretations;
– the union of P^≥_{σ.r} and P^≤_{σ.r} represents the combination of lower- and upper-bounds;
– by sorting all the possible sums and by using the distance between these values (a
partition ranges from pj−1 + 1 to pj ) as the cardinality of the partitions, it allows
for representing all the possible intersecting concept interpretations.
We remark that partitioning makes our approach independent from the magnitude/offset
of the values occurring in QNRs.
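Definition 2 can be illustrated with a brief Python sketch that computes the set P_{σ.r} as the sorted union of all subset sums of the at-least and at-most values, and then derives the proxy ranges (p_{j−1} + 1 → p_j). The function names are assumptions of this example, and the incremental subset-sum computation is a straightforward choice made here, not necessarily the algorithm of [17].

```python
def subset_sums(values):
    """All sums over sub-arrays (chosen by position) of `values`, including the empty one."""
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return sums


def smart_partitioning(at_least, at_most):
    """Sorted elements p_1 < p_2 < ... of P for the given arrays of QNR values."""
    return sorted(subset_sums(at_least) | subset_sums(at_most))


def proxy_ranges(p):
    """Index ranges (p_{j-1} + 1, p_j) of the partitions, for j > 1."""
    return [(p[j - 1] + 1, p[j]) for j in range(1, len(p))]


# Example 1: >=10 r.C and >=1000 r.D, first alone and then with <=1005 r.T.
p_basic = smart_partitioning([10, 1000], [])
p_bound = smart_partitioning([10, 1000], [1005])
ranges_basic = proxy_ranges(p_basic)  # partitions of 10, 990 and 10 individuals
ranges_bound = proxy_ranges(p_bound)  # the last partition is split into 5 + 5
```

The number of partitions depends only on how many distinct subset sums exist, not on how large the values are, which is the sense in which the approach is independent of the magnitude of the values in QNRs.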
4.2 Exploiting Smart Partitioning in ALCQ2SMT_C
Using partitions and proxy individuals does not affect the ALCQ2SMT_C encoding,
because the Theory of Costs allows for arbitrary incur costs. We can enhance Definition 1 by taking advantage of smart partitioning as follows. First, we assume that the
sets Σ^T, I^T_−, I^T_+ and the functions A_⟨,⟩, indiv are defined consistently with the use of
proxy individuals. Second, assuming that the partitioning P_{σ.r} of Definition 2 is computed
for every σ, r, we modify ALCQ2SMT_C as follows:
6 ⟨σ, ∀r.C_i⟩ ∈ I^T_− must be considered like ⟨σ, ≤0r.¬C_i⟩ ∈ I^T_−, while ⟨σ, ≤n_i−1 r.C_i⟩ ∈ I^T_−
must be considered like ⟨σ, ≥n_i r.C_i⟩ ∈ I^T_+ and vice versa.
7 Being 2^X the power set for the set/array X.
8 I.e. it preserves the semantics of the problem wrt. satisfiability.
– The n clauses of the types (5) and (6) at point 5. are replaced by the following:
{ IC(indiv^C_{σ.r}, cost_j, idx_j) → L_⟨σproxy_j, C⟩ | p_j ∈ P_{σ.r}, 0 < p_j ≤ n } ⊂ ϕ^T,
{ IC(indiv^C_{σ.r}, cost_j, idx_j) → A_⟨σproxy_j, ⊤⟩ | p_j ∈ P_{σ.r}, 0 < p_j ≤ n } ⊂ ϕ^T,
where cost_j = p_j − p_{j−1}, idx_j = k_1^C + p_{j−1}, σproxy_j = σ.r.(k_1^C + p_{j−1} → k_1^C + p_j − 1).
– Clauses (9), (10) at point 7. are modified accordingly.
– The clauses (11) defined at point 8. must take into account the use of proxy individuals and of incur costs potentially bigger than 1. Hence they are replaced by:
{ (L_⟨σ.r.(i → j), C⟩ ∧ A_⟨σ.r.(i → j), ⊤⟩) → IC(indiv^C_{σ.r}, j − i + 1, i) | σ.r.(i → j) ∈ Σ^T }.
– Clauses (14) at point 10. are modified in the same way, handling proxy individuals.
We make the following observations:
– If, for σ, r, the conditions of point 7. of Definition 1 do not hold (e.g. no at-most restriction exists), then an even more efficient partitioning requires only the following
two clauses for every ⟨σ, ≥nr.C⟩:
IC(indiv^C_{σ.r}, n, k_1^C) → L_⟨σ.r.(k_1^C → k_1^C + n − 1), C⟩ ∈ ϕ^T,
IC(indiv^C_{σ.r}, n, k_1^C) → A_⟨σ.r.(k_1^C → k_1^C + n − 1), ⊤⟩ ∈ ϕ^T.
– Otherwise, if the conditions of point 7. hold, then ϕ^T contains all the clauses:
{ IC(indiv^C_{σ.r}, p_j − p_{j−1}, p_{j−1} + 1) → L_⟨σ.r.(p_{j−1} + 1 → p_j), C⟩ | p_j ∈ P_{σ.r}, j > 1 } ∪
{ IC(indiv^C_{σ.r}, p_j − p_{j−1}, p_{j−1} + 1) → A_⟨σ.r.(p_{j−1} + 1 → p_j), ⊤⟩ | p_j ∈ P_{σ.r}, j > 1 }
for every ⟨σ, ≥nr.C⟩, as a consequence of point 5. and of the sharing of (proxy)
individuals performed at point 7.
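The clauses listed in the second observation can be sketched as follows in Python, given an already computed sorted partitioning P_{σ.r}. The function name `proxy_clauses` and the textual clause syntax are assumptions of this illustration, and the partitioning used in the example is the one obtained for Example 1 without the at-most restriction.

```python
def proxy_clauses(sigma, r, c, p):
    """One pair of IC-clauses per proxy individual sigma.r.(p[j-1]+1 -> p[j]).

    `p` is the sorted partitioning with p[0] == 0; each proxy carries an incur
    cost equal to the number of individuals it stands for.
    """
    iv = f"indiv[{sigma}.{r}][{c}]"
    clauses = []
    for j in range(1, len(p)):
        first, last = p[j - 1] + 1, p[j]
        cost = p[j] - p[j - 1]
        proxy = f"{sigma}.{r}.({first}->{last})"
        clauses.append(f"IC({iv}, {cost}, {first}) -> L[{proxy}, {c}]")
        clauses.append(f"IC({iv}, {cost}, {first}) -> A[{proxy}, T]")
    return clauses


# Partitioning of Example 1 (>=10 r.C and >=1000 r.D): three proxies for 1010 individuals.
clauses = proxy_clauses("1", "s", "D", [0, 10, 1000, 1010])
total_cost = sum(b - a for a, b in zip([0, 10, 1000], [10, 1000, 1010]))
```

Here six clauses replace the 2020 unit-cost clauses that the basic encoding would need for the same individuals, while the incur costs still add up to 1010.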
An exponential-time algorithm computing the smart partitioning P_{σ.r} (Definition 2)
for every given individual σ and role r is presented in [17]. Taking as input
the arrays 𝒩^≥_{σ.r} and 𝒩^≤_{σ.r} of Definition 2, the algorithm is shown to have worst-case complexity
O(2^{max{|𝒩^≥_{σ.r}|, |𝒩^≤_{σ.r}|}}).
5 Empirical Evaluation
We have implemented the encoder ALCQ2SMT in C++; smart partitioning (§4) can
be enabled optionally (denoted with S.P. hereafter). In combination with ALCQ2SMT,
we have used MathSAT (v. 3.4.1) [2], which is the SMT solver including the Theory of
Costs [3]. We have evaluated the effectiveness of our novel approach by performing
an empirical test session on about 700 synthesized 9 and parameterized ALCQ-concept
satisfiability problems adapted from [6], plus more. In order to compare with the available state-of-the-art reasoners we have executed the following tools on every test case:
FaCT++ (v. 1.4.0) [16], Pellet (v. 2.1.1) [15], and Racer (RacerPro 1-9-0) [7].
9 Due to lack of space we refer the reader to Section 6.1 in [6] for a discussion on why real-world
ontologies are not yet available as suitable QNR benchmarks.
All the results presented in this section have been obtained on a 64-bit dual-processor
dual-core Intel Xeon 2.66 GHz machine with 16 GB of RAM. We set a timeout of 1000 seconds
for every concept-satisfiability query. We also fixed a bound of 1 GB of disk
space for the SMT(C) encoding output by ALCQ2SMT. When reporting the results
for one ALCQ2SMT+MathSAT configuration (either including S.P. or not), every
CPU time reported is the sum of the ALCQ2SMT encoding time and the MathSAT
solving time (both including the loading and parsing of the respective input problem).¹⁰
Importantly, on all test problems, all tools under examination (including both
variants of ALCQ2SMT+MathSAT) agreed on the expected un/satisfiability results
when terminating within the timeout, with the exception of Pellet, which incorrectly
returned “sat” on some nested_restr_unsat problems.
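The timing protocol described above (one timeout per query, encoding time plus solving time reported as a single figure) can be sketched as follows. This is a minimal Python sketch under stated assumptions: it measures wall-clock time as a stand-in for CPU time, it lets the two stages share a single timeout budget, and the commands it runs are placeholders, since the actual command lines of ALCQ2SMT and MathSAT are not given here. The names run_stage and run_query are hypothetical.

```python
import subprocess
import sys
import time
from typing import List, Optional


def run_stage(cmd: List[str], budget: float) -> Optional[float]:
    """Run one stage; return elapsed seconds, or None if the budget is exceeded."""
    start = time.perf_counter()
    try:
        subprocess.run(cmd, timeout=budget, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


def run_query(encode_cmd: List[str], solve_cmd: List[str],
              timeout: float = 1000.0) -> Optional[float]:
    """Sum encoding and solving time for one query under a shared timeout."""
    encode_time = run_stage(encode_cmd, timeout)
    if encode_time is None:
        return None
    remaining = timeout - encode_time
    if remaining <= 0:
        return None
    solve_time = run_stage(solve_cmd, remaining)
    if solve_time is None:
        return None
    return encode_time + solve_time


if __name__ == "__main__":
    # Placeholder commands: two no-op Python processes stand in for the
    # encoder and the solver, whose real invocations are not shown here.
    noop = [sys.executable, "-c", "pass"]
    print(run_query(noop, noop, timeout=60.0))
```

A result of None corresponds to a query that did not terminate within the timeout.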
Test Description. For our empirical evaluation, we have made use of synthesized test
cases adapted from those in [6]. The benchmark problems of [6] focus on concept expressions
containing only QNRs and define different sets of indexed problems, which increasingly
stress different sources of complexity as the index i increases. Since
the values occurring in QNRs are one of the sources of complexity which can strongly
affect the performance of reasoning for some tools, wrt. the original test cases of [6] we
further parameterized such values, making them depend on a parameter n. Below we list
the sources of complexity of reasoning in ALCQ considered in [6], with the relative
test set names, the ranges of the index i and the values chosen for n in our empirical
evaluation:¹¹
1. the size of the values occurring in QNRs (test cases: incr_lin_sat/unsat_i with
i = 1–100; incr_exp_sat/unsat_i with i = 1–6, satisfiable/unsatisfiable);
2. the number of QNRs (test cases: restr_num_i(n) with i = 1–100, n = 1, 5, 50,
satisfiable);
3. the effect of backtracking (number of disjoint concepts) (test cases:
backtracking_i(n) with i = 1–20, n = 1, 2, 3, 10, unsatisfiable);
4. the ratio between the number of at-least restrictions and the number of at-most
restrictions (test cases: restr_ratio_i(n) with i = 0–14, n = 1, 5, satisfiable);
5. the satisfiability versus the unsatisfiability of the input concept expression (test
cases: sat_unsat_i(n) with i = 1, 2, 4, 6, . . . , 24, n = 1, 10, half-and-half).
For the sake of fairness of the comparison, we introduced two novel groups of problems
which we believe can stress the main limitations of our approach wrt. the competitors.
These groups stress two sources of complexity which were not considered in [6]:
6. the variability of the values occurring in QNRs, i.e. every restriction contains a
distinct value (test cases: var_restr_num_i(n) with i = 1–100, n = 100, satisfiable);
7. the number of nested QNRs (test cases: nested_restr_sat/unsat_i(n), with
i = 1–20, n = 5, 50, satisfiable/unsatisfiable).
For a much more detailed description and the exact TBoxes we refer the reader to [6,17].
¹⁰ To make the experiments reproducible, all the plots in full size, the tools, the problems, and the
results are available at http://disi.unitn.it/~rseba/cade11/tests.tar.gz.
¹¹ The benchmarks of [6] are defined for SHQ, but we have adapted them to ALCQ by flattening
all the role hierarchies to the only role r. The value of n originally used in [6] is underlined.
Results. We compare ALCQ2SMT+MathSAT against the other state-of-the-art reasoners
Racer, FaCT++ and Pellet. In Figures 2 and 3 we plot, as representative,
the results on the most challenging test cases for every benchmark category. (More plots
and all the detailed results can be found in [17].) We notice the following facts about
ALCQ2SMT+MathSAT S.P.:
– in all tests, it performs uniformly much better than plain ALCQ2SMT+MathSAT;
– together with Racer, it is the best performer in the incr_lin and incr_exp (sat/unsat)
categories (Fig. 2 rows 1,2), solving all problems in negligible time;
– in nested_restr_sat it is the worst performer, but in nested_restr_unsat
it performs better than FaCT++ and Pellet (Fig. 2 row 3);¹²
– together with FaCT++, it is the best performer in the restr_num_i(5) and restr_ratio_i(5)
categories (Fig. 3 rows 1,2 left), solving all problems in negligible time;
– it is the absolute best scorer in the restr_num_i(50) test set (Fig. 3 row 1 right);
– in the var_restr_num category it performs worse than FaCT++ and Pellet, but
better than Racer¹³ (Fig. 3 row 2 right);
– together with Racer, it is the best performer in the sat_unsat category (Fig. 3 row 3 left),
solving all problems in negligible time;
– it is the worst performer on the backtracking problems (Fig. 3 row 3 right).
Looking into the data we notice a few more facts. First, the size of the encoded problems
of ALCQ2SMT, for both the basic and the S.P. variants, never exceeds the 1 GB file-size
limit, except for the nested_restr test cases with i ≥ 5 and i ≥ 7, respectively.
(In fact, nested QNRs exponentially affect the size of our encoding.) In general, the
encoded problems present a very low number of cost variables, which depends on the
number of QNRs in the input problem, and a possibly huge number of Boolean variables
and clauses, which depends on the number of (proxy) individuals introduced. Second,
in the vast majority of the input problems the encoding time required by ALCQ2SMT
is negligible (≤ 10^−2 s); with S.P. it is significant only with the nested_restr and
var_restr_num benchmarks (still ≤ 4 s for the hardest problems).
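To make the exponential effect of nesting concrete, the sketch below counts the individuals that a naive, unpartitioned expansion would introduce for a chain of nested at-least restrictions that each ask for n fillers. It assumes that every individual at one level receives n fresh fillers at the next level; the actual nested_restr concepts and the ALCQ2SMT encoding are described in [17] and may differ in detail, so the numbers only indicate the growth rate. The function name nested_individuals is hypothetical.

```python
def nested_individuals(n: int, depth: int) -> int:
    """Count fillers created by `depth` nested at-least-n restrictions.

    Level k (1-based) contributes n**k individuals, so the total is
    n + n**2 + ... + n**depth.
    """
    total = 0
    level = 1
    for _ in range(depth):
        level *= n
        total += level
    return total


if __name__ == "__main__":
    # n = 5 is one of the parameter values used for nested_restr.
    for depth in range(1, 8):
        print(depth, nested_individuals(5, depth))
```

Under this assumption each additional nesting level multiplies the number of new individuals by n, which is consistent with the observation that the Boolean part of the encoding grows with the number of (proxy) individuals.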
Discussion. The performance of ALCQ2SMT+MathSAT S.P. wrt. the other state-of-the-art
reasoners ranges from some cases where it is much less efficient (backtracking,
nested_restr_sat and var_restr_num) to problems in which it significantly outperforms
the other tools (incr, sat_unsat and restr_num). Notice that we have specifically
designed the nested_restr and var_restr_num problems in order to enforce
the exponentiality of the encoding and to maximally inhibit smart partitioning, respectively.
The backtracking problems, instead, have been designed in [6] to test the
capability of performing dependency-directed backtracking [10]. Since the encoding is
decoupled from the search, our approach cannot benefit from this optimization.
¹² Notice that Pellet gave wrong “sat” results on nested_restr_unsat problems.
¹³ Racer’s implementation of the algebraic approach [8] is best-case exponential wrt. the number of QNRs.
[Figure 2: six log-scale plots, one curve per tool (Racer, FaCT++, Pellet, ALCQ2SMT+MathSat, ALCQ2SMT+MathSat S.P.).]
Fig. 2: Tools comparison. From left to right: sat, unsat problems; 1st row: incr_lin; 2nd row:
incr_exp; 3rd row: nested_restr, n = 5. X axis: test case index; Y axis: CPU time (sec).
[Figure 3: six log-scale plots, one curve per tool (Racer, FaCT++, Pellet, ALCQ2SMT+MathSat, ALCQ2SMT+MathSat S.P.).]
Fig. 3: Tools comparison. From left to right: 1st row: restr_num_i(5), restr_num_i(50); 2nd row:
restr_ratio_i(5), var_restr_num_i(100); 3rd row: sat_unsat_i(10), backtracking_i(10).
X axis: test case index; Y axis: CPU time (sec).
ALCQ2SMT+MathSAT S.P. instead performs extremely well in those benchmarks
presenting multiple/balanced sources of complexity. Moreover, the size of the
encoding is not the main complexity issue for our approach, which is very effective also
on large or really complex problems (e.g., MathSAT scales up to problems with more
than 10^5 Boolean variables and clauses, and 10^4 cost variables, in the nested_restr
benchmarks). Finally, smart partitioning is extremely effective, drastically (and
exponentially) reducing the size of the output SMT(C) problems, by up to three
orders of magnitude in the extreme restr_num and nested_restr cases wrt. basic
ALCQ2SMT (it also has an exponential impact on the number of cost variables in case of
nested QNRs). The empirical evaluation clearly confirms that partitioning makes our
approach independent of the magnitude/offset of the values occurring in QNRs.
References
1. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The
Description Logic Handbook: Theory, Implementation, and Applications. 2003.
2. R. Bruttomesso, A. Cimatti, A. Franzén, A. Griggio, and R. Sebastiani. The MathSAT 4
SMT Solver. In Proc. of the CAV/08, LNCS. Springer, 2008.
3. A. Cimatti, A. Franzén, A. Griggio, R. Sebastiani, and C. Stenico. Satisfiability Modulo the
Theory of Costs: Foundations and Applications. In TACAS, LNCS. Springer, 2010.
4. J. Faddoul and V. Haarslev. Algebraic Tableau Reasoning for the Description Logic SHOQ.
Journal of Applied Logic, Special Issue on Hybrid Logics, 8(4), 2010.
5. J. Faddoul and V. Haarslev. Optimizing Algebraic Tableau Reasoning for SHOQ: First
Experimental Results. In Proc. of DL, Waterloo, Canada, May 4-7, 2010.
6. N. Farsiniamarj and V. Haarslev. Practical reasoning with qualified number restrictions: A
hybrid abox calculus for the description logic SHQ. J. AI Communications, 23(2-3), 2010.
7. V. Haarslev and R. Möller. RACER System Description. In Proc. of IJCAR’01, LNAI.
Springer, 2001.
8. V. Haarslev, M. Timmann, and R. Möller. Combining Tableaux and Algebraic Methods for
Reasoning with Qualified Number Restrictions. In Proc. of DL’2001, 2001.
9. B. Hollunder and F. Baader. Qualifying Number Restrictions in Concept Languages. In
Proc. of KR, Boston (USA), 1991.
10. I. Horrocks, U. Sattler, and S. Tobies. Practical reasoning for very expressive description
logics. Logic Journal of the IGPL, 8(3), 2000.
11. C. Lutz, C. Areces, I. Horrocks, and U. Sattler. Keys, Nominals, and Concrete Domains.
Journal of Artificial Intelligence Research, JAIR, 23(1), 2005.
12. R. Sebastiani. Lazy Satisfiability Modulo Theories. Journal on Satisfiability, Boolean Modeling and Computation, JSAT, 3, p. 141–224, 2007.
13. R. Sebastiani and M. Vescovi. Automated Reasoning in Modal and Description Logics via
SAT Encoding: the Case Study of K(m)/ALC-satisfiability. JAIR, 35(1), 2009.
14. R. Sebastiani and M. Vescovi. Axiom Pinpointing in Lightweight Description Logics via
Horn-SAT Encoding and Conflict Analysis. In Proc. of CADE-22, LNCS. Springer, 2009.
15. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL
reasoner. Journal of Web Semantics, 5(2), 2007.
16. D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In
Proc. of IJCAR’06, LNAI. Springer, 2006.
17. M. Vescovi, R. Sebastiani, and V. Haarslev. Automated Reasoning on TBoxes with Qualified
Number Restrictions via SMT. Tech.Rep. DISI-11-001, Università di Trento, April 2011.
Available at http://disi.unitn.it/~rseba/cade11/techrep.pdf.