Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Link Search Menu Expand Document

Advanced Reasoning Features

This page discusses advanced reasoning features in Stardog.

Page Contents
  1. Same As Reasoning
  2. Special Reasoning Query Predicates
    1. Direct/Strict Subclasses, Subproperties, & Direct Types
  3. New individuals with user-defined rules
  4. Explaining Reasoning Results
    1. Proof Trees
    2. Examples

There are several advanced reasoning features in Stardog that provide additional expressivity and inference capabilities, as explained in the following sections.

The advanced reasoning features explained in this chapter are not supported by the new Stride reasoner. You need to disable the Stride reasoner by setting the reasoning.stride option to false in database configuration.

Same As Reasoning

Stardog has full support for OWL 2 sameAs reasoning. However, sameAs reasoning works in a different way than the rest of the reasoning mechanism. The sameAs inferences are computed and indexed eagerly so that these materialized inferences can be used directly at query rewriting time. The sameAs index is updated automatically as the database is modified, so the difference is not of much direct concern to users.

In order to use sameAs reasoning, the database configuration option reasoning.sameas should be set either at database creation time or at a later time when the database is offline. This can be done using the command line as follows:

$ stardog-admin db create -o reasoning.sameas=FULL -n myDB

There are three legal values for this option:

  1. OFF disables all sameAs inferences; that is, only asserted sameAs triples will be included in query results.
  2. ON computes sameAs inferences using only asserted sameAs triples, considering the reflexivity, symmetry, and transitivity of the sameAs relation.
  3. FULL is the same as ON but also considers OWL functional properties, inverse functional properties, and hasKey axioms while computing sameAs inferences.

The way sameAs reasoning works differs from the OWL semantics slightly, in the sense that Stardog designates one canonical individual for each sameAs equivalence set and only returns the canonical individual. This avoids the combinatorial explosion in query results while providing the data integration benefits.

Let’s see an example showing how sameAs reasoning works. Consider the following database where sameAs reasoning is set to ON:

dbpedia:Elvis_Presley
    dbpedia-owl:birthPlace dbpedia:Mississippi ;
    owl:sameAs freebase:en.elvis_presley .

nyt:presley_elvis_per
    nyt:associated_article_count 35 ;
    rdfs:label "Elvis Presley" ;
    owl:sameAs dbpedia:Elvis_Presley .
 
freebase:en.elvis_presley
	freebase:common.topic.official_website <http://www.elvis.com/> .

Now consider the following query and its results:

$ stardog query execute --reasoning elvis 'SELECT * { ?s dbpedia-owl:birthPlace ?o; rdfs:label "Elvis Presley" }'
+-----------------------+---------------------+
|           s           |          o          |
+-----------------------+---------------------+
| nyt:presley_elvis_per | dbpedia:Mississippi |
+-----------------------+---------------------+

Let’s unpack this carefully. There are three things to note.

First, the query returns only one result even though there are three different URIs that denote Elvis Presley. Second, the URI returned is fixed but chosen randomly. Stardog picks one of the URIs as the canonical URI and always returns that and only that canonical URI in the results. If more sameAs triples are added, the chosen canonical individual may change. Third, it is important to point out that even though only one URI is returned, the effect of sameAs reasoning is visible in the results since the rdfs:label and dbpedia-owl:birthPlace properties were asserted about different instances (i.e., different URIs).

Now, you might be inclined to write queries such as this to get all the properties for a specific URI:

SELECT * {
   nyt:presley_elvis_per owl:sameAs ?elvis .
   ?elvis ?p ?o
}

However, this is completely unnecessary; rather, you can write the following query:

SELECT * {
   nyt:presley_elvis_per ?p ?o
}

and get the same results since sameAs reasoning would automatically merge the results for you:

+----------------------------------------+-----------------------+
|                   p                    |           o           |
+----------------------------------------+-----------------------+
| rdfs:label                             | "Elvis Presley"       |
| dbpedia-owl:birthPlace                 | dbpedia:Mississippi   |
| nyt:associated_article_count           | 35                    |
| freebase:common.topic.official_website | http://www.elvis.com/ |
| rdf:type                               | owl:Thing             |
+----------------------------------------+-----------------------+

The URI used in the query does not need to be the same one returned in the results. Thus, the following query would return the exact same results, too:

SELECT * {
   dbpedia:Elvis_Presley ?p ?o
}

The only time Stardog will return a non-canonical URI in the query results is when you explicitly query for the sameAs inferences, as in this next example:

$ stardog query execute -r elvis 'SELECT * { freebase:en.elvis_presley owl:sameAs ?elvis }'
+---------------------------+
|           elvis           |
+---------------------------+
| dbpedia:Elvis_Presley     |
| freebase:en.elvis_presley |
| nyt:presley_elvis_per     |
+---------------------------+

In the FULL sameAs reasoning mode, Stardog will also take other OWL axioms into account when computing sameAs inferences. Consider the following example:

#Everyone has a unique SSN number
:hasSSN a owl:InverseFunctionalProperty , owl:DatatypeProperty .

:JohnDoe :hasSSN "123-45-6789" .
:JDoe :hasSSN "123-45-6789" .

#Nobody can work for more than one company (for the sake of the example)
:worksFor a owl:FunctionalProperty , owl:ObjectProperty ;
	rdfs:domain :Employee ;
	rdfs:range :Company .

:JohnDoe :worksFor :Acme .
:JDoe :worksFor :AcmeInc .

#For each company, there can only be one employee with the same employee ID
:Employee owl:hasKey (:employeeID :worksFor ).

:JohnDoe :employeeID "1234-ABC" .

:JohnD :employeeID "1234-ABC" ;
       :worksFor :AcmeInc .
       
:JD :employeeID "5678-XYZ" ;
    :worksFor :AcmeInc .
         
:John :employeeID "1234-ABC" ;
      :worksFor :Emca .

For this database, with sameAs reasoning set to FULL, we would get the following answers:

$ stardog query execute -r acme "SELECT * {?x owl:sameAs ?y}"
+----------+----------+
|    x     |    y     |
+----------+----------+
| :JohnDoe | :JohnD   |
| :JDoe    | :JohnD   |
| :Acme    | :AcmeInc |
+----------+----------+

We can follow the chain of inferences to understand how these results were computed:

  1. :JohnDoe owl:sameAs :JohnD can be computed due to the fact that both have the same SSN numbers and the hasSSN property is inverse functional.
  2. We can infer :Acme owl:sameAs :AcmeInc since :JohnDoe can work for at most one company.
  3. :JohnDoe owl:sameAs :JohnD can be inferred using the owl:hasKey definition since both individuals are known to work for the same company and have the same employee ID.
  4. No more sameAs inferences can be computed due to the key definition, since other employees either have different IDs or work for other companies.

Special Reasoning Query Predicates

Stardog supports some built-in predicates with special meaning in order to make queries easier to read and write. These special predicates are primarily syntactic sugar for more complex structures.

Direct/Strict Subclasses, Subproperties, & Direct Types

Besides the standard RDF(S) predicates rdf:type, rdfs:subClassOf and rdfs:subPropertyOf, Stardog supports the following special built-in predicates:

  • sp:directType
  • sp:directSubClassOf
  • sp:strictSubClassOf
  • sp:directSubPropertyOf
  • sp:strictSubPropertyOf

Where the sp prefix binds to tag:stardog:api:property:. Stardog also recognizes the sesame:directType, sesame:directSubClassOf, and sesame:strictSubClassOf predicates, where the prefix sesame binds to http://www.openrdf.org/schema/sesame#.

We show what each of these predicates means by relating them to an equivalent triple pattern; that is, you can just write the predicate rather than the (more unwieldy) triple pattern.

#c1 is a subclass of c2 but not equivalent to c2

:c1 sp:strictSubClassOf :c2      =>       :c1 rdfs:subClassOf :c2 .
                                          FILTER NOT EXISTS {
                                             :c1 owl:equivalentClass :c2 .
                                          }

#c1 is a strict subclass of c2 and there is no c3 between c1 and c2 in
#the strict subclass hierarchy

:c1 sp:directSubClassOf :c2      =>       :c1 sp:strictSubClassOf :c2 .
                                          FILTER NOT EXISTS {
                                             :c1 sp:strictSubClassOf :c3 .
                                             :c3 sp:strictSubClassOf :c2 .
                                          }

#ind is an instance of c1 but not an instance of any strict subclass of c1

:ind sp:directType :c1           =>       :ind rdf:type :c1 .
                                          FILTER NOT EXISTS {
                                             :ind rdf:type :c2 .
                                             :c2 sp:strictSubClassOf :c1 .
                                          }

The predicates sp:directSubPropertyOf and sp:strictSubPropertyOf are defined analogously.

New individuals with user-defined rules

Stardog supports a special predicate that extends the expressivity of user-defined rules. According to SWRL, you can’t create new individuals (i.e., new instances of classes) in a SWRL rule.

Don’t get hung up by the tech jargon here. “Can’t create new individuals” just means you can’t have a rule that creates a new instance of some RDF or OWL class as a result of the rule firing.

This restriction is well-motivated; without it, you can easily create rules that do not terminate (i.e., never reach a fixed point). Stardog’s user-defined rules weaken this restriction in some crucial aspects, subject to the following restrictions, conditions, and warnings.

This special predicate is very powerful, and you can mess things up badly if you aren’t careful.

Despite the general restriction in SWRL, in Stardog we can create new individuals with a rule by using the function UUID(), as follows:

IF {
    ?p a :Parent .
    BIND (UUID() AS ?parent) .
}
THEN {
    ?parent a :Person .
}

This rule will create a random URI for each instance of the class :Parent and also assert that each new instance is an instance of :Person – parents are people, too!

There are a couple important things to note about this feature:

  1. The URIs for the generated individuals are meaningless, in the sense that they should not be used in further queries; that is, these URIs are not guaranteed by Stardog to be stable.

  2. Due to normalization, rules with more than one atom in the head are broken up into several rules.

    IF {
       ?person a :Person .
       BIND (UUID() AS ?parent) .
    }
    THEN {
       ?parent a :Parent ;
                a :Male .
    }
    

    will be normalized into two rules:

    IF {
       ?person a :Person .
       BIND (UUID() AS ?parent) .
    }
    THEN {
       ?parent a :Parent .
    }
    
    IF {
       ?person a :Person .
       BIND (UUID() AS ?parent) .
    }
    THEN {
       ?parent a :Male .
    }
    

    As a consequence, instead of stating that the new individual is both an instance of :Male and :Parent, we would create two different new individuals and assert that one is male and the other is a parent. If you need to assert various things about the new individual, we recommend the use of extra rules or axioms. In the previous example, we can introduce a new class (:Father) and add the following rule to our schema:

    IF {
       ?person a :Father .
    }
    THEN {
       ?person a :Parent ;
               a :Male .
    }
    

    And then modify the original rule accordingly:

    IF {
       ?person a :Person .
       BIND (UUID() AS ?parent) .
    }
    THEN {
       ?parent a :Father .
    }
    

Explaining Reasoning Results

Stardog can be used to check if the current database logically entails a set of triples; moreover, Stardog can explain why this is so. An explanation of an inference is the minimum set of statements explicitly stored in the database that, together with the schema and any valid inferences, logically justify the inference. Explanations are useful for understanding data, schema, and their interactions, especially when large number of statements interact with each other to infer new statements.

Explanations can be retrieved using the CLI reasoning explain command by providing an input file that contains the inferences to be explained:

$ stardog reasoning explain myDB inference_to_explain.ttl

The output is displayed in a concise syntax designed to be legible; but it can be rendered in any one of the supported RDF syntaxes, if desired. Explanations are also accessible through the HTTP API and programmatically. See the example in the stardog-examples Github repo for more details about retrieving explanations programmatically.

Proof Trees

Proof trees are a hierarchical presentation of multiple explanations (of inferences) to make data, schemas, and rules more intelligible. Proof trees provide an explanation for an inference or an inconsistency as a hierarchical structure. Nodes in the proof tree may represent an assertion in a Stardog database. Multiple assertion nodes are grouped under an inferred node.

Examples

For example, if we are explaining the inferred triple :Alice rdf:type :Employee, the root of the proof tree will show that inference:

INFERRED :Alice rdf:type :Employee

The children of an inferred node will provide more explanation for that inference:

INFERRED :Alice rdf:type :Employee
    ASSERTED :Manager rdfs:subClassOf :Employee
    INFERRED :Alice rdf:type :Manager

The fully expanded proof tree will show the asserted triples and axioms for every inference:

INFERRED :Alice rdf:type :Employee
    ASSERTED :Manager rdfs:subClassOf :Employee
    INFERRED :Alice rdf:type :Manager
        ASSERTED :Alice :supervises :Bob
        ASSERTED :supervises rdfs:domain :Manager

The CLI reasoning explain command prints the proof tree using indented text; but, using the Java API, it is easy to create a tree widget in a GUI to show the explanation tree, such that users can expand and collapse details in the explanation.

Another feature of proof trees is the ability to merge multiple explanations into a single proof tree with multiple branches when explanations have common statements. Consider the following example database:

#schema
:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:supervises rdfs:domain :Manager
:ResearchProject rdfs:subClassOf :Project
:projectID rdfs:domain :Project

#instance data
:Alice :supervises :Bob
:Alice :manages :ProjectX
:ProjectX a :ResearchProject
:ProjectX :projectID "123-45-6789"

In this database, there are three different unique explanations for the inference :Alice rdf:type :Employee:

Explanation 1

:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:supervises rdfs:domain :Manager
:Alice :supervises :Bob

Explanation 2

:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:ResearchProject rdfs:subClassOf :Project
:Alice :manages :ProjectX
:ProjectX a :ResearchProject

Explanation 3

:Manager rdfs:subClassOf :Employee
:ProjectManager rdfs:subClassOf :Manager
:ProjectManager owl:equivalentClass (:manages some :Project)
:projectID rdfs:domain :Project
:Alice :manages :ProjectX
:ProjectX :projectID "123-45-6789"

All three explanations have some triples in common; but when explanations are retrieved separately, it is hard to see how these explanations are related. When explanations are merged, we get a single proof tree where alternatives for subtrees of the proof are shown inline. In indented text rendering, the merged tree for the above explanations would look as follows:

INFERRED :Alice a :Employee
   ASSERTED :Manager rdfs:subClassOf :Employee
   1.1) INFERRED :Alice a :Manager
      ASSERTED :supervises rdfs:domain :Manager
      ASSERTED :Alice :supervises :Bob
   1.2) INFERRED :Alice a :Manager
      ASSERTED :ProjectManager rdfs:subClassOf :Manager
      INFERRED :Alice a :ProjectManager
         ASSERTED :ProjectManager owl:equivalentClass (:manages some :Project)
         ASSERTED :Alice :manages :ProjectX
         2.1) INFERRED :ProjectX a :Project
            ASSERTED :projectID rdfs:domain :Project
            ASSERTED :ProjectX :projectID "123-45-6789"
         2.2) INFERRED :ProjectX a :Project
            ASSERTED :ResearchProject rdfs:subClassOf :Project
            ASSERTED :ProjectX a :ResearchProject

In the merged proof tree, alternatives for an explanation are shown with a number id. In the above tree, :Alice a :Manager is the first inference for which we have multiple explanations, so it gets the id 1. Then each alternative explanation gets an id appended to this (so explanations 1.1 and 1.2 are both alternative explanations for inference 1). We also have multiple explanations for inference :ProjectX a :Project, so its alternatives get ids 2.1 and 2.2.