This document discusses techniques for efficient query-driven management of linked data quality. It presents research challenges around completeness of SPARQL queries and interrelations between data quality aspects. It then describes current results on characterizing query completeness over time using the guaranteed completeness date. It also discusses efficient techniques for completeness reasoning using predicate relevance principles and indexing approaches. The document concludes by discussing future work on completeness reasoning with concrete graphs and ensuring correctness of queries with negation.
1. Query-Driven Management of Linked Data Quality
Fariz Darari
Supervised by: Werner Nutt and Simon Razniewski
Fariz Darari (unibz) RSP Dec 3, 2015 1 / 33
3. Linked Data is everywhere, but how good is it?
4. [Darari et al. ISWC 2013] Results
5. Research Challenges
Completeness of SPARQL Queries
Interrelations between Data Quality Aspects
Data Completeness in Linked Data Streaming
9. 1. Completeness with Time
We characterize when query completeness at a date d
can be guaranteed.
Lemma: Query Completeness at a Date
$\hat{C} \models \mathit{Compl}(Q, d)$ iff $\tilde{P} \subseteq T_{\hat{C}_{\geq d}}(\tilde{P})$
Guaranteed completeness date (gcd) is the latest date
when the query completeness can be guaranteed.
Theorem: Guaranteed Completeness Date
$\mathit{gcd}(Q, \hat{C}) = \max\{\, d \in \mathit{date}(\hat{C}) \mid \tilde{P} \subseteq T_{\hat{C}_{\geq d}}(\tilde{P}) \,\}$
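The theorem can be turned into a small procedure. The following is a minimal sketch, not the authors' implementation. It assumes that every completeness statement is a plain list of triple patterns without a condition part, that each statement carries one timestamp, and that the statements with subscript ≥ d are those whose timestamp is at least d. All function names (`freeze`, `match`, `transfer`, `gcd`) are hypothetical.

```python
from datetime import date

def is_var(term):
    return term.startswith("?")

def freeze(pattern):
    """Map every variable ?x to a fresh constant, giving the frozen pattern."""
    return {tuple("frozen:" + t[1:] if is_var(t) else t for t in triple)
            for triple in pattern}

def match(pattern, graph, binding=None):
    """Yield each binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for fact in graph:
        new = dict(binding)
        if all((new.setdefault(p, f) == f) if is_var(p) else p == f
               for p, f in zip(first, fact)):
            yield from match(rest, graph, new)

def transfer(statements, graph):
    """T_C(graph): the part of `graph` that the statements declare complete."""
    result = set()
    for pattern in statements:
        for b in match(list(pattern), graph):
            result |= {tuple(b.get(t, t) for t in triple) for triple in pattern}
    return result

def gcd(query_pattern, timestamped_statements):
    """Latest date d such that the statements dated >= d still entail completeness."""
    frozen = freeze(query_pattern)
    for d in sorted({ts for _, ts in timestamped_statements}, reverse=True):
        recent = [p for p, ts in timestamped_statements if ts >= d]
        if frozen <= transfer(recent, frozen):
            return d
    return None  # completeness cannot be guaranteed at any date

# Example data (invented for illustration)
query = [("italy", "president", "?y")]
statements = [
    ([("?x", "president", "?y")], date(2015, 6, 1)),
    ([("?x", "child", "?y")], date(2015, 11, 1)),
]
print(gcd(query, statements))
```

Dates are tried from the latest to the earliest, so the first date that passes the containment test is the maximum required by the theorem.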
12. 2. Efficient Techniques for Completeness Reasoning
Problem: Reasoning with 1 million statements is very slow!
Why? Because we have to evaluate all the statements
over the query.
But not all the statements are relevant to the query!
17. 2. Efficient Techniques for Completeness Reasoning
Solution: Predicate relevance principle
Statements are relevant iff their predicates are contained
in the query predicates. For instance:
Compl((?x, child, ?y)) is not relevant for the query
({ ?y }, { (italy, president, ?y) }), while Compl((?x, president, ?y)) is.
How can we retrieve those predicate-relevant statements?
Reduce it to a well-known problem: subset querying
We compared three subset querying approaches:
standard hashing, tries, inverted indexes
Then, we compared reasoning w/o indexing,
reasoning w/ indexing, and query evaluation
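As an illustration of the inverted-index variant of subset querying, here is a minimal sketch. It assumes that predicates in statements and queries are constants, and the class and method names are hypothetical. It is not the implementation that was evaluated in the experiments.

```python
from collections import defaultdict

class PredicateIndex:
    """Inverted index from predicates to the statements that mention them."""

    def __init__(self):
        self._statements = []                  # statement id -> triple patterns
        self._sizes = []                       # statement id -> number of distinct predicates
        self._postings = defaultdict(list)     # predicate -> statement ids

    def add(self, statement):
        sid = len(self._statements)
        predicates = {p for _, p, _ in statement}
        self._statements.append(statement)
        self._sizes.append(len(predicates))
        for p in predicates:
            self._postings[p].append(sid)

    def relevant(self, query_pattern):
        """Return the statements whose predicate set is a subset of the query's."""
        hits = defaultdict(int)
        for p in {p for _, p, _ in query_pattern}:
            for sid in self._postings.get(p, ()):
                hits[sid] += 1
        # A statement is relevant only if all of its predicates were hit.
        return [self._statements[sid]
                for sid, n in sorted(hits.items()) if n == self._sizes[sid]]

index = PredicateIndex()
index.add([("?x", "child", "?y")])
index.add([("?x", "president", "?y")])
index.add([("?x", "president", "?y"), ("?y", "bornIn", "?z")])

query = [("italy", "president", "?y")]
print(index.relevant(query))
```

Only the posting lists of the query's predicates are scanned, so statements about unrelated predicates are never touched.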
18. 2. Efficient Techniques for Completeness Reasoning
Experiment results from randomly generated statements and queries
with 1 million statements:
22. 4. Completeness Reasoning with Concrete Graphs
Suppose that by C, Wikidata is complete for all Obama’s children,
all Sasha’s schools, and all Malia’s schools.
Give us schools of Obama’s children:
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
Wrt. the completeness information,
can we guarantee the completeness of Q? No, we cannot be sure who
Obama’s children are.
24. 4. Completeness Reasoning with Concrete Graphs
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Suppose that by C, Wikidata is complete for all Obama’s children,
all Sasha’s schools, and all Malia’s schools.
Give us schools of Obama’s children:
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
Wrt. Wikidata graph and its completeness information,
can we guarantee the completeness of Q? Yes, let’s see why.
28. 4. Completeness Reasoning with Concrete Graphs
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
Wrt. C and G, the query Q is equivalent to:
Q1 = ({sasha, ?s}, { (obama, child, sasha), (sasha, school, ?s) })
Q2 = ({malia, ?s}, { (obama, child, malia), (malia, school, ?s) })
Which again wrt. C and G, are equivalent to:
Q3 = (W3, P3) =
({sasha, unibz}, { (obama, child, sasha), (sasha, school, unibz) })
(Q2 has no answers: G is complete for Malia’s schools and contains none.)
Since P3 ⊆ G holds, we conclude: C, G |= Compl(Q)
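The instantiation steps above can be sketched as a recursive check. This is a simplified, sufficient test written for illustration, not the formalization used in the thesis. It assumes that each completeness statement is a single triple pattern without a condition, and every function name is hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def subsumes(general, specific):
    """True if substituting variables of `general` yields `specific`."""
    theta = {}
    for g, s in zip(general, specific):
        if is_var(g):
            if theta.setdefault(g, s) != s:
                return False
        elif g != s:
            return False
    return True

def matches(triple, graph):
    """Yield substitutions that map `triple` onto a triple of `graph`."""
    for fact in graph:
        theta = {}
        if all((theta.setdefault(t, f) == f) if is_var(t) else t == f
               for t, f in zip(triple, fact)):
            yield theta

def substitute(pattern, theta):
    return [tuple(theta.get(t, t) for t in triple) for triple in pattern]

def guaranteed_complete(pattern, statements, graph):
    """Expand triples covered by a statement using the graph until none remain."""
    if not pattern:
        return True
    for i, triple in enumerate(pattern):
        if any(subsumes(s, triple) for s in statements):
            rest = pattern[:i] + pattern[i + 1:]
            # The graph holds every match of a covered triple, so the
            # branches below are exhaustive.
            return all(guaranteed_complete(substitute(rest, theta), statements, graph)
                       for theta in matches(triple, graph))
    return False  # some triple is not covered by any statement

G = {("obama", "child", "sasha"), ("obama", "child", "malia"),
     ("sasha", "school", "unibz")}
C = [("obama", "child", "?c"), ("sasha", "school", "?s"), ("malia", "school", "?s")]
Q = [("obama", "child", "?c"), ("?c", "school", "?s")]
print(guaranteed_complete(Q, C, G))
```

The branch for Malia ends with no matches, which corresponds to Q2 having no answers. The branch for Sasha ends in the ground pattern P3.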
29. 4. Completeness Reasoning with Concrete Graphs
Next steps:
Polish the formalization
Build CORNER-WD, a completeness reasoner for Wikidata
Reasoning engine implementation is finished
UI layout is done, need to implement it
Build optimization techniques for reasoning with a large number of
realistic Wikidata completeness statements
Optimization techniques are 70% finished (latest results:
minutes for plain reasoning w/ 1 K CSs vs.
700 ms for optimized reasoning w/ 1 M CSs)
Still need to generate test completeness statements
and to perform experimental evaluations
31. 5. Ensuring Correctness of Queries with Negation
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Give us Obama’s children not going to school:
Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) })
Are we sure that { ?c → malia } ∈ ⟦Q⟧_G is a correct answer?
Since RDF follows the Open World Assumption (OWA), no!
33. 5. Ensuring Correctness of Queries with Negation
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Suppose that by C, Wikidata is complete for all Obama’s children,
all Sasha’s schools, and all Malia’s schools.
Give us Obama’s children not going to school:
Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) })
Are we sure that { ?c → malia } ∈ ⟦Q⟧_G is a correct answer?
Since the graph is complete for all of Malia’s schools and contains
none, yes!
The statements guarantee that the negated part is really empty.
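The argument can be sketched as a check on a single answer. This is an illustrative sketch under assumptions, not the planned reasoning system. It assumes one negated triple pattern and single-triple completeness statements without conditions, and all names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def instantiate(triple, binding):
    return tuple(binding.get(t, t) for t in triple)

def subsumes(general, specific):
    """True if substituting variables of `general` yields `specific`."""
    theta = {}
    for g, s in zip(general, specific):
        if is_var(g):
            if theta.setdefault(g, s) != s:
                return False
        elif g != s:
            return False
    return True

def has_match(triple, graph):
    """True if some triple of `graph` is an instance of `triple`."""
    for fact in graph:
        theta = {}
        if all((theta.setdefault(t, f) == f) if is_var(t) else t == f
               for t, f in zip(triple, fact)):
            return True
    return False

def answer_is_correct(binding, positive, negated, statements, graph):
    """The answer is certain if its positive part is in the graph and the
    instantiated negated part is both declared complete and empty."""
    if not all(instantiate(t, binding) in graph for t in positive):
        return False
    blocked = instantiate(negated, binding)
    covered = any(subsumes(s, blocked) for s in statements)
    return covered and not has_match(blocked, graph)

G = {("obama", "child", "sasha"), ("obama", "child", "malia"),
     ("sasha", "school", "unibz")}
C = [("obama", "child", "?c"), ("sasha", "school", "?s"), ("malia", "school", "?s")]
positive = [("obama", "child", "?c")]
negated = ("?c", "school", "?s")
print(answer_is_correct({"?c": "malia"}, positive, negated, C, G))
```

Without any completeness statements the same call returns False, which matches the open-world reading of the graph.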
34. 5. Ensuring Correctness of Queries with Negation
Next steps:
Polish the formalization
Implement a correctness reasoning system for queries with
negation
Conduct experimental evaluations with (again) realistic Wikidata
completeness statements
35. 6. Annotating RDF Streams with Completeness
RDF streams: real-time, continuous, unbounded
Two stream modeling paradigms:
Triple-based
(+) Information is sent immediately
(-) Hard to give a complete view (i.e., how long is the window?)
vs.
Graph-based
(+) Complete view
(-) Long delay before information is given
What about both?
36. 6. Annotating RDF Streams with Completeness
We propose session-based modeling. Let us see an example in
the wedding events domain.
Suppose a query asks for the wedding events at which Kim has arrived →
Timeliness requirement: the result is given as soon as possible
37. 6. Annotating RDF Streams with Completeness
We propose session-based modeling. Let us see an example in the
wedding events domain.
Suppose a query asks how many people attended a wedding
event → Completeness requirement: we need complete data of
interest for a correct result.
38. 6. Annotating RDF Streams with Completeness
Next steps:
Build a formal model for RDF streams in sessions
Formalize query answering over sessions
with completeness statements
Build a prototype of stream processing engine
using such formalization
40. Publications
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A
Completeness Reasoner for SPARQL Queries Over RDF Data Sources.
ESWC (Satellite Events) 2014.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic
Gap between RDF and SPARQL using Completeness Statements.
ISWC (Posters & Demos) 2014.
Fariz Darari, Radityo Eko Prasojo and Werner Nutt: Expressing
No-Value Information in RDF. ISWC (Posters & Demos) 2015.
(Nominated for best poster)
Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski:
Completeness Management for RDF Data Sources. (Submitted to a
journal) 2015.
41. Publications Plan
Completeness Reasoning with Concrete Graphs and Its Application to
Wikidata. ICWE 2016.
Ensuring Correctness of Queries with Negation. ISWC 2016.
Session Management on RDF Streams with Completeness Statements.
SEMANTiCS 2016.
42. Selected Activities
SSSW 2015 Summer School
External reviewer of COLD 2015
Poster presenter at ISWC 2015
Invited seminar at the Data Semantics Lab (led by: Pascal Hitzler)
in 2015
(Confirmed Plan) Research visit in Dresden in 2016 with
Sebastian Rudolph
43. Thesis Outline
1 Introduction
2 Motivation
3 Formal Framework
1 RDF and SPARQL
2 Completeness Statements
3 Query Completeness
4 Standard Completeness Entailment Framework
4 Completeness Reasoning with Concrete Graphs and Partial
Query Completeness
1 Completeness Reasoning with Concrete Graphs
2 Completeness Reasoning with Partial Query Completeness
3 Combining Concrete Graphs and Partial Query Completeness in
Completeness Reasoning
44. Thesis Outline
5 Completeness Reasoning with Time
1 Time-Extended Completeness Framework
2 Computing the Guaranteed Completeness Date
6 Efficient Implementation of Completeness Reasoner
1 Problem Overview
2 Filtering Based on Predicate Relevance
3 Experimental Evaluation
7 Reasoning about Query Emptiness
1 No-Value Statements
2 Query Emptiness
3 Emptiness Entailment
8 Ensuring Correctness of Queries with Negation
9 Completeness Statements as Session Annotations for RDF
Streams
10 Deploying Completeness Statements over Wikidata
11 Related Work
12 References
45. Conclusions
Linked Data quality is as important as Linked Data itself
In particular, we related Linked Data quality
with SPARQL query answering
We focused on three aspects of data quality: completeness,
timeliness, correctness
46. Thank you!
Complete for all the slides ;-)
Compl({(thisSlideset, hasSlide, ?s)})