Query-Driven Management of Linked Data Quality
Fariz Darari
Supervised by: Werner Nutt and Simon Razniewski
Fariz Darari (unibz) RSP Dec 3, 2015 1 / 33
Background
Linked Data is everywhere, but how good is it?
[Darari et al. ISWC 2013] Results
Research Challenges
Completeness of SPARQL Queries
Interrelations between Data Quality Aspects
Data Completeness in Linked Data Streaming
Current Results
1. Completeness with Time
We characterize when query completeness at a date d
can be guaranteed.
Lemma: Query Completeness at a Date
$\hat{C} \models \mathit{Compl}(Q, d)$ iff $\tilde{P} \subseteq T_{\hat{C}_{\geq d}}(\tilde{P})$
The guaranteed completeness date (gcd) is the latest date
at which query completeness can be guaranteed.
Theorem: Guaranteed Completeness Date
$\mathit{gcd}(Q, \hat{C}) = \max\{\, d \in \mathit{date}(\hat{C}) \mid \tilde{P} \subseteq T_{\hat{C}_{\geq d}}(\tilde{P}) \,\}$
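The lemma and theorem can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated on the slides: a timestamped statement is modeled as a pair (date, pattern) without a condition part, the frozen pattern P̃ is obtained by turning each variable `?x` into a constant `~x`, the operator T returns every instantiation of a statement pattern that matches the given graph, and Ĉ≥d is read as "the statements dated d or later". All function names are hypothetical.

```python
from datetime import date

def is_var(term):
    return term.startswith("?")

def match(pattern, graph, binding=None):
    """Yield every variable binding under which all triples of `pattern` occur in `graph`."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

def transfer(patterns, graph):
    """Assumed reading of T: all instantiated statement patterns that match `graph`."""
    out = set()
    for pattern in patterns:
        for b in match(pattern, graph):
            out.update(tuple(b.get(t, t) for t in triple) for triple in pattern)
    return out

def gcd(query_pattern, dated_statements):
    """Latest statement date d such that the frozen query pattern is contained
    in T applied to the statements dated d or later; None if no date works."""
    frozen = {tuple("~" + t[1:] if is_var(t) else t for t in triple)
              for triple in query_pattern}
    for d in sorted({when for when, _ in dated_statements}, reverse=True):
        recent = [pattern for when, pattern in dated_statements if when >= d]
        if frozen <= transfer(recent, frozen):
            return d
    return None

# Hypothetical data: child statements dated June 2015, school statements November 2015.
statements = [
    (date(2015, 6, 1), [("?x", "child", "?y")]),
    (date(2015, 11, 1), [("?x", "school", "?y")]),
]
query = [("obama", "child", "?c"), ("?c", "school", "?s")]
print(gcd(query, statements))  # 2015-06-01
```

In this toy setting the query needs both statements, so its guaranteed completeness date is the older of the two dates.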
2. Efficient Techniques for Completeness Reasoning
Problem: Reasoning with 1 million statements is very slow!
Why? Because every statement has to be evaluated
over the query.
But not all statements are relevant to the query!
Solution: Predicate relevance principle
Statements are relevant iff their predicates are contained
in the query predicates. For instance:
Compl((?x, child, ?y)) is not relevant for the query
({ ?y }, { (italy, president, ?y) }), while Compl((?x, president, ?y)) is.
How can we retrieve the predicate-relevant statements?
Reduce it to a well-known problem: subset querying.
We compared three subset querying approaches:
standard hashing, tries, and inverted indexes.
Then, we compared reasoning without indexing,
reasoning with indexing, and query evaluation.
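The inverted-index variant of subset querying can be sketched as follows. This is only an illustration, not the implementation evaluated in the experiments; it assumes that statements and queries are lists of triple patterns with constant predicates, and the class name `StatementIndex` is hypothetical.

```python
from collections import defaultdict

def predicates(pattern):
    """Set of predicates used in a list of (subject, predicate, object) patterns."""
    return frozenset(p for _, p, _ in pattern)

class StatementIndex:
    """Inverted index from predicate to the statements that mention it."""

    def __init__(self, statements):
        self.statements = list(statements)
        self.sizes = [len(predicates(s)) for s in self.statements]
        self.postings = defaultdict(list)
        for i, s in enumerate(self.statements):
            for p in predicates(s):
                self.postings[p].append(i)

    def relevant(self, query):
        """Statements whose predicate set is a subset of the query predicates."""
        hits = defaultdict(int)
        for p in predicates(query):
            for i in self.postings.get(p, ()):
                hits[i] += 1
        # A statement qualifies only if every one of its predicates was hit.
        return [self.statements[i] for i, n in sorted(hits.items())
                if n == self.sizes[i]]

statements = [
    [("?x", "child", "?y")],
    [("?x", "president", "?y")],
    [("?x", "president", "?y"), ("?y", "child", "?z")],
]
index = StatementIndex(statements)
query = [("italy", "president", "?y")]
print(index.relevant(query))  # [[('?x', 'president', '?y')]]
```

Only the postings of the query predicates are visited, so statements that share no predicate with the query are never touched.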
[Figure: experiment results for randomly generated statements and queries,
with 1 million statements]
3. Expressing No-Value Information in RDF
4. Completeness Reasoning with Concrete Graphs
Suppose that, by the completeness statements C, Wikidata is complete for
all Obama’s children, all Sasha’s schools, and all Malia’s schools.
Give us the schools of Obama’s children:
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
With respect to the completeness information alone,
can we guarantee the completeness of Q? No, because we cannot be sure who
Obama’s children are.
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Suppose that, by the completeness statements C, Wikidata is complete for
all Obama’s children, all Sasha’s schools, and all Malia’s schools.
Give us the schools of Obama’s children:
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
With respect to the Wikidata graph and its completeness information,
can we guarantee the completeness of Q? Yes, let’s see why.
Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) })
With respect to C and G, the query Q is equivalent to the union of:
Q1 = ({sasha, ?s}, { (obama, child, sasha), (sasha, school, ?s) })
Q2 = ({malia, ?s}, { (obama, child, malia), (malia, school, ?s) })
which, again with respect to C and G, is equivalent to:
Q3 = (W3, P3) =
({sasha, unibz}, { (obama, child, sasha), (sasha, school, unibz) })
(Q2 drops out: G records no school for Malia, and C states that Malia’s
schools are complete.)
Since P3 ⊆ G holds, we conclude: C, G |= Compl(Q)
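The instantiation steps above can be sketched in Python under strong simplifying assumptions: completeness statements are restricted to the form "all objects of (subject, predicate) are in G" and stored as a set of pairs, and the projection W is ignored. The function `complete` is a hypothetical illustration, not the CORNER-WD reasoner, and it may answer False in cases where a more general reasoner could still give a guarantee.

```python
def is_var(term):
    return term.startswith("?")

def substitute(bgp, var, value):
    """Replace every occurrence of `var` in the pattern by `value`."""
    return [tuple(value if t == var else t for t in triple) for triple in bgp]

def complete(bgp, graph, sp_complete):
    """True if the answers of `bgp` over `graph` are guaranteed complete,
    given that `graph` holds all objects for each (s, p) in `sp_complete`."""
    for s, p, o in bgp:
        if is_var(s) or is_var(p) or (s, p) not in sp_complete:
            continue
        if is_var(o):
            # The statement covers this pattern: branch on every recorded object.
            values = [go for gs, gp, go in graph if (gs, gp) == (s, p)]
            return all(complete(substitute(bgp, o, v), graph, sp_complete)
                       for v in values)
        if (s, p, o) not in graph:
            # Covered but absent: this branch can never produce an answer.
            return True
    # No covered pattern is left to instantiate: the pattern must already be in G.
    return all(triple in graph for triple in bgp)

G = {("obama", "child", "sasha"), ("obama", "child", "malia"),
     ("sasha", "school", "unibz")}
C = {("obama", "child"), ("sasha", "school"), ("malia", "school")}
Q = [("obama", "child", "?c"), ("?c", "school", "?s")]

print(complete(Q, G, C))                          # True
print(complete(Q, G, C - {("malia", "school")}))  # False
```

Dropping the statement about Malia’s schools makes the Q2 branch undecidable, so completeness of Q can no longer be guaranteed.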
Next steps:
Polish the formalization
Build CORNER-WD, a completeness reasoner for Wikidata
  Reasoning engine implementation is finished
  UI layout is done but still needs to be implemented
Build optimization techniques for reasoning with a large number of
realistic Wikidata completeness statements (CSs)
  Optimization techniques are 70% finished (latest results:
  minutes for plain reasoning with 1 K CSs vs.
  700 ms for optimized reasoning with 1 M CSs)
  Testing completeness statements still need to be generated
  Experimental evaluations still need to be performed
5. Ensuring Correctness of Queries with Negation
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Give us Obama’s children not going to school:
Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) })
Are we sure that { ?c → malia } ∈ ⟦Q⟧_G is a correct answer?
No! RDF follows the Open World Assumption (OWA), so a school for Malia
may simply be missing from G.
Graph of Wikidata G:
{(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)}
Suppose that, by the completeness statements C, Wikidata is complete for
all Obama’s children, all Sasha’s schools, and all Malia’s schools.
Give us Obama’s children not going to school:
Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) })
Are we sure that { ?c → malia } ∈ ⟦Q⟧_G is a correct answer?
Yes! The graph is complete for all Malia’s schools, and it contains none.
From the statements, we are sure that the negated part is really empty!
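The check for the negated part can be sketched in a few lines of Python. The sketch assumes a query with one positive pattern (subject, pos_pred, ?c) and one negated pattern (?c, neg_pred, ?s), and completeness statements given as (subject, predicate) pairs. The function name and signature are hypothetical.

```python
def guaranteed_answers(graph, sp_complete, subject, pos_pred, neg_pred):
    """Map each answer ?c of the query over `graph` to True if its correctness
    is guaranteed by the completeness statements, else to False."""
    verdicts = {}
    for s, p, o in graph:
        if (s, p) != (subject, pos_pred):
            continue
        if any((gs, gp) == (o, neg_pred) for gs, gp, _ in graph):
            continue  # the negated part matches in G, so o is not an answer
        # The answer is certain only if G is known to hold all (o, neg_pred) triples.
        verdicts[o] = (o, neg_pred) in sp_complete
    return verdicts

G = {("obama", "child", "sasha"), ("obama", "child", "malia"),
     ("sasha", "school", "unibz")}
C = {("obama", "child"), ("sasha", "school"), ("malia", "school")}

print(guaranteed_answers(G, C, "obama", "child", "school"))      # {'malia': True}
print(guaranteed_answers(G, set(), "obama", "child", "school"))  # {'malia': False}
```

Without any completeness statement the answer is still returned by query evaluation, but under the OWA it cannot be certified.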
Next steps:
Polish the formalization
Implement a correctness reasoning system for queries with
negation
Conduct experimental evaluations with (again) realistic Wikidata
completeness statements
6. Annotating RDF Streams with Completeness
RDF streams: real-time, continuous, unbounded
Two stream modeling paradigms:
Triple-based
  (+) Information is sent immediately
  (-) Hard to give a complete view (e.g., how long is the window?)
vs.
Graph-based
  (+) Complete view
  (-) Long delay before information is given
What about both?
We propose session-based modeling. Let us see an example in
the wedding events domain.
Suppose a query asks for the wedding events at which Kim has arrived →
Timeliness requirement: the result is given as soon as possible.
Suppose a query asks how many people attended a wedding
event → Completeness requirement: we need complete data of
interest for a correct result.
Next steps:
Build a formal model for RDF streams in sessions
Formalize query answering over sessions
with completeness statements
Build a prototype of a stream processing engine
based on this formalization
Epilogue
Publications
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A
Completeness Reasoner for SPARQL Queries Over RDF Data Sources.
ESWC (Satellite Events) 2014.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic
Gap between RDF and SPARQL using Completeness Statements.
ISWC (Posters & Demos) 2014.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing
No-Value Information in RDF. ISWC (Posters & Demos) 2015.
(Nominated for best poster)
Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski:
Completeness Management for RDF Data Sources. (Submitted to a
journal) 2015.
Publications Plan
Completeness Reasoning with Concrete Graphs and Its Application to
Wikidata. ICWE 2016.
Ensuring Correctness of Queries with Negation. ISWC 2016.
Session Management on RDF Streams with Completeness Statements.
SEMANTiCS 2016.
Selected Activities
SSSW 2015 Summer School
External reviewer of COLD 2015
Poster presenter at ISWC 2015
Invited seminar at the Data Semantics Lab (led by: Pascal Hitzler)
in 2015
(Confirmed Plan) Research visit in Dresden in 2016 with
Sebastian Rudolph
Thesis Outline
1 Introduction
2 Motivation
3 Formal Framework
  1 RDF and SPARQL
  2 Completeness Statements
  3 Query Completeness
  4 Standard Completeness Entailment Framework
4 Completeness Reasoning with Concrete Graphs and Partial Query Completeness
  1 Completeness Reasoning with Concrete Graphs
  2 Completeness Reasoning with Partial Query Completeness
  3 Combining Concrete Graphs and Partial Query Completeness in Completeness Reasoning
5 Completeness Reasoning with Time
  1 Time-Extended Completeness Framework
  2 Computing the Guaranteed Completeness Date
6 Efficient Implementation of Completeness Reasoner
  1 Problem Overview
  2 Filtering Based on Predicate Relevance
  3 Experimental Evaluation
7 Reasoning about Query Emptiness
  1 No-Value Statements
  2 Query Emptiness
  3 Emptiness Entailment
8 Ensuring Correctness of Queries with Negation
9 Completeness Statements as Session Annotations for RDF Streams
10 Deploying Completeness Statements over Wikidata
11 Related Work
12 References
Conclusions
Linked Data quality is as important as Linked Data itself
In particular, we related Linked Data quality
to SPARQL query answering
We focused on three aspects of data quality: completeness,
timeliness, correctness
Thank you!
Complete for all the slides ;-)
Compl({(thisSlideset, hasSlide, ?s)})

Research and Study Plan: Year II

  • 1. Query-Driven Management of Linked Data Quality Fariz Darari Supervised by: Werner Nutt and Simon Razniewski Fariz Darari (unibz) RSP Dec 3, 2015 1 / 33
  • 2. Background Fariz Darari (unibz) RSP Dec 3, 2015 2 / 33
  • 3. Linked Data is everywhere, but how good is it? Fariz Darari (unibz) RSP Dec 3, 2015 3 / 33
  • 4. [Darari et al. ISWC 2013] Results Fariz Darari (unibz) RSP Dec 3, 2015 4 / 33
  • 5. Research Challenges Completeness of SPARQL Queries Interrelations between Data Quality Aspects Data Completeness in Linked Data Streaming Fariz Darari (unibz) RSP Dec 3, 2015 5 / 33
  • 6. Current Results Fariz Darari (unibz) RSP Dec 3, 2015 6 / 33
  • 7. 1. Completeness with Time Fariz Darari (unibz) RSP Dec 3, 2015 7 / 33
  • 8. 1. Completeness with Time Fariz Darari (unibz) RSP Dec 3, 2015 8 / 33
  • 9. 1. Completeness with Time We characterize when query completeness at a date d can be guaranteed. Lemma: Query Completeness at a Date ˆC |= Compl(Q, d) iff ˜P ⊆ TˆC≥d (˜P) Guaranteed completeness date (gcd) is the latest date when the query completeness can be guaranteed. Theorem: Guaranteed Completeness Date gcd(Q, ˆC) = max{ d ∈ date( ˆC) | ˜P ⊆ TˆC≥d (˜P) } Fariz Darari (unibz) RSP Dec 3, 2015 9 / 33
  • 10. 2. Efficient Techniques for Completeness Reasoning Problem: Reasoning with 1 mio statements is super slow! Fariz Darari (unibz) RSP Dec 3, 2015 10 / 33
  • 11. 2. Efficient Techniques for Completeness Reasoning Problem: Reasoning with 1 mio statements is super slow! Why? Because we have to evaluate all the statements over the query. Fariz Darari (unibz) RSP Dec 3, 2015 10 / 33
  • 12. 2. Efficient Techniques for Completeness Reasoning Problem: Reasoning with 1 mio statements is super slow! Why? Because we have to evaluate all the statements over the query. But not all the statements are relevant to the query! Fariz Darari (unibz) RSP Dec 3, 2015 10 / 33
  • 13. 2. Efficient Techniques for Completeness Reasoning Solution: Predicate relevance principle Statements are relevant iff their predicates are contained in the query predicates. Fariz Darari (unibz) RSP Dec 3, 2015 11 / 33
  • 14. 2. Efficient Techniques for Completeness Reasoning Solution: Predicate relevance principle Statements are relevant iff their predicates are contained in the query predicates. For instance: Compl((?x, child, ?y)) is not relevant for the query ({ ?y }, (italy, president, ?y)), while Compl((?x, president, ?y)) is Fariz Darari (unibz) RSP Dec 3, 2015 11 / 33
  • 15. 2. Efficient Techniques for Completeness Reasoning Solution: Predicate relevance principle Statements are relevant iff their predicates are contained in the query predicates. For instance: Compl((?x, child, ?y)) is not relevant for the query ({ ?y }, (italy, president, ?y)), while Compl((?x, president, ?y)) is How can we retrieve those predicate-relevant statements? Reduce to well-known problem, the subset querying Fariz Darari (unibz) RSP Dec 3, 2015 11 / 33
  • 16. 2. Efficient Techniques for Completeness Reasoning Solution: Predicate relevance principle Statements are relevant iff their predicates are contained in the query predicates. For instance: Compl((?x, child, ?y)) is not relevant for the query ({ ?y }, (italy, president, ?y)), while Compl((?x, president, ?y)) is How can we retrieve those predicate-relevant statements? Reduce to well-known problem, the subset querying We compared three subset querying approaches: standard hashing, tries, inverted indexes Fariz Darari (unibz) RSP Dec 3, 2015 11 / 33
  • 17. 2. Efficient Techniques for Completeness Reasoning Solution: Predicate relevance principle Statements are relevant iff their predicates are contained in the query predicates. For instance: Compl((?x, child, ?y)) is not relevant for the query ({ ?y }, (italy, president, ?y)), while Compl((?x, president, ?y)) is How can we retrieve those predicate-relevant statements? Reduce to well-known problem, the subset querying We compared three subset querying approaches: standard hashing, tries, inverted indexes Then, we compared reasoning w/o indexing, reasoning w/ indexing, and query evaluation Fariz Darari (unibz) RSP Dec 3, 2015 11 / 33
  • 18. 2. Efficient Techniques for Completeness Reasoning Experiment results from randomly generated statements and queries with 1 mio statements: Fariz Darari (unibz) RSP Dec 3, 2015 12 / 33
  • 19. 3. Expressing No-Value Information in RDF Fariz Darari (unibz) RSP Dec 3, 2015 13 / 33
  • 20. 3. Expressing No-Value Information in RDF Fariz Darari (unibz) RSP Dec 3, 2015 14 / 33
  • 21. 4. Completeness Reasoning with Concrete Graphs Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us schools of Obama’s children: Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. the completeness information, can we guarantee the completeness of Q? Fariz Darari (unibz) RSP Dec 3, 2015 15 / 33
  • 22. 4. Completeness Reasoning with Concrete Graphs Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us schools of Obama’s children: Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. the completeness information, can we guarantee the completeness of Q? No, not sure who are Obama’s children. Fariz Darari (unibz) RSP Dec 3, 2015 15 / 33
  • 23. 4. Completeness Reasoning with Concrete Graphs Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us schools of Obama’s children: Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. Wikidata graph and its completeness information, can we guarantee the completeness of Q? Fariz Darari (unibz) RSP Dec 3, 2015 16 / 33
  • 24. 4. Completeness Reasoning with Concrete Graphs Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us schools of Obama’s children: Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. Wikidata graph and its completeness information, can we guarantee the completeness of Q? Yes, let’s see why. Fariz Darari (unibz) RSP Dec 3, 2015 16 / 33
  • 25. 4. Completeness Reasoning with Concrete Graphs Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Fariz Darari (unibz) RSP Dec 3, 2015 17 / 33
  • 26. 4. Completeness Reasoning with Concrete Graphs Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. C and G, the query Q is equivalent to: Q1 = ({sasha, ?s}, { (obama, child, sasha), (sasha, school, ?s) }) Q2 = ({malia, ?s}, { (obama, child, malia), (malia, school, ?s) }) Fariz Darari (unibz) RSP Dec 3, 2015 17 / 33
  • 27. 4. Completeness Reasoning with Concrete Graphs Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. C and G, the query Q is equivalent to: Q1 = ({sasha, ?s}, { (obama, child, sasha), (sasha, school, ?s) }) Q2 = ({malia, ?s}, { (obama, child, malia), (malia, school, ?s) }) Which again wrt. C and G, are equivalent to: Q3 = (W3, P3) = ({sasha, unibz}, { (obama, child, sasha), (sasha, school, unibz) }) Fariz Darari (unibz) RSP Dec 3, 2015 17 / 33
  • 28. 4. Completeness Reasoning with Concrete Graphs Q = ({?c, ?s}, { (obama, child, ?c), (?c, school, ?s) }) Wrt. C and G, the query Q is equivalent to: Q1 = ({sasha, ?s}, { (obama, child, sasha), (sasha, school, ?s) }) Q2 = ({malia, ?s}, { (obama, child, malia), (malia, school, ?s) }) Which again wrt. C and G, are equivalent to: Q3 = (W3, P3) = ({sasha, unibz}, { (obama, child, sasha), (sasha, school, unibz) }) However, it holds P3 ⊆ G, so we conclude: C, G |= Compl(Q) Fariz Darari (unibz) RSP Dec 3, 2015 17 / 33
  • 29. 4. Completeness Reasoning with Concrete Graphs Next steps: Polish the formalization Build CORNER-WD, a completeness reasoner for Wikidata Reasoning engine implementation is finished UI layout is done, need to implement it Build optimization techniques for reasoning with a large number of realistic Wikidata completeness statements Optimization techniques are 70% finished (latest results: minutes for plain reasoning w/ 1 K CSs vs. 700 ms for optimized reasoning w/ 1 M CSs) Still needs to generate testing completeness statements And to perform experimental evaluations Fariz Darari (unibz) RSP Dec 3, 2015 18 / 33
  • 30. 5. Ensuring Correctness of Queries with Negation Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Give us Obama’s children not going to school: Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) }) Are we sure that { ?c → malia } ∈ Q G is a correct answer? Fariz Darari (unibz) RSP Dec 3, 2015 19 / 33
  • 31. 5. Ensuring Correctness of Queries with Negation Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Give us Obama’s children not going to school: Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) }) Are we sure that { ?c → malia } ∈ Q G is a correct answer? Given that RDF follows the Open World Assumption (OWA), then no! Fariz Darari (unibz) RSP Dec 3, 2015 19 / 33
  • 32. 5. Ensuring Correctness of Queries with Negation Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us Obama’s children not going to school: Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) }) Are we sure that { ?c → malia } ∈ Q G is a correct answer? Fariz Darari (unibz) RSP Dec 3, 2015 20 / 33
  • 33. 5. Ensuring Correctness of Queries with Negation Graph of Wikidata G: {(obama, child, sasha), (obama, child, malia), (sasha, school, unibz)} Suppose that by C, Wikidata is complete for all Obama’s children, all Sasha’s schools, and all Malia’s schools. Give us Obama’s children not going to school: Q = ({?c}, { (obama, child, ?c), ¬(?c, school, ?s) }) Are we sure that { ?c → malia } ∈ Q G is a correct answer? Given that the graph is complete for all Malia’s schools, which are empty, then yes! From the statements, we are sure that the negated part is really empty! Fariz Darari (unibz) RSP Dec 3, 2015 20 / 33
  • 34. 5. Ensuring Correctness of Queries with Negation Next steps: Polish the formalization Implement a correctness reasoning system of queries with negation Conduct experimental evaluations with (again) realistic Wikidata completeness statements Fariz Darari (unibz) RSP Dec 3, 2015 21 / 33
  • 35. 6. Annotating RDF Streams with Completeness RDF streams: real-time, continuous, unbounded Two stream modeling paradigms: Triple-based (+) Information is sent immediately (-) Hard to give a complete view (i.e., how long is the window?) vs. Graph-based (+) Complete view (-) long delay before information is given What about both? Fariz Darari (unibz) RSP Dec 3, 2015 22 / 33
  • 36. 6. Annotating RDF Streams with Completeness We propose a session-based modeling. Let us see an example in the wedding events domain. Suppose the query asking for wedding events where Kim arrived → Timeliness requirement: Result is given as soon as possible Fariz Darari (unibz) RSP Dec 3, 2015 23 / 33
  • 37. 6. Annotating RDF Streams with Completeness We propose a session-based modeling. Let us see an example in the wedding events domain. Suppose the query asking for how many people attended a wedding event → Completeness requirement: We need complete data of interest for correct result. Fariz Darari (unibz) RSP Dec 3, 2015 24 / 33
  • 38. 6. Annotating RDF Streams with Completeness Next steps: Build a formal model for RDF streams in sessions Formalize query answering over sessions with completeness statements Build a prototype of stream processing engine using such formalization Fariz Darari (unibz) RSP Dec 3, 2015 25 / 33
  • 39. Epilogue Fariz Darari (unibz) RSP Dec 3, 2015 26 / 33
  • 40. Publications Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC (Satellite Events) 2014. Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC (Posters & Demos) 2014. Fariz Darari, Radityo Eko Prasojo and Werner Nutt: Expressing No-Value Information in RDF. ISWC (Posters & Demos) 2015. (Nominated for best poster) Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Management for RDF Data Sources. (Submitted to a journal) 2015. Fariz Darari (unibz) RSP Dec 3, 2015 27 / 33
  • 41. Publications Plan Completeness Reasoning with Concrete Graphs and Its Application to Wikidata. ICWE 2016. Ensuring Correctness of Queries with Negation. ISWC 2016. Session Management on RDF Streams with Completeness Statements. SEMANTiCS 2016. Fariz Darari (unibz) RSP Dec 3, 2015 28 / 33
  • 42. Selected Activities SSSW 2015 Summer School External reviewer of COLD 2015 Poster presenter at ISWC 2015 Invited seminar at the Data Semantics Lab (led by: Pascal Hitzler) in 2015 (Confirmed Plan) Research visit in Dresden in 2016 with Sebastian Rudolph Fariz Darari (unibz) RSP Dec 3, 2015 29 / 33
  • 43. Thesis Outline 1 Introduction 2 Motivation 3 Formal Framework 1 RDF and SPARQL 2 Completeness Statements 3 Query Completeness 4 Standard Completeness Entailment Framework 4 Completeness Reasoning with Concrete Graphs and Partial Query Completeness 1 Completeness Reasoning with Concrete Graphs 2 Completeness Reasoning with Partial Query Completeness 3 Combining Concrete Graphs and Partial Query Completeness in Completeness Reasoning Fariz Darari (unibz) RSP Dec 3, 2015 30 / 33
  • 44. Thesis Outline 5 Completeness Reasoning with Time 1 Time-Extended Completeness Framework 2 Computing the Guaranteed Completeness Date 6 Efficient Implementation of Completeness Reasoner 1 Problem Overview 2 Filtering Based on Predicate Relevance 3 Experimental Evaluation 7 Reasoning about Query Emptiness 1 No-Value Statements 2 Query Emptiness 3 Emptiness Entailment 8 Ensuring Correctness of Queries with Negation 9 Completeness Statements as Session Annotations for RDF Streams 10 Deploying Completeness Statements over Wikidata 11 Related Work 12 References Fariz Darari (unibz) RSP Dec 3, 2015 31 / 33
  • 45. Conclusions Linked Data quality is as important as Linked Data itself In particular, we related Linked Data quality with SPARQL query answering We focused on three aspects of data quality: completeness, timeliness, correctness Fariz Darari (unibz) RSP Dec 3, 2015 32 / 33
  • 46. Thank you! Complete for all the slides ;-) Compl({(thisSlideset, hasSlide, ?s)}) Fariz Darari (unibz) RSP Dec 3, 2015 33 / 33