Data quality is a major issue in the development of knowledge graphs. Data completeness is a key factor in data quality that concerns the breadth, depth, and scope of the information contained in knowledge graphs. For large-scale knowledge graphs (e.g., DBpedia, Wikidata), it is conceivable that, given the amount of information they contain, they may be complete for a wide range of topics, such as the children of Donald Trump, the cantons of Switzerland, and the presidents of Indonesia. Previous research has shown how one can augment knowledge graphs with statements about their completeness, stating which parts of the data are complete. Such meta-information can be leveraged to check query completeness, that is, whether the answer returned by a query is complete. Yet it is still unclear how such a check can be done in practice, especially when a large number of completeness statements are involved. We devise implementation techniques that make completeness reasoning feasible in the presence of large sets of completeness statements, and experimentally evaluate their effectiveness in realistic settings based on the characteristics of real-world knowledge graphs.
Comparing Index Structures for Completeness Reasoning
1. Comparing Index Structures
for Completeness Reasoning
Fariz Darari*, Werner Nutt†, Simon Razniewski‡
*Universitas Indonesia, Indonesia
†Free University of Bozen-Bolzano, Italy
‡Max Planck Institute for Informatics, Germany
IWBIS 2018 - Jakarta, Indonesia
3. Imagine you are a Spielberg movie fan
who happens to be a knowledge graph fan too
4. Now you want to query the knowledge graph:
Give movies written by Spielberg!
Movies written by Spielberg?
5. The knowledge graph gives some answers..
Movies written by Spielberg?
Answers
Poltergeist
ET
.....
6. The knowledge graph gives some answers..
Movies written by Spielberg?
Answers
Poltergeist
ET
.....
But are these answers complete?
7. The knowledge graph gives some answers..
Movies written by Spielberg?
Answers
Poltergeist
ET
.....
But are these answers complete? Maybe.
Maybe yes,
maybe no..
8. Imagine you are a Spielberg movie fan
who happens to be a knowledge graph fan too
9. Imagine you are a Spielberg movie fan
who happens to be a knowledge graph fan too*
Darari, et al. Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.
*But now the knowledge graph has been augmented with completeness statements, as in:
This knowledge graph is complete
for all movies written by Spielberg
10. Now you want to query the knowledge graph:
Give movies written by Spielberg!
Movies written by Spielberg?
This knowledge graph is complete
for all movies written by Spielberg
11. The knowledge graph gives some answers..
Movies written by Spielberg?
Answers
Poltergeist
ET
.....
But are these answers complete?
This knowledge graph is complete
for all movies written by Spielberg
12. The knowledge graph gives some answers..
Movies written by Spielberg?
Answers
Poltergeist
ET
.....
But are these answers complete? Yes!
This knowledge graph is complete
for all movies written by Spielberg
13. Completeness reasoning:
Movies written by Spielberg?
This knowledge graph is complete
for all movies written by Spielberg
Checking if completeness statements guarantee query completeness
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statement:
(?x, type, Movie), (?x, writtenBy, Spielberg)
14. Completeness reasoning:
Movies written by Spielberg?
This knowledge graph is complete
for all movies written by Spielberg
Checking if completeness statements guarantee query completeness
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statement:
(?x, type, Movie), (?x, writtenBy, Spielberg)
Can the statement(s) cover all query components?
In general, multiple completeness statements may be required
to cover all components of the query!
15. Completeness reasoning scalability challenge:
What if there were a million completeness statements?*
*We'd need this many because a large KG (= knowledge graph) naturally
would require many completeness statements!
Query
X 1 million!
16. X 1 million!
Query
Unoptimized completeness reasoning
takes minutes!
Completeness reasoning scalability challenge:
What if there were a million completeness statements?*
*We'd need this many because a large KG (= knowledge graph) naturally
would require many completeness statements!
17. X 1 million!
Query
Unoptimized completeness reasoning
takes minutes!
Completeness reasoning scalability challenge:
What if there were a million completeness statements?*
Can we do any better?
*We'd need this many because a large KG (= knowledge graph) naturally
would require many completeness statements!
18. We propose techniques
for scalable completeness reasoning
Our contributions:
1. Principle for filtering out irrelevant completeness statements
2. Indexing methods for completeness statements
3. Experimental evaluations of
naive completeness reasoning vs. indexed completeness reasoning
19. Constant-relevance principle
A statement contributes to query completeness only if
its constants are among the query’s
Movies written by Spielberg?
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statements:
(?x, type, Movie), (?x, writtenBy, Spielberg)
(?x, type, Movie), (?x, writtenBy, Bob)
(?x, type, Song), (?x, writtenBy, Mary)
(Spielberg, child, ?y)
20. Constant-relevance principle
A statement contributes to query completeness only if
its constants are among the query’s
Movies written by Spielberg?
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statements:
(?x, type, Movie), (?x, writtenBy, Spielberg)
(?x, type, Movie), (?x, writtenBy, Bob)
(?x, type, Song), (?x, writtenBy, Mary)
(Spielberg, child, ?y)
Constants:
type, Movie, writtenBy, Spielberg
type, Movie, writtenBy, Bob
type, Song, writtenBy, Mary
Spielberg, child
21. Constant-relevance principle
A statement contributes to query completeness only if
its constants are among the query’s
Movies written by Spielberg?
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statements:
(?x, type, Movie), (?x, writtenBy, Spielberg)
(?x, type, Movie), (?x, writtenBy, Bob)
(?x, type, Song), (?x, writtenBy, Mary)
(Spielberg, child, ?y)
Constants: type, Movie, writtenBy, Spielberg
Constants:
type, Movie, writtenBy, Spielberg
type, Movie, writtenBy, Bob
type, Song, writtenBy, Mary
Spielberg, child
22. Constant-relevance principle
A statement contributes to query completeness only if
its constants are among the query’s
Movies written by Spielberg?
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statements:
(?x, type, Movie), (?x, writtenBy, Spielberg)
(?x, type, Movie), (?x, writtenBy, Bob)
(?x, type, Song), (?x, writtenBy, Mary)
(Spielberg, child, ?y)
Constants: type, Movie, writtenBy, Spielberg
Constants:
type, Movie, writtenBy, Spielberg
type, Movie, writtenBy, Bob
type, Song, writtenBy, Mary
Spielberg, child
23. Constant-relevance principle: subset querying
A statement contributes to query completeness only if
its constants are among the query’s
Movies written by Spielberg?
Query: (?x, type, Movie), (?x, writtenBy, Spielberg)
Completeness statements:
(?x, type, Movie), (?x, writtenBy, Spielberg)
(?x, type, Movie), (?x, writtenBy, Bob)
(?x, type, Song), (?x, writtenBy, Mary)
(Spielberg, child, ?y)
Constants: type, Movie, writtenBy, Spielberg
Constants:
type, Movie, writtenBy, Spielberg
type, Movie, writtenBy, Bob
type, Song, writtenBy, Mary
Spielberg, child
Reduce the problem into:
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
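The constant-relevance filter on the Spielberg example can be sketched in a few lines of Python. This is only an illustration under assumptions: the triple representation, the convention that variables start with "?", and the function names constants and constant_relevant are hypothetical and do not come from the slides.

```python
from typing import FrozenSet, List, Tuple

# Assumed representation: a triple pattern is a 3-tuple of strings,
# and a term is a variable if it starts with "?".
Triple = Tuple[str, str, str]


def constants(pattern: List[Triple]) -> FrozenSet[str]:
    """Collect every non-variable term occurring in a list of triple patterns."""
    return frozenset(
        term for triple in pattern for term in triple if not term.startswith("?")
    )


def constant_relevant(statements: List[List[Triple]], query: List[Triple]):
    """Keep only the statements whose constants are a subset of the query's constants."""
    query_constants = constants(query)
    return [s for s in statements if constants(s) <= query_constants]


query = [("?x", "type", "Movie"), ("?x", "writtenBy", "Spielberg")]
statements = [
    [("?x", "type", "Movie"), ("?x", "writtenBy", "Spielberg")],
    [("?x", "type", "Movie"), ("?x", "writtenBy", "Bob")],
    [("?x", "type", "Song"), ("?x", "writtenBy", "Mary")],
    [("Spielberg", "child", "?y")],
]

relevant = constant_relevant(statements, query)
```

Here only the first statement survives the filter: the others mention Bob, Song, Mary, or child, none of which occur in the query. This linear scan still touches every statement, which is why an index for subset querying is needed at scale.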
24. Index structure for subset querying:
Standard hashing, inverted index, trie
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Hash indexes map keys to values: Set-equality queries
How do we perform subset querying using hash indexes?
25. Index structure for subset querying:
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Hash indexes map keys to values: Set-equality queries
How do we perform subset querying using hash indexes?
By enumerating all possible non-empty subsets of query constants!
Standard hashing, inverted index, trie
26. Index structure for subset querying:
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Standard hashing, inverted index, trie
27. Index structure for subset querying:
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
M(a, b) = C1
M(a, b, c) = C2, C3
M(d) = C4
Standard hashing, inverted index, trie
28. Index structure for subset querying:
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
M(a, b) = C1
M(a, b, c) = C2, C3
M(d) = C4
Relevant completeness statements:
M(a) U M(b) U M(a, b) = C1
Standard hashing, inverted index, trie
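A minimal sketch of subset querying with a standard hash index, using the C1..C4 example. The map M is modeled as a Python dict keyed by frozensets of constants; the function names build_hash_index and subset_query_hash are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_hash_index(stmt_constants):
    """Map each distinct constant set to the statements having exactly that set."""
    index = defaultdict(set)
    for name, consts in stmt_constants.items():
        index[frozenset(consts)].add(name)
    return index


def subset_query_hash(index, query_constants):
    """Look up every non-empty subset of the query constants in the hash index."""
    ordered = sorted(set(query_constants))
    result = set()
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            # .get avoids inserting empty entries into the defaultdict
            result |= index.get(frozenset(subset), set())
    return result


stmt_constants = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c"},
    "C3": {"a", "b", "c"},
    "C4": {"d"},
}
M = build_hash_index(stmt_constants)
relevant = subset_query_hash(M, {"a", "b"})
```

For const(Q) = a, b the lookups are M(a), M(b), and M(a, b), and only C1 is returned. The number of lookups is 2^n - 1 for n query constants, so this approach is attractive mainly for short queries.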
29. Index structure for subset querying:
Standard hashing, inverted index, trie
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
30. Index structure for subset querying:
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Standard hashing, inverted index, trie
31. Index structure for subset querying:
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Bag(Q) = Inv(a) U Inv(b)
= C1, C2, C3, C1, C2, C3
Constant-relevant statements:
Those statements that appear in Bag(Q) as many times
as they have constants
Standard hashing, inverted index, trie
32. Index structure for subset querying:
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Bag(Q) = Inv(a) U Inv(b)
= C1, C2, C3, C1, C2, C3
Constant-relevant statements:
Those statements that appear in Bag(Q) as many times
as they have constants
Standard hashing, inverted index, trie
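The inverted-index variant can be sketched as follows, again on the C1..C4 example. Inv maps each constant to the statements containing it, and Bag(Q) is modeled with collections.Counter; the function names build_inverted_index and subset_query_inverted are hypothetical.

```python
from collections import Counter, defaultdict


def build_inverted_index(stmt_constants):
    """Map each constant to the list of statements that contain it."""
    inv = defaultdict(list)
    for name, consts in stmt_constants.items():
        for c in set(consts):
            inv[c].append(name)
    return inv


def subset_query_inverted(inv, stmt_constants, query_constants):
    """Count occurrences in Bag(Q); keep statements hit once per constant they have."""
    bag = Counter()
    for c in set(query_constants):
        bag.update(inv.get(c, []))
    return {
        name
        for name, count in bag.items()
        if count == len(set(stmt_constants[name]))
    }


stmt_constants = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c"},
    "C3": {"a", "b", "c"},
    "C4": {"d"},
}
Inv = build_inverted_index(stmt_constants)
relevant = subset_query_inverted(Inv, stmt_constants, {"a", "b"})
```

With const(Q) = a, b, each of C1, C2, C3 appears twice in the bag. C1 has two constants, so it is constant-relevant; C2 and C3 have three constants each, so they are not. The cost grows with the lengths of the posting lists rather than with the number of subsets of the query constants.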
33. Index structure for subset querying:
Standard hashing, inverted index, trie
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
34. Index structure for subset querying:
Standard hashing, inverted index, trie
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
35. Index structure for subset querying:
Standard hashing, inverted index, trie
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Traverse the above trie using
query constant sequence (a, b)
36. Index structure for subset querying:
const(C1) = a, b
const(C2) = a, b, c
const(C3) = a, b, c
const(C4) = d
const(Q) = a, b
Subset querying
Retrieve completeness statements
whose constants are subsets of
the query's constants
Traverse the above trie using
query constant sequence (a, b)
visited node = constant-relevant
Standard hashing, inverted index, trie
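A trie-based sketch of the same lookup, assuming each statement's constants are inserted in sorted order and the query constants are traversed in the same order. The class TrieNode and the functions build_trie and subset_query_trie are hypothetical names for illustration, not the authors' implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # constant -> child TrieNode
        self.statements = []  # statements whose sorted constants end at this node


def build_trie(stmt_constants):
    """Insert each statement along the path given by its sorted constants."""
    root = TrieNode()
    for name, consts in stmt_constants.items():
        node = root
        for c in sorted(set(consts)):
            node = node.children.setdefault(c, TrieNode())
        node.statements.append(name)
    return root


def subset_query_trie(root, query_constants):
    """Follow only edges labeled with query constants; every visited node is relevant."""
    ordered = sorted(set(query_constants))
    result = set()

    def visit(node, start):
        result.update(node.statements)
        # Try each remaining query constant as the next edge, preserving order.
        for i in range(start, len(ordered)):
            child = node.children.get(ordered[i])
            if child is not None:
                visit(child, i + 1)

    visit(root, 0)
    return result


stmt_constants = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c"},
    "C3": {"a", "b", "c"},
    "C4": {"d"},
}
trie = build_trie(stmt_constants)
relevant = subset_query_trie(trie, {"a", "b"})
```

For the query sequence (a, b), the traversal visits the nodes for a and a-b and collects C1. It never descends to the c node holding C2 and C3, nor to the d node holding C4, because those edges are not labeled with query constants.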
37. Experimental evaluation:
Setup
Synthetic dataset with realistic parameters
from DBpedia, a large real-world knowledge graph
Parameters:
- Number of completeness statements, default value: 1 million
- Max length of completeness statements, default value: 6
- Query length, two default values, short = 3 and long = 6