research-article

Materializing knowledge bases via trigger graphs

Authors:

Efthymia Tsamoura,

Enrico Malizia,

Jacopo Urbani

Proceedings of the VLDB Endowment, Volume 14, Issue 6

Pages 943 - 956

https://doi.org/10.14778/3447689.3447699

Published: 01 February 2021

Abstract

The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs) for tasks such as query answering under dependencies or data cleaning. A general problem of chase algorithms is that they may perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the execution of the rules so that redundant computations are avoided. We present the results of an extensive theoretical and empirical study that seeks to answer when and how TGs can be computed and what benefits TGs bring when applied to real-world KBs. Our results include algorithms that compute (minimal) TGs. We implemented our approach in a new engine, called GLog, and our experiments show that it can be significantly more efficient than the chase, enabling us to materialize Knowledge Graphs with 17B facts in less than 40 min using a single machine with commodity hardware.
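To make the notion of redundant computation concrete, the following is a minimal sketch of naive forward-chaining materialization over plain Datalog rules (no existential variables). It is not the TG-based algorithm of the paper and not GLog's implementation; the fact and rule encoding, the function names (`match`, `apply_rule`, `materialize`), and the `edge`/`path` example are all assumptions made for illustration. It only counts how many rule applications re-derive a fact that is already known, which is the kind of wasted work the abstract refers to.

```python
# Illustrative encoding (an assumption, not from the paper):
#   fact = (predicate, arg1, arg2, ...)
#   rule = (list_of_body_atoms, head_atom); variables start with "?".

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def apply_rule(rule, facts):
    """Yield one head fact per way of matching the rule body against `facts`."""
    body, head = rule
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    for b in bindings:
        yield (head[0],) + tuple(b.get(t, t) for t in head[1:])

def materialize(facts, rules):
    """Naive fixpoint: re-apply every rule to all facts until nothing is new.

    Returns (all_facts, total_derivations, redundant_derivations).
    """
    facts = set(facts)
    total = redundant = 0
    while True:
        new = set()
        for rule in rules:
            for derived in apply_rule(rule, facts):
                total += 1
                if derived in facts or derived in new:
                    redundant += 1  # this fact was already known
                else:
                    new.add(derived)
        if not new:
            return facts, total, redundant
        facts |= new

# Hypothetical example: transitive closure over a three-edge chain.
edges = {("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")}
rules = [
    ([("edge", "?x", "?y")], ("path", "?x", "?y")),
    ([("path", "?x", "?y"), ("edge", "?y", "?z")], ("path", "?x", "?z")),
]
all_facts, total, redundant = materialize(edges, rules)
print(len(all_facts), total, redundant)
```

On this toy input only six `path` facts are new, yet the naive loop performs many more rule applications because every round re-derives what earlier rounds already produced. Techniques such as semi-naive evaluation, and the trigger graphs proposed in the paper, aim to cut this kind of repeated work.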

Materializing knowledge bases via trigger graphs
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 6

February 2021

261 pages

ISSN: 2150-8097

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher: VLDB Endowment
