Materializing knowledge bases via trigger graphs

Published: 01 February 2021

Abstract

The chase is a well-established family of algorithms used to materialize Knowledge Bases (KBs) for tasks such as query answering under dependencies or data cleaning. A general problem of chase algorithms is that they may perform redundant computations. To counter this problem, we introduce the notion of Trigger Graphs (TGs), which guide the execution of the rules so that redundant computations are avoided. We present the results of an extensive theoretical and empirical study that seeks to answer when and how TGs can be computed and what benefits TGs bring when applied to real-world KBs. Our results include algorithms that compute (minimal) TGs. We implemented our approach in a new engine, called GLog, and our experiments show that it can be significantly more efficient than the chase, enabling us to materialize Knowledge Graphs with 17B facts in less than 40 min on a single machine with commodity hardware.
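To make the redundancy problem concrete, the following is a minimal sketch of naive forward-chaining materialization for plain Datalog rules (no existential variables). It is not the GLog engine and does not build a trigger graph; the tuple-based fact encoding, the `?`-prefixed variables, and all function names are assumptions chosen for illustration. It only counts how many rule applications are performed compared with how many facts are actually new.

```python
def is_var(term):
    # Assumption of this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")

def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def apply_rule(body, head, facts):
    """Yield every head instance derivable from facts by one rule."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    for b in bindings:
        yield (head[0],) + tuple(b.get(t, t) for t in head[1:])

def naive_materialize(facts, rules):
    """Apply all rules to all facts until no new fact appears.

    Returns the materialized fact set and the total number of
    derivations performed, including re-derivations of known facts.
    """
    facts = set(facts)
    derivations = 0
    while True:
        derived = []
        for body, head in rules:
            derived.extend(apply_rule(body, head, facts))
        derivations += len(derived)
        new = set(derived) - facts
        if not new:
            return facts, derivations
        facts |= new

# Hypothetical toy KB: a chain of three edges and the usual path rules.
edges = [("edge", "a", "b"), ("edge", "b", "c"), ("edge", "c", "d")]
rules = [
    ([("edge", "?x", "?y")], ("path", "?x", "?y")),
    ([("path", "?x", "?y"), ("edge", "?y", "?z")], ("path", "?x", "?z")),
]
facts, derivations = naive_materialize(edges, rules)
inferred = len(facts) - len(edges)
print(inferred, derivations)
```

On this toy input the loop performs 20 derivations to infer only 6 new `path` facts, because every round re-applies every rule to facts it has already processed. Avoiding this kind of repeated work is the motivation the abstract gives for guiding rule execution.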

Published In

Proceedings of the VLDB Endowment  Volume 14, Issue 6
February 2021
261 pages
ISSN:2150-8097
Publisher

VLDB Endowment
