research-article

Representing Paths in Graph Database Pattern Matching

Published: 01 March 2023

Abstract

Modern graph database query languages such as GQL, SQL/PGQ, and their academic predecessor G-Core promote paths to first-class citizens in the sense that their pattern matching facility can return paths, as opposed to only nodes and edges. This is challenging for database engines, since graphs can have a large number of paths between a given node pair, which can cause huge intermediate results in query evaluation.
We introduce the concept of path multiset representations (PMRs), which can represent multisets of paths exponentially succinctly and therefore bring significant advantages for representing intermediate results. We give a detailed theoretical analysis that shows that they are especially well-suited for representing results of regular path queries and extensions thereof involving counting, random sampling, and unions. Our experiments show that they drastically improve scalability for regular path query evaluation, with speedups of several orders of magnitude.
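To make the succinctness claim concrete, here is a toy sketch. It is not the paper's formal PMR definition, only an illustration of the underlying idea under simplifying assumptions: a small graph-shaped structure can stand for exponentially many paths, and those paths can be counted without being enumerated. The helper names (`build_diamond_chain`, `count_paths`, `enumerate_paths`) and the "chain of diamonds" example graph are hypothetical and chosen for illustration.

```python
def build_diamond_chain(n):
    """Adjacency lists for a chain of n diamonds (illustrative graph only):
    v0 -> {u0, l0} -> v1 -> {u1, l1} -> ... -> vn
    It has 3n + 1 nodes and 4n edges, but 2**n distinct paths from v0 to vn.
    """
    adj = {}
    for i in range(n):
        adj[f"v{i}"] = [f"u{i}", f"l{i}"]
        adj[f"u{i}"] = [f"v{i + 1}"]
        adj[f"l{i}"] = [f"v{i + 1}"]
    adj[f"v{n}"] = []
    return adj


def count_paths(adj, source, target):
    """Count source-to-target paths in a DAG with memoised recursion.
    Work is linear in the size of the structure, not in the number of paths.
    """
    memo = {}

    def go(node):
        if node == target:
            return 1
        if node not in memo:
            memo[node] = sum(go(nxt) for nxt in adj[node])
        return memo[node]

    return go(source)


def enumerate_paths(adj, source, target):
    """Lazily yield every source-to-target path as a list of nodes.
    This is the expensive operation the compact structure lets us postpone.
    """
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for nxt in adj[node]:
            stack.append((nxt, path + [nxt]))


if __name__ == "__main__":
    small = build_diamond_chain(3)
    print(count_paths(small, "v0", "v3"))            # 8 paths
    print(len(list(enumerate_paths(small, "v0", "v3"))))  # also 8

    big = build_diamond_chain(40)
    print(len(big), count_paths(big, "v0", "v40"))   # 121 nodes, 2**40 paths
```

With 40 diamonds the structure has 121 nodes, yet it stands for 2**40 paths. Materialising them as an intermediate result would be infeasible, while counting over the compact structure stays cheap.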

Cited By

  • (2024) Distinct Shortest Walk Enumeration for RPQs. Proceedings of the ACM on Management of Data 2:2 (1-22). DOI: 10.1145/3651601. Online publication date: 14-May-2024
  • (2024) Querying Graph Databases at Scale. Companion of the 2024 International Conference on Management of Data (585-589). DOI: 10.1145/3626246.3654695. Online publication date: 9-Jun-2024

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 7
March 2023
203 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 March 2023
Published in PVLDB Volume 16, Issue 7


Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 69
  • Downloads (Last 6 weeks): 13
Reflects downloads up to 03 Oct 2024
