research-article

Connectivity-Oriented Property Graph Partitioning for Distributed Graph Pattern Query Processing

Published: 20 December 2024

Abstract

Graph pattern queries are a powerful tool for extracting crucial information from property graphs. As property graphs grow exponentially in size, they are typically divided into multiple subgraphs (referred to as partitions) and stored across various sites in a distributed environment. Existing graph partitioning methods are not optimized for pattern queries, so many query matches span multiple partitions; these are called crossing matches. Identifying crossing matches requires heavy inter-partition communication, which is the primary performance bottleneck in distributed query processing. To address this issue, this paper introduces a novel connectivity-oriented, relationship-disjoint partitioning method named RCP (Relationship Connectivity Partitioning), which aims to make graph pattern query processing more efficient by reducing crossing matches. RCP uses each weakly connected component of the subgraph induced by different relationship labels as a basic unit of partitioning, which ensures that matches for both variable-length path queries and labeled graph pattern queries are not crossing matches. Variable-length paths and labeled graph patterns are two common components of graph pattern queries: the former identify paths that meet specific label constraints, and the latter retrieve subgraphs with consistent relationship types. Moreover, for the query processing phase, we demonstrate that every graph pattern query that belongs to these two basic query types or their extensions can be executed independently under RCP, thereby avoiding crossing matches. In experiments, we implemented two prototype distributed property graph systems based on Neo4j and JanusGraph, which use declarative and functional query languages, respectively. Experimental results on billion-scale datasets show that our approach improves performance by nearly two orders of magnitude over state-of-the-art partitioning methods.
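To make the partitioning unit concrete, the following is a minimal sketch and not the RCP algorithm from the paper. It assumes the simplest reading of the abstract, in which each subgraph is induced by a single relationship label, and it computes the weakly connected components of each such subgraph with a union-find structure. The function names `label_components` and `assign_units`, the toy edge list, and the greedy balancing step are all illustrative assumptions; the abstract does not say how units are assigned to sites.

```python
from collections import defaultdict


def label_components(edges):
    """Split edges into weakly connected components per relationship label.

    edges: iterable of (source, label, target) triples.
    Returns a list of (label, edge_list) units.
    """
    edges = list(edges)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Key each endpoint by (label, vertex) so different labels never merge.
    # Edge direction is ignored, which is what "weakly connected" means.
    for src, label, dst in edges:
        a, b = find((label, src)), find((label, dst))
        if a != b:
            parent[a] = b

    units = defaultdict(list)
    for src, label, dst in edges:
        units[find((label, src))].append((src, label, dst))
    return [(root[0], unit) for root, unit in units.items()]


def assign_units(units, k):
    """Hypothetical balancing step: largest unit first, to the lightest partition."""
    partitions = [[] for _ in range(k)]
    for _, unit in sorted(units, key=lambda u: len(u[1]), reverse=True):
        min(partitions, key=len).extend(unit)
    return partitions


# Toy property graph: two KNOWS components and one LIKES component.
edges = [
    ("a", "KNOWS", "b"),
    ("b", "KNOWS", "c"),
    ("d", "KNOWS", "e"),
    ("a", "LIKES", "p"),
    ("c", "LIKES", "p"),
]
units = label_components(edges)
partitions = assign_units(units, k=2)
```

Because every unit is placed whole, each relationship lands in exactly one partition, and a path that follows a single label (for example, a variable-length KNOWS path starting at `a`) never leaves the partition that holds its component. Vertices such as `a` may appear in more than one partition in this sketch, since only relationships are kept disjoint.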

Information & Contributors

Information

Published In

Proceedings of the ACM on Management of Data  Volume 2, Issue 6
SIGMOD
December 2024
792 pages
EISSN:2836-6573
DOI:10.1145/3709598
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in PACMMOD Volume 2, Issue 6

Author Tags

  1. distributed query processing
  2. graph pattern query
  3. property graph partitioning

Qualifiers

  • Research-article

Funding Sources

  • the Key R&D Program of Hunan Province
  • the Creative Research Groups Program of the National Natural Science Foundation of China
  • the Natural Science Foundation of Hunan Province
  • the National Natural Science Foundation of China
