DOI: 10.1145/3626246.3653383
research-article

GraphScope Flex: LEGO-like Graph Computing Stack

Published: 09 June 2024

Abstract

    Graph computing has become increasingly crucial in processing large-scale graph data, with numerous systems developed for this purpose. Two years ago, we introduced GraphScope as a system addressing a wide array of graph computing needs, including graph traversal, analytics, and learning in one system. Since its inception, GraphScope has achieved significant technological advancements and gained widespread adoption across various industries. However, one key lesson from this journey has been recognizing the limitations of a "one-size-fits-all" approach, especially when dealing with the diversity of programming interfaces, applications, and data storage formats in graph computing. In response to these challenges, we present GraphScope Flex, the next iteration of GraphScope. GraphScope Flex is designed to be both resource-efficient and cost-effective, while also providing flexibility and user-friendliness through its LEGO-like modularity. This paper explores the architectural innovations and fundamental design principles of GraphScope Flex, all of which are direct outcomes of the lessons learned during our ongoing development process. We validate the adaptability and efficiency of GraphScope Flex with extensive evaluations on synthetic and real-world datasets. The results show that GraphScope Flex achieves 2.4× throughput and up to 55.7× speedup over other systems on the LDBC Social Network and Graphalytics benchmarks, respectively. Furthermore, GraphScope Flex achieves up to a 2,400× performance gain in real-world applications, demonstrating its effectiveness across a wide range of graph computing scenarios.

      Published In

      SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data
      June 2024
      694 pages
      ISBN:9798400704222
      DOI:10.1145/3626246
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. distributed system
      2. graph analytics
      3. graph computing
      4. graph query

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '24

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
