DOI: 10.1145/3661304.3661897
research-article
Open access

Understanding High-Performance Subgraph Pattern Matching: A Systems Perspective

Published: 09 June 2024

    Abstract

    Subgraph isomorphism is a crucial problem in graph analytics with wide-ranging applications. This paper examines and compares two high-performance solutions to this problem: backtracking, represented by VF3, and compilation, represented by Dryadic. Although both strategies are based on vertex-extension mapping, Dryadic significantly outperforms VF3 in every test, with speed-ups ranging from 4.95x to 165x. To understand this gap, the paper identifies and explores five key optimizations in Dryadic: candidate vertex generation, execution specificity, data graph storage, matching order, and redundancy elimination. With these optimizations removed, Dryadic's performance degrades substantially, but it is still 10x faster than VF3 on average, owing to better spatial locality and search-space pruning. Building on the insights gained from these optimizations, we propose and implement two new techniques, lazy evaluation in Dryadic and connectivity checks in VF3, which yield performance improvements of up to 1.23x and 1.46x, respectively.
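
    The vertex-extension mapping that both strategies share can be illustrated with a minimal Python sketch. This is not the VF3 or Dryadic implementation: the function name `count_embeddings`, the dict-of-sets graph representation, and the fixed, caller-supplied matching order are all assumptions made for illustration. The sketch extends a partial mapping one pattern vertex at a time and generates candidates by intersecting the neighbor sets of the already-mapped pattern neighbors.

```python
def count_embeddings(pattern, data, order):
    """Count injective mappings of pattern vertices onto data vertices that
    preserve every pattern edge. Automorphic copies are counted separately.

    pattern, data: dict mapping each vertex to the set of its neighbors
                   (undirected graphs; hypothetical representation).
    order:         list of pattern vertices giving the matching order.
    """
    position = {u: i for i, u in enumerate(order)}
    # For each pattern vertex, the neighbors that come earlier in the order.
    back_neighbors = [
        [w for w in pattern[u] if position[w] < i] for i, u in enumerate(order)
    ]
    mapping = {}  # pattern vertex -> data vertex for the current partial match

    def extend(depth):
        if depth == len(order):
            return 1  # every pattern vertex is mapped: one full match
        u = order[depth]
        anchors = back_neighbors[depth]
        if anchors:
            # Candidates must be adjacent to the images of all mapped neighbors.
            candidates = set.intersection(*(data[mapping[w]] for w in anchors))
        else:
            # No mapped neighbor yet: any data vertex is a candidate.
            candidates = set(data)
        used = set(mapping.values())
        total = 0
        for v in candidates - used:  # keep the mapping injective
            mapping[u] = v
            total += extend(depth + 1)
            del mapping[u]  # backtrack
        return total

    return extend(0)


# Usage: a triangle pattern matched against the complete graph on 4 vertices.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
k4 = {v: {w for w in range(4) if w != v} for v in range(4)}
print(count_embeddings(triangle, k4, ["a", "b", "c"]))  # 24
```

    The count of 24 includes all 6 automorphic copies of each of the 4 triangles in the data graph; removing such duplicates is what the abstract's redundancy elimination refers to, and the sketch does not attempt it.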


        Published In

        GRADES-NDA '24: Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
        June 2024
        62 pages
        ISBN: 9798400706530
        DOI: 10.1145/3661304
        Editors: Olaf Hartig, Zoi Kaoudi
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. Dryadic
        2. Subgraph Isomorphism
        3. Subgraph Pattern Matching
        4. VF3

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SIGMOD/PODS '24
        Acceptance Rates

        Overall Acceptance Rate 29 of 61 submissions, 48%

        Article Metrics

        • Total Citations: 0
        • Total Downloads: 56
        • Downloads (Last 12 months): 56
        • Downloads (Last 6 weeks): 52
        Reflects downloads up to 26 Jul 2024
