
HERO: A Hierarchical Set Partitioning and Join Framework for Speeding up the Set Intersection Over Graphs

Published: 26 March 2024

Abstract

As one of the most primitive operators in graph algorithms such as triangle counting, maximal clique enumeration, and subgraph listing, the set intersection operator returns the common vertices of two given vertex sets in a data graph. Accelerating set intersection is therefore important, since it benefits the many tasks that use it as a building block. Existing work on set intersection usually follows the merge intersection or galloping-search framework, and most optimization research has focused on leveraging SIMD hardware instructions. In this paper, we propose a novel multi-level set intersection framework, hierarchical set partitioning and join (HERO), built on our set intersection bitmap tree (SIB-tree) index; it is independent of SIMD instructions and completely orthogonal to the merge intersection framework. We recursively decompose the set intersection task into small subtasks and solve each subtask using bitmaps and Boolean AND operations. To fully realize the acceleration offered by the proposed intersection approach, we formulate a graph reordering problem, prove its NP-hardness, and develop a heuristic algorithm to tackle it. Extensive experiments on real-world graphs confirm the efficiency and effectiveness of HERO. The speedup over classic merge intersection reaches up to 188x for triangle counting and 176x for maximal clique enumeration.
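
To make the contrast in the abstract concrete, the following minimal sketch compares a classic merge intersection with a single-level "partition, join, then AND" intersection. It is an illustration only and not the SIB-tree from the paper: the block width `BLOCK_BITS` and the helper names `to_blocks` and `bitmap_intersect` are assumptions introduced here, and the multi-level index and the graph reordering step are not reproduced.

```python
from collections import defaultdict

BLOCK_BITS = 64  # assumed block width; not taken from the paper


def merge_intersect(a, b):
    """Classic merge intersection of two sorted lists of vertex IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def to_blocks(vertices):
    """Partition a vertex set into fixed-width blocks, one bitmap per non-empty block."""
    blocks = defaultdict(int)
    for v in vertices:
        blocks[v // BLOCK_BITS] |= 1 << (v % BLOCK_BITS)
    return dict(blocks)


def bitmap_intersect(blocks_a, blocks_b):
    """Join blocks that share a key, then AND their bitmaps (hypothetical helper)."""
    if len(blocks_a) > len(blocks_b):
        blocks_a, blocks_b = blocks_b, blocks_a  # iterate over the smaller side
    out = []
    for key in sorted(blocks_a):
        other = blocks_b.get(key)
        if other is None:
            continue  # no matching block, so this subtask is empty
        word = blocks_a[key] & other
        base = key * BLOCK_BITS
        while word:
            low = word & -word  # isolate the lowest set bit
            out.append(base + low.bit_length() - 1)
            word ^= low
    return out


a = [1, 3, 64, 70, 200, 513]
b = [3, 64, 71, 200, 512, 513]
print(merge_intersect(a, b))
print(bitmap_intersect(to_blocks(a), to_blocks(b)))
```

Both calls return the same common vertices. In a scheme like this, each AND handles up to `BLOCK_BITS` candidate vertices at once, so the benefit presumably depends on neighbor IDs clustering into few blocks, which is one plausible reading of why the abstract pairs the framework with a graph reordering problem.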

    Published In

    Proceedings of the ACM on Management of Data  Volume 2, Issue 1
    SIGMOD
    February 2024
    1874 pages
    EISSN:2836-6573
    DOI:10.1145/3654807
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. hierarchical set partitioning and join
    2. set intersection over graphs

    Qualifiers

    • Research-article
