DOI: 10.1145/3580305.3599419
research-article

Locality Sensitive Hashing for Optimizing Subgraph Query Processing in Parallel Computing Systems

Published: 04 August 2023

    Abstract

    This paper explores parallel computing systems for efficient subgraph query processing in large graphs. We investigate how to exploit the inherent parallelism of these systems for both intraquery and interquery optimization. Rather than relying on widely used hash-based methods, we utilize and extend locality sensitive hashing methods. For intraquery optimization, we use the structures of both the data graph and the subgraph query to design a query-constraint locality sensitive hashing method named QCMH, which can merge multiple tasks while processing a single subgraph query. For interquery optimization, we propose a query locality sensitive hashing method named QMH, which can detect common subgraphs among different subgraph queries and thereby merge multiple subgraph queries. The proposed methods reduce redundant computation among the tasks of a single subgraph query and across multiple queries. Extensive experimental studies on large real and synthetic graphs show that our proposed methods can improve query performance compared to state-of-the-art methods by 10% to 50%.
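
    The abstract does not spell out how QCMH or QMH are built, so the following is only a generic illustration of the underlying idea: plain MinHash with banding, applied to subgraph queries to find candidates that share many edges. It is not the authors' method. Representing a query as a set of (source label, edge label, target label) triples, and the names `minhash_signature`, `estimated_jaccard`, and `candidate_pairs`, are assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: a query is a set of labeled edges (source label, edge label, target label).
Edge = Tuple[str, str, str]


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(edges: Iterable[Edge], num_hashes: int = 64) -> List[int]:
    """Standard MinHash: keep the minimum hash value of the edge set per hash function."""
    tokens = {"|".join(edge) for edge in edges}
    if not tokens:
        raise ValueError("a query needs at least one edge")
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def candidate_pairs(
    signatures: Dict[str, List[int]], bands: int, rows: int
) -> Set[Tuple[str, str]]:
    """Banding: queries whose signatures agree on every row of some band become candidates."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = defaultdict(set)
    for query_id, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(query_id)
    pairs: Set[Tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical queries: q1 and q2 have the same edges, q3 shares none with them.
queries = {
    "q1": [("Person", "knows", "Person"), ("Person", "worksAt", "Company")],
    "q2": [("Person", "worksAt", "Company"), ("Person", "knows", "Person")],
    "q3": [("Paper", "cites", "Paper"), ("Paper", "publishedIn", "Venue")],
}
signatures = {qid: minhash_signature(edges) for qid, edges in queries.items()}
pairs = candidate_pairs(signatures, bands=16, rows=4)
```

    Candidate pairs found this way would still need an exact check for a common subgraph before any queries are merged, since MinHash only estimates how much two edge sets overlap.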

    Supplementary Material

    MP4 File (rtfp0123-2min-promo.mp4)
    The 2-minute-long promotional video for KDD 2023.


        Published In

        KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
        August 2023
        5996 pages
        ISBN:9798400701030
        DOI:10.1145/3580305

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. locality sensitive hashing
        2. parallel computing
        3. subgraph query processing

        Qualifiers

        • Research-article

        Funding Sources

        • NSFC
        • Science and Technology Major Projects of Changsha City
        • National Key R&D Projects
        • Hunan Provincial Natural Science Foundation of China
        • Technology Projects of Hunan Province

        Conference

        KDD '23

        Acceptance Rates

        Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
