DOI: 10.1145/3318464.3380566
research-article

Approximate Pattern Matching in Massive Graphs with Precision and Recall Guarantees

Published: 31 May 2020

    Abstract

    Supporting approximation in graph pattern matching is highly desirable in several situations: (i) the data acquisition process can be noisy; (ii) a user may have only an imprecise idea of the search query; and (iii) approximation can be used for high-volume vertex labeling when extracting machine learning features from graph data. We present a new algorithmic pipeline for approximate matching that combines edit-distance-based matching with systematic graph pruning. We formalize the problem as identifying all exact matches for every subgraph of a user-supplied template that lies within edit distance k of it. We design a solution that exploits optimization opportunities within this design space that have not been explored previously. Our solution (i) is highly scalable, (ii) supports arbitrary patterns and edit distances, (iii) offers 100% precision and 100% recall guarantees, and (iv) supports a set of popular data analysis scenarios. We demonstrate its advantages through an implementation that offers good strong scaling on a massive real-world labeled graph (257 billion edges) and good weak scaling on synthetic labeled graphs (up to 1.1 trillion edges), and that operates on a massive cluster (256 nodes/9,216 cores), with graphs orders of magnitude larger than those previously used for similar problems. Empirical comparison with the state of the art highlights the advantages of our solution when handling massive graphs and complex patterns.
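The problem statement in the abstract can be illustrated on a toy input. The sketch below is not the authors' pipeline: it assumes the only edit operation is edge deletion, that edited templates must stay connected, and it replaces the paper's distributed pruning with brute-force enumeration. All function names (`prototypes`, `exact_matches`, `approximate_matches`) are hypothetical.

```python
from itertools import combinations, permutations


def is_connected(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u] - seen)
    return seen == vertices


def prototypes(template_vertices, template_edges, k):
    """Yield (distance, edge set) for each connected variant of the template
    obtained by deleting at most k edges (assumed edit operation)."""
    edges = [frozenset(e) for e in template_edges]
    for d in range(k + 1):
        for removed in combinations(edges, d):
            kept = [tuple(e) for e in edges if e not in removed]
            if is_connected(template_vertices, kept):
                yield d, frozenset(frozenset(e) for e in kept)


def exact_matches(proto_edges, template_labels, graph_labels, graph_edges):
    """Brute force: every label-preserving injective mapping under which
    each prototype edge is also an edge of the data graph."""
    g_edges = {frozenset(e) for e in graph_edges}
    t_vertices = sorted(template_labels)
    found = []
    for image in permutations(sorted(graph_labels), len(t_vertices)):
        m = dict(zip(t_vertices, image))
        if any(template_labels[t] != graph_labels[m[t]] for t in t_vertices):
            continue
        if all(frozenset((m[u], m[v])) in g_edges
               for u, v in map(tuple, proto_edges)):
            found.append(m)
    return found


def approximate_matches(template_labels, template_edges,
                        graph_labels, graph_edges, k):
    """Map each matched vertex set to the smallest edit distance at which
    some prototype of the template matches it exactly."""
    best = {}
    for d, proto in prototypes(template_labels, template_edges, k):
        for m in exact_matches(proto, template_labels, graph_labels, graph_edges):
            key = frozenset(m.values())
            best[key] = min(d, best.get(key, d))
    return best


# Template: a labeled triangle. Data graph: one triangle and one path.
template_labels = {"a": "A", "b": "B", "c": "C"}
template_edges = [("a", "b"), ("b", "c"), ("a", "c")]
graph_labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B", 6: "C"}
graph_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6)]

result = approximate_matches(template_labels, template_edges,
                             graph_labels, graph_edges, k=1)
```

With k = 1, vertices 1, 2, 3 match the full triangle (distance 0), while vertices 4, 5, 6 match only the prototype with the a-c edge removed (distance 1). Exhaustive enumeration gives full precision and recall trivially on an input this small; achieving the same guarantee at the scale described in the abstract is what the paper's pruning-based pipeline addresses.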

    Supplementary Material

    MP4 File (3318464.3380566.mp4)
    Presentation Video


    Cited By

    • (2023) Distributed approximate minimal Steiner trees with millions of seed vertices on billion-edge graphs. Journal of Parallel and Distributed Computing 181 (104717). DOI: 10.1016/j.jpdc.2023.104717. Online publication date: Nov-2023
    • (2023) Temporal graph patterns by timed automata. The VLDB Journal. DOI: 10.1007/s00778-023-00795-z. Online publication date: 5-May-2023
    • (2022) Towards Distributed 2-Approximation Steiner Minimal Trees in Billion-edge Graphs. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (549-559). DOI: 10.1109/IPDPS53621.2022.00060. Online publication date: May-2022


    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN:9781450367356
    DOI:10.1145/3318464
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. distributed computing
    2. graph processing
    3. pattern matching

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 50
    • Downloads (Last 6 weeks): 4

    Citations

    Cited By

    • (2021) A Survey on Distributed Graph Pattern Matching in Massive Graphs. ACM Computing Surveys 54:2 (1-35). DOI: 10.1145/3439724. Online publication date: 9-Feb-2021
    • (2021) Scalable Pattern Matching in Metadata Graphs via Constraint Checking. ACM Transactions on Parallel Computing 8:1 (1-45). DOI: 10.1145/3434391. Online publication date: 4-Jan-2021
    • (2021) Identifying Coherent Subgraphs In Dynamic Brain Networks. 2021 IEEE International Conference on Image Processing (ICIP) (121-125). DOI: 10.1109/ICIP42928.2021.9506581. Online publication date: 19-Sep-2021
    • (2020) Labeled Triangle Indexing for Efficiency Gains in Distributed Interactive Subgraph Search. 2020 IEEE/ACM 10th Workshop on Irregular Applications: Architectures and Algorithms (IA3) (45-53). DOI: 10.1109/IA351965.2020.00012. Online publication date: Nov-2020
