research-article

GraMi: frequent subgraph and pattern mining in a single large graph

Published: 01 March 2014

Abstract

Mining frequent subgraphs is an important operation on graphs; it is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi undertakes a novel approach that only finds the minimal set of instances to satisfy the frequency threshold and avoids the costly enumeration of all instances required by previous approaches. We accompany our approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful version of matching that captures transitive interactions between graph nodes (like friend of a friend) which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version producing results with no false positives. Our experiments on real data demonstrate that our framework is up to 2 orders of magnitude faster and discovers more interesting patterns than existing approaches.
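To make the idea of stopping as soon as the frequency threshold is met more concrete, here is a minimal Python sketch. It is not GraMi's algorithm: it uses a brute-force backtracking search, and it assumes the minimum-image-based (MNI) support measure, which the abstract does not name. All function names, the toy graph, and the threshold parameter `tau` are hypothetical and chosen only for illustration. `mni_support` enumerates every instance of the pattern, while `is_frequent` looks for just enough valid assignments per pattern node to reach `tau`.

```python
def embeddings(g_labels, g_adj, p_labels, p_adj, fixed=None):
    """Yield injective, label- and edge-preserving maps pattern node -> graph node."""
    fixed = fixed or {}
    order = list(p_labels)

    def extend(mapping, i):
        if i == len(order):
            yield dict(mapping)
            return
        v = order[i]
        candidates = [fixed[v]] if v in fixed else g_labels
        for u in candidates:
            if g_labels[u] != p_labels[v] or u in mapping.values():
                continue
            # Every already-mapped pattern neighbour must also be a graph neighbour.
            if all(mapping[w] in g_adj[u] for w in p_adj[v] if w in mapping):
                mapping[v] = u
                yield from extend(mapping, i + 1)
                del mapping[v]

    yield from extend({}, 0)


def mni_support(g_labels, g_adj, p_labels, p_adj):
    """Full enumeration: the smallest number of distinct images over all pattern nodes."""
    images = {v: set() for v in p_labels}
    for m in embeddings(g_labels, g_adj, p_labels, p_adj):
        for v, u in m.items():
            images[v].add(u)
    return min(len(s) for s in images.values())


def is_frequent(g_labels, g_adj, p_labels, p_adj, tau):
    """Early-stopping check: find only tau valid images per pattern node."""
    for v in p_labels:
        found = 0
        for u in g_labels:
            # One embedding with v pinned to u is enough to count u as an image of v.
            if next(embeddings(g_labels, g_adj, p_labels, p_adj, {v: u}), None) is not None:
                found += 1
                if found >= tau:
                    break
        if found < tau:
            return False
    return True


def adjacency(nodes, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


# Toy data graph (hypothetical): two A-labelled nodes linked to three B-labelled nodes.
g_labels = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
g_adj = adjacency(g_labels, [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a2", "b3")])

# Pattern: a single A-B edge.
p_labels = {"x": "A", "y": "B"}
p_adj = adjacency(p_labels, [("x", "y")])

print(mni_support(g_labels, g_adj, p_labels, p_adj))     # 2
print(is_frequent(g_labels, g_adj, p_labels, p_adj, 2))  # True
print(is_frequent(g_labels, g_adj, p_labels, p_adj, 3))  # False
```

In the toy graph the pattern has four instances, but `is_frequent` with `tau = 2` never needs to list them all: it stops counting for each pattern node once two valid images are found.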



    Published In

    Proceedings of the VLDB Endowment  Volume 7, Issue 7
    March 2014
    108 pages
    ISSN:2150-8097

    Publisher

    VLDB Endowment

    Cited By

    • (2024) Understanding High-Performance Subgraph Pattern Matching: A Systems Perspective. Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), pages 1-12. DOI: 10.1145/3661304.3661897. Online publication date: 14-Jun-2024.
    • (2024) Discovering Graph Generating Dependencies for Property Graph Profiling. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 2067-2076. DOI: 10.1145/3627673.3679764. Online publication date: 21-Oct-2024.
    • (2024) Mining Path Association Rules in Large Property Graphs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 1994-2003. DOI: 10.1145/3627673.3679525. Online publication date: 21-Oct-2024.
    • (2024) Machine-Assisted Error Discovery in Conversational AI Systems. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, pages 1-10. DOI: 10.1145/3613905.3651120. Online publication date: 11-May-2024.
    • (2024) An efficient approach for discovering Graph Entity Dependencies (GEDs). Information Systems, 125:C. DOI: 10.1016/j.is.2024.102421. Online publication date: 1-Nov-2024.
    • (2024) Attributed Heterogeneous Graph Embedding with Meta-graph Attention. Web and Big Data, pages 129-144. DOI: 10.1007/978-981-97-7238-4_9. Online publication date: 31-Aug-2024.
    • (2024) GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs. Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track, pages 133-149. DOI: 10.1007/978-3-031-70381-2_9. Online publication date: 8-Sep-2024.
    • (2023) Isomorphic Graph Embedding for Progressive Maximal Frequent Subgraph Mining. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3630635. Online publication date: 27-Oct-2023.
    • (2023) GraphINC: Graph Pattern Mining at Network Speed. Proceedings of the ACM on Management of Data, 1:2, pages 1-28. DOI: 10.1145/3589329. Online publication date: 20-Jun-2023.
    • (2023) Making It Tractable to Catch Duplicates and Conflicts in Graphs. Proceedings of the ACM on Management of Data, 1:1, pages 1-28. DOI: 10.1145/3588940. Online publication date: 30-May-2023.
