research-article

GraMi: frequent subgraph and pattern mining in a single large graph

Published: 01 March 2014

Abstract

Mining frequent subgraphs is an important operation on graphs. It is defined as finding all subgraphs that appear frequently in a database according to a given frequency threshold. Most existing work assumes a database of many small graphs, but modern applications, such as social networks, citation graphs, or protein-protein interactions in bioinformatics, are modeled as a single large graph. In this paper we present GraMi, a novel framework for frequent subgraph mining in a single large graph. GraMi takes a new approach: it finds only the minimal set of instances needed to satisfy the frequency threshold, avoiding the costly enumeration of all instances required by previous approaches. We accompany this approach with a heuristic and optimizations that significantly improve performance. Additionally, we present an extension of GraMi that mines frequent patterns. Compared to subgraphs, patterns offer a more powerful form of matching that captures transitive interactions between graph nodes (such as friend of a friend), which are very common in modern applications. Finally, we present CGraMi, a version supporting structural and semantic constraints, and AGraMi, an approximate version that produces results with no false positives. Our experiments on real data demonstrate that our framework is up to two orders of magnitude faster and discovers more interesting patterns than existing approaches.
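To make the idea of finding only the minimal set of instances concrete, the following is a minimal Python sketch that decides whether a pattern meets the frequency threshold without enumerating every instance. It assumes the minimum image-based (MNI) support measure (the smallest number, over all pattern nodes, of distinct graph nodes that some instance maps that pattern node to), undirected labeled graphs stored as adjacency dicts, and a plain backtracking matcher. It is not GraMi's actual algorithm, which formulates the check as a constraint satisfaction problem and adds further heuristics and optimizations. The helper names (`within`, `find_embedding`, `is_frequent`) and the `max_dist` parameter are hypothetical. Setting `max_dist > 1` is a simplified stand-in for the pattern extension: a pattern edge may then match any path of at most that many hops (for example, `max_dist=2` for friend of a friend).

```python
from collections import deque

# Undirected, labeled graphs (hypothetical representation):
# node -> set of neighbours, and node -> label.
Graph = dict[int, set[int]]
Labels = dict[int, str]


def within(graph: Graph, a: int, b: int, max_dist: int) -> bool:
    """True if b is reachable from a in at most max_dist hops (1 = direct edge)."""
    if max_dist == 1:
        return b in graph[a]
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:  # breadth-first search bounded by max_dist
        node, depth = frontier.popleft()
        if depth == max_dist:
            continue
        for nxt in graph[node]:
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def find_embedding(graph: Graph, labels: Labels, pattern: Graph,
                   plabels: Labels, fixed: dict[int, int], max_dist: int) -> bool:
    """Backtracking search for ONE injective, label-preserving mapping of the
    pattern into the graph that extends `fixed`; stops at the first hit."""
    order = sorted(pattern)
    assignment = dict(fixed)
    used = set(assignment.values())

    def consistent(u: int, v: int) -> bool:
        if labels[v] != plabels[u] or v in used:
            return False
        # Every already-assigned pattern neighbour must be matched by an edge
        # (or, when max_dist > 1, by a short path) in the data graph.
        return all(within(graph, v, assignment[w], max_dist)
                   for w in pattern[u] if w in assignment)

    def backtrack(i: int) -> bool:
        if i == len(order):
            return True
        u = order[i]
        if u in assignment:
            return backtrack(i + 1)
        for v in graph:
            if consistent(u, v):
                assignment[u] = v
                used.add(v)
                if backtrack(i + 1):
                    return True
                del assignment[u]
                used.discard(v)
        return False

    return backtrack(0)


def is_frequent(graph: Graph, labels: Labels, pattern: Graph, plabels: Labels,
                tau: int, max_dist: int = 1) -> bool:
    """Decide whether MNI support >= tau without enumerating all instances."""
    for u in pattern:
        candidates = [v for v in graph if labels[v] == plabels[u]]
        count = 0
        for i, v in enumerate(candidates):
            # One instance mapping u -> v is enough to count v as an image of u.
            if find_embedding(graph, labels, pattern, plabels, {u: v}, max_dist):
                count += 1
                if count >= tau:
                    break  # u already has tau images; skip remaining candidates
            if count + len(candidates) - i - 1 < tau:
                return False  # too few candidates left for u to reach tau
        if count < tau:
            return False
    return True
```

Stopping at the first instance for each (pattern node, graph node) pair, and rejecting the pattern as soon as some pattern node can no longer reach `tau` images, is what keeps the check from enumerating all instances. The matcher itself remains exponential in the worst case, since subgraph isomorphism is NP-complete.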

Cited By

  • (2025) Revealing Urban Spatial Interaction Characteristics and Crowd Travel Patterns from Trajectory Data. Annals of the American Association of Geographers, 115(3):559-577. DOI: 10.1080/24694452.2024.2440409. Online publication date: 7-Jan-2025.
  • (2024) Graph Pattern Mining: consolidating models, systems, and abstractions. Anais Estendidos do XXXIX Simpósio Brasileiro de Banco de Dados (SBBD Estendido 2024), pages 190-195. DOI: 10.5753/sbbd_estendido.2024.240515. Online publication date: 14-Oct-2024.
  • (2024) Meta-Interpretive LEarning with Reuse. Mathematics, 12(6):916. DOI: 10.3390/math12060916. Online publication date: 20-Mar-2024.

Information

Published In

Proceedings of the VLDB Endowment, Volume 7, Issue 7
March 2014
108 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 7, Issue 7

Article Metrics

  • Downloads (last 12 months): 117
  • Downloads (last 6 weeks): 5
Reflects downloads up to 03 Mar 2025

