DOI: 10.1145/1014052.1014123

SPIN: mining maximal frequent subgraphs from graph databases

Published: 22 August 2004

Abstract

One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e., subgraphs that are not part of any other frequent subgraph. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude.

Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speedup over the current state-of-the-art subgraph mining algorithms.
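To make the notion of a maximal frequent subgraph concrete, the following is a minimal sketch and not the SPIN algorithm itself. It rests on strong simplifying assumptions that are not taken from the paper: every vertex label is unique, so each graph can be written as a set of labeled edges and the subgraph test reduces to set inclusion; connectivity of patterns is ignored; and candidates are enumerated by brute force. The toy database and the names `frequent_edge_sets` and `maximal_only` are hypothetical.

```python
from itertools import combinations

def frequent_edge_sets(database, min_support):
    """Brute force: return every non-empty edge set that is contained
    in at least `min_support` graphs of the database."""
    all_edges = sorted(set().union(*database))
    frequent = []
    for size in range(1, len(all_edges) + 1):
        for candidate in combinations(all_edges, size):
            pattern = frozenset(candidate)
            # Assumption: unique vertex labels, so "is a subgraph of"
            # is plain set inclusion instead of subgraph isomorphism.
            support = sum(1 for graph in database if pattern <= graph)
            if support >= min_support:
                frequent.append(pattern)
    return frequent

def maximal_only(patterns):
    """Keep the patterns that are not a proper subset of another pattern."""
    return [p for p in patterns if not any(p < q for q in patterns)]

# Hypothetical toy database of three graphs, each given as a set of edges.
database = [
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}),
    frozenset({("a", "b"), ("b", "c"), ("x", "y")}),
]

frequent = frequent_edge_sets(database, min_support=2)
maximal = maximal_only(frequent)
print(len(frequent), len(maximal))  # 7 1
```

In this toy case seven frequent patterns collapse to a single maximal one, because every frequent pattern is contained in the three-edge path a-b-c-d. A practical miner would need canonical forms and subgraph isomorphism tests instead of set inclusion, and would avoid enumerating all candidates.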

    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN: 1581138881
    DOI: 10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. spanning tree
    2. subgraph mining

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (Last 12 months): 19
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 07 Nov 2024

    Cited By

    • (2024) Dynamic frequent subgraph mining algorithms over evolving graphs: a survey. PeerJ Computer Science 10 (e2361). DOI: 10.7717/peerj-cs.2361. Online publication date: 8-Oct-2024
    • (2024) Discovering API usage specifications for security detection using two-stage code mining. Cybersecurity 7:1. DOI: 10.1186/s42400-024-00224-w. Online publication date: 3-Oct-2024
    • (2024) PMCS: Partition-Based Maximal Frequent Subgraph Mining Using MCS. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (159-168). DOI: 10.1109/COMPSAC61105.2024.00032. Online publication date: 2-Jul-2024
    • (2024) Concepts of neighbors and their application to instance-based learning on relational data. International Journal of Approximate Reasoning 164 (109059). DOI: 10.1016/j.ijar.2023.109059. Online publication date: Jan-2024
    • (2024) Association Analysis: Basic Concepts and Algorithms. Association Analysis Techniques and Applications in Bioinformatics (9-53). DOI: 10.1007/978-981-99-8251-6_2. Online publication date: 26-Apr-2024
    • (2024) In Silico Toxicological Protocols Optimization for the Prediction of Toxicity of Drugs. Biosystems, Biomedical & Drug Delivery Systems (197-223). DOI: 10.1007/978-981-97-2596-0_10. Online publication date: 14-Jun-2024
    • (2023) Discovery of User Groups Densely Connecting Virtual and Physical Worlds in Event-Based Social Networks. International Journal of Information Technologies and Systems Approach 16:2 (1-23). DOI: 10.4018/IJITSA.327004. Online publication date: 28-Jul-2023
    • (2023) Sufficient Networks for Computing Support of Graph Patterns. Information 14:3 (143). DOI: 10.3390/info14030143. Online publication date: 21-Feb-2023
    • (2023) Isomorphic Graph Embedding for Progressive Maximal Frequent Subgraph Mining. ACM Transactions on Intelligent Systems and Technology. DOI: 10.1145/3630635. Online publication date: 27-Oct-2023
    • (2023) Extracting Top-$k$ Frequent and Diversified Patterns in Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (1-18). DOI: 10.1109/TKDE.2022.3233594. Online publication date: 2023
