SATMargin: Practical Maximal Frequent Subgraph Mining via Margin Space Sampling

Authors:

Pan Li

WWW '22: Proceedings of the ACM Web Conference 2022

Pages 1495 - 1505

https://doi.org/10.1145/3485447.3512196

Published: 25 April 2022

Abstract

Maximal Frequent Subgraph (MFS) mining seeks to identify the maximal subgraphs that commonly appear in a set of graphs, a task that has proven valuable in many applications in social science, biology, and other domains. Previous studies focused on reducing the search space of MFSs and discovered the theoretically smallest search space. Despite this success in theory, no practical algorithm can exhaustively search the space, as it is huge even for small graphs with only tens of nodes and hundreds of edges. Moreover, deciding whether a subgraph is an MFS requires solving subgraph monomorphism (SM), an NP-complete problem that introduces extra challenges. Here, we propose a practical MFS mining algorithm that targets large MFSs, named SATMargin. SATMargin adopts a random walk in the search space to perform efficient search and utilizes a customized conflict-learning Boolean Satisfiability (SAT) algorithm to accelerate SM queries. We design a mechanism that reuses SAT solutions to combine the random walk and the SAT solver effectively. We evaluate SATMargin on synthetic graphs and 6 real-world graph datasets. SATMargin outperforms the baselines, finding more and larger MFSs. We further demonstrate the effectiveness of SATMargin in a case study of RNA graphs. The frequent subgraph identified by SATMargin closely matches the functional core structure of RNAs previously detected in biological experiments. Our software can be found at https://github.com/MuyiLiu2022/SATMargin-and-Baselines.
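To make the two notions in the abstract concrete, the following is a minimal sketch of a subgraph monomorphism (SM) check and of counting how many graphs in a set contain a pattern. It is a brute-force illustration of the definitions only, not the SAT-based solver or the random-walk search that SATMargin uses. The function names (`has_monomorphism`, `support`, `is_frequent`), the `min_support` threshold, and the restriction to unlabeled, undirected graphs given as edge lists are all assumptions made for this example.

```python
from itertools import permutations


def has_monomorphism(pattern_edges, target_edges):
    """Return True if the pattern's nodes can be mapped injectively onto
    target nodes so that every pattern edge lands on a target edge."""
    p_nodes = sorted({v for e in pattern_edges for v in e})
    t_nodes = sorted({v for e in target_edges for v in e})
    t_adj = {frozenset(e) for e in target_edges}  # undirected edge lookup
    # Brute force: try every injective assignment of pattern nodes.
    # This is exponential, which is why practical miners need better solvers.
    for image in permutations(t_nodes, len(p_nodes)):
        f = dict(zip(p_nodes, image))
        if all(frozenset((f[u], f[v])) in t_adj for u, v in pattern_edges):
            return True
    return False


def support(pattern_edges, graphs):
    """Number of graphs in the set that contain the pattern."""
    return sum(has_monomorphism(pattern_edges, g) for g in graphs)


def is_frequent(pattern_edges, graphs, min_support):
    """A pattern is frequent if at least min_support graphs contain it
    (min_support is a hypothetical threshold parameter)."""
    return support(pattern_edges, graphs) >= min_support


# Toy data: a triangle, a 4-node path, and a single edge.
triangle = [(0, 1), (1, 2), (0, 2)]
path3 = [(0, 1), (1, 2)]
graphs = [triangle, [(0, 1), (1, 2), (2, 3)], [(0, 1)]]

print(support(path3, graphs), support(triangle, graphs))
```

In this toy set the 3-node path occurs in two graphs and the triangle in one, so with `min_support=2` only the path is frequent. A frequent pattern is maximal when no frequent pattern strictly contains it; checking that requires many such SM queries, which is the cost the abstract refers to.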

Index Terms

SATMargin: Practical Maximal Frequent Subgraph Mining via Margin Space Sampling
1. Information systems
  1. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Index terms have been assigned to the content through auto-classification.

Information & Contributors

Information

Published In

WWW '22: Proceedings of the ACM Web Conference 2022

April 2022

3764 pages

ISBN:9781450390965

DOI:10.1145/3485447

Editors:
Frédérique Laforest (INSA Lyon, France)
Raphaël Troncy (EURECOM, France)
Elena Simperl (King’s College London, UK)
Deepak Agarwal (Pinterest, USA)
Aristides Gionis (KTH Royal Institute of Technology, Sweden)
Ivan Herman (W3C / retired)
Lionel Médini (Université Lyon 1, France)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '22

Sponsor:

SIGWEB

WWW '22: The ACM Web Conference 2022

April 25 - 29, 2022

Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 277
Downloads (Last 12 months): 67
Downloads (Last 6 weeks): 5

Reflects downloads up to 27 Jul 2024

Citations

Cited By

Leng F, Li F, Bao Y, Zhang T, Yu G. (2024). FSM-BC-BSP: Frequent Subgraph Mining Algorithm Based on BC-BSP. Applied Sciences 14:8 (3154). Online publication date: 9-Apr-2024. https://doi.org/10.3390/app14083154
Wu Y, Sun R, Wang X, Zhang Y, Qin L, Zhang W, Lin X. (2024). Efficient Maximal Temporal Plex Enumeration. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (3098-3110). Online publication date: 13-May-2024. https://doi.org/10.1109/ICDE60146.2024.00240
Li Y, Chen X, Guo W, Li X, Luo W, Huang J, Zhen H, Yuan M, Yan J, Singh A, Sun Y, Akoglu L, Gunopulos D, Yan X, Kumar R, Ozcan F, Ye J. (2023). HardSATGEN: Understanding the Difficulty of Hard SAT Formula Generation and A Strong Structure-Hardness-Aware Baseline. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4414-4425). Online publication date: 6-Aug-2023. https://dl.acm.org/doi/10.1145/3580305.3599837
Liu W, Chen D, Yu M, Xu Q. (2022). A Frequent Subgraph Publishing Algorithm Based on Differential Privacy. 2022 International Conference on Cloud Computing, Big Data Applications and Software Engineering (CBASE) (136-141). Online publication date: Sep-2022. https://doi.org/10.1109/CBASE57816.2022.00032
