research-article

T-FSM: A Task-Based System for Massively Parallel Frequent Subgraph Pattern Mining from a Big Graph

Published: 30 May 2023

Abstract

Finding frequent subgraph patterns in a big graph is an important problem with many applications, such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, several recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high concurrency, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Our experiments show that T-FSM is orders of magnitude faster than state-of-the-art (SOTA) systems for frequent subgraph pattern mining. Our system code has been released at https://github.com/lyuheng/T-FSM.
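To make the baseline the abstract compares against more concrete, here is a minimal sketch of the MNI (minimum image-based) support measure as defined by Bringmann and Nijssen: for each pattern vertex, count the distinct data vertices it is mapped to across all embeddings, then take the minimum. The helper names `find_embeddings` and `mni_support` are hypothetical, the brute-force matcher is only suitable for tiny graphs, and this is not T-FSM's implementation. Fraction-Score is not shown because its definition is not given in this text.

```python
from itertools import permutations

# Hypothetical helpers for illustration only; not T-FSM's matcher or API.

def find_embeddings(p_labels, p_edges, g_labels, g_edges):
    """Enumerate every injective mapping of pattern vertices to data vertices
    that preserves vertex labels and pattern edges (brute force, tiny graphs only)."""
    g_adj = {frozenset(e) for e in g_edges}
    k = len(p_labels)
    found = []
    for image in permutations(range(len(g_labels)), k):
        # image[i] is the data vertex assigned to pattern vertex i
        if any(p_labels[i] != g_labels[image[i]] for i in range(k)):
            continue
        if all(frozenset((image[u], image[v])) in g_adj for u, v in p_edges):
            found.append(image)
    return found


def mni_support(embeddings, k):
    """MNI support: for each of the k pattern vertices, count the distinct data
    vertices it is mapped to across all embeddings; return the minimum count."""
    if not embeddings:
        return 0
    return min(len({emb[i] for emb in embeddings}) for i in range(k))


if __name__ == "__main__":
    # Data graph: a star whose center 0 is labeled "A" and whose leaves 1..3 are labeled "B".
    g_labels = ["A", "B", "B", "B"]
    g_edges = [(0, 1), (0, 2), (0, 3)]

    edge = find_embeddings(["A", "B"], [(0, 1)], g_labels, g_edges)
    path = find_embeddings(["B", "A", "B"], [(0, 1), (1, 2)], g_labels, g_edges)

    print(len(edge), mni_support(edge, 2))  # 3 1
    print(len(path), mni_support(path, 3))  # 6 1
```

Extending the A–B edge to the B–A–B path doubles the raw embedding count (3 to 6), so counting embeddings is not anti-monotonic. MNI stays at 1 because the single "A" vertex bounds every image set, and the larger pattern's support never exceeds its sub-pattern's. This anti-monotonicity is what allows a miner to prune infrequent patterns early.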

Supplemental Material

  • MP4 File: Presentation video for SIGMOD 2023
  • PDF File: Read me
  • ZIP File: Source Code



Information

Published In

Proceedings of the ACM on Management of Data, Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1


Author Tags

  1. fraction-score
  2. frequent subgraph
  3. graph
  4. parallel
  5. task


Article Metrics

  • Downloads (last 12 months): 301
  • Downloads (last 6 weeks): 28
Reflects downloads up to 04 Oct 2024.

Cited By

  • (2024) Edge Deletion based Subgraph Hiding. WSEAS Transactions on Information Science and Applications 21, pp. 333-347. DOI: 10.37394/23209.2024.21.32. Online publication date: 17-Jul-2024.
  • (2024) Multidimensional clustering analysis of mathematical knowledge difficulty based on Gspan. Journal of Intelligent & Fuzzy Systems 46:4, pp. 10045-10058. DOI: 10.3233/JIFS-234274. Online publication date: 18-Apr-2024.
  • (2024) FSM-Explorer: An Interactive Tool for Frequent Subgraph Pattern Mining From a Big Graph. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 5405-5408. DOI: 10.1109/ICDE60146.2024.00414. Online publication date: 13-May-2024.
  • (2024) G²-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Finding on GPUs. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 3164-3177. DOI: 10.1109/ICDE60146.2024.00245. Online publication date: 13-May-2024.
  • (2024) GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs. Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track, pp. 133-149. DOI: 10.1007/978-3-031-70381-2_9. Online publication date: 8-Sep-2024.