research-article

SUFF: Accelerating Subgraph Matching with Historical Data

Published: 01 March 2023

Abstract

Subgraph matching is a fundamental problem in graph theory and has wide applications in areas like sociology, chemistry, and social networks. Due to its NP-hardness, the basic approach is a brute-force search over the whole search space. Some pruning strategies have been proposed to reduce the search space. However, they are either space-inefficient or rely on the assumption that the graph has specific properties. In this paper, we propose SUFF, a general and powerful structure filtering framework, which can accelerate most existing approaches with slight modifications. Specifically, it builds a set of filters from the matching results of past queries and uses them to prune the search space for future queries. By fully utilizing the relationship between the matches of two queries, it ensures that such pruning is sound. Furthermore, several optimizations are proposed to reduce the computation and space cost of building, storing, and using filters. Extensive experiments are conducted using multiple real-world data sets and representative existing approaches. The results show that SUFF can achieve up to 15× speedup with small overheads.
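
The idea of pruning with past results can be illustrated with a minimal Python sketch. It is not the SUFF implementation: the function names `find_matches` and `build_filter`, the unlabeled undirected toy graph, and the use of plain Python sets as filters are all assumptions made for illustration. The sketch further assumes that the past query is a subgraph of the new query under the identity vertex mapping, so every match of the new query, restricted to the past query's vertices, is also a match of the past query. Under that assumption, restricting candidates to vertices seen in past matches cannot discard a valid match.

```python
def find_matches(query_edges, data_adj, candidates=None):
    """Backtracking search for (non-induced) subgraph matches.

    candidates optionally maps a query vertex to the set of data vertices
    it may be mapped to; query vertices without an entry are unrestricted.
    Returns the list of matches and the number of candidates tried.
    """
    candidates = candidates or {}
    query_vertices = sorted({v for edge in query_edges for v in edge})
    query_adj = {u: set() for u in query_vertices}
    for a, b in query_edges:
        query_adj[a].add(b)
        query_adj[b].add(a)

    matches, steps = [], 0

    def extend(mapping):
        nonlocal steps
        if len(mapping) == len(query_vertices):
            matches.append(dict(mapping))
            return
        u = query_vertices[len(mapping)]
        for v in sorted(candidates.get(u, data_adj)):
            steps += 1
            if v in mapping.values():
                continue  # mappings must be injective
            # Every already-mapped query neighbor of u must be adjacent to v.
            if all(mapping[w] in data_adj[v] for w in query_adj[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return matches, steps


def build_filter(past_matches):
    """Hypothetical filter: per query vertex, the data vertices seen in past matches."""
    filt = {}
    for match in past_matches:
        for u, v in match.items():
            filt.setdefault(u, set()).add(v)
    return filt


# Toy data graph: a triangle 0-1-2 with a tail 2-3, plus a separate path 4-5-6-7.
data_adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2},
    4: {5}, 5: {4, 6}, 6: {5, 7}, 7: {6},
}

past_query = [(0, 1), (1, 2), (0, 2)]   # triangle
new_query = past_query + [(2, 3)]       # triangle with a tail: contains past_query

past_matches, _ = find_matches(past_query, data_adj)
filt = build_filter(past_matches)

plain, plain_steps = find_matches(new_query, data_adj)
pruned, pruned_steps = find_matches(new_query, data_adj, filt)

print(len(plain), len(pruned), plain_steps, pruned_steps)
```

In this toy setting the filtered search returns the same matches as the unfiltered one while trying fewer candidates, because the path 4-5-6-7 never appears in a triangle match and is skipped for the three triangle vertices. A practical system would also need to decide which past queries to keep and how to store the filters compactly, which is the kind of cost the abstract's optimizations address.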

Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 7
March 2023
203 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 March 2023
Published in PVLDB Volume 16, Issue 7

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 98
  • Downloads (last 6 weeks): 7
Reflects downloads up to 08 Feb 2025.

Cited By

  • (2024) Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent Queries. Proceedings of the VLDB Endowment 17:11 (3003-3016). DOI: 10.14778/3681954.3681979. Online publication date: 30-Aug-2024.
  • (2024) Fast Local Subgraph Counting. Proceedings of the VLDB Endowment 17:8 (1967-1980). DOI: 10.14778/3659437.3659451. Online publication date: 1-Apr-2024.
  • (2024) Understanding High-Performance Subgraph Pattern Matching: A Systems Perspective. Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (1-12). DOI: 10.1145/3661304.3661897. Online publication date: 14-Jun-2024.
  • (2024) gSWORD: GPU-accelerated Sampling for Subgraph Counting. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639288. Online publication date: 26-Mar-2024.
  • (2024) Scalable Optimization of Graph Pattern Queries Using Summary Graphs. Web Information Systems Engineering – WISE 2024 (415-426). DOI: 10.1007/978-981-96-0567-5_29. Online publication date: 2-Dec-2024.
