research-article
Open access

Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach

Published: 13 November 2023

Abstract

Mining cohesive subgraphs from a graph is a fundamental problem in graph data analysis. One notable cohesive structure is the γ-quasi-clique (QC), in which each vertex is adjacent to at least a fraction γ of the other vertices in the subgraph. Enumerating the maximal γ-quasi-cliques (MQCs) of a graph has been widely studied and is used in many applications, such as community detection and the discovery of significant biomolecule structures. A common practice for finding all MQCs is to (1) find a set of QCs that contains all MQCs and then (2) filter out the non-maximal QCs. Quite a few branch-and-bound algorithms have been developed for step (1), but all of them focus on sharpening the pruning techniques and devote little effort to improving the branching part. As a result, they provide no guarantee on pruning branches, and all have a worst-case time complexity of O*(2^n), where O* suppresses polynomial factors and n is the number of vertices in the graph. In this paper, we focus on the problem of finding a set of QCs containing all MQCs, but rather than further sharpening the pruning techniques as existing methods do, we pay attention to both the pruning and the branching parts and develop new pruning techniques and branching methods that suit each other better, so that more branches are pruned both theoretically and practically. Specifically, we develop a new branch-and-bound algorithm called FastQC, based on the newly developed pruning techniques and branching methods, which improves the worst-case time complexity to O*(α_k^n), where α_k is a positive real number strictly smaller than 2. Furthermore, we develop a divide-and-conquer strategy for boosting the performance of FastQC. Finally, we conduct extensive experiments on both real and synthetic datasets, and the results show that our algorithms are up to two orders of magnitude faster than the state-of-the-art on real datasets.
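
To make the definition and the two-step practice described in the abstract concrete, here is a minimal brute-force sketch in Python. It is not FastQC and uses none of the paper's pruning or branching rules. The function names (is_quasi_clique, candidate_qcs, filter_maximal) and the toy graph are hypothetical, chosen only for illustration. The sketch checks only the degree condition stated in the abstract. Formal definitions often also require the subgraph to be connected, which is omitted here. Step (1) simply tests every vertex subset, so it takes time exponential in the number of vertices.

```python
from fractions import Fraction
from itertools import combinations


def is_quasi_clique(adj, vertices, gamma):
    """Degree test: every vertex is adjacent to at least
    gamma * (|vertices| - 1) of the other vertices in the set.
    Assumes a simple undirected graph (no self-loops)."""
    vs = set(vertices)
    need = gamma * (len(vs) - 1)
    return all(len(adj[v] & vs) >= need for v in vs)


def candidate_qcs(adj, gamma):
    """Step (1), brute force: all non-empty vertex subsets passing the test.
    A stand-in for a branch-and-bound search, not the paper's algorithm."""
    nodes = sorted(adj)
    return [frozenset(c)
            for k in range(1, len(nodes) + 1)
            for c in combinations(nodes, k)
            if is_quasi_clique(adj, c, gamma)]


def filter_maximal(sets):
    """Step (2): drop every set that is a proper subset of another set."""
    return [s for s in sets if not any(s < t for t in sets)]


# Toy graph (hypothetical): triangle 0-1-2 with a pendant vertex 3 on 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

# Passing gamma as a Fraction avoids floating-point rounding in the threshold.
gamma = Fraction(1, 2)
maximal = filter_maximal(candidate_qcs(adj, gamma))
print(sorted(sorted(s) for s in maximal))
```

On this toy graph with γ = 1/2, the set {0, 2, 3} passes the degree test while its subset {0, 3} does not, so the property is not inherited by subsets. This is one reason a cheaper superset-generating step followed by a maximality filter is a natural structure.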

Cited By

  • (2024) An optimization algorithm for maximum quasi-clique problem based on information feedback model. PeerJ Computer Science 10 (e2173). DOI: 10.7717/peerj-cs.2173. Online publication date: 12-Jul-2024
  • (2024) A Fast Exact Algorithm to Enumerate Maximal Pseudo-cliques in Large Sparse Graphs. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2479-2490). DOI: 10.1145/3637528.3672066. Online publication date: 25-Aug-2024
  • (2024) On Searching Maximum Directed $(k, \ell)$-Plex. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (2570-2583). DOI: 10.1109/ICDE60146.2024.00202. Online publication date: 13-May-2024

    Published In

    Proceedings of the ACM on Management of Data  Volume 1, Issue 3
    PACMMOD
    September 2023
    472 pages
    EISSN:2836-6573
    DOI:10.1145/3632968
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. branch-and-bound
    2. cohesive subgraph enumeration
    3. quasi-clique

    Qualifiers

    • Research-article

    Funding Sources

    • Ministry of Education, Singapore

    Article Metrics

    • Downloads (Last 12 months): 224
    • Downloads (Last 6 weeks): 50
    Reflects downloads up to 15 Oct 2024

    Citations

    • (2023) Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach. Proceedings of the ACM on Management of Data 1:3 (1-26). DOI: 10.1145/3617331. Online publication date: 13-Nov-2023
