research-article

Maximum biclique search at billion scale

Published: 01 May 2020

Abstract

Maximum biclique search, which finds the biclique with the maximum number of edges in a bipartite graph, is a fundamental problem with a wide spectrum of applications in domains such as e-commerce, social analysis, web services, and bioinformatics. Unfortunately, because the problem is hard in graph theory, no practical solution has been proposed for large-scale real-world datasets. Existing techniques for maximum clique search on a general graph cannot be applied because the search objective of maximum biclique search is two-dimensional, i.e., we have to consider the sizes of both sides of the biclique simultaneously. In this paper, we divide the problem into several subproblems, each of which is specified by two parameters. These subproblems are derived progressively, and in each subproblem we can restrict the search to a very small part of the original bipartite graph. We prove that a logarithmic number of subproblems is enough to guarantee the correctness of the algorithm. To minimize the computational cost, we show how to significantly reduce the size of the bipartite graph for each subproblem, while preserving the maximum biclique that satisfies certain constraints, by exploring the properties of the one-hop and two-hop neighbors of each vertex. We use several real datasets from various application domains, one of which contains over 300 million vertices and 1.3 billion edges, to demonstrate the high efficiency and scalability of our proposed solution. It is reported that a 50% improvement in recall can be achieved after applying our method at Alibaba Group to identify fraudulent transactions in their e-commerce networks. This further demonstrates the usefulness of our techniques in practice.
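
To make the two-dimensional objective concrete, the following minimal Python sketch may help. It is not the algorithm proposed in the paper. It is an exhaustive search that is only feasible on toy graphs, paired with a simple degree-based pruning step for one subproblem given by two size thresholds. The function names `max_edge_biclique` and `prune_by_degree`, the toy graph, and the thresholds `min_left` and `min_right` are all illustrative assumptions. The pruning uses only the one-hop fact that follows directly from the definition of a biclique: a vertex in a biclique with at least `min_left` left vertices and `min_right` right vertices must have enough neighbors on the opposite side. The paper's actual reduction rules, including those based on two-hop neighbors, are not reproduced here.

```python
from itertools import combinations


def max_edge_biclique(adj):
    """Exhaustively find a biclique (A, B) that maximizes |A| * |B|.

    `adj` maps each left-side vertex to the set of its right-side neighbors.
    The search is exponential in the number of left vertices, so this is
    meant for toy graphs only.
    """
    left = sorted(adj)
    best = (0, (), frozenset())
    for size in range(1, len(left) + 1):
        for subset in combinations(left, size):
            # The largest right side compatible with `subset` is the
            # intersection of the neighborhoods of its members.
            common = frozenset.intersection(
                *(frozenset(adj[u]) for u in subset)
            )
            edges = size * len(common)
            if edges > best[0]:
                best = (edges, subset, common)
    return best


def prune_by_degree(adj, min_left, min_right):
    """Drop vertices that cannot lie in any biclique with at least
    `min_left` left vertices and `min_right` right vertices.

    A left vertex needs degree >= min_right and a right vertex needs
    degree >= min_left. Removals are repeated until nothing changes.
    """
    adj = {u: set(vs) for u, vs in adj.items()}
    changed = True
    while changed:
        changed = False
        # Recompute right-side degrees in the current graph.
        deg_right = {}
        for vs in adj.values():
            for v in vs:
                deg_right[v] = deg_right.get(v, 0) + 1
        weak_right = {v for v, d in deg_right.items() if d < min_left}
        for u in list(adj):
            if adj[u] & weak_right:
                adj[u] -= weak_right
                changed = True
            if len(adj[u]) < min_right:
                del adj[u]
                changed = True
    return adj


# Hypothetical toy graph: left vertices u1..u4, right vertices v1..v4.
graph = {
    "u1": {"v1", "v2", "v3"},
    "u2": {"v1", "v2", "v3"},
    "u3": {"v1"},
    "u4": {"v4"},
}

edges, left_side, right_side = max_edge_biclique(graph)
print(edges, left_side, sorted(right_side))

# Subproblem: only bicliques with at least 2 vertices on each side.
reduced = prune_by_degree(graph, min_left=2, min_right=2)
print(sorted(reduced))
print(max_edge_biclique(reduced)[0])
```

On this toy input the pruning step leaves a much smaller graph that still contains the best biclique meeting the thresholds, which is the general idea behind solving each subproblem on a reduced graph.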


Published In

Proceedings of the VLDB Endowment  Volume 13, Issue 9
May 2020
295 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 May 2020
Published in PVLDB Volume 13, Issue 9

Cited By

View all
  • (2025)Efficient Maximum Vertex (k,ℓ)-Biplex Computation on Bipartite GraphsTsinghua Science and Technology10.26599/TST.2024.901000930:2(569-584)Online publication date: Apr-2025
  • (2025)Efficient Projection-Based Algorithms for Tip Decomposition on Dynamic Bipartite GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348631037:2(626-640)Online publication date: 1-Feb-2025
  • (2024)Maximum Butterfly Generators Search in Bipartite NetworksMathematics10.3390/math1301008813:1(88)Online publication date: 29-Dec-2024
  • (2024)Efficient Maximal Frequent Group Enumeration in Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368199717:11(3243-3255)Online publication date: 30-Aug-2024
  • (2024)Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368197517:11(2946-2959)Online publication date: 30-Aug-2024
  • (2024)Efficient Index for Temporal Core Queries over Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368196517:11(2813-2825)Online publication date: 30-Aug-2024
  • (2024)Efficient Parallel D-Core Decomposition at ScaleProceedings of the VLDB Endowment10.14778/3675034.367505417:10(2654-2667)Online publication date: 6-Aug-2024
  • (2024)Densest Multipartite Subgraph Search in Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3636218.363622617:4(699-711)Online publication date: 5-Mar-2024
  • (2024)Efficient Maximal Biclique Enumeration on Large Signed Bipartite GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.337365436:9(4618-4631)Online publication date: 6-Mar-2024
  • (2024)AMBEA: Aggressive Maximal Biclique Enumeration in Large Bipartite Graph ComputingIEEE Transactions on Computers10.1109/TC.2024.344186473:12(2664-2677)Online publication date: 1-Dec-2024
  • Show More Cited By
