research-article

RECEIPT: REfine CoarsE-grained Independent Tasks for parallel tip decomposition of bipartite graphs

Published: 01 November 2020

Abstract

Tip decomposition is a crucial kernel for mining dense subgraphs in bipartite networks, with applications such as spam detection and the analysis of affiliation networks. It creates a hierarchy of vertex-induced subgraphs with varying densities, determined by the participation of vertices in butterflies (2,2-bicliques). To build the hierarchy, existing algorithms iteratively follow a delete-update (peeling) process: they delete the vertices with the minimum number of butterflies and correspondingly update the butterfly counts of their 2-hop neighbors. The need to explore 2-hop neighborhoods makes tip decomposition computationally very expensive. Furthermore, the inherent sequentiality of peeling only minimum-butterfly vertices makes derived parallel algorithms prone to heavy synchronization.
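The delete-update process described above can be illustrated with a minimal sequential sketch. It is not code from the paper: the function names, the dictionary-based graph layout (each vertex of the peeled side mapped to its set of neighbors on the other side), and the toy graph are assumptions made for illustration, and the sketch favors clarity over the efficiency a real implementation would need.

```python
from collections import Counter, defaultdict
from math import comb


def reverse_adjacency(adj):
    """Map each V-side vertex to the set of U-side vertices adjacent to it."""
    radj = defaultdict(set)
    for u, nbrs in adj.items():
        for v in nbrs:
            radj[v].add(u)
    return radj


def shared_butterflies(adj, radj, u, alive):
    """For each alive 2-hop neighbor w of u, count the butterflies containing both."""
    # Every wedge u-v-w contributes one common neighbor v of u and w.
    wedges = Counter(w for v in adj[u] for w in radj[v] if w != u and w in alive)
    # Any two common neighbors of u and w close exactly one butterfly.
    return {w: comb(c, 2) for w, c in wedges.items() if c >= 2}


def tip_decomposition(adj):
    """Sequential peeling of the U side; returns a tip number per U vertex."""
    radj = reverse_adjacency(adj)
    alive = set(adj)
    support = {u: sum(shared_butterflies(adj, radj, u, alive).values()) for u in alive}
    tip, level = {}, 0
    while alive:
        u = min(alive, key=support.get)   # delete a minimum-butterfly vertex
        level = max(level, support[u])    # tip numbers never decrease while peeling
        tip[u] = level
        alive.remove(u)
        # Update step: 2-hop neighbors lose the butterflies they shared with u.
        for w, lost in shared_butterflies(adj, radj, u, alive).items():
            support[w] -= lost
    return tip


# Hypothetical toy graph: U = {a, b, c, d, e}, V = {x, y, z}.
graph = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "z"},
    "d": {"x", "y"},
    "e": {"x"},
}
tips = tip_decomposition(graph)
```

Every iteration re-explores the 2-hop neighborhood of the deleted vertex, which is the wedge traversal cost the abstract refers to, and each iteration depends on the supports left by the previous one, which is why a direct parallelization needs frequent synchronization.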
In this paper, we propose a novel parallel tip-decomposition algorithm, REfine CoarsE-grained Independent Tasks (RECEIPT), that relaxes the peeling order restrictions by partitioning the vertices into multiple independent subsets that can be peeled concurrently. This enables RECEIPT to simultaneously achieve a high degree of parallelism and a dramatic reduction in synchronizations. Further, RECEIPT employs a hybrid peeling strategy along with other optimizations that drastically reduce the amount of wedge exploration and the execution time.
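The abstract does not say how the independent subsets are formed, so the following is only a rough sketch of one plausible two-step scheme, not the authors' implementation. It assumes the vertices are first split by support ranges (the `bounds` argument is a hypothetical user-supplied list of range upper limits), with every vertex inside a range deleted in bulk, and that each resulting subset is then peeled exactly on its own, starting from the supports recorded when its range began. All function names are illustrative, and the hybrid peeling strategy and other optimizations mentioned above are not modeled.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import comb


def reverse_adjacency(adj):
    """Map each V-side vertex to the set of U-side vertices adjacent to it."""
    radj = defaultdict(set)
    for u, nbrs in adj.items():
        for v in nbrs:
            radj[v].add(u)
    return radj


def shared_butterflies(adj, radj, u, members):
    """Butterflies that u shares with each of its 2-hop neighbors in `members`."""
    wedges = Counter(w for v in adj[u] for w in radj[v] if w != u and w in members)
    return {w: comb(c, 2) for w, c in wedges.items() if c >= 2}


def coarse_partition(adj, radj, bounds):
    """Assumed coarse step: bulk-peel all vertices whose support fits the current range."""
    alive = set(adj)
    support = {u: sum(shared_butterflies(adj, radj, u, alive).values()) for u in alive}
    tasks = []
    for hi in list(bounds) + [float("inf")]:   # the last range catches every leftover vertex
        start = dict(support)                  # supports when this range begins
        subset = set()
        while True:
            batch = {u for u in alive if support[u] <= hi}
            if not batch:
                break
            alive -= batch                     # one bulk deletion, no min-first ordering
            subset |= batch
            for u in batch:
                for w, lost in shared_butterflies(adj, radj, u, alive).items():
                    support[w] -= lost
        if subset:
            tasks.append((subset, {u: start[u] for u in subset}))
    return tasks


def fine_peel(adj, radj, subset, start):
    """Assumed fine step: exact min-first peeling restricted to one subset."""
    alive, support = set(subset), dict(start)
    tip, level = {}, 0
    while alive:
        u = min(alive, key=support.get)
        level = max(level, support[u])
        tip[u] = level
        alive.remove(u)
        # Only vertices of the same subset are updated; later subsets stay untouched.
        for w, lost in shared_butterflies(adj, radj, u, alive).items():
            support[w] -= lost
    return tip


def two_step_peel(adj, bounds):
    radj = reverse_adjacency(adj)
    tasks = coarse_partition(adj, radj, bounds)
    tip = {}
    with ThreadPoolExecutor() as pool:         # subsets share no mutable state
        for part in pool.map(lambda t: fine_peel(adj, radj, *t), tasks):
            tip.update(part)
    return tip


# Hypothetical toy graph: U = {a, b, c, d, e}, V = {x, y, z}.
graph = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "z"},
    "d": {"x", "y"},
    "e": {"x"},
}
tips = two_step_peel(graph, bounds=[3])
```

In this sketch the coarse step synchronizes once per bulk deletion rather than once per peeled vertex, and the fine step runs each subset as a separate task that only reads the shared graph. Python threads are used here purely to show that the tasks are independent; they do not demonstrate real speedup.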
We perform a detailed experimental evaluation of RECEIPT on a shared-memory multicore server. It can process some of the largest publicly available bipartite datasets orders of magnitude faster than the state-of-the-art algorithms, achieving up to 1100× and 64× reductions in the number of thread synchronizations and traversed wedges, respectively. Using 36 threads, RECEIPT can provide up to 17.1× self-relative speedup.

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 3
November 2020
217 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
