Scalable distributed Louvain algorithm for community detection in large graphs

The Journal of Supercomputing

Abstract

Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. The Louvain algorithm is an efficient sequential algorithm for community detection. However, such sequential algorithms fail to scale to emerging large-scale data, so scalable parallel algorithms are necessary to process large graph datasets. In this work, we present a comparative analysis of our different parallel implementations of the Louvain algorithm. We design parallel algorithms for the Louvain method in shared-memory and distributed-memory settings. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. We incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around a 12-fold speedup while scaling to a larger number of processors. We also compare the performance of our algorithm with other prominent algorithms in the literature and obtain better or comparable performance. Finally, we identify the challenges in developing distributed-memory algorithms and present DPLAL as an optimized solution, with a performance analysis on large-scale real-world networks from different domains.
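For readers unfamiliar with the sequential baseline named in the abstract, the following is a minimal sketch of the first (local-moving) phase of the standard Louvain method, together with the modularity score it greedily improves. It is not the DPLAL algorithm and does not reflect the authors' implementation: it assumes an undirected, unweighted graph given as an edge list, omits the aggregation phase and any parallelism, and the function names `modularity` and `local_moving_pass` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


def local_moving_pass(edges, community):
    """Repeat greedy node moves until no single move improves modularity.

    Returns a new node -> community dict; the input dict is not modified.
    """
    community = dict(community)
    m = len(edges)
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    tot = defaultdict(int)  # sum of degrees of the nodes in each community
    for n, d in degree.items():
        tot[community[n]] += d

    improved = True
    while improved:
        improved = False
        for n in sorted(adj):
            old = community[n]
            tot[old] -= degree[n]  # take n out of its current community
            links = defaultdict(int)  # edges from n into each neighbouring community
            for nb in adj[n]:
                if nb != n:
                    links[community[nb]] += 1
            # Gain of joining community c, up to a constant factor:
            #   k_{n,c} - tot_c * k_n / (2m)
            best = old
            best_gain = links[old] - tot[old] * degree[n] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[n] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            community[n] = best
            tot[best] += degree[n]
            if best != old:
                improved = True
    return community
```

As a usage example, starting from one community per node on two triangles joined by a single bridge edge, `local_moving_pass([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)], {n: n for n in range(6)})` groups each triangle into its own community. In a distributed-memory setting, the `tot` bookkeeping and the neighbour community labels would have to be exchanged between processes, which is the kind of inter-process communication cost the abstract refers to.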

Acknowledgements

This work has been supported by Louisiana Board of Regents RCS Grant LEQSF (2017-20)-RDA-25. We also thank the anonymous reviewers for the helpful comments and suggestions to improve this paper.

Author information

Corresponding author

Correspondence to Naw Safrin Sattar.

About this article

Cite this article

Sattar, N.S., Arifuzzaman, S. Scalable distributed Louvain algorithm for community detection in large graphs. J Supercomput 78, 10275–10309 (2022). https://doi.org/10.1007/s11227-021-04224-2

  • DOI: https://doi.org/10.1007/s11227-021-04224-2
