
Parallel Community Detection Algorithm Using a Data Partitioning Strategy with Pairwise Subdomain Duplication

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9697)

Abstract

Community detection is an important data clustering technique for studying graph structures. Many serial algorithms have been developed and studied extensively in the literature. As problem sizes grow, research attention has recently turned to parallelizing the technique. However, conventional parallelization strategies that divide the problem domain into non-overlapping subdomains do not scale with problem size and the number of processes. The main obstacle is that graph algorithms often exhibit a high degree of data dependency, which makes developing scalable parallel algorithms a great challenge.

We present PMEP, a distributed-memory parallel community detection algorithm that adopts an unconventional data partitioning strategy. PMEP divides a graph into subgraphs and assigns each pair of subgraphs to one process. This method duplicates a portion of the computational workload among processes in exchange for a significantly reduced communication cost in the later stages. After data partitioning, each process runs MEP on its assigned subgraph pair. MEP is a community detection algorithm based on the idea of maximizing equilibrium and purity. Our data partitioning method greatly simplifies the communication required to combine the local results into a global one, and hence achieves better scalability than existing parallel algorithms without sacrificing result quality. Our experiments show a speedup of 126.95 on 190 MPI processes for synthetic data sets and a speedup of 204.22 on 1225 processes for a real-world data set.
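The abstract does not say how subgraph pairs are mapped to MPI ranks, so the following is only a minimal sketch under stated assumptions, not PMEP's actual implementation. It assumes the graph has already been split into k subgraphs numbered 0 to k-1 and that each process receives one unordered pair of distinct subgraphs, which gives k(k-1)/2 processes. Under that assumption the reported process counts are consistent with k = 20 (20 × 19 / 2 = 190) and k = 50 (50 × 49 / 2 = 1225), although the abstract does not state k. The function names `pair_assignment`, `rank_to_pair`, `processes_holding` and `parallel_efficiency` are hypothetical.

```python
from itertools import combinations
from math import comb


def pair_assignment(k):
    """Return {rank: (i, j)}, giving each unordered pair of distinct
    subgraphs i < j to one process rank (hypothetical scheme)."""
    return dict(enumerate(combinations(range(k), 2)))


def rank_to_pair(rank, k):
    """Compute the (i, j) pair for a rank without building the full table.

    Ranks follow itertools.combinations order: all pairs starting with 0,
    then all pairs starting with 1, and so on.
    """
    if not 0 <= rank < comb(k, 2):
        raise ValueError(f"rank {rank} out of range for k={k}")
    i, remaining = 0, rank
    # Row i holds the k - 1 - i pairs (i, i+1), ..., (i, k-1); skip whole rows.
    while remaining >= k - 1 - i:
        remaining -= k - 1 - i
        i += 1
    return i, i + 1 + remaining


def processes_holding(subgraph, k):
    """Ranks whose pair includes `subgraph`. Work on that subgraph is
    duplicated on each of them (k - 1 processes in this scheme)."""
    return [r for r, pair in pair_assignment(k).items() if subgraph in pair]


def parallel_efficiency(speedup, processes):
    """Speedup divided by process count (1.0 would be ideal scaling)."""
    return speedup / processes


if __name__ == "__main__":
    k = 20
    table = pair_assignment(k)
    print(len(table))                            # 190 processes for 20 subgraphs
    print(table[0], table[len(table) - 1])       # (0, 1) (18, 19)
    print(rank_to_pair(19, k))                   # (1, 2)
    print(len(processes_holding(3, k)))          # 19
    print(comb(50, 2))                           # 1225
    print(round(parallel_efficiency(126.95, 190), 3))   # 0.668
    print(round(parallel_efficiency(204.22, 1225), 3))  # 0.167
```

In this sketch each subgraph appears in k - 1 pairs, which is the duplicated work, while every edge running between subgraphs i and j is held by exactly one process, the one that owns pair (i, j). Dividing the reported speedups by the process counts gives a parallel efficiency of about 0.67 at 190 processes and about 0.17 at 1225 processes.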

Acknowledgment

This work is supported in part by the following grants: NSF awards CCF-1029166, IIS-1343639, CCF-1409601; DOE awards DE-SC0007456, DE-SC0014330; AFOSR award FA9550-12-1-0458; NIST award 70NANB14H012; DARPA award N66001-15-C-4036.

Author information

Correspondence to Diana Palsetia.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Palsetia, D., Hendrix, W., Lee, S., Agrawal, A., Liao, Wk., Choudhary, A. (2016). Parallel Community Detection Algorithm Using a Data Partitioning Strategy with Pairwise Subdomain Duplication. In: Kunkel, J., Balaji, P., Dongarra, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9697. Springer, Cham. https://doi.org/10.1007/978-3-319-41321-1_6

  • DOI: https://doi.org/10.1007/978-3-319-41321-1_6
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-41320-4
  • Online ISBN: 978-3-319-41321-1
  • eBook Packages: Computer Science, Computer Science (R0)