DOI: 10.1145/3295500.3356206
research-article
Public Access

Scalable generation of graphs for benchmarking HPC community-detection algorithms

Published: 17 November 2019

    Abstract

    Community detection in graphs is a canonical social network analysis method. We consider the problem of generating suites of terascale synthetic social networks to compare the solution quality of parallel community-detection methods. The standard method, based on the graph generator of Lancichinetti, Fortunato, and Radicchi (LFR), has been used extensively for modest-scale graphs, but has inherent scalability limitations.
    We provide an alternative, based on the scalable Block Two-Level Erdős-Rényi (BTER) graph generator, that enables HPC-scale evaluation of solution quality in the style of LFR. Our approach varies community coherence and retains other important properties. Our methods can scale real-world networks, e.g., to create a version of the Friendster network that is 512 times larger. With BTER's inherent scalability, we can generate a 15-terabyte graph (4.6B vertices, 925B edges) in just over one minute. We demonstrate our capability by showing that a label-propagation community-detection algorithm can be strong-scaled with negligible solution-quality loss.
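
The size figures quoted above can be sanity-checked with a short calculation. The sketch below is only a back-of-envelope illustration, not the authors' storage format: it assumes a plain, uncompressed edge list in which each edge is a pair of 64-bit vertex IDs, and it treats the graph as undirected when computing the average degree. The function names are hypothetical.

```python
# Back-of-envelope check of the graph size quoted in the abstract.
# Assumption (not stated in the source): each edge is stored as a pair of
# 64-bit vertex IDs, i.e. 16 bytes per edge, with no compression.

NUM_VERTICES = 4_600_000_000      # 4.6B vertices (from the abstract)
NUM_EDGES = 925_000_000_000       # 925B edges (from the abstract)
BYTES_PER_VERTEX_ID = 8           # assumed 64-bit IDs


def edge_list_bytes(num_edges, bytes_per_id=BYTES_PER_VERTEX_ID):
    """Size of a plain edge list: two vertex IDs per edge."""
    return num_edges * 2 * bytes_per_id


def needs_64_bit_ids(num_vertices):
    """True when vertex IDs do not fit in an unsigned 32-bit integer."""
    return num_vertices > 2**32


def average_degree(num_vertices, num_edges):
    """Average degree, assuming an undirected graph (two endpoints per edge)."""
    return 2 * num_edges / num_vertices


total_bytes = edge_list_bytes(NUM_EDGES)
print(f"edge list size: {total_bytes / 1e12:.1f} TB")
print(f"64-bit IDs needed: {needs_64_bit_ids(NUM_VERTICES)}")
print(f"average degree: {average_degree(NUM_VERTICES, NUM_EDGES):.0f}")
```

Under these assumptions the edge list comes to about 14.8 TB, which is consistent with the "15-terabyte" figure, and 4.6B vertices exceeds the 2^32 range of 32-bit IDs, which is why 8-byte IDs are a plausible choice here.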

    Published In

    SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
    November 2019
    1921 pages
    ISBN: 9781450362290
    DOI: 10.1145/3295500
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Funding Sources

    • National Nuclear Security Administration

    Conference

    SC '19
    Acceptance Rates

    Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

    Article Metrics

    • Downloads (last 12 months): 127
    • Downloads (last 6 weeks): 20
    Reflects downloads up to 12 Aug 2024

    Cited By

    • (2024) Effects of Null Model Choice on Modularity Maximization. Complex Networks & Their Applications XII, 261-272. DOI: 10.1007/978-3-031-53499-7_21. Online publication date: 29-Feb-2024
    • (2024) Identifying Well-Connected Communities in Real-World and Synthetic Networks. Complex Networks & Their Applications XII, 3-14. DOI: 10.1007/978-3-031-53499-7_1. Online publication date: 29-Feb-2024
    • (2023) Parallel Overlapping Community Detection Algorithm on GPU. IEEE Transactions on Big Data 9:2, 677-687. DOI: 10.1109/TBDATA.2022.3180360. Online publication date: 1-Apr-2023
    • (2023) Correcting Output Degree Sequences in Chung-Lu Random Graph Generation. Complex Networks and Their Applications XI, 69-80. DOI: 10.1007/978-3-031-21131-7_6. Online publication date: 26-Jan-2023
    • (2022) Random graph generator for leader and community detection in networks. International Transactions in Operational Research 31:3, 1699-1719. DOI: 10.1111/itor.13228. Online publication date: 6-Nov-2022
    • (2022) HD-CPS: Hardware-assisted Drift-aware Concurrent Priority Scheduler for Shared Memory Multicores. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 528-542. DOI: 10.1109/HPCA53966.2022.00046. Online publication date: Apr-2022
    • (2022) Towards scaling community detection on distributed-memory heterogeneous systems. Parallel Computing 111, 102898. DOI: 10.1016/j.parco.2022.102898. Online publication date: Jul-2022
    • (2022) Limitations of Chung Lu Random Graph Generation. Complex Networks & Their Applications X, 451-462. DOI: 10.1007/978-3-030-93409-5_38. Online publication date: 1-Jan-2022
    • (2021) ElGA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3458817.3480857. Online publication date: 14-Nov-2021
    • (2021) LineageBA: A Fast, Exact and Scalable Graph Generation for the Barabási-Albert Model. 2021 IEEE 37th International Conference on Data Engineering (ICDE), 540-551. DOI: 10.1109/ICDE51399.2021.00053. Online publication date: Apr-2021
