
Fast Distributed Algorithms for Connectivity and MST in Large Graphs

Published: 13 June 2018

    Abstract

    Motivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n ≫ k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation.
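    The random-partition input model described above can be pictured with a small simulation. This is only a sketch under stated assumptions: the abstract does not spell out the partitioning scheme, so the code assumes a random vertex partition in which each vertex is assigned to a uniformly random machine and that machine learns all edges incident to the vertex. The names `random_vertex_partition` and `cycle_edges` are hypothetical helpers introduced for illustration.

```python
import random
from collections import Counter, defaultdict


def random_vertex_partition(n, edges, k, seed=0):
    """Assign each of n vertices to one of k machines uniformly at random.

    Assumption (not stated in the abstract): the machine hosting a vertex
    knows every edge incident to it, so an edge whose endpoints live on
    two different machines is stored on both.
    """
    rng = random.Random(seed)
    home = [rng.randrange(k) for _ in range(n)]
    local_edges = defaultdict(set)
    for u, v in edges:
        local_edges[home[u]].add((u, v))
        local_edges[home[v]].add((u, v))
    return home, local_edges


def cycle_edges(n):
    """Edges of a cycle on vertices 0..n-1, used here as a toy input graph."""
    return [(i, (i + 1) % n) for i in range(n)]


if __name__ == "__main__":
    n, k = 1000, 10
    home, local_edges = random_vertex_partition(n, cycle_edges(n), k)
    # Vertices per machine; with n much larger than k this is close to n/k.
    print(sorted(Counter(home).items()))
```

    With n much larger than k, each machine ends up hosting roughly n/k vertices, which is the regime the abstract has in mind.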
    Our main result is an (almost) optimal distributed randomized algorithm for graph connectivity. Our algorithm runs in Õ(n/k²) rounds (the Õ notation hides a polylog(n) factor and an additive polylog(n) term). This improves over the best previously known bound of Õ(n/k) [Klauck et al., SODA 2015] and is optimal (up to a polylogarithmic factor) in light of an existing lower bound of Ω̃(n/k²). Our improved algorithm uses several techniques, including linear graph sketching, that prove useful in the design of efficient distributed graph algorithms. Using the connectivity algorithm as a building block, we then present fast randomized algorithms for minimum spanning trees, (approximate) min-cuts, and many graph verification problems. All these algorithms take Õ(n/k²) rounds and are optimal up to polylogarithmic factors. We also show an almost matching lower bound of Ω̃(n/k²) rounds for many graph verification problems by leveraging lower bounds in random-partition communication complexity.
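    The appeal of linear graph sketching is that per-node summaries can be combined, and edges internal to a group of nodes cancel out. The toy below illustrates only that linearity property and is not the algorithm of the article: each node stores the XOR of the identifiers of its incident edges, and XOR-ing the summaries of a node set leaves the XOR of the edges leaving the set. It recovers an outgoing edge only when exactly one edge crosses the cut; practical sketches (see the sampling results in [1], [11], and [19]) add randomization to cope with many crossing edges. All function names are hypothetical.

```python
from functools import reduce


def edge_id(u, v, n):
    """Encode the undirected edge {u, v} as a nonzero integer (assumes u != v)."""
    a, b = min(u, v), max(u, v)
    return a * n + b + 1


def decode_edge(eid, n):
    """Invert edge_id for a graph on n nodes."""
    return divmod(eid - 1, n)


def node_sketches(n, edges):
    """Per-node summary: XOR of the identifiers of all incident edges."""
    sketch = [0] * n
    for u, v in edges:
        eid = edge_id(u, v, n)
        sketch[u] ^= eid
        sketch[v] ^= eid
    return sketch


def merge(sketch, nodes):
    """Combine summaries of a node set; edges inside the set cancel out."""
    return reduce(lambda acc, u: acc ^ sketch[u], nodes, 0)


if __name__ == "__main__":
    n = 6
    # Two triangles joined by the single bridge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    sketch = node_sketches(n, edges)
    print(decode_edge(merge(sketch, [0, 1, 2]), n))  # (2, 3)
```

    Because merging is a plain XOR, the summaries of a component's nodes can be combined in any order and on any machine, which is what makes sketches convenient in a distributed setting.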

    References

    [1]
    Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Analyzing graph structure via linear measurements. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 459--467.
    [2]
    Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM Symposium on Principles of Database Systems (PODS). 5--14.
    [3]
    Noga Alon, László Babai, and Alon Itai. 1986. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms 7, 4 (1986), 567--583.
    [4]
    Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. 2012. Space-efficient local computation algorithms. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1132--1139.
    [5]
    Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, and Sriram V. Pemmaraju. 2018. Near-optimal clustering in the k-machine model. In Proceedings of the 19th International Conference on Distributed Computing and Networking (ICDCN).
    [6]
    Otakar Borůvka. 1926. O jistém problému minimálním (about a certain minimal problem). Práce Mor. Prírodoved. Spol. v Brne III 3 (1926).
    [7]
    Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. 2015. Algebraic methods in the congested clique. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 143--152.
    [8]
    Jen-Yeu Chen and Gopal Pandurangan. 2012. Almost-optimal gossip-based aggregate computation. SIAM Journal on Computing 41, 3 (2012), 455--483.
    [9]
    Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815.
    [10]
    Fan Chung and Olivia Simpson. 2015. Distributed algorithms for finding local clusters using heat kernel PageRank. In Proceedings of the 12th Workshop on Algorithms and Models for the Web-graph (WAW). 177--189.
    [11]
    Graham Cormode and Donatella Firmani. 2014. A unifying framework for ℓ0-sampling algorithms. Distributed and Parallel Databases 32, 3 (2014), 315--335.
    [12]
    Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. 2012. Distributed verification and hardness of distributed approximation. SIAM Journal on Computing 41, 5 (2012), 1235--1265.
    [13]
    Andrew Drucker, Fabian Kuhn, and Rotem Oshman. 2014. On the power of the congested clique model. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 367--376.
    [14]
    Michael Elkin, Hartmut Klauck, Danupon Nanongkai, and Gopal Pandurangan. 2014. Can quantum communication speed up distributed computation? In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 166--175.
    [15]
    Robert G. Gallager, Pierre A. Humblet, and Philip M. Spira. 1983. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and Systems 5, 1 (1983), 66--77.
    [16]
    Mohsen Ghaffari and Fabian Kuhn. 2013. Distributed minimum cut approximation. In Proceedings of the 27th International Symposium on Distributed Computing (DISC). 1--15.
    [17]
    Mohsen Ghaffari and Merav Parter. 2016. MST in log-star rounds of congested clique. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC). 19--28.
    [18]
    James W. Hegeman, Gopal Pandurangan, Sriram V. Pemmaraju, Vivek B. Sardeshmukh, and Michele Scquizzato. 2015. Toward optimal bounds in the congested clique: Graph connectivity and MST. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 91--100.
    [19]
    Hossein Jowhari, Mert Saglam, and Gábor Tardos. 2011. Tight bounds for samplers, finding duplicates in streams, and related problems. In Proceedings of the 30th ACM Symposium on Principles of Database Systems (PODS). 49--58.
    [20]
    David R. Karger. 1994. Random sampling in cut, flow, and network design problems. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC). 648--657.
    [21]
    David R. Karger, Philip N. Klein, and Robert E. Tarjan. 1995. A randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM 42, 2 (1995), 321--328.
    [22]
    Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 938--948.
    [23]
    Valerie King, Shay Kutten, and Mikkel Thorup. 2015. Construction and impromptu repair of an MST in a distributed network with o(m) communication. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (PODC). 71--80.
    [24]
    Hartmut Klauck, Danupon Nanongkai, Gopal Pandurangan, and Peter Robinson. 2015. Distributed computation of large-scale graph problems. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 391--410.
    [25]
    Eyal Kushilevitz and Noam Nisan. 1997. Communication Complexity. Cambridge University Press.
    [26]
    Shay Kutten, Gopal Pandurangan, David Peleg, Peter Robinson, and Amitabh Trehan. 2015. Sublinear bounds for randomized leader election. Theoretical Computer Science 561 (2015), 134--143.
    [27]
    Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. 2011. Filtering: A method for solving graph problems in MapReduce. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 85--94.
    [28]
    Christoph Lenzen. 2013. Optimal deterministic routing and sorting on the congested clique. In Proceedings of the 32nd ACM Symposium on Principles of Distributed Computing (PODC). 42--50.
    [29]
    Christoph Lenzen and Roger Wattenhofer. 2016. Tight bounds for parallel randomized load balancing. Distributed Computing 29, 2 (2016), 127--142.
    [30]
    Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.
    [31]
    Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. 2005. Minimum-weight spanning tree construction in O(log log n) communication rounds. SIAM Journal on Computing 35, 1 (2005), 120--131.
    [32]
    Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann Publishers Inc.
    [33]
    Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM International Conference on Management of Data (SIGMOD). 135--146.
    [34]
    Andrew McGregor. 2014. Graph stream algorithms: A survey. SIGMOD Record 43, 1 (2014), 9--20.
    [35]
    Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
    [36]
    Danupon Nanongkai. 2014. Distributed approximation algorithms for weighted shortest paths. In Proceedings of the 46th ACM Symposium on Theory of Computing (STOC). 565--573.
    [37]
    Danupon Nanongkai, Atish Das Sarma, and Gopal Pandurangan. 2011. A tight unconditional lower bound on distributed random walk computation. In Proceedings of the 30th Annual ACM Symposium on Principles of Distributed Computing (PODC). 257--266.
    [38]
    Rotem Oshman. 2014. Communication complexity lower bounds in distributed message-passing. In Proceedings of the 21st International Colloquium on Structural Information and Communication Complexity (SIROCCO). 14--17.
    [39]
    Gopal Pandurangan, David Peleg, and Michele Scquizzato. 2016. Message lower bounds via efficient network synchronization. In Proceedings of the 23rd International Colloquium on Structural Information and Communication Complexity (SIROCCO). 75--91.
    [40]
    Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2016. Fast distributed algorithms for connectivity and MST in large graphs. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 429--438.
    [41]
    Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2017. A time- and message-optimal distributed algorithm for minimum spanning trees. In Proceedings of the 49th Annual ACM Symposium on the Theory of Computing (STOC). 743--756.
    [42]
    Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2018. On the distributed complexity of large-scale graph computations. In Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 405--414.
    [43]
    David Peleg. 2000. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics.
    [44]
    Sriram V. Pemmaraju and Vivek B. Sardeshmukh. 2016. Super-fast MST algorithms in the congested clique using o(m) messages. In Proceedings of the 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). 47:1--47:15.
    [45]
    Judy Qiu, Shantenu Jha, Andre Luckow, and Geoffrey C. Fox. 2014. Towards HPC-ABDS: An initial high-performance big data stack. Retrieved from http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf.
    [46]
    Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1287--1301.
    [47]
    Ramakrishna Thurimella. 1997. Sub-linear distributed algorithms for sparse certificates and biconnected components. Journal of Algorithms 23, 1 (1997), 160--179.
    [48]
    Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From “think like a vertex” to “think like a graph.” PVLDB 7, 3 (2013), 193--204.
    [49]
    Leslie G. Valiant. 1982. A scheme for fast parallel communication. SIAM Journal on Computing 11, 2 (1982), 350--361.
    [50]
    Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8 (1990), 103--111.
    [51]
    Sergei Vassilvitskii. 2015. Models for Parallel Computation (A Hitchhikers’ Guide to Massively Parallel Universes). Retrieved from http://grigory.us/blog/massively-parallel-universes/.
    [52]
    David P. Woodruff and Qin Zhang. 2017. When distributed computation is communication expensive. Distributed Computing 30, 5 (2017), 309--323.


    Published In

    ACM Transactions on Parallel Computing  Volume 5, Issue 1
    Special Issue on SPAA 2016
    March 2018
    140 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/3232649
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 June 2018
    Accepted: 01 January 2018
    Revised: 01 October 2017
    Received: 01 November 2016
    Published in TOPC Volume 5, Issue 1


    Author Tags

    1. Distributed graph algorithms
    2. graph connectivity
    3. graph sketching
    4. minimum spanning trees

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSF
    • US-Israel Binational Science Foundation

    Cited By

    • (2024) Balanced parallel triangle enumeration with an adaptive algorithm. Distributed and Parallel Databases 42:1 (103-141). DOI: 10.1007/s10619-023-07437-x. Online publication date: 1-Mar-2024.
    • (2023) A key review on graph data science: The power of graphs in scientific studies. Chemometrics and Intelligent Laboratory Systems 240 (104896). DOI: 10.1016/j.chemolab.2023.104896. Online publication date: Sep-2023.
    • (2022) Equivalence classes and conditional hardness in massively parallel computations. Distributed Computing 35:2 (165-183). DOI: 10.1007/s00446-021-00418-2. Online publication date: 1-Apr-2022.
    • (2021) On the Distributed Complexity of Large-Scale Graph Computations. ACM Transactions on Parallel Computing 8:2 (1-28). DOI: 10.1145/3460900. Online publication date: 15-Jul-2021.
    • (2021) Efficient Distributed Algorithms in the k-machine model via PRAM Simulations. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (223-232). DOI: 10.1109/IPDPS49936.2021.00031. Online publication date: May-2021.
    • (2020) How Fast Can You Update Your MST? Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (531-533). DOI: 10.1145/3350755.3400240. Online publication date: 6-Jul-2020.
    • (2020) Parallel algorithms for finding connected components using linear algebra. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.04.009. Online publication date: May-2020.
    • (2019) LACC: A Linear-Algebraic Algorithm for Finding Connected Components in Distributed Memory. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2-12). DOI: 10.1109/IPDPS.2019.00012. Online publication date: May-2019.
    • (2018) On the Distributed Complexity of Large-Scale Graph Computations. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures (405-414). DOI: 10.1145/3210377.3210409. Online publication date: 11-Jul-2018.
    • (2018) Message lower bounds via efficient network synchronization. Theoretical Computer Science. DOI: 10.1016/j.tcs.2018.11.017. Online publication date: Nov-2018.
