research-article

Filtering: a method for solving graph problems in MapReduce

Published: 04 June 2011

Abstract

    The MapReduce framework is currently the de facto standard used throughout both industry and academia for petabyte-scale data analysis. As the input to a typical MapReduce computation is large, one of the key requirements of the framework is that the input cannot be stored on a single machine and must be processed in parallel. In this paper, we describe a general algorithmic design technique in the MapReduce framework called filtering. The main idea behind filtering is to reduce the size of the input in a distributed fashion so that the resulting, much smaller, problem instance can be solved on a single machine. Using this approach, we give new algorithms in the MapReduce framework for a variety of fundamental graph problems on sufficiently dense graphs. Specifically, we present algorithms for minimum spanning trees, maximal matchings, approximate weighted matchings, approximate vertex and edge covers, and minimum cuts. In all of these cases, we parameterize our algorithms by the amount of memory available on the machines, allowing us to show tradeoffs between the memory available and the number of MapReduce rounds. For each setting, we show that even if the machines are given only substantially sublinear memory, our algorithms run in a constant number of MapReduce rounds. To demonstrate the practical viability of our algorithms, we implement the maximal matching algorithm that lies at the core of our analysis and show that it achieves a significant speedup over the sequential version.
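
The filtering idea can be illustrated with a small single-process sketch for maximal matching. This is not the paper's implementation: the function names, the `memory_limit` parameter, the sampling rate, and the use of a plain loop in place of real MapReduce rounds are all assumptions made for illustration. Each loop iteration stands in for one round: sample a subset of edges small enough (in expectation) for one machine, match greedily on the sample, then filter out every edge that touches a matched vertex.

```python
import random


def greedy_maximal_matching(edges, matched):
    """Greedily add edges whose endpoints are both unmatched.

    `matched` is updated in place; the newly chosen edges are returned.
    """
    chosen = []
    for u, v in edges:
        if u not in matched and v not in matched:
            chosen.append((u, v))
            matched.add(u)
            matched.add(v)
    return chosen


def filtering_maximal_matching(edges, memory_limit, seed=0):
    """Sketch of filtering: shrink the edge set until it fits on one machine.

    `memory_limit` (hypothetical parameter) is the number of edges a single
    machine is assumed to hold. Self-loops are dropped up front.
    """
    if memory_limit < 1:
        raise ValueError("memory_limit must be at least 1")
    rng = random.Random(seed)
    remaining = [(u, v) for u, v in edges if u != v]
    matched = set()
    matching = []
    while len(remaining) > memory_limit:
        # Sample so the expected sample size is memory_limit / 2 (an assumed
        # rate; a real job would also guard against oversized samples).
        p = memory_limit / (2 * len(remaining))
        sample = [e for e in remaining if rng.random() < p]
        # "Single machine" step: match greedily on the small sample.
        matching.extend(greedy_maximal_matching(sample, matched))
        # Filter step: drop every edge that touches a matched vertex.
        remaining = [
            (u, v) for u, v in remaining
            if u not in matched and v not in matched
        ]
    # What is left now fits within the limit, so finish on one machine.
    matching.extend(greedy_maximal_matching(remaining, matched))
    return matching
```

For example, `filtering_maximal_matching(edges, memory_limit=50)` on a dense graph returns a list of vertex-disjoint edges such that every input edge shares an endpoint with some chosen edge. In an actual MapReduce job, the sampling and filtering steps would run in parallel across machines, and only the sampled edges would be sent to a single reducer.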

      Published In

      SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
      June 2011
      404 pages
      ISBN: 9781450307437
      DOI: 10.1145/1989493
      In-Cooperation

      • EATCS: European Association for Theoretical Computer Science

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. MapReduce
      2. graph algorithms
      3. matchings

      Qualifiers

      • Research-article

      Conference

      SPAA '11

      Acceptance Rates

      Overall Acceptance Rate 447 of 1,461 submissions, 31%

      Cited By

      • (2024) Streaming Graph Algorithms in the Massively Parallel Computation Model. Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, pages 496-507. DOI: 10.1145/3662158.3662770. Online publication date: 17-Jun-2024
      • (2024) Log Diameter Rounds MST Verification and Sensitivity in MPC. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, pages 269-280. DOI: 10.1145/3626183.3659984. Online publication date: 17-Jun-2024
      • (2024) O(log log n) Passes Is Optimal for Semi-streaming Maximal Independent Set. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 847-858. DOI: 10.1145/3618260.3649763. Online publication date: 10-Jun-2024
      • (2024) Component stability in low-space massively parallel computation. Distributed Computing 37:1, pages 35-64. DOI: 10.1007/s00446-024-00461-9. Online publication date: 8-Feb-2024
      • (2023) A Hierarchical Grouping Algorithm for the Multi-Vehicle Dial-a-Ride Problem. Proceedings of the VLDB Endowment 16:5, pages 1195-1207. DOI: 10.14778/3579075.3579091. Online publication date: 6-Mar-2023
      • (2023) RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches. Genome Biology 24:1. DOI: 10.1186/s13059-023-02961-6. Online publication date: 17-May-2023
      • (2023) Exponentially Faster Massively Parallel Maximal Matching. Journal of the ACM 70:5, pages 1-18. DOI: 10.1145/3617360. Online publication date: 11-Oct-2023
      • (2023) Engineering Massively Parallel MST Algorithms. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 691-701. DOI: 10.1109/IPDPS54959.2023.00075. Online publication date: May-2023
      • (2022) Deterministic massively parallel connectivity. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 162-175. DOI: 10.1145/3519935.3520055. Online publication date: 9-Jun-2022
      • (2022) From Switch Scheduling to Datacenter Scheduling. Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, pages 313-323. DOI: 10.1145/3519270.3538443. Online publication date: 20-Jul-2022
