DOI: 10.1145/1772690.1772715
research-article

Max-cover in map-reduce

Published: 26 April 2010

Abstract

The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of their union. This classic problem arises in many web search and advertising settings. For moderately sized instances, a greedy algorithm gives a (1-1/e)-approximation. However, the greedy algorithm must update the scores of arbitrary elements after each step, and hence becomes intractable for large datasets.
We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy algorithm.
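
For orientation, the sequential greedy baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of the textbook greedy heuristic, not the MapReduce algorithm the paper proposes; the function name `greedy_max_cover` and the toy input are assumptions made for the example.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_cover(sets: Dict[Hashable, Set], k: int) -> List[Hashable]:
    """Return up to k set ids, chosen greedily by marginal coverage."""
    covered: Set = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is untouched
    for _ in range(min(k, len(sets))):
        # Re-score every remaining set against the current cover. This
        # per-step rescoring is the cost that grows painful on large inputs.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy instance: four sets over the elements 1..7.
    toy = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    print(greedy_max_cover(toy, 2))  # ['c', 'a']
```

Each round scans all remaining sets, so the work per step depends on the whole collection; that dependence is what motivates a formulation needing only a few passes over the data.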

    Published In

    WWW '10: Proceedings of the 19th International Conference on World Wide Web
    April 2010
    1407 pages
    ISBN: 9781605587998
    DOI: 10.1145/1772690

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. greedy algorithm
    2. map-reduce
    3. maximum cover

    Qualifiers

    • Research-article

    Conference

    WWW '10
    WWW '10: The 19th International World Wide Web Conference
    April 26 - 30, 2010
    Raleigh, North Carolina, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 25
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 09 Nov 2024

    Cited By

    • (2022) The Limitations of Optimization from Samples. Journal of the ACM 69:3 (1-33). DOI: 10.1145/3511018. Online publication date: 11-Jun-2022
    • (2022) A master-apprentice evolutionary algorithm for maximum weighted set K-covering problem. Applied Intelligence 53:2 (1912-1944). DOI: 10.1007/s10489-022-03531-2. Online publication date: 4-May-2022
    • (2020) Gene-Similarity Normalization in a Genetic Algorithm for the Maximum k-Coverage Problem. Mathematics 8:4 (513). DOI: 10.3390/math8040513. Online publication date: 2-Apr-2020
    • (2020) Distributed Submodular Minimization and Motion Planning Over Discrete State Space. IEEE Transactions on Control of Network Systems 7:2 (932-943). DOI: 10.1109/TCNS.2019.2933993. Online publication date: Jun-2020
    • (2019) Tight Trade-offs for the Maximum k-Coverage Problem in the General Streaming Model. Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (200-217). DOI: 10.1145/3294052.3319691. Online publication date: 25-Jun-2019
    • (2019) Streaming algorithm for maximizing a monotone non-submodular function under d-knapsack constraint. Optimization Letters. DOI: 10.1007/s11590-019-01430-z. Online publication date: 6-May-2019
    • (2018) Non-monotone submodular maximization in exponentially fewer iterations. Proceedings of the 32nd International Conference on Neural Information Processing Systems (2359-2370). DOI: 10.5555/3327144.3327162. Online publication date: 3-Dec-2018
    • (2018) Set cover in sub-linear time. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (2467-2486). DOI: 10.5555/3174304.3175463. Online publication date: 7-Jan-2018
    • (2018) Tight bounds on the round complexity of the distributed maximum coverage problem. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (2412-2431). DOI: 10.5555/3174304.3175460. Online publication date: 7-Jan-2018
    • (2018) Optimal Distributed Submodular Optimization via Sketching. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (1138-1147). DOI: 10.1145/3219819.3220081. Online publication date: 19-Jul-2018