DOI: 10.1145/1454115.1454152
research-article

Mars: a MapReduce framework on graphics processors

Published: 25 October 2008

    Abstract

    We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of development of web search applications on a large number of commodity CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth, but are harder to program since their architectures are designed as a special-purpose co-processor and their programming interfaces are typically for graphics applications. As the first attempt to harness GPU's power for MapReduce, we developed Mars on an NVIDIA G80 GPU, which contains over one hundred processors, and evaluated it in comparison with Phoenix, the state-of-the-art MapReduce framework on multi-core CPUs. Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface. It is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine.
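    The "simple and familiar MapReduce interface" mentioned in the abstract can be illustrated with a small sequential sketch. This is not Mars's API and does not run on a GPU; the names `map_reduce`, `word_count_map`, and `word_count_reduce` are hypothetical, chosen only to show the shape of the programming model: the user supplies a map function and a reduce function, and the framework handles grouping and scheduling.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Sequential sketch of the MapReduce model (hypothetical helper, not Mars)."""
    # Map stage: every record is processed independently and emits
    # zero or more (key, value) pairs. This independence is what a
    # parallel framework exploits.
    intermediate = []
    for record in records:
        intermediate.extend(map_fn(record))

    # Group stage: collect all values that share the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce stage: each key's values are combined independently.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    # Sum the partial counts emitted for one word.
    return sum(counts)


documents = ["GPU map reduce", "map on the GPU"]
counts = map_reduce(documents, word_count_map, word_count_reduce)
print(counts)
```

    In this sketch the user writes only the two small functions; a framework in the style the abstract describes would run the map and reduce stages across many processors instead of in Python loops.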

    Published In

    PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
    October 2008
    328 pages
    ISBN:9781605582825
    DOI:10.1145/1454115
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPGPU
    2. MapReduce
    3. data parallelism
    4. graphics processor
    5. multi-core processors
    6. web analysis

    Qualifiers

    • Research-article

    Conference

    PACT '08
    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Upcoming Conference

    PACT '24

    Cited By

    • (2023) Heterogeneous Hadoop Cluster-Based Image Processing Workload Distribution Framework between CPU and GPU. Scientific Programming, 2023. DOI: 10.1155/2023/1228614. Online publication date: 1-Jan-2023
    • (2023) Scoped Buffered Persistency Model for GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 688-701. DOI: 10.1145/3575693.3575749. Online publication date: 27-Jan-2023
    • (2023) NUBA: Non-Uniform Bandwidth GPUs. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 544-559. DOI: 10.1145/3575693.3575745. Online publication date: 27-Jan-2023
    • (2023) SGSI – A Scalable GPU-Friendly Subgraph Isomorphism Algorithm. IEEE Transactions on Knowledge and Data Engineering, 35:11, pages 11899-11916. DOI: 10.1109/TKDE.2022.3230744. Online publication date: 1-Nov-2023
    • (2023) FASI: FPGA-friendly Subgraph Isomorphism on Massive Graphs. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2099-2112. DOI: 10.1109/ICDE55515.2023.00163. Online publication date: Apr-2023
    • (2023) Gaiwan. Science of Computer Programming, 230:C. DOI: 10.1016/j.scico.2023.102989. Online publication date: 1-Aug-2023
    • (2023) Cluster-aware scheduling in multitasking GPUs. Real-Time Systems, 60:1, pages 1-23. DOI: 10.1007/s11241-023-09409-x. Online publication date: 22-Nov-2023
    • (2022) OSM: Off-Chip Shared Memory for GPUs. IEEE Transactions on Parallel and Distributed Systems, 33:12, pages 3415-3429. DOI: 10.1109/TPDS.2022.3154315. Online publication date: 1-Dec-2022
    • (2022) A Survey of GPU Multitasking Methods Supported by Hardware Architecture. IEEE Transactions on Parallel and Distributed Systems, 33:6, pages 1451-1463. DOI: 10.1109/TPDS.2021.3115630. Online publication date: 1-Jun-2022
    • (2022) Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning. Journal of Parallel and Distributed Computing, 164, pages 40-54. DOI: 10.1016/j.jpdc.2022.02.004. Online publication date: Jun-2022
