research-article

Mars: a MapReduce framework on graphics processors

Authors:

Naga K. Govindaraju, and

Tuyong Wang

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

Pages 260 - 269

https://doi.org/10.1145/1454115.1454152

Published: 25 October 2008

Abstract

We design and implement Mars, a MapReduce framework on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google to ease the development of web search applications on a large number of commodity CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth, but they are harder to program because their architectures are designed as special-purpose co-processors and their programming interfaces are typically intended for graphics applications. As the first attempt to harness the power of GPUs for MapReduce, we developed Mars on an NVIDIA G80 GPU, which contains over one hundred processors, and evaluated it against Phoenix, the state-of-the-art MapReduce framework on multi-core CPUs. Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface. It is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine.
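
The MapReduce interface that Mars exposes to programmers can be illustrated with a small sequential sketch. This is not Mars's API, which targets NVIDIA GPUs; the helper names `map_reduce`, `word_count_map`, and `word_count_reduce`, and the sort-based grouping step, are hypothetical choices made only for illustration. The sketch shows the contract a user writes against: a map function that emits key/value pairs for each input record, and a reduce function that combines all values sharing one key. Because map calls are independent of one another, as are reduce calls for distinct keys, a framework is free to run them in parallel behind the same interface.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

I = TypeVar("I")  # input record type
K = TypeVar("K")  # intermediate key type (must be sortable)
V = TypeVar("V")  # intermediate value type
R = TypeVar("R")  # reduce result type


def map_reduce(
    records: Iterable[I],
    map_fn: Callable[[I], Iterable[Tuple[K, V]]],
    reduce_fn: Callable[[K, List[V]], R],
) -> List[Tuple[K, R]]:
    """Sequential sketch of the MapReduce model (hypothetical helper, not Mars's API)."""
    # Map: every record is handled independently, so a parallel runtime
    # could hand each call to a different thread.
    intermediate: List[Tuple[K, V]] = [
        pair for record in records for pair in map_fn(record)
    ]
    # Group: sort by key so all values for one key become adjacent
    # (an assumed, sort-based shuffle step).
    intermediate.sort(key=itemgetter(0))
    # Reduce: one independent call per distinct key.
    return [
        (key, reduce_fn(key, [value for _, value in group]))
        for key, group in groupby(intermediate, key=itemgetter(0))
    ]


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    """Sum the partial counts emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    docs = ["the GPU and the CPU", "the GPU"]
    print(map_reduce(docs, word_count_map, word_count_reduce))
```

Running it as a script prints `[('and', 1), ('cpu', 1), ('gpu', 2), ('the', 3)]`. Word counting is used only because it is the smallest self-contained MapReduce job; it is not claimed to be one of the six applications evaluated in the paper.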



Index Terms

Mars: a MapReduce framework on graphics processors
1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
  2. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Frameworks
      2. Language types
        Parallel programming languages

Recommendations

Mars: Accelerating MapReduce with Graphics Processors

We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm originally proposed by Google, for the ease of large-scale data processing on ...

Clustering billions of data points using GPUs
UCHPC-MAW '09: Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop

In this paper, we report our research on using GPUs to accelerate clustering of very large data sets, which are common in today's real world applications. While many published works have shown that GPUs can be used to accelerate various general purpose ...

GPU-accelerated predicate evaluation on column store
WAIM'10: Proceedings of the 11th international conference on Web-age information management

Column scan, or predicate evaluation and filtering over a column of data in a database table, is an important primitive for data mining and data warehousing. In this paper, we present our study on accelerating column scan using a massively parallel ...


Published In

PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

October 2008

328 pages

ISBN: 9781605582825

DOI: 10.1145/1454115

General Chair:
Andreas Moshovos, University of Toronto, Canada

Program Chairs:
David Tarditi, Microsoft, USA
Kunle Olukotun, Stanford University, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States




Conference

PACT '08

Sponsor:

PACT '08: International Conference on Parallel Architectures and Compilation Techniques

October 25 - 29, 2008

Toronto, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%


Bibliometrics

Total citations: 534
Total downloads: 3,530
Downloads (last 12 months): 61
Downloads (last 6 weeks): 7

Cited By

N. Naz, I. Zada, A. Malik, M. Nadeem, and S. Ali. Heterogeneous Hadoop Cluster-Based Image Processing Workload Distribution Framework between CPU and GPU. Scientific Programming, 2023.
https://dl.acm.org/doi/10.1155/2023/1228614

S. Pandey, A. Kamath, and A. Basu. Scoped Buffered Persistency Model for GPUs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (T. Aamodt, N. Jerger, and M. Swift, editors), pages 688-701, 2023.
https://dl.acm.org/doi/10.1145/3575693.3575749

X. Zhao, M. Jahre, Y. Tang, G. Zhang, and L. Eeckhout. NUBA: Non-Uniform Bandwidth GPUs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (T. Aamodt, N. Jerger, and M. Swift, editors), pages 544-559, 2023.
https://dl.acm.org/doi/10.1145/3575693.3575745

L. Zeng, L. Zou, and M. Özsu. SGSI – A Scalable GPU-Friendly Subgraph Isomorphism Algorithm. IEEE Transactions on Knowledge and Data Engineering, 35(11):11899-11916, 2023.
https://doi.org/10.1109/TKDE.2022.3230744

X. Su, Y. Lin, and L. Zou. FASI: FPGA-friendly Subgraph Isomorphism on Massive Graphs. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 2099-2112, 2023.
https://doi.org/10.1109/ICDE55515.2023.00163

R. Gurdeep Singh and C. Scholliers. Gaiwan. Science of Computer Programming, 230, 2023.
https://dl.acm.org/doi/10.1016/j.scico.2023.102989

X. Zhao, H. Wang, A. Huang, D. Wang, and G. Zhang. Cluster-aware scheduling in multitasking GPUs. Real-Time Systems, 60(1):1-23, 2023.
https://doi.org/10.1007/s11241-023-09409-x

S. Darabi, E. Yousefzadeh-Asl-Miandoab, N. Akbarzadeh, H. Falahati, P. Lotfi-Kamran, M. Sadrosadati, and H. Sarbazi-Azad. OSM: Off-Chip Shared Memory for GPUs. IEEE Transactions on Parallel and Distributed Systems, 33(12):3415-3429, 2022.
https://doi.org/10.1109/TPDS.2022.3154315

C. Zhao, W. Gao, F. Nie, and H. Zhou. A Survey of GPU Multitasking Methods Supported by Hardware Architecture. IEEE Transactions on Parallel and Distributed Systems, 33(6):1451-1463, 2022.
https://doi.org/10.1109/TPDS.2021.3115630

W. Barreiros, A. Melo, J. Kong, R. Ferreira, T. Kurc, J. Saltz, and G. Teodoro. Efficient microscopy image analysis on CPU-GPU systems with cost-aware irregular data partitioning. Journal of Parallel and Distributed Computing, 164:40-54, 2022.
https://doi.org/10.1016/j.jpdc.2022.02.004
