Multi-target C++ implementation of parallel skeletons

Authors: Wilfried Kirschenmann, Laurent Plagne, Stephane Vialle

POOSC '09: Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing

Article No.: 7, Pages 1 - 10

https://doi.org/10.1145/1595655.1595662

Published: 07 July 2009

Abstract

This paper presents the design of an efficient multi-target (CPU+GPU) implementation of the Parallel_for skeleton. Emerging massively parallel architectures promise very high performance at low cost. However, these architectures change faster than ever, so optimizing code for them becomes a very complex and time-consuming task. We have identified data storage as the main difference between the CPU and GPU implementations of a code. We introduce an abstract data layout in order to adapt the data storage to the target. Based on this layout, the Parallel_for skeleton makes it possible to compile and execute the same program on both CPU and GPU. Once compiled, the program runs close to the hardware limits.
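The idea of pairing an abstract data layout with a Parallel_for skeleton can be illustrated with a minimal C++ sketch. This is not the paper's library or API: the names `CpuTarget`, `GpuLikeTarget`, `Layout`, `Field`, `parallel_for`, and `fill` are hypothetical, and the skeleton here is a sequential stand-in that only shows how the same user code can address storage that is arranged differently per target (array of structures for the CPU, structure of arrays for a GPU-like target).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target tags; the names are illustrative, not from the paper.
struct CpuTarget {};
struct GpuLikeTarget {};

// Abstract layout: maps a logical (element, component) pair to a flat offset.
template <class Target> struct Layout;

// Assumed CPU choice: array of structures, so the components of one
// element sit next to each other in memory.
template <> struct Layout<CpuTarget> {
    static std::size_t offset(std::size_t i, std::size_t c,
                              std::size_t /*n*/, std::size_t ncomp) {
        return i * ncomp + c;
    }
};

// Assumed GPU-like choice: structure of arrays, so consecutive elements
// of the same component are contiguous.
template <> struct Layout<GpuLikeTarget> {
    static std::size_t offset(std::size_t i, std::size_t c,
                              std::size_t n, std::size_t /*ncomp*/) {
        return c * n + i;
    }
};

// Container whose physical storage order is decided by Layout<Target>.
template <class Target>
class Field {
public:
    Field(std::size_t n, std::size_t ncomp)
        : n_(n), ncomp_(ncomp), data_(n * ncomp, 0.0f) {}

    float& operator()(std::size_t i, std::size_t c) {
        assert(i < n_ && c < ncomp_);
        return data_[Layout<Target>::offset(i, c, n_, ncomp_)];
    }

    const std::vector<float>& raw() const { return data_; }

private:
    std::size_t n_, ncomp_;
    std::vector<float> data_;
};

// Sequential stand-in for the skeleton: applies body to every index in
// [first, last). A real implementation would dispatch to threads or a GPU.
template <class Body>
void parallel_for(std::size_t first, std::size_t last, Body body) {
    for (std::size_t i = first; i < last; ++i) body(i);
}

// User code written once, independent of the storage order.
template <class Target>
Field<Target> fill(std::size_t n) {
    Field<Target> f(n, 2);
    parallel_for(0, n, [&f](std::size_t i) {
        f(i, 0) = static_cast<float>(i);
        f(i, 1) = 10.0f * static_cast<float>(i);
    });
    return f;
}
```

Calling `fill<CpuTarget>(3)` and `fill<GpuLikeTarget>(3)` yields the same logical values through `operator()`, while `raw()` exposes the two different physical orderings. Only the `Layout` specialization changes between targets, which is the separation the abstract describes.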

Index Terms

Multi-target C++ implementation of parallel skeletons
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published In

POOSC '09: Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing

July 2009

85 pages

ISBN:9781605585475

DOI:10.1145/1595655

Conference Chair: Kei Davis, Computer Science for High Performance Computing, Los Alamos National Laboratory, Los Alamos, NM

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ECOOP '09: European Conference on Object-Oriented Programming

July 7, 2009

Genova, Italy
