DOI: 10.1145/1595655.1595662

Multi-target C++ implementation of parallel skeletons

Published: 07 July 2009

Abstract

This paper presents the design of an efficient multi-target (CPU+GPU) implementation of the Parallel_for skeleton. Emerging massively parallel architectures promise very high performance at low cost. However, these architectures change faster than ever, so optimizing codes for them becomes a very complex and time-consuming task. We have identified data storage as the main difference between the CPU and the GPU implementations of a code. We introduce an abstract data layout in order to adapt the data storage to the target. Based on this layout, the Parallel_for skeleton makes it possible to compile and execute the same program on both CPU and GPU. Once compiled, the program runs close to the hardware limits.
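To make the idea of an abstract data layout behind a Parallel_for skeleton more concrete, here is a minimal C++ sketch. It is not the paper's implementation: the names `ArrayOfStructs`, `StructOfArrays`, `Field`, `parallel_for` and `doubled_sum` are hypothetical, the choice of array-of-structs versus struct-of-arrays as the two layouts is an assumption, and the skeleton is a sequential stand-in for a real CPU or GPU back end. It only shows how the same loop body can be compiled against two storage orders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies: each maps (element i, component c)
// to a linear offset in a flat buffer.
struct ArrayOfStructs {  // components of one element are adjacent
    static std::size_t offset(std::size_t i, std::size_t c,
                              std::size_t /*n*/, std::size_t ncomp) {
        return i * ncomp + c;
    }
};
struct StructOfArrays {  // one component of all elements is adjacent
    static std::size_t offset(std::size_t i, std::size_t c,
                              std::size_t n, std::size_t /*ncomp*/) {
        return c * n + i;
    }
};

// Container whose storage order is chosen at compile time by Layout.
template <class Layout>
class Field {
public:
    Field(std::size_t n, std::size_t ncomp)
        : n_(n), ncomp_(ncomp), data_(n * ncomp, 0.0f) {}
    float& at(std::size_t i, std::size_t c) {
        return data_[Layout::offset(i, c, n_, ncomp_)];
    }
    std::size_t size() const { return n_; }
private:
    std::size_t n_, ncomp_;
    std::vector<float> data_;
};

// Sequential stand-in for the skeleton (assumption): a real back end
// would distribute the iterations over CPU threads or GPU threads.
template <class Body>
void parallel_for(std::size_t begin, std::size_t end, Body body) {
    for (std::size_t i = begin; i < end; ++i) body(i);
}

// The same user code, written once, works for either layout.
template <class Layout>
float doubled_sum(std::size_t n) {
    Field<Layout> field(n, 2);
    parallel_for(std::size_t{0}, n, [&field](std::size_t i) {
        field.at(i, 0) = static_cast<float>(i);
        field.at(i, 1) = 2.0f * field.at(i, 0);
    });
    float total = 0.0f;
    for (std::size_t i = 0; i < field.size(); ++i) total += field.at(i, 1);
    return total;
}
```

Calling `doubled_sum<ArrayOfStructs>(n)` and `doubled_sum<StructOfArrays>(n)` runs the identical loop body over two different memory orders and gives the same numerical result. Only the `Layout` template argument changes, and that is the kind of separation the abstract describes.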

Published In

POOSC '09: Proceedings of the 8th workshop on Parallel/High-Performance Object-Oriented Scientific Computing
July 2009
85 pages
ISBN: 9781605585475
DOI: 10.1145/1595655
Conference Chair: Kei Davis
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. C++ templates
  2. Nvidia CUDA
  3. data layout
  4. Intel TBB
  5. parallel computing
  6. parallel skeletons

Qualifiers

  • Research-article

Conference

ECOOP '09


Cited By

  • (2018) Task-Based Programming on Emerging Parallel Architectures for Finite-Differences Seismic Numerical Kernel. Euro-Par 2018: Parallel Processing, pp. 764-777. DOI: 10.1007/978-3-319-96983-1_54. Online publication date: 1-Aug-2018.
  • (2017) Skeleton Programming for Portable Many‐Core Computing. Programming multi‐core and many‐core computing systems, pp. 121-141. DOI: 10.1002/9781119332015.ch6. Online publication date: 27-Jan-2017.
  • (2011) Auto-tuning SkePU. Proceedings of the 4th International Workshop on Multicore Software Engineering, pp. 25-32. DOI: 10.1145/1984693.1984697. Online publication date: 21-May-2011.
  • (2010) SkePU. Proceedings of the fourth international workshop on High-level parallel programming and applications, pp. 5-14. DOI: 10.1145/1863482.1863487. Online publication date: 25-Sep-2010.
  • (2010) Multi-Target vectorization with MTPS c++ generic library. Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, pp. 336-346. DOI: 10.1007/978-3-642-28145-7_33. Online publication date: 6-Jun-2010.
