DOI: 10.1145/2832241.2832244
Research Article | Public Access

Higher-level parallelization for local and distributed asynchronous task-based programming

Published: 15 November 2015

Abstract

One of the biggest challenges on the way to exascale computing is programmability in the context of performance portability. Efficiently utilizing the prospective architectures of exascale supercomputers will be challenging in many ways, largely because of a massive increase in on-node parallelism and increasingly complex memory hierarchies. Parallel programming models need to be able to express algorithms in ways that exploit these architectural peculiarities. The recent revival of interest in the C++ language across industry and the wider community has spurred a remarkable number of standardization proposals and technical specifications. Among these efforts are proposals for seamlessly integrating various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous execution flows, continuation-style computation, and explicit fork-join control flow of independent and non-homogeneous code paths. These proposals form the foundation of a powerful high-level abstraction that allows C++ codes to deal with the ever-increasing architectural complexity of recent hardware developments.
In this paper, we present the results of developing these higher-level parallelization facilities in HPX, a general-purpose C++ runtime system for applications of any scale. The developed higher-level parallelization APIs have been designed to overcome the limitations of today's prevalent programming models in C++ codes. HPX exposes a uniform higher-level API that gives the application programmer syntactic and semantic equivalence of various types of on-node and off-node parallelism, all of which are well integrated into the C++ type system. We show that these higher-level facilities, which are fully aligned with modern C++ programming concepts, are easily extensible, fully generic, and enable highly efficient parallelization on par with or better than existing equivalent applications based on OpenMP and/or MPI.
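
As a concrete illustration of the style of API described above, the minimal sketch below shows how iterative parallelism (a parallel algorithm with an execution policy) and task-based, continuation-style parallelism (futures composed with .then()) are expressed in HPX. The header and namespace spellings here follow recent HPX releases and therefore differ from the HPX 0.9.x version evaluated in the paper; the snippet is an illustrative sketch, not code taken from the paper.

    // Illustrative sketch only: header/namespace spellings follow recent HPX
    // releases and differ from the HPX 0.9.x version the paper evaluates.
    #include <hpx/hpx_main.hpp>   // HPX starts its runtime and then calls main()
    #include <hpx/algorithm.hpp>  // parallel algorithms (hpx::for_each, ...)
    #include <hpx/execution.hpp>  // execution policies (hpx::execution::par, ...)
    #include <hpx/future.hpp>     // hpx::async, hpx::future, continuations

    #include <iostream>
    #include <numeric>
    #include <vector>

    int main()
    {
        std::vector<double> v(1'000'000, 1.0);

        // Iterative parallel execution: a parallel algorithm with an execution
        // policy, mirroring the C++ Parallelism TS interface.
        hpx::for_each(hpx::execution::par, v.begin(), v.end(),
                      [](double& x) { x *= 2.0; });

        // Task-based, asynchronous execution: async returns a future that is
        // composed with a continuation instead of blocking the caller.
        hpx::future<double> sum = hpx::async(
            [&v] { return std::accumulate(v.begin(), v.end(), 0.0); });

        hpx::future<void> done = sum.then([](hpx::future<double> f) {
            std::cout << "sum = " << f.get() << "\n";
        });

        done.get();  // wait for the continuation to complete
        return 0;
    }

The same constructs map onto the emerging standard C++ facilities for parallelism and concurrency, which is the syntactic and semantic equivalence the abstract refers to.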




Published In

ESPM '15: Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware
November 2015
58 pages
ISBN:9781450339964
DOI:10.1145/2832241


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2015


Author Tags

  1. HPX
  2. distributed asynchronous computing
  3. task-based parallelism

Qualifiers

  • Research-article


Conference

SC15

Acceptance Rates

ESPM '15 paper acceptance rate: 5 of 10 submissions (50%)
Overall acceptance rate: 5 of 10 submissions (50%)


Article Metrics

  • Downloads (last 12 months): 71
  • Downloads (last 6 weeks): 8
Reflects downloads up to 15 Oct 2024

Cited By

  • Towards superior software portability with SHAD and HPX C++ libraries. Proceedings of the 19th ACM International Conference on Computing Frontiers, pp. 251-257, May 2022. DOI: 10.1145/3528416.3530784
  • From Task-Based GPU Work Aggregation to Stellar Mergers: Turning Fine-Grained CPU Tasks into Portable GPU Kernels. 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 89-99, Nov 2022. DOI: 10.1109/P3HPC56579.2022.00014
  • Broad Performance Measurement Support for Asynchronous Multi-Tasking with APEX. 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 20-29, Nov 2022. DOI: 10.1109/ESPM256814.2022.00008
  • From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types. 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 10-19, Nov 2022. DOI: 10.1109/ESPM256814.2022.00007
  • Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 377-386, Jun 2021. DOI: 10.1109/IPDPSW52791.2021.00066
  • Octo-Tiger’s New Hydro Module and Performance Using HPX+CUDA on ORNL’s Summit. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 204-214, Sep 2021. DOI: 10.1109/Cluster48925.2021.00059
  • Performance Analysis of a Quantum Monte Carlo Application on Multiple Hardware Architectures Using the HPX Runtime. 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), pp. 77-84, Nov 2020. DOI: 10.1109/ScalA51936.2020.00015
  • Distributed Asynchronous Array Computing with the JetLag Environment. 2020 IEEE/ACM 9th Workshop on Python for High-Performance and Scientific Computing (PyHPC), pp. 49-57, Nov 2020. DOI: 10.1109/PyHPC51966.2020.00011
  • Towards Distributed Software Resilience in Asynchronous Many-Task Programming Models. 2020 IEEE/ACM 10th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), pp. 11-20, Nov 2020. DOI: 10.1109/FTXS51974.2020.00007
  • Towards a Scalable and Distributed Infrastructure for Deep Learning Applications. 2020 IEEE/ACM Fourth Workshop on Deep Learning on Supercomputers (DLS), pp. 20-30, Nov 2020. DOI: 10.1109/DLS51937.2020.00008
