DOI: 10.1145/2676870.2676883
research-article

HPX: A Task Based Programming Model in a Global Address Space

Published: 06 October 2014

Abstract

The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX, a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime-adaptive resource management. This provides a widely accepted API enabling programmability, composability, and performance portability of user applications. By employing a global address space, we seamlessly augment the standard to apply to a distributed case. We present HPX's architecture, design decisions, and results selected from a diverse set of application runs showing superior performance, scalability, and efficiency over conventional practice.
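
As a rough illustration of the task-and-future style of the C++11/14 standard that the abstract says HPX extends, the sketch below uses only the standard library (`std::async` and `std::future`). It is not HPX code and is not taken from the paper; the helper name `parallel_sum` is hypothetical, and the example runs in a single process rather than across a global address space.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (an assumption for illustration, not part of HPX or the paper):
// sums a vector by handing the lower half to an asynchronous task while the
// calling thread sums the upper half.
long long parallel_sum(const std::vector<int>& data) {
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);

    // std::async launches the task and returns a std::future for its result.
    std::future<long long> lower = std::async(std::launch::async, [&data, mid] {
        return std::accumulate(data.begin(), mid, 0LL);
    });

    // The caller keeps working instead of blocking on the task.
    const long long upper = std::accumulate(mid, data.end(), 0LL);

    // get() waits for the task to finish and retrieves its value.
    return lower.get() + upper;
}
```

Calling `parallel_sum` on a vector holding the integers 1 through 100 returns 5050. The point of the sketch is only the programming interface: work is expressed as tasks whose results are consumed through futures, which is the standard-library model the abstract describes as being augmented for the distributed case.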

Published In

PGAS '14: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models
October 2014
199 pages
ISBN: 9781450332477
DOI: 10.1145/2676870
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Exascale
  2. Global Address Space
  3. High Performance Computing
  4. Parallel Runtime Systems
  5. Programming Models

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PGAS '14

Article Metrics

  • Downloads (Last 12 months): 84
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 12 Jan 2025

Cited By

  • (2024) Runtime support for CPU-GPU high-performance computing on distributed memory platforms. Frontiers in High Performance Computing, 2. DOI: 10.3389/fhpcp.2024.1417040. Online publication date: 19-Jul-2024
  • (2024) Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 236-246. DOI: 10.1145/3650200.3656632. Online publication date: 30-May-2024
  • (2024) An Efficient Task-Parallel Pipeline Programming Framework. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 95-106. DOI: 10.1145/3635035.3635037. Online publication date: 18-Jan-2024
  • (2024) Pure: Evolving Message Passing To Better Leverage Shared Memory Within Nodes. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, pp. 133-146. DOI: 10.1145/3627535.3638503. Online publication date: 2-Mar-2024
  • (2024) Accelerating LULESH using HPX - the C++ Standard Library for Parallelism and Concurrency. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, pp. 1-8. DOI: 10.1145/3626203.3670529. Online publication date: 17-Jul-2024
  • (2024) IRIS: A Performance-Portable Framework for Cross-Platform Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems, 35:10, pp. 1796-1809. DOI: 10.1109/TPDS.2024.3429010. Online publication date: Oct-2024
  • (2024) Lamellar: A Rust-based Asynchronous Tasking and PGAS Runtime for High Performance Computing. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1236-1251. DOI: 10.1109/SCW63240.2024.00165. Online publication date: 17-Nov-2024
  • (2024) IRIS-GNN: Leveraging Graph Neural Networks for Scheduling on Truly Heterogeneous Runtime Systems. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1071-1080. DOI: 10.1109/SCW63240.2024.00147. Online publication date: 17-Nov-2024
  • (2024) Introduction to Parallel and Distributed Programming using N-Body Simulations. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 347-354. DOI: 10.1109/SCW63240.2024.00052. Online publication date: 17-Nov-2024
  • (2024) CUDASTF: Bridging the Gap Between CUDA and Task Parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-17. DOI: 10.1109/SC41406.2024.00049. Online publication date: 17-Nov-2024
