PPL: an abstract runtime system for hybrid parallel programming

Published: 15 November 2015
DOI: 10.1145/2832241.2832246

Abstract

Hardware trends indicate that supercomputers will see rapidly growing intra-node parallelism. To cope with this evolution, future programming models will need to carefully manage the interaction between inter- and intra-node parallelism. Many existing programming models expose both levels of parallelism, but they do not scale well as per-node thread counts rise: interoperability between threading and communication is limited, leading to avoidable software overheads and redundant communication. Addressing this requires understanding the limitations of current models and developing new approaches.
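To make the interoperability problem concrete, the following is a minimal hybrid MPI+threads sketch, assumed for illustration and not taken from the paper: each MPI rank spawns several C++11 threads that all issue MPI calls, which requires the MPI_THREAD_MULTIPLE support level and is exactly the regime where rising per-node thread counts expose contention inside the communication library. The thread count and message pattern are arbitrary placeholders.

    // Minimal hybrid MPI+threads sketch (illustrative assumption, not from the paper):
    // several C++11 threads per rank issue MPI calls concurrently.
    #include <mpi.h>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main(int argc, char** argv) {
        int provided = 0;
        // Request full multithreaded support; many MPI implementations serialize
        // internally at this level, which is part of the overhead discussed above.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            std::fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int threads_per_rank = 4;  // hypothetical intra-node thread count
        std::vector<std::thread> workers;
        for (int t = 0; t < threads_per_rank; ++t) {
            workers.emplace_back([=] {
                // Each thread exchanges one message with a neighboring rank; all
                // threads funnel through the same MPI progress engine and locks.
                int send = rank * threads_per_rank + t, recv = -1;
                int dst = (rank + 1) % size;
                int src = (rank - 1 + size) % size;
                MPI_Sendrecv(&send, 1, MPI_INT, dst, /*tag=*/t,
                             &recv, 1, MPI_INT, src, /*tag=*/t,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            });
        }
        for (auto& w : workers) w.join();

        MPI_Finalize();
        return 0;
    }

Reducing the cost of exactly this kind of interleaving between the threading and communication layers is the motivation for the runtime design described next.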
We propose a new runtime system design, PPL, which abstracts the important high-level concepts of a typical parallel system for distributed-memory machines. Modularizing these elements allows individual layers to be tested to better understand the needs of future programming models. We present the design and a development implementation of PPL in C++11 and evaluate the performance of several different module implementations through micro-benchmarks and three applications: Barnes-Hut, Monte Carlo particle tracking, and a sparse triangular solver.
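The sketch below illustrates the kind of modular layering the abstract describes; the interface names (CommLayer, ThreadLayer, StdThreadLayer, Runtime) are hypothetical stand-ins, not PPL's actual API. The point is only that when the runtime programs against small abstract interfaces, alternative module implementations can be swapped in and benchmarked independently.

    // Hypothetical C++11 layering sketch (not PPL's actual interfaces): the
    // runtime depends only on abstract communication and threading layers, so
    // concrete modules can be exchanged and measured one at a time.
    #include <cstddef>
    #include <functional>
    #include <memory>
    #include <thread>
    #include <vector>

    // Abstract communication layer; a concrete module might wrap MPI, GASNet,
    // or a vendor RDMA library.
    struct CommLayer {
        virtual ~CommLayer() = default;
        virtual int  rank() const = 0;
        virtual int  size() const = 0;
        virtual void put(int target, const void* buf, std::size_t bytes) = 0;
        virtual void barrier() = 0;
    };

    // Abstract intra-node threading layer; a concrete module might wrap
    // std::thread, a task pool, or user-level threads.
    struct ThreadLayer {
        virtual ~ThreadLayer() = default;
        virtual void parallel_for(std::size_t n,
                                  const std::function<void(std::size_t)>& body) = 0;
    };

    // A trivial std::thread-based module, included only to keep the sketch
    // self-contained and runnable.
    struct StdThreadLayer : ThreadLayer {
        void parallel_for(std::size_t n,
                          const std::function<void(std::size_t)>& body) override {
            std::vector<std::thread> ts;
            for (std::size_t i = 0; i < n; ++i) ts.emplace_back(body, i);
            for (auto& t : ts) t.join();
        }
    };

    // The runtime composes whichever modules were selected; benchmarking each
    // composition is the kind of experiment the abstract alludes to.
    struct Runtime {
        std::unique_ptr<CommLayer>   comm;     // e.g. an MPI- or RDMA-backed module
        std::unique_ptr<ThreadLayer> threads;  // e.g. StdThreadLayer above
    };

    int main() {
        StdThreadLayer threads;
        std::vector<int> out(4, 0);
        threads.parallel_for(out.size(),
                             [&](std::size_t i) { out[i] = static_cast<int>(i); });
        return (out[3] == 3) ? 0 : 1;
    }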


Cited By

  • Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems. 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC), pages 1-13, November 2018. DOI: 10.1109/MLHPC.2018.8638639
  • Towards millions of communicating threads. Proceedings of the 23rd European MPI Users' Group Meeting, pages 1-14, September 2016. DOI: 10.1145/2966884.2966914

Published In

ESPM '15: Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware
November 2015
58 pages
ISBN:9781450339964
DOI:10.1145/2832241

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. PGAS
  2. RDMA
  3. distributed-memory parallelism
  4. multithreading
  5. one-sided communication
  6. programming models

Qualifiers

  • Research-article

Funding Sources

  • NSF (National Science Foundation)
  • Sandia National Laboratories

Conference

SC15

Acceptance Rates

ESPM '15 Paper Acceptance Rate: 5 of 10 submissions (50%)
Overall Acceptance Rate: 5 of 10 submissions (50%)

