research-article

A hybrid approach of OpenMP for clusters

Authors:

Rudolf Eigenmann,

Samuel Midkiff

ACM SIGPLAN Notices, Volume 47, Issue 8

Pages 75 - 84

https://doi.org/10.1145/2370036.2145827

Published: 25 February 2012

Abstract

We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs and compare it with that of the MPI, HPF, and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.
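To make the idea of running a shared-address-space loop on a cluster more concrete, the following is a minimal sketch in C. It is not the translation scheme of the paper, whose details the abstract does not give. It only assumes the common approach of block-partitioning the iterations of an OpenMP `parallel for` loop across processes. The names `range_t`, `block_range`, and `stencil_local` are hypothetical and introduced here for illustration only.

```c
#include <stddef.h>

/* Half-open iteration range [lo, hi) assigned to one process.
   (Hypothetical type, for illustration only.) */
typedef struct {
    size_t lo;
    size_t hi;
} range_t;

/* Block-partition n iterations over nprocs processes.
   The first (n % nprocs) ranks receive one extra iteration.
   Assumes nprocs > 0 and rank < nprocs. */
range_t block_range(size_t n, size_t nprocs, size_t rank)
{
    size_t base = n / nprocs;
    size_t extra = n % nprocs;
    range_t r;
    r.lo = rank * base + (rank < extra ? rank : extra);
    r.hi = r.lo + base + (rank < extra ? 1 : 0);
    return r;
}

/* The share of the shared-memory loop
 *
 *     #pragma omp parallel for
 *     for (i = 1; i < n - 1; i++)
 *         out[i] = (in[i-1] + in[i] + in[i+1]) / 3.0;
 *
 * that a single process would execute for its range r.
 * At the edges of the range, in[i-1] and in[i+1] belong to a
 * neighboring range; on a cluster those values would have to be
 * communicated before the sweep. Here the arrays are simply
 * assumed to be fully available. */
void stencil_local(const double *in, double *out, size_t n, range_t r)
{
    if (n < 3)
        return; /* no interior points to update */
    size_t start = r.lo < 1 ? 1 : r.lo;
    size_t end = r.hi > n - 1 ? n - 1 : r.hi;
    for (size_t i = start; i < end; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}
```

Calling `stencil_local` once for each rank's `block_range` reproduces the result of the original loop. The reads that cross a block boundary are the kind of access for which a translation system has to generate communication, which is where data affinity and communication cost become relevant.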

Index Terms

A hybrid approach of OpenMP for clusters
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Published In

ACM SIGPLAN Notices Volume 47, Issue 8

PPOPP '12

August 2012

334 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012
352 pages
ISBN:9781450311601
DOI:10.1145/2145816
General Chair: J. Ramanujam, Louisiana State University, USA
Program Chair: P. Sadayappan, The Ohio State University, USA

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
