
A hybrid approach of OpenMP for clusters

Published: 25 February 2012

Abstract

We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare it with the performance of the MPI, HPF and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.



Information & Contributors

Information

Published In

ACM SIGPLAN Notices, Volume 47, Issue 8
PPOPP '12
August 2012
334 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012
352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 February 2012
Published in SIGPLAN Volume 47, Issue 8


Author Tags

  1. MPI
  2. OpenMP
  3. hybrid
  4. optimization
  5. runtime data flow analysis
  6. runtime environment
  7. translator

Qualifiers

  • Research-article


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 15
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 27 Jan 2025


Citations

Cited By

  • (2019) A constraint-based approach to automatic data partitioning for distributed memory execution. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-24. DOI: 10.1145/3295500.3356199. Online publication date: 17-Nov-2019
  • (2018) Automatic runtime calculation of communications for data-parallel expressions with periodic conditions. Concurrency and Computation: Practice and Experience, 31:5. DOI: 10.1002/cpe.4430. Online publication date: 31-Jan-2018
  • (2023) Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-15. DOI: 10.1145/3581784.3607049. Online publication date: 12-Nov-2023
  • (2020) MENPS: A Decentralized Distributed Shared Memory Exploiting RDMA. 2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM), pages 9-16. DOI: 10.1109/IPDRM51949.2020.00006. Online publication date: Nov-2020
  • (2019) libMPNode. Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores, pages 81-90. DOI: 10.1145/3303084.3309495. Online publication date: 17-Feb-2019
  • (2019) D2P. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-22. DOI: 10.1145/3295500.3356205. Online publication date: 17-Nov-2019
  • (2019) HDArray: Parallel Array Interface for Distributed Heterogeneous Devices. Languages and Compilers for Parallel Computing, pages 176-184. DOI: 10.1007/978-3-030-34627-0_13. Online publication date: 13-Nov-2019
  • (2017) Control replication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12. DOI: 10.1145/3126908.3126949. Online publication date: 12-Nov-2017
  • (2017) A technique to automatically determine Ad-hoc communication patterns at runtime. Parallel Computing, 69:C, pages 45-62. DOI: 10.1016/j.parco.2017.08.009. Online publication date: 1-Nov-2017
  • (2016) IMPACC. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pages 189-201. DOI: 10.1145/2907294.2907302. Online publication date: 31-May-2016
