Evaluating Support for OpenMP Offload Features

Authors:

Jose Monsalve Diaz,

Swaroop Pophale,

Kyle Friedline,

Oscar Hernandez,

David E. Bernholdt,

Sunita Chandrasekaran

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing

Article No.: 31, Pages 1 - 10

https://doi.org/10.1145/3229710.3229717

Published: 13 August 2018

Abstract

OpenMP language features have been evolving to keep pace with rapid developments in hardware platforms. DOE applications tend to push the bleeding edge of features ratified in the OpenMP specification and, in doing so, expose rough edges in the implementations of those features. The software harness on DOE supercomputers such as Titan and the upcoming Summit includes the Cray, Clang, Flang, XL, and GCC compilers. It is critical, especially for Summit, that these compilers support OpenMP offloading features. This paper evaluates support for OpenMP 4.5 target offload directives across compiler implementations on Titan and on Summitdev, an early-access system that is one generation removed from Summit's architecture and lets application teams test that architecture. Our tests not only evaluate the OpenMP implementations but also expose ambiguities in the OpenMP 4.5 specification. We also evaluate compiler implementations using kernels extracted from production DOE applications, which helps in assessing how different OpenMP directives interact, independent of other application artifacts. We are aware that the implementations are constantly evolving and are advertised as having only partial OpenMP 4.x support. We see this as a synergistic effort to help identify and correct features that DOE applications require, and to prevent deployment delays later on. Going forward, we also plan to work with standard benchmarking bodies such as SPEC/HPG to donate our tests and mini-apps/kernels for potential inclusion in the next releases of the SPEC OMP and SPEC ACCEL benchmark suites.

Information & Contributors

Information

Published In

ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing

August 2018

409 pages

ISBN: 9781450365239

DOI: 10.1145/3229710

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP '18 Comp

ICPP '18 Comp: 47th International Conference on Parallel Processing Companion

August 13 - 16, 2018

Eugene, OR, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
