DOI: 10.1145/2833157.2833161
research-article

Integrating GPU support for OpenMP offloading directives into Clang

Published: 15 November 2015

Abstract

The LLVM community is currently developing OpenMP 4.1 support, consisting of software improvements for Clang and new runtime libraries. OpenMP 4.1 includes offloading constructs that permit execution of user-selected regions on generic devices external to the main host processor. This paper describes our ongoing work towards delivering support for OpenMP offloading constructs for the OpenPower system in the LLVM compiler infrastructure. We previously introduced a design for a control loop scheme necessary to implement the OpenMP generic offloading model on NVIDIA GPUs. In this paper we show how we integrated the control loop into Clang while containing its complexity, by limiting its support to OpenMP-related functionality. We also summarize the results of performance analysis on benchmarks and a complex application kernel. Finally, we show an optimization in the Clang code generation scheme for specific code patterns, an alternative to the control loop, which delivers improved performance.
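For readers unfamiliar with the offloading constructs the abstract refers to, the following is a minimal sketch of a user-selected region marked for execution on a device. It is not taken from the paper: the function name `saxpy_offload` and its parameters are hypothetical, and the combined `target teams distribute parallel for` construct with `map` clauses is assumed to be available as defined in OpenMP 4.0 and later. A compiler built without offloading support ignores the pragma and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the paper: computes y[i] = a * x[i] + y[i]. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* The target construct marks the loop as a region to run on the default
     * device. The map clauses copy x to the device before the region and
     * copy y to the device before it and back to the host afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A loop of this shape, with no serial code between the start of the target region and the parallel loop, is one plausible reading of the kind of "specific code pattern" for which the abstract says an alternative to the control loop can be generated; the abstract itself does not list the patterns.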

Published In

LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC
November 2015
74 pages
ISBN: 9781450340052
DOI: 10.1145/2833157
Conference Chair: Hal Finkel
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • LLNS

Conference

SC15
Acceptance Rates

LLVM '15 Paper Acceptance Rate 7 of 12 submissions, 58%;
Overall Acceptance Rate 16 of 22 submissions, 73%

Cited By

  • (2024) Towards an Optimized Heterogeneous Distributed Task Scheduler in OpenMP Cluster. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1894-1903. DOI: 10.1109/SCW63240.2024.00239. Online publication date: 17-Nov-2024
  • (2024) Study and Evaluation for Adopting Environmental Adaptation of Low-Resource Devices. IEEE Access 12, pp. 110447-110456. DOI: 10.1109/ACCESS.2024.3440918. Online publication date: 2024
  • (2024) Study and evaluation of automatic division of general-purpose programs to facilitate addition of user functions. International Journal of Parallel, Emergent and Distributed Systems, pp. 1-12. DOI: 10.1080/17445760.2024.2375650. Online publication date: 9-Aug-2024
  • (2024) Study and evaluation of automatic offloading for function blocks of applications. Automatika 65:1, pp. 387-400. DOI: 10.1080/00051144.2024.2301888. Online publication date: 9-Jan-2024
  • (2023) Specialized Kernels for Optimizing GPU Offload in OpenMP. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1918-1928. DOI: 10.1145/3624062.3624605. Online publication date: 12-Nov-2023
  • (2023) Maximizing Parallelism and GPU Utilization For Direct GPU Compilation Through Ensemble Execution. Proceedings of the 52nd International Conference on Parallel Processing Workshops, pp. 112-118. DOI: 10.1145/3605731.3606016. Online publication date: 7-Aug-2023
  • (2023) Implementing OpenMP’s SIMD Directive in LLVM’s GPU Runtime. Proceedings of the 52nd International Conference on Parallel Processing, pp. 173-182. DOI: 10.1145/3605573.3605640. Online publication date: 7-Aug-2023
  • (2023) Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks. Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 60-69. DOI: 10.1145/3582514.3582523. Online publication date: 25-Feb-2023
  • (2023) Proposal and Evaluation of GPU Offloading Parts Reconfiguration During Applications Operations for Environment Adaptation. Journal of Network and Systems Management 32:1. DOI: 10.1007/s10922-023-09789-2. Online publication date: 28-Nov-2023
  • (2023) Exploring the Limits of Generic Code Execution on GPUs via Direct (OpenMP) Offload. OpenMP: Advanced Task-Based, Device and Compiler Programming, pp. 179-192. DOI: 10.1007/978-3-031-40744-4_12. Online publication date: 1-Sep-2023
