research-article

Generating OpenCL C kernels from OpenACC

Authors:

Tristan Vanderbruggen,

John Cavazos

IWOCL '14: Proceedings of the International Workshop on OpenCL 2013 & 2014

Article No.: 9, Pages 1 - 10

https://doi.org/10.1145/2664666.2664675

Published: 12 May 2014

Abstract

Hardware accelerators are now a common way to improve the performance of compute nodes. This improvement comes at a cost: applications must be rewritten to take advantage of the new hardware. OpenACC is a set of compiler directives that target hardware accelerators with minimal modification of the original application. In this paper, we present the generation of OpenCL C kernels from OpenACC-annotated code. We introduce a method that produces multiple kernels for each OpenACC compute region.
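
To make the idea of several kernels per compute region concrete, the following C sketch shows a small OpenACC compute region (a SAXPY loop) and two ways it could be lowered to kernels. It is not the authors' generator or its output. The OpenCL NDRange is emulated on the host by a plain loop over work-item ids (the role get_global_id(0) plays in OpenCL C), and the names kernel_v1, kernel_v2, launch_v1 and launch_v2, the work-group size of 64, and the chunked mapping are hypothetical choices made for illustration.

```c
/* Original OpenACC compute region. Compilers without OpenACC support
 * ignore the pragma (possibly with a warning) and run the loop serially. */
void saxpy_acc(int n, float a, const float *x, float *y) {
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical variant 1: one loop iteration per work-item; `gid` plays
 * the role of get_global_id(0) in an OpenCL C kernel. */
static void kernel_v1(int gid, int n, float a, const float *x, float *y) {
    if (gid < n)                       /* guard: global size may be padded */
        y[gid] = a * x[gid] + y[gid];
}

/* Hypothetical variant 2: each work-item handles a contiguous chunk of
 * iterations; the last chunks may be shorter or empty. */
static void kernel_v2(int gid, int num_items, int n, float a,
                      const float *x, float *y) {
    int chunk = (n + num_items - 1) / num_items;   /* ceil(n / num_items) */
    int begin = gid * chunk;
    int end = begin + chunk < n ? begin + chunk : n;
    for (int i = begin; i < end; i++)
        y[i] = a * x[i] + y[i];
}

/* Host-side emulation of an NDRange launch: visit every work-item id in
 * turn. The global size is rounded up to a multiple of an assumed
 * work-group size of 64. */
void launch_v1(int n, float a, const float *x, float *y) {
    int global_size = (n + 63) / 64 * 64;
    for (int gid = 0; gid < global_size; gid++)
        kernel_v1(gid, n, a, x, y);
}

/* Emulated launch of variant 2 with `num_items` work-items (must be > 0). */
void launch_v2(int n, float a, const float *x, float *y, int num_items) {
    for (int gid = 0; gid < num_items; gid++)
        kernel_v2(gid, num_items, n, a, x, y);
}
```

Calling saxpy_acc, launch_v1 and launch_v2 (with, say, 8 work-items) on identical inputs should leave identical results in y. The two variants differ only in how loop iterations are assigned to work-items, which is the kind of choice that can favor one accelerator over another.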

We evaluate these kernels on different hardware accelerators (NVIDIA GPU, Intel MIC) and show that the produced kernels perform differently on different accelerators. Hence, this method produces a tuning space in which we can search for the best kernel version for a given accelerator.
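
Searching such a tuning space can be sketched as an empirical loop: run each candidate, reject any whose result differs from a serial reference, and keep the fastest. The C sketch below assumes a sum reduction lowered as per-work-item partial sums followed by a final combine, and treats the number of emulated work-items as the only tuning parameter. The names sum_variant and pick_best_variant are hypothetical, and host-side clock() timing stands in for timing real kernels on a device.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel variant: a sum reduction lowered as `num_items`
 * (> 0) emulated work-items, each accumulating a contiguous chunk into
 * its own partial sum, followed by a final combine of the partials. */
static long long sum_variant(const int *a, int n, int num_items) {
    long long *partial = calloc((size_t)num_items, sizeof *partial);
    if (partial == NULL)
        abort();                                   /* out of memory */
    int chunk = (n + num_items - 1) / num_items;   /* iterations per work-item */
    for (int gid = 0; gid < num_items; gid++) {    /* one "work-item" per gid */
        int begin = gid * chunk;
        int end = begin + chunk < n ? begin + chunk : n;
        for (int i = begin; i < end; i++)
            partial[gid] += a[i];
    }
    long long total = 0;
    for (int gid = 0; gid < num_items; gid++)      /* final combine step */
        total += partial[gid];
    free(partial);
    return total;
}

/* Empirical search: time each candidate work-item count, discard any
 * variant whose result differs from the serial reference, and return the
 * index of the fastest remaining candidate (or -1 if none is valid). */
int pick_best_variant(const int *a, int n, const int *candidates,
                      int num_candidates, int repeats) {
    long long reference = 0;
    for (int i = 0; i < n; i++)
        reference += a[i];

    int best = -1;
    double best_time = 0.0;
    for (int c = 0; c < num_candidates; c++) {
        if (candidates[c] <= 0)
            continue;                              /* not a valid configuration */
        long long result = 0;
        clock_t start = clock();
        for (int r = 0; r < repeats; r++)
            result = sum_variant(a, n, candidates[c]);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (result != reference)
            continue;                              /* reject incorrect variants */
        if (best < 0 || elapsed < best_time) {
            best = c;
            best_time = elapsed;
        }
    }
    return best;
}
```

With candidates such as {1, 4, 16, 64}, pick_best_variant returns the index of the fastest correct configuration, or -1 if none passes the check. On real hardware the timing would more plausibly come from the device (for example, OpenCL event profiling) than from the host clock.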

Cited By

Vanderbruggen T, Cavazos J, Liao C, Quinlan D (2017). Directive-based tile abstraction to distribute loops on accelerators. Proceedings of the General Purpose GPUs, pp. 53-62. DOI: 10.1145/3038228.3038238. Online publication date: 4-Feb-2017.
https://dl.acm.org/doi/10.1145/3038228.3038238

Bücker H, Seidler R, Neuhäuser D, Beier T (2015). The Approximate Discrete Radon Transform. Proceedings of the 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, pp. 219-226. DOI: 10.1109/MCSoC.2015.38. Online publication date: 23-Sep-2015.
https://dl.acm.org/doi/10.1109/MCSoC.2015.38

Published In

IWOCL '14: Proceedings of the International Workshop on OpenCL 2013 & 2014

May 2014

86 pages

ISBN: 9781450330077

DOI: 10.1145/2664666

General Chairs:
Simon McIntosh-Smith, University of Bristol, UK
Ben Bergen, Los Alamos National Laboratory

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

ARM
AMD
SAMSUNG
Khronos Group
Xilinx Inc.
Qualcomm Inc.
Codeplay Software Ltd.
Intel
StreamComputing BV
Lithe Technology
The University of Bristol
Altera Corporation
ArrayFire
Imagination Technologies Limited

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IWOCL '14

IWOCL '14: International Workshop on OpenCL 2013 & 2014

May 12 - 13, 2014

Bristol, United Kingdom

Acceptance Rates

Overall Acceptance Rate 84 of 152 submissions, 55%

Bibliometrics

Total Citations: 2
Total Downloads: 170
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025
