research-article · DOI: 10.1145/2664666.2664675

Generating OpenCL C kernels from OpenACC

Published: 12 May 2014 Publication History

Abstract

Hardware accelerators are now a common way to improve the performance of compute nodes. This improvement has a cost: applications must be rewritten to take advantage of the new hardware. OpenACC is a set of compiler directives for targeting hardware accelerators with minimal modification of the original application. In this paper, we present the generation of OpenCL C kernels from OpenACC-annotated code. We introduce a method to produce multiple kernels for each OpenACC compute region.
We evaluate these kernels on different hardware accelerators (NVIDIA GPU, Intel MIC). Finally, we show that the produced kernels perform differently on different accelerators. Hence, this method produces a tuning space in which we can search for the best kernel version for a given accelerator.
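
To make the idea of several kernels per compute region concrete, here is a minimal sketch in C. It is an illustration only: the abstract does not show the generated code, so the `saxpy` function, the two kernel strings (`saxpy_kernel_v1`, `saxpy_kernel_v2`), and the two iteration-mapping strategies are assumptions chosen for this example, not the output of the authors' compiler.

```c
#include <stddef.h>

/* OpenACC-annotated source loop (hypothetical example).
   A compiler without OpenACC support ignores the pragma and runs the
   loop sequentially on the host, so the function stays portable. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Assumed variant 1: one work-item per loop iteration.
   The bounds check handles a global size rounded up past n. */
const char saxpy_kernel_v1[] =
    "__kernel void saxpy_v1(const uint n, const float a,\n"
    "                       __global const float *x,\n"
    "                       __global float *y)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    if (i < n)\n"
    "        y[i] = a * x[i] + y[i];\n"
    "}\n";

/* Assumed variant 2: each work-item strides over several iterations,
   so the global size can be chosen independently of n. */
const char saxpy_kernel_v2[] =
    "__kernel void saxpy_v2(const uint n, const float a,\n"
    "                       __global const float *x,\n"
    "                       __global float *y)\n"
    "{\n"
    "    for (size_t i = get_global_id(0); i < n;\n"
    "         i += get_global_size(0))\n"
    "        y[i] = a * x[i] + y[i];\n"
    "}\n";
```

Both kernel strings compute the same result as the annotated loop but map iterations to work-items differently. Which mapping runs faster may depend on the accelerator, which is the kind of choice a tuning space of kernel versions is meant to expose.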

Cited By

  • (2017) Directive-based tile abstraction to distribute loops on accelerators. Proceedings of the General Purpose GPUs, pp. 53-62. DOI: 10.1145/3038228.3038238. Online publication date: 4-Feb-2017
  • (2015) The Approximate Discrete Radon Transform. Proceedings of the 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, pp. 219-226. DOI: 10.1109/MCSoC.2015.38. Online publication date: 23-Sep-2015

Published In

IWOCL '14: Proceedings of the International Workshop on OpenCL 2013 & 2014
May 2014
86 pages
ISBN: 9781450330077
DOI: 10.1145/2664666
Sponsors

  • ARM
  • AMD
  • SAMSUNG
  • Khronos Group
  • Xilinx Inc.
  • Qualcomm Inc.
  • Codeplay Software Ltd.
  • Intel
  • StreamComputing BV
  • Lithe Technology
  • The University of Bristol
  • Altera Corporation
  • ArrayFire
  • Imagination Technologies Limited

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OpenACC
  2. OpenCL
  3. execution model
  4. performance
  5. portability

Qualifiers

  • Research-article

Conference

IWOCL '14
IWOCL '14: International Workshop on OpenCL 2013 & 2014
May 12 - 13, 2014
Bristol, United Kingdom

Acceptance Rates

Overall Acceptance Rate 84 of 152 submissions, 55%
