research-article

GLOpenCL: OpenCL support on hardware- and software-managed cache multicores

Authors: Konstantis Daloukas, Christos D. Antonopoulos, Nikolaos Bellas

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

Pages 15 - 24

https://doi.org/10.1145/1944862.1944868

Published: 24 January 2011

Abstract

OpenCL is an industry-supported standard for writing programs that execute on multicore platforms as well as on accelerators, such as GPUs or the SPEs of the Cell B.E. In this paper we introduce GLOpenCL, a unified development framework that supports OpenCL on both homogeneous, shared-memory multicores and heterogeneous, distributed-memory multicores. The framework consists of a compiler, based on the LLVM compiler infrastructure, and a run-time library, sharing the same basic architecture across all target platforms. The compiler recognizes OpenCL constructs, performs source-to-source code transformations targeting both efficiency and semantic correctness, and adds calls to the run-time library. The latter offers functionality for work creation, management and execution, as well as for data transfers. We evaluate our framework using benchmarks from the distributions of OpenCL implementations by hardware vendors. We find that our generic system performs comparably to or better than customized, platform-specific vendor distributions. OpenCL is designed and marketed as a write-once, run-anywhere software development framework. However, the standard leaves enough room for target-platform-specific optimizations. Our experimentation with different, customized implementations of kernels reveals that optimized, hardware-mapped implementations are both possible and necessary in the context of OpenCL -- especially on non-conventional multicores -- if performance is considered a higher priority than programmability.
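
The abstract describes a split between a compiler that rewrites kernels and a run-time library that creates and executes work. The abstract does not detail GLOpenCL's actual transformations or API, so the Python sketch below only models the general idea: a kernel body written per work-item is executed on a CPU by looping over the work-items of each work-group. All names (`vec_add_kernel`, `run_work_group`, `enqueue_nd_range`) are hypothetical and chosen for illustration. Kernels that use barriers would need extra handling, which this sketch omits.

```python
# Illustrative model only: NOT GLOpenCL's compiler output or run-time API.
from typing import Callable, List, Sequence


def vec_add_kernel(gid: int, a: Sequence[float], b: Sequence[float],
                   out: List[float]) -> None:
    """Work done by a single work-item; gid stands in for get_global_id(0)."""
    out[gid] = a[gid] + b[gid]


def run_work_group(kernel: Callable[..., None], group_id: int,
                   local_size: int, *args) -> None:
    """Run every work-item of one work-group sequentially on one core.

    The loop models the work-item serialization that a source-to-source
    compiler could wrap around a kernel body when targeting CPUs.
    """
    first_gid = group_id * local_size
    for local_id in range(local_size):
        kernel(first_gid + local_id, *args)


def enqueue_nd_range(kernel: Callable[..., None], global_size: int,
                     local_size: int, *args) -> int:
    """Hypothetical run-time entry point for a 1-D index space.

    Splits the index space into work-groups, runs each group, and returns
    the number of groups executed. A real run-time could hand the groups
    to worker threads instead of running them in order.
    """
    if local_size <= 0 or global_size % local_size != 0:
        # Assumption of this sketch: the global size must be a whole
        # number of work-groups.
        raise ValueError("global_size must be a multiple of local_size")
    num_groups = global_size // local_size
    for group_id in range(num_groups):
        run_work_group(kernel, group_id, local_size, *args)
    return num_groups


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    out = [0.0] * len(a)
    groups = enqueue_nd_range(vec_add_kernel, len(a), 2, a, b, out)
    print(groups, out)
```

Treating the work-group as the unit of scheduling keeps the per-item overhead to a loop iteration, which is one plausible reason a CPU-oriented implementation would serialize work-items rather than map each one to a thread.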

Index Terms

GLOpenCL: OpenCL support on hardware- and software-managed cache multicores
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
      2. Source code generation
    2. General programming languages
      1. Language types
        Parallel programming languages

Published In

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

January 2011

226 pages

ISBN:9781450302418

DOI:10.1145/1944862

General Chairs: Manolis Katevenis (FORTH-ICS and U.Crete, Greece), Margaret Martonosi (Princeton University)
Program Chairs: Christos Kozyrakis (Stanford University), Olivier Temam (INRIA, France)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HIPEAC '11

Sponsor:

HiPEAC

HIPEAC '11: International Conference on High-Performance and Embedded Architectures and Compilers

January 24 - 26, 2011

Heraklion, Greece


Bibliometrics

Total Citations: 8
Total Downloads: 365
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Ding H, Huang M (2017). PolyPC: Polymorphic parallel computing framework on embedded reconfigurable system. 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8. Online publication date: Sep-2017.
https://doi.org/10.23919/FPL.2017.8056770

Fabeiro J, Andrade D, Fraguela B, Doallo R (2015). Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer. The Computer Journal, 58:11, pp. 3057-3073. Online publication date: 2-Jun-2015.
https://doi.org/10.1093/comjnl/bxv038

Fabeiro J, Andrade D, Fraguela B, Doallo R (2014). Writing Self-adaptive Codes for Heterogeneous Systems. Euro-Par 2014 Parallel Processing, pp. 800-811. Online publication date: 2014.
https://doi.org/10.1007/978-3-319-09873-9_67

Sakamoto R, Sato M, Koizumi Y, Amano H, Namiki M (2012). An OpenCL Runtime Library for Embedded Multi-Core Accelerator. Proceedings of the 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pp. 419-422. Online publication date: 19-Aug-2012.
https://dl.acm.org/doi/10.1109/RTCSA.2012.67

Owaida M, Bellas N, Antonopoulos C, Daloukas K, Antoniadis C, Phillips J, Hu A, Graeb H (2011). Massively parallel programming models used as hardware description languages. Proceedings of the International Conference on Computer-Aided Design, pp. 326-333. Online publication date: 7-Nov-2011.
https://dl.acm.org/doi/10.5555/2132325.2132409

Owaida M, Bellas N, Antonopoulos C, Daloukas K, Antoniadis C (2011). Massively parallel programming models used as hardware description languages. Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, pp. 326-333. Online publication date: 7-Nov-2011.
https://dl.acm.org/doi/10.1109/ICCAD.2011.6105349

Owaida M, Bellas N, Daloukas K, Antonopoulos C (2011). Synthesis of Platform Architectures from OpenCL Programs. Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, pp. 186-193. Online publication date: 1-May-2011.
https://dl.acm.org/doi/10.1109/FCCM.2011.19
