Research article
DOI: 10.1145/1944862.1944868

GLOpenCL: OpenCL support on hardware- and software-managed cache multicores

Published: 24 January 2011

Abstract

OpenCL is an industry-supported standard for writing programs that execute on multicore platforms as well as on accelerators, such as GPUs or the SPEs of the Cell B.E. In this paper we introduce GLOpenCL, a unified development framework that supports OpenCL on both homogeneous, shared-memory multicores and heterogeneous, distributed-memory multicores. The framework consists of a compiler, based on the LLVM compiler infrastructure, and a run-time library, which share the same basic architecture across all target platforms. The compiler recognizes OpenCL constructs, performs source-to-source code transformations targeting both efficiency and semantic correctness, and adds calls to the run-time library. The latter offers functionality for work creation, management and execution, as well as for data transfers. We evaluate our framework using benchmarks from the distributions of OpenCL implementations by hardware vendors. We find that our generic system performs comparably to, or better than, customized, platform-specific vendor distributions. OpenCL is designed and marketed as a write-once, run-anywhere software development framework. However, the standard leaves enough room for target-platform-specific optimizations. Our experimentation with different, customized implementations of kernels reveals that optimized, hardware-mapped implementations are both possible and necessary in the context of OpenCL, especially on non-conventional multicores, if performance is considered a higher priority than programmability.


Published In

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
January 2011
226 pages
ISBN: 9781450302418
DOI: 10.1145/1944862

Sponsors

  • HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. OpenCL
  2. compilers
  3. hardware-managed cache multicores
  4. runtime
  5. software-managed cache multicores

Qualifiers

  • Research-article

Conference

HiPEAC '11
Sponsor:
  • HiPEAC


Cited By

  • (2017) PolyPC: Polymorphic parallel computing framework on embedded reconfigurable system. 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pages 1-8. DOI: 10.23919/FPL.2017.8056770. Online publication date: Sep-2017.
  • (2017) Broken-Karatsuba multiplication and its application to Montgomery modular multiplication. 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pages 1-4. DOI: 10.23919/FPL.2017.8056769. Online publication date: Sep-2017.
  • (2015) Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer. The Computer Journal 58:11, pages 3057-3073. DOI: 10.1093/comjnl/bxv038. Online publication date: 2-Jun-2015.
  • (2014) Writing Self-adaptive Codes for Heterogeneous Systems. Euro-Par 2014 Parallel Processing, pages 800-811. DOI: 10.1007/978-3-319-09873-9_67. Online publication date: 2014.
  • (2012) An OpenCL Runtime Library for Embedded Multi-Core Accelerator. Proceedings of the 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 419-422. DOI: 10.1109/RTCSA.2012.67. Online publication date: 19-Aug-2012.
  • (2011) Massively parallel programming models used as hardware description languages. Proceedings of the International Conference on Computer-Aided Design, pages 326-333. DOI: 10.5555/2132325.2132409. Online publication date: 7-Nov-2011.
  • (2011) Massively parallel programming models used as hardware description languages. Proceedings of the 2011 IEEE/ACM International Conference on Computer-Aided Design, pages 326-333. DOI: 10.1109/ICCAD.2011.6105349. Online publication date: 7-Nov-2011.
  • (2011) Synthesis of Platform Architectures from OpenCL Programs. Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, pages 186-193. DOI: 10.1109/FCCM.2011.19. Online publication date: 1-May-2011.
