Extended Abstract
DOI: 10.1145/3204919.3204935

OpenCL Optimization and Best Practices for Qualcomm Adreno GPUs

Published: 14 May 2018

Abstract

As the industry's leading mobile graphics processing unit (GPU) core, the Adreno™ GPU in Qualcomm®'s Snapdragon™ SoCs has supported the OpenCL™ standard since the A3x family, through the A4x and A5x families, and into the latest A6x family. How to effectively program and optimize OpenCL applications on Adreno GPUs is of great interest to many OEMs as well as third-party application developers. This paper provides a high-level overview of Adreno's compute architecture, introduces Adreno's OpenCL support along with general guidance and good practices for programming, optimization, and profiling, and illustrates how to apply them to achieve good performance through two case studies.
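
To give a concrete flavor of the kind of general guidance referred to above, the sketch below is a minimal, illustrative OpenCL C example, not taken from the paper itself, assuming a simple element-wise scaling workload. It contrasts a scalar kernel with a float4-vectorized one; wider, vectorized loads and stores are one of the generally recommended practices in Qualcomm's publicly available Adreno OpenCL programming guide. The kernel names and parameters are hypothetical.

    // Illustrative sketch only; kernel names and parameters are hypothetical.
    // Scalar version: each work-item loads and stores a single float.
    __kernel void scale_scalar(__global const float *in,
                               __global float *out,
                               const float alpha)
    {
        const size_t gid = get_global_id(0);
        out[gid] = alpha * in[gid];
    }

    // Vectorized version: each work-item processes four floats at once,
    // issuing fewer, wider memory transactions.
    __kernel void scale_vec4(__global const float4 *in,
                             __global float4 *out,
                             const float alpha)
    {
        const size_t gid = get_global_id(0);
        out[gid] = alpha * in[gid];  // scalar * float4 is valid OpenCL C
    }

On the host side, the vectorized kernel would be enqueued with a global work size of N/4 for an N-element buffer (assuming N is a multiple of 4), so the same data is covered by a quarter of the work-items.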

Published In

IWOCL '18: Proceedings of the International Workshop on OpenCL
May 2018
108 pages
ISBN: 9781450364393
DOI: 10.1145/3204919
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

  • Huawei Technologies Co. Ltd.
  • Khronos Group
  • Xilinx Inc.
  • Codeplay Software Ltd.
  • Intel
  • The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPGPU
  3. GPU
  4. GPU Optimization
  5. OpenCL

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

IWOCL '18: International Workshop on OpenCL
May 14-16, 2018
Oxford, United Kingdom

Acceptance Rates

IWOCL '18 paper acceptance rate: 16 of 33 submissions (48%)
Overall acceptance rate: 84 of 152 submissions (55%)

Cited By

  • (2021) An efficient GPU-accelerated inference engine for binary neural network on mobile phones. Journal of Systems Architecture 117, 102156. DOI: 10.1016/j.sysarc.2021.102156. Online publication date: Aug 2021.
  • (2020) PhoneBit. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 786-791. DOI: 10.5555/3408352.3408531. Online publication date: 9 Mar 2020.
  • (2020) PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 786-791. DOI: 10.23919/DATE48585.2020.9116236. Online publication date: Mar 2020.
  • (2019) Towards Algebraic Modeling of GPU Memory Access for Bank Conflict Mitigation. 2019 IEEE International Workshop on Signal Processing Systems (SiPS), 103-108. DOI: 10.1109/SiPS47522.2019.9020385. Online publication date: Oct 2019.
  • (2019) Demo: Accelerating Depth-Map on Mobile Device Using CPU-GPU Co-processing. Computer Analysis of Images and Patterns, 75-86. DOI: 10.1007/978-3-030-29888-3_7. Online publication date: 22 Aug 2019.
