Extended Abstract
DOI: 10.1145/3204919.3204935

OpenCL Optimization and Best Practices for Qualcomm Adreno GPUs

Published: 14 May 2018

Abstract

As the industry's leading mobile graphics processing unit (GPU) core, the Adreno™ GPU in Qualcomm®'s Snapdragon™ SoCs has supported the OpenCL™ standard since the A3x family, through the A4x and A5x families, and into the latest A6x family. How to effectively program and optimize OpenCL applications on Adreno GPUs is of great interest to many OEMs as well as third-party application developers. This paper provides a high-level overview of Adreno's compute architecture, introduces Adreno's OpenCL support along with general guidance and good practices for programming, optimization, and profiling, and illustrates how to apply them to achieve good performance through two case studies.
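
To give a concrete flavor of the kind of general guidance referred to above, the sketch below is a minimal, illustrative OpenCL C example, not taken from the paper itself, assuming a simple element-wise scaling workload. It contrasts a scalar kernel with a float4-vectorized one; wider, vectorized loads and stores are one of the generally recommended practices in Qualcomm's publicly available Adreno OpenCL programming guide. The kernel names and parameters are hypothetical.

    // Illustrative sketch only; kernel names and parameters are hypothetical.
    // Scalar version: each work-item loads and stores a single float.
    __kernel void scale_scalar(__global const float *in,
                               __global float *out,
                               const float alpha)
    {
        const size_t gid = get_global_id(0);
        out[gid] = alpha * in[gid];
    }

    // Vectorized version: each work-item processes four floats at once,
    // issuing fewer, wider memory transactions.
    __kernel void scale_vec4(__global const float4 *in,
                             __global float4 *out,
                             const float alpha)
    {
        const size_t gid = get_global_id(0);
        out[gid] = alpha * in[gid];  // scalar * float4 is valid OpenCL C
    }

On the host side, the vectorized kernel would be enqueued with a global work size of N/4 for an N-element buffer (assuming N is a multiple of 4), so the same data is covered by a quarter of the work-items.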

Published In

IWOCL '18: Proceedings of the International Workshop on OpenCL
May 2018
108 pages
ISBN: 9781450364393
DOI: 10.1145/3204919
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

  • Huawei Technologies Co. Ltd.
  • Khronos Group
  • Xilinx Inc.
  • Codeplay Software Ltd.
  • Intel
  • The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CUDA
  2. GPGPU
  3. GPU
  4. GPU Optimization
  5. OpenCL

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

IWOCL '18: International Workshop on OpenCL
May 14-16, 2018
Oxford, United Kingdom

Acceptance Rates

IWOCL '18 paper acceptance rate: 16 of 33 submissions (48%)
Overall acceptance rate: 84 of 152 submissions (55%)

Cited By

  • (2021) An efficient GPU-accelerated inference engine for binary neural network on mobile phones. Journal of Systems Architecture 117, 102156. DOI: 10.1016/j.sysarc.2021.102156. Online publication date: Aug 2021.
  • (2020) PhoneBit. Proceedings of the 23rd Conference on Design, Automation and Test in Europe, 786-791. DOI: 10.5555/3408352.3408531. Online publication date: 9 Mar 2020.
  • (2020) PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 786-791. DOI: 10.23919/DATE48585.2020.9116236. Online publication date: Mar 2020.
  • (2019) Towards Algebraic Modeling of GPU Memory Access for Bank Conflict Mitigation. 2019 IEEE International Workshop on Signal Processing Systems (SiPS), 103-108. DOI: 10.1109/SiPS47522.2019.9020385. Online publication date: Oct 2019.
  • (2019) Demo: Accelerating Depth-Map on Mobile Device Using CPU-GPU Co-processing. Computer Analysis of Images and Patterns, 75-86. DOI: 10.1007/978-3-030-29888-3_7. Online publication date: 22 Aug 2019.
