DOI:10.1145/1531743.1531766

A control-structure splitting optimization for GPGPU

Published: 18 May 2009

Abstract

Control statements in a GPU program, such as loops and branches, pose serious challenges for the efficient use of GPU resources because they serialize threads and consequently reduce the occupancy of the GPU, that is, the number of threads running concurrently. Unlike traditional vector processing units inside a general-purpose processor, the GPU cannot leave control statements to the CPU, because fine-grained statement scheduling between the GPU and the CPU is impossible. We need an effective method to handle control statements in place on the GPU.
In this paper, we propose novel techniques to transform control statements so that they can be executed efficiently on GPUs. Our techniques deliberately increase code redundancy, which might be deemed "de-optimization" on a CPU, to improve the occupancy of a program on the GPU and thereby its performance. We focus on how common programming structures such as loops and branches decrease the occupancy of single kernels and how to counter that effect. We demonstrate our optimizations on a synthetic benchmark and on a complex parallel algorithm, the Lattice Boltzmann Method (LBM). Our results show that these techniques are effective, leading to an increase in occupancy and a drastic improvement in performance compared to the non-split versions of the programs.
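The idea of trading code redundancy for occupancy can be illustrated with a small host-side model. The sketch below is not the paper's implementation: the function names (`fused_kernel`, `kernel_a`, `kernel_b`, `resident_threads`), the arithmetic in each branch, and the register and thread limits are all hypothetical, chosen only to show that a kernel containing a branch can be replaced by one kernel per branch without changing the result, and that a lower per-thread register count lets more threads stay resident under a register-limited occupancy model.

```python
# Toy host-side model of branch splitting. Every name and number below is an
# illustrative assumption, not something taken from the paper.

def fused_kernel(tid, data):
    """Single kernel that contains both branches."""
    x = data[tid]
    if x >= 0:
        return x * 2       # path A
    return -x + 1          # path B


def kernel_a(tid, data):
    """Split kernel holding only path A; other threads produce nothing."""
    x = data[tid]
    return x * 2 if x >= 0 else None


def kernel_b(tid, data):
    """Split kernel holding only path B; other threads produce nothing."""
    x = data[tid]
    return -x + 1 if x < 0 else None


def launch(kernel, data):
    """Run the kernel once per thread index, like a one-dimensional grid."""
    return [kernel(tid, data) for tid in range(len(data))]


def run_fused(data):
    return launch(fused_kernel, data)


def run_split(data):
    """Launch the two split kernels in turn and merge their outputs."""
    out = launch(kernel_a, data)
    for tid, value in enumerate(launch(kernel_b, data)):
        if value is not None:
            out[tid] = value
    return out


def resident_threads(regs_per_thread, regs_per_sm=8192, max_threads=768):
    """Threads that fit on one multiprocessor when registers are the limit.

    The default limits are hypothetical placeholders.
    """
    return min(max_threads, regs_per_sm // regs_per_thread)


if __name__ == "__main__":
    data = [3, -2, 0, -7]
    print(run_fused(data) == run_split(data))
    # Hypothetical register counts for a fused kernel and a split kernel.
    print(resident_threads(32), resident_threads(16))
```

The split version repeats the load of `data[tid]` and the test on `x` in both kernels, which is the kind of redundancy that would count as a de-optimization on a CPU. Whether splitting actually lowers register use on a given GPU depends on the compiler and the kernel, so the register counts passed to `resident_threads` should be read as placeholders.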

Published In

CF '09: Proceedings of the 6th ACM conference on Computing frontiers
May 2009
238 pages
ISBN:9781605584133
DOI:10.1145/1531743
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cuda
  2. gpgpu
  3. optimizations

Qualifiers

  • Research-article

Conference

CF '09: Computing Frontiers Conference
May 18 - 20, 2009
Ischia, Italy

Acceptance Rates

CF '09 paper acceptance rate: 26 of 113 submissions (23%)
Overall acceptance rate: 273 of 785 submissions (35%)

Cited By

  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys 55:11 (1-81). DOI:10.1145/3570638. Online publication date: 16-Mar-2023
  • (2023) Reducing branch divergence to speed up parallel execution of unit testing on GPUs. The Journal of Supercomputing 79:16 (18340-18374). DOI:10.1007/s11227-023-05375-0. Online publication date: 13-May-2023
  • (2021) Pointer-Based Divergence Analysis for OpenCL 2.0 Programs. ACM Transactions on Parallel Computing 8:4 (1-23). DOI:10.1145/3470644. Online publication date: 15-Oct-2021
  • (2021) Contrived and Remediated GPU Thread Divergence Using a Flattening Technique. Advances in Parallel & Distributed Processing, and Applications (647-658). DOI:10.1007/978-3-030-69984-0_46. Online publication date: 19-Oct-2021
  • (2018) Using program branch probability for the thread parallelisation of branch divergence on the CUDA platform. International Journal of Autonomous and Adaptive Communications Systems 11:2 (171-191). DOI:10.1504/IJAACS.2018.092031. Online publication date: 1-Jan-2018
  • (2018) Unraveling the Divergence of GPU Threads. 2018 International Conference on Computational Science and Computational Intelligence (CSCI) (1398-1403). DOI:10.1109/CSCI46756.2018.00270. Online publication date: Dec-2018
  • (2016) GPU-accelerated string matching for database applications. The VLDB Journal — The International Journal on Very Large Data Bases 25:5 (719-740). DOI:10.1007/s00778-015-0409-y. Online publication date: 1-Oct-2016
  • (2015) A Profile Guided Approach to Optimize Branch Divergence While Transforming Applications for GPUs. Proceedings of the 8th India Software Engineering Conference (176-185). DOI:10.1145/2723742.2723760. Online publication date: 18-Feb-2015
  • (2015) Fusion of Calling Sites. 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (90-97). DOI:10.1109/SBAC-PAD.2015.16. Online publication date: Oct-2015
  • (2015) Algorithm Flattening: Complete branch elimination for GPU requires a paradigm shift from CPU thinking. 2015 IEEE High Performance Extreme Computing Conference (HPEC) (1-6). DOI:10.1109/HPEC.2015.7322477. Online publication date: Sep-2015
