Power-efficient computing for compute-intensive GPGPU applications

Authors: Syed Zohaib Gilani, Nam Sung Kim, Michael J. Schulte

HPCA '13: Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)

Pages 330 - 341

https://doi.org/10.1109/HPCA.2013.6522330

Published: 23 February 2013

Abstract

The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequencies. However, such approaches significantly increase the power consumption of GPUs, and the resulting power constraint limits further performance gains. To address this challenge, we propose three techniques that improve the power efficiency and performance of GPUs. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an enhanced fused multiply-add unit. Second, we observe that the computations of many instructions are duplicated across multiple threads. We dynamically detect such instructions and execute them in a separate scalar unit. Finally, we observe that 16 or fewer bits are sufficient to represent the operands and results of many instructions accurately. We therefore split the 32-bit datapath into two 16-bit datapath slices that can concurrently issue and execute up to two such instructions per cycle. All three techniques can considerably increase the utilization of compute resources, improving power efficiency and performance by 20% and 15%, respectively.
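The three observations in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design. The function names, the 32-thread warp width, and the choice of a shift followed by an add as the fused instruction pair are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Sequence, Tuple

WARP_SIZE = 32  # assumed warp width; the abstract does not state one


def fused_shift_add(a: int, shift: int, b: int) -> int:
    """Model a dependent pair (t = a << shift; r = t + b) as one multiply-add.

    The shift is rewritten as a multiplication by 2**shift, so the pair maps
    onto the a * x + b form of a multiply-add unit. The result wraps to 32 bits.
    (Hypothetical instruction pair, chosen only for illustration.)
    """
    return (a * (1 << shift) + b) & 0xFFFFFFFF


def is_scalar_candidate(operands_per_thread: Sequence[Tuple[int, ...]]) -> bool:
    """Return True when every thread supplies identical source operands.

    Such an instruction computes the same value in all threads, so it could
    be executed once instead of once per thread.
    """
    if not operands_per_thread:
        return False
    first = operands_per_thread[0]
    return all(ops == first for ops in operands_per_thread)


def fits_in_16_bits(value: int) -> bool:
    """Return True when a signed integer is representable in 16 bits."""
    return -(1 << 15) <= value < (1 << 15)


def is_narrow_instruction(operands: Sequence[int], result: int) -> bool:
    """Return True when all operands and the result fit in 16 bits.

    Two such instructions could share a 32-bit datapath split into two
    16-bit slices.
    """
    return all(fits_in_16_bits(v) for v in (*operands, result))
```

In this model, an instruction whose operands are the same in all `WARP_SIZE` threads is a candidate for scalar execution, and an instruction such as `100 + 200 = 300` passes `is_narrow_instruction`, while `30000 + 30000 = 60000` does not because the result overflows the signed 16-bit range.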


Published In: HPCA '13: Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), February 2013, 653 pages

ISBN: 9781467355858

Publisher: IEEE Computer Society, United States
