Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction

Phuong, Thi Yen; Lee, Deok-Young; Lee, Jeong-Gun

doi:10.1007/s11771-017-3676-5

Published: 16 December 2017

Volume 24, pages 2624–2637, (2017)

Journal of Central South University

Thi Yen Phuong¹,
Deok-Young Lee² &
Jeong-Gun Lee¹


Abstract

In the era of modern high-performance computing, GPUs are considered excellent accelerators for general-purpose, data-intensive parallel applications. To achieve application speedup on GPUs, many performance-oriented optimization techniques have been proposed. However, given the growing importance of power and energy consumption, power/energy-aware optimization of GPUs needs to be investigated with detailed analysis alongside performance-oriented optimization. In this work, to explore the impact of various optimization strategies on GPU performance, power and energy consumption, we evaluate the performance and power/energy consumption of a well-known application (parallel reduction) running on different commercial GPU devices under different optimization strategies. In particular, to observe more general performance and power consumption patterns of GPU-based acceleration, our evaluations are performed on three NVIDIA GPU generations (the Fermi, Kepler and Maxwell architectures) at various core and memory clock frequencies. We analyze how GPU kernel execution is affected by optimization and which GPU architectural factors most strongly influence its performance and power/energy consumption. This paper also categorizes which optimization technique primarily improves which metric (i.e., performance, power or energy efficiency). Furthermore, voltage frequency scaling (VFS) is applied to examine the effect of changing the clock frequency on these metrics. Overall, our work shows that effective GPU optimization strategies can improve application performance significantly without increasing power and energy consumption.
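
The three metrics named in the abstract are linked by the standard relation that energy is average power multiplied by execution time. The following minimal Python sketch shows how one optimization could raise power slightly and still lower energy. It is not the authors' measurement method, and the class `KernelRun`, the function `compare` and all timing and power values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class KernelRun:
    """One kernel execution; every value used below is hypothetical."""
    name: str
    time_s: float       # execution time in seconds
    avg_power_w: float  # average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy in joules: E = P_avg * t
        return self.avg_power_w * self.time_s


def compare(baseline: KernelRun, optimized: KernelRun) -> dict:
    """Return speedup, power ratio and energy ratio of optimized vs. baseline."""
    return {
        "speedup": baseline.time_s / optimized.time_s,
        "power_ratio": optimized.avg_power_w / baseline.avg_power_w,
        "energy_ratio": optimized.energy_j / baseline.energy_j,
    }


# Hypothetical numbers, not taken from the paper.
baseline = KernelRun("baseline", time_s=2.0, avg_power_w=100.0)
optimized = KernelRun("optimized", time_s=0.5, avg_power_w=120.0)
result = compare(baseline, optimized)
```

With these assumed values the optimized run draws more power than the baseline but finishes sooner, so its total energy is lower. Whether a real optimization behaves this way depends on the device and kernel, which is the question the paper evaluates.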

Author information

Authors and Affiliations

Smart Computing Lab., Department of Computer Engineering, Hallym University, Chuncheon, 24252, Korea
Thi Yen Phuong & Jeong-Gun Lee
College of General Education, Hallym University, Chuncheon, 24252, Korea
Deok-Young Lee

Corresponding author

Correspondence to Jeong-Gun Lee.

About this article

Cite this article

Phuong, T.Y., Lee, DY. & Lee, JG. Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction. J. Cent. South Univ. 24, 2624–2637 (2017). https://doi.org/10.1007/s11771-017-3676-5

Received: 25 June 2016
Accepted: 13 March 2017
Published: 16 December 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s11771-017-3676-5
