Abstract
In the era of modern high performance computing, GPUs have been considered an excellent accelerator for general purpose data-intensive parallel applications. To achieve application speedup from GPUs, many of performance-oriented optimization techniques have been proposed. However, in order to satisfy the recent trend of power and energy consumptions, power/energy-aware optimization of GPUs needs to be investigated with detailed analysis in addition to the performance-oriented optimization. In this work, in order to explore the impact of various optimization strategies on GPU performance, power and energy consumptions, we evaluate performance and power/energy consumption of a well-known application running on different commercial GPU devices with the different optimization strategies. In particular, in order to see the more generalized performance and power consumption patterns of GPU based accelerations, our evaluations are performed with three different Nvdia GPU generations (Fermi, Kepler and Maxwell architectures), various core clock frequencies and memory clock frequencies. We analyze how a GPU kernel execution is affected by optimization and what GPU architectural factors have much impact on its performance and power/energy consumption. This paper also categorizes which optimization technique primarily improves which metric (i.e., performance, power or energy efficiency). Furthermore, voltage frequency scaling (VFS) is also applied to examine the effect of changing a clock frequency on these metrics. In general, our work shows that effective GPU optimization strategies can improve the application performance significantly without increasing power and energy consumption.
Similar content being viewed by others
References
STRATTON J A, ANSSARI N, RODRIGUES C, I J SUNG, OBEID N, CHANG L W, LIU G D, HWU W. Optimization and architecture effects on GPU computing workload performance [C]//Innovative Parallel Computing (InPar). San Jose, USA: IEEE, 2012: 1–10.
RYOO S, RODRIGUES C I, BAGHSORKHI S S, STONE S S, KIRK D B, HWU W W. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA [C]//Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08). Utah, USA: ACM, 2008: 73–82.
JANG B, DO S, PIEN H, KAELI D. Architecture-aware optimization targeting multithreaded stream computing [C]//Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units-GPGPU-2. Washington DC, USA: ACM, 2009: 62–70.
JANG B, SCHAA D, MISTRY P, KAELI D. Exploiting memory access patterns to improve memory performance in data-parallel architectures [J]. IEEE Transactions on Parallel and Distributed Systems, 2011, 22(1): 105–118.
MEI X, ZHAO K, LIU C, CHU X. Benchmarking the memory hierarchy of modern GPUs [M]. Heidelberg: Springer Berlin, 2014: 144–156.
SUDA R, REN D. Accurate measurements and precise modeling of power dissipation of CUDA kernels toward power optimized high performance CPU-GPU computing [C]//The Tenth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). Hiroshima, Japan: IEEE, 2009.
CALANDRINI G 1, GARDEL A, BRAVO I, REVENGA P, LÁZARO J L, TOLED-MOREO F J. Power measurement methods for energy efficient applications [J]. Sensors, 2013, 13(6): 7786–7796.
DASGUPTA A, HONG S, KIM H, PARK J. A new temperature distribution measurement method on GPU architectures using thermocouples [R]. Georgia Institute of Technology, 2012.
LANG J, RÜNGER G. High-resolution power profiling of GPU functions using low-resolution measurement [C]//19th International Conference on Parallel Processing (Euro-Par 2013). Aachen, Germany: Springer-Verlag Berlin, 2013: 801–812.
COLLANGE S, DEFOUR D, TISSERAND A. Power consumption of GPUs from a software perspective [C]//ICCS '09 Proceedings of the 9th International Conference on Computational Science. LA, USA: Springer-Verlag Berlin, 2009: 914–923.
PHUONG T Y, LEE J G. Software based ultrasound B-mode/beamforming optimization on GPU and its performance prediction [C]//21th IEEE International Conference on High Performance Computing. Goa, India: IEEE, 2014: 1–10.
JIAO Y, LIN H, BALAJI P, FENG W. Power and performance characterization of computational kernels on the GPU [C]//IEEE/ACM International Conference on Green Computing and Communications and International Conference on Cyber, Physical and Social Computing. Hangzhou, China: IEEE, 2010: 221–228.
HONG S. Modeling performance and power for energy-efficient GPGPU computing [D]. Georgia: Georgia Institute of Technology, 2012.
HONG S, KIM H. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness [J]. In ACM SIGARCH Computer Architecture News, 2009, 37: 152–163.
HONG S, KIM H. An integrated GPU power and performance model [J]. In ACM SIGARCH Computer Architecture News, 2010, 38: 280–289.
KASICHAYANULA K, TERPSTRA D, LUSZCZEK P, TOMOV S, MOORE S, PETERSON G D. Power aware computing on GPUs [C]//Symposium on Application Accelerators in High Performance Computing. Illinois, USA: IEEE, 2012: 64–73.
ABE Y, SASAKI H, KATO S, INOUE K, EDAHIRO M, PERES M. Power and performance characterization and modeling of GPUaccelerated systems [C]//IEEE 28th International Symposium on Parallel and Distributed Processing. Arizona, USA: IEEE, 2014: 113–122.
ABE Y, SASAKI H, PERES M, INOUE K, MURAKAMI K, KATO S. Power and performance analysis of GPU-accelerated systems [C]//Proceedings of the ACM Workshop on Power-Aware Computing and System. California, USA: ACM, 2012.
MEI Xin-xin, YUNG Ling-sing, ZHAO Kai-yong, CHU Xiao-wen. A measurement study of GPU DVFS on energy conservation [C]//Proceedings of the ACM Workshop on Power-Aware Computing and System. Pennsylvania, USA: ACM, 2013.
RONG G E. VOGT R, MAJUMDER J, ALAM A, BURTSCHER M, ZONG Zi-liang. Effects of dynamic voltage and frequency scaling on a K20 GPU [C]//Parallel Processing (ICPP), 2013 42nd International Conference. Lyon, France: IACC, 2013: 826–833.
UKIDAVE Y, ZIABARI A K, MISTRY P, SCHIRNER G, KAELI D. Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms [J]. International Journal of High Performance Computing Applications 2014, 28(3): 319–334.
COPLIN J, BURTSCHER M. Effects of source-code optimizations on GPU performance and energy consumption [C]//Proceedings of the 8th Workshop on General Purpose Processing using GPUs. San Francisco, CA, USA, 2015.
HARRIS M. Optimizing parallel reduction in CUDA, nvidia developer technology [EB/OL]. [2007]. http://developer.download. nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/d oc/reduction.pdf.
NVIDIA [EB/OL]. [2017]. http://www.geforce.com/hardware/ desktop-gpus/geforce-gtx-titan-x/specifications.
HARRIS M. 5 things you should know about the new maxwell GPU architecture [EB/OL]. [2014–02–21]. http://devblogs.nvidia. com/parallelforall/5-things-you-should-know-about-new-maxwell-gp u-architecture/
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Phuong, T.Y., Lee, DY. & Lee, JG. Impacts of optimization strategies on performance, power/energy consumption of a GPU based parallel reduction. J. Cent. South Univ. 24, 2624–2637 (2017). https://doi.org/10.1007/s11771-017-3676-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11771-017-3676-5