Verified instruction-level energy consumption measurement for NVIDIA GPUs

Published: 23 May 2020

Abstract

GPUs are prevalent in modern computing systems at all scales, and they consume a significant fraction of the energy in these systems. However, vendors do not publish the actual power/energy overhead of their internal microarchitectures. In this paper, we accurately measure the energy consumption of the PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions on four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of CUDA compiler optimizations on the energy consumption of each instruction. We use three different software techniques, all based on NVIDIA's NVML API, to read the GPU's on-chip power sensors, and we provide an in-depth comparison between these techniques. Additionally, we verify the software measurements against a custom-designed hardware power-measurement setup. The results show that, across the different instruction categories, Volta GPUs have the best energy efficiency of the four generations. This work should aid in understanding NVIDIA GPUs' microarchitecture, and it should make energy measurement of any GPU kernel both efficient and accurate.
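The software measurement techniques described in the abstract all reduce to the same loop: poll the GPU's on-chip power sensor through NVML while the workload runs, then integrate power over time to obtain energy. A minimal sketch of that idea in Python follows; it is not the authors' code. The `pynvml` calls mirror the real NVML API (`nvmlDeviceGetPowerUsage` reports milliwatts), while `sample_power_nvml`, `integrate_energy`, and the 10 ms polling interval are illustrative choices:

```python
import time


def integrate_energy(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples -> joules."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def sample_power_nvml(duration_s, interval_s=0.01, device_index=0):
    """Poll the on-chip power sensor through NVML for duration_s seconds.

    Requires the pynvml bindings, an NVIDIA GPU, and a recent driver;
    nvmlDeviceGetPowerUsage returns milliwatts, so convert to watts.
    """
    import pynvml  # NVML bindings; only usable on a machine with an NVIDIA GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples = []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        samples.append((time.monotonic(), watts))
        time.sleep(interval_s)
    pynvml.nvmlShutdown()
    return samples


if __name__ == "__main__":
    # Synthetic sanity check: a constant 100 W draw over 2 s is 200 J.
    flat = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
    print(integrate_energy(flat))  # 200.0
```

Per-instruction energy then falls out by differencing: measure a kernel that repeats the target PTX instruction many times, subtract the energy of an empty baseline kernel, and divide by the instruction count.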



      Published In

      CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers
      May 2020
      298 pages
      ISBN: 9781450379564
      DOI: 10.1145/3387902


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. GPU power usage
      2. NVML
      3. PAPI
      4. PTX
      5. external power meters
      6. internal power sensors

      Qualifiers

      • Research-article

      Funding Sources

      • Triad National Security, LLC

      Conference

      CF '20
      Sponsor:
      CF '20: Computing Frontiers Conference
      May 11 - 13, 2020
      Sicily, Catania, Italy

      Acceptance Rates

      Overall Acceptance Rate: 273 of 785 submissions, 35%

      Article Metrics

      • Downloads (last 12 months): 201
      • Downloads (last 6 weeks): 33
      Reflects downloads up to 12 Sep 2024


      Cited By

      • (2024) Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 666–682. DOI: 10.1145/3620665.3640357. Online publication date: 27-Apr-2024.
      • (2024) Model-Free GPU Online Energy Optimization. IEEE Transactions on Sustainable Computing 9(2), 141–154. DOI: 10.1109/TSUSC.2023.3314916. Online publication date: Mar-2024.
      • (2024) Guser: A GPGPU Power Stressmark Generator. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1111–1124. DOI: 10.1109/HPCA57654.2024.00087. Online publication date: 2-Mar-2024.
      • (2024) Energy-Aware Tile Size Selection for Affine Programs on GPUs. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 13–27. DOI: 10.1109/CGO57630.2024.10444795. Online publication date: 2-Mar-2024.
      • (2024) Analyzing GPU Energy Consumption in Data Movement and Storage. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 143–151. DOI: 10.1109/ASAP61560.2024.00038. Online publication date: 24-Jul-2024.
      • (2024) Power overwhelming: the one with the oscilloscopes. Journal of Visualization. DOI: 10.1007/s12650-024-01001-0. Online publication date: 10-Aug-2024.
      • (2023) Energy-Aware Scheduling for High-Performance Computing Systems: A Survey. Energies 16(2), 890. DOI: 10.3390/en16020890. Online publication date: 12-Jan-2023.
      • (2023) MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. ACM Transactions on Architecture and Code Optimization 20(3), 1–26. DOI: 10.1145/3603113. Online publication date: 19-Jul-2023.
      • (2023) Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(12), 14546–14562. DOI: 10.1109/TPAMI.2023.3275769. Online publication date: Dec-2023.
      • (2023) Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 842–854. DOI: 10.1109/HPCA56546.2023.10070943. Online publication date: Feb-2023.