Verified instruction-level energy consumption measurement for NVIDIA GPUs

Published: 23 May 2020

Abstract

GPUs are prevalent in modern computing systems at all scales, and they consume a significant fraction of the energy in these systems. However, vendors do not publish the actual power/energy overhead of their internal microarchitectures. In this paper, we accurately measure the energy consumption of the PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions on four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of CUDA compiler optimizations on the energy consumption of each instruction. We use three different software techniques, all based on NVIDIA's NVML API, to read the GPU's on-chip power sensors, and we provide an in-depth comparison between these techniques. Additionally, we verify the software measurements against a custom-designed hardware power-measurement setup. The results show that, across the different instruction categories, Volta GPUs have the best energy efficiency of the four generations. This work should aid in understanding NVIDIA GPUs' microarchitecture, and it should make energy measurement of any GPU kernel both efficient and accurate.
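The software measurement techniques described in the abstract all reduce to the same loop: poll the GPU's on-chip power sensor through NVML while the workload runs, then integrate power over time to obtain energy. A minimal sketch of that idea in Python follows; it is not the authors' code. The `pynvml` calls mirror the real NVML API (`nvmlDeviceGetPowerUsage` reports milliwatts), while `sample_power_nvml`, `integrate_energy`, and the 10 ms polling interval are illustrative choices:

```python
import time


def integrate_energy(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples -> joules."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def sample_power_nvml(duration_s, interval_s=0.01, device_index=0):
    """Poll the on-chip power sensor through NVML for duration_s seconds.

    Requires the pynvml bindings, an NVIDIA GPU, and a recent driver;
    nvmlDeviceGetPowerUsage returns milliwatts, so convert to watts.
    """
    import pynvml  # NVML bindings; only usable on a machine with an NVIDIA GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples = []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
        samples.append((time.monotonic(), watts))
        time.sleep(interval_s)
    pynvml.nvmlShutdown()
    return samples


if __name__ == "__main__":
    # Synthetic sanity check: a constant 100 W draw over 2 s is 200 J.
    flat = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
    print(integrate_energy(flat))  # 200.0
```

Per-instruction energy then falls out by differencing: measure a kernel that repeats the target PTX instruction many times, subtract the energy of an empty baseline kernel, and divide by the instruction count.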



      Published In

      CF '20: Proceedings of the 17th ACM International Conference on Computing Frontiers
      May 2020
      298 pages
      ISBN: 9781450379564
      DOI: 10.1145/3387902


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. GPU power usage
      2. NVML
      3. PAPI
      4. PTX
      5. external power meters
      6. internal power sensors

      Qualifiers

      • Research-article

      Funding Sources

      • Triad National Security, LLC

      Conference

      CF '20
      Sponsor:
      CF '20: Computing Frontiers Conference
      May 11 - 13, 2020
      Sicily, Catania, Italy

      Acceptance Rates

      Overall Acceptance Rate: 273 of 785 submissions, 35%

      Article Metrics

      • Downloads (last 12 months): 201
      • Downloads (last 6 weeks): 33
      Reflects downloads up to 12 Sep 2024


      Cited By

      • (2024) Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 666–682. DOI: 10.1145/3620665.3640357. Online publication date: 27-Apr-2024.
      • (2024) Model-Free GPU Online Energy Optimization. IEEE Transactions on Sustainable Computing 9(2), 141–154. DOI: 10.1109/TSUSC.2023.3314916. Online publication date: Mar-2024.
      • (2024) Guser: A GPGPU Power Stressmark Generator. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1111–1124. DOI: 10.1109/HPCA57654.2024.00087. Online publication date: 2-Mar-2024.
      • (2024) Energy-Aware Tile Size Selection for Affine Programs on GPUs. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 13–27. DOI: 10.1109/CGO57630.2024.10444795. Online publication date: 2-Mar-2024.
      • (2024) Analyzing GPU Energy Consumption in Data Movement and Storage. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 143–151. DOI: 10.1109/ASAP61560.2024.00038. Online publication date: 24-Jul-2024.
      • (2024) Power overwhelming: the one with the oscilloscopes. Journal of Visualization. DOI: 10.1007/s12650-024-01001-0. Online publication date: 10-Aug-2024.
      • (2023) Energy-Aware Scheduling for High-Performance Computing Systems: A Survey. Energies 16(2), 890. DOI: 10.3390/en16020890. Online publication date: 12-Jan-2023.
      • (2023) MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. ACM Transactions on Architecture and Code Optimization 20(3), 1–26. DOI: 10.1145/3603113. Online publication date: 19-Jul-2023.
      • (2023) Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(12), 14546–14562. DOI: 10.1109/TPAMI.2023.3275769. Online publication date: Dec-2023.
      • (2023) Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 842–854. DOI: 10.1109/HPCA56546.2023.10070943. Online publication date: Feb-2023.