Power Modeling for GPU Architectures Using McPAT

Published: 23 June 2014

Abstract

Graphics Processing Units (GPUs) are widely used for both graphics and general-purpose applications. Because GPUs operate many processing units and manage multiple levels of memory hierarchy, they consume a significant amount of power. Although several power models for CPUs are available, the power consumption of GPUs has received comparatively little study. In this article, we develop a new power model for GPUs by utilizing McPAT, a CPU power tool. We generate initial power model data from McPAT with a detailed GPU configuration, and then adjust the models by comparing them with empirical data. We use NVIDIA's Fermi architecture for building the power model, and our model estimates the GPU power consumption with average errors of 7.7% and 12.8% for the microbenchmarks and Merge benchmarks, respectively.
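The adjust-and-validate step described in the abstract can be illustrated with a small sketch. The abstract does not give the article's actual adjustment procedure, so the code below substitutes a much simpler stand-in: a single least-squares scale factor fitted between modeled and measured power, followed by an average absolute percentage error. The function names `fit_scale` and `mean_abs_pct_error` are hypothetical, and the wattage values are made-up placeholders, not data from the article.

```python
from typing import Sequence


def fit_scale(modeled: Sequence[float], measured: Sequence[float]) -> float:
    """Least-squares scale factor s minimizing sum((s * modeled - measured)^2)."""
    denom = sum(p * p for p in modeled)
    if denom == 0:
        raise ValueError("modeled power values must not all be zero")
    return sum(p * m for p, m in zip(modeled, measured)) / denom


def mean_abs_pct_error(estimated: Sequence[float], measured: Sequence[float]) -> float:
    """Average of |estimated - measured| / measured, expressed as a percentage."""
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - m) / m for e, m in zip(estimated, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Made-up placeholder values in watts, NOT data from the article.
    modeled = [50.0, 80.0, 120.0]    # hypothetical uncalibrated model output
    measured = [60.0, 90.0, 150.0]   # hypothetical power-meter readings
    s = fit_scale(modeled, measured)
    calibrated = [s * p for p in modeled]
    err = mean_abs_pct_error(calibrated, measured)
    print(f"scale={s:.3f}, average error={err:.1f}%")
```

A per-component fit (one factor per hardware structure) would be a natural extension, but whether the article calibrates at that granularity is not stated in the abstract.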

Published In

ACM Transactions on Design Automation of Electronic Systems  Volume 19, Issue 3
June 2014
257 pages
ISSN:1084-4309
EISSN:1557-7309
DOI:10.1145/2634048

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2014
Accepted: 01 March 2014
Revised: 01 March 2014
Received: 01 January 2013
Published in TODAES Volume 19, Issue 3

Author Tags

  1. Design space exploration
  2. Fermi architecture
  3. simulation
  4. validation

Qualifiers

  • Research-article
  • Research
  • Refereed
