research-article

Power Modeling for GPU Architectures Using McPAT

Authors: Nagesh B. Lakshminarayana, Sudhakar Yalamanchili, Wonyong Sung

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 19, Issue 3

Article No.: 26, Pages 1 - 24

https://doi.org/10.1145/2611758

Published: 23 June 2014

Abstract

Graphics Processing Units (GPUs) are widely used for both graphics and general-purpose applications. Because GPUs operate many processing units and manage multiple levels of memory hierarchy, they consume a significant amount of power. Although several power models are available for CPUs, GPU power consumption has received comparatively little study. In this article we develop a new power model for GPUs by utilizing McPAT, a CPU power modeling tool. We generate initial power model data from McPAT using a detailed GPU configuration and then adjust the models by comparing them with empirical data. We build the power model for NVIDIA's Fermi architecture; it estimates GPU power consumption with an average error of 7.7% for the microbenchmarks and 12.8% for the Merge benchmarks.
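The abstract does not state how the McPAT-derived estimates are adjusted against measurements, nor exactly how the average error is computed. The Python sketch below illustrates one simple possibility under stated assumptions: it fits a single global linear correction (measured ≈ gain × estimated + offset) by ordinary least squares and scores a model by mean absolute percentage error. The function names `fit_linear_correction` and `mean_abs_pct_error` are hypothetical, and this is not presented as the authors' calibration procedure.

```python
from statistics import fmean


def fit_linear_correction(estimated, measured):
    """Fit measured ~= gain * estimated + offset by ordinary least squares.

    Assumption: one global linear correction for the whole model; the
    article's own adjustment procedure may differ.
    """
    if len(estimated) != len(measured) or len(estimated) < 2:
        raise ValueError("need at least two paired samples")
    mean_x = fmean(estimated)
    mean_y = fmean(measured)
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    if sxx == 0:
        raise ValueError("estimates must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, measured))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def mean_abs_pct_error(predicted, measured):
    """Average of |predicted - measured| / |measured|, in percent (assumed metric)."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equally sized, non-empty inputs")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    return 100.0 * fmean([abs(p - m) / abs(m) for p, m in zip(predicted, measured)])


if __name__ == "__main__":
    # Toy, made-up values in watts; not data from the article.
    estimated = [100.0, 150.0, 200.0]
    measured = [130.0, 180.0, 230.0]
    gain, offset = fit_linear_correction(estimated, measured)
    calibrated = [gain * e + offset for e in estimated]
    print(f"gain={gain:.3f} offset={offset:.1f} W")
    print(f"error before: {mean_abs_pct_error(estimated, measured):.1f}%")
    print(f"error after:  {mean_abs_pct_error(calibrated, measured):.1f}%")
```

With these toy inputs the fit recovers a gain of 1 and an offset of 30 W, and the calibrated error drops to zero because the toy data is exactly linear. A single global correction is the simplest choice; measured GPU power would rarely fit this cleanly, which is why a per-component adjustment might be preferable in practice.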

Published In

ACM Transactions on Design Automation of Electronic Systems, Volume 19, Issue 3, June 2014

257 pages

ISSN: 1084-4309

EISSN: 1557-7309

DOI: 10.1145/2634048

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 23 June 2014

Accepted: 01 March 2014

Revised: 01 March 2014

Received: 01 January 2013

Qualifiers

Research-article
Research
Refereed

Article Metrics

Total citations: 58

Total downloads: 1,134 (last 12 months: 81; last 6 weeks: 8)

Download figures reflect data up to 04 Jan 2025.
