DOI: 10.1145/3605573.3605600
Research article | Open access

Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models

Published: 13 September 2023

Abstract

Energy efficiency will be important in future accelerator-based HPC systems, both for sustainability and to improve overall performance. This study proposes deep neural network (DNN)-based models of the execution time and power consumption of workloads across a GPU's DVFS design space. Micro-architectural data obtained by running the SPEC ACCEL, DGEMM, and STREAM benchmarks are used for model training. These features are consistent for a given workload and unaffected by frequency and input size, which significantly reduces the data required. For real-world applications (LAMMPS, NAMD, GROMACS, LSTM, BERT, and ResNet50), the power and time models show 89%–98% accuracy on NVIDIA Ampere. Multi-objective functions help select optimal frequencies that lower power while minimizing the performance impact, yielding maximum energy savings of 27% at a performance loss of 1.8%. The same models, trained on Ampere, achieved greater than 93% accuracy on NVIDIA Volta, demonstrating model portability across architectures.
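To make the described pipeline concrete, the sketch below shows the general shape of such an approach: two small feed-forward regressors predict power and execution time from frequency-invariant micro-architectural counters plus a candidate clock, and a constrained sweep over the DVFS range picks the frequency that minimizes predicted energy within a performance-loss budget. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, the SELU/RMSprop choices (suggested by the paper's cited building blocks), the placeholder training data, the nine-counter feature vector, and the clock range and slowdown budget are all assumptions.

```python
# Minimal illustrative sketch (not the authors' code): DNN models for GPU
# power/time prediction plus energy-aware frequency selection over a DVFS
# range. Feature layout, layer sizes, clock range, and slowdown budget are
# assumptions for illustration only.
import numpy as np
import tensorflow as tf

N_COUNTERS = 9  # hypothetical count of frequency-invariant counters


def build_model(n_features: int) -> tf.keras.Model:
    # Small feed-forward regressor; SELU and RMSprop echo the paper's
    # references, but the exact topology here is assumed.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(1),  # predicted power (W) or runtime (s)
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss="mse")
    return model


# Placeholder training data: one row per (workload, frequency) sample,
# i.e. N_COUNTERS profiler counters followed by the SM clock in MHz.
rng = np.random.default_rng(0)
X = rng.random((1000, N_COUNTERS + 1)).astype("float32")
y_power = rng.random((1000, 1)).astype("float32")
y_time = rng.random((1000, 1)).astype("float32")

power_model = build_model(N_COUNTERS + 1)
time_model = build_model(N_COUNTERS + 1)
power_model.fit(X, y_power, epochs=5, verbose=0)
time_model.fit(X, y_time, epochs=5, verbose=0)


def select_frequency(counters, candidate_freqs, max_slowdown=0.018):
    """Sweep the DVFS range: discard frequencies whose predicted slowdown
    relative to the maximum clock exceeds the budget, then minimize
    predicted energy = power * time over the survivors."""
    rows = np.array([np.append(counters, f) for f in candidate_freqs],
                    dtype="float32")
    p = power_model.predict(rows, verbose=0).ravel()
    t = time_model.predict(rows, verbose=0).ravel()
    t_ref = t[np.argmax(candidate_freqs)]      # predicted runtime at max clock
    energy = p * t
    energy[t > (1.0 + max_slowdown) * t_ref] = np.inf  # performance constraint
    return candidate_freqs[int(np.argmin(energy))]


# Hypothetical usage: a 510-1410 MHz SM clock range in 15 MHz steps.
freqs = np.arange(510.0, 1411.0, 15.0, dtype="float32")
best_mhz = select_frequency(rng.random(N_COUNTERS).astype("float32"), freqs)
print(f"selected SM clock: {best_mhz:.0f} MHz")
```

In the paper's setting, the counters would come from profiling runs of SPEC ACCEL, DGEMM, and STREAM rather than random placeholders, and the performance-loss budget (here exposed as max_slowdown) plays the role of the multi-objective trade-off that yielded the reported 27% energy savings at 1.8% performance loss.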


Information & Contributors

Published In

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023
858 pages
ISBN:9798400708435
DOI:10.1145/3605573
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2023


Author Tags

  1. Ampere GPU
  2. GPU
  3. Volta GPU
  4. dynamic voltage frequency scaling
  5. energy-efficiency
  6. high-performance computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2023: 52nd International Conference on Parallel Processing
August 7–10, 2023
Salt Lake City, UT, USA

Acceptance Rates

Overall acceptance rate: 91 of 313 submissions (29%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 621
  • Downloads (last 6 weeks): 97
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Energy-Efficient Implementation of the Lattice Boltzmann Method. Energies 17:2, 502. https://doi.org/10.3390/en17020502. Online publication date: 19-Jan-2024.
  • (2024) Comparability and Reproducibility in HPC Applications' Energy Consumption Characterization. Proceedings of the 15th ACM International Conference on Future and Sustainable Energy Systems, 560–568. https://doi.org/10.1145/3632775.3662162. Online publication date: 4-Jun-2024.
  • (2024) Carbon-Aware and Fault-Tolerant Migration of Deep Learning Workloads in the Geo-Distributed Cloud. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 494–501. https://doi.org/10.1109/CLOUD62652.2024.00062. Online publication date: 7-Jul-2024.
  • (2024) PowerTrain: Fast, generalizable time and power prediction models to optimize DNN training on accelerated edges. Future Generation Computer Systems 161, 329–344. https://doi.org/10.1016/j.future.2024.07.001. Online publication date: Dec-2024.
  • (2023) Utilization-prediction-aware energy optimization approach for heterogeneous GPU clusters. The Journal of Supercomputing 80:7, 9554–9578. https://doi.org/10.1007/s11227-023-05807-x. Online publication date: 11-Dec-2023.
