DOI: 10.1145/3605573.3605600
Research article | Open access

Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models

Published: 13 September 2023

Abstract

Energy efficiency will be important in future accelerator-based HPC systems, both for sustainability and to improve overall performance. This study proposes deep neural network (DNN)-based models of the execution time and power consumption of workloads across a GPU's DVFS design space. Micro-architectural data obtained by running the SPEC ACCEL, DGEMM, and STREAM benchmarks are used for model training. These features are consistent for a given workload and unaffected by frequency and input size, which significantly reduces the data required. For real-world applications (LAMMPS, NAMD, GROMACS, LSTM, BERT, and ResNet50), the power and time models show 89%–98% accuracy on NVIDIA Ampere. Multi-objective functions help select optimal frequencies that lower power while minimizing the performance impact, yielding maximum energy savings of 27% at a performance loss of 1.8%. The same models, trained on Ampere, achieved greater than 93% accuracy on NVIDIA Volta, demonstrating model portability across architectures.
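To make the described pipeline concrete, the sketch below shows the general shape of such an approach: two small feed-forward regressors predict power and execution time from frequency-invariant micro-architectural counters plus a candidate clock, and a constrained sweep over the DVFS range picks the frequency that minimizes predicted energy within a performance-loss budget. This is a minimal illustrative sketch, not the authors' implementation: the layer sizes, the SELU/RMSprop choices (suggested by the paper's cited building blocks), the placeholder training data, the nine-counter feature vector, and the clock range and slowdown budget are all assumptions.

```python
# Minimal illustrative sketch (not the authors' code): DNN models for GPU
# power/time prediction plus energy-aware frequency selection over a DVFS
# range. Feature layout, layer sizes, clock range, and slowdown budget are
# assumptions for illustration only.
import numpy as np
import tensorflow as tf

N_COUNTERS = 9  # hypothetical count of frequency-invariant counters


def build_model(n_features: int) -> tf.keras.Model:
    # Small feed-forward regressor; SELU and RMSprop echo the paper's
    # references, but the exact topology here is assumed.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(64, activation="selu"),
        tf.keras.layers.Dense(1),  # predicted power (W) or runtime (s)
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),
                  loss="mse")
    return model


# Placeholder training data: one row per (workload, frequency) sample,
# i.e. N_COUNTERS profiler counters followed by the SM clock in MHz.
rng = np.random.default_rng(0)
X = rng.random((1000, N_COUNTERS + 1)).astype("float32")
y_power = rng.random((1000, 1)).astype("float32")
y_time = rng.random((1000, 1)).astype("float32")

power_model = build_model(N_COUNTERS + 1)
time_model = build_model(N_COUNTERS + 1)
power_model.fit(X, y_power, epochs=5, verbose=0)
time_model.fit(X, y_time, epochs=5, verbose=0)


def select_frequency(counters, candidate_freqs, max_slowdown=0.018):
    """Sweep the DVFS range: discard frequencies whose predicted slowdown
    relative to the maximum clock exceeds the budget, then minimize
    predicted energy = power * time over the survivors."""
    rows = np.array([np.append(counters, f) for f in candidate_freqs],
                    dtype="float32")
    p = power_model.predict(rows, verbose=0).ravel()
    t = time_model.predict(rows, verbose=0).ravel()
    t_ref = t[np.argmax(candidate_freqs)]      # predicted runtime at max clock
    energy = p * t
    energy[t > (1.0 + max_slowdown) * t_ref] = np.inf  # performance constraint
    return candidate_freqs[int(np.argmin(energy))]


# Hypothetical usage: a 510-1410 MHz SM clock range in 15 MHz steps.
freqs = np.arange(510.0, 1411.0, 15.0, dtype="float32")
best_mhz = select_frequency(rng.random(N_COUNTERS).astype("float32"), freqs)
print(f"selected SM clock: {best_mhz:.0f} MHz")
```

In the paper's setting, the counters would come from profiling runs of SPEC ACCEL, DGEMM, and STREAM rather than random placeholders, and the performance-loss budget (here exposed as max_slowdown) plays the role of the multi-objective trade-off that yielded the reported 27% energy savings at 1.8% performance loss.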


Information & Contributors

Published In

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023
858 pages
ISBN:9798400708435
DOI:10.1145/3605573
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2023


Author Tags

  1. Ampere GPU
  2. GPU
  3. Volta GPU
  4. dynamic voltage frequency scaling
  5. energy-efficiency
  6. high-performance computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2023: 52nd International Conference on Parallel Processing
August 7–10, 2023
Salt Lake City, UT, USA

Acceptance Rates

Overall acceptance rate: 91 of 313 submissions (29%)

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 621
  • Downloads (last 6 weeks): 97
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Energy-Efficient Implementation of the Lattice Boltzmann Method. Energies 17:2, 502. https://doi.org/10.3390/en17020502. Online publication date: 19-Jan-2024.
  • (2024) Comparability and Reproducibility in HPC Applications' Energy Consumption Characterization. Proceedings of the 15th ACM International Conference on Future and Sustainable Energy Systems, 560–568. https://doi.org/10.1145/3632775.3662162. Online publication date: 4-Jun-2024.
  • (2024) Carbon-Aware and Fault-Tolerant Migration of Deep Learning Workloads in the Geo-Distributed Cloud. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 494–501. https://doi.org/10.1109/CLOUD62652.2024.00062. Online publication date: 7-Jul-2024.
  • (2024) PowerTrain: Fast, generalizable time and power prediction models to optimize DNN training on accelerated edges. Future Generation Computer Systems 161, 329–344. https://doi.org/10.1016/j.future.2024.07.001. Online publication date: Dec-2024.
  • (2023) Utilization-prediction-aware energy optimization approach for heterogeneous GPU clusters. The Journal of Supercomputing 80:7, 9554–9578. https://doi.org/10.1007/s11227-023-05807-x. Online publication date: 11-Dec-2023.
