DOI: 10.1145/3581784.3607055
research-article
Open access

SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving

Published: 11 November 2023

Abstract

Energy-efficient computing uses power management techniques such as frequency scaling to save energy. Implementing energy-efficient techniques on large-scale computing systems is challenging for several reasons. While most modern architectures, including GPUs, are capable of frequency scaling, these features are often not available on large systems. In addition, achieving higher energy savings requires precise energy tuning because not only applications but also different kernels can have different energy characteristics. We propose SYnergy, a novel energy-efficient approach that spans languages, compilers, runtimes, and job schedulers to achieve unprecedented fine-grained energy savings on large-scale heterogeneous clusters. SYnergy defines an extension to the SYCL programming model that allows programmers to define a specific energy goal for each kernel. For example, a kernel can aim to minimize well-known energy metrics such as EDP and ED2P or to achieve predefined energy-performance tradeoffs, such as the best performance with 25% energy savings. Through compiler integration and a machine learning model, each kernel is statically optimized for the specific target. On large computing systems, a SLURM plug-in allows SYnergy to run on all available devices in the cluster, providing scalable energy savings. The methodology is inherently portable and has been evaluated on both NVIDIA and AMD GPUs. Experimental results show unprecedented improvements in energy and energy-related metrics on real-world applications, as well as scalable energy savings on a 64-GPU cluster.
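The energy goals named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions EDP = energy × delay and ED2P = energy × delay², and it uses made-up per-frequency measurements for a single kernel. The `Sample` type, the helper names, and all numbers are hypothetical; this is not the SYnergy API, which, according to the abstract, selects a configuration per kernel through compiler integration and a machine learning model rather than by exhaustive measurement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sample:
    """One hypothetical measurement of a kernel at a fixed core frequency."""
    freq_mhz: int    # core frequency used for the run
    time_s: float    # kernel execution time in seconds
    energy_j: float  # energy consumed in joules


def edp(s: Sample) -> float:
    """Energy-delay product: energy * time."""
    return s.energy_j * s.time_s


def ed2p(s: Sample) -> float:
    """Energy-delay-squared product: energy * time^2 (weights delay more)."""
    return s.energy_j * s.time_s ** 2


def fastest_with_saving(samples: Sequence[Sample], baseline: Sample,
                        saving: float) -> Optional[Sample]:
    """Fastest sample whose energy is at least `saving` below the baseline."""
    budget = baseline.energy_j * (1.0 - saving)
    feasible = [s for s in samples if s.energy_j <= budget]
    return min(feasible, key=lambda s: s.time_s) if feasible else None


# Illustrative numbers only; the first entry plays the default frequency.
samples = [
    Sample(1500, 1.00, 200.0),
    Sample(1200, 1.15, 160.0),
    Sample(900, 1.40, 140.0),
    Sample(600, 2.20, 150.0),
]
baseline = samples[0]

min_edp = min(samples, key=edp)
min_ed2p = min(samples, key=ed2p)
saver = fastest_with_saving(samples, baseline, saving=0.25)

print("min EDP frequency:", min_edp.freq_mhz)
print("min ED2P frequency:", min_ed2p.freq_mhz)
print("fastest with 25% saving:", saver.freq_mhz if saver else None)
```

With these made-up numbers the three goals pick different frequencies, which is the point of letting each kernel state its own target: ED2P penalizes slowdown more heavily than EDP, and a fixed energy-saving constraint can favor a lower frequency than either.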

Supplemental Material

MP4 File: SC23 paper presentation recording for "SYnergy: Fine-grained Energy-Efficient Heterogeneous Computing for Scalable Energy Saving", by Kaijie Fan, Marco D'Antonio, Lorenzo Carpentieri, Biagio Cosenza, Federico Ficarelli, and Daniele Cesarini.

Cited By

  • (2024) Green Intrusion Detection Systems: A Comprehensive Review and Directions. Sensors 24:17 (5516). DOI: 10.3390/s24175516. Online publication date: 26-Aug-2024
  • (2024) Enabling performance portability on the LiGen drug discovery pipeline. Future Generation Computer Systems 158:C (44-59). DOI: 10.1016/j.future.2024.03.045. Online publication date: 18-Jul-2024
  • (2023) Domain-Specific Energy Modeling for Drug Discovery and Magnetohydrodynamics Applications. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (1790-1800). DOI: 10.1145/3624062.3624261. Online publication date: 12-Nov-2023

      Published In

      SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
      November 2023
      1428 pages
      ISBN: 9798400701092
      DOI: 10.1145/3581784
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. frequency scaling
      2. heterogeneous computing
      3. energy efficiency
      4. modeling

      Qualifiers

      • Research-article

      Funding Sources

      • European High-Performance Computing Joint Undertaking

      Conference

      SC '23

      Acceptance Rates

      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 657
      • Downloads (last 6 weeks): 83
      Reflects downloads up to 09 Nov 2024
