Research article
DOI: 10.1145/3545008.3545047

Penelope: Peer-to-peer Power Management

Published: 13 January 2023

Abstract

Large-scale distributed computing setups rely on power management systems to enforce tight power budgets. Existing systems use a central authority that redistributes excess power to power-hungry nodes. This central authority, however, is both a single point of failure and a critical bottleneck, especially at large scale. To address these limitations, we propose Penelope, a distributed power management system that shifts power through peer-to-peer transactions, ensuring that it remains robust in faulty environments and at large scale. We implement Penelope and compare its achieved performance to SLURM, a centralized power manager, under a variety of power budgets. We find that under normal conditions SLURM and Penelope achieve almost equivalent performance; however, in faulty environments Penelope achieves 8-15% mean application performance gains over SLURM. At large scale and with increasing message frequency, Penelope maintains its performance, in contrast to centralized approaches, which degrade and become unusable.
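The abstract's core mechanism, shifting power between nodes through peer-to-peer transactions so that the cluster-wide budget stays fixed without a central authority, can be made concrete with a small sketch. The Python below is purely illustrative and is not Penelope's implementation: the node model, field names, and transact function are all hypothetical, chosen only to show how a surplus node could hand watts of its cap to a power-hungry peer while conserving the global budget.

# Illustrative sketch only: a peer-to-peer power transaction under a fixed
# cluster budget. This is NOT Penelope's protocol; the node model and the
# transaction shape are hypothetical, chosen to make the abstract's idea concrete.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cap_watts: float     # per-node power cap (e.g., enforced through a RAPL-style limit)
    demand_watts: float  # power the node's current workload could usefully consume

    def surplus(self) -> float:
        # Watts this node can give away without slowing its own workload.
        return max(0.0, self.cap_watts - self.demand_watts)

    def deficit(self) -> float:
        # Extra watts this node could turn into additional performance.
        return max(0.0, self.demand_watts - self.cap_watts)

def transact(giver: Node, taker: Node) -> float:
    # One peer-to-peer transfer: move min(surplus, deficit) watts from the
    # giver's cap to the taker's cap. The sum of all caps (the global budget)
    # never changes, so no central authority has to re-balance it.
    watts = min(giver.surplus(), taker.deficit())
    giver.cap_watts -= watts
    taker.cap_watts += watts
    return watts

if __name__ == "__main__":
    a = Node("node-a", cap_watts=100.0, demand_watts=60.0)   # has surplus power
    b = Node("node-b", cap_watts=100.0, demand_watts=140.0)  # power-hungry peer
    moved = transact(a, b)
    print(f"moved {moved:.0f} W; caps are now {a.cap_watts:.0f} W / {b.cap_watts:.0f} W")
    assert a.cap_watts + b.cap_watts == 200.0  # cluster budget conserved

In a real system each node would additionally have to discover peers and exchange these requests over the network, which is where the fault-tolerance and scalability concerns discussed in the abstract arise.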


Cited By

  • (2023) ExaPRR: A Framework for Support Dynamic and Interactive Events on Distributed Published Resource Repositories Mechanism in Distributed Exascale Computing Systems. International Journal of Networked and Distributed Computing 12, 1, 53-81. https://doi.org/10.1007/s44227-023-00015-8. Online publication date: 21 December 2023.

Information

Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Adaptive Systems
  2. Power Management

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

