Research article
DOI: 10.1145/3545008.3545047

Penelope: Peer-to-peer Power Management

Published: 13 January 2023

Abstract

Large-scale distributed computing setups rely on power management systems to enforce tight power budgets. Existing systems use a central authority that redistributes excess power to power-hungry nodes. This central authority, however, is both a single point of failure and a critical bottleneck, especially at large scale. To address these limitations, we propose Penelope, a distributed power management system that shifts power through peer-to-peer transactions, ensuring that it remains robust in faulty environments and at large scale. We implement Penelope and compare its achieved performance to SLURM, a centralized power manager, under a variety of power budgets. We find that under normal conditions SLURM and Penelope achieve almost equivalent performance; however, in faulty environments Penelope achieves 8-15% mean application performance gains over SLURM. At large scale and with increasing message frequency, Penelope maintains its performance, in contrast to centralized approaches, which degrade and become unusable.
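The abstract's core mechanism, shifting power between nodes through peer-to-peer transactions so that the cluster-wide budget stays fixed without a central authority, can be made concrete with a small sketch. The Python below is purely illustrative and is not Penelope's implementation: the node model, field names, and transact function are all hypothetical, chosen only to show how a surplus node could hand watts of its cap to a power-hungry peer while conserving the global budget.

# Illustrative sketch only: a peer-to-peer power transaction under a fixed
# cluster budget. This is NOT Penelope's protocol; the node model and the
# transaction shape are hypothetical, chosen to make the abstract's idea concrete.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cap_watts: float     # per-node power cap (e.g., enforced through a RAPL-style limit)
    demand_watts: float  # power the node's current workload could usefully consume

    def surplus(self) -> float:
        # Watts this node can give away without slowing its own workload.
        return max(0.0, self.cap_watts - self.demand_watts)

    def deficit(self) -> float:
        # Extra watts this node could turn into additional performance.
        return max(0.0, self.demand_watts - self.cap_watts)

def transact(giver: Node, taker: Node) -> float:
    # One peer-to-peer transfer: move min(surplus, deficit) watts from the
    # giver's cap to the taker's cap. The sum of all caps (the global budget)
    # never changes, so no central authority has to re-balance it.
    watts = min(giver.surplus(), taker.deficit())
    giver.cap_watts -= watts
    taker.cap_watts += watts
    return watts

if __name__ == "__main__":
    a = Node("node-a", cap_watts=100.0, demand_watts=60.0)   # has surplus power
    b = Node("node-b", cap_watts=100.0, demand_watts=140.0)  # power-hungry peer
    moved = transact(a, b)
    print(f"moved {moved:.0f} W; caps are now {a.cap_watts:.0f} W / {b.cap_watts:.0f} W")
    assert a.cap_watts + b.cap_watts == 200.0  # cluster budget conserved

In a real system each node would additionally have to discover peers and exchange these requests over the network, which is where the fault-tolerance and scalability concerns discussed in the abstract arise.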


Cited By

  • (2023) ExaPRR: A Framework for Support Dynamic and Interactive Events on Distributed Published Resource Repositories Mechanism in Distributed Exascale Computing Systems. International Journal of Networked and Distributed Computing 12, 1, 53-81. https://doi.org/10.1007/s44227-023-00015-8. Online publication date: 21 December 2023.

Information

Published In

ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
August 2022
976 pages
ISBN:9781450397339
DOI:10.1145/3545008

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Adaptive Systems
  2. Power Management

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%

