DOI: 10.1007/978-3-031-23220-6_14

On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems

Published: 04 January 2023

Abstract

Recent High-Performance Computing (HPC) systems face important challenges, such as massive power consumption combined with significantly under-utilized system resources. Given power consumption trends, future systems will be deployed in an over-provisioned manner, where more resources are installed than can be powered simultaneously. In such a scenario, maximizing resource utilization and energy efficiency while staying within a given power constraint is pivotal. Driven by this observation, in this position paper we first highlight recent trends in resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling resource allocations/requirements for a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider combining them, assess their relationships and synergies, and discuss the functionality required of each software component for future over-provisioned and power-constrained HPC systems. Third, we briefly introduce our ongoing efforts to integrate software tools, which will ultimately lead to the convergence of malleability and power management, as designed in the HPC PowerStack initiative.
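The notion of over-provisioning in the abstract can be made concrete with a small sketch. The model below is an illustrative assumption, not taken from the paper: the `Cluster` type, its field names, the uniform per-node power cap, and all wattage figures are hypothetical. It only shows the arithmetic of "more resources installed than can be powered simultaneously" and how lowering a per-node cap lets more nodes run under the same budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical toy model of a power-constrained system."""
    installed_nodes: int    # nodes physically installed
    node_peak_watts: float  # per-node draw when uncapped
    node_min_watts: float   # lowest per-node cap the hardware accepts
    budget_watts: float     # system-wide power constraint


def is_over_provisioned(c: Cluster) -> bool:
    # Over-provisioned: running every node at peak would exceed the budget.
    return c.installed_nodes * c.node_peak_watts > c.budget_watts


def max_active_nodes(c: Cluster, cap_watts: float) -> int:
    # Number of nodes that fit under the budget at a uniform per-node cap.
    if not (c.node_min_watts <= cap_watts <= c.node_peak_watts):
        raise ValueError("cap must lie between the node's minimum and peak power")
    return min(c.installed_nodes, int(c.budget_watts // cap_watts))


if __name__ == "__main__":
    # All figures are made up for illustration.
    cluster = Cluster(installed_nodes=100, node_peak_watts=400.0,
                      node_min_watts=200.0, budget_watts=30_000.0)
    print(is_over_provisioned(cluster))
    for cap in (400.0, 300.0, 200.0):
        print(cap, max_active_nodes(cluster, cap))
```

In this toy setting, a malleable job could in principle be widened onto the extra nodes that a lower cap frees up, which is the kind of interplay between malleability and power management that the abstract points to. Real systems would also need to account for how application performance changes with the cap, which this sketch ignores.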

Published In

High Performance Computing. ISC High Performance 2022 International Workshops: Hamburg, Germany, May 29 – June 2, 2022, Revised Selected Papers
May 2022
398 pages
ISBN: 978-3-031-23219-0
DOI: 10.1007/978-3-031-23220-6

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Malleability
2. Dynamic resource management
3. Power management
4. Over-provisioning
5. Co-scheduling
6. Heterogeneity

Cited By

• Sustainability in HPC: Vision and Opportunities. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1876–1880 (2023). DOI: 10.1145/3624062.3624271. Online publication date: 12-Nov-2023
• An End-to-End HPC Framework for Dynamic Power Objectives. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1801–1811 (2023). DOI: 10.1145/3624062.3624262. Online publication date: 12-Nov-2023
• Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling. High Performance Computing, pp. 68–81 (2023). DOI: 10.1007/978-3-031-40843-4_6. Online publication date: 21-May-2023
• Towards Achieving Transparent Malleability Thanks to MPI Process Virtualization. High Performance Computing, pp. 28–41 (2023). DOI: 10.1007/978-3-031-40843-4_3. Online publication date: 21-May-2023
