Managing Heterogeneous Resources in HPC Systems

Authors: Giovanni Agosta, William Fornaciari, Giuseppe Massari, Federico Reghenzani, Michele Zanella

PARMA-DITAM '18: Proceedings of the 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms

Pages 7-12

https://doi.org/10.1145/3183767.3183769

Published: 23 January 2018

Abstract

To sustain performance under ever-tighter power and energy envelopes, High Performance Computing (HPC) increasingly relies on heterogeneous architectures. This poses new challenges: to exploit the available resources efficiently, in terms of both hardware and energy, resource management must support a wide range of heterogeneous devices and programming models that target different application domains. We present a strategy for resource management and programming model support for heterogeneous accelerators in HPC systems whose requirements target performance, power, and predictability. We show how resource management can, in addition to allowing multiple applications to share a set of resources, reduce the burden on the application developer and improve the efficiency of resource allocation.

Published In

PARMA-DITAM '18: Proceedings of the 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms

January 2018

76 pages

ISBN:9781450364447

DOI:10.1145/3183767

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

PARMA-DITAM '18

PARMA-DITAM '18: 9th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and 7th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms

January 23, 2018

Manchester, United Kingdom

Acceptance Rates

Overall Acceptance Rate 11 of 24 submissions, 46%

Article Metrics

Total Citations: 25
Total Downloads: 301
Downloads (Last 12 months): 44
Downloads (Last 6 weeks): 3

Reflects downloads up to 01 Jan 2025
