Energy and performance improvements in stencil computations on multi-node HPC systems with different network and communication topologies

Published: 09 July 2024

Abstract

Energy and performance improvements in stencil computations are relevant to both application developers and data center administrators, as stencils form the fundamental computational scheme in many large-scale scientific simulations and workloads. Many research efforts have focused on techniques for estimating the energy usage of HPC systems based on specific characteristics of parallel applications. In the case of stencils, we have previously concentrated on detailed estimation of energy consumption and on the energy-aware distribution of stencil computations across heterogeneous processors. However, those comprehensive studies were restricted to a single heterogeneous computing node. In this paper, we show how scheduling and optimization techniques can be applied to improve the energy usage and performance of stencil computations on multi-node HPC systems with different network topologies. We formulate a scheduling model together with a new Tabu Search algorithm, called Task Movement (TM), which takes communication hierarchies into account to minimize the overall energy usage and execution time of stencil computations. Experimental studies show that this algorithm solves the considered problem more efficiently than other, simpler heuristics. We present computational experiments for a reference 7-point stencil computation pattern on three commonly used low-diameter network topologies: Fat-tree, Dragonfly, and Torus. According to our studies, the most promising multi-node HPC architecture for stencil computations is based on the Torus network concept. Finally, we argue that the proposed scheduling model and TM algorithm can be easily adopted within existing high-level parallel execution environments for automatic performance tuning of stencils.
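For readers new to the pattern, the 7-point stencil studied here updates every cell of a 3-D grid from its own value and its six face neighbors; each sweep over the grid is what gets distributed across nodes. Below is a minimal illustrative sketch in Python/NumPy, not the authors' code: the uniform 1/7 weights and the Jacobi-style sweep are assumptions made only to show the memory-access pattern.

    import numpy as np

    def stencil_7pt(grid):
        # One Jacobi-style sweep: each interior cell becomes the average of
        # itself and its six face neighbors (+/-x, +/-y, +/-z).
        out = grid.copy()
        out[1:-1, 1:-1, 1:-1] = (
            grid[1:-1, 1:-1, 1:-1]
            + grid[2:, 1:-1, 1:-1] + grid[:-2, 1:-1, 1:-1]
            + grid[1:-1, 2:, 1:-1] + grid[1:-1, :-2, 1:-1]
            + grid[1:-1, 1:-1, 2:] + grid[1:-1, 1:-1, :-2]
        ) / 7.0
        return out

When the grid is partitioned across nodes, each sweep requires exchanging one-cell-thick halo faces with up to six neighboring subdomains, which is why network topology and task placement affect both runtime and energy.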

Highlights

Discussion of multi-node communication topologies for stencil computations.
Modeling energy usage for a stencil pattern on supercomputer architectures.
Formulation of a topology-aware scheduling model for heterogeneous processors.
Presentation of a Tabu Search algorithm for solving the problem (see the sketch after this list).
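To make the Tabu Search highlight concrete, here is a minimal, self-contained sketch of a task-movement neighborhood search over task-to-node placements. It illustrates the general scheme only and is not the paper's TM algorithm: the cost callable, tabu tenure, and randomized move generation are placeholder assumptions, whereas the actual TM algorithm additionally models communication hierarchies and energy usage.

    import random
    from collections import deque

    def tabu_search_placement(num_tasks, num_nodes, cost,
                              iters=500, tenure=10, seed=0):
        # cost: maps a placement (one node id per task) to a scalar,
        # e.g. a weighted sum of estimated execution time and energy.
        rng = random.Random(seed)
        placement = [rng.randrange(num_nodes) for _ in range(num_tasks)]
        best, best_cost = placement[:], cost(placement)
        tabu = deque(maxlen=tenure)  # recently moved tasks are forbidden
        for _ in range(iters):
            # Neighborhood: move one non-tabu task to a different node.
            moves = []
            for t in range(num_tasks):
                if t in tabu:
                    continue
                n = rng.randrange(num_nodes)
                if n != placement[t]:
                    trial = placement[:]
                    trial[t] = n
                    moves.append((cost(trial), t, trial))
            if not moves:
                continue
            # Take the best admissible move, even if it worsens the cost.
            c, t, placement = min(moves, key=lambda m: m[0])
            tabu.append(t)
            if c < best_cost:  # keep the best placement seen so far
                best, best_cost = placement[:], c
        return best, best_cost

A topology-aware cost function would, for instance, charge halo exchanges between neighboring stencil tasks in proportion to the hop distance between their nodes in the chosen Fat-tree, Dragonfly, or Torus topology.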

Published In

Future Generation Computer Systems, Volume 115, Issue C (Feb 2021), 880 pages

Publisher

Elsevier Science Publishers B. V., Netherlands

Author Tags

  1. Stencil computations
  2. Performance analysis
  3. Topology-aware scheduling
  4. Energy modeling
  5. GPUs
  6. HPC

Qualifiers

  • Research-article
