DOI: 10.1145/3503470.3503476

Performance tuning of the Helmholtz matrix-vector product kernel in the computational fluid dynamics solver Nek5000/RS for the A64FX processor

Published: 13 January 2022

Abstract

Nek5000/RS is an open-source computational fluid dynamics solver based on the spectral element method. One of its important kernels, “axhelm”, computes the Helmholtz matrix-vector product. In this paper, we evaluate the axhelm kernel on the A64FX processor for the simplest case, polynomial degree N = 7. We optimize the kernel for the A64FX processor using well-known optimization techniques: SIMDization, software pipelining, continuous-access enhancement, and software prefetch. We also provide performance analysis data that we use to investigate the effects of these optimization techniques, to help readers understand the A64FX processor and the Fujitsu compiler.
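
The abstract does not show the kernel itself. As rough orientation only, the following is a minimal reference sketch of a Helmholtz-type matrix-vector product on a single spectral element with N = 7, that is, 8 points per direction and 512 points per element. It is not the axhelm code from Nek5000/RS or nekBench: the function name `helmholtz_apply`, the single diagonal weight array `g` (a stand-in for the geometric factors) and the constant coefficient `lam` are all assumptions made for illustration, and none of the A64FX optimizations discussed in the paper are applied.

```python
N = 7          # polynomial degree considered in the paper
NQ = N + 1     # points per direction, so NQ**3 = 512 points per element


def helmholtz_apply(D, g, lam, u):
    """Return w = sum_d D_d^T (g * (D_d u)) + lam * u for one element.

    D   : NQ x NQ one-dimensional differentiation matrix (list of lists)
    g   : NQ**3 diagonal weights (hypothetical stand-in for geometric factors)
    lam : scalar Helmholtz coefficient (assumed constant here)
    u   : NQ**3 input values, stored with the first index fastest
    """
    w = [lam * ui for ui in u]
    for d in range(3):
        stride = NQ ** d                 # distance between neighbours along d
        for p in range(NQ ** 3):
            c = (p // stride) % NQ       # coordinate of point p along d
            base = p - c * stride        # first point on the same grid line
            # Derivative of u at p along direction d, scaled by the weight.
            du = g[p] * sum(D[c][m] * u[base + m * stride] for m in range(NQ))
            # Transposed derivative: scatter the result back along the line.
            for m in range(NQ):
                w[base + m * stride] += D[c][m] * du
    return w
```

Each derivative is a short contraction of length 8 along one grid line. The sketch only makes that access pattern explicit; it says nothing about how the paper's optimized kernel is organized.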

      Published In

      HPCAsia '22 Workshops: International Conference on High Performance Computing in Asia-Pacific Region Workshops
      January 2022
      83 pages
      ISBN:9781450395649
      DOI:10.1145/3503470
      © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Arm-based processor
      2. Performance optimization
      3. computational fluid dynamics

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • U.S. Department of Energy, Office of Science
      • Exascale Computing Project

      Conference

      HPCAsia 2022 Workshop

      Acceptance Rates

      Overall Acceptance Rate 69 of 143 submissions, 48%
