Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3581576.3581621acmotherconferencesArticle/Chapter ViewAbstractPublication PageshpcasiaConference Proceedingsconference-collections
research-article
Public Access

Application Experiences on a GPU-Accelerated Arm-based HPC Testbed

Published: 27 February 2023 Publication History

Abstract

This paper assesses and reports the experience of ten teams working to port, validate, and benchmark several High Performance Computing applications on a novel GPU-accelerated Arm testbed system. The testbed consists of eight NVIDIA Arm HPC Developer Kit systems, each one equipped with a server-class Arm CPU from Ampere Computing and two data center GPUs from NVIDIA Corp. The systems are connected together using InfiniBand interconnect. The selected applications and mini-apps are written using several programming languages and use multiple accelerator-based programming models for GPUs such as CUDA, OpenACC, and OpenMP offloading. Working on application porting requires a robust and easy-to-access programming environment, including a variety of compilers and optimized scientific libraries. The goal of this work is to evaluate platform readiness and assess the effort required from developers to deploy well-established scientific workloads on current and future generation Arm-based GPU-accelerated HPC systems. The reported case studies demonstrate that the current level of maturity and diversity of software and tools is already adequate for large-scale production deployments.

References

[1]
Bilge Acun, David J. Hardy, Laxmikant Kale, Ke Li, James C. Phillips, and John E. Stone. 2018. Scalable Molecular Dynamics with NAMD on the Summit System. IBM Journal of Research and Development 62, 6 (2018), 4:1–4:9. https://doi.org/10.1147/JRD.2018.2888986
[2]
Holger Brunst, Sunita Chandrasekaran, Florina Ciorba, Nick Hagerty, Robert Henschel, Guido Juckeland, Junjie Li, Veronica G. Melesse Vergara, Sandra Wienke, and Miguel Zavala. 2022. First Experiences in Performance Benchmarking with the New SPEChpc 2021 Suites. https://doi.org/10.48550/ARXIV.2203.06751
[3]
S. H. Bryngelson, K. Schmidmayer, and T. Colonius. 2019. A quantitative comparison of phase-averaged models for bubbly, cavitating flows. International Journal of Multiphase Flow 115 (2019), 137–143. https://doi.org/10.1016/j.ijmultiphaseflow.2019.03.028
[4]
Spencer H Bryngelson, Kevin Schmidmayer, Vedran Coralic, Jomela C Meng, Kazuki Maeda, and Tim Colonius. 2021. MFC: An open-source high-order multi-component, multi-phase, and multi-scale compressible flow solver. Computer Physics Communications 266 (2021), 107396.
[5]
M. Bussmann, H. Burau, T. E. Cowan, A. Debus, A. Huebl, G. Juckeland, T. Kluge, W. E. Nagel, R. Pausch, F. Schmitt, U. Schramm, J. Schuchart, and R. Widera. 2013. Radiative Signatures of the Relativistic Kelvin–Helmholtz Instability. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (Denver, Colorado) (SC ’13). ACM, New York, NY, USA, Article 5, 12 pages. https://doi.org/10.1145/2503210.2504564
[6]
Aurélien Cavelan, Rubén M. Cabezón, Michal Grabarczyk, and Florina M. Ciorba. 2020. A Smoothed Particle Hydrodynamics Mini-App for Exascale. In Proceedings of the Platform for Advanced Scientific Computing Conference (Geneva, Switzerland) (PASC ’20). Association for Computing Machinery, New York, NY, USA, Article 11, 11 pages. https://doi.org/10.1145/3394277.3401855
[7]
A. Charalampopoulos, S. H. Bryngelson, T. Colonius, and T. P. Sapsis. 2022. Hybrid quadrature moment method for accurate and stable representation of non-Gaussian processes applied to bubble dynamics. Philosophical Transactions of the Royal Society A (2022).
[8]
M. A. Clark and A. D. Kennedy. 2007. Accelerating staggered-Fermion dynamics with the rational hybrid Monte Carlo algorithm. Physical Review D 75, 1 (2007). https://doi.org/10.1103/physrevd.75.011502
[9]
Tom Deakin, Simon McIntosh-Smith, James Price, Andrei Poenaru, Patrick Atkinson, Codrin Popa, and Justin Salmon. 2019. Performance Portability across Diverse Computer Architectures. In 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 1–13. https://doi.org/10.1109/P3HPC49587.2019.00006
[10]
Tom Deakin, Andrei Poenaru, Tom Lin, and Simon McIntosh-Smith. 2020. Tracking Performance Portability on the Yellow Brick Road to Exascale. In 2020 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). 1–13. https://doi.org/10.1109/P3HPC51967.2020.00006
[11]
Wael Elwasif, William Godoy, Nick Hagerty, J. Austin Harris, Oscar Hernandez, Balint Joo, Paul Kent, Damien Lebrun-Grandie, Elijah Maccarthy, Veronica G. Melesse Vergara, Bronson Messer, Ross Miller, Sarp Opal, Sergei Bastrakov, Michael Bussmann, Alexander Debus, Klaus Steinger, Jan Stephan, Rene Widera, Spencer H. Bryngelson, Henry Le Berre, Anand Radhakrishnan, Jefferey Young, Sunita Chandrasekaran, Florina Ciorba, Osman Simsek, Kate Clark Filippo Spiga, Jeff Hammond, John E. Stone. David Hardy, Sebastian Keller, and Jean-Guillaume Piccinali. Christian Trott. 2022. Application Experiences on a GPU-Accelerated Arm-based HPC Testbed. https://doi.org/10.48550/ARXIV.2209.09731
[12]
Catherine Feldman, Benjamin Michalowicz, Eva Siegmann, Tony Curtis, Alan Calder, and Robert Harrison. 2022. Experiences with Porting the FLASH Code to Ookami, an HPE Apollo 80 A64FX Platform. HPCAsia 2022 (to appear)(2022).
[13]
E. Follana, Q. Mason, C. Davies, K. Hornbostel, G. P. Lepage, J. Shigemitsu, H. Trottier, and K. Wong. 2007. Highly improved staggered quarks on the lattice with applications to charm physics. Physical Review D 75, 5 (mar 2007). https://doi.org/10.1103/physrevd.75.054502
[14]
Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Joseph Adamo, Salman Habib, Katrin Heitmann, and Claude-André Faucher-Giguère. 2022. Simulating Hydrodynamics in Cosmology with CRK-HACC. https://doi.org/10.48550/ARXIV.2202.02840
[15]
Todd Gamblin, Matthew P. LeGendre, Michael R. Collette, Gregory L. Lee, Adam Moody, Bronis R. de Supinski, and W. Scott Futral. 2015. The Spack Package Manager: Bringing order to HPC software chaos. In Supercomputing 2015 (SC’15). Austin, Texas.
[16]
J. Austin Harris, Ran Chu, Sean M Couch, Anshu Dubey, Eirik Endeve, Antigoni Georgiadou, Rajeev Jain, Daniel Kasen, M P Laiu, OE B Messer, Jared O’Neal, Michael A Sandoval, and Klaus Weide. 2022. Exascale models of stellar explosions: Quintessential multi-physics simulation. The International Journal of High Performance Computing Applications 36, 1(2022), 59–77. https://doi.org/10.1177/10943420211027937 arXiv:https://doi.org/10.1177/10943420211027937
[17]
William Humphrey, Andrew Dalke, and Klaus Schulten. 1996. VMD – Visual Molecular Dynamics. Journal of Molecular Graphics 14, 1 (1996), 33–38. https://doi.org/10.1016/0263-7855(96)00018-5
[18]
Laxmikant V. Kalé and Gengbin Zheng. 2013. Chapter 1: The Charm++ Programming Model. In Parallel Science and Engineering Applications: The Charm++ Approach (1st ed.), Laxmikant V. Kale and Abhinav Bhatele (Eds.). CRC Press, Inc., Boca Raton, FL, USA, Chapter 1, 1–16. https://doi.org/10.1201/b16251
[19]
Jeffrey Kelling, Sergei Bastrakov, Alexander Debus, Thomas Kluge, Matt Leinhauser, Richard Pausch, Klaus Steiniger, Jan Stephan, René Widera, Jeff Young, 2021. Challenges Porting a C++ Template-Metaprogramming Abstraction Layer to Directive-based Offloading. arXiv preprint arXiv:2110.08650(2021).
[20]
P. R. C. Kent, Abdulgani Annaberdiyev, Anouar Benali, M. Chandler Bennett, Edgar Josué Landinez Borda, Peter Doak, Hongxia Hao, Kenneth D. Jordan, Jaron T. Krogel, Ilkka Kylänpää, Joonho Lee, Ye Luo, Fionn D. Malone, Cody A. Melton, Lubos Mitas, Miguel A. Morales, Eric Neuscamman, Fernando A. Reboredo, Brenda Rubenstein, Kayahan Saritas, Shiv Upadhyay, Guangming Wang, Shuai Zhang, and Luning Zhao. 2020. QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion quantum Monte Carlo. The Journal of Chemical Physics 152 (2020), 174105. https://doi.org/10.1063/5.0004860
[21]
M. Paul Laiu, Eirik Endeve, Ran Chu, J. Austin Harris, and O. E. Bronson Messer. 2021. A DG-IMEX Method for Two-moment Neutrino Transport: Nonlinear Solvers for Neutrino-Matter Coupling. Astrophys. J., Suppl. Ser. 253, 2, Article 52 (April 2021), 52 pages. https://doi.org/10.3847/1538-4365/abe2a8 arxiv:2102.02186 [astro-ph.HE]
[22]
Elijah A MacCarthy, Chengxin Zhang, Yang Zhang, and KC Dukka. 2022. GPU-I-TASSER: a GPU accelerated I-TASSER protein structure prediction tool. Bioinformatics (2022).
[23]
Alexander Matthes, René Widera, Erik Zenker, Benjamin Worpitz, Axel Huebl, and Michael Bussmann. 2017. Tuning and Optimization for a Variety of Many-Core Architectures Without Changing a Single Line of Implementation Code Using the Alpaka Library. In High Performance Computing, Julian M. Kunkel, Rio Yokota, Michela Taufer, and John Shalf (Eds.). Springer International Publishing, Cham, 496–514. https://doi.org/10.1007/978-3-319-67630-2_36
[24]
Simon McIntosh-Smith, James Price, Andrei Poenaru, and Tom Deakin. 2020. Benchmarking the first generation of production quality Arm-based supercomputers. Concurrency and Computation: Practice and Experience 32, 20(2020), e5569.
[25]
Marcelo C. R. Melo, Rafael C. Bernardi, Till Rudack, Maximilian Scheurer, Christoph Riplinger, James C. Phillips, Julio D. C. Maia, Gerd B. Rocha, João V. Ribeiro, John E. Stone, Frank Nesse, Klaus Schulten, and Zaida Luthey-Schulten. 2018. NAMD goes quantum: An integrative suite for hybrid simulations. Nature Methods 15(2018), 351–354.
[26]
James C. Phillips, David J. Hardy, Julio D. C. Maia, John E. Stone, João V. Ribeiro, Rafael C. Bernardi, Ronak Buch, Giacomo Fiorin, Jérôme Hénin, Wei Jiang, Ryan McGreevy, Marcelo C. R. Melo, Brian Radak, Robert D. Skeel, Abhishek Singharoy, Yi Wang, Benoît Roux, Aleksei Aksimentiev, Zaida Luthey-Schulten, Laxmikant V. Kalé, Klaus Schulten, Christophe Chipot, and Emad Tajkhorshid. 2020. Scalable molecular dynamics on CPU and GPU architectures with NAMD. Journal of Chemical Physics 153 (2020), 044130. https://doi.org/10.1063/5.0014475
[27]
Nikola Rajovic, Alejandro Rico, Nikola Puzovic, Chris Adeniyi-Jones, and Alex Ramirez. 2014. Tibidabo: Making the case for an ARM-based HPC system. Future Generation Computer Systems 36 (2014), 322–334.
[28]
Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, Hisashi Yashiro, Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, and Toshiyuki Shimizu. 2020. Co-Design for A64FX Manycore Processor and “Fugaku”. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1–15. https://doi.org/10.1109/SC41405.2020.00051
[29]
K. Schmidmayer, S. H. Bryngelson, and T. Colonius. 2020. An assessment of multicomponent flow models and interface capturing schemes for spherical bubble dynamics. J. Comput. Phys. 402(2020), 109080. https://doi.org/10.1016/j.jcp.2019.109080
[30]
N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole, G. Gabrielli, M. Horsnell, G. Magklis, A. Martinez, N. Premillieu, A. Reid, A. Rico, and P. Walker. 2017. The ARM Scalable Vector Extension. IEEE Micro 37, 02 (mar 2017), 26–39. https://doi.org/10.1109/MM.2017.35
[31]
John E. Stone, Michael J. Hallock, James C. Phillips, Joseph R. Peterson, Zaida Luthey-Schulten, and Klaus Schulten. 2016. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads. 2016 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)(2016), 89–100. https://doi.org/10.1109/IPDPSW.2016.130
[32]
John E. Stone, David J. Hardy, Jan Saam, Kirby L. Vandivort, and Klaus Schulten. 2011. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals. In GPU Computing Gems, Wen-mei Hwu (Ed.). Morgan Kaufmann Publishers, Chapter 1, 5–18.
[33]
John E. Stone, David J. Hardy, Ivan S. Ufimtsev, and Klaus Schulten. 2010. GPU-Accelerated Molecular Modeling Coming of Age. J. Molecular Graphics and Modelling 29 (2010), 116–125.
[34]
John E. Stone, Antti-Pekka Hynninen, James C. Phillips, and Klaus Schulten. 2016. Early Experiences Porting the NAMD and VMD Molecular Simulation and Analysis Software to GPU-Accelerated OpenPOWER Platforms. International Workshop on OpenPOWER for HPC (IWOPH’16) (2016), 188–206.
[35]
John E. Stone, Jan Saam, David J. Hardy, Kirby L. Vandivort, Wen-mei W. Hwu, and Klaus Schulten. 2009. High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs. In Proceedings of the 2nd Workshop on General-Purpose Processing on Graphics Processing Units, ACM International Conference Proceeding Series, Vol. 383. ACM, New York, NY, USA, 9–18.
[36]
A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in ’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton. 2022. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Comp. Phys. Comm. 271(2022), 108171. https://doi.org/10.1016/j.cpc.2021.108171
[37]
Christian R. Trott, Damien Lebrun-Grandié, Daniel Arndt, Jan Ciesko, Vinh Dang, Nathan Ellingwood, Rahulkumar Gayatri, Evan Harvey, Daisy S. Hollman, Dan Ibanez, Nevin Liber, Jonathan Madsen, Jeff Miles, David Poliakoff, Amy Powell, Sivasankaran Rajamanickam, Mikael Simberg, Dan Sunderland, Bruno Turcksin, and Jeremiah Wilke. 2022. Kokkos 3: Programming Model Extensions for the Exascale Era. IEEE Transactions on Parallel and Distributed Systems 33, 4 (2022), 805–817. https://doi.org/10.1109/TPDS.2021.3097283
[38]
Verónica G Vergara Larrea, Wayne Joubert, Michael J Brim, Reuben D Budiardja, Don Maxwell, Matt Ezell, Christopher Zimmer, Swen Boehm, Wael Elwasif, Sarp Oral, 2019. Scaling the summit: deploying the world’s fastest supercomputer. In International Conference on High Performance Computing. Springer, 330–351.
[39]
Wei Zheng, Chengxin Zhang, Eric W Bell, and Yang Zhang. 2019. I-TASSER gateway: a protein structure and function prediction server powered by XSEDE. Future Generation Computer Systems 99 (2019), 73–85.

Cited By

View all
  • (2024)Evaluating ARM and RISC-V Architectures for High-Performance Computing with Docker and KubernetesElectronics10.3390/electronics1317349413:17(3494)Online publication date: 3-Sep-2024
  • (2024)OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUsSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00242(1923-1933)Online publication date: 17-Nov-2024

Index Terms

  1. Application Experiences on a GPU-Accelerated Arm-based HPC Testbed
          Index terms have been assigned to the content through auto-classification.

          Recommendations

          Comments

          Information & Contributors

          Information

          Published In

          cover image ACM Other conferences
          HPCAsia '23 Workshops: Proceedings of the HPC Asia 2023 Workshops
          February 2023
          101 pages
          ISBN:9781450399890
          DOI:10.1145/3581576
          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 27 February 2023

          Permissions

          Request permissions for this article.

          Check for updates

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          Conference

          HPCAsia2023 Workshop

          Acceptance Rates

          HPCAsia '23 Workshops Paper Acceptance Rate 9 of 10 submissions, 90%;
          Overall Acceptance Rate 69 of 143 submissions, 48%

          Contributors

          Other Metrics

          Bibliometrics & Citations

          Bibliometrics

          Article Metrics

          • Downloads (Last 12 months)384
          • Downloads (Last 6 weeks)49
          Reflects downloads up to 13 Jan 2025

          Other Metrics

          Citations

          Cited By

          View all
          • (2024)Evaluating ARM and RISC-V Architectures for High-Performance Computing with Docker and KubernetesElectronics10.3390/electronics1317349413:17(3494)Online publication date: 3-Sep-2024
          • (2024)OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUsSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SCW63240.2024.00242(1923-1933)Online publication date: 17-Nov-2024

          View Options

          View options

          PDF

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader

          HTML Format

          View this article in HTML Format.

          HTML Format

          Login options

          Media

          Figures

          Other

          Tables

          Share

          Share

          Share this Publication link

          Share on social media