DOI:10.1145/3624062.3624604

Comparing a Naive and a Tree-Based N-Body Algorithm using Different Standard SYCL Implementations on Various Hardware

Published: 12 November 2023

Abstract

N-body algorithms compute the interactions between n bodies in order to obtain their trajectories. Algorithms that solve the n-body problem can exploit significant amounts of parallelism. Today, GPUs are commonly used alongside CPUs to execute parallel algorithms. However, targeting several hardware platforms at once often requires using different programming languages. In this work, we have implemented a naive and a tree-based Barnes-Hut n-body algorithm using SYCL to target CPUs and GPUs with the same programming language. We compare both algorithms on heterogeneous hardware platforms and across different SYCL implementations with respect to their runtime behavior and their support for several performance optimizations. Our results show that some optimizations behave unexpectedly under different SYCL implementations. Moreover, although data center GPUs have a clear performance advantage for the naive algorithm, consumer GPUs surprisingly offer competitive runtimes for the Barnes-Hut algorithm.
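
To make the naive algorithm mentioned in the abstract concrete, the following is a minimal sketch of an all-pairs gravitational acceleration computation in plain, sequential C++. It is not the authors' SYCL implementation. The names `Vec3` and `naive_accelerations`, the default `G = 1.0`, and the softening length `eps` are assumptions introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch: a 3D position or acceleration.
struct Vec3 {
    double x, y, z;
};

// Naive all-pairs accelerations: every body interacts with every other body,
// so the work grows as O(n^2).
// Assumptions for illustration: G is the gravitational constant in arbitrary
// units, eps is a softening length that avoids division by zero for
// coinciding bodies.
std::vector<Vec3> naive_accelerations(const std::vector<Vec3>& pos,
                                      const std::vector<double>& mass,
                                      double G = 1.0, double eps = 1e-9) {
    assert(pos.size() == mass.size());
    const std::size_t n = pos.size();
    std::vector<Vec3> acc(n, Vec3{0.0, 0.0, 0.0});

    // Each iteration of the outer loop writes only acc[i], so the iterations
    // are independent of each other and could be distributed across threads
    // or work-items.
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;  // a body exerts no force on itself
            const double dx = pos[j].x - pos[i].x;
            const double dy = pos[j].y - pos[i].y;
            const double dz = pos[j].z - pos[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
            const double inv_r3 = 1.0 / (r2 * std::sqrt(r2));
            acc[i].x += G * mass[j] * dx * inv_r3;
            acc[i].y += G * mass[j] * dy * inv_r3;
            acc[i].z += G * mass[j] * dz * inv_r3;
        }
    }
    return acc;
}
```

Because each body's acceleration depends only on read-only positions and masses, the outer loop is the natural candidate for parallel execution; a tree-based method such as Barnes-Hut instead reduces the number of interactions evaluated per body.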

Supplemental Material

MP4 File - Conference presentation recording
Recording of "Comparing a Naive and a Tree-Based N-Body Algorithm using Different Standard SYCL Implementations on Various Hardware" presentation at the Tenth Workshop on Accelerator Programming and Directives (WACCPD 2023)
PDF File
Appendix

Cited By

  • (2024) Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 708–717. https://doi.org/10.1109/SCW63240.2024.00099. Online publication date: 17-Nov-2024

Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CPU
  2. GPU
  3. Performance Comparison
  4. SYCL
  5. n-body

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC-W 2023

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 46
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 27 Jan 2025
