Comparing a Naive and a Tree-Based N-Body Algorithm using Different Standard SYCL Implementations on Various Hardware

Published: 12 November 2023


N-body algorithms compute the interactions between n bodies in order to obtain their trajectories. Algorithms that solve the n-body problem can exploit significant amounts of parallelism. Today, GPUs are commonly used alongside CPUs to execute parallel algorithms. However, targeting several hardware platforms at once often requires using different programming languages. In this work, we have implemented a naive and a tree-based Barnes-Hut n-body algorithm using SYCL to target CPUs and GPUs with the same programming language. We compare both algorithms on heterogeneous hardware platforms and for different SYCL implementations with respect to their runtime behavior and their support for several performance optimizations. Our results show that some optimizations behave unexpectedly under different SYCL implementations. Moreover, even though data center GPUs have a clear performance advantage for the naive algorithm, consumer GPUs offer surprisingly competitive runtimes for the Barnes-Hut algorithm.
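
For readers unfamiliar with the naive approach mentioned in the abstract, the following is a minimal sketch of an all-pairs (O(n^2)) gravitational acceleration computation in plain C++. It is an illustration only, not the authors' SYCL implementation: the names `Body`, `Vec3`, and `compute_accelerations`, the softening parameter, and the default constants are all assumptions introduced here. Each iteration of the outer loop is independent of the others, which is the source of the parallelism the abstract refers to; in a SYCL version, that loop would presumably be mapped to a parallel kernel.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from the paper.
struct Body {
    double x, y, z;  // position
    double mass;
};

struct Vec3 {
    double x, y, z;
};

// Naive all-pairs acceleration: every body interacts with every other body.
// G is the gravitational constant; softening (an assumed parameter) avoids
// division by zero when two bodies are very close or coincide.
std::vector<Vec3> compute_accelerations(const std::vector<Body>& bodies,
                                        double G = 1.0,
                                        double softening = 1e-9) {
    std::vector<Vec3> acc(bodies.size(), Vec3{0.0, 0.0, 0.0});
    // Outer loop: one independent task per body (the parallel dimension).
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        // Inner loop: accumulate the contribution of every other body.
        for (std::size_t j = 0; j < bodies.size(); ++j) {
            if (i == j) continue;
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double dz = bodies[j].z - bodies[i].z;
            const double dist_sq =
                dx * dx + dy * dy + dz * dz + softening * softening;
            const double inv_dist = 1.0 / std::sqrt(dist_sq);
            // a_i += G * m_j * r_ij / |r_ij|^3
            const double s = G * bodies[j].mass * inv_dist * inv_dist * inv_dist;
            acc[i].x += s * dx;
            acc[i].y += s * dy;
            acc[i].z += s * dz;
        }
    }
    return acc;
}
```

A tree-based method such as Barnes-Hut replaces the inner loop with a traversal of a spatial tree in which sufficiently distant groups of bodies are approximated by a single aggregate mass, reducing the work per body from O(n) to roughly O(log n).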

Supplemental Material

MP4 File - Conference presentation recording
Recording of "Comparing a Naive and a Tree-Based N-Body Algorithm using Different Standard SYCL Implementations on Various Hardware" presentation at the Tenth Workshop on Accelerator Programming and Directives (WACCPD 2023)
PDF File



Cited By

  • (2024) Efficient Tree-based Parallel Algorithms for N-Body Simulations Using C++ Standard Parallelism. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 708-717. DOI: 10.1109/SCW63240.2024.00099. Online publication date: 17-Nov-2024




Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CPU
  2. GPU
  3. Performance Comparison
  4. SYCL
  5. n-body


  • Research-article
  • Research
  • Refereed limited

SC-W 2023

