Performance Portable Implementation of a Kinetic Plasma Simulation Mini-App

Asahi, Yuuichi; Latu, Guillaume; Grandgirard, Virginie; Bigot, Julien

doi:10.1007/978-3-030-49943-3_6

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12017)

Included in the following conference series:

International Workshop on Accelerator Programming Using Directives

Abstract

Performance portability is considered to be an inevitable requirement in the exascale era. We explore a performance-portable approach for GYSELA, a fusion plasma turbulence simulation code that employs a kinetic model. For this purpose, we extract the key features of GYSELA, such as its high dimensionality and semi-Lagrangian scheme, and encapsulate them in a mini-application that solves a similar but simplified Vlasov-Poisson system. We implement the mini-app with mixed OpenACC/OpenMP and with Kokkos, suppressing unnecessary duplication of code lines. For a reference case with a problem size of $128^4$, the Skylake (Kokkos), Nvidia Tesla P100 (OpenACC), and P100 (Kokkos) versions achieve speedups of 1.45, 12.95, and 17.83, respectively, over the baseline OpenMP version on Intel Skylake. In addition to performance portability, we discuss the code readability and productivity of each implementation. Based on our experience, Kokkos can offer readable and productive code at the cost of initial porting efforts, which would be enormous for a large-scale simulation code like GYSELA.
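To put the quoted figures in context, the short sketch below works out the number of grid points in the $128^4$ reference case and the ratio between the two P100 results. It is only illustrative arithmetic on the numbers in the abstract. The memory estimate assumes one 8-byte double-precision value per grid point, which the abstract does not state.

```python
# Figures quoted in the abstract: speedups relative to the baseline
# OpenMP version on Intel Skylake.
speedups = {
    "Skylake (Kokkos)": 1.45,
    "P100 (OpenACC)": 12.95,
    "P100 (Kokkos)": 17.83,
}

# Reference problem size: 128 grid points in each of four dimensions.
points_per_dim = 128
n_points = points_per_dim ** 4

# ASSUMPTION (not stated in the abstract): one 8-byte double per grid point.
bytes_per_value = 8
array_gib = n_points * bytes_per_value / 2**30

# Both P100 speedups share the same baseline, so their ratio compares
# the Kokkos and OpenACC versions directly on the same GPU.
kokkos_over_openacc = speedups["P100 (Kokkos)"] / speedups["P100 (OpenACC)"]

print(f"grid points: {n_points}")
print(f"one array (assumed double precision): {array_gib:.1f} GiB")
print(f"P100 Kokkos vs OpenACC: {kokkos_over_openacc:.2f}x")
```

Under that assumption, a single copy of the four-dimensional array already occupies a large share of a GPU's device memory, which is one reason the data layout and memory access pattern matter for this class of code.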

Supported by QST, Japan.

Acknowledgement

This work was carried out using the JFRS-1 supercomputer at the Computational Simulation Centre of the International Fusion Energy Research Centre (IFERC-CSC) at the Rokkasho Fusion Institute of QST and the Tsubame 3.0 supercomputer at Tokyo Tech. This work was partly supported by JHPCN projects jh180081-NAHI and jh190065-NHI, 102515-15 the MEXT, Grant for HPCI Strategic Program Field No. 4: Next-Generation Industrial Innovations, and Grant for Post-K priority issue No. 6: Development of Innovative Clean Energy.

Author information

Authors and Affiliations

National Institutes for Quantum and Radiological Science and Technology, Rokkasho, Aomori, 039-3212, Japan
Yuuichi Asahi
CEA, IRFM, Cadarache, 13108, St.Paul-lez-Durance Cedex, France
Guillaume Latu & Virginie Grandgirard
Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191, Gif-sur-Yvette, France
Julien Bigot

Corresponding author

Correspondence to Yuuichi Asahi.

Editor information

Editors and Affiliations

RWTH Aachen University, Aachen, Germany
Sandra Wienke
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Sridutt Bhalachandra

About this paper

Cite this paper

Asahi, Y., Latu, G., Grandgirard, V., Bigot, J. (2020). Performance Portable Implementation of a Kinetic Plasma Simulation Mini-App. In: Wienke, S., Bhalachandra, S. (eds) Accelerator Programming Using Directives. WACCPD 2019. Lecture Notes in Computer Science, vol 12017. Springer, Cham. https://doi.org/10.1007/978-3-030-49943-3_6

DOI: https://doi.org/10.1007/978-3-030-49943-3_6
Published: 09 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49942-6
Online ISBN: 978-3-030-49943-3
eBook Packages: Computer Science, Computer Science (R0)
