Abstract
Performance portability is considered to be an inevitable requirement in the exascale era. We explore a performance-portable approach for GYSELA, a fusion plasma turbulence simulation code based on a kinetic model. For this purpose, we extract the key features of GYSELA, such as its high dimensionality and semi-Lagrangian scheme, and encapsulate them in a mini-application that solves a similar but simplified Vlasov-Poisson system. We implement the mini-app with mixed OpenACC/OpenMP directives and with Kokkos, suppressing unnecessary duplication of code lines. For a reference case with a problem size of \(128^4\), the Skylake (Kokkos), Nvidia Tesla P100 (OpenACC), and P100 (Kokkos) versions achieve speedups of 1.45, 12.95, and 17.83, respectively, over the baseline OpenMP version on Intel Skylake. In addition to performance portability, we discuss the code readability and productivity of each implementation. In our experience, Kokkos can offer readable and productive code at the cost of initial porting effort, which would be enormous for a large-scale simulation code like GYSELA.
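To make the semi-Lagrangian idea concrete, the following is a minimal sketch and not code from the mini-app. It assumes 1D advection at constant velocity on a periodic uniform grid and uses linear interpolation for brevity; the mini-app itself treats a higher-dimensional Vlasov-Poisson system and its interpolation may differ. The function name `advect_semi_lagrangian` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Advance f(x) by one time step of df/dt + v df/dx = 0 on a periodic,
// uniform grid of n points covering [0, length).
// Backward semi-Lagrangian update: for each grid point, trace the
// characteristic back to its foot x_i - v*dt and interpolate the old
// field there. Linear interpolation is an assumption made for brevity.
std::vector<double> advect_semi_lagrangian(const std::vector<double>& f,
                                           double v, double dt, double length) {
  const std::size_t n = f.size();
  assert(n > 1 && length > 0.0);
  const double dx = length / static_cast<double>(n);
  std::vector<double> f_new(n);

  // Each output point is independent, so the loop parallelises directly.
  // The pragma is ignored when OpenMP is not enabled at compile time.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(n); ++i) {
    // Foot of the characteristic in units of grid cells, wrapped to [0, n).
    double s = static_cast<double>(i) - v * dt / dx;
    s -= std::floor(s / n) * n;
    const std::size_t i0 = static_cast<std::size_t>(s) % n;  // left neighbour
    const std::size_t i1 = (i0 + 1) % n;                     // right neighbour
    const double w = s - std::floor(s);  // fractional distance from i0
    f_new[i] = (1.0 - w) * f[i0] + w * f[i1];
  }
  return f_new;
}
```

Because every grid point reads only the old field and writes only its own entry, the same loop body could in principle be expressed with a directive (as here) or as a lambda handed to a portability library, which is the kind of trade-off the abstract refers to.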
Supported by QST, Japan.
Acknowledgement
This work was carried out using the JFRS-1 supercomputer at the Computational Simulation Centre of the International Fusion Energy Research Centre (IFERC-CSC) in the Rokkasho Fusion Institute of QST and the Tsubame 3.0 supercomputer at Tokyo Tech. This work was partly supported by JHPCN projects jh180081-NAHI and jh190065-NHI, 102515-15 the MEXT, Grant for HPCI Strategic Program Field No. 4: Next-Generation Industrial Innovations, and Grant for Post-K priority issue No. 6: Development of Innovative Clean Energy.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Asahi, Y., Latu, G., Grandgirard, V., Bigot, J. (2020). Performance Portable Implementation of a Kinetic Plasma Simulation Mini-App. In: Wienke, S., Bhalachandra, S. (eds) Accelerator Programming Using Directives. WACCPD 2019. Lecture Notes in Computer Science, vol 12017. Springer, Cham. https://doi.org/10.1007/978-3-030-49943-3_6
DOI: https://doi.org/10.1007/978-3-030-49943-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49942-6
Online ISBN: 978-3-030-49943-3
eBook Packages: Computer Science, Computer Science (R0)