DOI: 10.1145/2063384.2063415
research-article

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems

Published: 12 November 2011

Abstract

The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance the performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2× speedup over the production Fortran code on four parallel systems: clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, while attaining a 1.6× speedup on 49,152 XE6 cores.
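
The abstract names particle binning without describing how it is done. As an illustration only, the following is a minimal sketch of one common way to bin particles by grid cell, using a counting sort. It is not the GTC implementation; the function name `bin_particles` and its parameters are hypothetical, and the sketch assumes each particle already carries an integer cell index in the range `0..num_cells-1`.

```python
from typing import List, Sequence, Tuple


def bin_particles(cell_ids: Sequence[int], num_cells: int) -> Tuple[List[int], List[int]]:
    """Group particle indices by cell with a stable counting sort.

    Hypothetical helper for illustration; not taken from GTC.
    Returns (order, offsets), where order lists particle indices grouped
    by cell and order[offsets[c]:offsets[c + 1]] are the particles in cell c.
    """
    # Pass 1: count how many particles fall in each cell.
    counts = [0] * num_cells
    for c in cell_ids:
        counts[c] += 1

    # An exclusive prefix sum turns the counts into per-cell start offsets.
    offsets = [0] * (num_cells + 1)
    for c in range(num_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Pass 2: scatter each particle index into the next free slot of its cell.
    cursor = offsets[:-1].copy()
    order = [0] * len(cell_ids)
    for p, c in enumerate(cell_ids):
        order[cursor[c]] = p
        cursor[c] += 1
    return order, offsets


if __name__ == "__main__":
    cells = [2, 0, 1, 2, 0, 2]  # made-up cell index for each of six particles
    order, offsets = bin_particles(cells, num_cells=3)
    print(order)    # [1, 4, 2, 0, 3, 5]
    print(offsets)  # [0, 2, 3, 6]
```

Once particles are grouped this way, the per-cell counts in `offsets` give a direct measure of how much work each cell carries, which is the kind of information a load-balancing scheme could use when assigning cells to threads or processes.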

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN: 9781450307710
DOI: 10.1145/2063384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hybrid programming
  2. multicore optimization
  3. particle-in-cell

Qualifiers

  • Research-article

Acceptance Rates

SC '11 Paper Acceptance Rate 74 of 352 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Jan 2025.

Cited By

  • (2023) Scalability and efficiency challenges for the exascale supercomputing system: practice of a parallel supporting environment on the Sunway exascale prototype system. Frontiers of Information Technology & Electronic Engineering, 24:1 (41-58). DOI: 10.1631/FITEE.2200412. Online publication date: 23-Jan-2023
  • (2023) A 3D Unstructured Mesh based Particle Tracking Code for Impurity Transport Simulation in Fusion Tokamaks. Computer Physics Communications (108861). DOI: 10.1016/j.cpc.2023.108861. Online publication date: Jul-2023
  • (2023) Development of an unstructured mesh gyrokinetic particle-in-cell code for exascale fusion plasma simulations on GPUs. Computer Physics Communications (108824). DOI: 10.1016/j.cpc.2023.108824. Online publication date: Jun-2023
  • (2021) PUMIPic: A mesh-based approach to unstructured mesh Particle-In-Cell on GPUs. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2021.06.004. Online publication date: Jun-2021
  • (2020) The Need for Precise and Efficient Memory Capacity Budgeting. Proceedings of the International Symposium on Memory Systems (169-177). DOI: 10.1145/3422575.3422791. Online publication date: 28-Sep-2020
  • (2020) Fast Modeling of Network Contention in Batch Point-to-point Communications by Packet-level Simulation with Dynamic Time-stepping. Workshop Proceedings of the 49th International Conference on Parallel Processing (1-10). DOI: 10.1145/3409390.3409398. Online publication date: 17-Aug-2020
  • (2020) Improving Performance of Batch Point-to-Point Communications by Active Contention Reduction Through Congestion-Avoiding Message Scheduling. Algorithms and Architectures for Parallel Processing (404-418). DOI: 10.1007/978-3-030-38991-8_27. Online publication date: 22-Jan-2020
  • (2019) Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers. International Journal of High Performance Computing Applications, 33:1 (169-188). DOI: 10.1177/1094342017712059. Online publication date: 1-Jan-2019
  • (2019) Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code Using Directives. Accelerator Programming Using Directives (3-21). DOI: 10.1007/978-3-030-12274-4_1. Online publication date: 24-Jan-2019
  • (2016) The gyrokinetic particle simulation of fusion plasmas on Tianhe-2 supercomputer. Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (25-32). DOI: 10.5555/3019094.3019098. Online publication date: 13-Nov-2016
