18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios

H Fu, C He, B Chen, Z Yin, Z Zhang, W Zhang, T Zhang, W Xue, W Liu, W Yin, G Yang…
Proceedings of the International Conference for High Performance Computing …, 2017
This paper reports our large-scale nonlinear earthquake simulation software on Sunway TaihuLight. Our innovations include: (1) a customized parallelization scheme that employs the 10 million cores efficiently at both the process and the thread levels; (2) an elaborate memory scheme that integrates on-chip halo exchange through register communication, optimized blocking configuration guided by an analytic model, and coalesced DMA access with array fusion; (3) on-the-fly compression that doubles the maximum problem size and further improves the performance by 24%. With these innovations to remove the memory constraints of Sunway TaihuLight, our software achieves over 15% of the system's peak, better than the 11.8% efficiency achieved by similar software running on Titan, whose byte-to-flop ratio is 5 times better than TaihuLight's. The extreme cases demonstrate a sustained performance of over 18.9 Pflops, enabling the simulation of the Tangshan earthquake as an 18-Hz scenario with an 8-meter resolution.
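As a rough consistency check on the efficiency figures in the abstract, the sketch below divides the reported sustained performance by the machine's theoretical peak. The peak value of about 125.4 Pflops is an assumption taken from the public TOP500 listing for Sunway TaihuLight, not from the abstract, and the function and constant names are illustrative only.

```python
# Assumption: theoretical peak of Sunway TaihuLight (~125.4 Pflops) comes from
# the public TOP500 listing; it is not stated in the abstract.
TAIHULIGHT_PEAK_PFLOPS = 125.4

# Figures quoted in the abstract.
SUSTAINED_PFLOPS = 18.9    # "sustained performance of over 18.9 Pflops"
TITAN_EFFICIENCY = 0.118   # "11.8% efficiency" of similar software on Titan


def fraction_of_peak(sustained_pflops: float, peak_pflops: float) -> float:
    """Return sustained performance as a fraction of theoretical peak."""
    if peak_pflops <= 0:
        raise ValueError("peak_pflops must be positive")
    return sustained_pflops / peak_pflops


efficiency = fraction_of_peak(SUSTAINED_PFLOPS, TAIHULIGHT_PEAK_PFLOPS)
print(f"TaihuLight fraction of peak: {efficiency:.1%}")
print(f"Relative to Titan's 11.8%: {efficiency / TITAN_EFFICIENCY:.2f}x")
```

Under the assumed peak, the result lands slightly above 15%, which agrees with the "over 15% of the system's peak" claim.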