Abstract
With the continual development of multi- and many-core architectures, there is a constant need for architecture-specific tuning of application codes in order to achieve computational performance and energy efficiency closer to the theoretical peaks of these architectures. In this paper, we present the optimization and tuning of HipGISAXS, a parallel X-ray scattering simulation code [9], on various massively parallel, state-of-the-art supercomputers based on multi- and many-core processors. In particular, we target clusters of general-purpose multi-cores, such as Intel Sandy Bridge and AMD Magny-Cours, and many-core accelerators, such as Nvidia Kepler GPUs and Intel Xeon Phi coprocessors. We present both high-level algorithmic and low-level architecture-aware optimization and tuning methodologies for these platforms. We also present a detailed performance study of our codes on single and multiple nodes of several current top-ranking supercomputers. Additionally, we implement autotuning of many of the algorithmic and optimization parameters, dynamically selecting their optimal values to ensure high performance and high efficiency.
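The abstract does not describe how the autotuner selects parameter values. As a rough illustration only, the sketch below assumes the simplest empirical strategy: exhaustively time each candidate value of a tunable parameter on a representative workload and keep the fastest. It is not the HipGISAXS implementation; the names `autotune`, `blocked_sum`, and `block_size` are hypothetical.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def _time_once(kernel: Callable[[T], None], value: T) -> float:
    """Run the kernel once with the given parameter value; return wall-clock seconds."""
    start = time.perf_counter()
    kernel(value)
    return time.perf_counter() - start


def autotune(kernel: Callable[[T], None], candidates: Iterable[T], repeats: int = 3) -> T:
    """Hypothetical exhaustive autotuner: return the candidate with the lowest runtime.

    Each candidate is timed `repeats` times and the minimum is used,
    which reduces the effect of timing noise from other system activity.
    """
    found = False
    best_value = None
    best_time = float("inf")
    for value in candidates:
        elapsed = min(_time_once(kernel, value) for _ in range(repeats))
        if elapsed < best_time:
            best_value, best_time, found = value, elapsed, True
    if not found:
        raise ValueError("no candidate values supplied")
    return best_value


if __name__ == "__main__":
    # Hypothetical tunable kernel: sums a list in chunks of `block_size` elements.
    data = list(range(200_000))

    def blocked_sum(block_size: int) -> None:
        total = 0
        for i in range(0, len(data), block_size):
            total += sum(data[i:i + block_size])

    print("selected block size:", autotune(blocked_sum, [1, 16, 256, 4096]))
```

Exhaustive search is practical only when each parameter has a small, discrete set of candidates; larger or multi-dimensional parameter spaces would likely need pruning or heuristic search instead.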
References
[1] Tesla Kepler GPU Accelerators. Datasheet (2012)
[2] Intel Xeon Phi Coprocessor. Developer's Quick Start Guide. Version 1.5. White Paper (2013)
[3] Performance Application Programming Interface (PAPI) (2013), http://icl.cs.utk.edu/papi
[4] Top500 Supercomputers (June 2013), http://www.top500.org
[5] Chourou, S., Sarje, A., Li, X., Chan, E., Hexemer, A.: HipGISAXS: A High Performance Computing Code for Simulating Grazing Incidence X-Ray Scattering Data. Submitted to the Journal of Applied Crystallography (2013)
[6] Intel Corp.: Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual (September 2012)
[7] Kim, C., Satish, N., Chhugani, J., et al.: Closing the Ninja Performance Gap through Traditional Programming and Compiler Technology. Tech. Rep. (2011)
[8] Pommier, J.: SIMD implementation of sin, cos, exp and log. Tech. Rep. (2007), http://gruntthepeon.free.fr/ssemath
[9] Sarje, A., Li, X., Chourou, S., Chan, E., Hexemer, A.: Massively Parallel X-ray Scattering Simulations. In: Supercomputing (SC 2012) (2012)
[10] Satish, N., Kim, C., Chhugani, J., et al.: Can traditional programming bridge the Ninja performance gap for parallel computing applications? SIGARCH Computer Architecture News 40(3), 440–451 (2012), http://doi.acm.org/10.1145/2366231.2337210
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sarje, A., Li, X.S., Hexemer, A. (2014). Tuning HipGISAXS on Multi and Many Core Supercomputers. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation. PMBS 2013. Lecture Notes in Computer Science, vol. 8551. Springer, Cham. https://doi.org/10.1007/978-3-319-10214-6_11
DOI: https://doi.org/10.1007/978-3-319-10214-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10213-9
Online ISBN: 978-3-319-10214-6
eBook Packages: Computer Science, Computer Science (R0)