Abstract
Hardware accelerators are classic scientific coprocessors in HPC machines. However, the number of CPU cores on the motherboard is increasing and now constitutes a non-negligible part of the total computing power of the machine. Running an application both on an accelerator (such as a GPU or a Xeon-Phi device) and on the CPU cores can therefore provide the highest performance. Moreover, it is now possible to include different accelerators in one machine in order to support and speed up a larger set of applications. Running each part of an application on the most suitable device yields high performance, and using all the otherwise idle devices in the machine should improve the performance of that part even further. However, overlapping computations with inter-device data transfers is mandatory to limit the overhead of this approach, which leads to complex asynchronous algorithms and multi-paradigm optimized codes. This article introduces our research and experiments on cooperation between several CPU cores, a GPU, and a Xeon-Phi accelerator, all included in the same machine.
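The overlap of computation with inter-device transfers mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes a 4-point Jacobi stencil, uses two Python threads as stand-ins for two devices, and models the inter-device transfer as a copy of one frontier row. The names `Band`, `hybrid_step`, and `run_hybrid` are hypothetical. Each band starts the halo exchange, computes the rows that do not depend on the halo while the exchange is in flight, and computes its frontier rows only once the halo has arrived.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil_row(above, row, below):
    """4-point Jacobi average for one row; first and last columns stay fixed."""
    out = row[:]
    for j in range(1, len(row) - 1):
        out[j] = 0.25 * (above[j] + below[j] + row[j - 1] + row[j + 1])
    return out


def reference_step(grid):
    """Sequential step on the whole grid; top and bottom rows stay fixed."""
    new = [r[:] for r in grid]
    for i in range(1, len(grid) - 1):
        new[i] = stencil_row(grid[i - 1], grid[i], grid[i + 1])
    return new


class Band:
    """Rows [lo, hi) owned by one hypothetical device, plus one halo row per side."""

    def __init__(self, grid, lo, hi):
        self.rows = [r[:] for r in grid[lo:hi]]
        self.halo_top = grid[lo - 1][:]
        self.halo_bottom = grid[hi][:]

    def compute_interior(self):
        """Compute rows that need no halo data; the old rows are left untouched."""
        n = len(self.rows)
        new = [None] * n
        for k in range(1, n - 1):
            new[k] = stencil_row(self.rows[k - 1], self.rows[k], self.rows[k + 1])
        return new

    def finish(self, new):
        """Compute the frontier rows once the halos are up to date, then commit."""
        n = len(self.rows)
        padded = [self.halo_top] + self.rows + [self.halo_bottom]
        for k in {0, n - 1}:
            new[k] = stencil_row(padded[k], padded[k + 1], padded[k + 2])
        self.rows = new


def hybrid_step(a, b, pool):
    # 1. Start the (simulated) asynchronous frontier exchange.
    transfer = pool.submit(lambda: (a.rows[-1][:], b.rows[0][:]))
    # 2. Meanwhile, both bands compute their halo-independent rows.
    fa, fb = pool.submit(a.compute_interior), pool.submit(b.compute_interior)
    new_a, new_b = fa.result(), fb.result()
    # 3. Wait for the exchange, then install the received halos.
    a_last, b_first = transfer.result()
    a.halo_bottom, b.halo_top = b_first, a_last
    # 4. Frontier rows can now be computed.
    a.finish(new_a)
    b.finish(new_b)


def run_hybrid(grid, steps):
    """Run `steps` iterations on a grid of at least 4 rows split into two bands."""
    h = len(grid)
    mid = h // 2
    a, b = Band(grid, 1, mid), Band(grid, mid, h - 1)
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(steps):
            hybrid_step(a, b, pool)
    return [grid[0][:]] + a.rows + b.rows + [grid[-1][:]]
```

Because neither band modifies its rows before the exchange completes, the result matches the sequential `reference_step` applied the same number of times. On real hardware the threads would be replaced by device queues or streams, and the row copy by an asynchronous transfer.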
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Vialle, S., Contassot-Vivier, S., Mercier, P. (2016). Generic Algorithmic Scheme for 2D Stencil Applications on Hybrid Machines. In: Hannig, F., Cardoso, J.M.P., Pionteck, T., Fey, D., Schröder-Preikschat, W., Teich, J. (eds) Architecture of Computing Systems – ARCS 2016. ARCS 2016. Lecture Notes in Computer Science, vol 9637. Springer, Cham. https://doi.org/10.1007/978-3-319-30695-7_9
DOI: https://doi.org/10.1007/978-3-319-30695-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30694-0
Online ISBN: 978-3-319-30695-7
eBook Packages: Computer Science, Computer Science (R0)