Abstract
Recently, large-scale scientific computation on heterogeneous supercomputers equipped with accelerators has been attracting attention. However, traditional static job execution and memory management methods cannot harness heterogeneous computing resources, including memory, efficiently, because they incur higher data movement costs and lower resource utilization. This paper takes Cholesky decomposition, an important linear algebra kernel, as the target for optimization. We describe a scalable data-driven scheduling method and a heterogeneous memory management method that improve resource utilization and reduce the amount of data movement. Through a performance evaluation on TSUBAME2.5, a heterogeneous supercomputer with NVIDIA GPUs, we demonstrate the efficiency of the proposed task scheduling method and of data replacement strategies that take data reusability into account.
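To make the idea of data-driven scheduling concrete, the following is a minimal sketch, not the authors' implementation. It assumes the textbook right-looking tiled Cholesky formulation with the usual POTRF, TRSM, SYRK and GEMM tile kernels, builds the task dependency graph for a grid of `nt` by `nt` tiles, and releases each task as soon as all of its inputs have been produced. The function names `cholesky_tasks` and `data_driven_order` are hypothetical, no numerical work is performed, and the dependency counts of this textbook formulation are not necessarily those of the task definition used in the paper.

```python
from collections import defaultdict, deque


def cholesky_tasks(nt):
    """Return {task: set of tasks it depends on} for an nt x nt tile grid.

    Textbook right-looking tiled Cholesky (an assumption, not the paper's
    exact task definition). Tasks are tuples such as ("GEMM", i, j, k).
    """
    deps = {}
    for k in range(nt):
        # POTRF(k): factor diagonal tile (k, k) after its last update.
        deps[("POTRF", k)] = {("SYRK", k, k - 1)} if k > 0 else set()
        for i in range(k + 1, nt):
            # TRSM(i, k): solve panel tile (i, k) against the factored diagonal.
            d = {("POTRF", k)}
            if k > 0:
                d.add(("GEMM", i, k, k - 1))  # last update of tile (i, k)
            deps[("TRSM", i, k)] = d
        for i in range(k + 1, nt):
            # SYRK(i, k): update diagonal tile (i, i) with panel tile (i, k).
            d = {("TRSM", i, k)}
            if k > 0:
                d.add(("SYRK", i, k - 1))  # previous update of tile (i, i)
            deps[("SYRK", i, k)] = d
            for j in range(k + 1, i):
                # GEMM(i, j, k): update tile (i, j) with tiles (i, k), (j, k).
                d = {("TRSM", i, k), ("TRSM", j, k)}
                if k > 0:
                    d.add(("GEMM", i, j, k - 1))  # previous update of (i, j)
                deps[("GEMM", i, j, k)] = d
    return deps


def data_driven_order(deps):
    """Run tasks in data-driven fashion: a task becomes ready when the
    count of its unfinished input tasks drops to zero."""
    remaining = {task: len(d) for task, d in deps.items()}
    consumers = defaultdict(list)
    for task, d in deps.items():
        for producer in d:
            consumers[producer].append(task)

    ready = deque(task for task, n in remaining.items() if n == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # a real runtime would launch the kernel here
        for consumer in consumers[task]:
            remaining[consumer] -= 1
            if remaining[consumer] == 0:
                ready.append(consumer)
    return order


if __name__ == "__main__":
    for task in data_driven_order(cholesky_tasks(3)):
        print(task)
```

In contrast to a static schedule that walks the loop nest in a fixed order, the ready queue here lets any task whose inputs are available proceed, which is the property a data-driven runtime exploits to keep CPUs and GPUs busy. A real scheduler would additionally decide where each ready task runs and which tiles stay resident in device memory.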
Notes
1. In Cholesky decomposition, each task depends on at most two tasks.
2. Note that the input tile data is received by a work thread, not by the ignition thread.
Acknowledgment
This research was supported by the Japan Science and Technology Agency (JST), the Core Research of Evolutionary Science and Technology (CREST) research project.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tsujita, Y., Endo, T. (2017). Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition. In: Desai, N., Cirne, W. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_4
DOI: https://doi.org/10.1007/978-3-319-61756-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61755-8
Online ISBN: 978-3-319-61756-5
eBook Packages: Computer Science, Computer Science (R0)