Abstract
We present three algorithms for Cholesky factorization using minimum block storage for a distributed memory (DM) environment. One of the distributed square block packed (SBP) format algorithms performs similarly to ScaLAPACK PDPOTRF, and our algorithm with iteration overlapping typically outperforms it by 15–50% for small and medium-sized matrices. Because the blocks are stored contiguously, the BLAS operations perform better. Our DM algorithms are not sensitive to cache conflicts and thus give smooth and predictable performance. We also investigate the intricacies of using rectangular full packed (RFP) format with ScaLAPACK routines and point out some advantages and drawbacks.
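To make the storage idea concrete, the following is a minimal serial sketch of square block packed storage for the lower triangle, together with a right-looking blocked Cholesky that works block by block. It assumes that n is a multiple of the block size nb, stores each lower block (i, j), i >= j, as one contiguous column-major list, and orders the blocks block-column by block-column. That ordering and the names `blk_index`, `pack_sbp`, `sbp_cholesky` and `unpack_lower` are hypothetical choices for illustration. Plain Python loops stand in for the BLAS/LAPACK kernels (POTRF, TRSM, GEMM). This is not the authors' distributed implementation, which also maps blocks onto processes.

```python
import math

def blk_index(i, j, nt):
    """Hypothetical layout: position of lower block (i, j), i >= j, in block-column-major order."""
    return j * nt - j * (j - 1) // 2 + (i - j)

def pack_sbp(A, nb):
    """Copy the lower block triangle of symmetric A into contiguous nb*nb column-major blocks."""
    n = len(A)
    if n % nb:
        raise ValueError("this sketch assumes n is a multiple of nb")
    nt = n // nb
    blocks = [None] * (nt * (nt + 1) // 2)
    for j in range(nt):
        for i in range(j, nt):
            blocks[blk_index(i, j, nt)] = [A[i * nb + r][j * nb + c]
                                           for c in range(nb) for r in range(nb)]
    return blocks

def potrf(B, nb):
    """Unblocked Cholesky of a diagonal block, in place (only the lower part is used)."""
    for c in range(nb):
        d = B[c * nb + c] - sum(B[k * nb + c] ** 2 for k in range(c))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        d = math.sqrt(d)
        B[c * nb + c] = d
        for r in range(c + 1, nb):
            s = B[c * nb + r] - sum(B[k * nb + r] * B[k * nb + c] for k in range(c))
            B[c * nb + r] = s / d

def trsm(Ljj, B, nb):
    """Overwrite B with X solving X * Ljj^T = B (Ljj lower triangular)."""
    for r in range(nb):
        for c in range(nb):
            s = B[c * nb + r] - sum(B[k * nb + r] * Ljj[k * nb + c] for k in range(c))
            B[c * nb + r] = s / Ljj[c * nb + c]

def gemm_nt_sub(C, X, Y, nb):
    """C -= X * Y^T on column-major nb x nb blocks."""
    for c in range(nb):
        for r in range(nb):
            C[c * nb + r] -= sum(X[k * nb + r] * Y[k * nb + c] for k in range(nb))

def sbp_cholesky(blocks, nt, nb):
    """Right-looking blocked Cholesky A = L L^T over the packed lower blocks, in place."""
    for j in range(nt):
        Ljj = blocks[blk_index(j, j, nt)]
        potrf(Ljj, nb)                                  # factor the diagonal block
        for i in range(j + 1, nt):
            trsm(Ljj, blocks[blk_index(i, j, nt)], nb)  # panel: L_ij = A_ij L_jj^{-T}
        for k in range(j + 1, nt):                      # trailing update, lower blocks only
            for i in range(k, nt):
                gemm_nt_sub(blocks[blk_index(i, k, nt)],
                            blocks[blk_index(i, j, nt)],
                            blocks[blk_index(k, j, nt)], nb)

def unpack_lower(blocks, n, nb):
    """Expand the packed factor into a dense n x n lower-triangular list of lists."""
    nt = n // nb
    L = [[0.0] * n for _ in range(n)]
    for j in range(nt):
        for i in range(j, nt):
            B = blocks[blk_index(i, j, nt)]
            for c in range(nb):
                for r in range(nb):
                    if i > j or r >= c:  # skip the stale upper part of diagonal blocks
                        L[i * nb + r][j * nb + c] = B[c * nb + r]
    return L

if __name__ == "__main__":
    n, nb = 6, 2
    # Symmetric positive definite test matrix: Hilbert matrix plus n * I
    A = [[1.0 / (1 + i + j) + (n if i == j else 0.0) for j in range(n)] for i in range(n)]
    blocks = pack_sbp(A, nb)
    sbp_cholesky(blocks, n // nb, nb)
    L = unpack_lower(blocks, n, nb)
    err = max(abs(sum(L[i][k] * L[j][k] for k in range(n)) - A[i][j])
              for i in range(n) for j in range(n))
    print("max |L L^T - A| =", err)
```

Each kernel call touches only whole contiguous blocks. That is the property the abstract credits for the better BLAS performance. Only nt(nt+1)/2 blocks are kept, so apart from the unused upper halves of the diagonal blocks, storage is close to the n(n+1)/2 of classic packed format.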
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gustavson, F.G., Karlsson, L., Kågström, B. (2007). Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_67
DOI: https://doi.org/10.1007/978-3-540-75755-9_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer Science, Computer Science (R0)