Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage

Gustavson, Fred G.; Karlsson, Lars; Kågström, Bo

doi:10.1007/978-3-540-75755-9_67

Fred G. Gustavson¹,², Lars Karlsson² & Bo Kågström²

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4699)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

We present three algorithms for Cholesky factorization using minimum block storage in a distributed memory (DM) environment. One of the distributed square block packed (SBP) format algorithms performs similarly to ScaLAPACK PDPOTRF, and our algorithm with iteration overlapping typically outperforms it by 15–50% for small and medium-sized matrices. Storing the blocks contiguously yields better-performing BLAS operations. Our DM algorithms are not sensitive to cache conflicts and therefore give smooth and predictable performance. We also investigate the intricacies of using rectangular full packed (RFP) format with ScaLAPACK routines and point out some of its advantages and drawbacks.
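The core idea behind SBP storage is that only the N(N+1)/2 blocks on and below the block diagonal are kept, and each nb×nb block is contiguous in memory, so every kernel call works on a small dense array. The following is a minimal serial sketch under our own assumptions, not the authors' code: blocks are stored column-major, block columns of the lower triangle are laid out one after another, the matrix order is a multiple of nb, and all function names (`to_sbp`, `sbp_cholesky`, `potrf_block`, and so on) are hypothetical. It uses a plain right-looking blocked Cholesky with pure-Python loops standing in for POTRF/TRSM/SYRK/GEMM; it does not show the distribution of blocks over processes, iteration overlapping, or the RFP variant.

```python
import math

def block_index(i, j, nblk):
    """Position of lower block (i, j), i >= j; block column j holds blocks j..nblk-1."""
    if i < j:
        raise ValueError("only blocks on or below the diagonal are stored")
    return j * nblk - j * (j - 1) // 2 + (i - j)

def to_sbp(a, nb):
    """Pack the lower block triangle of a dense symmetric matrix (list of rows)
    into one flat buffer; each nb x nb block is contiguous and column-major."""
    n = len(a)
    if n % nb != 0:
        raise ValueError("this sketch assumes n is a multiple of nb")
    nblk, bsz = n // nb, nb * nb
    buf = [0.0] * (nblk * (nblk + 1) // 2 * bsz)
    for j in range(nblk):
        for i in range(j, nblk):
            off = block_index(i, j, nblk) * bsz
            for c in range(nb):
                for r in range(nb):
                    buf[off + r + c * nb] = float(a[i * nb + r][j * nb + c])
    return buf, nblk

def potrf_block(buf, d, nb):
    """In-place Cholesky of the diagonal block at offset d; reads only its lower triangle."""
    for c in range(nb):
        s = buf[d + c + c * nb] - sum(buf[d + c + k * nb] ** 2 for k in range(c))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        lcc = math.sqrt(s)
        buf[d + c + c * nb] = lcc
        for r in range(c + 1, nb):
            t = buf[d + r + c * nb] - sum(
                buf[d + r + k * nb] * buf[d + c + k * nb] for k in range(c))
            buf[d + r + c * nb] = t / lcc

def trsm_block(buf, d, b, nb):
    """B := B * L^{-T}, where L is the factored diagonal block at d and B is at b."""
    for c in range(nb):
        lcc = buf[d + c + c * nb]
        for r in range(nb):
            t = buf[b + r + c * nb] - sum(
                buf[b + r + k * nb] * buf[d + c + k * nb] for k in range(c))
            buf[b + r + c * nb] = t / lcc

def gemm_nt_block(buf, a, b, c_off, nb):
    """C := C - A * B^T on contiguous blocks (a SYRK-like update when a == b)."""
    for c in range(nb):
        for r in range(nb):
            s = sum(buf[a + r + k * nb] * buf[b + c + k * nb] for k in range(nb))
            buf[c_off + r + c * nb] -= s

def sbp_cholesky(buf, nblk, nb):
    """Right-looking blocked Cholesky A = L L^T, overwriting the SBP buffer with L."""
    def off(i, j):
        return block_index(i, j, nblk) * nb * nb

    for k in range(nblk):
        potrf_block(buf, off(k, k), nb)                # factor the diagonal block
        for i in range(k + 1, nblk):                   # panel solves below it
            trsm_block(buf, off(k, k), off(i, k), nb)
        for j in range(k + 1, nblk):                   # trailing-matrix update
            for i in range(j, nblk):
                gemm_nt_block(buf, off(i, k), off(j, k), off(i, j), nb)

def lower_from_sbp(buf, nblk, nb):
    """Unpack L from the SBP buffer into a dense list of rows (upper part zero)."""
    n = nblk * nb
    lower = [[0.0] * n for _ in range(n)]
    for j in range(nblk):
        for i in range(j, nblk):
            off = block_index(i, j, nblk) * nb * nb
            for c in range(nb):
                for r in range(nb):
                    gi, gj = i * nb + r, j * nb + c
                    if gi >= gj:
                        lower[gi][gj] = buf[off + r + c * nb]
    return lower

if __name__ == "__main__":
    A = [[4.0, 2.0, 0.0, 0.0],
         [2.0, 5.0, 1.0, 0.0],
         [0.0, 1.0, 3.0, 1.0],
         [0.0, 0.0, 1.0, 2.0]]
    buf, nblk = to_sbp(A, nb=2)
    sbp_cholesky(buf, nblk, nb=2)
    L = lower_from_sbp(buf, nblk, nb=2)
    n = len(A)
    err = max(abs(sum(L[r][k] * L[c][k] for k in range(n)) - A[r][c])
              for r in range(n) for c in range(n))
    print(f"max |L L^T - A| = {err:.2e}")
```

Because every block is a separate contiguous array, each kernel call touches unit-stride memory regardless of the global matrix order, which is the property the abstract links to better BLAS performance and insensitivity to cache conflicts. In a real implementation, one would expect the Python loops to be replaced by calls to an optimized BLAS and the blocks to be assigned to processes.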

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Fred G. Gustavson
Department of Computing Science and HPC2N, Umeå University, SE-901 87 Umeå, Sweden
Fred G. Gustavson, Lars Karlsson & Bo Kågström

Editor information

Bo Kågström, Erik Elmroth, Jack Dongarra, Jerzy Waśniewski

About this paper

Cite this paper

Gustavson, F.G., Karlsson, L., Kågström, B. (2007). Three Algorithms for Cholesky Factorization on Distributed Memory Using Packed Storage. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2006. Lecture Notes in Computer Science, vol 4699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75755-9_67

DOI: https://doi.org/10.1007/978-3-540-75755-9_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75754-2
Online ISBN: 978-3-540-75755-9
eBook Packages: Computer Science, Computer Science (R0)
