New Level-3 BLAS Kernels for Cholesky Factorization

Gustavson, Fred G.; Waśniewski, Jerzy; Herrero, José R.

doi:10.1007/978-3-642-31464-3_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7203)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Some linear algebra libraries use Level-2 routines during the factorization part of any Level-3 block factorization algorithm. We discuss four Level-3 routines called DPOTF3, a new type of BLAS, for the factorization part of a block Cholesky factorization algorithm for use by LAPACK routine DPOTRF or for BPF (Blocked Packed Format) Cholesky factorization. The four DPOTF3 routines are Fortran routines. Our main result is that the performance of the DPOTF3 routines is still increasing when the performance of the LAPACK Level-2 routine DPOTF2 starts to decrease. This means that the performance of DGEMM, DSYRK, and DTRSM will increase, because they can use larger block sizes and make fewer passes over the matrix elements. We present corroborating performance results for DPOTF3 versus DPOTF2 on a variety of common platforms. The four DPOTF3 routines are based on simple register blocking; different platforms have different numbers of registers, and so our four routines have different register blockings. Blocked Packed Format (BPF) is discussed. LAPACK routines for _POTRF and _PPTRF using BPF instead of full and packed format are shown to be trivial modifications of LAPACK _POTRF source codes. Upper BPF is shown to be identical to square block packed format. Performance results for DBPTRF and DPOTRF for large n show that the DPOTF3 routines do increase performance for large n.
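To make the roles named in the abstract concrete, the following is a minimal sketch of a right-looking blocked Cholesky factorization in plain Python. It is not the authors' Fortran code and does not show register blocking; the function names `factor_diag_block` and `blocked_cholesky` and the block-size parameter `nb` are hypothetical, chosen only for illustration. The comments mark which step plays the role of the diagonal-block kernel (DPOTF2 or DPOTF3) and which steps correspond to DTRSM and DSYRK/DGEMM.

```python
import math


def factor_diag_block(a, k0, k1):
    """Unblocked lower Cholesky of the diagonal block a[k0:k1][k0:k1], in place.

    Illustrative stand-in for the factorization kernel; assumes the block has
    already received all updates from earlier block columns.
    """
    for j in range(k0, k1):
        s = a[j][j] - sum(a[j][p] * a[j][p] for p in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k1):
            t = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))
            a[i][j] = t / a[j][j]


def blocked_cholesky(a, nb):
    """Right-looking blocked Cholesky; overwrites the lower triangle of a with L.

    The strict upper triangle of a is neither read nor written.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Factorization part (role of DPOTF2 / DPOTF3): A11 = L11 * L11^T
        factor_diag_block(a, k0, k1)
        # Triangular solve (role of DTRSM): L21 = A21 * L11^(-T)
        for i in range(k1, n):
            for j in range(k0, k1):
                t = a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))
                a[i][j] = t / a[j][j]
        # Trailing update (role of DSYRK / DGEMM): A22 = A22 - L21 * L21^T
        for i in range(k1, n):
            for j in range(k1, i + 1):
                a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k0, k1))
    return a
```

In this sketch, a larger `nb` means fewer iterations of the outer loop and therefore fewer passes over the trailing matrix, but more work inside the diagonal-block kernel. That trade-off is why the abstract ties the speed of the diagonal kernel to the block size the Level-3 routines can use.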

Author information

Authors and Affiliations

IBM Research, Emeritus, USA
Fred G. Gustavson
Umeå University, Sweden
Fred G. Gustavson
Technical University of Denmark, Denmark
Jerzy Waśniewski
Universitat Politècnica de Catalunya, BarcelonaTech, Spain
José R. Herrero

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego 69, 42-201, Czestochowa, Poland
Roman Wyrzykowski & Konrad Karczewski
Electrical Engineering and Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, 37996-3450, Knoxville, TN, USA
Jack Dongarra
Department of Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, 2800, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Gustavson, F.G., Waśniewski, J., Herrero, J.R. (2012). New Level-3 BLAS Kernels for Cholesky Factorization. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31464-3_7

DOI: https://doi.org/10.1007/978-3-642-31464-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31463-6
Online ISBN: 978-3-642-31464-3
eBook Packages: Computer Science (R0)
