Abstract
This Mini-Symposium consisted of two back-to-back sessions, each comprising five presentations, held on the afternoon of Monday, June 21, 2004. A major theme of both sessions was novel data structures for the matrices of dense linear algebra (DLA). Talks one to four of session one all centered on new data layouts for matrices. The first three talks considered Cholesky factorization, contrasting a square block hybrid format and a recursive packed format with the two standard data structures of DLA, full format and packed format. The formats of talks one and two both led to level-3, high-performance implementations of Cholesky factorization while using exactly the amount of storage that standard packed format requires; full format, of course, requires twice the storage of the other three formats.

In talk one, John Reid presented a definitive study of Cholesky factorization using a standard block-based iterative Cholesky factorization [1]. This factorization is typical of LAPACK-style factorizations; the major difference in [1] is its data structure: square blocks of order NB represent a lower (or upper) triangular matrix. In talk two, Jerzy Waśniewski presented the recursive packed format and its related Cholesky factorization algorithm [2]. This novel format gave especially good Cholesky performance for very large matrices. In talk three, Jerzy Waśniewski demonstrated a detailed tuning strategy for the method of talk one and presented performance results on six important platforms: Alpha, IBM, Intel, Itanium, SGI, and Sun. The performance runs covered the algorithms of talks one and two as well as LAPACK's full and packed Cholesky codes [3]. Overall, the square block hybrid method was best, but it was not a clear winner. The recursive method suffered because it did not combine blocking with recursion [4].

Talk four, presented by Fred Gustavson, had a different flavor. It described another novel data format in which two standard full-format arrays represent a triangular or symmetric matrix using total storage equal to that of standard packed format [5]. This format therefore retains the desirable property of standard full-format arrays: one can apply the standard level-3 BLAS [6], as well as some 125 full-format LAPACK symmetric/triangular routines, directly to it. New codes written for the new format are thus trivial to produce, as they consist mostly of calls to already existing codes.

The last talk of session one, by James Sexton, was on the massively parallel BlueGene/L supercomputer architecture [7].
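For readers unfamiliar with the storage formats being compared, the following is a minimal sketch in C of standard lower packed format, which stores only the n(n+1)/2 entries of the lower triangle, column by column, together with an unblocked Cholesky factorization operating directly on it. The indexing formula assumes 0-based, column-major lower packed storage as used by LAPACK's 'L' packed routines; the helper names are illustrative, not taken from the talks, and this is deliberately not the blocked, recursive, or hybrid algorithm of talks one to three.

```c
#include <stdio.h>
#include <math.h>

/* Index of A(i,j), i >= j, in 0-based column-major lower packed storage:
   column j is preceded by j longer columns of lengths n, n-1, ..., n-j+1. */
static size_t lp(int n, int i, int j) {
    return (size_t)i + (size_t)j * (2 * n - j - 1) / 2;
}

/* Unblocked right-looking Cholesky on packed storage: A = L * L^T.
   Returns 0 on success, or j+1 if the leading minor of order j+1 is
   not positive definite (mirroring LAPACK's info convention). */
static int chol_packed(int n, double *ap) {
    for (int j = 0; j < n; ++j) {
        double d = ap[lp(n, j, j)];
        if (d <= 0.0) return j + 1;
        d = sqrt(d);
        ap[lp(n, j, j)] = d;
        for (int i = j + 1; i < n; ++i)   /* scale column j of L */
            ap[lp(n, i, j)] /= d;
        for (int k = j + 1; k < n; ++k)   /* rank-1 update of trailing matrix */
            for (int i = k; i < n; ++i)
                ap[lp(n, i, k)] -= ap[lp(n, i, j)] * ap[lp(n, k, j)];
    }
    return 0;
}

int main(void) {
    /* 3x3 SPD matrix [[4,2,2],[2,5,3],[2,3,6]] stored lower packed:
       columns stacked as 4,2,2 | 5,3 | 6 -- six doubles vs. nine for full. */
    double ap[6] = {4, 2, 2, 5, 3, 6};
    int info = chol_packed(3, ap);
    printf("info = %d\nL (packed) =", info);
    for (int t = 0; t < 6; ++t) printf(" %.3f", ap[t]);
    printf("\n");  /* expected: 2 1 1 2 1 2 */
    return 0;
}
```

Because every access goes through the triangular index map, the inner loops stride non-uniformly through memory and resist blocking into level-3 BLAS calls; this is precisely the performance problem that the square block hybrid and recursive packed formats of talks one to three were designed to remove.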
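The storage claim of talk four can be checked with a short count. Take the even-order case (the partition shown here is one natural way to realize the claim; [5] gives the general construction, including odd n): split the lower triangle of order n into a leading lower triangle of order n/2, the square (n/2) × (n/2) block beneath it, and a trailing lower triangle of order n/2. The two triangles interlock into a single full (n/2) × (n/2 + 1) array, and the square block is itself a full array, so the total storage of the two full-format arrays is

\[
\frac{n}{2}\left(\frac{n}{2}+1\right) + \left(\frac{n}{2}\right)^{2}
  = \frac{n}{2}\left(\frac{n}{2}+1+\frac{n}{2}\right)
  = \frac{n(n+1)}{2},
\]

exactly the footprint of standard packed format, while every operation on the two pieces can be a call to the level-3 BLAS [6] or to an existing full-format LAPACK routine.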
References
Andersen, B.S., Gunnels, J.A., Gustavson, F.G., Reid, J.K., Waśniewski, J.: A Fully Portable High Performance Minimal Storage Hybrid Cholesky Algorithm. ACM TOMS 31(2), 201–227 (2005)
Andersen, B.S., Gustavson, F.G., Waśniewski, J.: A Recursive Formulation of Cholesky Factorization of a Matrix in Packed Storage. ACM TOMS 27(2), 214–244 (2001)
Anderson, E., Bai, Z., Bischof, C., Blackford, L.S., Demmel, J., et al.: LAPACK Users' Guide, 3rd edn. Society for Industrial and Applied Mathematics, Philadelphia (1999)
Gustavson, F.G., Jonsson, I.: Minimal Storage High Performance Cholesky via Blocking and Recursion. IBM Journal of Research and Development 44(6), 823–849 (2000)
Gunnels, J.A., Gustavson, F.G.: A New Array Format for Symmetric and Triangular Matrices. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 247–255. Springer, Heidelberg (2006)
Dongarra, J.J., Du Croz, J., Hammarling, S., Duff, I.S.: A Set of Level 3 Basic Linear Algebra Subprograms. ACM Transactions on Mathematical Software 16(1), 1–17 (1990)
Sexton, J.C.: The BlueGene/L Supercomputer Architecture. Slides on Para04 website
Scholtes, C.: A Method to Derive the Cache Performance of Irregular Applications on Machines with Direct Mapped Caches. Slides on Para04 website
Gunnels, J.A., Gustavson, F.G., Henry, G.M., van de Geijn, R.A.: A Family of High-Performance Matrix Multiplication Algorithms. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 256–265. Springer, Heidelberg (2006)
Gunnels, J.A., Gustavson, F.G., Pingali, K., Yotov, K.: A General Purpose Compiler that Obtains High Performance for Some Common Dense Linear Algebra Codes. Slides on Para04 website
Whaley, R.C., Petitet, A., Dongarra, J.J.: Automated Empirical Optimization of Software and the ATLAS Project. LAPACK Working Note (LAWN) #147, pp. 1–33 (September 2000)
Barnes, D.J., Hopkins, T.R.: Applying Software Testing Metrics to Lapack. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 228–236. Springer, Heidelberg (2006)
Drakenberg, N.P.: A Matrix-type for Performance Portability. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds.) PARA 2004. LNCS, vol. 3732, pp. 237–246. Springer, Heidelberg (2006)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Gustavson, F.G., Waśniewski, J. (2006). High Performance Linear Algebra Algorithms: An Introduction. In: Dongarra, J., Madsen, K., Waśniewski, J. (eds) Applied Parallel Computing. State of the Art in Scientific Computing. PARA 2004. Lecture Notes in Computer Science, vol 3732. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11558958_26
Print ISBN: 978-3-540-29067-4
Online ISBN: 978-3-540-33498-9