Abstract
In [5], [6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Recursion gives a natural way to choose the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine of the main (hybrid recursive) routine RGEQRF for QR factorization of a general m×n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust, high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5], [6]. In particular, the new version is optimized in a number of ways to improve performance, e.g., for small matrices and for matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10–65% for square matrices and by 10–100% for tall and thin matrices on the IBM POWER2 and POWER3 nodes. The tests covered matrix sizes ranging from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
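The recursive idea summarized above can be illustrated with a minimal NumPy sketch. It is not the ESSL or RGEQR3 code: the function names `house` and `recursive_qr` are hypothetical, the sketch assumes a real m×n matrix with m ≥ n, it recurses all the way down to a single column (no mini blocking), and it works out of place for readability. It splits the columns in half, factors the left half, updates the right half, factors the trailing block, and merges the two triangular factors into one compact WY form Q = I − Y T Yᵀ.

```python
import numpy as np

def house(x):
    """Return (v, tau, beta) with (I - tau v v^T) x = beta e_1 and v[0] = 1."""
    x = np.asarray(x, dtype=float)
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = x.copy()
        v[0] = 1.0
        return v, 0.0, 0.0            # H = I, nothing to annihilate
    beta = -np.copysign(normx, x[0])  # sign chosen to avoid cancellation
    v = x / (x[0] - beta)
    v[0] = 1.0
    tau = (beta - x[0]) / beta
    return v, tau, beta

def recursive_qr(A):
    """Recursive QR sketch (assumes m >= n). Returns Y, T, R such that
    Q = I - Y T Y^T is orthogonal and A = Q[:, :n] @ R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    if n == 1:
        v, tau, beta = house(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])
    n1 = n // 2
    # Factor the left half of the columns.
    Y1, T1, R11 = recursive_qr(A[:, :n1])
    # Update the right half: B <- Q1^T B, with Q1^T = I - Y1 T1^T Y1^T.
    B = A[:, n1:]
    B = B - Y1 @ (T1.T @ (Y1.T @ B))
    R12 = B[:n1, :]
    # Factor the trailing block and pad its Householder vectors with zeros.
    Y2s, T2, R22 = recursive_qr(B[n1:, :])
    Y2 = np.vstack([np.zeros((n1, n - n1)), Y2s])
    # Merge the two compact WY factors: T = [[T1, T3], [0, T2]].
    T3 = -T1 @ (Y1.T @ Y2) @ T2
    zeros = np.zeros((n - n1, n1))
    Y = np.hstack([Y1, Y2])
    T = np.block([[T1, T3], [zeros, T2]])
    R = np.block([[R11, R12], [zeros, R22]])
    return Y, T, R
```

A quick check on a tall matrix: form `Q = np.eye(m) - Y @ T @ Y.T` and confirm that `Q` is orthogonal and that `Q[:, :n] @ R` reproduces `A`. A production routine would work in place, call Level 3 BLAS for the updates, and stop the recursion at a small column count rather than at a single column.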
References
[1] R.C. Agarwal, F.G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM J. Res. Develop., 38(5):563–576, September 1994.
[2] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide, Release 2.0. SIAM, Philadelphia, 1994.
[3] C. Bischof and C. Van Loan. The WY representation for products of Householder matrices. SIAM J. Scientific and Statistical Computing, 8(1):s2–s13, 1987.
[4] A. Chalmers and J. Tidmus. Practical Parallel Processing. International Thomson Computer Press, UK, 1996.
[5] E. Elmroth and F. Gustavson. Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance. IBM Journal of Research and Development, Vol. 44, No. 4, pp. 605–624, 2000.
[6] E. Elmroth and F. Gustavson. New serial and parallel recursive QR factorization algorithms for SMP systems. In B. Kågström et al., editors, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, No. 1541, pages 120–128, 1998.
[7] F. Gustavson. Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms. IBM Journal of Research and Development, Vol. 41, No. 6, 1997.
[8] F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström, and P. Ling. Superscalar GEMM-based Level 3 BLAS: The On-going Evolution of a Portable and High-Performance Library. In B. Kågström et al., editors, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, Vol. 1541, pages 207–215, Springer-Verlag, 1998.
[9] F. Gustavson and I. Jonsson. High Performance Cholesky Factorization via Blocking and Recursion that uses Minimal Storage. This Proceedings.
[10] R. Schreiber and C. Van Loan. A storage efficient WY representation for products of Householder transformations. SIAM J. Scientific and Statistical Computing, 10(1):53–57, 1989.
[11] S. Toledo. Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18(4), 1997.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elmroth, E., Gustavson, F. (2001). High-Performance Library Software for QR Factorization. In: Sørevik, T., Manne, F., Gebremedhin, A.H., Moe, R. (eds) Applied Parallel Computing. New Paradigms for HPC in Industry and Academia. PARA 2000. Lecture Notes in Computer Science, vol 1947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70734-4_9
DOI: https://doi.org/10.1007/3-540-70734-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41729-3
Online ISBN: 978-3-540-70734-9
eBook Packages: Springer Book Archive