DOI: 10.1145/2063384.2063393

Tiled QR factorization algorithms

Published: 12 November 2011

Abstract

This work revisits existing algorithms for the QR factorization of rectangular matrices composed of $p \times q$ tiles, where $p \ge q$. Within this framework, we study the critical paths and performance of algorithms such as Sameh-Kuck, Fibonacci, Greedy, and those found within PLASMA. Although neither Fibonacci nor Greedy is optimal, both are shown to be asymptotically optimal for all matrices of size $p = q^2 f(q)$, where $f$ is any function such that $\lim_{+\infty} f = 0$. This novel and important complexity result applies to all matrices where $p$ and $q$ are proportional, $p = \lambda q$, with $\lambda \ge 1$, thereby encompassing many situations that are important in practice (least squares). We provide an extensive set of experiments that show the superiority of the new algorithms for tall matrices.
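
To see why the proportional case falls under the asymptotic-optimality condition, one can check it directly in the abstract's notation. The choice $f(q) = \lambda / q$ below is our own illustration, not a formula quoted from the paper:

```latex
\[
p \;=\; \lambda q \;=\; q^{2} \cdot \frac{\lambda}{q} \;=\; q^{2} f(q),
\qquad f(q) = \frac{\lambda}{q},
\qquad \lim_{q \to +\infty} f(q) = 0 .
\]
```

For any fixed $\lambda \ge 1$, this $f$ tends to zero as $q$ grows, so $p = \lambda q$ is a special case of $p = q^2 f(q)$.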


Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN: 9781450307710
DOI: 10.1145/2063384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. QR factorization
  2. critical path
  3. greedy algorithms
  4. tall matrix

Qualifiers

  • Research-article

Conference

SC '11
Acceptance Rates

SC '11 Paper Acceptance Rate 74 of 352 submissions, 21%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 16
  • Downloads (last 6 weeks): 0

Reflects downloads up to 12 Jan 2025.

Cited By

  • (2024) Cooperative Localization and Mapping Based on UWB/IMU Fusion Using Factor Graphs. IEEE Sensors Journal, 24:14 (21931-21940). DOI: 10.1109/JSEN.2023.3316278. Online publication date: 15-Jul-2024
  • (2019) Least squares solvers for distributed-memory machines with GPU accelerators. Proceedings of the ACM International Conference on Supercomputing (117-126). DOI: 10.1145/3330345.3330356. Online publication date: 26-Jun-2019
  • (2019) Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems. 2019 IEEE International Conference on Cluster Computing (CLUSTER) (1-12). DOI: 10.1109/CLUSTER.2019.8891024. Online publication date: Sep-2019
  • (2018) Hierarchical QR factorization algorithms for multi-core clusters. Parallel Computing, 39:4-5 (212-232). DOI: 10.1016/j.parco.2013.01.003. Online publication date: 31-Dec-2018
  • (2017) Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (668-677). DOI: 10.1109/IPDPS.2017.46. Online publication date: May-2017
  • (2016) Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems. ACM Transactions on Mathematical Software, 43:2 (1-22). DOI: 10.1145/2898348. Online publication date: 16-Aug-2016
  • (2015) MN-MATE. ACM Journal on Emerging Technologies in Computing Systems, 12:1 (1-25). DOI: 10.1145/2701429. Online publication date: 3-Aug-2015
  • (2014) Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC. Euro-Par 2013: Parallel Processing Workshops (657-667). DOI: 10.1007/978-3-642-54420-0_64. Online publication date: 2014
  • (2013) Tiled QR Decomposition and Its Optimization on CPU and GPU Computing System. Proceedings of the 2013 42nd International Conference on Parallel Processing (744-753). DOI: 10.1109/ICPP.2013.88. Online publication date: 1-Oct-2013
  • (2012) Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium (607-618). DOI: 10.1109/IPDPS.2012.62. Online publication date: 21-May-2012
