DOI: 10.1145/1513895.1513904
research-article

QR decomposition on GPUs

Published: 08 March 2009

Abstract

QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and an upper triangular matrix R. Adaptive systems commonly employ QR decomposition to solve overdetermined least squares problems. The performance of QR decomposition is typically the crucial factor limiting problem sizes.
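As a rough illustration of the least squares use case mentioned above, the following pure-Python sketch solves an overdetermined system A x ≈ b with Householder reflections. It is a CPU-only teaching example and is not the GPU implementation the paper describes; the function name `householder_qr_solve` is hypothetical, and the sketch assumes a real-valued matrix with at least as many rows as columns and full column rank.

```python
import math

def householder_qr_solve(A, b):
    """Least-squares solution of A x ~= b via Householder QR.

    A is a list of m rows with n entries each (m >= n, full column rank);
    b is a list of m values. Q is never formed: each reflection is applied
    directly to the working copy of A and to b.
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]  # working copy, reduced to upper triangular form
    y = b[:]                   # becomes Q^T b
    for k in range(n):
        # Norm of the part of column k on and below the diagonal.
        norm_x = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm_x == 0.0:
            raise ValueError("matrix is rank deficient")
        # Choose the sign that avoids cancellation when forming v.
        alpha = -math.copysign(norm_x, R[k][k])
        v = [R[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(vi * vi for vi in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns of R.
        for j in range(k, n):
            s = 2.0 * sum(v[i - k] * R[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                R[i][j] -= s * v[i - k]
        # Apply the same reflection to the right-hand side.
        s = 2.0 * sum(v[i - k] * y[i] for i in range(k, m)) / vtv
        for i in range(k, m):
            y[i] -= s * v[i - k]
    # Back substitution on the leading n x n upper triangular block.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x

# Fit a line c0 + c1 * t to four points (t, value).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 2.0, 2.0, 3.0]
coeffs = householder_qr_solve(A, b)
print(coeffs)  # close to [1.1, 0.6]
```

Applying each reflection to b as it is generated avoids storing Q, which is a common choice when only the least squares solution is needed.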
Graphics Processing Units (GPUs) are high-performance processors capable of executing hundreds of floating point operations in parallel. As commodity accelerators for 3D graphics, GPUs offer tremendous computational performance at relatively low cost. While GPUs are well suited to applications with abundant inherent parallelism that require coarse-grained synchronization between processors, methods for efficiently computing QR decomposition on GPUs remain elusive.
In this paper, we discuss the architectural characteristics of GPUs and explain how a high-performance implementation of QR decomposition can be constructed. We provide a detailed performance analysis of the resulting implementation for real-valued matrices and offer recommendations on achieving high performance to future developers of dense linear algebra procedures for GPUs. Our implementation sustains 143 GFLOP/s, and we believe this is the fastest announced QR implementation executing entirely on the GPU.
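To make a sustained-throughput figure such as the one above concrete, the sketch below converts a measured run time into GFLOP/s. It assumes the standard operation count for Householder QR of an m x n matrix (m >= n) when only R is formed, 2n²(m − n/3); the abstract does not state which counting convention the paper uses, so both the formula choice and the function names are assumptions, and the timing in the usage line is made up for illustration.

```python
def householder_qr_flops(m, n):
    """Approximate flop count 2 * n^2 * (m - n/3) for an m x n matrix, m >= n."""
    if m < n:
        raise ValueError("expected m >= n")
    return 2.0 * n * n * (m - n / 3.0)

def gflops(m, n, seconds):
    """Sustained throughput in GFLOP/s for one factorization taking `seconds`."""
    return householder_qr_flops(m, n) / seconds / 1e9

# Hypothetical measurement: a 3000 x 3000 factorization timed at 1.0 s.
print(gflops(3000, 3000, 1.0))
```

For square matrices the count reduces to (4/3)n³, so throughput figures computed this way grow with the cube of the matrix dimension divided by the run time.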



Published In

GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
March 2009
107 pages
ISBN: 9781605585178
DOI: 10.1145/1513895
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

GPGPU '09

Acceptance Rates

Overall Acceptance Rate 57 of 129 submissions, 44%

Article Metrics

  • Downloads (Last 12 months): 30
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 23 Dec 2024

Cited By

  • (2023) Optimal Linear Subspace Search: Learning to Construct Fast and High-Quality Schedulers for Diffusion Models. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 463-472. DOI: 10.1145/3583780.3614999. Online publication date: 21-Oct-2023
  • (2023) eGPU: A 750 MHz Class Soft GPGPU for FPGA. 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL), pp. 277-282. DOI: 10.1109/FPL60245.2023.00047. Online publication date: 4-Sep-2023
  • (2022) High-Speed Depth-Normal Measurement and Fusion Based on Multiband Sensing and Block Parallelization. Journal of Robotics and Mechatronics 34:5, pp. 1111-1121. DOI: 10.20965/jrm.2022.p1111. Online publication date: 20-Oct-2022
  • (2022) Evaluating the Performance Acceleration of Generalized Linear Solver using Normal Equation on Three Architectures for Tall Skinny Datasets. 2022 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 134-139. DOI: 10.1109/CSCI58124.2022.00028. Online publication date: Dec-2022
  • (2021) Acceleration of Parallel-Blocked QR Decomposition of Tall-and-Skinny Matrices on FPGAs. ACM Transactions on Architecture and Code Optimization 18:3, pp. 1-25. DOI: 10.1145/3447775. Online publication date: 10-May-2021
  • (2021) System on chip implementation of floating point matrix inversion using modified Gram-Schmidt based QR decomposition on PYNQ FPGA. 2021 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), pp. 84-88. DOI: 10.1109/iSES52644.2021.00030. Online publication date: Dec-2021
  • (2021) Continuous-Time Varying Complex QR Decomposition via Zeroing Neural Dynamics. Neural Processing Letters. DOI: 10.1007/s11063-021-10566-y. Online publication date: 24-Jun-2021
  • (2020) MemFlow: Memory-Driven Data Scheduling With Datapath Co-Design in Accelerators for Large-Scale Inference Applications. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:9, pp. 1875-1888. DOI: 10.1109/TCAD.2019.2925377. Online publication date: Sep-2020
  • (2020) An Architecture for Solving the Eigenvalue Problem on Embedded FPGAs. Architecture of Computing Systems – ARCS 2020, pp. 32-43. DOI: 10.1007/978-3-030-52794-5_3. Online publication date: 9-Jul-2020
  • (2019) Accelerating Solution of Generalized Linear Models by Solving Normal Equation Using GPGPU on a Large Real-World Tall-Skinny Data Set. 2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 112-119. DOI: 10.1109/SBAC-PAD.2019.00029. Online publication date: Oct-2019
