Abstract
We study the use of massively parallel architectures for computing a matrix inverse. Two different algorithms are reviewed, the traditional approach based on Gaussian elimination and the Gauss–Jordan elimination alternative, and several high performance implementations are presented and evaluated. The target architecture is a current general-purpose multicore processor (CPU) connected to a graphics processor (GPU). Numerical experiments show the efficiency attained by the proposed implementations and how the computation of large-scale inverses, which only a few years ago would have required a distributed-memory cluster, takes only a few minutes on a hybrid architecture formed by a multicore CPU and a GPU.
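The Gauss–Jordan alternative mentioned above inverts a matrix by reducing the augmented system [A | I] until the left half becomes the identity, at which point the right half holds A⁻¹. As a minimal sequential sketch (the paper's implementations target hybrid CPU–GPU platforms and use blocked variants; this unblocked Python/NumPy version and the function name are illustrative only):

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert a square matrix via Gauss-Jordan elimination with
    partial pivoting, applied to the augmented system [A | I].
    Illustrative unblocked sketch, not the paper's GPU code."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # augmented matrix [A | I]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to row k.
        p = k + np.argmax(np.abs(M[k:, k]))
        if M[p, k] == 0:
            raise np.linalg.LinAlgError("matrix is singular")
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]              # normalize the pivot row
        for i in range(n):           # eliminate column k in all other rows
            if i != k:
                M[i] -= M[i, k] * M[k]
    return M[:, n:]                  # right half now holds the inverse

A = np.array([[4.0, 3.0], [6.0, 3.0]])
Ainv = gauss_jordan_inverse(A)
print(np.allclose(A @ Ainv, np.eye(2)))  # True
```

Unlike inversion via Gaussian elimination (LU factorization followed by triangular solves), Gauss–Jordan performs roughly the same arithmetic in a single sweep with a uniform per-iteration workload, which is one reason it maps well onto massively parallel hardware.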
Cite this article
Ezzatti, P., Quintana-Ortí, E.S. & Remón, A. Using graphics processors to accelerate the computation of the matrix inverse. J Supercomput 58, 429–437 (2011). https://doi.org/10.1007/s11227-011-0606-4