DOI: 10.5555/3571885.3571996
research-article

Solving linear systems on a GPU with hierarchically off-diagonal low-rank approximations

Published: 18 November 2022

Abstract

We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of many large sparse matrices. The coefficient matrices are often data-sparse in the sense that their off-diagonal blocks have low numerical ranks; specifically, we focus on "hierarchically off-diagonal low-rank (HODLR)" matrices. We introduce algorithms for factorizing HODLR matrices and for applying the factorizations on a GPU. The algorithms leverage the efficiency of batched dense linear algebra, and they scale nearly linearly with the matrix size when the numerical ranks are fixed. The accuracy of the HODLR-matrix approximation is a tunable parameter, so the same approach yields either high-accuracy fast direct solvers or low-accuracy robust preconditioners. Numerical results show that we can solve problems with several million unknowns in a couple of seconds on a single GPU.
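
To make the HODLR structure concrete, the following is a minimal CPU sketch in Python with NumPy. It is not the batched GPU factorization the abstract describes: it compresses each off-diagonal block with a truncated SVD and solves recursively with the Woodbury identity, which is one standard way to exploit the structure. The function names (`compress`, `build_hodlr`, `solve_hodlr`, `demo`), the Gaussian test kernel, and the default parameters are all assumptions chosen for illustration.

```python
import numpy as np


def compress(block, tol):
    """Truncated SVD: return (u, v) with block ~= u @ v.T to relative tolerance tol."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return u[:, :rank] * s[:rank], vt[:rank].T


def build_hodlr(a, leaf_size, tol):
    """Recursively split a into dense leaves and low-rank off-diagonal blocks."""
    n = a.shape[0]
    if n <= leaf_size:
        return {"dense": a.copy()}
    m = n // 2
    return {
        "m": m,
        "upper": compress(a[:m, m:], tol),  # A12 ~= u1 @ v1.T
        "lower": compress(a[m:, :m], tol),  # A21 ~= u2 @ v2.T
        "left": build_hodlr(a[:m, :m], leaf_size, tol),
        "right": build_hodlr(a[m:, m:], leaf_size, tol),
    }


def solve_hodlr(node, b):
    """Solve A x = b for b of shape (n, k) using the compressed tree."""
    if "dense" in node:
        return np.linalg.solve(node["dense"], b)
    m, k = node["m"], b.shape[1]
    u1, v1 = node["upper"]
    u2, v2 = node["lower"]
    r1, r2 = u1.shape[1], u2.shape[1]
    # Apply the inverse of each diagonal block to the right-hand side and
    # to the low-rank basis in a single recursive call per block.
    top = solve_hodlr(node["left"], np.hstack([b[:m], u1]))
    bot = solve_hodlr(node["right"], np.hstack([b[m:], u2]))
    y1, w1 = top[:, :k], top[:, k:]
    y2, w2 = bot[:, :k], bot[:, k:]
    # Woodbury identity: the correction needs only an (r1 + r2)-sized dense solve.
    small = np.block([[np.eye(r1), v1.T @ w2], [v2.T @ w1, np.eye(r2)]])
    z = np.linalg.solve(small, np.vstack([v1.T @ y2, v2.T @ y1]))
    return np.vstack([y1 - w1 @ z[:r1], y2 - w2 @ z[r1:]])


def demo(n=256, leaf_size=32, tol=1e-12, seed=0):
    """Relative error of the HODLR solve against a dense solve on a kernel matrix."""
    pts = np.linspace(0.0, 1.0, n)
    # Gaussian kernel plus identity: symmetric positive definite, low-rank off-diagonal blocks.
    a = np.exp(-(pts[:, None] - pts[None, :]) ** 2) + np.eye(n)
    b = np.random.default_rng(seed).standard_normal((n, 1))
    x_hodlr = solve_hodlr(build_hodlr(a, leaf_size, tol), b)
    x_dense = np.linalg.solve(a, b)
    return np.linalg.norm(x_hodlr - x_dense) / np.linalg.norm(x_dense)
```

The `tol` argument plays the role of the tunable accuracy mentioned above: a tight tolerance gives a solve that can stand in for a direct solver, while a loose one gives smaller ranks and a cheaper approximate inverse that could serve as a preconditioner. This recursion solves one system at a time and does not store a reusable factorization, so its cost and parallel behavior should not be read as representative of the GPU algorithms.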

Supplementary Material

MP4 File (SC22_Presentation_Chen_Chao.mp4)
Presentation at SC '22

Published In

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2022
1277 pages
ISBN: 9784665454445

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Author Tags

  1. LU factorization
  2. batched dense linear algebra
  3. boundary integral equation
  4. elliptic partial differential equations
  5. hierarchical low-rank approximation
  6. hierarchical matrix
  7. kernel matrix
  8. linear solver on GPU
  9. rank structured matrix

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
