A Feature-complete SPIKE Dense Banded Solver

Published: 16 October 2020

Abstract

This article presents a parallel, effective, and feature-complete recursive SPIKE algorithm that achieves near feature-parity with the standard linear algebra package banded linear system solver. First, we present a flexible parallel implementation of the recursive SPIKE scheme that aims to remove its original limitation, namely that the number of cores/processors be a power of two. Second, we develop a transpose solve option for SPIKE, a standard requirement of most numerical solver libraries. Finally, we present a pivoting recursive SPIKE strategy as an alternative to the non-pivoting scheme to improve numerical stability. Together, these enhancements lead to the release of a new black-box, feature-complete SPIKE-OpenMP package that significantly improves upon the performance and scalability obtained with other state-of-the-art banded solvers.
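To make the partitioning idea behind SPIKE concrete, the following is a minimal sketch in Python with NumPy. It is not the recursive, pivoting, or OpenMP implementation described in the abstract: it uses dense arithmetic, solves each partition serially, and solves the reduced interface system directly rather than recursively. The function name `spike_solve` and its parameters (`k` for the half-bandwidth, `p` for the number of partitions) are hypothetical names chosen for illustration. In this non-recursive form the partition count `p` is arbitrary, as long as every partition has at least `2k` rows.

```python
import numpy as np

def spike_solve(A, f, k, p):
    """Illustrative basic SPIKE solve of A x = f (dense storage, serial).

    A : (n, n) banded matrix with k sub- and k super-diagonals
    f : (n,) right-hand side
    p : number of diagonal partitions (any positive integer here)
    """
    n = A.shape[0]
    bounds = np.linspace(0, n, p + 1).astype(int)
    assert np.all(np.diff(bounds) >= 2 * k), "each partition needs >= 2k rows"

    # Step 1: per-partition solves. These are independent of each other and
    # are the part a parallel code would distribute across threads.
    V, W, g = [], [], []
    for j in range(p):
        s, e = bounds[j], bounds[j + 1]
        B = A[s:e, e:e + k] if j < p - 1 else np.zeros((e - s, k))  # coupling to next
        C = A[s:e, s - k:s] if j > 0 else np.zeros((e - s, k))      # coupling to previous
        sol = np.linalg.solve(A[s:e, s:e], np.column_stack([B, C, f[s:e]]))
        V.append(sol[:, :k])         # right spike
        W.append(sol[:, k:2 * k])    # left spike
        g.append(sol[:, 2 * k])      # modified right-hand side

    # Step 2: reduced system in the top-k and bottom-k unknowns of each partition.
    R = np.eye(2 * k * p)
    r = np.empty(2 * k * p)
    for j in range(p):
        top = slice(2 * k * j, 2 * k * j + k)
        bot = slice(2 * k * j + k, 2 * k * (j + 1))
        r[top], r[bot] = g[j][:k], g[j][-k:]
        if j < p - 1:
            nxt_top = slice(2 * k * (j + 1), 2 * k * (j + 1) + k)
            R[top, nxt_top], R[bot, nxt_top] = V[j][:k], V[j][-k:]
        if j > 0:
            prv_bot = slice(2 * k * (j - 1) + k, 2 * k * j)
            R[top, prv_bot], R[bot, prv_bot] = W[j][:k], W[j][-k:]
    y = np.linalg.solve(R, r)

    # Step 3: retrieve the full solution of each partition from the interface values.
    x = np.empty(n)
    for j in range(p):
        xj = g[j].copy()
        if j < p - 1:
            xj -= V[j] @ y[2 * k * (j + 1): 2 * k * (j + 1) + k]
        if j > 0:
            xj -= W[j] @ y[2 * k * (j - 1) + k: 2 * k * j]
        x[bounds[j]:bounds[j + 1]] = xj
    return x

# Usage: a diagonally dominant banded test matrix, three partitions.
rng = np.random.default_rng(0)
n, k = 35, 2
A = np.triu(np.tril(rng.standard_normal((n, n)), k), -k) + 10.0 * np.eye(n)
f = rng.standard_normal(n)
x = spike_solve(A, f, k, p=3)
print("max residual:", np.max(np.abs(A @ x - f)))
```

Calling `spike_solve(A.T, f, k, p)` solves the transposed system in this dense sketch, but only by redoing all of the per-partition work on the transposed matrix; it is a stand-in for, not an illustration of, the dedicated transpose option the abstract describes.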



Published In

ACM Transactions on Mathematical Software, Volume 46, Issue 4
December 2020
272 pages
ISSN:0098-3500
EISSN:1557-7295
DOI:10.1145/3430683

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2020
Accepted: 01 July 2020
Revised: 01 July 2020
Received: 01 November 2018
Published in TOMS Volume 46, Issue 4


Author Tags

  1. SPIKE
  2. banded matrices
  3. linear system solver

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • NSF
