DOI: 10.1145/3149457.3149472

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

Published: 28 January 2018

Abstract

Given a sparse matrix A, the selected inversion algorithm is an efficient method for computing certain selected elements of A^{-1}. These selected elements correspond to all or some nonzero elements of the LU factors of A. In many ways, the types of matrix updates performed in the selected inversion algorithm are similar to those performed in the LU factorization, although the sequence of operations is different.
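To make the idea of "selected elements" concrete, the following is a minimal sketch for the symmetric case, not the implementation described in this paper. It assumes a small dense symmetric matrix with an LDL^T factorization that needs no pivoting, and it uses the standard recurrence Z = D^{-1} L^{-1} + (I - L^T) Z for Z = A^{-1}, evaluated only on the nonzero pattern of L. The function names `ldlt`, `symbolic_structure`, and `selected_inverse` are hypothetical and chosen for illustration.

```python
def ldlt(A):
    """Dense LDL^T factorization without pivoting (assumes nonzero pivots)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D


def symbolic_structure(A):
    """Row indices below the diagonal of each column of L, including fill-in."""
    n = len(A)
    cols = [{i for i in range(j + 1, n) if A[i][j] != 0} for j in range(n)]
    for j in range(n):
        rows = sorted(cols[j])
        # Eliminating column j couples every pair of rows in its structure.
        for a in range(len(rows)):
            for b in range(a + 1, len(rows)):
                cols[rows[a]].add(rows[b])
    return [sorted(c) for c in cols]


def selected_inverse(A):
    """Entries of A^{-1} on the lower-triangular nonzero pattern of L."""
    L, D = ldlt(A)
    cols = symbolic_structure(A)
    n = len(A)
    Z = {}  # maps (row, col) with row >= col to the entry of A^{-1}
    for j in reversed(range(n)):
        struct = cols[j]
        for i in struct:
            # Symmetry lets us read Z[i, k] from the stored lower triangle.
            Z[(i, j)] = -sum(Z[(max(i, k), min(i, k))] * L[k][j] for k in struct)
        Z[(j, j)] = 1.0 / D[j] - sum(L[k][j] * Z[(k, j)] for k in struct)
    return Z


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    for key, value in sorted(selected_inverse(A).items()):
        print(key, value)
```

For the tridiagonal matrix in the usage example, L has no fill-in, so the entry (2, 0) of the inverse is never computed even though it is nonzero. Skipping such entries is the point of selected inversion.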
In the context of LU factorization, it is known that the left-looking and right-looking algorithms exhibit different memory access and data communication patterns, and hence behave differently on shared memory and distributed memory parallel machines. Corresponding to right-looking and left-looking LU factorization, the selected inversion algorithm can be organized as a left-looking or a right-looking algorithm. The parallel right-looking version of the algorithm has been developed in [9]. The sequence of operations performed in this version of the selected inversion algorithm is similar to that performed in a left-looking LU factorization algorithm.
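The difference between the two orderings can be seen in a dense LU factorization without pivoting. The sketch below is illustrative only: it assumes a small dense matrix with nonzero pivots, and the function names are hypothetical. The right-looking variant pushes each finished column's updates into the trailing submatrix immediately, while the left-looking variant delays all updates to a column until that column is about to be finalized. Both produce the same factors, but they read and write memory in a different order.

```python
def lu_right_looking(A):
    """Eager updates: finish column k, then update everything to its right."""
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            M[i][k] /= M[k][k]                 # column k of L
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]   # write to the trailing submatrix
    return M


def lu_left_looking(A):
    """Lazy updates: gather contributions from columns to the left of j."""
    n = len(A)
    M = [row[:] for row in A]
    for j in range(n):
        for k in range(j):
            for i in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j]   # read finished column k, write only column j
        for i in range(j + 1, n):
            M[i][j] /= M[j][j]                 # column j of L
    return M


if __name__ == "__main__":
    A = [[4.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 4.0, 1.0]]
    print(lu_right_looking(A))
    print(lu_left_looking(A))
```

Each function returns L (strictly below the diagonal, with an implicit unit diagonal) and U (on and above the diagonal) packed into one matrix.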
In this paper, we describe the left-looking variant of the selected inversion algorithm, and present an efficient implementation of the algorithm for shared memory machines using a task parallel method. We demonstrate that with the task scheduling features provided by OpenMP 4.0, the left-looking selected inversion algorithm can scale well both on the Intel Haswell multicore architecture and on the Intel Knights Landing (KNL) manycore architecture up to 16 and 64 cores, respectively. On the KNL architecture, we observe that the maximum parallel efficiency achieved by the left-looking selected inversion algorithm can be as high as 62% even when all 64 cores are used, despite the inherently asynchronous nature of the computation and communication patterns in sparse matrix operations. Compared to the right-looking selected inversion algorithm, the left-looking formulation facilitates efficient pipelining of operations along different branches of the elimination tree, and can be a promising candidate for future development of massively parallel selected inversion algorithms on heterogeneous architectures.
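The idea of dependency-driven tasks on an elimination tree can be sketched as follows. This is a rough illustration, not the paper's scheduler. It uses a Python thread pool as a stand-in for OpenMP tasks, and it assumes, purely for illustration, that each tree node depends only on its parent. The dependency graph in the actual algorithm may be richer. The tree, the `work` callback, and the function name `run_top_down` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_top_down(parent, work, max_workers=4):
    """Run work(node) on every node; a node is submitted only after its parent finished.

    parent maps each node to its parent, with None marking a root.
    Returns the nodes in the order in which their tasks were observed to finish.
    """
    children = {node: [] for node in parent}
    roots = []
    for node, p in parent.items():
        if p is None:
            roots.append(node)
        else:
            children[p].append(node)

    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(work, r): r for r in roots}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                node = pending.pop(fut)
                fut.result()  # re-raise any exception from the task
                finished.append(node)
                # Children on different branches become runnable independently.
                for child in children[node]:
                    pending[pool.submit(work, child)] = child
    return finished


if __name__ == "__main__":
    # Hypothetical elimination tree with two branches below the root.
    tree = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}
    print(run_top_down(tree, work=lambda node: node))
```

Because a child is released as soon as its own parent completes, work on one branch does not wait for unrelated work on another branch. In OpenMP 4.0, a comparable ordering can be expressed with the `depend` clause on `task` constructs.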

References

[1]
C. Bekas, A. Curioni, and I. Fedulova. 2009. Low cost high performance uncertainty quantification. In Proc. 2nd Workshop on High Performance Computational Finance. 8.
[2]
T. A. Davis and Y. Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Software 38 (2011), 1.
[3]
I. S. Duff and J. K. Reid. 1983. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Trans. Math. Software 9 (1983), 302--325.
[4]
S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. 1981. Algorithms and data structures for sparse symmetric Gaussian elimination. SIAM J. Sci. Stat. Comput. 2, 2 (1981), 225--237.
[5]
A. George, M. T. Heath, J. Liu, and E. Ng. 1988. Sparse Cholesky factorization on a local-memory multiprocessor. SIAM J. Sci. Stat. Comput. 9 (1988), 327--340.
[6]
P. Hohenberg and W. Kohn. 1964. Inhomogeneous electron gas. Phys. Rev. 136 (1964), B864--B871.
[7]
W. Hu, L. Lin, and C. Yang. 2015. DGDFT: A massively parallel method for large scale density functional theory calculations. J. Chem. Phys. 143 (2015), 124110.
[8]
M. Jacquelin, L. Lin, N. Wichmann, and C. Yang. 2015. Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication. submitted (2015).
[9]
M. Jacquelin, L. Lin, and C. Yang. 2015. PSelInv--A distributed memory parallel algorithm for selected inversion: the symmetric case. ACM Trans. Math. Software accepted (2015).
[10]
W. Kohn and L. Sham. 1965. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140 (1965), A1133--A1138.
[11]
Gabriel Kotliar, Sergej Y Savrasov, Kristjan Haule, Viktor S Oudovenko, O Parcollet, and CA Marianetti. 2006. Electronic structure calculations with dynamical mean-field theory. Rev. Mod. Phys. 78 (2006), 865--952.
[12]
S. Li, S. Ahmed, G. Klimeck, and E. Darve. 2008. Computing entries of the inverse of a sparse matrix using the FIND algorithm. J. Comput. Phys. 227 (2008), 9408--9427.
[13]
S. Li, W. Wu, and E. Darve. 2013. A fast algorithm for sparse matrix computations related to inversion. J. Comput. Phys. 242 (2013), 915--945.
[14]
X. S. Li and J. W. Demmel. 2003. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans. Math. Software 29 (2003), 110.
[15]
L. Lin, M. Chen, C. Yang, and L. He. 2013. Accelerating Atomic Orbital-based Electronic Structure Calculation via Pole Expansion and Selected Inversion. J. Phys. Condens. Matter 25 (2013), 295501.
[16]
L. Lin, J. Lu, L. Ying, R. Car, and W. E. 2009. Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems. Comm. Math. Sci. 7 (2009), 755.
[17]
L. Lin, J. Lu, L. Ying, and W. E. 2012. Adaptive local basis set for Kohn-Sham density functional theory in a discontinuous Galerkin framework I: Total energy calculation. J. Comput. Phys. 231 (2012), 2140--2154.
[18]
L. Lin, C. Yang, J. Lu, L. Ying, and W. E. 2011. A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations. SIAM J. Sci. Comput. 33 (2011), 1329.
[19]
L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, and W. E. 2011. SelInv -- An algorithm for selected inversion of a sparse symmetric matrix. ACM Trans. Math. Software 37 (2011), 40.
[20]
J. Liu. 1990. The role of elimination trees in sparse factorization. SIAM J. Matrix Anal. Appl. 11 (1990), 134.
[21]
National Energy Research Scientific Computing Center (NERSC). 2016. http://www.nersc.gov/users/computational-systems/cori/cori-phase-i. (March 2016).
[22]
E. Ng and B. Peyton. 1993. Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM J. Sci. Comput. 14 (1993), 1034.
[23]
E. Rothberg and A. Gupta. 1993. An evaluation of left-looking, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical-memory machines. Int. J. High Performance Comput. 5 (1993), 537--593.
[24]
J. M. Soler, E. Artacho, J. D. Gale, A. García, J. Junquera, P. Ordejón, and D. Sánchez-Portal. 2002. The SIESTA method for ab initio order-N materials simulation. J. Phys.: Condens. Matter 14 (2002), 2745--2779.
[25]
J. M. Tang and Y. Saad. 2012. A probing method for computing the diagonal of a matrix inverse. Numer. Lin. Alg. Appl. 19 (2012), 485--501.
[26]
I. Yamazaki and X. S. Li. 2012. New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems. In Int. Parallel Distrib. Proc. Symp. 2012. 619--630.

Published In

HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
January 2018
322 pages
ISBN:9781450353724
DOI:10.1145/3149457

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 January 2018

Author Tags

  1. high performance computation
  2. manycore
  3. multicore
  4. openmp
  5. parallel algorithm
  6. scheduling
  7. selected inversion
  8. shared memory
  9. task

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPC Asia 2018

Acceptance Rates

HPCAsia '18 Paper Acceptance Rate: 30 of 67 submissions, 45%
Overall Acceptance Rate: 69 of 143 submissions, 48%
