
Performance Analysis and Optimal Node-aware Communication for Enlarged Conjugate Gradient Methods

Published: 29 March 2023

Abstract

Krylov methods are a key approach to solving large sparse linear systems of equations, but they suffer from poor strong scalability on distributed-memory machines. This is due to high synchronization costs: many collective communication calls paired with a low computational workload. Enlarged Krylov methods address this issue by splitting the initial residual into multiple columns, yielding operations on block vectors and reducing the total number of iterations to convergence. In this article, we present a performance study of an enlarged Krylov method, Enlarged Conjugate Gradients (ECG), noting the impact of block vectors on parallel performance at scale. Most notably, we observe increased overhead in point-to-point communication as a result of denser messages in the sparse matrix-block vector multiplication kernel. We also present models that analyze the expected performance of ECG and motivate design decisions. Most importantly, we introduce a new point-to-point communication approach based on node-aware communication techniques that increases the efficiency of the method at scale.
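The residual splitting that the abstract describes can be illustrated with a minimal sketch. Assuming contiguous row partitions for simplicity (the paper's actual partitioning follows the parallel matrix distribution), an initial residual r of length n is split into an n-by-t block vector whose column j keeps only the entries of r belonging to partition j; the helper name `split_residual` is hypothetical, not from the paper:

```python
import numpy as np

def split_residual(r, t):
    """Split residual r into an n-by-t block vector: column j holds the
    entries of r on partition j and is zero elsewhere (contiguous
    partitions assumed for illustration)."""
    n = r.shape[0]
    R = np.zeros((n, t))
    bounds = np.linspace(0, n, t + 1, dtype=int)  # partition boundaries
    for j in range(t):
        R[bounds[j]:bounds[j + 1], j] = r[bounds[j]:bounds[j + 1]]
    return R

r = np.arange(1.0, 9.0)        # residual of length 8
R = split_residual(r, 2)       # two partitions -> 8x2 block vector
assert np.allclose(R.sum(axis=1), r)  # columns sum back to the residual
```

Subsequent sparse matrix-block vector products A @ R then exchange t values per shared row instead of one, which is the source of the denser point-to-point messages the study analyzes.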


Cited By

  • (2023) "Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism." In Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 427–437. DOI: 10.1145/3624062.3624111. Online publication date: 12-Nov-2023.
  • (2023) "Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures." Parallel Computing 116:C. DOI: 10.1016/j.parco.2023.103021. Online publication date: 1-Jul-2023.
  • (2022) "Enhancing Data Locality of the Conjugate Gradient Method for High-Order Matrix-Free Finite-Element Implementations." The International Journal of High Performance Computing Applications 37(2), 61–81. DOI: 10.1177/10943420221107880. Online publication date: 7-Jul-2022.

        Published In

        ACM Transactions on Parallel Computing, Volume 10, Issue 1
        March 2023, 124 pages
        ISSN: 2329-4949
        EISSN: 2329-4957
        DOI: 10.1145/3589020

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 29 March 2023
        Online AM: 17 January 2023
        Accepted: 10 January 2023
        Revised: 15 December 2022
        Received: 10 March 2022
        Published in TOPC Volume 10, Issue 1


        Author Tags

        1. parallel
        2. communication
        3. node-aware
        4. sparse matrix
        5. collectives

        Qualifiers

        • Research-article

        Funding Sources

        • Department of Energy, National Nuclear Security Administration
        • National Science Foundation
        • National Center for Supercomputing Applications
