
Performance Analysis and Optimal Node-aware Communication for Enlarged Conjugate Gradient Methods

Published: 29 March 2023

Abstract

Krylov methods are a key approach to solving large sparse linear systems of equations, but they suffer from poor strong scalability on distributed-memory machines. This is due to high synchronization costs: many collective communication calls paired with a low computational workload. Enlarged Krylov methods address this issue by splitting the initial residual into multiple columns, yielding operations on block vectors and reducing the total number of iterations to convergence. In this article, we present a performance study of an enlarged Krylov method, Enlarged Conjugate Gradients (ECG), noting the impact of block vectors on parallel performance at scale. Most notably, we observe increased overhead in point-to-point communication as a result of denser messages in the sparse matrix-block vector multiplication kernel. We also present models that analyze the expected performance of ECG and motivate design decisions. Most importantly, we introduce a new point-to-point communication approach based on node-aware communication techniques that increases the efficiency of the method at scale.
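The residual splitting that the abstract describes can be illustrated with a minimal sketch. Assuming contiguous row partitions for simplicity (the paper's actual partitioning follows the parallel matrix distribution), an initial residual r of length n is split into an n-by-t block vector whose column j keeps only the entries of r belonging to partition j; the helper name `split_residual` is hypothetical, not from the paper:

```python
import numpy as np

def split_residual(r, t):
    """Split residual r into an n-by-t block vector: column j holds the
    entries of r on partition j and is zero elsewhere (contiguous
    partitions assumed for illustration)."""
    n = r.shape[0]
    R = np.zeros((n, t))
    bounds = np.linspace(0, n, t + 1, dtype=int)  # partition boundaries
    for j in range(t):
        R[bounds[j]:bounds[j + 1], j] = r[bounds[j]:bounds[j + 1]]
    return R

r = np.arange(1.0, 9.0)        # residual of length 8
R = split_residual(r, 2)       # two partitions -> 8x2 block vector
assert np.allclose(R.sum(axis=1), r)  # columns sum back to the residual
```

Subsequent sparse matrix-block vector products A @ R then exchange t values per shared row instead of one, which is the source of the denser point-to-point messages the study analyzes.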


Cited By

  • (2023) "Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism." In Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, 427–437. DOI: 10.1145/3624062.3624111. Online publication date: 12-Nov-2023.
  • (2023) "Characterizing the Performance of Node-Aware Strategies for Irregular Point-to-Point Communication on Heterogeneous Architectures." Parallel Computing 116:C. DOI: 10.1016/j.parco.2023.103021. Online publication date: 1-Jul-2023.
  • (2022) "Enhancing Data Locality of the Conjugate Gradient Method for High-Order Matrix-Free Finite-Element Implementations." The International Journal of High Performance Computing Applications 37(2), 61–81. DOI: 10.1177/10943420221107880. Online publication date: 7-Jul-2022.

        Published In

        ACM Transactions on Parallel Computing, Volume 10, Issue 1
        March 2023, 124 pages
        ISSN: 2329-4949
        EISSN: 2329-4957
        DOI: 10.1145/3589020

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 29 March 2023
        Online AM: 17 January 2023
        Accepted: 10 January 2023
        Revised: 15 December 2022
        Received: 10 March 2022
        Published in TOPC Volume 10, Issue 1


        Author Tags

        1. parallel
        2. communication
        3. node-aware
        4. sparse matrix
        5. collectives

        Qualifiers

        • Research-article

        Funding Sources

        • Department of Energy, National Nuclear Security Administration
        • National Science Foundation
        • National Center for Supercomputing Applications
