DOI: 10.1145/3539781.3539785
research-article
Open access

Reducing communication in the conjugate gradient method: a case study on high-order finite elements

Published: 12 July 2022

Abstract

Communication is currently a major bottleneck for several scientific computations, both horizontal communication between different processors and vertical communication between different levels of the memory hierarchy. With this bottleneck in mind, we target a notoriously communication-bound solver at the core of many high-performance applications, namely the conjugate gradient method (CG). To reduce communication, we present lower bounds on the vertical data movement in CG and go on to construct a CG solver with reduced data movement. Using our theoretical analysis, we apply our CG solver to a high-performance discretization used in practice, the spectral element method (SEM). Guided by our analysis, we show that for the Poisson equation on modern GPUs we can improve performance by 30% by both rematerializing the discrete system and reformulating the system to work on unique degrees of freedom. To investigate how horizontal communication can be reduced, we compare CG to two communication-reducing techniques, namely communication-avoiding and pipelined CG. We strong scale up to 4096 CPU cores and showcase performance improvements of upwards of 70% for pipelined CG compared to standard CG when applied to SEM at scale. In addition to improving the scaling capabilities of the solver, initial measurements indicate that the convergence of SEM is largely unaffected by pipelined CG.
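To make the two kinds of communication concrete, the following is a minimal sketch of the textbook conjugate gradient iteration in plain Python. It is not the solver described in the paper: the function names (`dot`, `matvec`, `conjugate_gradient`), the dense 2x2 test system, and the tolerance are all assumptions chosen for illustration. The comments mark where a distributed implementation would typically need a global reduction (horizontal communication) and where vectors are streamed through memory (vertical communication).

```python
# Illustrative sketch of standard CG; all names and the test system are
# hypothetical and are not taken from the paper's implementation.

def dot(u, v):
    # In a distributed run this would be a global reduction across all
    # processes, i.e. a horizontal communication (synchronization) point.
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    # Dense matrix-vector product used for simplicity; a matrix-free
    # operator could be substituted here.
    return [dot(row, v) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = list(b)                 # residual r = b - A x for x = 0
    p = list(r)                 # initial search direction
    rr = dot(r, r)
    for it in range(max_iter):
        if rr ** 0.5 < tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)                           # reduction 1
        # Each vector update reads and writes whole vectors, which is
        # vertical data movement through the memory hierarchy.
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)                                # reduction 2
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter

# Small symmetric positive definite system: 4x + y = 1, x + 3y = 2.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = conjugate_gradient(A, b)
print(x, iterations)
```

In this formulation each iteration contains two dependent dot products. Pipelined and communication-avoiding variants rearrange the recurrences so that such reductions can be overlapped with other work or performed less often.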

Cited By

  • (2023) Large-Scale direct numerical simulations of turbulence using GPUs and modern Fortran. International Journal of High Performance Computing Applications 37:5 (487-502). DOI: 10.1177/10943420231158616. Online publication date: 1-Sep-2023
  • (2023) In-Situ Techniques on GPU-Accelerated Data-Intensive Applications. 2023 IEEE 19th International Conference on e-Science (e-Science) (1-10). DOI: 10.1109/e-Science58273.2023.10254865. Online publication date: 9-Oct-2023

      Published In

      PASC '22: Proceedings of the Platform for Advanced Scientific Computing Conference
      June 2022
      181 pages
      ISBN: 9781450394109
      DOI: 10.1145/3539781
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      In-Cooperation

      • CSCS: Swiss National Supercomputing Centre

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      PASC '22

      Acceptance Rates

      PASC '22 Paper Acceptance Rate 17 of 22 submissions, 77%;
      Overall Acceptance Rate 109 of 221 submissions, 49%

      Article Metrics

      • Downloads (last 12 months): 126
      • Downloads (last 6 weeks): 27
      Reflects downloads up to 12 Sep 2024