research-article

Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods

Published: 01 January 2012

Abstract

Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid methods on massively parallel throughput-oriented processors, such as graphics processing units, demands algorithms with abundant fine-grained parallelism. In this paper, we develop a parallel algebraic multigrid method which exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy as well as the cycling or solve stage. Our algorithms are expressed in terms of scalable parallel primitives that are efficiently implemented on the GPU. The resulting solver achieves an average speedup of $1.8\times$ in the setup phase and $5.7\times$ in the cycling phase when compared to a representative CPU implementation.
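
As a minimal illustration of what "expressed in terms of scalable parallel primitives" can mean, the sketch below writes a sparse matrix-vector product as a chain of three data-parallel primitives: gather, elementwise transform, and reduce-by-key. The abstract does not name specific primitives, so this choice, the COO storage format, and all function names are assumptions made for illustration. The code is sequential Python, not GPU code, and it is not the paper's implementation.

```python
from itertools import groupby
from operator import itemgetter


def gather(indices, values):
    """Primitive: out[k] = values[indices[k]] (independent per element)."""
    return [values[i] for i in indices]


def transform(op, xs, ys):
    """Primitive: elementwise binary map (independent per element)."""
    return [op(x, y) for x, y in zip(xs, ys)]


def reduce_by_key(keys, values):
    """Primitive: sum each run of consecutive equal keys (keys assumed sorted)."""
    out_keys, out_vals = [], []
    for key, group in groupby(zip(keys, values), key=itemgetter(0)):
        out_keys.append(key)
        out_vals.append(sum(v for _, v in group))
    return out_keys, out_vals


def spmv_coo(n_rows, rows, cols, vals, x):
    """Compute y = A x for A stored in COO format, entries sorted by row."""
    # One product per nonzero: vals[k] * x[cols[k]].
    products = transform(lambda a, b: a * b, vals, gather(cols, x))
    # Sum the products that belong to the same row.
    row_ids, row_sums = reduce_by_key(rows, products)
    # Scatter the row sums; rows without nonzeros stay at zero.
    y = [0.0] * n_rows
    for r, s in zip(row_ids, row_sums):
        y[r] = s
    return y


# Hypothetical example: the 3x3 matrix tridiag(-1, 2, -1) in sorted COO form.
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 2, 1, 2]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
y = spmv_coo(3, rows, cols, vals, [1.0, 2.0, 3.0])
print(y)
```

Each primitive does a uniform amount of work per element or per nonzero, which is the kind of decomposition that tends to map onto many lightweight threads. A row-by-row loop, by contrast, gives each task a different amount of work.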

Published In

SIAM Journal on Scientific Computing, Volume 34, Issue 4
2012
767 pages
ISSN:1064-8275
DOI:10.1137/sjoce3.34.4

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. algebraic multigrid
  2. parallel
  3. sparse
  4. graphics processing units
  5. iterative

Author Tags (MSC subject classifications)

  1. 65-04
  2. 68-04
  3. 65F08
  4. 65F50
  5. 68W10

Qualifiers

  • Research-article

Cited By

  • (2024) Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs. International Journal of High Performance Computing Applications 38:3 (245-259). DOI: 10.1177/10943420241231928. Online publication date: 1-May-2024
  • (2024) SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. Proceedings of the 53rd International Conference on Parallel Processing (1166-1175). DOI: 10.1145/3673038.3673054. Online publication date: 12-Aug-2024
  • (2024) Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs. ACM Transactions on Architecture and Code Optimization 21:3 (1-27). DOI: 10.1145/3664924. Online publication date: 15-May-2024
  • (2024) Compilation of Modular and General Sparse Workspaces. Proceedings of the ACM on Programming Languages 8:PLDI (1213-1238). DOI: 10.1145/3656426. Online publication date: 20-Jun-2024
  • (2024) Optimizing sparse general matrix–matrix multiplication for DCUs. The Journal of Supercomputing 80:14 (20176-20200). DOI: 10.1007/s11227-024-06234-2. Online publication date: 30-May-2024
  • (2023) Generalizing downsampling from regular data to graphs. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (6718-6727). DOI: 10.1609/aaai.v37i6.25824. Online publication date: 7-Feb-2023
  • (2023) End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations. International Journal of High Performance Computing Applications 37:5 (578-599). DOI: 10.1177/10943420231175462. Online publication date: 1-Sep-2023
  • (2023) A New Sparse GEneral Matrix-matrix Multiplication Method for Long Vector Architecture by Hierarchical Row Merging. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (756-759). DOI: 10.1145/3624062.3625131. Online publication date: 12-Nov-2023
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56:1 (1-38). DOI: 10.1145/3604606. Online publication date: 28-Aug-2023
  • (2023) Efficient Execution of SpGEMM on Long Vector Architectures. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (101-113). DOI: 10.1145/3588195.3593000. Online publication date: 7-Aug-2023
