Article

Combining performance aspects of irregular Gauss-Seidel via sparse tiling

Authors: Michelle Mills Strout, Jeanne Ferrante, Jonathan Freeman, Barbara Kreaseck

LCPC'02: Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Pages 90 - 110

https://doi.org/10.1007/11596110_7

Published: 25 July 2002

Abstract

Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
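For readers unfamiliar with the smoother the abstract refers to, the following is a minimal sketch of a sequential Gauss-Seidel sweep over a sparse matrix in compressed sparse row (CSR) form. It is not the sparse tiling method of the paper. It only shows the baseline computation whose parallelism and data reuse the paper targets. The function names (`gauss_seidel_sweep`, `smooth`, `example_system`) and the small 1D Laplacian test system are assumptions made for illustration.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """Perform one in-place Gauss-Seidel sweep for A x = b, with A in CSR form."""
    for i in range(len(b)):
        diag = 0.0
        acc = b[i]
        # Walk the nonzeros of row i. Entries x[j] with j < i were already
        # updated in this sweep; entries with j > i come from the previous one.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                acc -= vals[k] * x[j]
        if diag == 0.0:
            raise ZeroDivisionError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x


def smooth(row_ptr, col_idx, vals, b, x, num_sweeps):
    """Apply num_sweeps consecutive sweeps (the smoother's outer iterations)."""
    for _ in range(num_sweeps):
        gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
    return x


def example_system():
    """Hypothetical test system: 4x4 tridiagonal matrix with stencil (-1, 2, -1).

    The right-hand side is chosen so that the exact solution is all ones.
    """
    row_ptr = [0, 2, 5, 8, 10]
    col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
    vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
    b = [1.0, 0.0, 0.0, 1.0]
    return row_ptr, col_idx, vals, b


if __name__ == "__main__":
    row_ptr, col_idx, vals, b = example_system()
    x = smooth(row_ptr, col_idx, vals, b, [0.0] * len(b), num_sweeps=200)
    print(x)
```

In this sketch, intra-iteration reuse corresponds to entries of `x` being touched several times within one call to `gauss_seidel_sweep`, and inter-iteration reuse corresponds to the same entries being touched again by the next sweep inside `smooth`. Because each row update reads values written earlier in the same sweep, the loop over `i` cannot simply be split across processors, which is why irregular grids need techniques such as multi-coloring, owner-computes partitioning, or sparse tiling.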

Index Terms

Combining performance aspects of irregular Gauss-Seidel via sparse tiling
1. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Index terms have been assigned to the content through auto-classification.

Published In

LCPC'02: Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

July 2002

376 pages

ISBN:3540307818

Editors:
Bill Pugh, Department of Computer Science, University of Maryland, 4135 A.V. Williams Bldg., College Park, MD
Chau-Wen Tseng, Dept. of Computer Science, Univ. of Maryland at College Park, 4135 A.V. Williams Bldg., College Park, MD

Sponsors

UMIACS: U of MD Inst for Advanced Comp Studies

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Cited By

Luporini F, Lange M, Jacobs C, Gorman G, Ramanujam J, Kelly P (2019). Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling. ACM Transactions on Mathematical Software 45:2 (1-30). Online publication date: 3-May-2019.
https://dl.acm.org/doi/10.1145/3302256
Cheshmi K, Kamil S, Strout M, Dehnavi M (2018). ParSy. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-15). Online publication date: 11-Nov-2018.
https://dl.acm.org/doi/10.5555/3291656.3291739
Wang X, Liu W, Xue W, Wu L (2018). swSpTRSV. ACM SIGPLAN Notices 53:1 (338-353). Online publication date: 10-Feb-2018.
https://dl.acm.org/doi/10.1145/3200691.3178513
Wang X, Liu W, Xue W, Wu L, Krall A, Gross T (2018). swSpTRSV. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (338-353). Online publication date: 10-Feb-2018.
https://dl.acm.org/doi/10.1145/3178487.3178513
Cheshmi K, Kamil S, Strout M, Dehnavi M (2018). ParSy. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-15). Online publication date: 11-Nov-2018.
https://dl.acm.org/doi/10.1109/SC.2018.00065
Venkat A, Mohammadi M, Park J, Rong H, Barik R, Strout M, Hall M, West J (2016). Automating wavefront parallelization for sparse matrix computations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). Online publication date: 13-Nov-2016.
https://dl.acm.org/doi/10.5555/3014904.3014959
Chen L, Jiang P, Agrawal G, Franke B, Wu Y, Rastello F (2016). Exploiting recent SIMD architectural advances for irregular applications. Proceedings of the 2016 International Symposium on Code Generation and Optimization (47-58). Online publication date: 29-Feb-2016.
https://dl.acm.org/doi/10.1145/2854038.2854046
Ravishankar M, Eisenlohr J, Pouchet L, Ramanujam J, Rountev A, Sadayappan P (2014). Automatic parallelization of a class of irregular loops for distributed memory systems. ACM Transactions on Parallel Computing 1:1 (1-37). Online publication date: 3-Oct-2014.
https://dl.acm.org/doi/10.1145/2660251
Grosser T, Cohen A, Kelly P, Ramanujam J, Sadayappan P, Verdoolaege S (2013). Split tiling for GPUs. Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (24-31). Online publication date: 16-Mar-2013.
https://dl.acm.org/doi/10.1145/2458523.2458526
Ravishankar M, Eisenlohr J, Pouchet L, Ramanujam J, Rountev A, Sadayappan P, Hollingsworth J (2012). Code generation for parallel execution of a class of irregular loops on distributed memory systems. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-11). Online publication date: 10-Nov-2012.
https://dl.acm.org/doi/10.5555/2388996.2389094
