DOI: 10.1007/11596110_7

Combining performance aspects of irregular Gauss-Seidel via sparse tiling

Published: 25 July 2002

Abstract

Finite element problems are often solved using multigrid techniques. The most time-consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes-based techniques, exploit parallelism and possibly intra-iteration data reuse, but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared-memory machines, specifically in comparison with owner-computes-based Gauss-Seidel methods, which employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration data locality, and inter-iteration data locality) are combined.
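
For readers unfamiliar with the smoother the abstract refers to, the following is a minimal sketch of serial Gauss-Seidel on a sparse matrix stored in compressed sparse row (CSR) form. It is not the paper's sparse tiling implementation: it applies no tiling, coloring, or parallelism, and only shows the baseline computation that those techniques reschedule. The function names (`gauss_seidel_sweep`, `smooth`) and the 3x3 test matrix are assumptions made for illustration.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """One in-place Gauss-Seidel sweep over a CSR matrix (updates x).

    Assumes every row stores a nonzero diagonal entry.
    """
    n = len(b)
    for i in range(n):
        diag = 0.0
        acc = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                # x[j] was already updated in this sweep if j < i;
                # otherwise it still holds the previous sweep's value.
                acc -= vals[k] * x[j]
        x[i] = acc / diag
    return x


def smooth(row_ptr, col_idx, vals, b, x, num_sweeps):
    """Apply several sweeps back to back, with no rescheduling across sweeps."""
    for _ in range(num_sweeps):
        gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
    return x


# Illustrative system: [[4,-1,0],[-1,4,-1],[0,-1,4]] x = [3,2,3],
# whose exact solution is [1, 1, 1].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
x = smooth(row_ptr, col_idx, vals, b, [0.0, 0.0, 0.0], num_sweeps=25)
```

In this sketch each sweep streams through the whole matrix and vector before the next sweep begins, so on a problem larger than cache little data is reused from one sweep to the next. That is the inter-iteration reuse the abstract says existing parallel methods do not exploit.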


    Published In

    LCPC'02: Proceedings of the 15th International Conference on Languages and Compilers for Parallel Computing
    July 2002
    376 pages
    ISBN: 3540307818
    Editors: Bill Pugh, Chau-Wen Tseng

    Sponsors

    • UMIACS: University of Maryland Institute for Advanced Computer Studies

    Publisher

    Springer-Verlag, Berlin, Heidelberg


    Cited By

    • (2019) Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling. ACM Transactions on Mathematical Software 45:2 (1-30). DOI: 10.1145/3302256. Online publication date: 3-May-2019
    • (2018) ParSy. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-15). DOI: 10.5555/3291656.3291739. Online publication date: 11-Nov-2018
    • (2018) swSpTRSV. ACM SIGPLAN Notices 53:1 (338-353). DOI: 10.1145/3200691.3178513. Online publication date: 10-Feb-2018
    • (2018) swSpTRSV. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (338-353). DOI: 10.1145/3178487.3178513. Online publication date: 10-Feb-2018
    • (2018) ParSy. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (1-15). DOI: 10.1109/SC.2018.00065. Online publication date: 11-Nov-2018
    • (2016) Automating wavefront parallelization for sparse matrix computations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.5555/3014904.3014959. Online publication date: 13-Nov-2016
    • (2016) Exploiting recent SIMD architectural advances for irregular applications. Proceedings of the 2016 International Symposium on Code Generation and Optimization (47-58). DOI: 10.1145/2854038.2854046. Online publication date: 29-Feb-2016
    • (2014) Automatic parallelization of a class of irregular loops for distributed memory systems. ACM Transactions on Parallel Computing 1:1 (1-37). DOI: 10.1145/2660251. Online publication date: 3-Oct-2014
    • (2013) Split tiling for GPUs. Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (24-31). DOI: 10.1145/2458523.2458526. Online publication date: 16-Mar-2013
    • (2012) Code generation for parallel execution of a class of irregular loops on distributed memory systems. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (1-11). DOI: 10.5555/2388996.2389094. Online publication date: 10-Nov-2012
