DOI: 10.1145/2834899.2834907

GPU-accelerated co-design of induced dimension reduction: algorithmic fusion and kernel overlap

Published: 15 November 2015

Abstract

In this paper we present an optimized GPU co-design of the Induced Dimension Reduction (IDR) algorithm for solving linear systems. Starting from a baseline implementation built on the generic BLAS routines of the MAGMA software library, we apply optimizations based on kernel fusion and kernel overlap. Runtime experiments investigate the benefit of each optimization technique for different variants of the IDR algorithm. A comparison with the reference implementation shows that combining the techniques can cut the overall runtime by up to about one third.
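
As a rough illustration of what kernel fusion means for BLAS-1 style operations, the sketch below contrasts two separate passes over a vector with a single fused pass. It is a plain Python, CPU-side analogy, not the paper's CUDA kernels or any MAGMA routine. The function names `axpy_then_dot_unfused` and `axpy_dot_fused`, and the choice of an AXPY followed by a dot product, are assumptions made purely for illustration.

```python
def axpy_then_dot_unfused(alpha, x, y, z):
    """Two separate passes: y <- alpha*x + y, then return dot(y, z)."""
    # Pass 1 (stands in for a first kernel): read x and y, write y.
    for i in range(len(y)):
        y[i] = alpha * x[i] + y[i]
    # Pass 2 (stands in for a second kernel): read the updated y again.
    acc = 0.0
    for i in range(len(y)):
        acc += y[i] * z[i]
    return acc


def axpy_dot_fused(alpha, x, y, z):
    """One pass: update y and accumulate dot(y, z) in the same loop."""
    acc = 0.0
    for i in range(len(y)):
        yi = alpha * x[i] + y[i]  # updated element, kept in a local variable
        y[i] = yi                 # written back once
        acc += yi * z[i]          # reused immediately, no second read of y
    return acc


# Hypothetical sample data, chosen only to exercise both variants.
x = [1.0, 2.0, 3.0]
z = [1.0, 1.0, 1.0]
y_unfused = [0.5, 0.5, 0.5]
y_fused = [0.5, 0.5, 0.5]

r1 = axpy_then_dot_unfused(2.0, x, y_unfused, z)
r2 = axpy_dot_fused(2.0, x, y_fused, z)
print(r1, r2)
```

On a GPU, each pass of the unfused variant would typically be its own kernel launch streaming the vectors through global memory, so the fused form saves one read of `y` per element and one launch. Both variants perform the same arithmetic in the same order and therefore return the same value.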


Cited By

  • (2023) Optimization Techniques for GPU Programming. ACM Computing Surveys 55:11 (1-81). DOI: 10.1145/3570638. Online publication date: 16-Mar-2023
  • (2018) Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs. International Journal of High Performance Computing Applications 32:2 (220-230). DOI: 10.1177/1094342016646844. Online publication date: 1-Mar-2018
  • (2016) Efficiency of General Krylov Methods on GPUs -- An Experimental Study. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (683-691). DOI: 10.1109/IPDPSW.2016.45. Online publication date: May-2016

    Published In

    Co-HPC '15: Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing
    November 2015
    61 pages
    ISBN:9781450339926
    DOI:10.1145/2834899

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. GPU
    2. co-design
    3. induced dimension reduction (IDR)
    4. kernel fusion
    5. kernel overlap

    Qualifiers

    • Research-article

    Conference

    SC15

    Acceptance Rates

    Co-HPC '15 Paper Acceptance Rate 7 of 13 submissions, 54%;
    Overall Acceptance Rate 7 of 13 submissions, 54%
