DOI: 10.1109/CGO.2013.6494992
Article

Runtime dependence computation and execution of loops on heterogeneous systems

Published: 23 February 2013
Abstract

    GPUs have been used for parallel execution of DOALL loops. However, loops with indirect array references can contain cross-iteration dependences, which are hard to detect using existing compilation techniques. Applications with such loops cannot easily use the GPU and hence do not benefit from its tremendous compute capabilities. In this paper, we present an algorithm that computes the cross-iteration dependences of such loops at runtime. The algorithm uses both the CPU and the GPU to compute the dependences; specifically, it exploits the compute capabilities of the GPU to quickly collect the memory accesses performed by the iterations, by executing the slice functions generated for the indirect array accesses. Using the dependence information, the loop iterations are levelized so that each level contains only independent iterations, which can be executed in parallel. Another interesting aspect of the proposed solution is that it pipelines the dependence computation of the next level with the actual computation of the current level, to effectively utilize the resources available on the GPU. We use an NVIDIA Tesla C2070 to evaluate our implementation on benchmarks from the PolyBench suite and some synthetic benchmarks. Our experiments show that the proposed technique achieves an average speedup of 6.4x on loops with a reasonable number of cross-iteration dependences.
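    The levelization step described in the abstract can be sketched as a sequential inspector pass over per-iteration access sets. This is an illustrative assumption, not the authors' implementation (the paper collects the accesses on the GPU via generated slice functions); the function name and data layout here are hypothetical:

    ```python
    # Hypothetical sketch of "levelization": once the memory locations touched
    # by each iteration of a loop with indirect accesses (e.g. A[idx[i]] += B[i])
    # are known, iterations are assigned levels so that all iterations in a
    # level are mutually independent and can run in parallel.

    def levelize(n_iters, reads, writes):
        """reads[i]/writes[i]: sets of locations iteration i reads/writes.
        Returns one level number per iteration: an iteration is placed one
        level after its latest conflicting (true/anti/output) predecessor."""
        last_write_level = {}  # location -> level of most recent writer
        last_read_level = {}   # location -> highest level among readers so far
        levels = [0] * n_iters
        for i in range(n_iters):
            lvl = 0
            for loc in reads[i] | writes[i]:   # true/output dependences
                lvl = max(lvl, last_write_level.get(loc, -1) + 1)
            for loc in writes[i]:              # anti dependences
                lvl = max(lvl, last_read_level.get(loc, -1) + 1)
            levels[i] = lvl
            for loc in writes[i]:
                last_write_level[loc] = lvl
            for loc in reads[i]:
                last_read_level[loc] = max(last_read_level.get(loc, -1), lvl)
        return levels

    # For A[idx[i]] += B[i] with idx = [0, 1, 0, 2, 1], each iteration both
    # reads and writes A[idx[i]]:
    idx = [0, 1, 0, 2, 1]
    access = [{loc} for loc in idx]
    print(levelize(5, access, access))  # -> [0, 0, 1, 0, 1]
    ```

    Here iterations {0, 1, 3} form level 0 and {2, 4} form level 1, so each level can be launched as one parallel GPU kernel; the paper additionally overlaps computing the dependences of the next level with executing the current one.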




    Published In

    CGO '13: Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
    February 2013
    366 pages
    ISBN:9781467355247


    Publisher

    IEEE Computer Society

    United States


    Cited By

    • (2023) Runtime Composition of Iterations for Fusing Loop-carried Sparse Dependence. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. doi:10.1145/3581784.3607097 (12 Nov 2023)
    • (2022) Optimizing sparse computations jointly. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 459-460. doi:10.1145/3503221.3508439 (2 Apr 2022)
    • (2020) MatRox. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 389-402. doi:10.1145/3332466.3374548 (19 Feb 2020)
    • (2018) ParSy. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-15. doi:10.5555/3291656.3291739 (11 Nov 2018)
    • (2018) Juggler. ACM SIGPLAN Notices 53(1), pp. 54-67. doi:10.1145/3200691.3178492 (10 Feb 2018)
    • (2018) Juggler. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 54-67. doi:10.1145/3178487.3178492 (10 Feb 2018)
    • (2018) ParSy. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-15. doi:10.1109/SC.2018.00065 (11 Nov 2018)
    • (2016) Automating wavefront parallelization for sparse matrix computations. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. doi:10.5555/3014904.3014959 (13 Nov 2016)
    • (2015) Distributed memory code generation for mixed Irregular/Regular computations. ACM SIGPLAN Notices 50(8), pp. 65-75. doi:10.1145/2858788.2688515 (24 Jan 2015)
    • (2015) Distributed memory code generation for mixed Irregular/Regular computations. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 65-75. doi:10.1145/2688500.2688515 (24 Jan 2015)
