DOI: 10.5555/2523721.2523757
research-article

Starchart: hardware and software optimization using recursive partitioning regression trees

Published: 07 October 2013

Abstract

Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow in scope and heuristic in operation. This paper proposes and evaluates a statistical analysis technique, Starchart, that partitions the GPU hardware/software tuning space by automatically discerning important inflection points in design parameter values. Unlike prior methods, Starchart can identify the best parameter choices within different regions of the space. Our tool is efficient (evaluating at most 0.3% of the tuning space, and often much less) and is robust enough to analyze highly variable real-system measurements, not just simulation. In one case study, we use it to automatically find platform-specific parameter settings that are 6.3X faster (for AMD) and 1.3X faster (for NVIDIA) than a single general setting. We also show how power-optimized parameter settings can save 47 W (26% of total GPU power) with little performance loss. Overall, Starchart can serve as a foundation for a range of GPU compiler optimizations, auto-tuners, and programmer tools. Furthermore, because Starchart does not rely on specific GPU features, we expect it to be useful for broader CPU/GPU studies as well.
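The abstract describes Starchart as partitioning the tuning space at inflection points in parameter values and then choosing the best setting within each region. It does not state the split criterion, sampling scheme, or stopping rule, so the sketch below assumes a generic CART-style recursive partitioning regression tree: at each node it tries every parameter and every midpoint between adjacent observed values, keeps the split that minimizes the summed squared error of the two children, and recurses until a depth limit, a minimum leaf size, or no further error reduction. The helper names (`grow`, `best_split`, `leaves`), the two parameters (a work-group size and an unroll factor), and the cost values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: a configuration is a tuple of numeric parameter
# values and the response is a measured cost (e.g. runtime or power),
# where lower is better. This is a generic CART-style sketch, not Starchart.
Config = Tuple[float, ...]
Sample = Tuple[Config, float]


@dataclass
class Node:
    samples: List[Sample]
    param: Optional[int] = None        # index of the parameter split on
    threshold: Optional[float] = None  # left child gets value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.param is None

    def best(self) -> Sample:
        """Lowest-cost sampled configuration in this region."""
        return min(self.samples, key=lambda s: s[1])


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(samples: List[Sample], min_leaf: int) -> Optional[Tuple[int, float, float]]:
    """Return (param, threshold, children_sse) minimizing total SSE, or None."""
    best = None
    for p in range(len(samples[0][0])):
        values = sorted({cfg[p] for cfg, _ in samples})
        # Candidate thresholds: midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for cfg, y in samples if cfg[p] <= t]
            right = [y for cfg, y in samples if cfg[p] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (p, t, cost)
    return best


def grow(samples: List[Sample], depth: int = 0, max_depth: int = 3, min_leaf: int = 2) -> Node:
    """Recursively partition the samples into regions of similar cost."""
    node = Node(samples)
    if depth >= max_depth or len(samples) < 2 * min_leaf:
        return node
    split = best_split(samples, min_leaf)
    # Stop when no candidate split lowers the error of the current node.
    if split is None or split[2] >= sse([y for _, y in samples]):
        return node
    node.param, node.threshold, _ = split
    p, t = node.param, node.threshold
    node.left = grow([s for s in samples if s[0][p] <= t], depth + 1, max_depth, min_leaf)
    node.right = grow([s for s in samples if s[0][p] > t], depth + 1, max_depth, min_leaf)
    return node


def leaves(node: Node) -> List[Node]:
    """All leaf regions, left to right."""
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


# Synthetic, made-up data (not measurements from the paper): parameter 0 is a
# hypothetical work-group size, parameter 1 a hypothetical unroll factor, and
# the cost drops sharply once the work-group size exceeds 128.
samples = [((wg, u), (10.0 if wg <= 128 else 4.0) + 0.1 * u)
           for wg in (32, 64, 128, 256, 512) for u in (1, 2, 4)]
root = grow(samples)
print(root.param, root.threshold)  # → 0 192.0
for leaf in leaves(root):
    print(len(leaf.samples), leaf.best())
```

In this sketch each chosen `threshold` plays the role of an inflection point (here the root splits the work-group size between 128 and 256), and `leaf.best()` returns the lowest-cost sampled configuration in each region. This mirrors, but is not claimed to reproduce, the per-region recommendations the abstract attributes to Starchart.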

    Published In

    PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
    October 2013
    422 pages
    ISBN: 9781479910212

    Publisher

    IEEE Press

    Author Tags

    1. GPU
    2. auto-tuning
    3. decision tree
    4. design space exploration
    5. regression tree

    Acceptance Rates

    PACT '13 paper acceptance rate: 36 of 208 submissions (17%)
    Overall acceptance rate: 121 of 471 submissions (26%)

    Bibliometrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 01 Jan 2025

    Cited By

    • (2019) A case study on machine learning for synthesizing benchmarks. Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp. 38-46. DOI: 10.1145/3315508.3329976. Online publication date: 22-Jun-2019.
    • (2019) Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators. Proceedings of the 16th ACM International Conference on Computing Frontiers, pp. 16-23. DOI: 10.1145/3310273.3321563. Online publication date: 30-Apr-2019.
    • (2019) Hardware-Assisted Cross-Generation Prediction of GPUs Under Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(6), pp. 1133-1146. DOI: 10.1109/TCAD.2018.2834398. Online publication date: 1-Jun-2019.
    • (2018) GPU power prediction via ensemble machine learning for DVFS space exploration. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 240-243. DOI: 10.1145/3203217.3203273. Online publication date: 8-May-2018.
    • (2017) HALWPE. Proceedings of the 54th Annual Design Automation Conference 2017, pp. 1-6. DOI: 10.1145/3061639.3062257. Online publication date: 18-Jun-2017.
    • (2017) Optimization space pruning without regrets. Proceedings of the 26th International Conference on Compiler Construction, pp. 34-44. DOI: 10.1145/3033019.3033023. Online publication date: 5-Feb-2017.
    • (2017) Programming GPGPU Graph Applications with Linear Algebra Building Blocks. International Journal of Parallel Programming, 45(3), pp. 657-679. DOI: 10.1007/s10766-016-0448-z. Online publication date: 1-Jun-2017.
    • (2015) Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance. Proceedings of the 48th International Symposium on Microarchitecture, pp. 725-737. DOI: 10.1145/2830772.2830780. Online publication date: 5-Dec-2015.
    • (2015) GPU Performance and Power Tuning Using Regression Trees. ACM Transactions on Architecture and Code Optimization, 12(2), pp. 1-26. DOI: 10.1145/2736287. Online publication date: 11-May-2015.
    • (2014) Automatic optimization of thread-coarsening for graphics processors. Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, pp. 455-466. DOI: 10.1145/2628071.2628087. Online publication date: 24-Aug-2014.
