Starchart: hardware and software optimization using recursive partitioning regression trees

Authors:

Margaret Martonosi

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Pages 257 - 268

Published: 07 October 2013

Abstract

Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow in scope and heuristic in operation. This paper proposes and evaluates a statistical analysis technique, Starchart, that partitions the GPU hardware/software tuning space by automatically discerning important inflection points in design parameter values. Unlike prior methods, Starchart can identify the best parameter choices within different regions of the space. Our tool is efficient, evaluating at most 0.3% of the tuning space and often much less, and is robust enough to analyze highly variable real-system measurements, not just simulation. In one case study, we use it to automatically find platform-specific parameter settings that are 6.3X faster (for AMD) and 1.3X faster (for NVIDIA) than a single general setting. We also show how power-optimized parameter settings can save 47 W (26% of total GPU power) with little performance loss. Overall, Starchart can serve as a foundation for a range of GPU compiler optimizations, auto-tuners, and programmer tools. Furthermore, because Starchart does not rely on specific GPU features, we expect it to be useful for broader CPU/GPU studies as well.
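To make the idea of partitioning a tuning space at inflection points concrete, the following is a minimal sketch of a generic recursive partitioning regression tree. It is not Starchart's implementation: the split rule (minimizing the summed squared error of the two sides), the parameter names `workgroup_size`, `unroll`, and `use_local_mem`, and the `synthetic_runtime` function are all assumptions made up for illustration.

```python
import itertools
import random
from statistics import mean

# Hypothetical tuning knobs; not taken from the paper.
PARAM_NAMES = ["workgroup_size", "unroll", "use_local_mem"]

def synthetic_runtime(cfg, rng):
    """Made-up runtime in ms with a sharp change at workgroup_size = 128."""
    wg, unroll, _ = cfg
    base = 25.0 if wg < 128 else 10.0
    penalty = 3.0 if unroll == 1 else 0.0
    return base + penalty + rng.gauss(0.0, 0.5)  # noise mimics measurement jitter

def sse(ys):
    """Sum of squared errors of ys around their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(samples, min_leaf):
    """Try every midpoint between adjacent parameter levels; keep the lowest-SSE split."""
    best = None
    for p in range(len(PARAM_NAMES)):
        levels = sorted({cfg[p] for cfg, _ in samples})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y for cfg, y in samples if cfg[p] <= t]
            right = [y for cfg, y in samples if cfg[p] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, p, t)
    return best

def build_tree(samples, depth=0, max_depth=2, min_leaf=4):
    """Recursively partition the samples; each node stores its mean runtime."""
    ys = [y for _, y in samples]
    node = {"mean": mean(ys), "n": len(ys)}
    split = best_split(samples, min_leaf) if depth < max_depth else None
    if split is None:
        return node
    _, p, t = split
    node["param"], node["threshold"] = p, t
    node["left"] = build_tree([s for s in samples if s[0][p] <= t],
                              depth + 1, max_depth, min_leaf)
    node["right"] = build_tree([s for s in samples if s[0][p] > t],
                               depth + 1, max_depth, min_leaf)
    return node

def show(node, indent=""):
    """Print the tree as nested split conditions."""
    if "param" not in node:
        print(f"{indent}leaf: mean runtime {node['mean']:.1f} ms over {node['n']} samples")
        return
    name, t = PARAM_NAMES[node["param"]], node["threshold"]
    print(f"{indent}{name} <= {t}")
    show(node["left"], indent + "  ")
    print(f"{indent}{name} > {t}")
    show(node["right"], indent + "  ")

rng = random.Random(0)
grid = itertools.product([32, 64, 128, 256, 512], [1, 2, 4, 8], [0, 1])
# Two noisy measurements per configuration of the synthetic space.
samples = [(cfg, synthetic_runtime(cfg, rng)) for cfg in grid for _ in range(2)]
tree = build_tree(samples)
show(tree)
```

In this toy setup the first split lands between work-group sizes 64 and 128, which is where the synthetic runtime changes most sharply; the leaves then report the mean runtime of each region, so the best region and its parameter ranges can be read directly off the tree.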

Index Terms

Computing methodologies > Modeling and simulation > Model development and analysis > Modeling methodologies

Information

Published In

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

October 2013

422 pages

ISBN: 9781479910212

Conference Chair: Christian Fensch (University of Edinburgh, UK)
General Chair: Michael O'Boyle (University of Edinburgh, UK)
Program Chairs: André Seznec (INRIA Rennes, France), François Bodin (IRISA/CAPS Entreprise, France)

Sponsors

IFIP WG 10.3
IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing

Publisher

IEEE Press

Qualifiers

Research-article

Acceptance Rates

PACT '13 Paper Acceptance Rate: 36 of 208 submissions, 17%

Overall Acceptance Rate: 121 of 471 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 225
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Jan 2025

Citations

Cited By

Goens A, Brauckmann A, Ertel S, Cummins C, Leather H, Castrillon J, Mattson T, Muzahid A, Solar-Lezama A (2019). A case study on machine learning for synthesizing benchmarks. Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, pp. 38-46. DOI: 10.1145/3315508.3329976. Online publication date: 22-Jun-2019
https://dl.acm.org/doi/10.1145/3315508.3329976
Cui X, Feng W, Palumbo F, Becchi M, Schulz M, Sato K (2019). Iterative machine learning (IterML) for effective parameter pruning and tuning in accelerators. Proceedings of the 16th ACM International Conference on Computing Frontiers, pp. 16-23. DOI: 10.1145/3310273.3321563. Online publication date: 30-Apr-2019
https://dl.acm.org/doi/10.1145/3310273.3321563
O'Neal K, Brisk P, Shriver E, Kishinevsky M (2019). Hardware-Assisted Cross-Generation Prediction of GPUs Under Design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38:6, pp. 1133-1146. DOI: 10.1109/TCAD.2018.2834398. Online publication date: 1-Jun-2019
https://dl.acm.org/doi/10.1109/TCAD.2018.2834398
Dutta B, Adhinarayanan V, Feng W, Kaeli D, Pericàs M (2018). GPU power prediction via ensemble machine learning for DVFS space exploration. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 240-243. DOI: 10.1145/3203217.3203273. Online publication date: 8-May-2018
https://dl.acm.org/doi/10.1145/3203217.3203273
O'Neal K, Brisk P, Shriver E, Kishinevsky M (2017). HALWPE. Proceedings of the 54th Annual Design Automation Conference 2017, pp. 1-6. DOI: 10.1145/3061639.3062257. Online publication date: 18-Jun-2017
https://dl.acm.org/doi/10.1145/3061639.3062257
Beaugnon U, Pouille A, Pouzet M, Pienaar J, Cohen A, Wu P, Hack S (2017). Optimization space pruning without regrets. Proceedings of the 26th International Conference on Compiler Construction, pp. 34-44. DOI: 10.1145/3033019.3033023. Online publication date: 5-Feb-2017
https://dl.acm.org/doi/10.1145/3033019.3033023
Che S, Beckmann B, Reinhardt S (2017). Programming GPGPU Graph Applications with Linear Algebra Building Blocks. International Journal of Parallel Programming 45:3, pp. 657-679. DOI: 10.1007/s10766-016-0448-z. Online publication date: 1-Jun-2017
https://dl.acm.org/doi/10.1007/s10766-016-0448-z
Ardalani N, Lestourgeon C, Sankaralingam K, Zhu X, Prvulovic M (2015). Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance. Proceedings of the 48th International Symposium on Microarchitecture, pp. 725-737. DOI: 10.1145/2830772.2830780. Online publication date: 5-Dec-2015
https://dl.acm.org/doi/10.1145/2830772.2830780
Jia W, Garza E, Shaw K, Martonosi M (2015). GPU Performance and Power Tuning Using Regression Trees. ACM Transactions on Architecture and Code Optimization 12:2, pp. 1-26. DOI: 10.1145/2736287. Online publication date: 11-May-2015
https://dl.acm.org/doi/10.1145/2736287
Magni A, Dubach C, O'Boyle M, Amaral J, Torrellas J (2014). Automatic optimization of thread-coarsening for graphics processors. Proceedings of the 23rd international conference on Parallel architectures and compilation, pp. 455-466. DOI: 10.1145/2628071.2628087. Online publication date: 24-Aug-2014
https://dl.acm.org/doi/10.1145/2628071.2628087
