OpenTuner: an extensible framework for program autotuning

Authors:

Kalyan Veeramachaneni,

Jonathan Ragan-Kelley,

Jeffrey Bosboom,

Una-May O'Reilly,

Saman Amarasinghe

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

Pages 303 - 316

https://doi.org/10.1145/2628071.2628092

Published: 24 August 2014

Abstract

Program autotuning has been shown to achieve better or more portable performance in a number of domains. However, autotuners themselves are rarely portable between projects, for a number of reasons: using a domain-informed search space representation is critical to achieving good results; search spaces can be intractably large and require advanced machine learning techniques; and the landscape of search spaces can vary greatly between different problems, sometimes requiring domain-specific search techniques to explore efficiently.

This paper introduces OpenTuner, a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy-to-use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the simultaneous use of ensembles of disparate search techniques; techniques that perform well are dynamically allocated a larger proportion of tests. We demonstrate the efficacy and generality of OpenTuner by building autotuners for 7 distinct projects and 16 total benchmarks, showing speedups of up to 2.8x over the prior techniques of these projects with little programmer effort.
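The ensemble idea described in the abstract can be illustrated with a small multi-armed bandit sketch. This is not OpenTuner's API, and it is not necessarily the allocation rule the paper uses: the class `EnsembleAllocator`, the two toy techniques, the objective `cost`, and the UCB1-style scoring are all assumptions chosen for illustration. The sketch only shows how a technique that keeps finding improvements can end up receiving a larger share of the test budget.

```python
import math
import random


def cost(x):
    """Hypothetical objective to minimize (stands in for a measured run time)."""
    return (x - 37) ** 2


def random_search(best_x, rng):
    """Toy technique 1: propose a uniformly random configuration in [0, 100]."""
    return rng.randint(0, 100)


def hill_climb(best_x, rng):
    """Toy technique 2: propose a neighbor of the best configuration so far."""
    return min(100, max(0, best_x + rng.choice([-1, 1])))


class EnsembleAllocator:
    """UCB1-style bandit (an assumption, not the paper's stated rule).

    Techniques whose proposals improve on the best result are selected
    more often, with an exploration bonus for rarely used techniques.
    """

    def __init__(self, techniques, exploration=1.0):
        self.techniques = techniques
        self.exploration = exploration
        self.uses = {name: 0 for name in techniques}
        self.wins = {name: 0 for name in techniques}

    def select(self):
        # Try every technique once before scoring any of them.
        for name, n in self.uses.items():
            if n == 0:
                return name
        total = sum(self.uses.values())

        def score(name):
            mean = self.wins[name] / self.uses[name]
            bonus = self.exploration * math.sqrt(
                2 * math.log(total) / self.uses[name]
            )
            return mean + bonus

        return max(self.techniques, key=score)

    def update(self, name, improved):
        # Record one test for this technique and whether it improved the best.
        self.uses[name] += 1
        self.wins[name] += int(improved)


def tune(budget=200, seed=0):
    """Run the toy ensemble for `budget` tests; return best x, cost, and usage."""
    rng = random.Random(seed)
    techniques = {"random": random_search, "hill_climb": hill_climb}
    allocator = EnsembleAllocator(techniques)
    best_x = rng.randint(0, 100)
    best_cost = cost(best_x)
    for _ in range(budget):
        name = allocator.select()
        candidate = techniques[name](best_x, rng)
        c = cost(candidate)
        improved = c < best_cost
        if improved:
            best_x, best_cost = candidate, c
        allocator.update(name, improved)
    return best_x, best_cost, allocator.uses
```

Calling `tune()` returns the best configuration found, its cost, and a dictionary with the number of tests each technique received, which makes the shift in allocation visible for a given seed.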

Index Terms

OpenTuner: an extensible framework for program autotuning
1. Software and its engineering
  1. Software notations and tools

Published In

PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

August 2014

514 pages

ISBN:9781450328098

DOI:10.1145/2628071

General Chair: J. Nelson Amaral (University of Alberta, Canada)

Program Chair: Josep Torrellas (University of Illinois, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFIP WG 10.3: IFIP WG 10.3
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing
IEEE CS TCAA: IEEE CS technical committee on architectural acoustics

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PACT '14: International Conference on Parallel Architectures and Compilation

Sponsors: IFIP WG 10.3, SIGARCH, IEEE CS TCPP, IEEE CS TCAA

August 24 - 27, 2014

Edmonton, AB, Canada

Acceptance Rates

PACT '14 Paper Acceptance Rate 54 of 144 submissions, 38%;

Overall Acceptance Rate 121 of 471 submissions, 26%
