Research article
DOI: 10.1145/2628071.2628092

OpenTuner: an extensible framework for program autotuning

Published: 24 August 2014

Abstract

Program autotuning has been shown to achieve better or more portable performance in a number of domains. However, autotuners themselves are rarely portable between projects, for a number of reasons: using a domain-informed search space representation is critical to achieving good results; search spaces can be intractably large and require advanced machine learning techniques; and the landscape of search spaces can vary greatly between different problems, sometimes requiring domain-specific search techniques to explore efficiently.
This paper introduces OpenTuner, a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy-to-use interface for communicating with the program to be autotuned. A key capability of OpenTuner is the simultaneous use of ensembles of disparate search techniques; techniques that perform well are dynamically allocated a larger proportion of tests. We demonstrate the efficacy and generality of OpenTuner by building autotuners for 7 distinct projects and 16 total benchmarks, showing speedups of up to 2.8x over the prior techniques of these projects with little programmer effort.
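
The idea of an ensemble whose better-performing techniques receive more tests can be illustrated with a small sketch. The code below is not OpenTuner's implementation or API; it is a minimal illustration that assumes a UCB1-style bandit as the allocator, a one-dimensional search space on [-10, 10], and two hypothetical techniques (`random_search` and `local_search`). A technique is credited with a "win" whenever its proposal improves on the best result seen so far.

```python
import math
import random


def random_search(best_x, rng):
    """Hypothetical technique: sample uniformly, ignoring the current best."""
    return rng.uniform(-10.0, 10.0)


def local_search(best_x, rng):
    """Hypothetical technique: perturb the best configuration found so far."""
    return min(10.0, max(-10.0, best_x + rng.gauss(0.0, 0.5)))


def run_ensemble(objective, techniques, budget, seed=0, explore=1.0):
    """Spend `budget` tests across `techniques` using a UCB1-style bandit.

    Each technique earns reward 1 when its proposal beats the best
    objective value so far (lower is better), and 0 otherwise.
    Returns (best_x, best_y, tests_used_per_technique).
    """
    rng = random.Random(seed)
    names = list(techniques)
    uses = {name: 0 for name in names}
    wins = {name: 0 for name in names}

    # Start from a random configuration (an assumption of this sketch).
    best_x = rng.uniform(-10.0, 10.0)
    best_y = objective(best_x)

    for t in range(1, budget + 1):
        untried = [name for name in names if uses[name] == 0]
        if untried:
            # Give every technique one test before scoring them.
            name = untried[0]
        else:
            # Mean reward plus an exploration bonus that shrinks with use.
            name = max(
                names,
                key=lambda n: wins[n] / uses[n]
                + explore * math.sqrt(2.0 * math.log(t) / uses[n]),
            )

        x = techniques[name](best_x, rng)
        y = objective(x)
        uses[name] += 1
        if y < best_y:
            best_x, best_y = x, y
            wins[name] += 1

    return best_x, best_y, uses


if __name__ == "__main__":
    techniques = {"random": random_search, "local": local_search}
    x, y, uses = run_ensemble(lambda v: (v - 3.0) ** 2, techniques, budget=200)
    print("best x:", x, "best y:", y, "tests per technique:", uses)
```

The returned `uses` dictionary shows how the test budget was split. A technique that keeps producing improvements accumulates a higher mean reward and is selected more often, while the exploration term ensures the others are still retried occasionally.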

    Published In

    PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
    August 2014
    514 pages
    ISBN: 9781450328098
    DOI: 10.1145/2628071
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. autotuner
    2. optimization

    Qualifiers

    • Research-article

    Conference

    PACT '14
    Sponsor:
    • IFIP WG 10.3
    • SIGARCH
    • IEEE CS TCPP
    • IEEE CS TCAA

    Acceptance Rates

    PACT '14 Paper Acceptance Rate 54 of 144 submissions, 38%;
    Overall Acceptance Rate 121 of 471 submissions, 26%


    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 639
    • Downloads (last 6 weeks): 78
    Reflects downloads up to 12 Sep 2024

    Cited By

    • (2024) Cross-Feature Transfer Learning for Efficient Tensor Program Generation. Applied Sciences, 14:2 (513). DOI: 10.3390/app14020513. Online publication date: 6-Jan-2024
    • (2024) Auto-tuning Multi-GPU High-Fidelity Numerical Simulations for Urban Air Mobility. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), (1-6). DOI: 10.23919/DATE58400.2024.10546549. Online publication date: 25-Mar-2024
    • (2024) GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization. Proceedings of the VLDB Endowment, 17:8 (1939-1952). DOI: 10.14778/3659437.3659449. Online publication date: 1-Apr-2024
    • (2024) Block size estimation for data partitioning in HPC applications using machine learning techniques. Journal of Big Data, 11:1. DOI: 10.1186/s40537-023-00862-w. Online publication date: 16-Jan-2024
    • (2024) (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms. ACM Transactions on Programming Languages and Systems. DOI: 10.1145/3665643. Online publication date: 22-May-2024
    • (2024) Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures. Proceedings of the 38th ACM International Conference on Supercomputing, (388-399). DOI: 10.1145/3650200.3656630. Online publication date: 30-May-2024
    • (2024) Accelerated Auto-Tuning of GPU Kernels for Tensor Computations. Proceedings of the 38th ACM International Conference on Supercomputing, (549-561). DOI: 10.1145/3650200.3656626. Online publication date: 30-May-2024
    • (2024) DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators. Proceedings of the 21st ACM International Conference on Computing Frontiers, (126-137). DOI: 10.1145/3649153.3649196. Online publication date: 7-May-2024
    • (2024) Optimization Space Learning: A Lightweight, Noniterative Technique for Compiler Autotuning. Proceedings of the 28th ACM International Systems and Software Product Line Conference, (36-46). DOI: 10.1145/3646548.3672588. Online publication date: 2-Sep-2024
    • (2024) Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations. ACM Transactions on Computer Systems, 41:1-4 (1-45). DOI: 10.1145/3635305. Online publication date: 15-Jan-2024
