DOI: 10.1145/3623278.3624770
Research Article · Open Access

BaCO: A Fast and Portable Bayesian Compiler Optimization Framework

Published: 07 February 2024

Abstract

We introduce the Bayesian Compiler Optimization framework (BaCO), a general-purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO provides the flexibility needed to handle the requirements of modern autotuning tasks. In particular, it handles permutation, ordered, and continuous parameter types, along with both known and unknown parameter constraints. To reason about these parameter types and efficiently deliver high-quality code, BaCO uses Bayesian optimization algorithms specialized to the autotuning domain. We demonstrate BaCO's effectiveness on three modern compiler systems: TACO, RISE & ELEVATE, and HPVM2FPGA, targeting CPUs, GPUs, and FPGAs respectively. Across these domains, BaCO outperforms current state-of-the-art autotuners, delivering on average 1.36x--1.56x faster code with a small search budget and reaching expert-level performance 2.9x--3.9x faster.
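The loop the abstract describes (a surrogate model proposing configurations from a mixed space of permutation, ordered, and continuous parameters under feasibility constraints) can be sketched in a few lines. The sketch below is purely illustrative and is not BaCO's actual API: the tile-size, loop-order, and unroll-factor parameters, the feasibility constraint, and the measure_runtime stand-in for compile-and-measure are all hypothetical, and scikit-learn's Gaussian process regressor stands in for BaCO's specialized surrogate.

```python
# Minimal sketch of Bayesian-optimization-style autotuning over a mixed
# search space. Purely illustrative; not BaCO's API. All parameter names
# and the objective are hypothetical stand-ins.
import itertools
import random

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

TILE_SIZES = [8, 16, 32, 64]                       # ordered parameter
LOOP_ORDERS = list(itertools.permutations("ijk"))  # permutation parameter


def sample_config():
    return {
        "tile": random.choice(TILE_SIZES),
        "order": random.choice(LOOP_ORDERS),
        "unroll": random.uniform(1.0, 8.0),        # continuous parameter
    }


def feasible(cfg):
    # Example of a *known* constraint: large tiles require the "ijk" order.
    # Unknown constraints would instead surface as failed compilations/runs.
    return cfg["tile"] <= 32 or cfg["order"] == ("i", "j", "k")


def encode(cfg):
    # Naive numeric encoding for the surrogate. Per the paper's framing,
    # BaCO instead reasons about permutation/ordered types directly.
    return [cfg["tile"], LOOP_ORDERS.index(cfg["order"]), cfg["unroll"]]


def measure_runtime(cfg):
    # Hypothetical objective: a real autotuner would compile and time the
    # candidate schedule here.
    return (abs(cfg["tile"] - 32)
            + LOOP_ORDERS.index(cfg["order"])
            + (cfg["unroll"] - 4.0) ** 2)


# Bootstrap the surrogate with a few random feasible configurations.
history = []
while len(history) < 5:
    cfg = sample_config()
    if feasible(cfg):
        history.append((cfg, measure_runtime(cfg)))

for _ in range(20):                                # small search budget
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(np.array([encode(c) for c, _ in history]),
           np.array([t for _, t in history]))

    # Acquisition: lower confidence bound over a batch of feasible samples.
    cands = [c for c in (sample_config() for _ in range(200)) if feasible(c)]
    mu, sigma = gp.predict(np.array([encode(c) for c in cands]),
                           return_std=True)
    best = cands[int(np.argmin(mu - 1.96 * sigma))]
    history.append((best, measure_runtime(best)))

best_cfg, best_time = min(history, key=lambda ct: ct[1])
print("best configuration:", best_cfg, "modeled runtime:", best_time)
```

In a real setting, measure_runtime would invoke the compiler and benchmark the generated code; the abstract's point is that the Bayesian machinery itself must be specialized to these parameter types, rather than relying on a flat numeric encoding like encode above.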


Cited By

• (2024) Compilation of Modular and General Sparse Workspaces. Proceedings of the ACM on Programming Languages, 8(PLDI), 1213--1238. DOI: 10.1145/3656426

    Published In

    ASPLOS '23: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4
    March 2023
    430 pages
ISBN: 9798400703942
DOI: 10.1145/3623278
This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compiler optimizations
    2. high-performance computing
    3. bayesian optimization
    4. autotuning
    5. autoscheduling

    Conference

    ASPLOS '23

    Acceptance Rates

    Overall Acceptance Rate 535 of 2,713 submissions, 20%
