DOI: 10.1145/3453483.3454109

Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models

Published: 18 June 2021

Abstract

As parallel applications become more complex, auto-tuning becomes more desirable, but also more challenging and time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications that requires no a priori information about the application, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find near-optimal parameter settings 1.64× faster than state-of-the-art approaches.
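To make the approach concrete, below is a minimal, hypothetical sketch of a pool-of-models tuning loop in the spirit of the abstract; it is not the authors' implementation. It assumes scikit-learn and NumPy are available, and `run_app`, the candidate count, and the bandit-style reward rule are illustrative stand-ins.

```python
# Minimal sketch (NOT the Bliss implementation): a pool of lightweight
# surrogate models competes to propose the next parameter setting.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
DIM, BOOTSTRAP, TRIALS = 4, 5, 30

def run_app(x):
    # Hypothetical objective: stands in for one timed run of the
    # application with parameter vector x (lower is better).
    return float(np.sum((x - 0.3) ** 2) + 0.01 * rng.normal())

pool = [
    GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True),
    GaussianProcessRegressor(kernel=RBF(), normalize_y=True),
    RandomForestRegressor(n_estimators=50, random_state=0),
]
scores = np.ones(len(pool))  # bandit-style credit for each model
X, y = [], []

for trial in range(TRIALS):
    if trial < BOOTSTRAP:
        i, x = None, rng.uniform(size=DIM)  # random warm-up samples
    else:
        # Pick a model in proportion to its past success, fit it on all
        # observations so far, and greedily take the predicted-best candidate
        # (a simplification; full BO would use an acquisition function
        # such as expected improvement).
        i = rng.choice(len(pool), p=scores / scores.sum())
        model = pool[i].fit(np.array(X), np.array(y))
        cand = rng.uniform(size=(256, DIM))
        x = cand[np.argmin(model.predict(cand))]
    fx = run_app(x)
    if i is not None and y and fx < min(y):
        scores[i] += 1.0  # reward models whose proposals improve the best run
    X.append(x)
    y.append(fx)

print("best setting:", X[int(np.argmin(y))], "objective:", min(y))
```

The `scores` vector is what separates a pool approach from a single Bayesian Optimization loop: models whose proposals keep improving the best observed run are sampled more often, so the model best suited to the application comes to dominate without any a priori choice.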

Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
ISBN: 9781450383912
DOI: 10.1145/3453483

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Auto-tuning HPC applications
  2. Parameter tuning

Qualifiers

  • Research-article

Conference

PLDI '21

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 607
  • Downloads (Last 6 weeks): 76
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs. IEEE Transactions on Parallel and Distributed Systems, 35(1), 20-33. https://doi.org/10.1109/TPDS.2023.3325630
  • (2024) An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 741-750. https://doi.org/10.1109/IPDPSW63119.2024.00138
  • (2023) BaCO: A Fast and Portable Bayesian Compiler Optimization Framework. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 19-42. https://doi.org/10.1145/3623278.3624770
  • (2023) CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. Proceedings of the 52nd International Conference on Parallel Processing, 317-326. https://doi.org/10.1145/3605573.3605578
  • (2023) Performance Optimization using Multimodal Modeling and Heterogeneous GNN. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 45-57. https://doi.org/10.1145/3588195.3592984
  • (2023) Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. https://doi.org/10.1145/3581784.3607098
  • (2023) Transfer-learning-based Autotuning using Gaussian Copula. Proceedings of the 37th ACM International Conference on Supercomputing, 37-49. https://doi.org/10.1145/3577193.3593712
  • (2023) Power Constrained Autotuning using Graph Neural Networks. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 535-545. https://doi.org/10.1109/IPDPS54959.2023.00060
  • (2023) An Analysis of Energy Requirement for Computer Vision Algorithms. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 1-7. https://doi.org/10.1109/HPEC58863.2023.10363596
  • (2023) Formal Techniques for Development and Auto-tuning of Parallel Programs. SN Computer Science, 4(2). https://doi.org/10.1007/s42979-022-01559-2
