DOI: 10.1145/3453483.3454109

Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models

Published: 18 June 2021

Abstract

As parallel applications become more complex, auto-tuning becomes more desirable, but also more challenging and time-consuming. We propose Bliss, a novel solution for auto-tuning parallel applications that requires no a priori information about the application, domain-specific knowledge, or instrumentation. Bliss demonstrates how to leverage a pool of Bayesian Optimization models to find near-optimal parameter settings 1.64× faster than state-of-the-art approaches.
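To make the approach concrete, below is a minimal, hypothetical sketch of a pool-of-models tuning loop in the spirit of the abstract; it is not the authors' implementation. It assumes scikit-learn and NumPy are available, and `run_app`, the candidate count, and the bandit-style reward rule are illustrative stand-ins.

```python
# Minimal sketch (NOT the Bliss implementation): a pool of lightweight
# surrogate models competes to propose the next parameter setting.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
DIM, BOOTSTRAP, TRIALS = 4, 5, 30

def run_app(x):
    # Hypothetical objective: stands in for one timed run of the
    # application with parameter vector x (lower is better).
    return float(np.sum((x - 0.3) ** 2) + 0.01 * rng.normal())

pool = [
    GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True),
    GaussianProcessRegressor(kernel=RBF(), normalize_y=True),
    RandomForestRegressor(n_estimators=50, random_state=0),
]
scores = np.ones(len(pool))  # bandit-style credit for each model
X, y = [], []

for trial in range(TRIALS):
    if trial < BOOTSTRAP:
        i, x = None, rng.uniform(size=DIM)  # random warm-up samples
    else:
        # Pick a model in proportion to its past success, fit it on all
        # observations so far, and greedily take the predicted-best candidate
        # (a simplification; full BO would use an acquisition function
        # such as expected improvement).
        i = rng.choice(len(pool), p=scores / scores.sum())
        model = pool[i].fit(np.array(X), np.array(y))
        cand = rng.uniform(size=(256, DIM))
        x = cand[np.argmin(model.predict(cand))]
    fx = run_app(x)
    if i is not None and y and fx < min(y):
        scores[i] += 1.0  # reward models whose proposals improve the best run
    X.append(x)
    y.append(fx)

print("best setting:", X[int(np.argmin(y))], "objective:", min(y))
```

The `scores` vector is what separates a pool approach from a single Bayesian Optimization loop: models whose proposals keep improving the best observed run are sampled more often, so the model best suited to the application comes to dominate without any a priori choice.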

Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
ISBN: 9781450383912
DOI: 10.1145/3453483

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Auto-tuning HPC applications
  2. Parameter tuning

Qualifiers

  • Research-article

Conference

PLDI '21

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 607
  • Downloads (Last 6 weeks): 76
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs. IEEE Transactions on Parallel and Distributed Systems, 35(1), 20-33. https://doi.org/10.1109/TPDS.2023.3325630
  • (2024) An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 741-750. https://doi.org/10.1109/IPDPSW63119.2024.00138
  • (2023) BaCO: A Fast and Portable Bayesian Compiler Optimization Framework. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 19-42. https://doi.org/10.1145/3623278.3624770
  • (2023) CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. Proceedings of the 52nd International Conference on Parallel Processing, 317-326. https://doi.org/10.1145/3605573.3605578
  • (2023) Performance Optimization using Multimodal Modeling and Heterogeneous GNN. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 45-57. https://doi.org/10.1145/3588195.3592984
  • (2023) Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-14. https://doi.org/10.1145/3581784.3607098
  • (2023) Transfer-learning-based Autotuning using Gaussian Copula. Proceedings of the 37th ACM International Conference on Supercomputing, 37-49. https://doi.org/10.1145/3577193.3593712
  • (2023) Power Constrained Autotuning using Graph Neural Networks. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 535-545. https://doi.org/10.1109/IPDPS54959.2023.00060
  • (2023) An Analysis of Energy Requirement for Computer Vision Algorithms. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 1-7. https://doi.org/10.1109/HPEC58863.2023.10363596
  • (2023) Formal Techniques for Development and Auto-tuning of Parallel Programs. SN Computer Science, 4(2). https://doi.org/10.1007/s42979-022-01559-2
