DOI: 10.1145/3337821.3337842
research-article

FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation

Published: 05 August 2019

Abstract

The de facto compilation model for production software compiles all modules of a target program with a single set of compilation flags, typically O2 or O3. Such a per-program compilation strategy may yield sub-optimal executables, since programs often have multiple hot loops with diverse code structures. These programs may be better optimized with a per-region compilation model that assembles an optimized executable by combining the best per-region code variants.
In this paper, we demonstrate that a naïve greedy approach to per-region compilation often degrades performance in comparison to the O3 baseline. To overcome this problem, we contribute a novel per-loop compilation framework, FuncyTuner, which employs lightweight profiling to collect per-loop timing information and then uses a space-focusing technique to construct a performant executable. Experimental results show that FuncyTuner can reliably improve the performance of modern scientific applications on several multi-core architectures by 9.2% to 12.3% in comparison to the O3 baseline and by 4.5% to 10.7% in comparison to prior work (geometric mean, up to 22% on certain programs).
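
As a rough illustration of the naïve greedy per-region strategy mentioned in the abstract, the Python sketch below picks, for each loop independently, the flag set with the lowest profiled time. It is a minimal sketch under stated assumptions, not FuncyTuner itself: the loop names, flag-set labels, timings, and function names are all hypothetical, and the space-focusing technique is not modelled.

```python
from typing import Dict

# Hypothetical per-loop timings in seconds, keyed by flag set, then loop name.
# All labels and numbers are made up for illustration; none come from the paper.
TIMINGS: Dict[str, Dict[str, float]] = {
    "O3":        {"loop_a": 4.0,  "loop_b": 3.0,  "loop_c": 2.0},
    "O3-no-vec": {"loop_a": 3.5,  "loop_b": 3.25, "loop_c": 2.25},
    "O3-unroll": {"loop_a": 4.25, "loop_b": 2.5,  "loop_c": 1.75},
}


def greedy_per_loop(timings: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each loop, independently pick the flag set with the lowest time."""
    loops = next(iter(timings.values())).keys()
    return {
        loop: min(timings, key=lambda flags: timings[flags][loop])
        for loop in loops
    }


def predicted_time(timings: Dict[str, Dict[str, float]],
                   choice: Dict[str, str]) -> float:
    """Sum the isolated per-loop times of the chosen variants (an estimate only)."""
    return sum(timings[flags][loop] for loop, flags in choice.items())


choice = greedy_per_loop(TIMINGS)
baseline = sum(TIMINGS["O3"].values())

print("greedy choice:", choice)
print("predicted time:", predicted_time(TIMINGS, choice), "vs O3:", baseline)
```

The predicted total is only an estimate built from per-loop measurements taken in isolation. The abstract reports that executables assembled this way often run slower than the O3 baseline, so any such estimate would need to be checked by timing the combined executable.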

Cited By

  • (2023) CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. Proceedings of the 52nd International Conference on Parallel Processing, 317-326. DOI: 10.1145/3605573.3605578. Online publication date: 7-Aug-2023
  • (2020) A Collaborative Filtering Approach for the Automatic Tuning of Compiler Optimisations. The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 15-25. DOI: 10.1145/3372799.3394361. Online publication date: 16-Jun-2020

Information & Contributors

Information

Published In

ACM Other conferences
ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019
1107 pages
ISBN: 9781450362955
DOI: 10.1145/3337821
© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC
  2. ICC
  3. OpenMP
  4. auto-tuning
  5. compiler
  6. fine-grained
  7. optimization
  8. per-loop
  9. profile
  10. scientific simulation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2019

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
