research-article

If-Convert as Early as You Must

Published: 20 February 2024

Abstract

Optimizing compilers employ a rich set of transformations that generate highly efficient code for a variety of source languages and target architectures. These transformations typically operate on general control flow constructs which trigger a range of optimization opportunities, such as moving code to less frequently executed paths, and more. Regular loop nests are specifically relevant for accelerating certain domains, leveraging architectural features including vector instructions, hardware-controlled loops and data flows, provided their internal control-flow is eliminated. Compilers typically apply predicating if-conversion late, in their backend, to remove control-flow undesired by the target. Until then, transformations triggered by control-flow constructs that are destined to be removed may end up doing more harm than good. We present an approach that leverages the existing powerful and general optimization flow of LLVM when compiling for targets without control-flow in loops. Rather than trying to teach various transformations how to avoid misoptimizing for such targets, we propose to introduce an aggressive if-conversion pass as early as possible, along with carefully addressing pass-ordering implications. This solution outperforms the traditional compilation flow with only a modest tuning effort, thereby offering a robust and promising compilation approach for branch-restricted targets.
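
To make the central term concrete, the following is a minimal C sketch of what if-conversion does to a loop body. The clamp example and the function names `clamp_branchy` and `clamp_if_converted` are illustrative assumptions and do not come from the paper. The paper's pass operates on LLVM IR rather than on source code, and a C compiler is free to lower either form with or without a branch, so this only shows the shape of the transformation.

```c
#include <stddef.h>

/* Branchy form: the loop body contains internal control flow. */
void clamp_branchy(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > limit) {
            a[i] = limit;
        }
    }
}

/* If-converted form (hypothetical hand-written equivalent):
   the condition becomes a predicate value and the guarded
   assignment becomes a select, so the loop body is straight-line. */
void clamp_if_converted(int *a, size_t n, int limit) {
    for (size_t i = 0; i < n; ++i) {
        int p = a[i] > limit;      /* predicate computed as data */
        a[i] = p ? limit : a[i];   /* select; the store is now unconditional */
    }
}
```

Both functions compute the same result. The second has a loop body with no internal branch, which is the form that hardware loops and vector units on branch-restricted targets require.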


Published In

CC 2024: Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction
February 2024
261 pages
ISBN:9798400705076
DOI:10.1145/3640537

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DAE
  2. DSP
  3. Decoupled Access Execute
  4. If-Conversion
  5. Phase-Ordering
  6. Predication
  7. Zero Overhead Loop

Qualifiers

  • Research-article

Conference

CC '24