Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3168811acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Dominance-based duplication simulation (DBDS): code duplication to enable compiler optimizations

Published: 24 February 2018 Publication History

Abstract

Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control-flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time intensive analysis. This imposes a challenge on modern (just-in-time) compilers: Duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and compile time increases. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed.
This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade-off between different success metrics including peak performance, code size and compile time.
We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.
We show that our optimization can reach peak performance improvements of up to 40% with a mean peak performance increase of 5.89%, while it generates a mean code size increase of 9.93% and mean compile time increase of 18.44%.

References

[1]
David F. Bacon, Susan L. Graham, and Oliver J. Sharp. 1994. Compiler Transformations for High-performance Computing. ACM Comput. Surv. 26, 4 (Dec. 1994), 345–420.
[2]
J. Eugene Ball. 1979. Predicting the Effects of Optimization on a Procedure Body. SIGPLAN Not. 14, 8 (Aug. 1979), 214–220.
[3]
S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 169– 190.
[4]
Rastislav Bodík, Rajiv Gupta, and Mary Lou Soffa. 1998. Complete Removal of Redundant Expressions. In Proceedings of the ACM SIG-PLAN Conference on Programming Language Design and Implementation. ACM, 1–14.
[5]
Preston Briggs, Keith D. Cooper, and L. Taylor Simpson. 1997. Value numbering. Software-Practice and Experience 27, 6 (1997), 701–724.
[6]
Stefano Cazzulani. 2012. Octane: The JavaScript benchmark suite for the modern web. Retrieved December 21 (2012), 2015.
[7]
Craig David Chambers. 1992. The Design and Implementation of the Self Compiler, an Optimizing Compiler for Object-oriented Programming Languages. Ph.D. Dissertation. Stanford, CA, USA. UMI Order No. GAX92-21602.
[8]
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. 1991. Using Profile Information to Assist Classic Code Optimizations. Softw. Pract. Exper. 21, 12 (Dec. 1991), 1301–1321.
[9]
Cliff Click and Keith D. Cooper. 1995. Combining Analyses, Combining Optimizations. ACM Trans. Program. Lang. Syst. 17, 2 (March 1995), 16.
[10]
Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. 1991. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Trans. Program. Lang. Syst. 13, 4 (Oct. 1991), 40.
[11]
Gilles Duboscq, Lukas Stadler, Thomas Würthinger, Doug Simon, Christian Wimmer, and Hanspeter Mössenböck. 2013. Graal IR: An Extensible Declarative Intermediate Representation. In Proceedings of the Asia-Pacific Programming Languages and Compilers Workshop.
[12]
Gilles Duboscq, Thomas Würthinger, Lukas Stadler, Christian Wimmer, Doug Simon, and Hanspeter Mössenböck. 2013. An Intermediate Representation for Speculative Optimizations in a Dynamic Compiler. In Proceedings of the ACM Workshop on Virtual Machines and Intermediate Languages.
[13]
Joseph A. Fisher. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Trace Scheduling: A Technique for Global Microcode Compaction, 186–198. http://dl.acm.org/citation. cfm?id=201749.201766
[14]
Michael Frigge, David C. Hoaglin, and Boris Iglewicz. 1989. Some Implementations of the Boxplot. The American Statistician 43, 1 (1989), 50–54. http://www.jstor.org/stable/2685173
[15]
Yoshihiko Futamura. 1999. Partial Evaluation of Computation Process– An Approach to a Compiler-Compiler. Higher-Order and Symbolic Computation 12, 4 (01 Dec 1999), 381–391.
[16]
Google. 2012. V8 JavaScript Engine. (2012). http://code.google.com/p/ v8/
[17]
Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Quellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm, and Daniel M. Lavery. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter The Superblock: An Effective Technique for VLIW and Superscalar Compilation, 234–253. http://dl.acm.org/citation.cfm?id=201749.201774
[18]
Kazuaki Ishizaki, Motohiro Kawahito, Toshiaki Yasue, Hideaki Komatsu, and Toshio Nakatani. 2000. A Study of Devirtualization Techniques for a Java Just-In-Time Compiler. In Proceedings of the 15th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’00). ACM, 294–310.
[19]
David Leopoldseder, Lukas Stadler, Christian Wimmer, and Hanspeter Mössenböck. 2015. Java-to-JavaScript Translation via Structured Control Flow Reconstruction of Compiler IR. In Proceedings of the 11th Symposium on Dynamic Languages (DLS 2015). ACM, New York, NY, USA, 91–103.
[20]
Yau-Tsun Steven Li, Sharad Malik, and Andrew Wolfe. 1999. Performance Estimation of Embedded Software with Instruction Cache Modeling. ACM Trans. Des. Autom. Electron. Syst. 4, 3 (1999), 257–279.
[21]
Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. 2015. The Java Virtual Machine Specification, Java SE 8 Edition. http://docs. oracle.com/javase/specs/jvms/se8/jvms8.pdf
[22]
Scott A. Mahlke, David C. Lin, William Y. Chen, Richard E. Hank, and Roger A. Bringmann. 1995. Instruction-level Parallel Processors. IEEE Computer Society Press, Chapter Effective Compiler Support for Predicated Execution Using the Hyperblock, 161–170. http://dl.acm. org/citation.cfm?id=201749.201763
[23]
Frank Mueller and David B. Whalley. 1992. Avoiding Unconditional Jumps by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 322–330.
[24]
Frank Mueller and David B. Whalley. 1995. Avoiding Conditional Branches by Code Replication. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 56–66.
[25]
OpenJDK 2013. Graal Project. (2013). http://openjdk.java.net/projects/ graal
[26]
OpenJDK 2017. HotSpot Virtual Machine. (2017). http://openjdk.java. net/groups/hotspot/
[27]
Michael Paleczny, Christopher Vick, and Cliff Click. 2001. The Java HotSpot™ Server Compiler. In Proceedings of the Java Virtual Machine Research and Technology Symposium. USENIX, 1–12.
[28]
Aleksandar Prokopec, David Leopoldseder, Gilles Duboscq, and Thomas Würthinger. 2017. Making Collection Operations Optimal with Aggressive JIT Compilation. In Proceedings of the 8th ACM SIG-PLAN International Symposium on Scala (SCALA 2017). ACM, New York, NY, USA, 29–40.
[29]
Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. 2011. Da Capo con Scala: design and analysis of a scala benchmark suite for the java virtual machine. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM Press, 657–676.
[30]
Lukas Stadler, Gilles Duboscq, Hanspeter Mössenböck, Thomas Würthinger, and Doug Simon. 2013. An Experimental Study of the Influence of Dynamic Compiler Optimizations on Scala Performance. In Proceedings of the 4th Workshop on Scala (SCALA ’13). ACM, New York, NY, USA, Article 9, 8 pages.
[31]
Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. 2014. Partial Escape Analysis and Scalar Replacement for Java. In Proceedings of the International Symposium on Code Generation and Optimization. ACM Press, 165–174.
[32]
April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In Proceedings of the 18th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2017). ACM, New York, NY, USA, 1–10.
[33]
David W. Wall. 1991. Limits of Instruction-level Parallelism. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). ACM, New York, NY, USA, 176–188.
[34]
Thomas Würthinger, Christian Wimmer, Christian Humer, Andreas Wöß, Lukas Stadler, Chris Seaton, Gilles Duboscq, Doug Simon, and Matthias Grimmer. 2017. Practical Partial Evaluation for Highperformance Dynamic Language Runtimes. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 662–676.
[35]
Thomas Würthinger, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Doug Simon, and Christian Wimmer. 2012. Self-optimizing AST interpreters. In Proceedings of the 8th symposium on Dynamic languages (DLS ’12). ACM Press, 73–82.

Cited By

View all
  • (2024)Enhancing Performance Through Control-Flow Unmerging and Loop Unrolling on GPUs2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO57630.2024.10444819(106-118)Online publication date: 2-Mar-2024
  • (2024)GraalSP: Polyglot, efficient, and robust machine learning-based static profilerJournal of Systems and Software10.1016/j.jss.2024.112058213(112058)Online publication date: Jul-2024
  • (2023)TASTyTruffle: Just-in-Time Specialization of Parametric PolymorphismProceedings of the ACM on Programming Languages10.1145/36228537:OOPSLA2(1561-1588)Online publication date: 16-Oct-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
CGO 2018: Proceedings of the 2018 International Symposium on Code Generation and Optimization
February 2018
377 pages
ISBN:9781450356176
DOI:10.1145/3179541
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Code Duplication
  2. Compiler Optimizations
  3. Just-In-Time Compilation
  4. Tail Duplication
  5. Virtual Machines

Qualifiers

  • Research-article

Conference

CGO '18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)38
  • Downloads (Last 6 weeks)5
Reflects downloads up to 12 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enhancing Performance Through Control-Flow Unmerging and Loop Unrolling on GPUs2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO57630.2024.10444819(106-118)Online publication date: 2-Mar-2024
  • (2024)GraalSP: Polyglot, efficient, and robust machine learning-based static profilerJournal of Systems and Software10.1016/j.jss.2024.112058213(112058)Online publication date: Jul-2024
  • (2023)TASTyTruffle: Just-in-Time Specialization of Parametric PolymorphismProceedings of the ACM on Programming Languages10.1145/36228537:OOPSLA2(1561-1588)Online publication date: 16-Oct-2023
  • (2023)Diagnosing Compiler Performance by Comparing Optimization DecisionsProceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes10.1145/3617651.3622994(47-61)Online publication date: 19-Oct-2023
  • (2023)Exploiting Partially Context-sensitive Profiles to Improve Performance of Hot CodeACM Transactions on Programming Languages and Systems10.1145/361293745:4(1-64)Online publication date: 13-Sep-2023
  • (2023)Optimization-Aware Compiler-Level Event ProfilingACM Transactions on Programming Languages and Systems10.1145/3591473Online publication date: 10-Apr-2023
  • (2023)A Hotspot-Driven Semi-automated Competitive Analysis Framework for Identifying Compiler Key OptimizationsProceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction10.1145/3578360.3580255(216-227)Online publication date: 17-Feb-2023
  • (2022)Automatically Transforming Arrays to Columnar Storage at Run Time✱Proceedings of the 19th International Conference on Managed Programming Languages and Runtimes10.1145/3546918.3560805(141-143)Online publication date: 14-Sep-2022
  • (2022)Machine-Learning-Based Self-Optimizing Compiler Heuristics✱Proceedings of the 19th International Conference on Managed Programming Languages and Runtimes10.1145/3546918.3546921(98-111)Online publication date: 14-Sep-2022
  • (2022)Automatic Array Transformation to Columnar Storage at Run TimeProceedings of the 19th International Conference on Managed Programming Languages and Runtimes10.1145/3546918.3546919(16-28)Online publication date: 14-Sep-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media