DOI: 10.1145/3302516.3307358
research-article
Public Access

Codestitcher: inter-procedural basic block layout optimization

Published: 16 February 2019

Abstract

Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as compilers, browsers, interpreters, language VMs and shared libraries.
This paper presents Codestitcher, an inter-procedural basic block code layout optimizer that reorders basic blocks in an executable to improve cache and TLB performance. Codestitcher provides a hierarchical framework that can be used to improve locality at various layers of the memory hierarchy. Our evaluation shows that Codestitcher improves the performance of the original program (already optimized with -O3 and link-time optimization) by 3% to 25% (10% on average) on 5 widely used applications with large code sizes: MySQL, Clang, Firefox, PHP server, and Python. It gives an additional improvement of 4% over LLVM's PGO and 3% over PGO combined with the best function reordering technique. For profiling, Codestitcher does not need instrumentation. Instead, it uses branch history samples collected during the execution of the original program. Codestitcher's profiling and trace processing together incur an average overhead of 22.5%, compared with an average overhead of 90% for LLVM's PGO.
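The abstract does not spell out Codestitcher's layout algorithm, so the sketch below only illustrates the general idea of profile-driven, inter-procedural basic block layout. It sums sampled branch records into edge weights, then greedily chains blocks by descending edge weight, in the spirit of Pettis and Hansen's classic code positioning heuristic, while letting chains cross function boundaries. It is not Codestitcher's own (hierarchical) method. The helpers `edge_weights` and `greedy_layout`, the block names, and the assumption that samples arrive already decoded into (source block, target block) pairs are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical decoded sample entry: (source basic block, target basic block).
Edge = Tuple[str, str]


def edge_weights(samples: Iterable[List[Edge]]) -> Counter:
    """Sum every sampled control transfer into a per-edge count."""
    counts: Counter = Counter()
    for sample in samples:
        counts.update(sample)
    return counts


def greedy_layout(blocks: List[str], weights: Counter) -> List[str]:
    """Greedy block chaining (Pettis-Hansen style), ignoring function boundaries."""
    # Every block starts as its own chain; chain ids are original block indices.
    chain_of: Dict[str, int] = {b: i for i, b in enumerate(blocks)}
    chains: Dict[int, List[str]] = {i: [b] for i, b in enumerate(blocks)}

    # Heaviest edges first; ties broken by edge name so the result is deterministic.
    for (src, dst), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if src not in chain_of or dst not in chain_of:
            continue
        a, b = chain_of[src], chain_of[dst]
        # Join only the tail of one chain to the head of a different chain,
        # so that src falls through directly into dst in the final layout.
        if a == b or chains[a][-1] != src or chains[b][0] != dst:
            continue
        chains[a].extend(chains[b])
        for blk in chains[b]:
            chain_of[blk] = a
        del chains[b]

    # Chain "heat" = total weight of sampled edges touching the chain.
    heat: Counter = Counter()
    for (src, dst), w in weights.items():
        for blk in (src, dst):
            if blk in chain_of:
                heat[chain_of[blk]] += w
    # Hot chains first, never-sampled chains last.
    order = sorted(chains, key=lambda cid: (-heat[cid], cid))
    return [blk for cid in order for blk in chains[cid]]


# Usage: main calls f from its loop; "main.cold" never shows up in the samples.
blocks = ["main.entry", "main.cold", "main.loop", "main.exit", "f.entry", "f.ret"]
samples = [
    [("main.entry", "main.loop"), ("main.loop", "f.entry"), ("f.entry", "f.ret")],
    [("main.loop", "f.entry"), ("f.entry", "f.ret"), ("f.ret", "main.loop")],
    [("main.loop", "main.exit")],
]
print(greedy_layout(blocks, edge_weights(samples)))
# ['main.entry', 'main.loop', 'f.entry', 'f.ret', 'main.exit', 'main.cold']
```

The callee blocks `f.entry` and `f.ret` are placed immediately after the call site in `main.loop`. The sampled hot path through both functions therefore occupies contiguous addresses, which is the kind of cross-function locality that a purely intra-procedural or function-level reordering cannot achieve. The unsampled `main.cold` block is pushed to the end.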

    Published In

    ACM Other conferences
    CC 2019: Proceedings of the 28th International Conference on Compiler Construction
    February 2019
    204 pages
    ISBN:9781450362771
    DOI:10.1145/3302516
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Code layout
    2. Compilers
    3. Inter-procedural layout
    4. Locality
    5. Optimization
    6. Profiling

    Qualifiers

    • Research-article

    Conference

    CC '19

    Article Metrics

    • Downloads (last 12 months): 215
    • Downloads (last 6 weeks): 40
    Reflects downloads up to 15 Oct 2024.

    Cited By

    • (2024) Reordering Functions in Mobiles Apps for Reduced Size and Faster Start-Up. ACM Transactions on Embedded Computing Systems 23:4, 1-54. DOI: 10.1145/3660635. Online publication date: 20-Apr-2024.
    • (2024) Stale Profile Matching. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 162-173. DOI: 10.1145/3640537.3641573. Online publication date: 17-Feb-2024.
    • (2023) Optimizing Function Layout for Mobile Applications. Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, 52-63. DOI: 10.1145/3589610.3596277. Online publication date: 13-Jun-2023.
    • (2023) GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 282-301. DOI: 10.1145/3582016.3582029. Online publication date: 25-Mar-2023.
    • (2023) HyBF: A Hybrid Branch Fusion Strategy for Code Size Reduction. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 156-167. DOI: 10.1145/3578360.3580267. Online publication date: 17-Feb-2023.
    • (2023) Propeller: A Profile Guided, Relinking Optimizer for Warehouse-Scale Applications. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 617-631. DOI: 10.1145/3575693.3575727. Online publication date: 27-Jan-2023.
    • (2023) An Offline Profile-Guided Optimization Strategy for Function Reordering on Relational Databases. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 967-972. DOI: 10.1109/SMC53992.2023.10394026. Online publication date: 1-Oct-2023.
    • (2023) Maximizing the Security Level of Real-Time Software While Preserving Temporal Constraints. IEEE Access 11, 35591-35607. DOI: 10.1109/ACCESS.2023.3264671. Online publication date: 2023.
    • (2022) One Profile Fits All. ACM SIGOPS Operating Systems Review 56:1, 26-33. DOI: 10.1145/3544497.3544502. Online publication date: 14-Jun-2022.
    • (2022) CRISP: critical slice prefetching. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 300-313. DOI: 10.1145/3503222.3507745. Online publication date: 28-Feb-2022.