research-article

The janus triad: exploiting parallelism through dynamic binary modification

Published: 14 April 2019

    Abstract

    We present a unified approach for exploiting thread-level, data-level, and memory-level parallelism through a same-ISA dynamic binary modifier guided by static binary analysis. A static binary analyser first examines an executable and determines the operations required to extract parallelism at runtime, encoding them as a series of rewrite rules that a dynamic binary modifier uses to perform binary transformation. We demonstrate this framework by exploiting three different kinds of parallelism to perform automatic vectorisation, software prefetching, and automatic parallelisation together on legacy application binaries. Software prefetch insertion alone achieves an average speedup of 1.2x, comparing favourably with an automatic compiler pass. Automatic vectorisation brings speedups of 2.7x on the TSVC benchmarks, significantly beating a compiler approach for some workloads. Finally, combining prefetching, vectorisation, and parallelisation realises a speedup of 3.8x on a representative application loop.
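
The split between a static analyser that emits rewrite rules and a dynamic modifier that applies them can be illustrated with a small sketch. The abstract does not describe the actual rule encoding, so everything named here (`RewriteRule`, `build_rule_table`, `modify_block`, the two rule kinds, and the toy instruction strings) is a hypothetical stand-in rather than the authors' format or API. The sketch only shows the general shape: rules are keyed by instruction address, and the runtime consults them while copying a basic block.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical rule kinds; the real rule encoding is not given in the abstract.
INSERT_BEFORE = "insert_before"
REPLACE = "replace"


@dataclass(frozen=True)
class RewriteRule:
    address: int   # address of the instruction the rule attaches to
    kind: str      # INSERT_BEFORE or REPLACE
    payload: str   # instruction text to insert or substitute


def build_rule_table(rules: List[RewriteRule]) -> Dict[int, List[RewriteRule]]:
    """Static side: index the rules by address so the runtime lookup is cheap."""
    table: Dict[int, List[RewriteRule]] = {}
    for rule in rules:
        table.setdefault(rule.address, []).append(rule)
    return table


def modify_block(block: List[Tuple[int, str]],
                 table: Dict[int, List[RewriteRule]]) -> List[str]:
    """Dynamic side: copy one basic block, applying any rules attached to it."""
    out: List[str] = []
    for address, text in block:
        replaced = False
        for rule in table.get(address, []):
            if rule.kind == INSERT_BEFORE:
                out.append(rule.payload)
            elif rule.kind == REPLACE:
                out.append(rule.payload)
                replaced = True
        if not replaced:
            out.append(text)
    return out


# A toy loop body: one load and one pointer increment.
block = [(0x400, "load r1, [r2]"), (0x404, "add r2, 8")]
# One rule of the kind a static analyser might emit for software prefetching.
rules = [RewriteRule(0x400, INSERT_BEFORE, "prefetch [r2 + 64]")]
print(modify_block(block, build_rule_table(rules)))
```

In this arrangement the expensive analysis happens once, ahead of time, and the runtime component only performs table lookups and local edits, which is one plausible reason for separating the two stages.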

    Cited By

    • (2023) VClinic: A Portable and Efficient Framework for Fine-Grained Value Profilers. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 892-904. DOI: 10.1145/3575693.3576934. Online publication date: 27-Jan-2023.
    • (2022) An energy efficient multi-target binary translator for instruction and data level parallelism exploitation. Design Automation for Embedded Systems 26:1, pp. 55-82. DOI: 10.1007/s10617-021-09258-6. Online publication date: 14-Jan-2022.
    • (2021) Cinnamon. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, pp. 103-114. DOI: 10.1109/CGO51591.2021.9370313. Online publication date: 27-Feb-2021.
    • (2019) Exploiting Vector Processing in Dynamic Binary Translation. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337844. Online publication date: 5-Aug-2019.

          Published In

          VEE 2019: Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
          April 2019
          206 pages
          ISBN: 9781450360203
          DOI: 10.1145/3313808
          General Chair: Jennifer Sartor
          Program Chairs: Mayur Naik, Chris Rossbach
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. binary optimisation
          2. binary translation
          3. software prefetch
          4. vectorization

          Qualifiers

          • Research-article

          Funding Sources

          • Engineering and Physical Sciences Research Council

          Conference

          VEE '19

          Acceptance Rates

          Overall Acceptance Rate 80 of 235 submissions, 34%
