
Multi-objective Exploration for Practical Optimization Decisions in Binary Translation

Published: 07 October 2019

Abstract

    In the design of mobile systems, hardware/software (HW/SW) co-design offers important advantages by creating specialized hardware for performance or power optimization. Dynamic binary translation (DBT) is a key component of co-design. During translation, a dynamic optimizer in the DBT system applies various software optimizations to improve the quality of the translated code. In dynamic optimization, optimization time is an exposed run-time overhead, and useful analyses are often restricted because of their high cost. A dynamic optimizer therefore has to make smart decisions with limited analysis information, which complicates the design of optimization decision models and often causes hand-written heuristics to fail. In mobile systems, the problem is even more challenging because of strict constraints on computing capability and memory size.
    To overcome this challenge, we investigate the opportunity to build practical optimization decision models for DBT using machine learning techniques. As a first step, we choose loop unrolling as the representative optimization. We base our approach on an industrial-strength DBT infrastructure and evaluate it with 17,116 unrollable loops collected from 200 benchmarks and real-life programs across various domains. Using all available features that are potentially important for the loop unrolling decision, we identify the best classification algorithm for our infrastructure, considering both prediction accuracy and cost. A greedy feature selection algorithm is then applied to that classifier to identify its significant features and reduce the feature space. Keeping only the significant features, the best affordable classifier, which satisfies the budgets allocated to the decision process, achieves 74.5% prediction accuracy for the optimal unroll factor and realizes an average 20.9% reduction in dynamic instruction count during steady-state execution of the translated code. For comparison, the best baseline heuristic achieves 46.0% prediction accuracy with an average 13.6% instruction count reduction. Given that the infrastructure is already highly optimized and that the ideal upper bound for instruction count reduction is observed at 23.8%, we believe this result is noteworthy.
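
    The greedy feature selection step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier, the feature names (trip_count, body_size, noise), the per-feature costs, the budget, and the toy loops are all hypothetical, and accuracy is measured on a tiny held-out set purely for illustration. The sketch adds one feature at a time, always choosing the feature that most improves accuracy while keeping the total feature cost within the budget.

```python
from typing import Dict, List, Tuple

# A sample is (feature values, best unroll factor). All data here is made up.
Sample = Tuple[Dict[str, float], int]


def nearest_centroid_accuracy(train: List[Sample], test: List[Sample],
                              features: List[str]) -> float:
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    if not features:
        return 0.0
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x: Dict[str, float]) -> int:
        # Pick the class whose centroid is closest (squared Euclidean distance).
        return min(centroids, key=lambda y: sum(
            (x[f] - c) ** 2 for f, c in zip(features, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)


def greedy_select(train: List[Sample], test: List[Sample],
                  costs: Dict[str, float],
                  budget: float) -> Tuple[List[str], float]:
    """Greedy forward selection: add the most helpful affordable feature
    until no remaining feature strictly improves accuracy."""
    selected: List[str] = []
    best_acc, spent = 0.0, 0.0
    while True:
        best_candidate = None
        for f in costs:
            if f in selected or spent + costs[f] > budget:
                continue  # already chosen, or would exceed the cost budget
            acc = nearest_centroid_accuracy(train, test, selected + [f])
            if acc > best_acc:
                best_acc, best_candidate = acc, f
        if best_candidate is None:
            return selected, best_acc
        selected.append(best_candidate)
        spent += costs[best_candidate]


# Hypothetical loops: short trip counts prefer factor 1, long ones factor 4.
train = [
    ({"trip_count": 2, "body_size": 30, "noise": 5}, 1),
    ({"trip_count": 3, "body_size": 28, "noise": 1}, 1),
    ({"trip_count": 64, "body_size": 6, "noise": 4}, 4),
    ({"trip_count": 80, "body_size": 8, "noise": 2}, 4),
]
test = [
    ({"trip_count": 4, "body_size": 25, "noise": 2}, 1),
    ({"trip_count": 70, "body_size": 7, "noise": 5}, 4),
]
costs = {"trip_count": 1.0, "body_size": 2.0, "noise": 0.5}  # assumed costs

selected, accuracy = greedy_select(train, test, costs, budget=2.0)
print(selected, accuracy)
```

    In this toy setup the loop stops as soon as no affordable feature strictly improves accuracy, which is one simple way to keep the feature space small under a fixed decision budget; other stopping rules are equally plausible.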


    Cited By

    • (2022) SRTuner. Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, 118-130. DOI: 10.1109/CGO53902.2022.9741263. Online publication date: 2-Apr-2022
    • (2022) An energy efficient multi-target binary translator for instruction and data level parallelism exploitation. Design Automation for Embedded Systems 26:1, 55-82. DOI: 10.1007/s10617-021-09258-6. Online publication date: 14-Jan-2022

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
    Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
    October 2019
    1423 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3365919
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 October 2019
    Accepted: 01 July 2019
    Revised: 01 June 2019
    Received: 01 April 2019
    Published in TECS Volume 18, Issue 5s

    Author Tag

    1. Loop unrolling

    Qualifiers

    • Research-article
    • Research
    • Refereed
