research-article
Open access

Hardware-Accelerated Cross-Architecture Full-System Virtualization

Published: 25 October 2016
    Abstract

    Hardware virtualization solutions provide users with benefits ranging from application isolation through server consolidation to improved disaster recovery and faster server provisioning. While hardware assistance for virtualization is supported by all major processor architectures, including Intel, ARM, PowerPC, and MIPS, these extensions target virtualization of the same architecture, for example, an x86 guest on an x86 host system. Existing techniques for cross-architecture virtualization, for example, an ARM guest on an x86 host, still incur substantial overhead for CPU, memory, and I/O virtualization because these mismatched system components must be emulated in software. In this article, we present a new hardware-accelerated hypervisor called Captive, which employs a range of novel techniques that exploit existing hardware virtualization extensions to improve the performance of full-system cross-architecture virtualization. We illustrate how (1) guest memory management unit (MMU) events and operations can be mapped onto host memory virtualization extensions, eliminating the need for costly software MMU emulation, (2) a block-based dynamic binary translation engine inside the virtual machine can improve CPU virtualization performance, (3) memory-mapped guest I/O can be efficiently translated to fast I/O-specific calls to emulated devices, and (4) the cost of asynchronous guest interrupts can be reduced. For an ARM-based Linux guest system running on an x86 host with Intel VT support, we demonstrate application speedups on SPEC CPU2006 benchmarks of up to 5.88× over state-of-the-art QEMU, and 2.5× on average, achieving a guest dynamic instruction throughput of up to 1280 MIPS (million instructions per second), and 915.52 MIPS on average.

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 4
    December 2016
    648 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3012405
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 October 2016
    Accepted: 01 September 2016
    Revised: 01 August 2016
    Received: 01 May 2016
    Published in TACO Volume 13, Issue 4

    Author Tag

    1. Virtualization

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) An Instruction Inflation Analyzing Framework for Dynamic Binary Translators. ACM Transactions on Architecture and Code Optimization 21:2 (1-25). DOI: 10.1145/3640813. Online publication date: 15-Jan-2024
    • (2024) A System-Level Dynamic Binary Translator Using Automatically-Learned Translation Rules. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (423-434). DOI: 10.1109/CGO57630.2024.10444850. Online publication date: 2-Mar-2024
    • (2023) On-Demand Triggered Memory Management Unit in Dynamic Binary Translator. Advanced Parallel Processing Technologies (297-309). DOI: 10.1007/978-981-99-7872-4_17. Online publication date: 4-Aug-2023
    • (2022) Eliminate the overhead of interrupt checking in full-system dynamic binary translator. Proceedings of the 15th ACM International Conference on Systems and Storage (1-12). DOI: 10.1145/3534056.3534939. Online publication date: 6-Jun-2022
    • (2022) CrossDBT: An LLVM-Based User-Level Dynamic Binary Translation Emulator. Euro-Par 2022: Parallel Processing (3-18). DOI: 10.1007/978-3-031-12597-3_1. Online publication date: 22-Aug-2022
    • (2021) BTMMU: an efficient and versatile cross-ISA memory virtualization. Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (71-83). DOI: 10.1145/3453933.3454015. Online publication date: 7-Apr-2021
    • (2020) A Retargetable System-level DBT Hypervisor. ACM Transactions on Computer Systems 36:4 (1-24). DOI: 10.1145/3386161. Online publication date: 30-May-2020
    • (2019) A retargetable system-level DBT hypervisor. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (505-520). DOI: 10.5555/3358807.3358850. Online publication date: 10-Jul-2019
    • (2019) Cross-ISA machine instrumentation using fast and scalable dynamic binary translation. Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (74-87). DOI: 10.1145/3313808.3313811. Online publication date: 14-Apr-2019
    • (2017) Optimizing Memory Access Performance Using Hardware Assisted Virtualization in Retargetable Dynamic Binary Translation. 2017 Euromicro Conference on Digital System Design (DSD) (40-46). DOI: 10.1109/DSD.2017.41. Online publication date: Aug-2017
