DOI: 10.5555/2755753.2757098
research-article

Improving SIMD code generation in QEMU

Published: 09 March 2015

Abstract

Modern processors are often enhanced with SIMD instructions, such as the MMX, SSE, and AVX instruction sets in the x86 architecture, or the NEON instruction set in the ARM architecture. Using these SIMD instructions can significantly increase application performance, so a significant proportion of the instructions in application binaries are likely to be SIMD instructions. However, Dynamic Binary Translation (DBT) has largely overlooked SIMD instruction translation. For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machine has SIMD instructions that support such parallel computation, leaving significant potential for performance enhancement. In this paper, we propose two approaches to enhance the performance of SIMD instruction translation in QEMU's DBT: one leverages the existing helper function implementation in QEMU, and the other uses a newly introduced vector IR (Intermediate Representation). Both approaches were implemented in QEMU to support ARM and IA32 frontends and an x86-64 backend. Preliminary experiments show that adding vector IR can significantly enhance the performance of guest applications containing SIMD instructions for both ARM and IA32 architectures when running with QEMU on the x86-64 platform.
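The gap the abstract describes can be illustrated with a minimal C sketch. It is not QEMU's helper code or the paper's vector IR; the type `guest_vec128` and the functions `vec_add32_scalar` and `vec_add32_simd` are hypothetical names introduced here. The sketch assumes a guest instruction that adds four 32-bit lanes: the first function emulates it with one scalar add per lane, while the second uses a single SSE2 packed add when the host compiler targets SSE2, and otherwise falls back to the scalar loop.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; baseline on x86-64 hosts */
#endif

/* Hypothetical 128-bit guest vector register: four 32-bit lanes. */
typedef struct {
    uint32_t lane[4];
} guest_vec128;

/* Scalar emulation: one host add per guest lane (wraps modulo 2^32). */
static inline void vec_add32_scalar(guest_vec128 *d,
                                    const guest_vec128 *a,
                                    const guest_vec128 *b)
{
    for (int i = 0; i < 4; i++) {
        d->lane[i] = a->lane[i] + b->lane[i];
    }
}

/* Host-SIMD version: one packed add covers all four lanes. */
static inline void vec_add32_simd(guest_vec128 *d,
                                  const guest_vec128 *a,
                                  const guest_vec128 *b)
{
#if defined(__SSE2__)
    /* Unaligned loads/stores, since guest state need not be 16-byte aligned. */
    __m128i va = _mm_loadu_si128((const __m128i *)a->lane);
    __m128i vb = _mm_loadu_si128((const __m128i *)b->lane);
    _mm_storeu_si128((__m128i *)d->lane, _mm_add_epi32(va, vb));
#else
    /* No SSE2 on this host: fall back to the scalar loop. */
    vec_add32_scalar(d, a, b);
#endif
}
```

Both functions produce the same lane values; the difference is only in how many host instructions the operation is likely to take, which is the kind of overhead the abstract says scalar emulation leaves on the table.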

Cited By

  • (2019) Cross-ISA execution of SIMD regions for improved performance. Proceedings of the 12th ACM International Conference on Systems and Storage, pp. 55-67. DOI: 10.1145/3319647.3325832. Online publication date: 22-May-2019
  • (2018) Fast and Portable Vector DSP Simulation Through Automatic Vectorization. Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems, pp. 47-53. DOI: 10.1145/3207719.3207720. Online publication date: 28-May-2018
  • (2017) Case of ARM emulation optimization for offloading mechanisms in Mobile Cloud Computing. Future Generation Computer Systems 76:C, pp. 407-417. DOI: 10.1016/j.future.2016.05.037. Online publication date: 1-Nov-2017
  • (2016) Towards native code offloading based MCC frameworks for multimedia applications. Journal of Network and Computer Applications 75:C, pp. 335-354. DOI: 10.1016/j.jnca.2016.08.021. Online publication date: 1-Nov-2016

Published In

DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition
March 2015
1827 pages
ISBN: 9783981537048

Publisher

EDA Consortium

San Jose, CA, United States

Conference

DATE '15
Sponsor:
  • EDAA
  • EDAC
  • SIGDA
  • Russian Academy of Sciences
DATE '15: Design, Automation and Test in Europe
March 9 - 13, 2015
Grenoble, France

Acceptance Rates

DATE '15 paper acceptance rate: 206 of 915 submissions (23%)
Overall acceptance rate: 518 of 1,794 submissions (29%)
