Improving SIMD code generation in QEMU

Authors: Sheng-Yu Fu, Jan-Jan Wu, Wei-Chung Hsu

DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition

Pages 1233 - 1236

Published: 09 March 2015

Abstract

Modern processors are often enhanced with SIMD instructions, such as the MMX, SSE, and AVX instruction sets in the x86 architecture, or the NEON instruction set in the ARM architecture. Using these SIMD instructions can significantly increase application performance; hence a significant proportion of the instructions in application binaries are likely to be SIMD instructions. However, Dynamic Binary Translation (DBT) has largely overlooked SIMD instruction translation. For example, in the popular QEMU system emulator, guest SIMD instructions are often emulated with a sequence of scalar instructions even when the host machine has SIMD instructions that support such parallel computation, leaving significant potential for performance enhancement. In this paper, we propose two approaches to enhance the performance of SIMD instruction translation in QEMU's DBT: one leverages the existing helper function implementation in QEMU, and the other uses a newly introduced vector IR (Intermediate Representation). Both approaches were implemented in QEMU, supporting ARM and IA32 frontends and an x86-64 backend. Preliminary experiments show that adding vector IR can significantly enhance the performance of guest applications containing SIMD instructions for both ARM and IA32 architectures when running with QEMU on the x86-64 platform.
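The gap the abstract describes can be illustrated with a minimal sketch. It is not QEMU code and not the authors' implementation: the `vec128` type and both function names are hypothetical, chosen only to contrast lane-by-lane scalar emulation of a 4 x 32-bit guest vector add with a single packed add on an SSE2-capable host. On hosts without SSE2, the second function simply falls back to the scalar loop.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 128-bit guest vector register with four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec128;

/* Scalar emulation: the guest vector add costs one host add per lane. */
static void vadd_u32_scalar(vec128 *d, const vec128 *a, const vec128 *b)
{
    for (int i = 0; i < 4; i++) {
        d->lane[i] = a->lane[i] + b->lane[i];  /* wraps modulo 2^32 */
    }
}

/* Host-SIMD version: one packed 32-bit add when SSE2 is available. */
static void vadd_u32_simd(vec128 *d, const vec128 *a, const vec128 *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a->lane);
    __m128i vb = _mm_loadu_si128((const __m128i *)b->lane);
    _mm_storeu_si128((__m128i *)d->lane, _mm_add_epi32(va, vb));
#else
    vadd_u32_scalar(d, a, b);  /* portable fallback for non-SSE2 hosts */
#endif
}
```

Both functions compute the same result for any inputs; the difference is only in how many host instructions the add itself requires, which is the kind of saving a SIMD-aware translation path aims for.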

Index Terms

Improving SIMD code generation in QEMU
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Hardware
  1. Hardware validation
    1. Functional verification
      1. Simulation and emulation

Published In

DATE '15: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition

March 2015

1827 pages

ISBN:9783981537048

General Chair: Wolfgang Nebel (OFFIS & University of Oldenburg, DE)

Program Chair: David Atienza (EPFL, CH)

Publisher

EDA Consortium

San Jose, CA, United States

Qualifiers

Research-article

Conference

DATE '15

Sponsor:

EDAA
EDAC
SIGDA
Russian Academy of Sciences

DATE '15: Design, Automation and Test in Europe

March 9 - 13, 2015

Grenoble, France

Acceptance Rates

DATE '15 Paper Acceptance Rate: 206 of 915 submissions, 23%

Overall Acceptance Rate: 518 of 1,794 submissions, 29%
