DOI: 10.1145/3567955.3567962
research-article

Risotto: A Dynamic Binary Translator for Weak Memory Model Architectures

Published: 21 December 2022

Abstract

Dynamic Binary Translation (DBT) is a powerful approach to support cross-architecture emulation of unmodified binaries. However, DBT systems face correctness and performance challenges when emulating concurrent binaries built for a strong memory consistency architecture on a weak one. In particular, we report several translation errors in QEMU when emulating x86 binaries on Arm hosts.
To address these challenges, we propose an end-to-end approach that provides correct and efficient emulation for weak memory model architectures. Our contributions are twofold. First, we formalize the memory model of QEMU’s intermediate representation and use it to propose formally verified mapping schemes that bridge the strong-on-weak memory consistency mismatch. Second, we implement these verified mappings in Risotto, a QEMU-based DBT system that optimizes memory fence placement while ensuring correctness. Risotto further improves emulation performance through cross-architecture dynamic linking of native shared libraries and through fast, correct translation of compare-and-swap operations.
We evaluate Risotto using multi-threaded benchmark suites and real-world applications and show that Risotto improves emulation performance by 6.7% on average over “erroneous” QEMU while ensuring correctness.
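
The strong-on-weak mismatch named in the abstract can be illustrated with the classic message-passing litmus test. The sketch below is not Risotto's formal model or its mapping scheme. It is a deliberately simplified toy in which a "strong" model keeps each thread's program order (as x86-TSO does for store-store and load-load pairs) and a "weak" model may reorder a thread's two independent accesses (as Arm may when no fence or dependency is present). All names (`T0`, `T1`, `outcomes`) are hypothetical.

```python
# Message-passing litmus test.
# Thread 0 writes data, then flag. Thread 1 reads flag, then data.
T0 = [("W", "data", 1), ("W", "flag", 1)]
T1 = [("R", "flag", "r_flag"), ("R", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves the order within each."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    mem = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, x in schedule:
        if kind == "W":
            mem[addr] = x          # x is the value to store
        else:
            regs[x] = mem[addr]    # x is the destination register name
    return (regs["r_flag"], regs["r_data"])


def outcomes(allow_reorder):
    """Collect all (r_flag, r_data) results.

    Toy assumption: with allow_reorder=True, each thread's two accesses
    (to different addresses, no dependency) may execute in either order.
    """
    orders0 = [T0, T0[::-1]] if allow_reorder else [T0]
    orders1 = [T1, T1[::-1]] if allow_reorder else [T1]
    seen = set()
    for o0 in orders0:
        for o1 in orders1:
            for schedule in interleavings(o0, o1):
                seen.add(run(schedule))
    return seen


strong = outcomes(allow_reorder=False)
weak = outcomes(allow_reorder=True)
print("strong:", sorted(strong))
print("weak:  ", sorted(weak))
```

In this toy, the outcome `(1, 0)` (flag observed set while data is still stale) appears only when reordering is allowed. A translator that emits plain host loads and stores for guest code relying on the stronger ordering would expose that outcome, which is why fences, or some other ordering mechanism, have to be placed in the translated code.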


Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1
March 2023
137 pages
ISBN:9781450399159
DOI:10.1145/3567955
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Binary translation
  2. formal verification
  3. memory models

Qualifiers

  • Research-article

Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
