Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Compound Memory Models

Published: 06 June 2023 Publication History

Abstract

Today's mobile, desktop, and server processors are heterogeneous, consisting not only of CPUs but also GPUs and other accelerators. Such heterogeneous processors are starting to expose a shared memory interface across these devices.Given that each of these individual devices typically supports a distinct instruction set architecture and a distinct memory consistency model, it is not clear what the memory consistency model of the heterogeneous machine should be. In this paper, we answer this question by formalizing "compound" memory models: we present a compositional operational model describing the resulting model when devices with distinct consistency models are fused together. We instantiate our model with the compound x86TSO/PTX model -- a CPU enforcing x86TSO and a GPU enforcing the PTX model. A key result is that the x86TSO/PTX compound model retains compiler mappings from the language-based (scoped) C memory model. This means that threads mapped to the x86TSO device can continue to use the already proven C-to-x86TSO compiler mapping, and the same for PTX.

Supplementary Material

Auxiliary Archive (pldi23main-p284-p-archive.zip)
This is the technical supplement (appendix) to our submission.

References

[1]
Jade Alglave, Will Deacon, Richard Grisenthwaite, Antoine Hacquard, and Luc Maranget. 2021. Armed Cats: Formal Concurrency Modelling at Arm. ACM Trans. Program. Lang. Syst., 43, 2 (2021), Article 8, 54 pages. https://doi.org/10.1145/3458926
[2]
Jade Alglave and Luc Maranget. 2022. herd7 consistency model simulator. http://diy.inria.fr/www/
[3]
Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding cats: modelling, simulation, testing, and data-mining for weak memory. ACM Trans. Program. Lang. Syst., 36, 2 (2014), 7:1–7:74. https://doi.org/10.1145/2627752
[4]
AMD. 2022. AMD Instinct™ MI200 Series Accelerator and Node Architectures. https://chipsandcheese.com/2022/09/18/hot-chips-34-amds-instinct-mi200-architecture/ Accessed: 9th September 2022
[5]
AMD. 2022. AMD ROCm Memory model. https://rocmdocs.amd.com/en/latest/ROCm_Compiler_SDK/ROCm-Codeobj-format.html##memory-model Accessed: 8th August 2022
[6]
ARM. 2011. Atomicity in the ARM Architecture. https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Application-Level-Memory-Model/Memory-types-and-attributes-and-the-memory-order-model/Atomicity-in-the-ARM-architecture Accessed: 20th March 2023
[7]
2018. ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile. Initial v8.4 EAC release
[8]
ARM. 2021. The AMBA CHI Specification. https://developer.arm.com/architectures/system-architectures/amba/amba-5 Accessed: 5th July 2022
[9]
Mark Batty. 2017. Compositional relaxed concurrency. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 375, 2104 (2017), September, https://kar.kent.ac.uk/64300/
[10]
Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ concurrency. In POPL’11. ACM, 55–66. https://doi.org/10.1145/1926385.1926394
[11]
Soham Chakraborty and Viktor Vafeiadis. 2017. Formalizing the Concurrency Semantics of an LLVM Fragment. In CGO ’17. IEEE, 100–110.
[12]
Soham Chakraborty and Viktor Vafeiadis. 2019. Grounding Thin-Air Reads with Event Structures. 3, POPL (2019), https://doi.org/10.1145/3290383
[13]
CXL. 2022. Compute Express Link. https://www.computeexpresslink.org/ Accessed: 5th July 2022
[14]
Leonardo de Moura and Sebastian Ullrich. 2021. The Lean 4 Theorem Prover and Programming Language. In International Conference on Automated Deduction. 625–635.
[15]
Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. 2016. Modelling the ARMv8 architecture, operationally: concurrency and ISA. In POPL 2016. 608–621.
[16]
Shaked Flur, Susmit Sarkar, Christopher Pulte, Kyndylan Nienhuis, Luc Maranget, Kathryn E. Gray, Ali Sezgin, Mark Batty, and Peter Sewell. 2017. Mixed-size concurrency: ARM, POWER, C/C++11, and SC. In POPL’17.
[17]
Andrés Goens, Soham Chakraborty, Susmit Sarkar, Sukarn Agarwal, Nicolai Oswald, and Vijay Nagarajan. 2023. Compound Memory Models: Artifact. Dataset on Zenodo. https://doi.org/10.5281/zenodo.7798646
[18]
Derek R. Hower, Blake A. Hechtman, Bradford M. Beckmann, Benedict R. Gaster, Mark D. Hill, Steven K. Reinhardt, and David A. Wood. 2014. Heterogeneous-race-free Memory Models. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems.
[19]
HSA Foundation. 2012. Heterogeneous System Architecture: A Technical Review.
[20]
Dan Iorga, Alastair F. Donaldson, Tyler Sorensen, and John Wickerson. 2021. The semantics of shared memory in Intel CPU/FPGA systems. Proc. ACM Program. Lang., 5, OOPSLA (2021), 1–28. https://doi.org/10.1145/3485497
[21]
ISO/IEC 14882. 2011. Programming Language C++.
[22]
ISO/IEC 9899. 2011. Programming Language C.
[23]
Daniel Jackson. 2012. Software Abstractions: logic, language, and analysis. MIT press.
[24]
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017. A Promising Semantics for Relaxed-Memory Concurrency. In POPL’17 (POPL 2017). Association for Computing Machinery, New York, NY, USA. 175–189. isbn:9781450346603 https://doi.org/10.1145/3009837.3009850
[25]
Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. 2017. Repairing Sequential Consistency in C/C++11. In PLDI 2017. 618–632. https://doi.org/10.1145/3062341.3062352 Technical Appendix Available at
[26]
Leslie Lamport. 1979. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Trans. Comput., 28, 9 (1979), Sept., 690–691.
[27]
Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jerónimo Castrillón, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Marjan Fariborz, Amin Farmahini Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, Dibakar Gope, Thomas Grass, Bagus Hanindhito, Andreas Hansson, Swapnil Haria, Austin Harris, Timothy Hayes, Adrian Herrera, Matthew Horsnell, Syed Ali Raza Jafri, Radhika Jagtap, Hanhwi Jang, Reiley Jeyapaul, Timothy M. Jones, Matthias Jung, Subash Kannoth, Hamidreza Khaleghzadeh, Yuetsu Kodama, Tushar Krishna, Tommaso Marinelli, Christian Menard, Andrea Mondelli, Tiago Mück, Omar Naji, Krishnendra Nathella, Hoa Nguyen, Nikos Nikoleris, Lena E. Olson, Marc S. Orr, Binh Pham, Pablo Prieto, Trivikram Reddy, Alec Roelke, Mahyar Samani, Andreas Sandberg, Javier Setoain, Boris Shingarov, Matthew D. Sinclair, Tuan Ta, Rahul Thakur, Giacomo Travaglini, Michael Upton, Nilay Vaish, Ilias Vougioukas, Zhengrong Wang, Norbert Wehn, Christian Weis, David A. Wood, Hongil Yoon, and Éder F. Zulian. 2020. The gem5 Simulator: Version 20.0+. CoRR, abs/2007.03152 (2020), arXiv:2007.03152. arxiv:2007.03152
[28]
Daniel Lustig, Sameer Sahasrabuddhe, and Olivier Giroux. 2019. A Formal Analysis of the NVIDIA PTX Memory Consistency Model. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April 13-17, 2019, Iris Bahar, Maurice Herlihy, Emmett Witchel, and Alvin R. Lebeck (Eds.). ACM, 257–270. https://doi.org/10.1145/3297858.3304043
[29]
Paul E. McKenney. 2017. Is Parallel Programming Hard, And, If So, What Can You Do About It? (v2017.01.02a). CoRR, abs/1701.00854 (2017), arXiv:1701.00854. arxiv:1701.00854
[30]
Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2020. A Primer on Memory Consistency and Cache Coherence, Second Edition. Morgan & Claypool Publishers. https://doi.org/10.2200/S00962ED2V01Y201910CAC049
[31]
NVIDIA. 2019. CUDA Toolkit Documentation - PTX ISA.
[32]
NVIDIA. 2022. NVIDIA debuts Grace CPU Superschip. https://nvidianews.nvidia.com/news/nvidia-introduces-grace-cpu-superchip Accessed: 22nd August 2022
[33]
Nicolai Oswald, Vijay Nagarajan, Daniel J. Sorin, Vasilis Gavrielatos, Theo Olausson, and Reece Carr. 2022. HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols. In IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022, Seoul, South Korea, April 2-6, 2022. IEEE, 756–771. https://doi.org/10.1109/HPCA53966.2022.00061
[34]
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A Better x86 Memory Model: x86-TSO. In TPHOLs. 391–407. https://doi.org/10.1007/978-3-642-03359-9_27
[35]
Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. 2019. Bridging the Gap between Programming Languages and Hardware Weak Memory Models. Proc. ACM Program. Lang., 3, POPL (2019), https://doi.org/10.1145/3290382
[36]
Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty, Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams. 2012. Synchronising C/C++ and POWER. In PLDI’12. ACM, 311–322. https://doi.org/10.1145/2254064.2254102
[37]
Peter Sewell. 2022. C/C++11 mappings to processors. https://www.cl.cam.ac.uk/ pes20/cpp/cpp0xmappings.html Accessed April 5, 2023
[38]
Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. 2010. X86-TSO: A Rigorous and Usable Programmer’s Model for X86 Multiprocessors. Commun. ACM, 53, 7 (2010), July, 89–97. issn:0001-0782 https://doi.org/10.1145/1785414.1785443
[39]
The OpenCAPI Consortium. 2021. The OpenCAPI Consortium. https://opencapi.org/ Accessed: 5th July 2022
[40]
Andrew Waterman and Krste Asanovic. 2019. The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA. https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
[41]
Sizhuo Zhang, Muralidaran Vijayaraghavan, and Arvind. 2017. Weak Memory Models: Balancing Definitional Simplicity and Implementation Flexibility. In 26th International Conference on Parallel Architectures and Compilation Techniques, PACT 2017, Portland, OR, USA, September 9-13, 2017. IEEE Computer Society, 288–302. https://doi.org/10.1109/PACT.2017.29

Cited By

View all
  • (2024)Extending the C/C++ Memory Model with Inline AssemblyProceedings of the ACM on Programming Languages10.1145/36897498:OOPSLA2(1081-1107)Online publication date: 8-Oct-2024
  • (2024)UNIFICO: Thread Migration in Heterogeneous-ISA CPUs without State TransformationProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641565(86-99)Online publication date: 17-Feb-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages  Volume 7, Issue PLDI
June 2023
2020 pages
EISSN:2475-1421
DOI:10.1145/3554310
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2023
Published in PACMPL Volume 7, Issue PLDI

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. coherence protocols
  2. compound memory models
  3. consistency models

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)550
  • Downloads (Last 6 weeks)66
Reflects downloads up to 13 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Extending the C/C++ Memory Model with Inline AssemblyProceedings of the ACM on Programming Languages10.1145/36897498:OOPSLA2(1081-1107)Online publication date: 8-Oct-2024
  • (2024)UNIFICO: Thread Migration in Heterogeneous-ISA CPUs without State TransformationProceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction10.1145/3640537.3641565(86-99)Online publication date: 17-Feb-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media