The semantics of shared memory in Intel CPU/FPGA systems

Authors: Dan Iorga, Alastair F. Donaldson, Tyler Sorensen, John Wickerson

Proceedings of the ACM on Programming Languages, Volume 5, Issue OOPSLA

Article No.: 120, Pages 1 - 28

https://doi.org/10.1145/3485497

Published: 15 October 2021

Abstract

Heterogeneous CPU/FPGA devices, in which a CPU and an FPGA can execute together while sharing memory, are becoming popular in several computing sectors. In this paper, we study the shared-memory semantics of these devices, with a view to providing a firm foundation for reasoning about the programs that run on them. Our focus is on Intel platforms that combine an Intel FPGA with a multicore Xeon CPU. We describe the weak-memory behaviours that are allowed (and observable) on these devices when CPU threads and an FPGA thread access common memory locations in a fine-grained manner through multiple channels. Some of these behaviours are familiar from well-studied CPU and GPU concurrency; others are weaker still. We encode these behaviours in two formal memory models: one operational, one axiomatic. We develop executable implementations of both models, using the CBMC bounded model-checking tool for our operational model and the Alloy modelling language for our axiomatic model. Using these, we cross-check our models against each other via a translator that converts Alloy-generated executions into queries for the CBMC model. We also validate our models against actual hardware by translating 583 Alloy-generated executions into litmus tests that we run on CPU/FPGA devices; when doing this, we avoid the prohibitive cost of synthesising a hardware design per litmus test by creating our own 'litmus-test processor' in hardware. We expect that our models will be useful for low-level programmers, compiler writers, and designers of analysis tools. Indeed, as a demonstration of the utility of our work, we use our operational model to reason about a producer/consumer buffer implemented across the CPU and the FPGA. When the buffer uses insufficient synchronisation -- a situation that our model is able to detect -- we observe that its performance improves at the cost of occasional data corruption.
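To make the notion of a litmus test concrete, here is a minimal sketch in Python of the classic message-passing (MP) shape. It is not taken from the paper: the thread names, memory locations, and helper functions are all hypothetical, and the code models only sequential consistency (SC), not the paper's operational or axiomatic models. It enumerates every interleaving of a writer and a reader and collects the reachable register values. The outcome `(1, 0)`, where the reader sees the flag but stale data, is unreachable under SC; a weak memory model is one that may permit outcomes of this kind. Whether a given CPU/FPGA configuration permits it is a question for the paper's models, not for this sketch.

```python
from itertools import combinations

# Hypothetical two-thread message-passing (MP) litmus test.
# Each operation is (kind, location, value_or_register_name).
WRITER = [("write", "data", 1), ("write", "flag", 1)]
READER = [("read", "flag", "r0"), ("read", "data", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one schedule against a single shared memory (the SC assumption)."""
    memory = {"data": 0, "flag": 0}
    registers = {}
    for kind, location, arg in schedule:
        if kind == "write":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return registers["r0"], registers["r1"]


def sc_outcomes():
    """Collect every (r0, r1) result reachable under sequential consistency."""
    return {run(s) for s in interleavings(WRITER, READER)}


if __name__ == "__main__":
    outcomes = sc_outcomes()
    print(sorted(outcomes))
    # (1, 0) would mean the reader saw the flag but stale data.
    print("stale read reachable under SC:", (1, 0) in outcomes)
```

Running a test of this shape on real hardware and observing an outcome outside the SC set is evidence of weak-memory behaviour, which is the kind of observation the hardware validation in the abstract relies on.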

Supplementary Material

Auxiliary Presentation Video (oopsla21main-p99-p-video.mp4)

This is a presentation video for our OOPSLA 2021 paper "The Semantics of Shared Memory in Intel CPU/FPGA Systems". Dan Iorga is the presenter in the video.

