Mixed-size concurrency: ARM, POWER, C/C++11, and SC

S Flur, S Sarkar, C Pulte, K Nienhuis, L Maranget… - ACM SIGPLAN …, 2017 - dl.acm.org
ACM SIGPLAN Notices, 2017 - dl.acm.org
Previous work on the semantics of relaxed shared-memory concurrency has only considered the case in which each load reads the data of exactly one store. In practice, however, multiprocessors support mixed-size accesses, and these are used by systems software and (to some degree) exposed at the C/C++ language level. A semantic foundation for software, therefore, has to address them.
We investigate the mixed-size behaviour of ARMv8 and IBM POWER architectures and implementations: by experiment, by developing semantic models, by testing the correspondence between these, and by discussion with ARM and IBM staff. This turns out to be surprisingly subtle, and on the way we have to revisit the fundamental concepts of coherence and sequential consistency, which change in this setting. In particular, we show that adding a memory barrier between every pair of instructions does not restore sequential consistency. We go on to extend the C/C++11 model to support non-atomic mixed-size memory accesses.
This is a necessary step towards semantics for real-world shared-memory concurrent code, beyond litmus tests.
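As a rough illustration of what "mixed-size" means in the abstract above, the following C sketch performs two one-byte stores and then a single two-byte load that overlaps both, so the load takes its data from two different stores rather than exactly one. It is not taken from the paper: the function name `load16_from_two_byte_stores` and the buffer `cell` are hypothetical, and the example is single-threaded, so it shows only the shape of such an access and none of the relaxed-memory behaviour the paper studies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the paper): two 1-byte stores followed by
 * one 2-byte load that overlaps both of them. */
uint16_t load16_from_two_byte_stores(uint8_t first, uint8_t second)
{
    uint8_t cell[2] = {0, 0};
    uint16_t wide;

    /* The wide load must cover exactly the two bytes written below. */
    assert(sizeof wide == sizeof cell);

    cell[0] = first;   /* store A: writes byte 0 only */
    cell[1] = second;  /* store B: writes byte 1 only */

    /* A single 16-bit read; memcpy sidesteps aliasing and alignment rules.
     * Each byte of the result comes from a different store. */
    memcpy(&wide, cell, sizeof wide);
    return wide;
}
```

Calling `load16_from_two_byte_stores(0x11, 0x22)` yields `0x2211` on a little-endian machine and `0x1122` on a big-endian one. In a concurrent setting the two byte stores could come from different threads, which is the kind of case a model assuming one store per load cannot describe.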