research-article

Open access

Compositional Reasoning for Non-multicopy Atomic Architectures

Authors:

Nicholas Coughlin,

Kirsten Winter,

Graeme SmithAuthors Info & Claims

Formal Aspects of Computing, Volume 35, Issue 2

Article No.: 8, Pages 1 - 30

https://doi.org/10.1145/3574137

Published: 23 June 2023 Publication History

PDF eReader

Abstract

Rely/guarantee reasoning provides a compositional approach to reasoning about concurrent programs. However, such reasoning traditionally assumes a sequentially consistent memory model and hence is unsound on modern hardware in the presence of data races. In this article, we present a rely/guarantee-based approach for non-multicopy atomic weak memory models, i.e., where a thread’s stores are not simultaneously propagated to all other threads and hence are not observable by other threads at the same time. Such memory models include those of the earlier versions of the ARM processor as well as the POWER processor.

This article builds on our approach to compositional reasoning for multicopy atomic architectures, i.e., where a thread’s stores are simultaneously propagated to all other threads. In that context, an operational semantics can be based on thread-local instruction reordering. We exploit this to provide an efficient compositional proof technique in which weak memory behaviour can be shown to preserve rely/guarantee reasoning on a sequentially consistent memory model. To achieve this, we introduce a side-condition, reordering interference freedom on each thread, reducing the complexity of weak memory to checks over pairs of reorderable instructions.

In this article, we extend our approach to non-multicopy atomic weak memory models. We utilise the idea of reordering interference freedom between parallel components. This by itself would break compositionality but serves as a vehicle to derive a refined compatibility check between rely and guarantee conditions, which takes into account the effects of propagations of stores that are only partial, i.e., not covering all threads. All aspects of our approach have been encoded and proved sound in Isabelle/HOL.

1 Introduction

Reasoning about concurrent programs with interference over shared resources is a complex task. The interleaving of all thread behaviours leads to an exponential explosion in observable behaviour. Rely/guarantee reasoning [19] is one approach to reduce the complexity of the verification task. It enables reasoning about one thread at a time by considering an abstraction of the thread’s environment given as a rely condition on shared resources. This abstraction is justified by proving that all other threads in the environment guarantee the assumed rely condition. The approach limits the interference between threads to the effects of the rely condition (specified as a relation over states).

Xu et al. [48] show how rely/guarantee reasoning can be used to allow reasoning over individual threads in a concurrent program using Hoare logic [18]. We introduce a similar approach in Reference [46] to allow thread-local reasoning, in the context of information flow security, using weakest precondition calculation [13]. These approaches work equally well for concurrent programs executed on weak memory models under the implicit assumption that the code is data-race-free. This is a reasonable assumption given that most programmers avoid data races due to them leading to unexpected behaviour when the code’s execution is optimised under the weak memory model of the compiler [6, 21] or underlying hardware [3, 10, 38]. However, data races may be introduced inadvertently by programmers, or programmers may introduce data races for efficiency reasons, as seen in non-blocking algorithms [29]. These algorithms appear regularly in the low-level code of operating systems, e.g., seqlock [7] is used routinely in the Linux kernel, and software libraries, e.g., the Michael-Scott queue [28] is used as the basis for Java’s ConcurrentLinkedQueue in java.util.concurrent.

This article defines a proof system for rely/guarantee reasoning that is parameterised by the weak memory model under consideration. In a previous publication [11], we restricted our focus to those memory models that are multicopy atomic, i.e., where a thread’s stores become observable to all other threads at the same time. This includes the memory models of x86-TSO [37], ARMv8 [33], and RISC-V [45] processor architectures, but not POWER [36], older ARM processors [16], nor C11 [6]. As shown by Colvin and Smith [10], multicopy atomic memory models can be captured in terms of instruction reordering. That is, they can be characterised by a reordering relation over pairs of instructions in a thread’s code, indicating when two instructions may execute out of order. This has been validated against the same sets of litmus test used to validate the widely accepted weak memory semantics of Alglave et al. [3].

Consequently, the implications of weak memory can be captured thread-locally, enabling compositional reasoning. However, thread-local reasoning under such a semantics is non-trivial. Instruction reordering introduces interference within a single thread, similar to the effects of interference between concurrent threads and equally hard to reason about. For instance, a thread with n reorderable instructions may have $n!$ behaviours due to possible reordering. To tackle such complexity, we exploit the fact that many of these instructions will not influence the behaviour of others. We reduce the verification burden to a standard rely/guarantee judgement [48], over a sequentially consistent memory model, and a consideration of the pair-wise interference between reorderable instructions in a thread, totalling $n (n - 1) / 2$ pairs given n reorderable instructions. The resulting proof technique has been automated and shown to be sound on both a simple while language and an abstraction of ARMv8 assembly code using Isabelle/HOL [31] (see https://bitbucket.org/wmmif/wmm-rg).

This article extends the work of Reference [11] in that it additionally provides an approach to compositional reasoning for non-multicopy atomic architectures. For non-multicopy atomic processors such as POWER and older versions of ARM, the semantics of Colvin and Smith refers to a storage subsystem to capture each component’s view of the global memory. This view depends on the propagations of writes performed by the hardware. It is this view of a component that provides the point of reference for the rely/guarantee reasoning.

To capture the semantics of propagations that deliver a particular view, one can utilise the notion of instruction reorderings between components. However, reasoning about such reorderings can not be performed thread-locally, and the compositionality of the approach would be lost. Instead, we reason over reorderings of an instruction with behaviours of the rely condition, which abstractly represents the behaviours of the instructions of other components. By lifting the argument of reordering interference freedom between components to the abstract level, compositionality is maintained. We show how this global reordering interference freedom manifests itself in our theory as a specialisation of the compatibility between guarantee and rely conditions that is standard in rely/guarantee reasoning [19].

We begin the article in Section 2 with a formalisation of a basic proof system for rely/guarantee reasoning introduced in Reference [48]. In Section 3, we abstractly introduce reordering semantics for weak memory models and our notion of reordering interference freedom, which suffices to account for the effects of the weak memory model under multicopy atomicity. We also discuss the practical implications of the approach. To take the effects of non-multicopy atomicity into account, Section 4 introduces the additional notion of global reordering interference freedom, which is encoded into the proof system via a refined compatibility check. In Section 5, we present the instantiation of the approach with a simple language and demonstrate reasoning in Section 6 by means of an example. We elaborate on related work in Section 7 and conclude in Section 8.

2 Preliminaries

The language for our framework is purposefully kept abstract so it can be instantiated for different programming languages. It consists of individual instructions $\alpha$, whose executions are atomic, and commands (or programs) c, which are composed of instructions using sequential composition, nondeterministic choice, iteration, and parallel composition. Commands also include the empty program $\epsilon$ denoting termination.

\begin{equation*} c ::= \epsilon \mid \alpha \mid c_1 \, ; \,c_2 \mid c_1 \sqcap c_2 \mid c^* \mid c_1 \! \parallel \!c_2 \end{equation*}

Note that conditional instructions (such as if-then-else and loops) and their evaluation are modelled via silent steps making a nondeterministic choice during the execution of a program (see Section 5).

A configuration of a program is a pair $(c, \sigma)$, consisting of a command c to be executed and state $\sigma$ (a mapping from variables to values) in which it executes. The behaviour of a component, or thread, in a concurrent program can be described via steps the program, including its environment, can perform during execution, each modelled as a relation between the configurations before and after the step. A program step, denoted as $(c,\sigma) \overset{\!\!\scriptscriptstyle ps}{\rightarrow }(c^{\prime },\sigma ^{\prime })$, describes a single step of the component itself and changes the command (i.e., the remainder of the program). A program step may be an action step $(c,\sigma) \overset{\!\!\scriptscriptstyle as}{\rightarrow }(c^{\prime },\sigma ^{\prime }),$ which performs an instruction that also changes the state, or a silent step, $(c,\sigma) \leadsto (c^{\prime },\sigma),$ which does not execute an instruction but makes a choice and thus changes the command only. Hence, $\overset{\!\!\scriptscriptstyle ps}{\rightarrow }\,=\, (\overset{\!\!\scriptscriptstyle as}{\rightarrow }\cup \leadsto)$. An environment step, $(c,\sigma) \overset{\!\!\scriptscriptstyle es}{\rightarrow }(c,\sigma ^{\prime })$, describes a step of the environment (performed by any of the other concurrent components); it may alter the state but not the remainder of the program (of the component).

Program execution is defined via a small-step semantics over the command. Choice or iteration over commands result in multiple executions.

\begin{equation} \begin{array}{l} \alpha \mapsto _{\alpha }\epsilon \\ {[}1ex] c_1\, ; \,c_2 \mapsto _{\alpha }c_1^{\prime }\, ; \,c_2 \text{ if }c_1 \mapsto _{\alpha }c_1^{\prime }\\ {[}1ex] c_1\! \parallel \!c_2 \mapsto _{\alpha }c_1^{\prime }\! \parallel \!c_2 \text{ if } c_1 \mapsto _{\alpha }c_1^{\prime } \text{ or } c_1\! \parallel \!c_2 \mapsto _{\alpha }c_1\! \parallel \!c^{\prime }_2 \text{ if }c_2 \mapsto _{\alpha }c_2^{\prime } \end{array} \end{equation}

(1)

The semantics of program steps is based on the evaluation of instructions. Each atomic instruction $\alpha$ has a relation over (pre- and post-) states $beh(\alpha)$, formalising its execution behaviour. A program step $(c,\sigma) \overset{\!\!\scriptscriptstyle as}{\rightarrow }(c^{\prime },\sigma ^{\prime })$ requires an execution $c \mapsto _{\alpha }c^{\prime }$ to occur such that the state is updated according to the executed instruction $\alpha$, i.e.,

\begin{equation} (c,\sigma) \overset{\!\!\scriptscriptstyle as}{\rightarrow }(c^{\prime }, \sigma ^{\prime }) \iff \exists \alpha .\ c \mapsto _{\alpha }c^{\prime } \wedge (\sigma , \sigma ^{\prime }) \in beh(\alpha). \end{equation}

(2)

2.1 Rely/guarantee Reasoning

A proof system for rely/guarantee reasoning in a Hoare logic style has been defined in Reference [48]. Our approach largely follows its definitions, but includes a customisable verification condition, vc, with each instruction. This verification condition serves to capture the state an instruction must execute under to enforce properties such as the component’s guarantee and potentially more specialised analyses. For example, in an information flow security analysis (cf. Reference [46]), it can be used to check that the value assigned to a publicly accessible variable is not classified. We define a Hoare triple as follows: For simplicity of presentation, we treat predicates as sets of states.

\begin{equation} P \lbrace \alpha \rbrace Q ~\widehat{=}~P \subseteq vc(\alpha) \cap \lbrace \sigma \mid \forall \sigma ^{\prime } . \, (\sigma , \sigma ^{\prime }) \in beh(\alpha) \Rightarrow \sigma ^{\prime } \in Q \rbrace \end{equation}

(3)

Equivalently, the Hoare triple can be expressed as $P \subseteq vc(\alpha) \cap wp(beh(\alpha), Q)$, using the definition of weakest preconditions [13].

The rely and guarantee conditions of a thread, denoted ${\mathcal {R}}$ and ${\mathcal {G}}$, respectively, are relations over (pre- and post-) states. The rely condition captures allowable environments steps and the guarantee constrains all program steps. A rely/guarantee pair $({\mathcal {R}}$, ${\mathcal {G}})$ is well-formed when the rely condition is reflexive and transitive, and the guarantee condition is reflexive.

Given that ${\mathcal {R}}$ is transitive, stability of a predicate P under rely condition ${\mathcal {R}}$ is defined such that ${\mathcal {R}}$ maintains P.

\begin{equation} stable_{\mathcal {R}}(P) ~\widehat{=}~ P \subseteq \lbrace \sigma \mid \forall \sigma ^{\prime } . \, (\sigma , \sigma ^{\prime }) \in {\mathcal {R}}\Rightarrow \sigma ^{\prime } \in P \rbrace \end{equation}

(4)

The conditions under which an instruction satisfies ${\mathcal {G}}$ is defined as

\begin{equation} sat(\alpha , {\mathcal {G}}) ~\widehat{=}~ \lbrace \sigma \mid \, \forall \sigma ^{\prime } . \, (\sigma , \sigma ^{\prime }) \in beh(\alpha) \Rightarrow (\sigma ,\sigma ^{\prime }) \in {\mathcal {G}}\rbrace . \end{equation}

(5)

These ingredients allow us to introduce a rely/guarantee judgement. We do this on three levels: the instruction level $\vdash _{\sf a}$, the component level $\vdash _{\sf c}$, and the global level $\vdash$. On the instruction level the judgement requires that the pre- and postcondition are stable under ${\mathcal {R}}$. This ensures that these conditions, and hence the Hoare triple, hold despite any environmental interference. Additionally, the judgement requires that the instruction satisfies the guarantee ${\mathcal {G}}$.

\begin{equation} \begin{array}{rl} {\mathcal {R}}, {\mathcal {G}}\vdash _{\sf a}P \lbrace \alpha \rbrace Q ~\widehat{=}~ stable_{\mathcal {R}}(P) \wedge stable_{\mathcal {R}}(Q) \wedge vc(\alpha) \subseteq sat(\alpha , {\mathcal {G}}) \wedge P \lbrace \alpha \rbrace Q \end{array} \end{equation}

(6)

A rely/guarantee proof system on the component and global levels follows straightforwardly and is given in Figure 1. At the level of single instructions, the interplay between environment interference (captured by ${\mathcal {R}}$) and the precondition P required to achieve the postcondition Q manifests itself through the stability condition on both predicates in Definition (6). For example, if the second instruction in the code snippet $(sync:=1 \, ; \,x := secret)$ critically depends on the fact that the synchronising variable sync is set to 1, then this precondition is satisfied within this component itself (as the first instruction updates sync accordingly). However, if a parallel thread is able to modify sync at any time (modelled by ${\mathcal {R}}$ not constraining the value of sync in the next state), then this precondition can be invalidated by the environment before the second instruction executes. Whether this is possible or not is checked via the predicate $stable_{\mathcal {R}}(P)$. Correspondingly, we check that the postcondition is not invalidated by environment behaviour. At the component level, note the necessity for the invariant of the [Iteration] rule to be stable (such that it continues to hold amid environmental interference). At the global level, the rule for parallel composition [Par] includes a compatibility check ensuring that the guarantee for each component implies the rely conditions of the other component. A standard [Conseq] rule over global satisfiability is supported by the proof system, but omitted in Figure 1.

Fig. 1.

Such rules are standard to rely/guarantee reasoning [48]. Our modification can be seen in [Comp], in which global satisfiability is deduced from component satisfiability $\vdash _{\sf c}$ plus an additional check on reordering interference freedom, $\mathit {rif}({\mathcal {R}},{\mathcal {G}},c)$, which we introduce in Section 3.2. As a consequence, component-based reasoning in this proof system is based on standard rely/guarantee reasoning, which can be conducted independently from the interference check.

Moreover, the proof system supports a notion of auxiliary variables, common to rely/guarantee reasoning [39, 48]. These variables increase the expressiveness of the specification (${\mathcal {R}}$, ${\mathcal {G}}$, P, and Q) by representing properties of intermediate execution states. Auxiliary variables cannot influence program execution, as they are abstract, and their modification must be coupled with an instruction such that the instruction and the update of the auxiliary variable are executed in one step, i.e., atomically.

3 Multicopy Atomic Memory Models

Weak memory models are commonly defined to maintain sequentially consistent behaviour given the absence of data races, thereby greatly simplifying reasoning for the majority of programs. However, as we are interested in the analysis of racy concurrent code, it is necessary to reason on a semantics that fully captures the behaviours these models may introduce.

Colvin and Smith [10] show that weak memory behaviour for multicopy atomic processors such as x86-TSO, ARMv8, and RISC-V can be captured in terms of instruction reordering. A memory model, in these cases, is characterised by a reordering relation over pairs of instructions indicating whether the two instructions can execute out-of-order when they appear in a component’s code. This complicates reasoning significantly. For example, one needs to determine whether an instruction $\alpha$ that is reordered to execute earlier in a program can invalidate verification conditions that are satisfiable under normal executions (following the program order without reordering). In that sense, we are facing not only interference between concurrent components (which can be visualised as horizontal interference) but also interference between the instructions within one component (which can be pictured as vertical interference).

3.1 Reordering Semantics

The reordering relation, $\hookleftarrow$, of a component is syntactically derivable based on the rules of the specific memory model (see Section 3.3). In ARMv8, for example, two instructions that do not access (write or read) a common variable are deemed semantically independent and can change their execution order. Moreover, weak memory models support various memory barriers that prevent particular forms of reordering. For example, a full fence prevents all reordering, while a control fence prevents speculative execution (for a complete definition, refer to Reference [10]).

Matters are complicated by the concept of forwarding, where an instruction that reads from a variable written in an earlier instruction might replace the reading access with the written value, thereby eliminating the dependence to the variable in common. This allows it to execute earlier, anticipating the write before it happens. For example, in the snippet x := 3; y := x the second instruction can replace the read from x by the value 3 written by the first instruction, thereby losing the dependency between the write to x and the read of x and being able to reorder the second instruction before the first. Thus, forwarding would result in an execution of y := 3; x := 3. We denote the instruction $\alpha$ with the value written in an earlier instruction $\beta$ forwarded to it as $\alpha _{\langle \beta \rangle }$. Note that $\alpha _{\langle \beta \rangle } = \alpha$ whenever $\beta$ does not write to a variable that is read by $\alpha$.

Forwarding can span a series of instructions and can continue arbitrarily, with later instructions allowed to replace variables introduced by earlier forwarding modifications. The ternary relation $\gamma \prec c \prec \alpha$ denotes reordering of the instruction $\alpha$ prior to the command c, with the cumulative forwarding effects producing $\gamma$ [9]. $\alpha _{\langle \ \langle c\rangle \ \rangle }$ denotes the cumulative forwarding effects of the instructions in command c on $\alpha$. We define both terms recursively over c.

\begin{equation} \begin{array}{rcl} \alpha _{\langle \beta \rangle } \prec \beta \prec \alpha & \widehat{=}& \beta \hookleftarrow \alpha _{\langle \beta \rangle }\\ \alpha _{\langle \ \langle c_1\, ; \,c_2\rangle \ \rangle } \prec c_1\, ; \,c_2 \prec \alpha & \widehat{=}& \alpha _{\langle \ \langle c_1\, ; \,c_2\rangle \ \rangle } \prec c_1 \prec \alpha _{\langle \ \langle c_2\rangle \ \rangle } ~\wedge ~ \alpha _{\langle \ \langle c_2\rangle \ \rangle } \prec c_2 \prec \alpha \\ \end{array} \end{equation}

(7)

For example, let $\alpha =$ (y := x) and $\beta =$ (x := 3), then we have $(y:=3) \prec (x:=3) \prec (y:=x)$ such that $\alpha _{\langle \beta \rangle } =$ (y := 3). Prepending $\beta$ with another instruction, $\gamma =$ (z := 5), would result in $(y:=3) \prec (z:=5 ; x:=3) \prec (y:=x)$ such that also $\alpha _{\langle \ \langle \gamma \, ; \,\beta \rangle \ \rangle } =$ (y := 3).

To capture the effects of reordering, we extend the definition of executions (1) with an extra rule that captures out-of-order executions: A step can execute an instruction whose original form occurs later in the program if reordering and forwarding can bring it (in its new form $\gamma$) to the beginning of the program.

\begin{equation} c_1\, ; \,c_2 \mapsto _{\gamma } c_1\, ; \,c^{\prime }_2 \text{ if } \gamma \prec c_1 \prec \alpha \wedge c_2 \mapsto _{\alpha }c^{\prime }_2 \end{equation}

(8)

3.2 Reordering Interference Freedom

Our aim is to eliminate the implications of this reordering behaviour and, therefore, enable standard rely/guarantee reasoning despite a weak memory context. To achieve this, we note that a valid reordering transformation will preserve the thread-local semantics and, hence, will only invalidate reasoning when observed by the environment. Such interactions are captured either as invalidation of the component’s guarantee ${\mathcal {G}}$ or new environment behaviours, as allowed by its rely condition ${\mathcal {R}}$. Consequently, reorderings may be considered benign if the modified variables are not related by ${\mathcal {G}}$ or ${\mathcal {R}}$.

We capture such benign reorderings via reordering interference freedom. Two instructions are said to be reordering interference free($\mathit {rif}$) if we can show that reasoning over the instructions in their original (program) order is sufficiently strong to also include reasoning over their reordered behaviour. Consider the program text $\beta \, ; \,\alpha$, where $\alpha$ can be forwarded and executed before $\beta$, resulting in an execution equivalent to $\alpha _{\langle \beta \rangle } \, ; \,\beta$. Reordering interference freedom between $\alpha$ and $\beta$ under given rely/guarantee conditions is then formalised as follows:

\begin{equation} \begin{array}{l} \mathit {rif}_{\sf a}({\mathcal {R}}, {\mathcal {G}}, \beta , \alpha) ~\widehat{=}~ \forall P, Q, M.\ {\mathcal {R}}, {\mathcal {G}}\vdash _{\sf a}P \lbrace \beta \rbrace M \wedge {\mathcal {R}}, {\mathcal {G}}\vdash _{\sf a}M \lbrace \alpha \rbrace Q \Rightarrow \\ \\ \ \exists M^{\prime }.\ {\mathcal {R}}, {\mathcal {G}}\vdash _{\sf a}P \lbrace \alpha _{\langle \beta \rangle }\rbrace M^{\prime } \wedge {\mathcal {R}}, {\mathcal {G}}\vdash _{\sf a}M^{\prime }\lbrace \beta \rbrace Q . \end{array} \end{equation}

(9)

Importantly, $\mathit {rif}_{\sf a}$ is defined independently of the pre- and post-states of the given instructions, as can be seen by the universal quantification over P, M and Q in (9). This independence allows for the establishment of $\mathit {rif}_{\sf a}$ across a program via consideration of only pairs of reorderable instructions, rather than that of all execution traces under which they may be reordered. Such an approach dramatically reduces the complexity of reasoning in the presence of reordering, from one of $n!$ transformed programs for n reorderable instructions to $n (n - 1) / 2$ pairs.

The definition of $\mathit {rif}_{\sf a}$ extends inductively over commands c with which $\alpha$ can reorder. Command c is reordering interference free from $\alpha$ under ${\mathcal {R}}$ and ${\mathcal {G}}$, if the reordering of $\alpha$ over each instructions of c is interference free, including those variants of $\alpha$ produced by forwarding.

\begin{equation} \begin{array}{rcl} \mathit {rif}_{\sf c}({\mathcal {R}}, {\mathcal {G}}, \beta , \alpha) & = & \mathit {rif}_{\sf a}({\mathcal {R}}, {\mathcal {G}}, \beta , \alpha)\\ \\ \mathit {rif}_{\sf c}({\mathcal {R}}, {\mathcal {G}}, c_1\, ; \,c_2, \alpha) & = & \mathit {rif}_{\sf c}({\mathcal {R}}, {\mathcal {G}}, c_1, \alpha _{\langle \ \langle c_2\rangle \ \rangle }) \wedge \mathit {rif}_{\sf c}({\mathcal {R}}, {\mathcal {G}}, c_2, \alpha) \\ \end{array} \end{equation}

(10)

From the definition of executions including reordering behaviour given in (8) we have $c \mapsto _{\alpha _{\langle \ \langle r\rangle \ \rangle }} \! c^{\prime } \Rightarrow r\, ; \,\alpha \in \mathit {prefix}(c) \wedge \alpha _{\langle \ \langle r\rangle \ \rangle }\! \prec \! r\! \prec \!\alpha$, where $\mathit {prefix}(c)$ refers to the set of prefixes of c. Program c is reordering interference free if and only if all possible reorderings of its instructions over the respective prefixes are reordering interference free.

\begin{equation} \begin{array}{l} \mathit {rif}({\mathcal {R}}, {\mathcal {G}}, c) ~\widehat{=}~ \forall \alpha , r, c^{\prime }.\, c \mapsto _{\alpha _{\langle \ \langle r\rangle \ \rangle }} c^{\prime } \Rightarrow \mathit {rif}_{\sf c}({\mathcal {R}},{\mathcal {G}}, r, \alpha) \wedge \mathit {rif}({\mathcal {R}}, {\mathcal {G}}, c^{\prime }) \end{array} \end{equation}

(11)

As can be seen from the definitions, checking $\mathit {rif}({\mathcal {R}}, {\mathcal {G}}, c)$ amounts to checking $\mathit {rif}_{\sf a}({\mathcal {R}}, {\mathcal {G}}, \beta , \alpha)$ for all pairs of instructions $\beta$ and $\alpha$ that can reorder in c, including those pairs for which $\alpha$ is a new instruction generated through forwarding. Therefore, one can reason about a component’s code as follows:

(1)

Compute all pairs of reorderable instructions, i.e., each pair of instructions $(\beta , \alpha)$ such that there exists an execution trace where $\alpha$ reorders before $\beta$ according to the memory model under consideration.

(2)

Demonstrate reordering interference freedom for as many of these pairs as possible (using $\mathit {rif}_{\sf a}({\mathcal {R}},{\mathcal {G}},\beta ,\alpha)$).

(3)

If $\mathit {rif}_{\sf a}$ cannot be shown for some pairs, then introduce memory barriers to prevent their reordering or modify the verification problem such that their reordering can be considered benign.

(4)

Verify the component in isolation, using standard rely/guarantee reasoning with an assumed sequentially consistent memory model.

We detail steps 1–3 in the following sections and assume the use of any standard rely/guarantee reasoning approach for step 4.

3.3 Computing All Reorderable Instructions

Pairs of potentially reorderable instructions can be identified via a dataflow analysis [22], similar to dependence analysis commonly used in compiler optimisation. However, rather than attempting to establish an absence of dependence, we are interested in demonstrating its presence, such that instruction reordering is not possible during execution. This notion of dependence is derived from the language’s reordering relation, such that $\alpha$ is dependent on $\beta$ iff $\beta \not\!\hookleftarrow \alpha$. All pairs of instructions for which a dependence cannot be established are assumed reorderable.

The approach is constructed as a backwards analysis over a component’s program text, incrementally determining the instructions a particular instruction is dependent on and, inversely, those it can reorder before. Therefore, the analysis can be viewed as a series of separate analyses, one from the perspective of each instruction in the program text.

We describe one instance of this analysis for some instruction $\alpha$. The analysis records a notion of $\alpha$’s cumulative dependencies, which simply begins as all instructions $\gamma$ for which $\gamma \not\!\hookleftarrow \alpha$. The analysis commences at the instruction immediately prior to $\alpha$ in the program text and progresses backwards. For each instruction $\beta$, we first determine if $\alpha$ depends on $\beta$ by consulting $\alpha$’s cumulative dependencies. Given a dependence exists, $\alpha$’s cumulative dependencies are extended to include $\beta$’s dependencies via a process we refer to as strengthening, such that the analysis may subsequently identify those instructions $\alpha$ is dependent on due to its dependence on $\beta$. If a dependence on $\beta$ cannot be shown, then the instructions are considered reorderable, subsequently requiring $\mathit {rif}_{\sf a}({\mathcal {R}},{\mathcal {G}},\beta ,\alpha)$ to be shown. Moreover, a process of weakening is necessary to remove $\alpha$’s cumulative dependencies that $\beta$ may resolve due to forwarding.

To illustrate the evolving nature of cumulative dependencies, consider the sequence $\beta \, ; \,\gamma \, ; \,\alpha$ where $\gamma \not\!\hookleftarrow \alpha$ and $\beta \not\!\hookleftarrow \gamma$ but $\beta \hookleftarrow \alpha$. The analysis from the perspective of $\alpha$ starts at $\gamma$ and identifies a dependence, due to $\gamma \not\!\hookleftarrow \alpha$. Therefore, $\alpha$ gains $\gamma$’s dependencies via strengthening. The analysis progresses to the next instruction, $\beta$, for which a dependence can be established due to $\alpha$’s cumulative dependencies including $\beta \not\!\hookleftarrow \gamma$. Consequently, despite no direct dependency between $\alpha$ and $\beta$, the sequence does not produce reordering pairs for $\alpha$. Repeating this process for $\gamma$ and $\beta$ ultimately finds no reordering pairs over the entire sequence, resulting in no $\mathit {rif}_{\sf a}$ checks.

A realistic implementation of this analysis is highly dependent on the language’s reordering relation. In most examples, this relation only considers the variables accessed by the instructions and special case behaviours for memory barriers, as illustrated by the instantiation in Section 5. Consequently, cumulative dependencies can be efficiently represented as sets of such information, for example capturing the variables read by $\alpha$ and those instructions it depends on. This representation lends itself to efficient set-based manipulations for strengthening and weakening.

The analysis has been implemented for both a simple while language and an abstraction of ARMv8 assembly, with optimisations to improve precision in each context. In particular, precision can be improved through special handling of the forwarding case, as the effects of forwarding typically result in trivial $\mathit {rif}_{\sf a}$ checks. The implementations have been encoded and verified in Isabelle/HOL, along with proofs of termination (following the approach suggested in Reference [30]).

3.3.1 Address Calculations.

Dependence analysis is considerably more complex in the presence of address calculations. Under such conditions, it is not possible to syntactically identify whether two instructions access equivalent addresses, complicating an essential check to establishing dependence. Without sufficient aliasing information, the analysis must over-approximate and consider the two addresses distinct, potentially introducing excess reordering pairs.

The precision of the analysis can be improved using an alias analysis to first identify equivalent address calculations, feeding such information into the dependency checks. Precision may also be improved by augmenting the interference check, ${\mathit {rif}_{\sf a}}$, with any calculations that have been assumed to be distinct. For example, consider ${ [x] := e; [y] := f}$, where ${ [v] := e}$ represents a write to the memory address computed by the expression ${ v}$. If an alias analysis cannot establish ${ x = y}$, then it is necessary to consider their interference. As they are assumed to reorder, a proof demonstrating $\mathit {rif}_{\sf a}({\mathcal {R}},{\mathcal {G}},{ [x] := e},{ [y] := f})$ can assume ${ x \ne y}$. Such a property extends to any other comparisons with cumulative dependencies.

We have implemented such improvements in our analysis for ARMv8, relying on manual annotations to determine aliasing address calculations. These aliasing annotations are subsequently added to each instruction’s verification condition to ensure they are sound.

3.4 Interference Checking

Given the set of reordering pairs, it is necessary to demonstrate ${\mathit {rif}_{\sf a}}$ on each to demonstrate freedom of reordering interference. Many ${\mathit {rif}_{\sf a}}$ properties can be shown trivially. For example, if one instruction does not access shared memory, then ${\mathit {rif}_{\sf a}}$ can be immediately shown to hold, as no interference via ${\mathcal {R}}$ could take place. Additionally, if the two instructions access distinct variables and these variables are not related by ${\mathcal {R}}$, then no interference would be observed.

If these shortcuts do not hold, then it is necessary to consider ${\mathit {rif}_{\sf a}}$ directly. The property can be rephrased in terms of weakest precondition calculation [13], assisting automated verification.

3.5 Elimination of Reordering Interference

Step 3 of the process is intended to handle situations where $\mathit {rif}_{\sf a}$ cannot be shown for a particular pair of instructions. A variety of techniques can be applied in such conditions, depending on the overall verification goals. In some circumstances, a failure to establish $\mathit {rif}_{\sf a}$ indicates a problematic reordering such that the out-of-order execution of the instruction pair will violate any variation of the desired rely/guarantee reasoning. In such circumstances, it is necessary to prevent reordering through the introduction of a memory barrier.

As these barriers incur a performance penalty, this is not a suitable technique to correct all problematic pairs. Some reordering pairs can instead be resolved by demonstrating stronger properties during the standard rely/guarantee reasoning in step 4. We describe a series of techniques that can be employed to extract these stronger properties by modifying a program’s verification conditions and/or abstracting over its behaviour. These techniques, while incomplete, are easily automated and cover the majority of cases.

3.5.1 Strengthening.

Establishing $\mathit {rif}_{\sf a}$ may fail in cases where an instruction in a reordering pair modifies the other’s verification condition. In such circumstances, it is possible to strengthen verification conditions such that the interference becomes benign by capturing both the in-order and out-of-order execution behaviours. Given a reordering pair ($\beta$, $\alpha$), this is achieved by first determining the weakest P that solves $P\lbrace \alpha _{\langle \beta \rangle };\beta \rbrace (true)$, representing the implications of each instruction’s verification conditions when executed out-of-order. This P is then used to strengthen $\beta$’s verification condition, such that the stronger constraints are established during the standard rely/guarantee reasoning stage.

For example, consider the component $({ y=0})\lbrace { z := z + 1; x := y }\rbrace (true),$ where, due to a specialised analysis, the assignment to ${ x}$ has the verification condition ${ z=1 \vee y=0}$ (and that for the assignment to ${ z}$ is true). Assume that ${\mathcal {R}}$ is the identity relation, i.e., no variables are changed by environment steps, and ${\mathcal {G}}$ is true. The rely/guarantee reasoning to establish this judgement is trivial, as Q is true and $x := y$ will execute in a state where $y = 0$.

However, assuming the two assignments may be reordered, it is necessary to establish $\mathit {rif}_{\sf a}({\mathcal {R}}, {\mathcal {G}}$, ${ z:=z+1},{ x:=y})$. Unfortunately, such a property does not hold. For example, setting the pre-state of the program, P, to be ${ z=0}$ and the post-state, Q, to be true, we have $({ z=0})$ $\lbrace { z:=z+1}\rbrace$ $({ z=1}) \wedge ({ z=1}) \lbrace { x:=y}\rbrace (true)$ but not $\exists M^{\prime }.\ ({ z=0}) \lbrace { x:=y}\rbrace M^{\prime } \wedge M^{\prime } \lbrace { z:=z+1}\rbrace (true)$, since the verification condition of ${ x:=y}$ does not hold in the pre-state ${ z=0}$.

Applying the strengthening approach, we compute P for the out-of-order execution as ${ z = 1 \vee y = 0}$. This predicate is then used as the verification condition for ${ z:=z+1}$, which was originally true. With this strengthened verification condition, we have $\mathit {rif}_{\sf a}({\mathcal {R}}, {\mathcal {G}}, { z:=z+1}$, ${ x:=y})$, since $({ z=0}) \lbrace { z:=z+1}\rbrace ({ z=1}) \wedge ({ z=1}) \lbrace { x:=y}\rbrace (true)$ no longer holds.

With $\mathit {rif}$ established, the standard rely/guarantee reasoning in step 4 must demonstrate $({ y=0})\lbrace { z := z + 1; x := y}\rbrace (true)$, with the strengthened verification condition for $z := z + 1$. This obviously holds given $y = 0$ initially.

3.5.2 Ignored Reads.

An additional issue when correcting for $\mathit {rif}_{\sf a}$ derives from the quantification of the pre- and post-states. This quantification reduces the proof burden, such that only pairs of reorderable instruction must be considered, but can introduce additional proof effort where the precise pre- and post-states are well known and limited reordering takes place. For instance, consider the simple component $(true)\lbrace { x := 1; z := y} \rbrace ({ x = 1})$ with a rely specification that will preserve the values of ${ x}$ and ${ z}$ always and the value of ${ y}$ given ${ x = 1}$. The rely/guarantee reasoning to establish this judgement is trivial. However, the component will fail to demonstrate $\mathit {rif}_{\sf a}$ when considering the reordering of ${ x := 1}$ and ${ z := y}$, as their program order execution may establish the stronger $(true)\lbrace { x := 1; z := y} \rbrace ({ x = 1 \wedge z = y})$, whereas the reordered cannot.

We employ two techniques to amend such situations. The most trivial is a weakening of the component’s ${\mathcal {R}}$ specification to remove the relationship between ${ y}$ and ${ x}$, as it is unnecessary for the component’s verification. Otherwise, if this is not possible, then the component can be abstracted to $(true)\lbrace { x := 1; {\sf chaos}~z} \rbrace ({ x = 1})$, where ${ {\sf chaos}~v}$ encodes a write of any value to the variable ${ v}$. Consequently, the read of ${ y}$ is ignored. Both standard rely/guarantee reasoning and $\mathit {rif}$ can be established for this modified component, subsequently enabling verification of the original via a refinement argument.

We propose the automatic detection of those reads that do not impact reasoning and, therefore, can be ignored when establishing $\mathit {rif}$. In general, such situations are rare, as the analysis targets assembly code produced via compilation. Consequently, such unnecessary reads are eliminated via optimisation. Moreover, the ${\mathcal {R}}$ specification infrequently over-specifies constraints on the environment.

3.6 Soundness

Soundness of the proof system has been proven in Isabelle/HOL and is available in the accompanying theories at https://bitbucket.org/wmmif/wmm-rg. A proof sketch can be found in Appendix A.

3.7 Precision

The proof system is incomplete due to the over-approximations required to reduce reasoning to pairs of reorderable instructions. This is by design, as the approach benefits significantly from such simplifications and the problematic cases seem rare, particularly when the techniques suggested in Section 3.5 are applied. As an illustration of these problematic cases, consider $(P) \lbrace { x := v_1 ; y := v_2} \rbrace (true)$, where P is some precondition, the rely condition preserves the values of ${ x}$ and ${ y}$, and the guarantee is true. Moreover, assume the verification condition for ${ y := v_2}$ requires ${ x \ne y}$ and the instructions can reorder.

When considering both possible execution orderings, a sufficient precondition P would be $x \ne y \wedge v_1 \ne y$, as this captures the constraints imposed by the single verification condition. However, the $\mathit {rif}$ approach will introduce an additional, unnecessary condition to establish $\mathit {rif}_{\sf a}({\mathcal {R}},{\mathcal {G}},{ x:=v_1},{ y:=v_2})$. First, observe that ${ x := v_1}$ modifies the verification condition for ${ y := v_2}$. Therefore, the verification condition for ${ x := v_1}$ must be strengthened to ${ x \ne y}$, following the same approach as the example in Section 3.5. However, the resulting instructions are still not interference free, as ${ y := v_2}$ can now modify the new verification condition for ${ x := v_1}$. This can be resolved through an additional application of strengthening, extending the verification condition for ${ x := v_1}$ to ${ x \ne y \wedge x \ne v_2}$. Consequently, the approach requires a precondition P stronger than ${ x \ne y \wedge v_1 \ne y \wedge x \ne v_2}$, over-approximating the true requirements.

This failure can be attributed to the lack of delineation between the original components of a verification condition and those added due to strengthening, as interference checks on the latter are not necessary. We leave an appropriate encoding of such differences to future work.

4 Non-multicopy Atomic Weak Memory Models

Some modern hardware architectures, such as POWER and older versions of ARM, implement weaker memory models, referred to as non-multicopy atomic (NMCA), that cannot be fully characterised by a reordering relation. Under these architectures, a component’s writes may become observable to other components at different points in time. Consequently, there is no shared state that all components agree on throughout execution, invalidating a core assumption of standard rely/guarantee reasoning. Moreover, such systems provide weak guarantees in terms of the cumulativity of writes [3]. For instance, a component may observe the effect of another component’s instruction before writes that actually enabled the instruction’s execution. This substantially complicates reasoning, as it results in behaviour that appears to execute out of order, invalidating a traditional notion of causality.

Building on the work of Colvin and Smith [10], we observe that the state of any pair of components can at most differ by writes from other components that the pair has inconsistently observed. Therefore, we propose a simple modification to the rules introduced in Section 2 to support reasoning under such memory models and comment on potential improvements to the approach’s precision.

4.1 Write History Semantics

Non-multicopy atomic behaviour can be modelled as an extension to the reordering semantics introduced in Section 3.1, as demonstrated by Colvin and Smith [10]. Under this extension, each component is associated with a unique identifier, and the shared memory state is represented as a list of variable writes, i.e., $\langle w_1, w_2, w_3, \ldots \rangle$, with metadata to indicate which components have performed and observed particular writes. The order of events in this write history provides an overall order to the system’s events, with those later in the list being the most recent. Each $w_i$ is a write of the form ${(x \mapsto v)^{wr}_{rds}}$ assigning value v to variable x, with wr being the writer component’s identifier and rds the set of component identifiers that have observed the write. We introduce the definitions ${\sf writer} ({(x \mapsto v)^{wr}_{rds}}) = wr$, ${\sf readers} ({(x \mapsto v)^{wr}_{rds}}) = rds$ and ${\sf var} ({(x \mapsto v)^{wr}_{rds}}) = x$ to access metadata associated with a write.

To model the effects of instructions on the write history, it is necessary to associate each with the identifier for the component that executed them. Moreover, it is necessary to extract the instruction’s effects in a form suitable for manipulation of the write history. To resolve these issues, we restrict the possible instructions that the language may execute over the global state to either store instructions of the form $(x := v)_i$, denoting a write to variable x of the constant value v from component i; load instructions of the form $[x = v]_i$, asserting the variable x must hold constant value v from the perspective of component i; memory barriers such as ${\sf fence}_i$, corresponding to the execution of a ${\sf fence}$ by component i; or silent skip instructions, in which a component performs some internal step.

We refine the relation beh to model transitions over the write history for each of these instruction types. Modifications to the write history are constrained such that they may not invalidate variable coherence from the perspective of a component. For example, when component i executes the write instruction $x := v$, it must introduce a new write event for x with the written value of v and place it after all writes to x that i has observed and any writes that i has performed.

\begin{equation} \begin{array}{rcl} beh((x := v)_i) &~=~ &\lbrace (h^{\, \frown \,}h^{\prime }, h^{\, \frown \,} {(x \mapsto v)^{i}_{ \lbrace i\rbrace }} ^{\, \frown \,}h^{\prime }) \mid \\ & & \hspace{21.25pt} \forall w\in {\sf ran}(h^{\prime }) \cdot {\sf writer} (w) \ne i \wedge ({\sf var} (w) = x \Rightarrow i \notin {\sf readers} (w)) \rbrace \\ \end{array} \end{equation}

(12)

Before such a write may be read by another component, the NMCA system must first propagate it from the writing component to the reading component. These transitions result in a component’s view of a variable x progressing to the next write to x that it has not yet observed. They are modelled as environment effects and can take place at any point during the execution. Moreover, they are only constrained to respect the order in which individual variables are modified, allowing components to observe writes to different variables in any arbitrary order. We define the set of possible propagations as follows:

\begin{equation} \begin{array}{l} prp ~ \widehat{=}~ \lbrace (h^{\, \frown \,} {(x \mapsto v)^{j}_{r}} ^{\, \frown \,}h^{\prime }, h^{\, \frown \,} {(x \mapsto v)^{j}_{r \cup \lbrace i\rbrace }} ^{\, \frown \,}h^{\prime }) \mid \\ \hspace{85.35826pt} i \notin r \wedge \forall w\in {\sf ran}(h) \cdot {\sf var} (w) = x \Rightarrow i \in {\sf readers} (w) \rbrace . \end{array} \end{equation}

(13)

A component can access the value of a variable via the execution of a load instruction $[x = v]_i$. This read is constrained to the most recent write to x visible to component i, which must have written the value v. Additionally, memory barriers may constrain the write history, depending on the architecture. For instance, the ${\sf fence}_i$ instruction on ARM ensures that all components have observed the writes seen by component i. Finally, silent skip instructions are trivially defined as ${\sf id}$ over the write history.

\begin{equation} \begin{array}{rcl} beh([x = v]_i) & ~=~& \lbrace (h^{\, \frown \,} {(x \mapsto v)^{j}_{r}} ^{\, \frown \,} h^{\prime }, h^{\, \frown \,} {(x \mapsto v)^{j}_{r}} ^{\, \frown \,} h^{\prime }) \mid \\ & &\hspace{21.25pt} \forall w\in {\sf ran}(h^{\prime }) \cdot {\sf var} (w) = x \Rightarrow i \notin {\sf readers} (w) \rbrace \\ beh({\sf fence}_i) & ~=~& \lbrace (h, h) \mid \forall w\in {\sf ran}(h) \cdot i \in {\sf readers} (w) \Rightarrow \forall y \cdot y \in {\sf readers} (w) \rbrace \end{array} \end{equation}

(14)

Note that a component’s writes can be perceived out-of-order even if they were considered ordered within the component’s command, as the environment may decide to propagate writes to different variables arbitrarily. This can be perceived as a weakening of the reordering relation semantics, such that only instructions over the same variable are known to be ordered. Additionally, the propagation of a write from one component to another provides no constraint to relate the writes the destination and source components have both perceived, beyond the history of the written variable. Consequently, it is possible to propagate a write $w_i$ from a source component to a destination component before the destination observes effects that enabled the execution of $w_i$ to begin with.

To simplify specification and reasoning, we extend the language with a new constructor $comp(i, m, c)$, indicating a component with identifier i, local state $m,$ and command c. Moreover, we assume the specification of a local behaviour relation, lbeh, such that $(m,\alpha ^{\prime },m^{\prime }) \in lbeh(\alpha)$ denotes that the execution of $\alpha$ modifies the local state from m to $m^{\prime }$ and will result in the execution of $\alpha ^{\prime }$ in the shared state, where $\alpha ^{\prime }$ must be one of the shared memory instructions introduced above. Given these definitions, we can extract instructions over the shared state from transitions internal to a component and ensure appropriate annotation with the component identifier.

\begin{equation} \begin{array}{l} comp(i, m, c) \mapsto _{\alpha ^{\prime }_i} comp(i, m^{\prime }, c^{\prime }) \iff c \mapsto _{\alpha }c^{\prime } \wedge (m,\alpha ^{\prime },m^{\prime }) \in lbeh(\alpha) \end{array} \end{equation}

(15)

This structure is intended to capture the transition from local to global reasoning (as can be seen in Figure 3 in Section 4.3), with the constraint that systems are constructed as the parallel composition of a series of comp commands. Moreover, this structure enables trivial support for local state, such as hardware registers, and the partial evaluation of instructions via lbeh, such that they can be appropriately reduced to the shared memory instructions over which NMCA has been defined. For instance, an instruction $x := r_1 + r_2$, where $r_1$ and $r_2$ correspond to local state, could be partially evaluated to $x := v$ based on the values of $r_1$ and $r_2$ in m.

4.2 Reasoning under NMCA

We aim to quantify the implications of non-multicopy atomicity such that standard rely/guarantee reasoning may be preserved on these architectures. We first redefine the implications of a rely/guarantee judgement in the context of NMCA. To do so, we introduce the concept of a view of the write history h for a set of components I. This view, denoted ${\sf view}_I(h)$, corresponds to the standard interpretation of shared memory, mapping variables to their current values. As there is no guarantee that all components in I will agree on the value a variable holds, we select the most recent write all components in I have observed. We let ${\sf view}_I(h,x)$ provide a value v for x such that

For brevity, we overload the case of a singleton set, such that ${\sf view}_i = {\sf view}_{ \lbrace i \rbrace }$. Therefore, a judgement over a component i with command c of the form ${\mathcal {R}},{\mathcal {G}}\vdash P \lbrace c \rbrace Q$ can be interpreted as constraints over the modifications to ${\sf view}_i$ throughout execution. Specifically, such a judgement encodes that for all executions of c, given the execution operates on the write history h such that ${\sf view}_i(h) \in P$ and all propagations to i modify ${\sf view}_i$ in accordance with ${\mathcal {R}}$, then i will modify ${\sf view}_i$ in accordance with ${\mathcal {G}}$ and, given termination, will end with a write history $h^{\prime }$ such that ${\sf view}_i(h^{\prime }) \in Q$.

This state mapping allows for rely/guarantee judgements over individual components to be trivially lifted from a standard memory model to their respective views of a write history. However, arguments for parallel composition are significantly more complex, as it is necessary to relate differing component views. Specifically, it is necessary to demonstrate that, given the execution of an instruction $\alpha$ by some component i satisfies its guarantee specification ${\mathcal {G}}_i$ in state h, formally ${\sf view}_i(h) \in sat(\alpha ,{\mathcal {G}}_i)$, then the effects of propagating $\alpha$’s writes to some other component j will satisfy its rely specification ${\mathcal {R}}_j$ in its view, i.e., ${\sf view}_j(h) \in sat(\alpha ,{\mathcal {R}}_j)$. Evidently, establishing such a notion of compatibility requires reasoning over the differences between the views of any arbitrary pair of components.

At a high level, we observe that it is possible to relate the views of two components by only considering the difference in their observed writes, i.e., the writes one component has observed but the other has not. When considering two components i and j, this difference manifests as two distinct sets of writes, those that i has observed but j has not and those that j has observed but i has not. Therefore, to successfully map $sat(\alpha ,{\mathcal {G}}_i)$ from the view of component i to that of j, it is only necessary to consider the effects of these two sets of writes on $sat(\alpha ,{\mathcal {G}}_i)$. Building on the ideas presented in Section 3.2, we frame the problem in terms of reordering by considering $\alpha$’s out-of-order execution with respect to these differing writes and establish a new notion of reordering interference freedom $\mathit {rif}_{nmca}$, such that $sat(\alpha ,{\mathcal {G}}_i)$ must hold independent of any differing writes between i and j.

4.2.1 Relating a Pair of Views.

We formally define the difference in observed writes between components given a write history h. To facilitate reasoning over these writes as a form of instruction reordering, the evaluated writes are converted back into instructions and composed via sequential composition. We define $\Delta _{i,j}(h)$ to perform such a conversion, returning a command consisting of all writes in h that i has observed but j has not. These writes are sequenced in the same order they appear in the write history h, therefore respecting any constraints such as variable coherence.

Note that $\Delta _{i,j}(h)$ consists only of instructions of the form $x:=v$, where x is a shared variable and v is a constant value, as this reflects their representation in h. Moreover, $\Delta _{i,j}(h)$ cannot contain writes performed by component j, as it only contains writes j has not observed and j must have observed its own instructions.

We observe that the execution of the command $\Delta _{i,j}(h)$ with an initial state ${\sf view}_{ \lbrace i,j\rbrace }(h)$, i.e., the shared view of memory for components i and j, will terminate in the state ${\sf view}_i(h)$. The final state must be ${\sf view}_i(h)$, as this memory will only differ with ${\sf view}_{ \lbrace i,j\rbrace }(h)$ for some variable x if there is a write in h to x that i has observed but j has not. Therefore, this write must exist in $\Delta _{i,j}(h)$. A similar property holds from the perspective of j, such that the execution of the command $\Delta _{j,i}(h)$ with an initial state ${\sf view}_{ \lbrace i,j\rbrace }(h)$ will terminate in the state ${\sf view}_j(h)$. Consequently, it is possible to relate the views of two components i and j via their respective $\Delta$s and their shared view of the write history, ${\sf view}_{ \lbrace i,j\rbrace }(h)$.

4.2.2 Reordering Before Δ _i,j.

Based on this relation between two component views, we aim to demonstrate rely/guarantee compatibility when propagating a write instruction $\alpha$ from component i to component j. Given component i must evaluate instruction $\alpha$ such that ${\sf view}_i(h) \in sat(\alpha ,{\mathcal {G}}_i)$, we first establish that $\alpha$ can be executed in the share view with j and it will still satisfy ${\mathcal {G}}_i$ in such a context, i.e., ${\sf view}_{ \lbrace i,j\rbrace }(h) \in sat(\alpha ,{\mathcal {G}}_i)$. As these two views are related by the execution of the write sequence $\Delta _{i,j}(h)$, this property can be established by considering the reordering of $\alpha$ before $\Delta _{i,j}(h)$ (see step $\bigcirc$ in Figure 2).

Fig. 2.

The instruction $\alpha$ will be reorderable with all writes in the write sequence without changing the sequential semantics of their execution due to constraints imposed on propagation transitions (cf. Definition (13)). Specifically, when propagating the effects of $\alpha$ from component i to j, component j must have already observed all prior writes to the variable $\alpha$ modifies. As j has observed these writes, they will not be present in $\Delta _{i,j}(h)$, resulting in $\alpha$ writing to a distinct variable with respect to the writes it must reorder with. Moreover, $\alpha$ must be of the form $(x := v)_i$ when propagation occurs, where v is a constant, and, therefore, its behaviour must be independent of the writes in $\Delta _{i,j}(h)$.

We demonstrate ${\sf view}_{ \lbrace i,j\rbrace }(h) \in sat(\alpha ,{\mathcal {G}}_i)$ via an induction over the write sequence $\Delta _{i,j}(h)$ in reverse, where ${\sf view}_i(h) \in sat(\alpha ,{\mathcal {G}}_i)$ represents the base case. Recall that the sequence cannot contain writes from j. Consequently it must consist of writes from i itself or components other than i and j. When considering a write from component i, the effects of propagating $\alpha$ earlier than this write are equivalent to the reordering behaviour introduced in Section 3.1 with a sufficiently relaxed reordering relation. We assume component i has been verified with a $\mathit {rif}$ condition capturing such possible reorderings and exploit this condition to preserve $sat(\alpha ,{\mathcal {G}}_i)$ across all instructions in the write sequence derived from i.

Next, we consider the effects of writes derived from components other than i and j. This case captures the main complication introduced by an NMCA system, such that i may have demonstrated $sat(\alpha ,{\mathcal {G}}_i)$ based on writes that j has not yet observed. Therefore, the compatibility between i and j only holds if $sat(\alpha ,{\mathcal {G}}_i)$ can be shown independently of these writes. We phrase this notion of independence as $\mathit {rif}_{nmca}$ and define it in terms of the weakest precondition of some relation ${\mathcal {E}}$ intended to capture the possible writes i may have observed ahead of j.

\begin{equation*} \mathit {rif}_{nmca}({\mathcal {E}},\alpha ,{\mathcal {G}}_i) ~~\widehat{=}~~ wp({\mathcal {E}},sat(\alpha , {\mathcal {G}}_i)) \subseteq sat(\alpha , {\mathcal {G}}_i) \end{equation*}

This property captures that $sat(\alpha , {\mathcal {G}}_i)$ must hold prior to the execution of some transition ${\mathcal {E}}$ if it held after, preserving $sat(\alpha , {\mathcal {G}}_i)$ across those writes in $\Delta _{i,j}(h)$ from components k other than i and j, given they satisfy ${\mathcal {E}}$. To derive a suitable ${\mathcal {E}}$, we observe that these writes must satisfy the specification ${\mathcal {R}}_i \cap {\mathcal {R}}_j$, given a similar overall compatibility argument between k and both i and j. Moreover, according to the constraints imposed by the propagation transition (as outlined above), these writes must not modify the variable written by $\alpha$. We introduce the relation ${\sf id}_\alpha$ denoting all state transitions in which the variable written by $\alpha$ does not change, capturing this constraint. Therefore, the property $\mathit {rif}_{nmca}({\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_\alpha ,\alpha ,{\mathcal {G}}_i)$ is sufficient to establish the induction proof and ultimately demonstrate ${\sf view}_{ \lbrace i,j\rbrace }(h) \in sat(\alpha ,{\mathcal {G}}_i)$.

4.2.3 Reordering After Δ_j,i.

With the execution of $\alpha$ established in the shared view such that it must satisfy ${\mathcal {G}}_i$, we consider its execution in ${\sf view}_j$. Following the prior argument, these views are related by the command $\Delta _{j,i}(h)$, however, we now consider the preservation of a property after the execution of this command, modelled by reordering $\alpha$ after $\Delta _{j,i}(h)$.

When propagating $\alpha$ to component j (see step $\bigcirc$ in Figure 2), it is possible that j may have observed a more recent write to the variable $\alpha$ modifies, where recent implies a later placement in the write history h. This can occur if component i placed $\alpha$ earlier in h than writes that j had already observed. As ${\sf view}_j$ maps each variable to its most recent write, j’s view will not be modified by the propagation of $\alpha$ under such conditions, resulting in a trivial compatibility proof. Alternatively, if j has not observed a more recent write to the variable $\alpha$ modifies, then it must be trivially reorderable with the write sequence $\Delta _{j,i}(h)$, following the same argument as the prior section.

To preserve $sat(\alpha ,{\mathcal {G}}_i)$ across $\Delta _{j,i}(h)$, we note that the write sequence must not contain writes derived from component i, as i must have observed its own writes. Therefore, all writes in $\Delta _{j,i}$ must satisfy ${\mathcal {R}}_i$, i.e., the constraint i imposes on writes derived from all other components. Moreover, the existing argument establishing $sat(\alpha ,{\mathcal {G}}_i)$ must be stable under ${\mathcal {R}}_i$, as this is a requirement of standard rely/guarantee reasoning. Given the properties of stability, $sat(\alpha ,{\mathcal {G}}_i)$ must therefore be preserved by the execution of $\Delta _{j,i}$, establishing ${\sf view}_j(h) \in sat(\alpha ,{\mathcal {G}}_i)$. Finally, given compatibility between i and j such that ${\mathcal {G}}_i \subseteq {\mathcal {R}}_j$, the desired property ${\sf view}_j(h) \in sat(\alpha ,{\mathcal {R}}_j)$ must hold via the monotonicity of sat.

Note that reordering $\alpha$ after $\Delta _{j,i}(h)$ reduces to existing proof obligations imposed by standard rely/guarantee reasoning. This can be attributed to its similarity with scheduling effects, as a scheduler may place an arbitrary number of instructions from other components between i’s execution of $\alpha$ and j’s subsequent observation of $\alpha$’s effects when considering a standard memory model. Consequently, $\mathit {rif}_{nmca}$ is the only novel constraint imposed when considering an NMCA system.

4.3 NMCA Rules

We modify the rules [Comp] and [Par] introduced in Figure 1 to enforce NMCA compatibility conditions. To simplify modifications, we encode $\mathit {rif}_{nmca}$ between components within the check of compatibility. First, we note that the standard rely/guarantee compatibility condition for two components i and j takes the form of ${\mathcal {G}}_i \subseteq {\mathcal {R}}_j$. This condition can be reinterpreted to consider the variable being modified as $\forall x,v \cdot sat(x := v, {\mathcal {G}}_i) \subseteq sat(x := v, {\mathcal {R}}_j)$, denoting that the conditions i guarantees will hold when executing $(x := v)_i$ imply the conditions j assumes to hold when it observes $(x := v)_i$.

Evidently, this reinterpretation of compatibility and $\mathit {rif}_{nmca}$ can be combined based on the transitivity of $\subseteq$ to define our new notion of compatibility under NMCA, such that

\begin{equation*} {\sf compat}({\mathcal {G}}_i, {\mathcal {R}}_i, {\mathcal {R}}_j) \ \widehat{=}\ \forall x, v \cdot wp({\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_x, sat(x := v, {\mathcal {G}}_i)) \subseteq sat(x := v, {\mathcal {R}}_j). \end{equation*}

This notion of compatibility roughly denotes that i may have observed some additional writes from components other than j and its argument for compatibility with j must be independent of these writes. Note that ${\mathcal {R}}_i\ \cap \ {\mathcal {R}}_j\ \cap \ {\sf id}_x$ is reflexive and transitive, due to constraints on ${\mathcal {R}}$ specifications. Therefore, this property captures the execution in which i observes no additional writes, implying the original compatibility condition, as well as those with an arbitrary number of additional writes seen by i.

A modified rule for parallel composition limited to only two components would be updated to this new notion of compatibility as follows:

\begin{equation*} \begin{array}{c} {\mathcal {R}}_1, {\mathcal {G}}_1 \vdash P_1 \lbrace c_1 \rbrace Q_1 \quad {\mathcal {R}}_2, {\mathcal {G}}_2 \vdash P_2 \lbrace c_2 \rbrace Q_2 \quad {\sf compat}({\mathcal {G}}_1, {\mathcal {R}}_1, {\mathcal {R}}_2) \quad {\sf compat}({\mathcal {G}}_2, {\mathcal {R}}_2, {\mathcal {R}}_1) \\ {\mathcal {R}}_1 \cap {\mathcal {R}}_2, {\mathcal {G}}_1 \cup {\mathcal {G}}_2 \vdash P_1 \cap P_2 \lbrace c_1 \! \parallel \!c_2 \rbrace Q_1 \cap Q_2 \end{array}\!. \end{equation*}

However, this approach is limited to two components, due to constraints in establishing ${\sf compat}$. Observe that ${\sf compat}$ must be demonstrated over each pair-wise combination of components in a system, due to its dependence on their net environment specification $({\mathcal {R}}_i \cap {\mathcal {R}}_j)$. Consequently, it is necessary to know the rely/guarantee specification for each individual component within a judgement to successfully demonstrate compatibility with a new component. Unfortunately, the standard rule for parallel composition merges the individual component specifications in $({\mathcal {R}}_i \cap {\mathcal {R}}_j)$, allowing for more abstract reasoning but resulting in the loss of information necessary to establish the pair-wise ${\sf compat}$ (i.e., ${\mathcal {R}}_i$ and ${\mathcal {R}}_j$ are not accessible anymore).

We resolve this issue by retaining the necessary rely specification throughout reasoning. We modify ${\mathcal {R}}$ and ${\mathcal {G}}$ to partial maps, mapping identifiers of the sub-components to their original rely/guarantee specification (see Rule [Comp’] in Figure 3). The domain of the partial map corresponds to the sub-components the judgement operates over. We use the syntax $M(k)$ to represent accessing map M with key k and $[k \rightarrow v]$ to represent a new partial map, which returns v for key k. Moreover, we introduce operators over the partial map, such that ${\sf dom}(M)$, return the domain of the map M, corresponding to the identifiers it holds specifications for, ${\sf disjoint}(M, N)$ returns whether the maps M and N have disjoint domains (i.e., do not share any sub-components), and $M \uplus N$ combines two disjoint maps. The generalised rule [Par’] is shown in Figure 3. Note that we assert that the domains of the rely/guarantee specification for two parallel components must be disjoint to enforce the uniqueness of identifiers.

Fig. 3.

The judgement ${\mathcal {R}}, {\mathcal {G}}\vdash P \lbrace c \rbrace Q$ can be interpreted such that for all executions of c commencing in a write history h, given for all i in ${\sf dom}({\mathcal {R}})$, ${\sf view}_i(h) \in P$ and all propagations to i modify ${\sf view}_i$ in accordance with ${\mathcal {R}}(i)$, then i will modify ${\sf view}_i$ in accordance with ${\mathcal {G}}(i)$ and, given termination, c will end with a write history $h^{\prime }$ such that ${\sf view}_i(h^{\prime }) \in Q$.

4.4 Soundness

These rules have been encoded in Isabelle/HOL as an abstract theory, with a minimal instantiation for NMCA versions of the ARM architecture, and are available at https://bitbucket.org/wmmif/wmm-rg. Based on the work of Colvin and Smith [10], it should be possible to implement a similar instantiation for the POWER architecture. A proof sketch for the soundness argument can be found in Appendix B.

4.5 Example for Reasoning on NMCA Behaviour

We demonstrate the reasoning steps under the NMCA rule with the following simple example shown in Figure 4. The code consists of three components, $C_1$, $C_2$, and $C_3$. $C_{1}$ writes 1 to x, while $C_{2}$ reads x and writes the resulting value to y. Finally, $C_{3}$ will read y followed by x. It is assumed that there is no reordering possible within these components, as $C_{3}$’s instructions are ordered by a ${\sf fence}$ and $C_{2}$’s instructions cannot be reordered without changing their behaviour. We assume that all variables hold 0 to begin with. Under a sequentially consistent memory model, it should be possible to establish that $C_{3}$ terminates in a state such that $r_2 = 1 \Rightarrow r_3 = 1$, as y will only hold the value 1 if that value has been written to x earlier.

Fig. 4.

This reasoning is trivially preserved on an MCA system, as there are no reorderable instructions to consider, however, it fails to carry over to an NMCA system. On such a system, it is possible for the write $x := 1$ from $C_{1}$ to be propagated to $C_{2}$ before $C_{3}$. $C_{2}$ is then able to read x and perform the write $y := 1$. If this write is then propagated to $C_{3}$ before the earlier write $x := 1$, then $C_{3}$ can read a value of 1 for y and 0 for x, violating the desired postcondition.

Observe that, for rely/guarantee reasoning to establish the desired postcondition, $C_3$ must know that the write $y := 1$ will only occur in a state where $x = 1$. Therefore, ${\mathcal {R}}_3$ must be specified such that $sat(y:=1,{\mathcal {R}}_3) = (x = 1)$. As $C_2$ performs a write to y, it must guarantee a similar condition, such that $sat(y:=1,{\mathcal {G}}_2) = (x = 1)$. Moreover, the behaviour of $C_1$ cannot be more precise than $beh(x := 1)$, as this is its only instruction. Consequently, the condition ${\sf compat}({\mathcal {G}}_2, {\mathcal {R}}_2, {\mathcal {R}}_3)$ must at least show $wp(beh(x := 1), x = 1) \subseteq (x = 1)$ when considering whether the guarantee of $C_2$ is compatible with the rely of $C_3$ in a system that includes effects from $C_1$. Evidently, this condition reduces to $True \subseteq x = 1$, which cannot be shown. This proves that it can not be asserted that the given example provides the desired output under an NMCA memory model.

It is possible to resolve this issue by introducing a ${\sf fence}$ instruction in $C_2$, between its read and subsequent write. Such a ${\sf fence}$ would ensure that $C_3$ must see the same or a later value for x relative to the value $C_2$ observed when executing $r_1 := x$. Therefore, if $C_2$ reads a value of 1 for x, then $C_3$ must also read a value of 1 after $C_2$ executes its added ${\sf fence}$, limiting executions to those that satisfy the desired postcondition.

4.6 Precision

The approach introduced in Section 4.3 is incomplete due to two over-approximations. The first is inherited from the underlying multi-copy atomic approach, as detailed in Section 3.7. The second can be attributed to the approach’s ignorance of ${\sf fence}$ instructions.

To illustrate, consider the amendment to the example in Section 4.5, introducing a ${\sf fence}$ instruction in $C_2$. This modification limits possible executions by enforcing a constraint on the relative observations of variable x between $C_2$ and $C_3$, manifesting as a restriction on the possible configurations of $\Delta _{2,3}$. However, such constraints are ignored by the compatibility condition proposed in Section 4.3, resulting in a failure to show ${\sf compat}({\mathcal {G}}_2, {\mathcal {R}}_2, {\mathcal {R}}_3)$ and, consequently, a false positive outcome for the revised example.

To improve on this incompleteness, it would be necessary to identify the reads that can influence the behaviour of a write with no ${\sf fence}$ between them and then only consider the possible environment interference for effects on those reads. With such a technique, the case without a ${\sf fence}$ would still fail to show compatibility, while the case with a ${\sf fence}$ would be trivially shown, as there would be no reads that could influence $y := r_1$ without being propagated by the ${\sf fence}$ first.

Such an extension to the technique is quite feasible by providing a more precise approximation of $\Delta _{i,j}$. In the current approach, $\Delta _{i,j}$ is approximated by ${\mathcal {R}}_i\ \cap \ {\mathcal {R}}_j\ \cap \ {\sf id}_{\alpha }$, which utilises rely conditions that do not reflect on reordering constraints, and only excludes the writes to the variable updated in $\alpha$. A more precise approximation would additionally exclude those writes that are separated via a ${\sf fence}$ from the preceding read on which the write depends. Those writes are not included in (the more precise) $\Delta _{i,j}$, since they are observed by all threads at the same time (due to the semantics of the ${\sf fence}$ instruction).

The relevant writes (that occur after a ${\sf fence}$) could be derived via a static analysis that can identify the relevant dependencies via an approach similar to that suggested for reorderable instruction pairs in Section 3.3. If we denote this set of fenced writes as $\mathit {fW}$, then $\Delta _{i,j}$ could be more precisely abstracted by ${\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_{\alpha } \cap {\sf id}_{\mathit {fW}}$, and this abstraction would cover the implications fences have on the differing views of the memory, $\Delta _{i,j}$. We leave the implementation and verification of this approach to future work.

Note that our approach makes no further over-approximations with respect to the semantics defined in Section 4.1. This can be observed via the freedom with which the semantics can propagate writes between threads, limited only by variable coherence and ${\sf fence}$ instructions. As the compatibility condition captures the effects of variable coherence, via the ${\sf id}$ constraint on the modified variable, and the implications of the ${\sf fence}$ instruction have been discussed, no further properties can be exploited to minimise the possible $\Delta _{i,j}$ configurations that must be considered. Therefore, given a sufficiently precise ${\mathcal {R}}$ condition for a third thread, compatibility between two threads under NMCA, that is not reliant on a ${\sf fence}$ instruction, can be shown under the approach.

5 Instantiating the Proof System

In this section, we illustrate instantiating the proof system with a simple while language. The Isabelle/HOL theories accompanying this work also include an instantiation for ARMv8 assembly weakened to allow NMCA behaviour.

We distinguish three different types of state variables: global variables Glb and local variables Loc, which are program variables, and global auxiliary variables Aux. Local variables are unique to each component and cannot be modified by others.

Atomic instructions in our language comprise skips, assignments, guards, two kinds of fences, and coupling of an instruction with an auxiliary variable assignment and/or with a specific verification condition (similar to an assertion)

\begin{equation*} inst::= {\sf nop}\mid v := e \mid {\sf guard}~p \mid {\sf fence}\mid {\sf cfence}\mid \langle inst, a := e_a \rangle \mid \lbrace \!\mid \!p_a\!\mid \!\rbrace inst, \end{equation*}

where v is a program variable, e an expression over program variables, p a Boolean expression over program variables, a an auxiliary variable, $e_a$ an expression over program and auxiliary variables, $p_a$ a Boolean expression over program and auxiliary variables, and $\langle inst, a := e_a \rangle$ denotes the execution of inst followed by the execution of $a := e_a$ atomically.

Commands are defined over atomic instructions and their combinations

\begin{equation*} cmd ::= inst\mid cmd \, ; \,cmd \mid {\sf if}~p {\sf ~ then~}cmd {\sf ~ else~}cmd \mid {\sf do}~cmd~ {\sf while}(p,Inv), \end{equation*}

where Inv denotes a loop invariant. Instructions instantiate individual instructions (i.e., $\alpha$) in our abstract language. Sequential composition directly instantiates its abstract counterpart. Conditionals and loops are defined via the choice and iteration operator, i.e., ${\sf if}~p {\sf ~ then~}c_1 {\sf ~ else~}c_2$ is defined as $({\sf guard}~p)\, ; \,c_1 ~\sqcap ~ ({\sf guard}~\lnot p) \, ; \,c_2$, and ${\sf do}~c~ {\sf while}(p,Inv)$ as $(c\, ; \,({\sf guard}~p))^* \, ; \,c\, ; \,({\sf guard}~\lnot p)$, where the invariant Inv holds at the start of c’s execution.

A reordering relation $\overset{\scriptscriptstyle inst}{\hookleftarrow }$ (and its inverse $\not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }$) is defined over atomic instructions based on syntactic independence of reorderable instruction [10]. For all instructions $\alpha$ and $\beta$

\begin{equation*} {{\sf fence}\not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }\alpha , \ \alpha \not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }{\sf fence}, \ {\sf guard}~p \not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }{\sf cfence},}\\ {\sf cfence}\not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }\alpha \text{~if $rd(\alpha) \not\subseteq Loc$,}\\ {\sf guard}~p \not\!\overset{\scriptscriptstyle inst}{\hookleftarrow }\alpha \text{~if $wr(\alpha)\! \in \! Glb \vee wr(\alpha) \!\in \! rd({\sf guard}~p) \vee rd({\sf guard}~p) \!\cap \! rd(\alpha) \not\subseteq Loc$,}\\ { \text{and for all other cases,}}\\ \beta \overset{\scriptscriptstyle inst}{\hookleftarrow }\alpha \text{~if $wr(\beta) \ne wr(\alpha) \wedge wr(\alpha) \not\in rd(\beta) \wedge rd(\beta) \cap rd(\alpha) \subseteq Loc$,} \end{equation*}

where $wr(\alpha)$ is the program variable written by $\alpha$ and $rd(\alpha)$ the program variables read by $\alpha$. Note that a ${\sf cfence}$ is used to prevent speculative reads of global variables when placed prior to the reading instruction and after a ${\sf guard}$ [10]. Correspondingly, the above reordering rules ensure that a ${\sf cfence}$, placed between a ${\sf guard}$ and a (global) load instruction, will block the load reordering with the preceding ${\sf guard}$. In contrast, a (full) ${\sf fence}$ blocks the reordering of any instruction and cannot be reordered itself. Thus, it provides a stronger barrier than the ${\sf cfence}$, which does allow, e.g., the reordering with a later ${\sf guard}$.

Forwarding a value to an assignment instruction in our language is defined as $(v_\alpha := e_\alpha [v_\beta \ \backslash \ e_\beta ]) \prec (v_\beta := e_\beta) \prec (v_\alpha := e_\alpha)$ and to a guard as $({\sf guard}~p[v_\alpha \ \backslash \ e_\alpha ]) \prec (v_\alpha := e_\alpha) \prec ({\sf guard}~p),$ where $e[v\ \backslash \ e^{\prime }]$ replaces every occurrence of v in e by $e^{\prime }$. The instruction after forwarding carries the same verification condition as the original instruction, i.e., $vc(\alpha _{\langle \beta \rangle })=vc(\alpha)$.

Note that auxiliary variables and verification conditions do not influence the reordering relation, as they will not constrain execution behaviour. Moreover, these annotations remain linked to their respective instructions during reordering and forwarding.

6 Peterson’s Mutual Exclusion Algorithm

To demonstrate the workings of our technique for NMCA architectures requires a system with more than two components for this weaker memory model to have a possibly observable effect. We use the extension of Peterson’s mutual exclusion algorithm that implements the behaviour of n components [32], each of which aims to get exclusive access to a critical section.

The proposed solution models $n-1$ waiting rooms through which the components have to advance before the critical section can be entered from the last. Each component $p_i$ maintains its current waiting room in $level[p_i]$. Additionally, for each waiting room that component was the last to enter is monitored in variables $lastEnter[1], \ldots , lastEnter[n-1]$. This organises the components’ advancement. These program variables are globally shared between the components and are accessible outside the critical section. A component can advance from one waiting room to the next if it is not the last to enter or if it is the last to enter and no other component is in the same waiting room or a waiting room ahead (i.e., it is the first to advance to its current waiting room). The algorithm ensures that only two components can be present in the last waiting room and only the one that was not the last to enter can access the critical section, which provides mutual exclusion.

The algorithm (shown in Figure 5) depicts one component $p_1$ instantiating this algorithm. The parameter $e_1$ is an auxiliary variable that does not affect the algorithm itself but is used during reasoning (see further details below). The critical section is represented by a placeholder in the figure. The other components $p_2, \ldots p_n$ are encoded similarly.

Fig. 5.

Fences have been added to the algorithm in Figure 5 where required to guarantee mutual exclusion. The necessity of these barriers can be shown by checking the reordering interference between reorderable instructions in each thread. More detail is given below.

In the algorithm, the outer loop increments r over the $n-1$ waiting rooms the component has to pass through before it can enter the critical section. Within that loop, it first records the room number the component is about to enter in $level[p_1]$. In a second step, the appearance of the component in the room is notified by setting the component to be the last that has entered the room. Note that, in the following, we consider the component to have entered the room only after this second step. The auxiliary variable $e_1$ is updated twice to indicate when this entering phase is complete, i.e., in the first step entering is set to $true,$ and in the second step, when the component has fully entered, entering is completed and $e_1$ set to $\mathit {false}$.

The inner loop implements a busy wait in the current waiting room until the exit conditions for this room are met and the component can proceed to the next waiting room (i.e., no other component is ahead or one other component has entered after this one). As initial condition of the overall system, we require that $\forall i \in C\, \cdot \, level[p_i] = 0 \wedge \lnot e_i$, where C is the set of components.

To demonstrate our rely/guarantee reasoning, we define a rely condition for each component that is reflected by the other components’ guarantee conditions, i.e., $\forall p_i \in C\cdot {\mathcal {G}}_i = \bigwedge _{j \cdot p_j \ne p_i } {\mathcal {R}}_j$. These conditions refer to the auxiliary variables $e_i$, for $0 \lt i \le n$, which indicate for each component whether its current waiting room has been fully entered (i.e., $p_i$ has set variable lastEnter for this room at some stage). Furthermore, the following auxiliary functions are used:

•

$room(p_i)$ determines the waiting room that component $p_i$ has fully entered

$room(p_i) ~~\widehat{=}~~ \left\lbrace \begin{array}{ll} level[p_i] - 1 & \text{if } e_i \wedge level[p_i] \gt 0\\ [1ex] level[p_i] & \text{otherwise} \end{array} \right.$

•

$\mathit {aheadOf}(p_i)$ provides the number of components that component $p_i$ is ahead of. Let $\#S$ denote the cardinality of set S.

\begin{equation*} \mathit {aheadOf}(p_i) ~~ \widehat{=}~~ \begin{array}[t]{l@{\ }l}\#\lbrace \, p_j \mid room(p_i) \gt room(p_j) \\ ~ \vee \, (room(p_i) = room(p_j) \wedge p_i \ne p_j \wedge lastEnter[room(p_i)] = p_j \rbrace \end{array} \end{equation*}

•

$exitCond(p_i)$ formalises that the number of components $p_i$ is ahead of is at least the same as the level of $p_i$’s current waiting room. This constitutes (an abstraction of) the exit condition to each waiting room, e.g., in the last waiting room $n-1$ component $p_i$ needs to be ahead of at least $n-1$ other components for its progress into the critical section to be enabled. $exitCond(p_i) ~~\widehat{=}~~ \mathit {aheadOf}(p_i) \ge room(p_i).$

The rely condition ${\mathcal {R}}_{i}$ for the component $p_i$ can then be phrased as follows, where $r \in \lbrace 1, \ldots , n-1\rbrace$ ranges over the waiting rooms.¹ Rely conditions for the other components are formalised and can be explained similarly.

\begin{equation*} \ {\mathcal {R}}_{i} = \ \begin{array}[t]{l@{\ }r} level[p_i] = level^{\prime }[p_i] ~\wedge e_i=e^{\prime }_i \wedge exitCond(p_i) = exitCond^{\prime }(p_i) \wedge & \text{(i)} \\ lastEnter^{\prime }[room(p_i)] = p_i \Rightarrow lastEnter[room(p_i)] = p_i & \text{(ii)} \\ lastEnter[room(p_i)] = p_i \wedge lastEnter^{\prime }[room(p_i)] \ne p_i \Rightarrow exitCond^{\prime }(p_i) & \text{(iii)} \\ \forall p_j \in C\, \cdot \, room(p_j) \lt room(p_i) \wedge lastEnter^{\prime }[room(p_i)] = p_i \Rightarrow &\\ \hfill room^{\prime }(p_j) \lt room^{\prime }(p_i) & \text{(iv)} \end{array} \end{equation*}

That is, ${\mathcal {R}}_{i}$ specifies that

(i)

no other component modifies $level[p_i]$, $e_i$, or $p_i$’s abstract exit condition, only component $p_i$ itself can do so;

(ii)

no other component can set variable lastEnter of $p_i$’s current waiting room to $p_i$;

(iii)

if another component has entered $p_i$’s current waiting room after $p_i$, then the abstract exit condition needs to be maintained;

(iv)

if $p_i$ is ahead of component $p_j$ (i.e., $p_j$ is in a waiting room on a lower level) and $p_i$ is the component that last entered its current waiting room after the environment step(s), then $p_i$ will remain ahead of $p_j$.

Reasoning over the reordering interference freedom ($\mathit {rif}$) for $p_1$ showed where fence instructions were required to eliminate interferences caused by reorderings within the component (as indicated in Figure 5). For the body of the inner loop, it is easy to see that executing the load instructions out of order does not have an effect on the exit condition of the loop and can be considered benign. Contrary to that, an out-of-order execution of the two store instructions before the inner loop affects the coordination between $p_1$ and other components, which ensures mutual exclusion. The condition $\mathit {rif}_{\sf a}({\mathcal {R}}_1, {\mathcal {G}}_1, \beta ,\alpha)$ for these two instructions could not be established, and a fence instruction was placed after each. Similarly, a second fence is required before the inner loop and a ${\sf cfence}$ is required at the beginning of the critical section. The latter prevents instructions within the critical section being reordered with the condition of the inner loop, and hence executing before the busy wait (encoded by the inner loop) is finalised.

Additionally, we have to show global reordering interference freedom ($\mathit {rif}_{nmca}$) by using the refined compatibility check between pairs of rely and guarantee conditions. Unfolding the definition given in Section 4.3 results in compatibility conditions for $i, j \in \lbrace 1, \ldots , n\rbrace$ and $i\ne j$ that are of the form

\begin{equation*} {\sf compat}({\mathcal {G}}_i, {\mathcal {R}}_i, {\mathcal {R}}_j) = wp({\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_x, sat(x := v, {\mathcal {G}}_i)) \subseteq sat(x := v, {\mathcal {R}}_j). \end{equation*}

With ${\mathcal {G}}_i \Rightarrow {\mathcal {R}}_j$ and monotonicity of wp we deduce that it suffices to show that

\begin{equation*} wp({\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_x, sat(x := v, {\mathcal {R}}_j)) \subseteq sat(x := v, {\mathcal {R}}_j) \end{equation*}

to prove ${\sf compat}({\mathcal {G}}_i, {\mathcal {R}}_i, {\mathcal {R}}_j)$.

For example, let $x:=v$ be $\langle lastEnter[n-1] := p_i; e_i := {\sf ff}\rangle$. That is, component $p_i$ is entering the last waiting room $n-1$. In this case, to satisfy ${\mathcal {R}}_j$, in particular condition (iii), the exit condition of $p_j$, $exitCond(p_j)$, must be maintained if $room(p_j) = n-1$. Since no other component can modify $exitCond(p_j),$ this condition must have been satisfied before the step, i.e.,

\begin{equation*} sat(\langle lastEnter[n-1] := p_i; e_i := {\sf ff}\rangle , {\mathcal {R}}_j) = room(p_j) ~=~ n-1 \Rightarrow \mathit {aheadOf}(p_j) \ge room(p_j). \end{equation*}

${\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_{lastEnter[n-1]}$ abstractly represents all behaviours of some component $p_k$ (for $k \ne i$ and $k \ne j$) that do not modify $lastEnter[n-1]$. This reduces the steps of $p_k$ to be considered to only local steps and those that modify $level[p_k]$ or $lastEnter[m]$ for $m\ne n-1$. Although these steps might increase $room(p_k),$ it will remain lower than $room(p_j)$ and consequently $\mathit {aheadOf}(p_j)$ remains unaffected. Hence, we have

$wp({\mathcal {R}}_i \cap {\mathcal {R}}_j \cap {\sf id}_{lastEnter[n-1]}, sat(\langle lastEnter[n-1] := p_i; e_i := {\sf ff}\rangle , {\mathcal {R}}_j)) \\ \ = sat(\langle lastEnter[n-1] := p_i; e_i := {\sf ff}\rangle , {\mathcal {R}}_j),$

which proves compatibility of ${\mathcal {G}}_i$ and ${\mathcal {R}}_j$ for this instruction. We reason over all other instructions in a similar fashion.

7 Related Work

There are a number of approaches for verifying concurrent code under weak memory models [1, 2, 17, 23, 24, 26, 42, 43], which are centred around relations between instructions in multiple threads, thereby precluding the benefits of thread-local reasoning. Notable amongst these is the work by Abdulla et al. [1, 2], which aims at automated tool support via stateless model checking and is based on the axiomatic semantic model of [3]. Instead of thread-local reasoning, the approaches deal with execution graphs, which include not only the interleaving behaviour of concurrent threads but also “parallelisation” of sequential code resulting from weak memory behaviour. Techniques to combat the resulting state-space explosion and improve scalability include elaborate solutions to dynamic partial order reduction, context bounds for a bug-finding technique [1], and (for a sound approach) coarsening the semantic model of execution graphs through reads-from equivalences [2].

Approaches that propose a purely thread-local analysis for concurrent code under weak memory models include the work by Ridge [35] and Suzanne et al. [40]. Both capture the weak memory model of x86-TSO [37] by modelling the concept of store buffers. This limits their applicability to that of this relatively simple memory model and prohibits adaption to weaker memory models.

Closer to our approach are the proof systems for concurrent programs under the C11 memory model developed by Lahav et al. [26] and Dalvandi et al. [12]. These proof systems are based on the notion of Owicki-Gries reasoning with interference assertions between each line of code to capture potential interleavings.

However, to achieve a thread-local approach, the authors of Reference [26] present their logic in a “rely/guarantee style” in which interference assertions are collected in “rely sets” whose stability needs to be guaranteed by the current thread. This leads to a fine-grained consideration of interference between threads, whereas in standard rely/guarantee reasoning the interference is abstracted into a rely condition that summarises the effects of the environment. Moreover, similarly to References [1, 2] the semantic model is based on (an abstraction of) the axiomatic model in Reference [3] so the interference between threads includes additionally weak memory effects, thereby further complicating the analysis over each instruction. A somewhat-related approach to capture assertions on thread interference is presented in Reference [24], which computes the reads-from relation between threads, which is then taken into account by the thread-local static analyser.

In contrast, the work in Reference [12] provides a more expressive view-based assertion language that allows for the use of the standard proof rules of Owicki-Gries reasoning for concurrent systems despite the effects of the weak memory model including non-multicopy atomicity. As a consequence, the complications of weak memory effects need to be taken into account when crafting the assertions to show interference freedom. In our approach, the constraints that define the guarantee and rely conditions are specified as simple relational predicates over the shared variables. The intricacies of weak memory effects and non-multicopy atomicity are hidden in the proof technique given by our logic, which is proven sound with respect to the weak memory model.

Numerous papers have been published targeting the analysis of C programs executing under the C11 semantics, e.g., References [5, 20, 34, 41]. In our approach, we deliberately bypass the C language level for two reasons. First, the C11 semantics is a definition that is aimed at encompassing every possible compiler behaviour. Hence, the definition has to be loose enough to include potential compilations that are never performed by the actual compiler in use. The resulting hypothetical behaviours are likely to provide false positives, indicating problems that the actual compiled code does not have, out-of-thin-air behaviour being one of those. Second, the analysis results that are gained on the program language level are not necessarily maintained by the compiler, in particularly when considering security properties, cf., Reference [14]. Therefore, in our approach, we consider the analysis of low-level code (instead of C11) and target our analysis to hardware weak memory models. Many of the concerns raised in the aforementioned papers are thus unrelated to our work.

Furthermore, many logics proposed for verifying concurrent code, e.g., References [20, 41], are based on concurrent separation logic (CSL) [8]. CSL and its derivatives are geared towards reasoning about the absence of interference between threads. They are thus unsuitable for fine-grained concurrency, which aims at avoiding (costly) locking mechanisms and thus deliberately incorporates interference as a means of inter-thread communication. Our logic can potentially be extended to include the frame rule from separation logic and thus use the combination of both worlds, rely-guarantee reasoning and separation logic, as proposed by References [44] and [15].

Interestingly, a number of papers propose to derive an operational semantics from a denotational/axiomatic semantics in which event graphs define the set of all possible behaviours, e.g., Reference [25]. In that paper, the authors develop an operational semantics for declarative Strong Release Acquire semantics (a memory model that includes non-multicopy atomic characteristics) and provide complexity results for the reachability problem in the resulting model.

Our work, in contrast, is based on a semantics that is defined operationally to begin with [9, 10]. It defines weak memory behaviour in terms of a thread-local reordering relation over pairs of instructions. Approaches to verification that are built on event graphs (i.e., those approaches that build on top of an axiomatic semantics defined in terms of semantic dependencies) inherently prohibit thread-local reasoning, as the semantic dependencies (which define the graph structures) cut across thread boundaries (in particular, the reads-from relation).

An approach to combat this disadvantage is presented in Reference [47]. That paper proposes an operational semantics for C11 programs that aims to support thread-local reasoning using an Owicki-Gries-style proof method. The semantics is based on the denotational semantics of Modular Relaxed Dependencies (MRD), which provides the semantic dependencies between events from which a tagged action graph is built. The tagged action graph defines all possible executions and drives the steps of the operational semantics. To enable reasoning over a single thread, the semantic dependencies of the action graph are curtailed to only consider future events (to be executed), which ignores the reads-from dependencies that reach across thread boundaries, thereby rendering the reasoning space thread-local (bar interferences).

The Owicki-Gries-style analysis in that approach is situated on the semantic level of all potential executions (i.e., the space of all next steps in any possible execution has to be analysed). Consequently, the approach directly considers predicates over an action graph and the possible views of different threads. This is in contrast to our approach, which considers reasoning over a standard memory representation with conditions to identify when this reasoning may fail in a weak memory context. Moreover, the Owicki-Gries-style analysis considers local correctness by exploring all possible reordering of a thread’s instructions. This is less efficient than the pair-wise technique we propose, however, we believe that the approach could be modified to exploit similar optimisations.

8 Conclusion

This article presents a truly thread-local approach to reasoning about concurrent code on a range of weak memory models. When considering multicopy atomic memory models, it employs standard rely/guarantee reasoning to handle interference between components and a separate check of reordering interference freedom to handle interference within a component due to weak memory behaviour.

Reordering interference freedom provides evidence that the weak memory model under consideration will not invalidate properties shown via standard rely/guarantee reasoning. It is a novel concept that hinges on a thread-local reordering semantics, which can be defined for any hardware architecture, as it is based on the notion of instruction dependence, a core concept of processor pipelining.

Importantly, our approach reduces the check of reordering interference to only pairs of instructions, thereby significantly reducing its complexity. In situations where freedom of reordering interference cannot be shown, our approach includes methods to amend the program, to prohibit reordering behaviour, or modify its verification conditions, such that stronger arguments for reordering interference freedom may be shown.

When considering non-multicopy atomic memory models, the approach is extended via a simple modification to the rely/guarantee notion of component compatibility. This novel compatibility property identifies the conditions under which rely/guarantee reasoning between two components will not be invalidated by the inconsistent observation of writes from other components. Critically, this modification only alters the approach’s rules when considering parallel composition, preserves the compositional nature of rely/guarantee reasoning, and extends the approach to support all widely implemented hardware memory models.

The article exemplifies an instantiation of the approach for a simple while language and NMCA memory model and uses it to verify the mutual exclusion property of Peterson’s algorithm extended to synchronise multiple components. These results, along with a soundness proof for our approach, have been encoded in Isabelle/HOL. In future work we intend to improve the precision of the techniques, addressing some of the concerns we raise, and improve tool support to ease verification.

Footnotes

To show that the rely conditions ${\mathcal {R}}_i$ satisfy the rely/guarantee judgement of our logic (without $\mathit {rif}$ or $\mathit {rif}_{nmca}$ checks), we have encoded the problem for three threads within Dafny [27] and performed the verification via the Boogie [4] backend. The Dafny code is available at https://bitbucket.org/wmmif/wmm-rg.

Note that $len(s)$ denotes the length of sequence s and $s_i$ provides its ith element.

A Soundness for the Proof System Under MCA

We include a proof sketch establishing soundness of the system under an MCA memory model. To prove soundness of the proof system, we reason over a program’s configuration traces. A configuration consists of a pair $(c, \sigma)$, containing a command c to be executed and state $\sigma$ (a mapping from variables to values) in which it executes. We denote the set of configurations by ${\mathcal {C}}$. We define the computations of a program as sequences of configurations that are linked via program or environment steps, starting with the initial configuration.²

\begin{equation} \begin{array}{rl} cp(c) ~~\widehat{=}~~\lbrace t \in {\sf seq}({\mathcal {C}}) \mid & \exists \sigma _0.\ t_0 = (c, \sigma _0) \\ & \wedge \, \forall i. \ 0 \lt i \lt len(t) .\ (t_{i-1}, t_{i}) \in (\overset{\!\!\scriptscriptstyle ps}{\rightarrow }\cup \overset{\!\!\scriptscriptstyle es}{\rightarrow }) \rbrace \end{array} \end{equation}

(16)

We define the properties of such computations below, where $t_i = (c_i, \sigma _i)$.

•

t satisfies precondition P if the initial state of t satisfies P,

$pre(t, P) ~~\widehat{=}~~ \sigma _0 \in P$

•

t adheres to rely condition ${\mathcal {R}}$ if all its environment steps adhere to ${\mathcal {R}}$,

$rels(t, {\mathcal {R}}) ~~\widehat{=}~~ (\forall i. 0\lt i \lt len(t). (t_{i-1}, t_i) \in \overset{\!\!\scriptscriptstyle es}{\rightarrow }\ \Rightarrow \, (\sigma _{i-1}, \sigma _i) \in {\mathcal {R}})$

•

t adheres to guarantee ${\mathcal {G}}$ if all its program steps adhere to ${\mathcal {G}}$,

$grs(t, G) ~~\widehat{=}~~ (\forall i. 0\lt i \lt len(t). (t_{i-1}, t_i) \in \overset{\!\!\scriptscriptstyle ps}{\rightarrow }\ \Rightarrow \, (\sigma _{i-1}, \sigma _i) \in G)$

•

t satisfies postcondition Q if upon termination its final state satisfies Q,

$post(t, Q)~~ \widehat{=}~~ len(t) \in {N} \wedge c_{len(t)-1}=\epsilon \Rightarrow \, \sigma _{len (t)-1} \in Q$.

Validity of a judgement over a program, $\models c ~{\sf sats}~ [P,{\mathcal {R}},{\mathcal {G}},Q]$, states that for all computations of the program if the initial state satisfies P and every environment step adheres to ${\mathcal {R}}$ then the final state, if the computation terminates, satisfies Q, and each program step adheres to ${\mathcal {G}}$.

\begin{equation} \begin{array}{rl} \models c ~{\sf sats}~ [P,{\mathcal {R}},{\mathcal {G}},Q] ~~\widehat{=}& \forall t \in cp(c).\, \\ {[}0.5ex] & pre(t, P) \wedge rels(t, {\mathcal {R}}) \Rightarrow grs(t, {\mathcal {G}}) \wedge post(t, Q) \end{array} \end{equation}

(17)

Soundness of the proof system requires that the rely/guarantee judgement implies its validity.

Theorem A.1.

Soundness

\begin{equation*} {\mathcal {R}}, {\mathcal {G}}\vdash P \lbrace c\rbrace Q ~\Rightarrow ~ \models c ~{\sf sats}~ [P,{\mathcal {R}},{\mathcal {G}},Q] \end{equation*}

Proof sketch:

At the heart of the proof, we have that if two sequentially composed instructions are deemed correct (via rely/guarantee judgements) and they can reorder and are reordering interference free, then the sequential composition of the reordered (and forwarded) instructions is correct. This follows straightforwardly from (9).

\begin{equation} \begin{array}{l} \ {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}P\lbrace \beta \rbrace M \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}M\lbrace \alpha \rbrace Q \wedge \mathit {rif}_{\sf a}({\mathcal {R}},{\mathcal {G}}, \beta , \alpha) ~ \Rightarrow ~ \\ {[}1ex] \exists M^{\prime }.\, {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}P\lbrace \alpha _{\langle \beta \rangle }\rbrace M^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}M^{\prime }\lbrace \beta \rbrace Q \end{array} \end{equation}

(18)

This property extends to programs as follows: If an instruction, which can reorder to the start of a program is interference free in the program, then correctness of the original program implies correctness of the executions in which the reordered instruction occurs first. This can be shown by structural induction using (18) and (10).

\begin{equation} \begin{array}{l} \ {\mathcal {R}},{\mathcal {G}}\vdash _{\sf c}P\lbrace r\rbrace M \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}M\lbrace \alpha \rbrace Q ~\wedge ~ \mathit {rif}_{\sf c}({\mathcal {R}},{\mathcal {G}}, r, \alpha) ~ \Rightarrow \\ {[}1ex] \exists P^{\prime }, M^{\prime }.\, P \subseteq P^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}P^{\prime }\lbrace \alpha _{\langle \ \langle r\rangle \ \rangle }\rbrace M^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf c}M^{\prime }\lbrace r\rbrace Q \end{array} \end{equation}

(19)

Note that we may weaken the precondition P of the original program to $P^{\prime }$ for the reordered program. This is to ensure all intermediate states in the reordered program (such as $M^{\prime }$) are stable in cases where conjuncts from P would lead to unstable conjuncts in intermediate states.

From (19), it follows that component level judgements over a program c are preserved across interference-free execution steps reordered to the start of c.

\begin{equation} \ \begin{array}{l} {\mathcal {R}},{\mathcal {G}}\vdash _{\sf c}P\lbrace c\rbrace Q \wedge \mathit {rif}_{\sf c}({\mathcal {R}},{\mathcal {G}},r,\alpha) \wedge c \mapsto _{\alpha _{\langle \ \langle r\rangle \ \rangle }} c^{\prime } \Rightarrow ~ \\ {[}1ex] \ \exists P^{\prime }, M^{\prime }.\, P\subseteq P^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}P^{\prime }\lbrace \alpha _{\langle \ \langle r\rangle \ \rangle }\rbrace M^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf c}M^{\prime }\lbrace c^{\prime }\rbrace Q \end{array} \end{equation}

(20)

Property (20) straightforwardly extends to global rely/guarantee judgements. Note that reordering interference freedom is a component-level condition and is hidden in the global rely/guarantee judgement $\vdash$. The global judgement can only be shown by lifting the component level judgement $\vdash _{\sf c}$ to the global level using proof rule [Comp] (see Figure 1), which includes $\mathit {rif}({\mathcal {R}}, {\mathcal {G}}, c)$ as a precondition.

\begin{equation} \begin{array}{l} \ {\mathcal {R}},{\mathcal {G}}\vdash P\lbrace c\rbrace Q ~\wedge ~ c \mapsto _{\alpha _{\langle \ \langle r\rangle \ \rangle }} c^{\prime }~\Rightarrow ~ \\ {[}1ex] \exists P^{\prime }, M^{\prime }.\, P\subseteq P^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash _{\sf a}P^{\prime }\lbrace \alpha _{\langle \ \langle r\rangle \ \rangle }\rbrace M^{\prime } \wedge {\mathcal {R}},{\mathcal {G}}\vdash M^{\prime }\lbrace c^{\prime }\rbrace Q \end{array} \end{equation}

(21)

With (21) and the component and global proof rules (Figure 1), we can prove via induction over environment and program steps of each computation the validity of the global judgement. If the program is correct and for each computation of the program it holds that the precondition is satisfied by the initial state and each environment step adheres to the rely condition, then the final state satisfies the postcondition (in the case when the computation terminates) and all program steps adhere to the guarantee condition.

\begin{equation*} \begin{array}{c} t \in cp(c) \wedge {\mathcal {R}},{\mathcal {G}}\vdash P\lbrace c\rbrace Q \wedge pre(t,P) \wedge rels(t, {\mathcal {R}}) ~\Rightarrow ~ post(t, Q) \wedge grs(t, {\mathcal {G}}) \end{array} \end{equation*}

Using (17), this proves the theorem.

B Soundness for the Proof System Under NMCA

We include a proof sketch establishing soundness of the system under an NMCA memory model. Under NMCA, the state is modelled as a write history $h= \langle {(x \mapsto v)^{wr}_{rds}} , \ldots \rangle$ and the system command (i.e., the program) c contains all threads, each modelled as a thread identifier paired with the thread’s command such that the command of thread i can be accessed via $c(i)$. The components function $ids(c)$ provides the set of thread identifiers contributing to the program c. An NMCA configuration consists of pairs $(c, h)$, the set of configurations is denoted ${\mathcal {C}}_n$. Additionally, we project the write history h into the different views for the threads, which identifies the local state of each thread i as ${\sf view}_{i}(h)$.

We define the computations of a program as (terminating) sequences of NMCA configurations that are linked via program or propagation steps, starting with the initial configuration and ending in a terminal configuration (with an empty program). Note that each step now takes a global view of the system and is defined as a relation over write histories.

The proof relies on similar definitions to the MCA model given in Appendix A. We outline here the definitions that deviate. Terminating traces are defined as follows using the notation $\overset{\!\!\scriptscriptstyle prp}{\rightarrow }~=~ prp$ for the set of propagation steps (from Definition (13)):

\begin{equation} \begin{array}{rl} cp_{nmca}(c) ~\widehat{=}~ \lbrace s \in {\sf seq}({\mathcal {C}}_n) \mid & \exists h_0.\ s_0 = (c, h_0) \\ {[}1ex] & \wedge \, \forall i. \ 0 \lt i \lt len(s) .\ (s_{i-1}, s_{i}) \in (\overset{\!\!\scriptscriptstyle ps}{\rightarrow }\cup \overset{\!\!\scriptscriptstyle prp}{\rightarrow }) \\ {[}1ex] & \wedge \,(s_{len(s)-1} = (c_n, h_n) \Rightarrow \bigwedge _{t \in ids(c)}c_n(t) = \epsilon) \rbrace . \end{array} \end{equation}

(22)

We define the properties of such computations below, where $s \in cp_{nmca}(c)$ and $s_i = (c_i, h_i)$.

•

A history h is propagated to thread t if none of the writes in h are still pending to be propagated to t and thus t has a complete view of h.

$prop(t, h) ~~\widehat{=}~~ \lbrace wrs^i_{rds} \in h \mid t \notin rds\rbrace = \emptyset$

•

A computation s, with $s_0=(c_o,h_0)$, satisfies precondition P if the initial configuration of s is propagated to all threads and all of these views satisfy P.

$pre_{nmca}(s, P) ~~\widehat{=}~~ \forall t\in ids(c). prop(t, h_0) \wedge view_{t}(h_0) \in P$

•

A computation s adheres to guarantee ${\mathcal {G}}$ if all its program steps adhere to ${\mathcal {G}}$.

$\begin{array}{rl} grs_{nmca}(s,{\mathcal {G}}) ~~~\widehat{=}~~~ \forall t\in ids(c).\, & \forall i. 0\lt i \lt len(s). \\ & (h_{i-1}, h_i) \in \overset{\!\!\scriptscriptstyle ps}{\rightarrow }\ \Rightarrow \, (view_{t}(h_{i-1}), view_{t}(h_i)) \in {\mathcal {G}}_{t} \end{array}$

•

A computation s satisfies postcondition Q if for each thread the final view, if propagated, satisfies Q.

$post_{nmca}(s, Q)~~ \widehat{=}~~ \forall t\in ids(c). prop(t, h_{len(s_-1)}) \Rightarrow view_{t}(h_{len(s)-1}) \in Q$

Validity of a judgement over a program, $\models c ~{\sf sats}~ [P,{\mathcal {R}},{\mathcal {G}},Q]$, states that for all computations of the program if the initial state satisfies P and the history has been propagated to each component in the system, then each program step adheres to ${\mathcal {G}}$ and the final state, if the computation terminates, satisfies Q.

\begin{equation} \begin{array}{l} \models c~{\sf sats}_{nmca}~ [P,{\mathcal {R}},{\mathcal {G}},Q] ~~\widehat{=}~~ \forall s \in cp_{nmca}(c).\, s_0=(\_ ~, h_0) \\ {[}1ex] \ \wedge \,pre_{nmca}(s_0, P) \wedge (\forall t \in ids(c).\ prop(t, h_0)) \Rightarrow grs_{nmca}(s, {\mathcal {G}}) \wedge post_{nmca}(s, Q) \end{array} \end{equation}

(23)

The condition that the history is fully propagated includes the fact that each of the writes that has been passed from the set of pending writes to each component’s view is adhering to ${\mathcal {R}}$, since the effects of the environment constitute the set of pending events.

Soundness of the proof system under NMCA (as introduced in Section 4.2) requires that the rely/guarantee judgement implies its (NMCA) validity.

Theorem B.1.

Soundness

\begin{equation*} {\mathcal {R}}, {\mathcal {G}}\vdash P \lbrace c\rbrace Q ~\Rightarrow ~ \models c~{\sf sats}_{nmca}~ [P,{\mathcal {R}},{\mathcal {G}},Q] \end{equation*}

Proof sketch:

The proof of the soundness theorem for NMCA systems follows a similar argument to the proof of soundness for MCA systems for each component in the system. Additionally, we have to show the following properties for each component $j \in ids(c)$:

Let $\Delta _j$ be the set of pending writes in the history that have not been propagated to component j (i.e., $\Delta _j ~\widehat{=}~ \lbrace wr^i_{rds} \mid j \notin rds \rbrace$) and $\Delta ^i_{j}$ the set of pending writes that component i has read but j has not (i.e., $\Delta ^i_{j} ~\widehat{=}~ \lbrace wr^i_{rds} \mid j \notin rds \wedge i \in rds\rbrace$).

(1)

All writes $wr^i_{rds} \in \Delta _j$ are required when executed under $view_i(h)$ to lead to the component’s postcondition $Q_i$ (assuming $\bigcap _{i \in ids(c)} Q_i \subseteq Q$), satisfy ${\mathcal {G}}_i$, and their verification condition being stable under ${\mathcal {R}}_i$, i.e., ${\mathcal {R}}_i, {\mathcal {G}}_i \vdash view_i(h) ~\lbrace c_i\rbrace ~ Q_i$ has to hold.

(2)

Reorderable writes from the same component in $\Delta _j$ are independent such that the stability of their respective pre- and postconditions is granted under their respective ${\mathcal {R}}_k$ (where k is the writer of the instruction under consideration). This is a similar condition to $\mathit {rif}$ over the instructions of $\Delta _j$; it is, however, weaker in that the focus is on a particular sequence of instructions under the particular $view_j(h)$.

(3)

For components $i \ne j$, the instructions in $\Delta ^i_{j}$ and $c_i$ are reorder-interference-free, i.e., $\mathit {rif}({\mathcal {R}}_i, {\mathcal {G}}_i, \langle \Delta ^i_{j}; c_i\rangle)$, where $\langle \Delta ^i_{j}; c_i\rangle$ represents the sequential composition of all instructions in $\Delta ^i_{j}$ with $c_i$ (this condition relates back to the step $\bigcirc$ in Figure 2).

One can prove that each trace that satisfies the logical rely-guarantee judgement and satisfies these additional conditions on the initial configuration will in its terminal configuration, once all pending writes have been propagated to all components, satisfy the postcondition (i.e., $post(s,Q)$) as well as satisfying its guarantees along the way (i.e., $grs(s, {\mathcal {G}})$).

It remains to show that each trace generated by the logic will satisfy these additional conditions, which follows from the given definition of the rules and the compatibility conditions ${\sf compat}({\mathcal {G}}_i, {\mathcal {R}}_i, {\mathcal {R}}_j)$ given in Section 4.3.

References

[1]

Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. 2017. Context-bounded analysis for POWER. In Proceedings of the 23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’17)(Lecture Notes in Computer Science, Vol. 10206). 56–74. DOI:

Abstract

1 Introduction

2 Preliminaries

2.1 Rely/guarantee Reasoning

3 Multicopy Atomic Memory Models

3.1 Reordering Semantics

3.2 Reordering Interference Freedom

3.3 Computing All Reorderable Instructions

3.3.1 Address Calculations.

3.4 Interference Checking

3.5 Elimination of Reordering Interference

3.5.1 Strengthening.

3.5.2 Ignored Reads.

3.6 Soundness

3.7 Precision

4 Non-multicopy Atomic Weak Memory Models

4.1 Write History Semantics

4.2 Reasoning under NMCA

4.2.1 Relating a Pair of Views.

4.2.2 Reordering Before Δ i,j.

4.2.3 Reordering After Δj,i.

4.3 NMCA Rules

4.4 Soundness

4.5 Example for Reasoning on NMCA Behaviour

4.6 Precision

5 Instantiating the Proof System

6 Peterson’s Mutual Exclusion Algorithm

7 Related Work

8 Conclusion

Footnotes

A Soundness for the Proof System Under MCA

B Soundness for the Proof System Under NMCA

References

Cited By

Index Terms

Recommendations

Unifying Operational Weak Memory Verification: An Axiomatic Approach

Alone together: compositional reasoning and inference for weak isolation

Rely/Guarantee Reasoning for Multicopy Atomic Weak Memory Models

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Login options

Full Access

Share

Share this Publication link

Share on social media

Affiliations

4.2.2 Reordering Before Δ _i,j.

4.2.3 Reordering After Δ_j,i.