
Trace Semantics for C++11 Memory Model

Published: 05 September 2024

Abstract

In 2011, the C and C++ languages introduced relaxed-memory concurrency into their language specifications for efficiency purposes. Trace semantics can provide a mathematical foundation for the C++11 memory model, yet trace semantics for C++11 has not been investigated so far.
The Promising Semantics (PS) of Kang et al. provides a standard SC-style operational semantics for the C++11 concurrency model, where “SC” refers to “Sequential Consistency”. Inspired by PS, in this article we first investigate the trace semantics for relaxed read and write accesses under C++11, in the denotational semantics style. In our semantic model, a trace is a sequence of snapshots, where each snapshot records the modification of the relevant global or local variables as well as the thread view. Moreover, the trace semantics for release/acquire accesses under C++11 is also explored, based on separated thread views and newly added message views. In this trace model, different accesses contribute their own kinds of snapshots and have distinct effects on the production of the sequences.
For any given program, the trace semantics proposed in this article produces all the valid traces directly. Furthermore, our trace semantics, together with that for TSO and MCA ARMv8, can potentially serve as the foundation of a meta model of trace semantics for weak memory models.

1 Introduction

Consistency models are applied to determine the shared memory behaviors of concurrent programs [13]. Sequential Consistency (SC), first introduced by Lamport [14, 15], is the strongest and most intuitive consistency model. It states that a multi-threaded execution appears as an interleaving of the sequential executions of its threads, and the result of an execution of a single thread is the same as if the operations had been performed in the order specified by the program [25].
Modern hardware architectures like x86 [9, 24], ARM [7], and Power [23] all employ weak (relaxed) memory models. These models enable hardware and compiler optimizations, but produce behaviors that do not conform to SC semantics. Mainstream languages such as C++ [1] and Java [19] also provide weak memory models for efficiency purposes.
The C++11 memory model introduces several memory orders, and we concentrate on two of them in this article: (1) The relaxed memory ordering paradigm (rlx in Figure 1) imposes no additional constraints on reordering and only guarantees coherence, i.e., the property that all memory accesses to a given location conform to the program order. (2) The release-acquire memory ordering paradigm (ar in Figure 1) provides lightweight inter-thread synchronization between threads.
Fig. 1. Memory orderings.
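To make the two orderings concrete, the following minimal C++11 sketch (our own illustration; the names data and flag are not from the article) contrasts a relaxed store with a release/acquire message-passing pair.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0};
std::atomic<int> flag{0};

void producer() {
    // rlx: no ordering beyond per-location coherence.
    data.store(42, std::memory_order_relaxed);
    // Release write: everything written above becomes visible to a
    // thread that acquire-reads this write.
    flag.store(1, std::memory_order_release);
}

void consumer() {
    // Acquire read: synchronizes with the release write once it reads 1.
    while (flag.load(std::memory_order_acquire) != 1) {}
    // After the synchronization, data == 42 is guaranteed to be visible.
    assert(data.load(std::memory_order_relaxed) == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}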
Usually, the C++11 memory model is expressed in an axiomatic style [2]. For a given program, its valid executions are defined by the graphs of memory access events following a set of coherence axioms. The newly proposed Promising Semantics (PS) [10, 16] provides the standard SC-style operational semantics for the C++11 concurrency model, by utilizing the concepts of promise and time stamp.
A thread T may promise to write a value v to a memory location x at some point in the future. For threads other than T, a promise is the same as an ordinary memory write, so they are allowed to read from such a write out of order. To prevent out-of-thin-air (OOTA) [10] behaviors, the promise must be fulfilled later. Only when the fulfill operation has been completed can the thread T read from its own promised write.
For each thread T, every memory location x holds a time stamp t whose initial value is 0. The time stamps of all the memory locations appearing in T form the view of the thread T. When T wants to read from x, it has to read from such a message whose time stamp \(t^{\prime }\) recorded for x is greater than or equal to that in T’s view (e.g., the starting point of the red line in Figure 2). And then T’s view of x is modified to \(t^{\prime }\) (i.e., the red dot in Figure 2). When T writes to x, it picks a time stamp \(t^{\prime }\) which is greater than that recorded for x in T’s view to generate a new message. Afterward, we update T’s view of x to match \(t^{\prime }\) .
Fig. 2. The modification in thread view.
Now, we use the classic example below to help to understand PS. Suppose that \(T_1\) and \(T_2\) promise to write 1 to memory locations x and y at time stamps 1 and 2, respectively. Thus, the messages \(\langle x:0@0 \rangle\) , \(\langle y:0@0 \rangle\) , \(\langle x:1@1 \rangle ,\) and \(\langle y:1@2 \rangle\) are all in the memory. Here, the message is the triple in the form of \(\langle x:v@t \rangle\) , where x is a location, v is a value and t is a time stamp.
\begin{equation*} \begin{array}{l||l||l||l} \ \ \ T_1&\ \ \ T_2&\ \ \ T_3&\ \ \ T_4\\ x:=1 & y:=1 & a:=x; \ \ //1&c:=y;\ \ //1\\ & & b:=y\ \ \ \ //0&d:=x \ \ \ //0 \end{array} \end{equation*}
For \(T_3\) , at first, the time stamps for x and y are both 0. When carrying out the read operation from x, it can select the message \(\langle x:1@1 \rangle\) , and then the value of a is 1 and the time stamp for x is updated to 1. Afterward, \(\langle y:0@0 \rangle\) is chosen and the read from y finishes. In such a case, the value 0 is returned to variable b. The analysis of the reads in \(T_4\) is similar. Since the time stamp for every location in each thread is independent, now x’s time stamp and y’s time stamp in \(T_4\) are both 0. When conducting the two reads from y and x, \(T_4\) is able to choose the messages \(\langle y:1@2 \rangle\) and \(\langle x:0@0 \rangle\) , respectively. It means that the values of c and d will become 1 and 0. Therefore, it is possible for a, b, c, and d to obtain 1, 0, 1, and 0 in the same execution, which is impossible under SC. In order to refer to a specific observation of the program above, we annotate the corresponding reads with the values expected to be read.
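The program above can be rendered directly in C++11. The sketch below (our own; variable names are illustrative) uses memory_order_relaxed, under which the annotated outcome a = 1, b = 0, c = 1, d = 0 is permitted.

#include <atomic>
#include <thread>

// A C++11 rendering of the four-thread program above.
std::atomic<int> x{0}, y{0};
int a, b, c, d;

int main() {
    std::thread t1([] { x.store(1, std::memory_order_relaxed); });
    std::thread t2([] { y.store(1, std::memory_order_relaxed); });
    std::thread t3([] {
        a = x.load(std::memory_order_relaxed);  // may read 1
        b = y.load(std::memory_order_relaxed);  // may still read 0
    });
    std::thread t4([] {
        c = y.load(std::memory_order_relaxed);  // may read 1
        d = x.load(std::memory_order_relaxed);  // may still read 0
    });
    t1.join(); t2.join(); t3.join(); t4.join();
}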
Trace semantics is given in the denotational semantics style. Compared with operational semantics [21], denotational semantics indicates what a program does. The important advantages of denotational semantics [26] are: (1) “The behavior of each program can be predicted without actually executing it on a computer, and similarly the semantics of a program language can be understood as a whole without visualizing how programs run on a computer” ([27], page 49). (2) “Based on mathematical theory, we can reason about programs, for example, to prove that one program is equivalent to another” ([27], page 49). Specifically, for any given program under relaxed and release/acquire accesses, the trace semantics proposed in this article can output all the valid traces directly. Further, the trace semantics for the C++11 memory model, together with that for TSO and MCA ARMv8 [29, 31], can be regarded as an attempt to provide the meta model of the trace semantics for weak memory models.
In this article, inspired by PS, we first study the denotational semantics for the relaxed accesses under C++11, where the trace structure is applied. In our semantic model, a trace is expressed as a sequence of snapshots. The snapshots exhibit the occurrence of relaxed read and write accesses, and record the modification in each thread view. To get all the valid executions, we add the constraints that the time stamp recorded for each location is monotonically increasing in each thread, and any two threads cannot write to the same location at the same time stamp. Then, the trace semantics for the release/acquire accesses under C++11 is also explored, with the view of each thread being separated and the message views being added. Here, we focus on how the appearance of each snapshot affects the production of the traces. The framework of our work is given in the purple box of Figure 3.
Fig. 3. The framework of our work.
Meanwhile, we also explore the algebraic laws for C++11, including a set of sequential and parallel expansion laws [30]. The correctness of the algebraic semantics can be proved, based on the achieved trace semantics.
The remainder of this article is organized as follows. Section 2 gives the semantic model for relaxed accesses under C++11. In Section 3, we investigate the trace semantics for relaxed accesses. The semantic model for the release/acquire accesses is provided in Section 4. Section 5 explores the trace semantics for the release/acquire accesses. We discuss the possibility to provide the meta model of the trace semantics for weak memory models in Section 6. Section 7 presents the related work, and we conclude the article and discuss the future work in Section 8. We leave some definitions and analyses in the appendix.

2 Semantic Model for Relaxed Accesses

The “relaxed” memory ordering paradigm guarantees coherence but is weak enough to require no hardware fences in its implementation.
Similar to the analysis of the example in the previous section, the result \(a=b=0\) can be observed in the same execution of the program below.
\begin{equation*} \begin{array}{l||l} \ \ \ T_1&\ \ \ T_2\\ x:=1; & y:=1;\\ a:=y& b:=x \end{array} \end{equation*}
For instance, \(T_1\) and \(T_2\) first promise to write the value 1 to x and y at the time stamps 1 and 2, respectively. Now, the memory is composed of four messages \(\langle x:0 @0 \rangle\) , \(\langle y:0 @0 \rangle\) , \(\langle x:1 @1 \rangle\) and \(\langle y:1 @2 \rangle\) . Considering \(T_1\) , the fulfill operation on x does not have an influence on the time stamp recorded for y in this thread, and y’s time stamp is still 0. Thus, \(\langle y:0 @0 \rangle\) can be chosen and the value 0 can be assigned to a.
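For reference, the store-buffering program above can be written with C++11 relaxed atomics as follows (an illustrative sketch, not part of the formal development); the outcome a = b = 0 is allowed.

#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int a, b;

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        a = y.load(std::memory_order_relaxed);  // may read 0
    });
    std::thread t2([] {
        y.store(1, std::memory_order_relaxed);
        b = x.load(std::memory_order_relaxed);  // may read 0
    });
    t1.join();
    t2.join();
}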
Now, we study the trace model for the relaxed accesses under C++11, which is useful for the investigation of trace semantics in the next section. We illustrate the behaviors of a process with traces (sequences of snapshots). A snapshot in a trace is in the form of \((msg, view, oflag, eflag)\) , where:
We use msg to record the changes brought by statements, and it is expressed as \((var, val, t)\) . When the time stamp t is not null, msg indicates a promise.
The parameter view is applied to keep track of the newest time stamps of the locations related to the statement which takes effects just now. It is made up of a variety of pairs, and each pair is expressed in the form of \((var, t)\) .
oflag distinguishes different types of operations.
When writing to a global variable, promising to write to a location sets oflag to 1, and fulfilling this promise sets oflag to 2.
\(oflag=3\) denotes writing to a local variable, and \(oflag=0\) indicates the branching condition.
For a thread, if one operation is performed by the thread itself, the parameter eflag is 1. Otherwise, the operation is contributed by its environment, and eflag is 0 instead. In this way, the environment's behaviors can be included.
We also define the projection function \(\pi _i(i \in \lbrace 1,2,3,4\rbrace)\) to get the ith element of the snapshot, e.g., \(\pi _1(msg,view,oflag,eflag) = msg\) . For msg, we will use the function \(\pi _i(i \in \lbrace 1,2,3\rbrace)\) to achieve the corresponding variable, value and time stamp, i.e., \(\pi _1(\pi _1(msg,view,oflag,eflag)) = var\) , \(\pi _2(\pi _1(msg,view,oflag,eflag)) = val\) and \(\pi _3(\pi _1(msg,view,oflag,eflag)) = t\) .
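As a purely illustrative encoding (the type and function names below are ours, not part of the formal model), a snapshot and its projections might be represented as follows.

#include <map>
#include <optional>
#include <string>
#include <vector>

using Var = std::string;
using Time = int;

struct Msg {                        // (var, val, t); an empty t plays the role of "null"
    Var var;
    int val;
    std::optional<Time> t;          // a non-null t marks the snapshot of a promise
};

using View = std::map<Var, Time>;   // a set of pairs (var, t)

struct Snapshot {                   // (msg, view, oflag, eflag)
    Msg msg;
    View view;
    int oflag;                      // 1: promise, 2: fulfill, 3: local write, 0: condition
    int eflag;                      // 1: by the thread itself, 0: by the environment
};

using Trace = std::vector<Snapshot>;

// Projections pi_1 .. pi_4 on a snapshot.
const Msg&  pi1(const Snapshot& s) { return s.msg; }
const View& pi2(const Snapshot& s) { return s.view; }
int         pi3(const Snapshot& s) { return s.oflag; }
int         pi4(const Snapshot& s) { return s.eflag; }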

3 Trace Semantics for Relaxed Accesses

In this section, we present the trace semantics for the relaxed accesses under C++11. We use \(traces(P)\) to indicate all the valid behaviors of a process P (including the weak but allowed behaviors).
Local Assignment. As a convention, we write a, b, and c for local variables, and we use the third parameter 3 in the snapshot to denote the local assignment.
\begin{align*} &traces(a:=e) =_{df} \lbrace s ^\wedge \langle (\underline{(a,r(e),\text{null})}, \underline{view(e)}, \underline{3}, \underline{1}) \rangle \ | \ \pi _4^*(s) \in 0^* \rbrace . \end{align*}
Here, \(\pi _4^*(s) \in 0^*\) informs that eflag in every added snapshot is 0, i.e., these newly added snapshots are contributed by the environment. As discussed in Section 2, the function \(\pi _4\) is to get the fourth element of a snapshot. Then, the notation \(\pi _4^*(s)\) denotes the repeated execution of \(\pi _4\) on each snapshot in the trace s. Furthermore, \(0^*\) stands for the sequence containing any number of the integer 0.
Example 1.
Let \(P = _{df} a:=x\) , \(Q=_{df}x:=1\) . For \(P||Q\) , Q is the parallel component of P. Now, we consider the trace semantics of P, which can help us to understand the inserted trace s in the definition of \(traces(a:=e)\) .
The sequence in Figure 4 is one trace in \(traces(P)\) , where the last snapshot is contributed by P and has 1 as its fourth element, while the other two snapshots are produced by P's environment (i.e., Q) with eflag being 0.
Fig. 4. The introduction to the inserted trace s.
Read Function. Next, we present the read function r appearing in the above formalization; its framework is given in Figure 5. Function r is split into a global read function g and a local read function l. Based on the time stamp provided by function cv (step 1 in Figure 5), function g collects all the possible snapshots (step 2). It then selects one snapshot randomly (step 3) and obtains the concrete value from it (step 4). At the same time, the parameter view is also generated (step 5).
Fig. 5. The framework of read function under relaxed accesses.
Now, we explain and formalize the function r in detail. First of all, we need to check whether the variable read from is global or not. If so, we introduce the function named g to continue the following operations. Otherwise, the function l is provided.
\begin{align*} r(x, tr) = _{df} g(x, tr) \triangleleft x \in Globals \triangleright l(x, tr). \end{align*}
Here, Globals denotes the set of all global variables, and tr represents the trace prefixing the read statement. The symbol \(e \triangleleft h \triangleright f\) stands for e when the judgment h is true, otherwise f. Note that, we only use \(r(x)\) in the snapshot in a trace for simplicity. \(r(e)\) requires us to carry out the read function for each variable in the expression e. For example, \(r(x+y)\) can be expressed as \(r(x)+r(y)\) . After getting the values of x and y, respectively, the value of e can be calculated.
For explaining g better, we first define the function cv. When a thread T would like to read from a memory location x, it must read from a message whose time stamp is not less than that recorded for x in its view. The function cv is applied to obtain this time stamp recorded in T. When \((x,t_x)\) and \((y,t_y)\) are both stored in view, we will use \(view[x]\) and \(view[y]\) to get \(t_x\) and \(t_y\) , respectively.
Here, we use ttr instead, which is formed by concatenating tr and the sub-snapshot sp until the read operation on x, shown by the shadow area in the formalization above. We use Figure 6 to facilitate understanding ttr. For the sequential program \(x:=1;a := x + x\) , upon reading from x on the right-hand side of the operator “+”, tr means \(\langle ((x,1,1),\lbrace \rbrace ,1,1), ((x,1,\text{null}),\lbrace (x,1)\rbrace ,2,1) \rangle\) . The generation of such a trace will be explained later.
Fig. 6. The difference between tr, \(tr^{\prime }\) , and ttr.
The notation sp records the information brought by the read from x on the left-hand side of “+”, and then it refers to \(((\_, r(x), \text{null}), view(x), 3, 1)\) , where “ \(\_\) ” means that the assigned variable in \(a:=x+x\) has not gotten the final value. Then, ttr is in the form of \(tr ^\wedge \langle sp \rangle\) , that is, \(\langle ((x,1,1),\lbrace \rbrace ,1,1), ((x,1,\text{null}),\lbrace (x,1)\rbrace ,2,1), ((\_, r(x),\text{null}),view(x),3,1)\rangle\) . In consequence, the case that more than one read from x appears in one statement also conforms to the coherence axiom. The notation \(lt(ttr)\) stands for the last snapshot of the trace ttr, and \(ft(ttr)\) is to denote the result of removing the last snapshot in the trace ttr. Note that, \(tr = ttr \backslash lt(ttr)\) , and we will use the two symbols in functions directly.
We know that the thread T cannot read from its own promised write directly before the promise has been fulfilled, because this would violate per-location coherence. Thus, as the read operation proceeds, there are mainly two situations: (1) One is that it may read from the promises made by other threads. (2) The other is that the snapshots, which reflect the effects of the fulfill operations and are contributed by the thread itself, can be read, as shown by the shadow area in the formalization below. When searching the trace prefixing the read operation in reverse order, there may exist some items that satisfy the above conditions, which are all collected by the function gs. Because the trace tr records the changes of various variables, we should additionally add the initial snapshot of x when reading from it, which is formalized as \(tr^{\prime }\) .
Here, the function ASCII returns the character code of a symbol, and it is used to compare variable names for equality.
Then, any of them can be returned, and we introduce the function rand to select one element randomly. After the read from x completes via the function \(\pi _2(\pi _1(gv(x,tr)))\) , the time stamp recorded for x in the thread's view should be updated through \(view(x,ttr)\) in the formalization above. Similar to \(r(x)\) , we only use \(view(x)\) in the snapshot in a trace for simplicity. The effects on the thread view brought by a variety of reads in the expression e happen in the order specified by operators.
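The following C++ sketch gives one possible, simplified reading of cv, gs, and g; it reuses the illustrative Snapshot encoding from Section 2 and glosses over several details of the formal definitions (in particular the exact view bookkeeping in tr and ttr).

#include <cstdlib>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Reuses the illustrative encoding sketched in Section 2.
struct Msg { std::string var; int val; std::optional<int> t; };
using View = std::map<std::string, int>;
struct Snapshot { Msg msg; View view; int oflag; int eflag; };
using Trace = std::vector<Snapshot>;

// cv: the time stamp currently recorded for x in the thread's view, i.e., the
// latest view entry for x contributed by the thread itself (0 if none).
int cv(const Trace& ttr, const std::string& x) {
    for (auto it = ttr.rbegin(); it != ttr.rend(); ++it)
        if (it->eflag == 1 && it->view.count(x)) return it->view.at(x);
    return 0;
}

// gs: collect the snapshots x may be read from -- fulfills contributed by the
// thread itself, or promises made by the environment -- whose time stamp for x
// is at least t. The initial message <x:0@0> is prepended, as in tr'.
std::vector<Snapshot> gs(const std::string& x, int t, Trace tr) {
    tr.insert(tr.begin(), Snapshot{ Msg{x, 0, std::nullopt}, View{{x, 0}}, 2, 1 });
    std::vector<Snapshot> out;
    for (const Snapshot& sp : tr) {
        if (sp.msg.var != x) continue;
        bool own_fulfill = sp.eflag == 1 && sp.oflag == 2
                           && sp.view.count(x) && sp.view.at(x) >= t;
        bool env_promise = sp.eflag == 0 && sp.oflag == 1
                           && sp.msg.t.has_value() && *sp.msg.t >= t;
        if (own_fulfill || env_promise) out.push_back(sp);
    }
    return out;
}

// g: read x by picking one readable snapshot at random.
int g(const std::string& x, const Trace& ttr) {
    std::vector<Snapshot> cands = gs(x, cv(ttr, x), ttr);
    if (cands.empty()) return 0;  // defensive: no readable message in this simplified sketch
    return cands[std::rand() % cands.size()].msg.val;
}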
Example 2.
Consider the program in Example 1. When conducting the read from x, there are mainly two cases, described in Figure 7. Here, we only explain Case 2 in detail.
Fig. 7. The explanation of read function under relaxed accesses.
(1)
In the beginning, Thread 2 promises to write the value 1 to the location x at the time stamp 1, and the promise has been added in the memory.
(2)
In Thread 1, x's time stamp is 0 at present, and then both the values 0 and 1 can be read. If 1 is returned, the time stamp is modified to 1. Otherwise, it remains unchanged.
When T reads from a local variable x, it searches the previous trace for a snapshot whose oflag, eflag, and assigned variable are 3, 1, and x, respectively. The detailed formalization is omitted here.
Global Assignment. The write to a memory location x can be simulated by first promising to write to x and then fulfilling the promise, which can be described by the boxed areas in the formalization below.
Similar to the explanation of local assignment, the environment can perform any number of operations before each step of global assignment. Thus, two sub-traces u and v are inserted, which are contributed by the environment.
In the above trace, when making a promise to x, the constant c and time stamp t are chosen randomly. However, only the trace whose c is equal to the calculated \(r(e)\) and whose t is greater than \(view(e)[x]\) in the fulfill operation is valid. Under the operational semantics [10], an extra rule is needed to guarantee that all promises a thread makes are fulfillable; this is realized here by the former constraint. The reason for the latter constraint is that a thread must pick a time stamp strictly greater than that recorded in its view when it writes to x, which is modeled in the shadow area. Note that the call of the read function \(r(e)\) happens in the snapshots, and \(r(e)\) in other places only indicates the return value.
Sequential Composition. In this section, we focus on the trace semantics of sequential composition. It can be regarded as a conditional interleaving, formalized as seqcom, subject to some constraints on the time stamps appearing in the interleaved sequence s, modeled as \(inc(s)\) . Then, the trace semantics of sequential composition is provided.
\begin{align*} &u;v = _{df} \lbrace s | s \in seqcom(u,v) \wedge inc(s)\rbrace \\ &traces(P;Q) = \bigcup \limits _{c} u;v, \ \text{where, }c = u \in traces(P) \wedge v \in traces(Q). \end{align*}
Now, we give the definition of the function seqcom in the following. The result of interleaving two empty traces is still empty. If one of them is empty and the other is nonempty, the result follows the nonempty one.
\begin{align*} &seqcom(u,v) = _{df} \left(\begin{array}{l} hd(u) ^\wedge seqcom(tl(u),v)\cup \\ \left(\begin{array}{l} hd(v) ^\wedge seqcom (u,tl(v))\\ \triangleleft \left(\begin{array}{l} \pi _4(hd(v))=0 \vee (\pi _4(hd(v))=1 \wedge \pi _3(\pi _1(hd(v)))!=\text{null}) \end{array} \right) \triangleright \\ \phi \end{array} \right) \end{array}\right)\\ &\text{where, }seqcom(u,\langle \rangle) = _{df} \lbrace u\rbrace ,\ seqcom(\langle \rangle ,v) = _{df} \lbrace v\rbrace ,\ seqcom(\langle \rangle ,\langle \rangle) = _{df} \lbrace \langle \rangle \rbrace . \end{align*}
The first snapshot in the former trace u, formalized by \(hd(u)\) , can always be scheduled. However, for the first snapshot in v to be triggered, some conditions must be satisfied: (1) this snapshot is produced by the environment, or (2) it is of a promise contributed by the thread itself. These conditions make the difference between the traditional interleaving semantics and the one introduced here.
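A simplified executable reading of seqcom, reusing the illustrative Snapshot encoding from Section 2 and leaving the inc constraint to a separate check, might look as follows.

#include <map>
#include <optional>
#include <string>
#include <vector>

// Reuses the illustrative encoding sketched in Section 2.
struct Msg { std::string var; int val; std::optional<int> t; };
using View = std::map<std::string, int>;
struct Snapshot { Msg msg; View view; int oflag; int eflag; };
using Trace = std::vector<Snapshot>;

// seqcom: conditional interleaving of u and v. The head of u can always be
// scheduled; the head of v may only jump ahead if it was produced by the
// environment (eflag == 0) or if it is a promise made by the thread itself
// (eflag == 1 and the message carries a non-null time stamp).
std::vector<Trace> seqcom(const Trace& u, const Trace& v) {
    if (u.empty() && v.empty()) return { Trace{} };
    if (u.empty()) return { v };
    if (v.empty()) return { u };

    std::vector<Trace> result;
    auto prepend = [&result](const Snapshot& head, const std::vector<Trace>& tails) {
        for (Trace t : tails) {
            t.insert(t.begin(), head);
            result.push_back(t);
        }
    };
    prepend(u.front(), seqcom(Trace(u.begin() + 1, u.end()), v));
    const Snapshot& hv = v.front();
    if (hv.eflag == 0 || (hv.eflag == 1 && hv.msg.t.has_value()))
        prepend(hv, seqcom(u, Trace(v.begin() + 1, v.end())));
    return result;
}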
Then, we explain the constraints, formalized as \(inc(s)\) , on the time stamps occurring in s, interleaved from u and v.
\begin{align*} inc(s)& = _{df} \forall x \in Globals, \forall np_1,np_2 \in s \bullet \\ & \left(\begin{array}{l} \left(\begin{array}{l} \pi _4(np_1) = \pi _4(np_2) = 1 \wedge \\ \pi _2(np_1)[x]!=\text{null} \wedge \\ \pi _2(np_2)[x]!=\text{null} \wedge \\ np_1 \ltimes np_2 \end{array}\right) \rightarrow \left(\begin{array}{l} ((\pi _3(np_1) = 2 \wedge \pi _3(np_2) = 2) \\ \ \ \ \ \ \ \ \ \ \ \rightarrow \pi _2(np_1)[x]\lt \pi _2(np_2)[x]) \wedge \\ ((\pi _3(np_1) = 3 \wedge \pi _3(np_2) = 2) \\ \ \ \ \ \ \ \ \ \ \ \rightarrow \pi _2(np_1)[x]\lt \pi _2(np_2)[x]) \wedge \\ ((\pi _3(np_1) = 2 \wedge \pi _3(np_2) = 3) \\ \ \ \ \ \ \ \ \ \ \ \rightarrow \pi _2(np_1)[x] \le \pi _2(np_2)[x]) \wedge \\ ((\pi _3(np_1) = 3 \wedge \pi _3(np_2) = 3) \\ \ \ \ \ \ \ \ \ \ \ \rightarrow \pi _2(np_1)[x] \le \pi _2(np_2)[x]) \end{array}\right) {{\wedge }}\\ \left(\begin{array}{l} \pi _2(np_1)[x]!=\text{null} \wedge \pi _2(np_2)[x]!=\text{null} \wedge \\ ((np_1 \ltimes np_2) \vee (np_2 \ltimes np_1)) \wedge \\ (\pi _3(np_1) = \pi _3(np_2) = 2) \end{array}\right) \rightarrow \ \pi _2(np_1)[x] \ne \pi _2(np_2)[x] \end{array} \right) \end{align*}
When T reads from x, it must read from a message with a time stamp at least as large as the one recorded for x in T's view. And when T writes to x, it must pick a time stamp strictly larger than the one recorded for x in its view. These two requirements are captured by the first conjunct ( \(\wedge\) ) of the formula, shown by the shadow area. The additional constraint is that two writes to the same memory location cannot occur at the same time stamp, which is modeled by the second conjunct. Here, \(np_1\) and \(np_2\) are both snapshots in the trace s. The notation \(np_1 \ltimes np_2\) indicates that the snapshot \(np_1\) appears before \(np_2\) .
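A possible sketch of the inc check over the same illustrative encoding is given below; it mirrors the per-location monotonicity conditions and the distinct-write-stamp condition just described (C++17 for the structured bindings).

#include <map>
#include <optional>
#include <string>
#include <vector>

// Reuses the illustrative encoding sketched in Section 2.
struct Msg { std::string var; int val; std::optional<int> t; };
using View = std::map<std::string, int>;
struct Snapshot { Msg msg; View view; int oflag; int eflag; };
using Trace = std::vector<Snapshot>;

// inc: for snapshots of the thread itself, the time stamp recorded for a
// location must grow along the trace (strictly when the later snapshot is a
// fulfill, weakly when it is a local read), and two fulfills may never record
// the same time stamp for the same location.
bool inc(const Trace& s) {
    for (size_t i = 0; i < s.size(); ++i)
        for (size_t j = i + 1; j < s.size(); ++j) {
            const Snapshot &a = s[i], &b = s[j];   // a appears before b
            for (const auto& [x, ta] : a.view) {
                if (!b.view.count(x)) continue;    // both views must mention x
                int tb = b.view.at(x);
                bool ops = (a.oflag == 2 || a.oflag == 3) &&
                           (b.oflag == 2 || b.oflag == 3);
                if (a.eflag == 1 && b.eflag == 1 && ops) {
                    if (b.oflag == 2 && ta >= tb) return false;  // later write: strictly greater
                    if (b.oflag == 3 && ta > tb)  return false;  // later local read: not smaller
                }
                if (a.oflag == 2 && b.oflag == 2 && ta == tb)
                    return false;                  // two writes never share a stamp
            }
        }
    return true;
}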
Example 3.
Assume x and y are global. Then we take \(P;Q\) into account, where \(P=_{df}x:=1\) , and \(Q=_{df}y:=1\) . The analysis of one trace seq of \(P;Q\) , which is generated from the trace of P (i.e., \(seq_1\) ) and that of Q (i.e., \(seq_2\) ), is provided.
For simplicity, we do not exhibit the environment operations here.
(1)
The snapshot \(((y,1,2),\lbrace \rbrace ,1,1)\) in the latter trace \(seq_2\) is of a promise, and then it can be scheduled first.
(2)
Next, the first snapshot \(((x,1,1),\lbrace \rbrace ,1,1)\) in \(seq_1\) can be selected at any time. Here, it is put in the second position of the trace seq.
(3)
The remaining two are the snapshots of fulfill operations. Consequently, they must be triggered in the program order. That is, the snapshot \(((x,1,\text{null}),\lbrace (x,1)\rbrace ,2,1)\) is fetched before \(((y,1,\text{null}),\lbrace (y,2)\rbrace ,2,1)\) , as shown in Figure 8.
Fig. 8. The illustration of sequential composition under relaxed accesses.
Conditional. Consider the execution of Conditional. It will behave the same as P if h is true, otherwise Q. Different from the traditional Conditional, the snapshot of h ( \(\lnot h\) ) should be checked after concatenating the trace s of P (t of Q), using the function inc.
\begin{align*} &traces(P \triangleleft h \triangleright Q) = _{df} \left(\begin{array}{l} \left\lbrace \begin{array}{l} \langle ((\_,r(h),\text{null}),view(h),0,1) \rangle ^\wedge s \ | \\ s \in traces(P) \ \wedge inc(\langle ((\_,r(h),\text{null}),view(h),0,1) \rangle ^\wedge s) \end{array} \right\rbrace \\ \triangleleft \ h \ \triangleright \\ \left\lbrace \begin{array}{l} \langle ((\_,r(\lnot h),\text{null}),view(\lnot h),0,1) \rangle ^\wedge t \ | \\ t \in traces(Q) \wedge inc(\langle ((\_,r(\lnot h),\text{null}),view(\lnot h),0,1) \rangle ^\wedge t) \end{array} \right\rbrace \end{array} \right) \end{align*}
For simplicity, in the trace semantics of Conditional, h always refers to its evaluation (i.e., \(r(h)\) ).
Iteration. The trace semantics of Iteration can be discussed, on the basis of the concept of the least fixed point [3] and that of Conditional.
For “ \({\rm while} \ h \ {\rm do} \ P\) ”, we consider it as “ \({\rm if} \ h \ {\rm then} \ (P;{\rm while} \ h \ {\rm do} \ P) {\rm \ else} \ I\!I\) ”. Then, its trace semantics can be achieved. The notations II and \({\rm STOP}\) are only used to help to define the trace semantics of Iteration.
\begin{align*} &\ \ \ \ traces({\rm while} \ h \ {\rm do} \ P) =_{df} \bigcup \limits _{n=0} ^ {\infty } traces\lbrace F^{n}({\rm STOP})\rbrace , &\ \ \ \ \text{where,} \ F(X) = _{df}{\rm if} \ h \ {\rm then} \ (P;X) {\rm \ else} \ I\!I,\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ F^0(X) =_{df} X,\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ F^{n+1}(X) =_{df} F(F^n(X)) \\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ =\underbrace{F(...(F}_{n \ \text{times}}(F(X)))...)\\ &\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ traces(I\!I) =_{df} \lbrace \varepsilon \rbrace \ \text{and }traces({\rm STOP}) =_{df} \lbrace \rbrace . \end{align*}
Parallel Composition. In this part, the trace semantics of a parallel construct is explored, formed by the merging of traces contributed by its components.
Example 4.
Let \(P=_{df} x:=1\) and \(Q=_{df}a:=x\) , where x is global and a is local. Then, we use \(P||Q\) to illustrate the construction of the trace semantics of parallel composition.
Although there are many executing cases, here we take one scenario for the execution of \(P||Q\) into consideration. Assume P is scheduled to execute first; it promises to write to x and then fulfills this promise. Then, it can produce the sequence of snapshots \(seq_1\) in Figure 9. The first two snapshots are made by P, and the last one, whose fourth element is 0, represents an action performed by the environment of P (i.e., Q). The execution of Q yields the sequence \(seq_2\) in Figure 9, whose third snapshot is given by \(a:=x\) of Q. Regardless of whether an action is done by P or Q, it is contributed by the parallel program \(P||Q\) . Hence, eflag of each snapshot in seq is 1.
Fig. 9. The illustration of parallel composition under relaxed accesses.
Note that the read function \(r(x)\) in Q is executed only when the sequential composition completes, because each thread cannot distinguish private information from shared data once the parallel composition starts.
In general, two sequences \(seq_1\) and \(seq_2\) are said to be comparable if
They are built from the same sequence of states: \(\pi _1^*(seq_1) = \pi _1^*(seq_2)\)
The view sequences from the two traces are the same: \(\pi _2^*(seq_1) = \pi _2^*(seq_2)\)
They are constructed from the same sequence of operation types: \(\pi _3^*(seq_1) = \pi _3^*(seq_2)\)
None of their snapshots is made by both components: \(2 \notin \pi _4^*(seq_1)+\pi _4^*(seq_2)\)
In this case, their merge is defined by the predicate Merge as below.
\begin{align*} &Merge(seq,seq_1,seq_2) = _{df} \left(\begin{array}{l} (\pi _1^*(seq) = \pi _1^*(seq_1) = \pi _1^*(seq_2))\ \wedge \\ (\pi _2^*(seq) = \pi _2^*(seq_1) = \pi _2^*(seq_2))\ \wedge \\ (\pi _3^*(seq) = \pi _3^*(seq_1) = \pi _3^*(seq_2))\ \wedge \\ (\pi _4^*(seq) = \pi _4^*(seq_1) + \pi _4^*(seq_2))\ \wedge \\ (2 \notin \pi _4^*(seq_1) + \pi _4^*(seq_2)) \end{array}\right). \end{align*}
Then, we give the definition of the trace semantics of the parallel construct. To facilitate merging, we concatenate \(tr_1\) with a sequence s contributed by the environment of P, and similarly for Q, where \(\pi _4^*(s) \in 0^*\) and \(\pi _4^*(t) \in 0^*\) .
\begin{align*} &traces(P||Q) \\ = _{df}&\left\lbrace \begin{array}{l} tr | tr_1 \in traces(P) \wedge tr_2 \in traces(Q) \wedge (Merge(tr,tr_1 ^\wedge s, tr_2) \vee Merge(tr, tr_1, tr_2 ^\wedge t)) \end{array} \right\rbrace . \end{align*}
The reason for adding s (or t) is that the environment is also allowed to perform any number of operations after the execution of P (or Q).
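The Merge predicate admits a direct, if simplified, encoding. The sketch below (our own, over the illustrative Snapshot encoding) checks position-wise agreement of messages, views, and operation types, and that the eflags of the components sum to those of the merged trace without ever reaching 2.

#include <map>
#include <optional>
#include <string>
#include <vector>

// Reuses the illustrative encoding sketched in Section 2.
struct Msg { std::string var; int val; std::optional<int> t; };
using View = std::map<std::string, int>;
struct Snapshot { Msg msg; View view; int oflag; int eflag; };
using Trace = std::vector<Snapshot>;

bool Merge(const Trace& seq, const Trace& seq1, const Trace& seq2) {
    if (seq.size() != seq1.size() || seq.size() != seq2.size()) return false;
    auto same = [](const Snapshot& a, const Snapshot& b) {
        return a.msg.var == b.msg.var && a.msg.val == b.msg.val &&
               a.msg.t == b.msg.t && a.view == b.view && a.oflag == b.oflag;
    };
    for (size_t i = 0; i < seq.size(); ++i) {
        // Same messages, views, and operation types at every position.
        if (!same(seq[i], seq1[i]) || !same(seq[i], seq2[i])) return false;
        // eflags add up, and no snapshot is claimed by both components.
        int sum = seq1[i].eflag + seq2[i].eflag;
        if (sum > 1 || seq[i].eflag != sum) return false;
    }
    return true;
}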

4 Semantic Model for Release/Acquire Accesses

So far, locks are the only mechanism to synchronize multiple threads. We know that the release-acquire memory ordering paradigm under the C++11 memory model provides lightweight inter-thread synchronization between threads.
One way is to use memory fences. Similar to the analysis of the example in Section 1, the result \(a=1\) and \(b=0\) can be observed in an execution of \((x:=1;y:=1)||(a:=y;b:=x)\) . However, in the program below, the release fence between the writes and the acquire fence between the reads prevent the occurrence of this weak behavior.
\begin{equation*} \begin{array}{l||l} \ \ \ T_1&\ \ \ T_2\\ x:=1; & a:=y;\\ \text{fence-rel}; & \text{fence-acq};\\ y:=1& b:=x \end{array} \end{equation*}
Informally, it says that when a read before an acquire fence reads from a write after a release fence, and the two fences synchronize, any write before the release fence must be visible to any read after the acquire fence. Hence, if the thread \(T_2\) reads \(y=1\) , it must read \(x=1\) .
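In C++11 syntax, the fenced program above reads as follows (an illustrative sketch); the assertion reflects the synchronization guarantee just described.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);  // fence-rel
        y.store(1, std::memory_order_relaxed);
    });
    std::thread t2([] {
        int a = y.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);  // fence-acq
        int b = x.load(std::memory_order_relaxed);
        // If the read of y (before the acquire fence) observed the write of y
        // (after the release fence), the fences synchronize and b must be 1.
        if (a == 1) assert(b == 1);
    });
    t1.join();
    t2.join();
}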
In addition to release and acquire fences, release writes and acquire reads can also achieve synchronization. An acquire read in the form of \(y_{\text{acq}}\) can be regarded as a relaxed read followed by an acquire fence, and a release write in the form of \(y_{\text{rel}}\) is a release fence followed by a relaxed write. However, these fences only induce synchronization on the location of the access (here, it refers to the location y).
\begin{equation*} \begin{array}{l||l} \ \ \ T_1&\ \ \ T_2\\ x:=1; & a:=y_{\text{acq}};\\ y_{\text{rel}}:=1& b:=x \end{array} \end{equation*}
In consequence, when the acquire read from y gets the value 1, \(T_2\) is forced to read 1 from the location x.
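The release-write/acquire-read variant of the same pattern is shown below (again an illustrative C++11 sketch).

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        y.store(1, std::memory_order_release);       // y_rel := 1
    });
    std::thread t2([] {
        int a = y.load(std::memory_order_acquire);   // a := y_acq
        int b = x.load(std::memory_order_relaxed);
        if (a == 1) assert(b == 1);  // acquiring y = 1 forces x = 1 to be visible
    });
    t1.join();
    t2.join();
}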
Now, we extend our semantic model in Section 2 to exhibit the inter-thread synchronization under the C++11 memory model, with the thread view being separated and message views being added. Here, the snapshot is redefined as \((msg, thd\_view, otype, eflag)\) , where:
The parameter msg is extended by adding the message view \(msg\_view\) , and then it is expressed as \((var,val,t,msg\_view)\) . The message view is used to record the release view of the writing thread when the write happens, which is updated to include the write itself.
Further, a release fence (fence-rel) or an acquire fence (fence-acq) can also be represented by the element var, and the corresponding value val is null.
\(thd\_view\) is divided into three parts, namely the current view \(cur\_view\) , the acquire view \(acq\_view\) , and the release view \(rel\_view\) , whose initial values are \(\lbrace \bot \rbrace\) , \(\lbrace \bot \rbrace ,\) and \(\lbrace \pm \rbrace\) , respectively.
The explanation of \(cur\_view\) is the same as the parameter view introduced in the previous sections.
\(acq\_view\) records what the thread’s current view will become if it performs an acquire fence.
When a thread reads a message, it incorporates the message’s view into the thread’s acquire view.
The release view \(rel\_view\) of a thread is treated as one separate view per location, instead of a single view. It is used to record the thread’s current view reaching the latest release fence or release write to that location.
The thread view in Section 2 only keeps track of the newest time stamps of the locations appearing in the statement which takes effect just now. However, the separated thread views here can exhibit the time stamps of every location after the occurrence of the corresponding snapshot.
otype is in the form of \((oflag,dom_r,dom_a)\) . oflag distinguishes different operation types, and Table 1 below gives a brief introduction to it.
The actions (promise, fulfill, and register write) discussed previously are all regarded as relaxed.
A release write sets oflag to -2.
If the operation is a release fence, oflag is 10, while if it is an acquire fence, oflag is 20.
In addition, the notation \(dom_r\) collects the variables related to relaxed reads, while the variables connected with acquire reads appear in the set \(dom_a\) .
The meaning of the parameter eflag is as before.
Table 1. Different Types of Operations Divided by Oflag

oflag:  1        2        3               0          -2             10         20
Type:   Promise  Fulfill  Register Write  Condition  Release Write  Fence-rel  Fence-acq
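In code, the oflag values of Table 1 could be named as follows; the enumerator names are ours, while the numeric values follow the table.

// Named constants for the oflag values of Table 1 (names are illustrative).
enum class OFlag : int {
    Promise       = 1,
    Fulfill       = 2,
    RegisterWrite = 3,
    Condition     = 0,
    ReleaseWrite  = -2,
    FenceRel      = 10,
    FenceAcq      = 20,
};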

5 Trace Semantics for Release/Acquire Accesses

In the following, we present the trace semantics for the release/acquire accesses under the C++11 memory model.
Local Assignment. When performing the register write, the data state of a is modified to \(r(e)\) . The local variables are invisible to other threads, and then the corresponding message view and release view are \(\lbrace \rbrace\) and \(\lbrace \pm \rbrace\) , respectively.
\begin{align*} &traces(a:=e) \\ = _{df} &\lbrace s ^\wedge \langle (\underline{(a,r(e),\text{null},\lbrace \rbrace)},\underline{(\lbrace \pm \rbrace , cur\_view(e), acq\_view(e))} ,\underline{(3,dom_r(e),dom_a(e))}, \underline{1}) \rangle \rbrace \end{align*}
where, \(\pi _4^*(s) \in 0^*\) .
At the same time, the current view \(cur\_view\) and acquire view \(acq\_view\) will be updated with the definition of \(dom_r\) and \(dom_a\) acting on the expression e.
Read Function. With the appearance of the message view and acquire reads, the read function r and the generation of the thread view are extended to the ones formalized below, whose framework is given in Figure 10. The explanation of some notations is as before. For function r, we first need to judge whether the variable read from is global, and we use the functions g and l to obtain the value of a global or local variable, respectively.
\begin{align*} r(x,tr) = _{df} g(x,tr) \triangleleft x \in Globals \triangleright l(x,tr), \end{align*}
Fig. 10. The framework of read function under release/acquire accesses.
If the thread T wants to perform a relaxed read or an acquire read, it is necessary to get the time stamp recorded for x in its current view. In addition, after some reads on the variables appearing in the expression e, the current view and acquire view are supposed to be updated, based on the ones before executing the local assignment. The original ones can be achieved by functions cv and av, which are formalized in the following.
\begin{align*} &cv(ttr) = _{df} \left(\begin{array}{l} \pi _2(\pi _2(lt(ttr)))\\ \ \ \ \ \triangleleft \pi _4(lt(ttr))=1 \wedge \pi _2(\pi _2(lt(ttr)))!=\lbrace \bot \rbrace \triangleright \\ cv(ft(ttr))\end{array} \right) \\ &cv(\langle \rangle) =_{df} \lbrace \bot \rbrace \end{align*}
\begin{align*} &av(ttr) = _{df} \left(\begin{array}{l} \pi _3(\pi _2(lt(ttr)))\\ \ \ \ \ \triangleleft \pi _4(lt(ttr))=1 \wedge \pi _3(\pi _2(lt(ttr)))!=\lbrace \bot \rbrace \triangleright \\ av(ft(ttr))\end{array} \right) \\ &av(\langle \rangle) =_{df} \lbrace \bot \rbrace \\ &\text{where, }ttr = tr^\wedge \langle sp \rangle . \end{align*}
Here, ttr is expressed as \(tr ^\wedge \langle sp \rangle\) , where sp is the sub-snapshot recording the information until the read operation on x.
As mentioned in Section 3, some possible items can be returned when T conducts the read from a global variable x using the function gs, which is redefined in the following. For the formalization of gs, there are mainly three cases: (1) If we find a snapshot which is of a fulfill operation or a release write, and produced by the thread itself, it can be collected. The extra requirement is that the time stamp recorded for x in the current view of this snapshot should be larger than or equal to that of the thread T, which is denoted by t. (2) If the snapshot is of a promise performed by the environment, and its promised time stamp is not less than t, it will be collected by the function gs. (3) The snapshot of a release write contributed by the environment can also be collected by the function gs, with the same requirement as in the first case.
The thread T need not read the latest write to x, since there is no shared understanding among threads of what the latest write is. Therefore, we apply the function gv to choose one of the possible items randomly.
\begin{align*} &g(x,tr) = _{df} \pi _2(\pi _1(gv(x,tr)))\\ &gv(x,tr) = _{df} rand(gs(x,\boxed{cv(ttr)[x]},tr^{\prime }))\\ &cur\_view(x, ttr) =_{df} \left(\begin{array}{l} cv(ttr) \sqcup \lbrace (x,\pi _2(\pi _2(gv(x,tr)))[x])\rbrace \sqcup \pi _4(\pi _1(gv(x,tr))) \\ \ \ \triangleleft x \in dom_a \triangleright \\ cv(ttr) \sqcup \lbrace (x,\pi _2(\pi _2(gv(x,tr)))[x])\rbrace \end{array} \right)\\ &acq\_view(x, ttr) =_{df} av(ttr) \sqcup \lbrace (x,\pi _2(\pi _2(gv(x,tr)))[x])\rbrace \sqcup \pi _4(\pi _1(gv(x,tr))) \\ &gs(x,t,tr^{\prime })\\ = _{df} &\left(\begin{array}{l} \lbrace lt(tr^{\prime })\rbrace \cup gs(x,t,ft(tr^{\prime }))\\ \triangleleft \left(\begin{array}{l} ((\pi _4(lt(tr^{\prime }))=1 \wedge \pi _1(\pi _3(lt(tr^{\prime }))) \in \lbrace 2,-2\rbrace \wedge \pi _2(\pi _2(lt(tr^{\prime })))[x]\ge t)\vee \\ (\pi _4(lt(tr^{\prime }))=0 \wedge \pi _1(\pi _3(lt(tr^{\prime }))) =1 \wedge \pi _3(\pi _1(lt(tr^{\prime }))) \ge t) \vee \\ (\pi _4(lt(tr^{\prime }))=0 \wedge \pi _1(\pi _3(lt(tr^{\prime }))) =-2 \wedge \pi _2(\pi _2(lt(tr^{\prime })))[x]\ge t))\\ \wedge \text{ASCII}(x) = \text{ASCII}(\pi _1(\pi _1(lt(tr^{\prime })))) \end{array}\right)\triangleright \\ gs(x,t,ft(tr^{\prime }))\end{array}\right)\\ &gs(x, t, \langle \rangle) =_{df} \lbrace \rbrace \\ &\text{where, }tr^{\prime } = \langle ((x,0,\text{null}, \lbrace \rbrace),(\lbrace \pm \rbrace ,\lbrace \bot \rbrace ,\lbrace \bot \rbrace),(2,\lbrace \rbrace ,\lbrace \rbrace),1) \rangle ^\wedge tr. \end{align*}
On the basis of the selected snapshot, the concrete value of x can be achieved, and the current view and acquire view should be updated. As shown in Figure 10, for the current view \(cur\_view\) , if x is not one element of the set \(dom_a\) , we are supposed to match the current view of x to the time stamp \(\pi _2(\pi _2(gv(x,tr)))[x]\) of the snapshot, which provides x’s value. Otherwise, the message view \(\pi _4(\pi _1(gv(x,tr)))\) in the mentioned snapshot is also included. The green lines and the red lines in Figure 10 present the differences between the two situations. For the acquire view \(acq\_view\) , we should use the message view \(\pi _4(\pi _1(gv(x,tr)))\) and \(\pi _2(\pi _2(gv(x,tr)))[x]\) to update the original one, shown as the blue lines in Figure 10.
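The view updates above rely on the join (⊔) of views. A minimal sketch of this operation as a pointwise maximum of time stamps is given below (type names are illustrative; C++17 for the structured bindings).

#include <algorithm>
#include <map>
#include <string>

using View = std::map<std::string, int>;   // location -> time stamp

// join: pointwise maximum of two views. Updating the current view after an
// acquire read of x amounts to joining the old view, the singleton view
// {(x, t)} of the selected message, and that message's view.
View join(const View& a, const View& b) {
    View r = a;
    for (const auto& [loc, ts] : b) {
        auto it = r.find(loc);
        if (it == r.end()) r[loc] = ts;
        else it->second = std::max(it->second, ts);
    }
    return r;
}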
Relaxed Write. As discussed in Section 3, the relaxed write can be simulated by promising to write a value v to a location x at the time stamp t, and then fulfilling the promise above. For a write to x, the message view originating from the release view is responsible for informing other threads about the modification of x. Then, after the write mentioned, the message view to x will be added in the snapshot of the promise operation, and the release view will be updated in the snapshot of the fulfill action. Here, we first define the function rv to get the release view before the relaxed write.
\begin{align*} &rv(ttr) = _{df} \left(\begin{array}{l} \pi _1(\pi _2(lt(ttr)))\\ \ \ \ \ \triangleleft \pi _4(lt(ttr))=1 \wedge \pi _1(\pi _2(lt(ttr)))!=\lbrace \pm \rbrace \triangleright \\ rv(ft(ttr)) \end{array} \right) \\ &rv(\langle \rangle) = _{df} \lbrace \pm \rbrace \\ &\text{where, } ttr = tr^\wedge \langle sp \rangle . \end{align*}
Then, for the single relaxed write, we set the message view to be \(\lbrace (x,t)\rbrace\) in the promise operation where t is the promised time stamp, and \(\lbrace \rbrace\) in the fulfill operation. After the relaxed write, the write to x at the time stamp t will extend the release view, current view and acquire view of the thread. To display the snapshots clearly, we add underlines for the elements in any snapshot.
Release Accesses. Release writes do not allow promises, and the traces of \(x_{\text{rel}}:=e\) are formed by allowing any number of operations contributed by the environment before the snapshot of the release write.
The changes to the current view and acquire view are the same as the ones in the relaxed write. Further, the release view of x should include the updated current view, with the replacement operator “/ ”. The message view \(msg\_view\) in the snapshot of \(x_{\text{rel}}:=e\) is modified to the updated release view of the location x. Hence, an acquire read from x in another thread can incorporate it into the thread's acquire view and current view, thereby constructing the inter-thread synchronization.
Release/Acquire Fences. An interesting feature of the C++11 memory model is the ability for threads to synchronize using memory fences. When a read before an acquire fence reads from a write after a release fence, and the two fences synchronize, any write before the release fence must be visible to any read after the acquire fence. This feature may not be apparent from the formalization of the release fence in this part; it is reflected by Example 5 below.
\begin{align*} &traces(\text{fence-rel})\\ = _{df}& \left\lbrace s^\wedge \langle (\underline{(\text{fence-rel},\text{null},\text{null},\lbrace \rbrace)},\underline{(\lbrace \pm \rbrace ,\lbrace \bot \rbrace ,\lbrace \bot \rbrace)},\underline{(10,\lbrace \rbrace ,\lbrace \rbrace)},\underline{1})\rangle \ |\ \pi _4^*(s) \in 0^* \right\rbrace \end{align*}
The definition of fence-acq’s traces is similar, which is not provided here.
Sequential Composition. The trace semantics of sequential composition can be regarded as a variant of the interleaving model, which is formalized by the function \(seqcom(u,v,thd\_view)\) . The parameter \(thd\_view\) records the release view, current view, and acquire view that the function (thread) holds, and is in the form of \((rel\_view,cur\_view,acq\_view)\) , with their initial values being \(\lbrace \pm \rbrace\) , \(\lbrace \bot \rbrace ,\) and \(\lbrace \bot \rbrace\) , respectively. They are updated by each interleaved snapshot.
For \(seqcom(u,v,thd\_view)\) , the result of interleaving two empty traces is still empty, illustrated in case 1.
\begin{align*} &{\bf case 1} seqcom(\langle \rangle ,\langle \rangle ,thd\_view)= _{df} \lbrace \langle \rangle \rbrace ,\\ &{\bf case 2} seqcom(u,\langle \rangle ,thd\_view)= _{df} \lbrace u\rbrace , \ seqcom(\langle \rangle ,v,thd\_view)= _{df} \lbrace v\rbrace . \end{align*}
For the two traces to be interleaved, if one of them is empty and the other is nonempty, the result follows the nonempty one, modeled in case 2.
For general scenarios, \(seqcom(u,v,thd\_view)\) can schedule the first snapshot in u, which is modeled as the function \(seqcom_l(u,v,thd\_view)\) , or the first snapshot in the next trace v, which is formalized by the function \(seqcom_r(u,v,thd\_view)\) .
\begin{align*} &seqcom(u,v,thd\_view) = _{df} seqcom_l(u,v,thd\_view) \cup seqcom_r(u,v,thd\_view). \end{align*}
Different from the traditional interleaving semantics, not only the execution of \(seqcom_l\) but also that of \(seqcom_r\) must satisfy some requirements. In the following, we explain them in detail.
For function \(seqcom_l\) , the first snapshot in u can always be scheduled. However, when this snapshot is scheduled, the parameters in it are changed to record the information of the subprogram in front of it. Different types of operations have distinct effects on the production of the traces, modeled as \(case_i(u,v)\) , and Table 2 gives a brief introduction to them.
\begin{align*} &seqcom_l(u,v,(rel\_view,cur\_view,acq\_view))\\ =_{df} &\left(\begin{array}{l} case_0(u,v) \triangleleft \ \pi _4(hd(u))=0 \ \triangleright \\ \left(\begin{array}{l} case_1(u,v) \triangleleft \ \pi _1(\pi _3(hd(u)))=1 \ \triangleright \\ \left(\begin{array}{l} case_2(u,v) \triangleleft \ \pi _1(\pi _3(hd(u)))=2 \ \triangleright \\ \left(\begin{array}{l} case_{10}(u,v) \triangleleft \ \pi _1(\pi _3(hd(u)))=10 \ \triangleright \\ \left(\begin{array}{l} case_{20}(u,v) \triangleleft \ \pi _1(\pi _3(hd(u)))=20\ \triangleright \\ \left(case_{-2}(u,v) \triangleleft \ \pi _1(\pi _3(hd(u)))=-2 \ \triangleright case_{03}(u,v) \right) \end{array} \right) \end{array} \right) \end{array} \right) \end{array} \right) \end{array} \right). \end{align*}
Table 2. The Description of Cases

\(case_0(u,v)\): If the first snapshot in u is contributed by the environment, how to make it the head of the interleaving of u and v.
\(case_1(u,v)\): The snapshot of a promise is at the head of u.
\(case_2(u,v)\): One snapshot of a fulfill takes the lead in u.
\(case_{10}(u,v)\): A release fence instruction's snapshot comes first in the trace u.
\(case_{20}(u,v)\): The acquire fence instruction is scheduled first in u.
\(case_{-2}(u,v)\): The snapshot of a release write is at the head of u.
\(case_{03}(u,v)\): One snapshot of a register write or judgment takes the lead in u.
Now, we give the detailed formalization of these cases. Here, we only list two cases, and others are put in Appendix A.
The snapshot of a promise operation at the beginning of the trace u would like to be scheduled first. It requires that the release view to the location \(\pi _1(\pi _1(hd(u)))\) kept in the function is used to expand the message view of the snapshot, which is formalized as \(case_1\) . Figure 11(1) gives an intuitive description of the changes in views of the interleaved snapshot.
\begin{align*} &case_1(u,v)\\ = _{df}& hd(u)[\underline{(\pi _4(\pi _1(hd(u))) \sqcup rel\_view[\pi _1(\pi _1(hd(u)))])}/ \underline{\pi _4(\pi _1(hd(u)))}] ^\wedge seqcom(tl(u),v,thd\_view). \end{align*}
Fig. 11. The modification of the views in \(case_1\) and \(case_{10}\) .
If the head of the trace u, which is the snapshot of a release fence, is to be executed first, the separated thread views of \(hd(u)\) are all modified. The reason is that a release fence updates the release views of all locations to match the current view of the function (thread), and the relation \(rel\_view \le cur\_view \le acq\_view\) always holds. This case is modeled as \(case_{10}\) in the following. The modification of the views of the interleaved snapshot and the function is exhibited in Figure 11(2).
\begin{align*} &case_{10}(u,v) \\ = _{df} &\left(\begin{array}{l} hd(u)[\lbrace (\_,cur\_view)\rbrace /\pi _1(\pi _2(hd(u))), cur\_view/\pi _2(\pi _2(hd(u))), acq\_view/\pi _3(\pi _2(hd(u)))] ^\wedge \\ seqcom(tl(u),v,(\lbrace (\_,cur\_view)\rbrace ,cur\_view,acq\_view)) \end{array} \right). \end{align*}
For function \(seqcom_r\) , if the first snapshot in the subsequent trace v is to be triggered, it should satisfy the conditions that it is contributed by the environment, or it is of a promise operation produced by the thread itself and no release fence instructions or release writes exist in u. The former condition is modeled as \(\pi _4(hd(v))=0\) , and then we simply put \(hd(v)\) at the head of the interleaving of the traces u and v. The latter condition is expressed in the shadow area in the following formalization. Then, for \(hd(v)\) which is of a promise, the analysis is similar to \(case_1\) .
Now, we define the trace semantics for the sequential composition.
\begin{align*} &u;v = _{df} \lbrace s | s \in seqcom(u,v,(\lbrace \pm \rbrace ,\lbrace \bot \rbrace ,\lbrace \bot \rbrace))\rbrace \\ &traces(P;Q) = \bigcup \limits _{c} u;v, \text{where, }c = u \in traces(P) \wedge v \in traces(Q) \end{align*}
Below are some examples to help to understand the relaxed write, the release write, the release fence, and sequential composition. We first use Example 5 below to describe the intuitive understanding of the appearance of the release fence.
Example 5.
Consider \(x:=1;\text{fence-rel};y:=1\) , where the release fence works as a barrier for promises.
(1)
The generation of the traces \(seq_1\) (of \(x:=1\) ), \(seq_2\) (of fence-rel) and \(seq_3\) (of \(y:=1\) ) is shown in Figure 12. In it, we make every snapshot be framed.
(2)
Release fences serve as barriers for promises. Consequently, only the head in \(seq_1\) can be selected to be interleaved. Afterward, the snapshot of the corresponding fulfill operation is also triggered. They are put in the first and second positions in seq. At present, the release view of x, the current view and the acquire view of the thread (function) are all transferred into \(\lbrace (x,1)\rbrace\) .
(3)
Then, the snapshot \(((\text{fence-rel},\text{null},\text{null},\lbrace \rbrace),(\lbrace \pm \rbrace ,\lbrace \bot \rbrace , \lbrace \bot \rbrace),(10,\lbrace \rbrace ,\lbrace \rbrace),1)\) is fetched. According to the views of the thread, the snapshot's release view is modified to map every location to \(\lbrace (x,1)\rbrace\) , and the current view and acquire view of the snapshot are both changed to \(\lbrace (x,1)\rbrace\) . Meanwhile, the release view of each location of the thread is \(\lbrace (x,1)\rbrace\) .
(4)
For the promise of the relaxed write to y, the change to this snapshot is that the release view of y of the thread (function) is also included in the message view. It means that the message view of the snapshot is \(\lbrace (x,1),(y,2)\rbrace\) .
(5)
For the separated thread views of the last snapshot in \(seq_3\) , they have been updated based on the views owned by the thread. They are \(\lbrace (\_,\lbrace (x,1)\rbrace),(y,\lbrace (y,2)\rbrace)\rbrace\) , \(\lbrace (x,1),(y,2)\rbrace\) and \(\lbrace (x,1),(y,2)\rbrace\) , respectively.
Fig. 12. The illustration of program \(x:=1;\text{fence-rel};y:=1\) .
When a thread performs a release write to x, we update its release view to match its current view, while a release fence applies this update to the release views of all locations. Example 6 helps to illustrate the effects of release writes.
Example 6.
Consider the program \(x:=1;y_{\text{rel}}:=1\) which consists of a relaxed write to x and a release write to y.
(1)
The trace \(seq_1\) of the relaxed write \(x:=1\) is shown in Figure 13. And the release write also yields the trace \(seq_2\) .
(2)
According to the principle of the function seqcom, the snapshots in \(seq_1\) are interleaved first. And the release view of the location x, the current view, and the acquire view in the thread (function) are all changed to \(\lbrace (x,1)\rbrace\) .
(3)
Upon executing the release write to y, the release view of y in the corresponding snapshot has been updated to \(\lbrace (x,1),(y,2)\rbrace\) . The same applies to the changes of the other views.
Fig. 13. The illustration of program \(x:=1;y_{\text{rel}}:=1\) .
The description and formalization of Conditional, Iteration and parallel construct here are similar to those in Section 3, and we do not present them again.

6 Discussion of Trace Semantics for Weak Memory Models

So far, we have investigated the trace semantics for Total Store Order (TSO), Multi Copy Atomic (MCA) ARMv8, and C++11, which inspires us to seek the generality among them. In the following, we analyze the trace semantics for ARMv8 and for release/acquire accesses under C++11, and discuss the possibility of providing a meta model of the trace semantics for weak memory models.
Under the MCA ARMv8 memory model, thread-local out-of-order execution, speculative execution, and thread-local buffering may introduce relaxed-memory effects.
\begin{align*} traces(P;Q) = \bigcup \limits _{c} seqcom(u,v), \text{where, }c = u \in traces(P) \wedge v \in traces(Q). \end{align*}
Similar to the trace model introduced in Section 5, the trace semantics of sequential composition under ARMv8 is given with the application of the function seqcom below, whose target is to interleave two traces u and v. For traces u and v, the first snapshot in u can always be scheduled.
\begin{align*} &seqcom(u,v) = _{df} \left(\begin{array}{l} hd(u) ^\wedge seqcom(tl(u),v)\\ \cup \left(\begin{array}{l} (hd(v) ^\wedge seqcom(u,tl(v)))\\ \ \ \ \triangleleft \ \pi _3(hd(v))=0\vee \bigvee \limits _{i \in \lbrace 1,2,3,4,5\rbrace } case_i(u,v) \ \triangleright \\ \phi \end{array}\right) \end{array}\right) \\ &\text{where, }seqcom(u, \langle \rangle) = \lbrace u\rbrace ,\ seqcom(\langle \rangle ,v) = \lbrace v\rbrace , seqcom(\langle \rangle , \langle \rangle) = \lbrace \langle \rangle \rbrace . \end{align*}
However, if the first snapshot in v is to be triggered, it should satisfy the conditions that it is contributed by the environment, or it is done by the thread itself but meets one of the five requirements expressed by \(case_i\) , where \(i \in \lbrace 1,2,3,4,5\rbrace\) . \(case_i (i \in \lbrace 1,2,3,4,5\rbrace)\) describes the case where the first snapshot in the latter trace v is that of a fulfill fence, control fence, global assignment, local assignment, or branching condition, respectively.
Here, we explain and formalize \(case_2\) as an example. The snapshot of a control fence instruction (cfence in the formalization below) at the beginning of trace v would like to be scheduled first. It requires that no snapshot \(a^{\prime }\) related to a barrier ( \(\pi _2(a^{\prime })\) equal to -1 or -2) or a branching condition ( \(\pi _2(a^{\prime })\) equal to 0) occurs in the trace u. More details about the trace semantics of ARMv8 can be found in [29].
\begin{align*} case_2(u,v) = _{df} \left(\begin{array}{l} \pi _1(hd(v)) = \text{cfence} \wedge \pi _3(hd(v))=1\\ \wedge \ \forall a^{\prime } \in u \bullet \left(\begin{array}{l} \pi _3(a^{\prime }) = 1 \rightarrow \\ \left(\begin{array}{l} \pi _2(a^{\prime }) != 0 \wedge \pi _2(a^{\prime }) != -1 \wedge \pi _2(a^{\prime }) != -2 \end{array}\right) \end{array}\right) \end{array} \right). \end{align*}
Considering function \(seqcom_r(u,v)\) in Section 5, the snapshot \(hd(v)\) can be scheduled first if and only if it is contributed by the environment, or it is of a promise operation produced by the thread itself and no release fence instructions or release writes exist in the former trace u. We find that the construction of the trace semantics and of the important function seqcom in MCA ARMv8 and C++11 is similar.
In consequence, it is possible for us to study the trace semantics for the basic statements, and then explore the trace semantics for programs under weak memory models in the same form. The latter relies on the analysis of a variety of dependencies among program statements, which reflect the features of different memory models. As shown in Figure 14, they can be modeled by \(function_{TSO}\) , \(function_{ARMv8}\) , \(function_{C++11}\) , and so on. It says that the trace semantics for the C++11 memory model is the combination of that for weak memory models and \(function_{C++11}\) .
Fig. 14. The discussion of trace semantics for weak memory models.

7 Related Work

There is an extensive literature on weak memory models, since modern hardware architectures and mainstream programming languages do not provide sequentially consistent memory; instead, they have weak memory models.
Owens et al. [20] described a new x86-TSO model, which did not suffer from the ambiguities, weakness, or unsoundness of earlier models. Its abstract-machine definition should be intuitive for programmers, and its equivalent axiomatic definition supported the memevents exhaustive search. Pulte et al. [22] discussed the motivation for the new MCA ARMv8 concurrency architecture, including complexities that arose in the non-MCA case. Then, they defined two formal concurrency models: an operational one, and the axiomatic model of the revised ARMv8 specification. Mador-Haim et al. [18] presented an alternative memory model specification for POWER using an axiomatic style. It was believed to be significantly more abstract and concise than the previously published operational model.
In particular, the semantics of the C++11 memory model has been studied for several years. In contrast to the C++ memory model, which relies on declarative semantics over event graphs, Kang et al. [10, 16] employed a more standard SC-style operational semantics for concurrency, in which the executions of different threads are nondeterministically interleaved. It could account for a broad spectrum of features of the C++11 memory model and demonstrate the absence of bad “out-of-thin-air” behaviors. Owicki-Gries reasoning for concurrent programs uses Hoare logic together with an interference freedom rule for concurrency. Dalvandi et al. [5, 6] developed an assertion language for the C11 RAR memory model (a fragment of C11 with both relaxed and release-acquire accesses), which enabled re-use of the entire Owicki-Gries proof calculus except for the axiom of assignment. They then introduced the first deductive verification environment in Isabelle/HOL for C11-like weak memory programs. Modular Relaxed Dependencies (MRD) is a denotational semantics defined over an event structure, and it has been recognized as a potential solution to the thin-air problem in C and C++. Wright et al. [28] first proposed an operational semantics for MRD, which they proved sound and complete with respect to MRD. On this basis, they presented an associated logic that generalized a recent Owicki-Gries framework for RC11 (repaired C11) and demonstrated its use.
This article applies the UTP (Unifying Theories of Programming) [8] approach to investigate the trace semantics for the C++11 memory model. The UTP approach combines the advantages of operational, denotational, and algebraic semantics to provide a unified theoretical framework for describing programming languages and systems. Li et al. [17] elaborated a modeling language for cyber-physical systems, and then explored the denotational and algebraic semantics of their language based on the UTP approach. Xie et al. [32] proposed a process calculus BigrTiMo for structure-aware mobile systems, whose semantics was formalized in the UTP framework. Safety-Critical Java (SCJ) is a version of Java for real-time programming, restricted to facilitate certification of implementations of safety-critical systems. Cavalcanti et al. [4] used the UTP theory to study the SCJ memory model.
Specifically, regarding denotational semantics, Kavanagh and Brookes [12] used pomsets (partially ordered multisets) to provide a denotational semantics for SPARC TSO, which captured exactly the behaviors permitted by SPARC TSO. Their denotational semantics assigned to each program a collection of pomsets. In addition, they also introduced a denotational semantic framework for shared-memory concurrent programs in a C11-style memory model [11]. This denotational approach was an alternative to techniques based on “execution graphs” and axiomatizations, and it allowed for compositional reasoning. Xiao et al. [31] investigated the trace semantics for TSO in the denotational semantics style: all the valid execution results containing reordering can be described after discarding those that do not satisfy program order and modification order. Under the revised ARMv8 architecture, reordering occurs because of the absence of a variety of dependencies. Therefore, Xiao and Zhu [29] first studied the trace semantics of each single statement and, after discussing the dependencies among these statements, gave the trace semantics of programs under ARMv8.
In this article, inspired by the Promising Semantics, we study the denotational semantics for the relaxed accesses and the release/acquire accesses under the C++11 memory model, using the trace structure. On the basis of the discussion in Section 6, and in comparison with other existing semantics, the trace semantics for C++11 proposed in this article, together with that for TSO and MCA ARMv8 [29, 31], can be regarded as the foundation of a meta model of the trace semantics for weak memory models.

8 Conclusion and Future Work

In 2011, the C++ language introduced a weak memory model (i.e., C++11) to improve performance. In this article, we have studied the trace semantics for the relaxed accesses, release writes, acquire reads, and release and acquire fences. Further, by constraining the time stamps in each sequence, or by formalizing the effects made by the different types of operations, we have presented all the possible behaviors of any program under C++11.
In the future, we will continue our work on the C++11 memory model, especially exploring the theories linking the different semantics of C++11. We also plan to implement the trace semantics and the semantic links in the proof assistant Isabelle/HOL or Coq. Further, we would like to propose a trace semantics applicable to arbitrary weak memory models.

Footnotes

1
Note that we use \(\bot\) to map every location to the initial time stamp 0, and the notation \(\pm\) is applied to map each location to \(\lbrace \bot \rbrace\) .
2
The operator “ \(\sqcup\) ” selects, for any location x, the maximum time stamp.

A The Formalization and Explanation of Some Cases

In Section 5, the cases \(case_0\) , \(case_2\) , \(case_{20}\) , \(case_{-2}\) , and \(case_{03}\) are also used to support the trace semantics of sequential composition under release/acquire accesses. Here, we give their formalization and explanation.
\(case_0\) covers the case in which the first snapshot in u records an environment operation and becomes the head of the interleaving of u and v. It has to meet no extra requirements and makes no changes.
\begin{align*} &case_0(u,v)=_{df}hd(u) ^\wedge seqcom(tl(u),v,thd\_view). \end{align*}
Provided that the first snapshot \(hd(u)\) in u results from fulfilling a promise, the thread views kept in the function and those carried by the snapshot update each other with the operator “ \(\sqcup\) ”. In this way, both keep track of the newest time stamps of the locations.
\begin{align*} &case_2(u,v) = _{df}\left(\begin{array}{l} hd(u)[\underline{(rel\_view \sqcup \pi _1(\pi _2(hd(u))))}/\underline{\pi _1(\pi _2(hd(u)))}, \\ \ \ \ \ \ \ \ \ \ \ \ \underline{(cur\_view \sqcup \pi _2(\pi _2(hd(u))))}/\underline{\pi _2(\pi _2(hd(u)))}, \\ \ \ \ \ \ \ \ \ \ \ \ \underline{(acq\_view \sqcup \pi _3(\pi _2(hd(u))))}/\underline{\pi _3(\pi _2(hd(u)))}] ^\wedge \\ seqcom(tl(u),v,((rel\_view \sqcup \pi _1(\pi _2(hd(u)))), \\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (cur\_view \sqcup \pi _2(\pi _2(hd(u)))),\\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (acq\_view \sqcup \pi _3(\pi _2(hd(u)))))) \end{array} \right). \end{align*}
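Since both the function and the snapshot carry a triple of views (release, current, acquire), this mutual update can be rendered by the following small Python sketch. It assumes, purely for illustration, that a view is a dictionary from locations to time stamps (absent locations stay at the initial time stamp 0, cf. footnote 1); the names join and fulfil_views are our own.

def join(view1, view2):
    # The operator "⊔": for every location, keep the larger time stamp.
    return {x: max(view1.get(x, 0), view2.get(x, 0))
            for x in view1.keys() | view2.keys()}

def fulfil_views(thd_view, snap_views):
    # Mutual update in case_2: the release, current, and acquire views kept by
    # the function are merged pairwise with the corresponding views of hd(u);
    # the merged triple replaces the views in the snapshot and is also passed
    # on in the recursive call to seqcom.
    (rel, cur, acq), (s_rel, s_cur, s_acq) = thd_view, snap_views
    return (join(rel, s_rel), join(cur, s_cur), join(acq, s_acq))

For example, join({"x": 3, "y": 1}, {"x": 2, "z": 5}) is {"x": 3, "y": 1, "z": 5}, so neither side ever loses a newer time stamp.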
The analysis of an acquire fence is similar to that of a release fence. The difference is that the acquire fence raises the current view of \(hd(u)\) to the acquire view, so only the current view and the acquire view of \(hd(u)\) are changed; this is denoted by \(case_{20}\) .
\begin{align*} &case_{20}(u,v) = _{df} \left(\begin{array}{l} hd(u)[\underline{acq\_view}/\pi _2(\pi _2(hd(u))), acq\_view/\pi _3(\pi _2(hd(u)))] ^\wedge \\ seqcom(tl(u),v,(rel\_view,\underline{acq\_view},acq\_view)) \end{array} \right). \end{align*}
\(case_{-2}\) covers the case in which the first snapshot in u records a release write and becomes the head of the interleaving. When the thread T performs a release write to x, the write establishes a lightweight synchronization with an acquire read from x in another thread \(T^{\prime }\) . Then, the message view in the write, which originates from the release view of x, is responsible for informing \(T^{\prime }\) about the modifications made in T.
Then, for the interleaved snapshot, the present \(cur\_view\) in the function is merged into the release view of \(\pi _1(\pi _1(hd(u)))\) with the operator “ \(\sqcup\) ”. We also use \(cur\_view\) and \(acq\_view\) in the function to extend the current view and the acquire view of the snapshot \(hd(u)\) . The same extension is also applied to the \(rel\_view\) , \(cur\_view\) , and \(acq\_view\) of the function.
In addition, the updated release view, together with the message view \(\pi _4(\pi _1(hd(u)))\) in \(hd(u)\) , is used to replace the original message view.
We define \(case_{03}\) to describe the case in which the head \(hd(u)\) of u, which is the snapshot of a register write or a branching condition, is interleaved first. Only the current view and acquire view of \(hd(u)\) , together with \(cur\_view\) and \(acq\_view\) in the function, need to be updated.
\begin{align*} &case_{03}(u,v)\\ =_{df}&\left(\begin{array}{l} hd(u)[ (cur\_view \sqcup \pi _2(\pi _2(hd(u))))/ \pi _2(\pi _2(hd(u))), \\ \ \ \ \ \ \ \ \ \ \ \ (acq\_view \sqcup \pi _3(\pi _2(hd(u))))/ \pi _3(\pi _2(hd(u)))] ^\wedge \\ seqcom(tl(u),v,(rel\_view, (cur\_view \sqcup \pi _2(\pi _2(hd(u)))), (acq\_view \sqcup \pi _3(\pi _2(hd(u)))))) \end{array} \right). \end{align*}
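The three substitution patterns differ only in which views are touched. The following Python sketch (reusing join and the dictionary representation of views from the previous sketch; the dispatcher apply_case is our own illustrative name) summarizes them side by side.

def apply_case(case, thd_view, snap_views):
    # thd_view:   the (rel_view, cur_view, acq_view) triple kept by the function.
    # snap_views: the view triple of the snapshot hd(u), extracted by pi_2.
    (rel, cur, acq), (s_rel, s_cur, s_acq) = thd_view, snap_views
    if case == "case_2":    # fulfilling a promise: all three pairs of views are merged
        return (join(rel, s_rel), join(cur, s_cur), join(acq, s_acq))
    if case == "case_20":   # acquire fence: the current view is raised to the acquire view
        return (rel, acq, acq)
    if case == "case_03":   # register write / branching condition: only cur and acq are merged
        return (rel, join(cur, s_cur), join(acq, s_acq))
    raise ValueError("unknown case: " + case)

In each case the returned triple is what the recursive call to seqcom receives, and, apart from the untouched release view in \(case_{20}\) and \(case_{03}\) , the same values are substituted into the views of \(hd(u)\) .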

References

[1]
Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. 2011. Mathematizing C++ concurrency. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26–28, 2011. Thomas Ball and Mooly Sagiv (Eds.), ACM, 55–66.
[2]
Mark John Batty. 2015. The C11 and C++11 Concurrency Model. Ph.D. Dissertation. University of Cambridge, UK.
[3]
Stephen D. Brookes. 1996. Full abstraction for a shared-variable parallel language. Information and Computation 127, 2 (1996), 145–163.
[4]
Ana Cavalcanti, Andy J. Wellings, and Jim Woodcock. 2013. The safety-critical Java memory model formalised. Formal Aspects of Computing 25, 1 (2013), 37–57.
[5]
Sadegh Dalvandi, Simon Doherty, Brijesh Dongol, and Heike Wehrheim. 2020. Owicki-Gries reasoning for C11 RAR. In Proceedings of the 34th European Conference on Object-Oriented Programming, ECOOP 2020, November 15–17, 2020, Berlin, Germany (Virtual Conference). Robert Hirschfeld and Tobias Pape (Eds.), Vol. 166, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 11:1–11:26.
[6]
Sadegh Dalvandi, Brijesh Dongol, Simon Doherty, and Heike Wehrheim. 2022. Integrating Owicki-Gries for C11-style memory models into Isabelle/HOL. Journal of Automated Reasoning 66, 1 (2022), 141–171.
[7]
Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. 2016. Modelling the ARMv8 architecture, operationally: Concurrency and ISA. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20–22, 2016. Rastislav Bodík and Rupak Majumdar (Eds.), ACM, 608–621.
[8]
Charles Antony Richard Hoare and He Jifeng. 1998. Unifying Theories of Programming. Prentice Hall Englewood Cliffs.
[9]
Zhé Hóu, David Sanán, Alwen Tiu, Yang Liu, Koh Chuen Hoa, and Jin Song Dong. 2021. An Isabelle/HOL formalisation of the SPARC instruction set architecture and the TSO memory model. Journal of Automated Reasoning 65, 4 (2021), 569–598.
[10]
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017. A promising semantics for relaxed-memory concurrency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18–20, 2017. Giuseppe Castagna and Andrew D. Gordon (Eds.), ACM, 175–189.
[11]
Ryan Kavanagh and Stephen Brookes. 2018. A denotational account of C11-style memory. arXiv:1804.04214. Retrieved from https://arxiv.org/abs/1804.04214
[12]
Ryan Kavanagh and Stephen Brookes. 2019. A denotational semantics for SPARC TSO. Logical Methods in Computer Science 15, 2 (2019), 1–23.
[13]
Ori Lahav and Viktor Vafeiadis. 2016. Explaining relaxed memory models with program transformations. In Proceedings of the FM 2016: Formal Methods - 21st International Symposium, Limassol, Cyprus, November 9–11, 2016. John S. Fitzgerald, Constance L. Heitmeyer, Stefania Gnesi, and Anna Philippou (Eds.), Lecture Notes in Computer Science, Vol. 9995, 479–495.
[14]
Leslie Lamport. 1979. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers 28, 9 (1979), 690–691.
[15]
Leslie Lamport. 2019. How to make a multiprocessor computer that correctly executes multiprocess programs. In Concurrency: The Works of Leslie Lamport. Dahlia Malkhi (Ed.), ACM, 197–201.
[16]
Sung-Hwan Lee, Minki Cho, Anton Podkopaev, Soham Chakraborty, Chung-Kil Hur, Ori Lahav, and Viktor Vafeiadis. 2020. Promising 2.0: Global optimizations in relaxed memory concurrency. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15–20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 362–376.
[17]
Ran Li, Huibiao Zhu, and Richard Banach. 2022. Denotational and algebraic semantics for cyber-physical systems. In Proceedings of the 26th International Conference on Engineering of Complex Computer Systems. Hiroshima, Japan, March 26–30, 2022. IEEE, 123–132.
[18]
Sela Mador-Haim, Luc Maranget, Susmit Sarkar, Kayvan Memarian, Jade Alglave, Scott Owens, Rajeev Alur, Milo M. K. Martin, Peter Sewell, and Derek Williams. 2012. An axiomatic memory model for POWER multiprocessors. In Proceedings of the Computer Aided Verification - 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7–13. P. Madhusudan and Sanjit A. Seshia (Eds.), Lecture Notes in Computer Science, Vol. 7358, Springer, 495–512.
[19]
Jeremy Manson, William W. Pugh, and Sarita V. Adve. 2005. The Java memory model. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2005, Long Beach, California, USA, January 12–14, 2005. Jens Palsberg and Martín Abadi (Eds.), ACM, 378–391.
[20]
Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A better x86 memory model: x86-TSO. In Proceedings of the Theorem Proving in Higher Order Logics, 22nd International Conference, Munich, Germany, August 17–20, 2009. Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel (Eds.), Lecture Notes in Computer Science, Vol. 5674, Springer, 391–407.
[21]
Gordon D. Plotkin. 1981. A Structural Approach to Operational Semantics. Aarhus University.
[22]
Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell. 2018. Simplifying ARM concurrency: Multicopy-atomic axiomatic and operational models for ARMv8. Proc. ACM Program. Lang. 2, POPL (2018), 19:1–19:29.
[23]
Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. 2011. Understanding POWER multiprocessors. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4–8, 2011. Mary W. Hall and David A. Padua (Eds.), ACM, 175–186.
[24]
Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. 2010. x86-TSO: A rigorous and usable programmer’s model for x86 multiprocessors. Communications of the ACM 53, 7 (2010), 89–97.
[25]
Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2011. A Primer on Memory Consistency and Cache Coherence. Morgan & Claypool Publishers.
[26]
Joseph E. Stoy. 1981. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. MIT press.
[27]
David Anthony Watt. 1996. Programming Language Syntax and Semantics. Prentice Hall PTR.
[28]
Daniel Wright, Mark Batty, and Brijesh Dongol. 2021. Owicki-Gries reasoning for C11 programs with relaxed dependencies. In Proceedings of the Formal Methods - 24th International Symposium, FM 2021, Virtual Event, November 20–26, 2021. Marieke Huisman, Corina S. Pasareanu, and Naijun Zhan (Eds.), Lecture Notes in Computer Science, Vol. 13047, Springer, 237–254.
[29]
Lili Xiao and Huibiao Zhu. 2022. UTP semantics for the MCA ARMv8 architecture. Journal of Systems Architecture 125 (2022), 102438.
[30]
Lili Xiao, Huibiao Zhu, Mengda He, and Shengchao Qin. 2022. Algebraic semantics for C++11 memory model. In Proceedings of the 46th IEEE Annual Computers, Software, and Applications Conference. Los Alamitos, CA, USA, June 27–July 1, 2022. Hong Va Leong, Sahra Sedigh Sarvestani, Yuuichi Teranishi, Alfredo Cuzzocrea, Hiroki Kashiwazaki, Dave Towey, Ji-Jiang Yang, and Hossain Shahriar (Eds.), IEEE, 1–6.
[31]
Lili Xiao, Huibiao Zhu, and Qiwen Xu. 2021. Trace semantics and algebraic laws for total store order memory model. Journal of Computer Science and Technology 36, 6 (2021), 1269–1290.
[32]
Wanling Xie, Huibiao Zhu, and Qiwen Xu. 2021. A process calculus BigrTiMo of mobile systems and its formal semantics. Formal Aspects of Computing 33, 2 (2021), 207–249.
