Re: questions about memory_order_seq_cst fence



On Jun 14, Joshua Maurice <joshuamaur...@xxxxxxxxx> wrote:
On Jun 14, 3:05 pm, Anthony Williams <anthony....@xxxxxxxxx> wrote:

Masakuni Oishi <yam...@xxxxxxxxxxxx> writes:
/*** code 1 ***/
// Initially
atomic<int> x(0), y(0);

// Thread 1:
y.store(1, memory_order_release);
atomic_thread_fence(memory_order_seq_cst);
r1 = x.load(memory_order_acquire);

// Thread 2:
x.store(1, memory_order_release);

// Thread 3:
r2 = x.load(memory_order_acquire);
atomic_thread_fence(memory_order_seq_cst);
r3 = y.load(memory_order_acquire);
/***************/

In the above code, is r1 == 0 && r2 == 1 && r3 == 0 possible?
I think it should be prohibited, but I could not confirm that
from the C++0x FDIS.
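
For anyone who wants to try this on real hardware, here is a minimal sketch of a harness for code 1. The names Outcome, run_once, is_questioned and count_questioned are made up for this illustration. A count of zero on one machine does not show that the standard forbids the outcome; it only shows that this compiler and CPU did not produce it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read in one run of "code 1" (hypothetical helper type).
struct Outcome {
    int r1, r2, r3;
};

// True for the outcome the question asks about.
bool is_questioned(const Outcome& o) {
    return o.r1 == 0 && o.r2 == 1 && o.r3 == 0;
}

// Runs the three threads of "code 1" once and returns what they read.
Outcome run_once() {
    std::atomic<int> x(0), y(0);
    Outcome o{-1, -1, -1};

    std::thread t1([&] {
        y.store(1, std::memory_order_release);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        o.r1 = x.load(std::memory_order_acquire);
    });
    std::thread t2([&] { x.store(1, std::memory_order_release); });
    std::thread t3([&] {
        o.r2 = x.load(std::memory_order_acquire);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        o.r3 = y.load(std::memory_order_acquire);
    });

    t1.join();
    t2.join();
    t3.join();

    // Every field was overwritten with 0 or 1, so none is still -1.
    assert((o.r1 | o.r2 | o.r3) >= 0);
    return o;
}

// Repeats the experiment and counts how often the questioned outcome appears.
int count_questioned(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (is_questioned(run_once())) ++hits;
    }
    return hits;
}
```

Calling count_questioned with a large trial count gives an empirical number only. Starting fresh threads for every run keeps the sketch short but makes the interesting interleavings rare.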

Premise 1: r3 == 0.

Since the only store to y is the store of 1 by thread 1, r3 == 0 implies
that the load from y on thread 3 is reading the initial value of y.

29.3p6 of N3290 says:

"For atomic operations A and B on an atomic object M, where A modifies M
and B takes its value, if there are memory_order_seq_cst fences X and Y
such that A is sequenced before X, Y is sequenced before B, and X
precedes Y in S, then B observes either the effects of A or a later
modification of M in its modification order."

If the fence in thread 1 precedes the fence in thread 3 in S, then the
read at r3 must therefore see the value written by the store in thread
1. Consequently, the fence in thread 3 must precede the fence in thread
1 in S.
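
The same use of 29.3p6 can be seen in a smaller, two-thread example. This is only a sketch, and SbResult, store_buffer_once and count_both_zero are names invented for it. Each thread stores to one variable, executes a seq_cst fence, then loads the other variable. The two fences are ordered one way or the other in S, so by 29.3p6 the load that follows the later fence must observe the store that precedes the earlier fence. Hence both loads cannot return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read by the two threads in one run (hypothetical helper type).
struct SbResult {
    int ra, rb;
};

SbResult store_buffer_once() {
    std::atomic<int> x(0), y(0);
    SbResult r{-1, -1};

    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // fence Fa
        r.ra = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // fence Fb
        r.rb = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();

    // Both threads have written their result by now.
    assert(r.ra != -1 && r.rb != -1);
    return r;
}

// Counts runs in which both loads returned 0, which 29.3p6 rules out.
int count_both_zero(int trials) {
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        SbResult r = store_buffer_once();
        if (r.ra == 0 && r.rb == 0) ++hits;
    }
    return hits;
}
```

If Fa precedes Fb in S, then rb must be 1; if Fb precedes Fa, then ra must be 1. That case split is what the argument above applies to y in code 1.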

Premise 2: r2 == 1.

The load of x in thread 3 reading 1 implies it has read the value of the
store to x from thread 2.

I agree with your proof up to this point.

Me, too.

We already know that the fence in thread 3 precedes the fence in thread
1 in S.

29.3p7 of N3290 says:

"For atomic operations A and B on an atomic object M, if there are
memory_order_seq_cst fences X and Y such that A is sequenced before X, Y
is sequenced before B, and X precedes Y in S, then B occurs later than A
in the modification order of M."

In our case, the fence in thread 3 occurs before the fence in thread 1
in S. The fence in thread 3 is thus X, the fence in thread 1 Y, the read
from x in thread 3 is A and the read from x in thread 1 is B.

Thus the read of x in thread 1 must occur later in the modification
order of x than the read in thread 3. Since the read in thread 3 is of
the last store to x (1), the read in thread 1 must also read that value,
and r1 == 1.

I disagree with this. I think you made the same mistake which I made
in my first post of this thread. Only /modifications/ appear in a
modification order. Reads do not appear in the modification order. At
least, this is the simple reading of 1.10 / 7, and its name is
"modification order", not "access order". Also note that the
definition of a "release sequence" in 1.10 / 8 does not make sense if
reads may appear in modification orders. Also note that the definition
of "visible sequence of side effects" in 1.10 / 14 does not make sense
if reads may appear in modification orders. (Note that there's some
ambiguity w.r.t. volatile atomic objects, as volatile reads do count
as side effects, but non-volatile reads AFAIK are not side effects.)

As I'm beginning to understand this, there is a typo in 29.3 / 7. The
exact text is:
[]
For atomic operations A and B on an atomic object M, if there are
memory_order_seq_cst fences X and Y such that A is sequenced before X,
Y is sequenced before B, and X precedes Y in S, then B occurs later
than A in the modification order of M.
[/]
I think it's meant to read:
[]
For atomic **writes (and atomic read-modify-write operations)** A and B
on an atomic object M, if there are memory_order_seq_cst fences X and
Y such that A is sequenced before X, Y is sequenced before B, and X
precedes Y in S, then B occurs later than A in the modification order
of M.
[/]

When you said "the read from x in thread 3 is A and the read from x in
thread 1 is B.", that is where you went wrong. You cannot apply
"29.3 / 7" to atomic reads A and B on an atomic object M, only atomic
writes (and atomic read-modify-write operations).
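
To show a case where 29.3 / 7 does apply, because A and B are both writes, here is a small sketch. WwResult and write_write_once are names made up for it. If thread 1 reads r == 0 from y, then by 29.3 / 6 the fence in thread 2 cannot precede the fence in thread 1 in S, so the fence in thread 1 comes first. Then 29.3 / 7 puts the store of 2 later than the store of 1 in the modification order of x, so the final value of x must be 2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// r is what thread 1 read from y; final_x is x after both threads finished.
struct WwResult {
    int r, final_x;
};

WwResult write_write_once() {
    std::atomic<int> x(0), y(0);
    WwResult out{-1, -1};

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);                // write A
        std::atomic_thread_fence(std::memory_order_seq_cst);  // fence X
        out.r = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // fence Y
        x.store(2, std::memory_order_relaxed);                // write B
    });
    t1.join();
    t2.join();

    // Both stores happen before this load, so it reads the last value
    // in the modification order of x.
    out.final_x = x.load(std::memory_order_relaxed);
    assert(out.final_x == 1 || out.final_x == 2);
    return out;
}
```

The combination r == 0 with final_x == 1 is the one excluded by this reasoning. Nothing comparable follows when A and B are both reads, because reads have no position in the modification order.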

I agree.

I'm still dealing with the implication that "A precedes B in the total
order S of seq_cst operations" does not imply that A happens before B,
nor that A inter-thread happens before B, nor that A synchronizes with
B. That's irritating and slightly surprising. At least, I haven't yet
found a rule which states such an implication, and I've looked
decently hard; I've even found a note saying as much in 29.3 / 8.

Even though the total order S does not imply a happens-before
relationship, S does impose visibility constraints, which are
stated in N3290, 29.3p3-p7.

But I think that is not enough.
In particular, the following constraint should be added
to N3290, 29.3:
/*****/
// Thread 1:
r1 = x.load(memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);

// Thread 2:
atomic_thread_fence(memory_order_seq_cst);
r2 = x.load(memory_order_relaxed);
/*****/
In this code, if the fence in thread 1 precedes the fence
in thread 2 in S, and the value of r1 comes from a
modification A of x, then the load in thread 2 should not
take the value of any modification that precedes A
in the modification order of x.

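If there were a constraint like this, then we could prove that
r1 == 0 && r2 == 1 && r3 == 0 is not allowed in the first code:
r3 == 0 forces the fence in thread 3 to precede the fence in
thread 1 in S, and r2 == 1 reads the store from thread 2, so the
load of x in thread 1 could not take the earlier initial value 0.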
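
As a rough sketch, the proposed constraint can be written as a predicate over positions in the modification order of x. It is not wording from N3290, and the names ModIndex, allowed_by_proposed_rule and code1_not_excluded are invented here. In code 1 the modification order of x is just the initial value 0 followed by the store of 1, so the value read is also its position.

```cpp
#include <cassert>

// Position of a modification in the modification order of x
// (0 = initial value, 1 = the store of 1 in code 1).
using ModIndex = int;

// Proposed rule: first_read is what the load before the earlier fence
// took, second_read is what the load after the later fence took.
// The later load must not take an older modification.
bool allowed_by_proposed_rule(ModIndex first_read, ModIndex second_read) {
    return second_read >= first_read;
}

// Applies the argument to an outcome of code 1. A result of true only
// means "not excluded by this argument", not "allowed by the standard".
bool code1_not_excluded(int r1, int r2, int r3) {
    assert(r1 >= 0 && r2 >= 0 && r3 >= 0);
    if (r3 != 0) {
        // Without r3 == 0 we do not know that the fence in thread 3
        // precedes the fence in thread 1, so nothing is concluded.
        return true;
    }
    // r3 == 0: the fence in thread 3 precedes the fence in thread 1, so
    // r2 is the read before the earlier fence and r1 the read after the later.
    return allowed_by_proposed_rule(r2, r1);
}
```

With this, code1_not_excluded(0, 1, 0) is false, which is the outcome in question, while for example (1, 1, 0) and (0, 0, 0) are not excluded.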

-- masakuni