Re: questions about memory_order_seq_cst fence



Anthony Williams <anthony....@xxxxxxxxx> wrote:
Anthony Williams <anthony....@xxxxxxxxx> writes:
Joshua Maurice <joshuamaur...@xxxxxxxxx> writes:

29.3p7 of N3290 says:

"For atomic operations A and B on an atomic object M, if there are
memory_order_seq_cst fences X and Y such that A is sequenced before X, Y
is sequenced before B, and X precedes Y in S, then B occurs later than A
in the modification order of M."

I think it's meant to read:
[]
For atomic **write (and atomic read-modify-write operations)** A and B
on an atomic object M, if there are memory_order_seq_cst fences X and
Y such that A is sequenced before X, Y is sequenced before B, and X
precedes Y in S, then B occurs later than A in the modification order
of M.
[/]

When we drafted these words, I think we meant "operation". It is the
reference to "modification order" that is the typo. What is meant is
that if A is a write and B is a read then B must read the value written
by A, or something later in the modification order; if A is a write and
B is a write then B must be later than A in the MO; if A is a read and B
is a write then A must read a value prior to B in the MO, and if A is a
read and B is a read then B must read the same value as A or a value
later in the MO.

After rereading this section of the standard several times, I'm no
longer sure. I'll have to check my notes.
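
For readers following along, the "A is a write and B is a read" case from the list above can be sketched as a store-buffering test. This is only an illustration: the function name and the variables x, y, r1 and r2 are hypothetical and are not the "code 1" or "code 2" from the first post. To my reading, the fence paragraphs just before p7 in 29.3 already cover this write-then-read case, so both loads returning 0 should not be observable.

```cpp
#include <atomic>
#include <cassert>  // only needed for the assert in the usage note
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times and
// returns true if the outcome r1 == 0 && r2 == 0 was never observed.
bool store_buffering_never_both_zero(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);                // write to x
            std::atomic_thread_fence(std::memory_order_seq_cst);  // fence X
            r1 = y.load(std::memory_order_relaxed);               // read of y
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);                // write to y
            std::atomic_thread_fence(std::memory_order_seq_cst);  // fence Y
            r2 = x.load(std::memory_order_relaxed);               // read of x
        });
        t1.join();
        t2.join();

        // Whichever fence comes first in S, the load after the later fence
        // must see the store made before the earlier fence.
        if (r1 == 0 && r2 == 0) return false;
    }
    return true;
}
```

Calling `assert(store_buffering_never_both_zero(1000));` from main exercises it. A passing run proves nothing about the standard, of course; it only fails to find a counterexample on one machine.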

I've checked my notes. My recollection was wrong.

We *discussed* requiring seq_cst fences to order loads like this, but
decided to explicitly limit the orderings to writes. The approved
wording said "where A and B modify M", but this somehow got lost.

In particular, it was felt that requiring fences to order loads would add
complexity to the IA-64 implementation of fences, and could make relaxed
operations on that architecture more expensive as well, so that people
would pay for features they didn't need.
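
The write-only reading of 29.3p7 (both A and B modify M) can be sketched with a "2+2W" style test. Again this is a hypothetical illustration, not code from the thread: the function name and the variables x and y are made up. If thread 1's fence precedes thread 2's in S, the rule puts the store of 2 to x after the store of 1 in x's modification order. In the opposite case the same argument applies to y. Either way, x and y should not both end up as 1.

```cpp
#include <atomic>
#include <cassert>  // only needed for the assert in the usage note
#include <thread>

// Hypothetical helper: returns true if the final state x == 1 && y == 1
// was never observed over `iterations` runs.
bool two_plus_two_w_never_both_one(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);                // write to x
            std::atomic_thread_fence(std::memory_order_seq_cst);  // fence X
            y.store(2, std::memory_order_relaxed);                // write to y
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);                // write to y
            std::atomic_thread_fence(std::memory_order_seq_cst);  // fence Y
            x.store(2, std::memory_order_relaxed);                // write to x
        });
        t1.join();
        t2.join();

        // After the joins, each load returns the last value in that
        // object's modification order.
        if (x.load() == 1 && y.load() == 1) return false;
    }
    return true;
}
```

Usage would be something like `assert(two_plus_two_w_never_both_one(1000));`. Note that only stores are ordered here, which is the case the approved wording was meant to cover.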

Interesting.
If so, is the outcome r1 == 0 && r2 == 1 && r3 == 0 possible on IA-64
for code 1 in my first post?

In fact, I ran into this problem while implementing a variant of a
work-stealing deque (*), and I find it unsatisfying that code 2 works
but code 1 doesn't.

(*) http://portal.acm.org/citation.cfm?doid=277651.277678

-- masakuni