Re: questions about memory_order_seq_cst fence



Masakuni Oishi <yam...@xxxxxxxxxxxx> wrote:
Anthony Williams <anthony....@xxxxxxxxx> wrote:
Alexander Terekhov <terek...@xxxxxx> writes:
Anthony Williams wrote:

Masakuni Oishi <yam...@xxxxxxxxxxxx> writes:

If so, for the code 1 in my first post, is the outcome
r1 == 0 && r2 == 1 && r3 == 0 possible on IA-64?

I believe so.

Perhaps this is relevant:

http://download.intel.com/design/itanium/downloads/25142901.pdf

See 3.3.7.1 Total ordering of WB Releases.

The example is this:

/*** code 1 ***/
// Initially
atomic<int> x(0), y(0);

// Thread 1:
y.store(1, memory_order_release);
atomic_thread_fence(memory_order_seq_cst);
r1 = x.load(memory_order_acquire);

// Thread 2:
x.store(1, memory_order_release);

// Thread 3:
r2 = x.load(memory_order_acquire);
atomic_thread_fence(memory_order_seq_cst);
r3 = y.load(memory_order_acquire);
/***************/

In the above code, is r1 == 0 && r2 == 1 && r3 == 0 possible?

Under the C++0x memory model, the acquire and release operations here
serve no purpose --- there are no stores prior to the releases, so there
is no imposed ordering, and they might as well all be
memory_order_relaxed.

The above code is thus equivalent to the code below, as far as the C++0x
memory model goes, and the compiler would be conforming if it compiled
it as such:

/***************/
// Initially
atomic<int> x(0), y(0);

// Thread 1:
y.store(1, memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
r1 = x.load(memory_order_relaxed);

// Thread 2:
x.store(1, memory_order_relaxed);

// Thread 3:
r2 = x.load(memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
r3 = y.load(memory_order_relaxed);
/***************/

I am not an expert on IA-64, but I would expect a relaxed write on IA-64
to be plain ST --- an unordered write --- and a relaxed load to be a
plain LD --- an unordered read.

Given that, the total ordering of WB releases is not relevant here, and
I suspect that r1 == 0 && r2 == 1 && r3 == 0 is possible.

Hmm... Interesting.

I'm not an expert on IA-64 either, but I think
the code would be compiled like this:

// Thread 1
US1:   st   [y] = 1
MF1:   mf
UL1:   ld   r1 = [x]

// Thread 2
US2:   st   [x] = 1

// Thread 3
UL2:   ld   r2 = [x]
MF2:   mf
UL3:   ld   r3 = [y]

If the memory model of IA-64 allows ordering such as:
  RV_3(US2) -> F(MF2) -> F(MF1) -> RV_1(US2)
then the outcome r1 == 0 && r2 == 1 && r3 == 0 is possible,
I think.

Intel's specification does not seem to prohibit such an ordering,
but I'm not sure whether it can actually occur.
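
For anyone who wants to try this on real hardware, here is a minimal
C++11 harness for the relaxed variant of code 1. It is only a sketch:
the names run_code1_once and count_code1_outcome are made up for
illustration, and starting fresh threads for every run gives the three
threads little overlap, so failing to observe the outcome says nothing
about whether it is allowed.

```cpp
#include <atomic>
#include <thread>

// Runs the relaxed variant of code 1 once.
// Returns true if r1 == 0 && r2 == 1 && r3 == 0 was observed.
// (Hypothetical helper name, not taken from the thread.)
bool run_code1_once() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1, r3 = -1;

    std::thread t1([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        x.store(1, std::memory_order_relaxed);
    });
    std::thread t3([&] {
        r2 = x.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r3 = y.load(std::memory_order_relaxed);
    });

    // join() synchronizes with thread completion, so reading r1..r3
    // afterwards is race-free.
    t1.join();
    t2.join();
    t3.join();
    return r1 == 0 && r2 == 1 && r3 == 0;
}

// Repeats the test and counts how often the outcome in question shows up.
int count_code1_outcome(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (run_code1_once()) {
            ++hits;
        }
    }
    return hits;
}
```

Calling count_code1_outcome(100000) from main and printing the result
is enough to use it (build with thread support, e.g. -pthread on GCC).
A nonzero count would show that the outcome occurs on that machine; a
zero count proves nothing.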

Here is another example.

/***************/
// Initially
atomic<int> x(0), y(0);

// Thread 1:
r3 = y.load(memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
r1 = x.load(memory_order_relaxed);

// Thread 2:
x.store(1, memory_order_relaxed);

// Thread 3:
r2 = x.load(memory_order_relaxed);
atomic_thread_fence(memory_order_seq_cst);
y.store(1, memory_order_relaxed);
/***************/

In this code, the outcome r1 == 0 && r2 == 1 && r3 == 1
is NOT allowed under the C++0x memory model.
(29.8p2 and 1.10p16)

But, if we compile the code on IA-64,

// Thread 1
UL3:   ld   r3 = [y]
MF1:   mf
UL1:   ld   r1 = [x]

// Thread 2
US2:   st   [x] = 1

// Thread 3
UL2:   ld   r2 = [x]
MF2:   mf
US1:   st   [y] = 1

it seems that the outcome r1 == 0 && r2 == 1 && r3 == 1
is possible.
Isn't it?
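
A rough way to check this empirically is a small C++11 harness for the
second example. This is a sketch under assumptions: the function names
run_example2_once and count_example2_forbidden are invented here, and
spawning new threads per run keeps the race window small. Since the
outcome r1 == 0 && r2 == 1 && r3 == 1 is not allowed by the C++0x
model, a conforming implementation should never report a hit; a nonzero
count would mean the mapping to machine instructions is too weak.

```cpp
#include <atomic>
#include <thread>

// Runs the second example once.
// Returns true if the forbidden outcome r1 == 0 && r2 == 1 && r3 == 1
// was observed. (Hypothetical helper name, not taken from the thread.)
bool run_example2_once() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1, r3 = -1;

    std::thread t1([&] {
        r3 = y.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        x.store(1, std::memory_order_relaxed);
    });
    std::thread t3([&] {
        r2 = x.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        y.store(1, std::memory_order_relaxed);
    });

    // join() makes the threads' writes to r1..r3 visible here.
    t1.join();
    t2.join();
    t3.join();
    return r1 == 0 && r2 == 1 && r3 == 1;
}

// Repeats the test and counts observations of the forbidden outcome.
int count_example2_forbidden(int iterations) {
    int hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (run_example2_once()) {
            ++hits;
        }
    }
    return hits;
}
```

On IA-64 the interesting question is whether count_example2_forbidden
ever returns a nonzero value when the compiler emits plain ld/st plus
mf as shown above. Not seeing a hit does not prove that the hardware
forbids the ordering.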

-- masakuni