Re: removing store/load in Peterson's algorithm?



On Dec 23, 10:29 pm, "James" <n...@xxxxxxxxxxxx> wrote:
"Dmitriy Vyukov" <dvyu...@xxxxxxxxx> wrote in message
news:756561b8-810a-40ef-99e1-ff7138e13661@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

On Dec 21, 8:32 pm, "James" <n...@xxxxxxxxxxxx> wrote:
[...]



STORE16(&state1, 1);
LOAD32(&state1);
LOAD32(&state2);

Can the load from state2 rise above the 16-bit store? It seems like it
should be able to. Where am I going wrong?

[...]
Yes, it can. But there is no such code in Peterson's algorithm; there is
only one load.
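The "yes, it can" above is the classic store-buffering case: a load from one location may be satisfied before an earlier store to a different location becomes visible. Below is a minimal C11 sketch of that litmus test, assuming POSIX threads. The flags are hypothetical stand-ins for state1/state2, and all names are made up for illustration. It does not model the 16-bit/32-bit overlap at all, because portable C11 atomics cannot express mixed-size accesses to the same word.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One side of the litmus test: set your own flag, then read the other one.
   The flags are hypothetical stand-ins for state1 and state2. */
typedef struct {
    atomic_int *mine;   /* flag this thread stores to */
    atomic_int *other;  /* flag this thread loads from */
    int seen;           /* value loaded from *other */
} litmus_side;

static bool use_seq_cst;  /* written before threads start, read-only afterwards */

static void *run_side(void *arg) {
    litmus_side *s = arg;
    if (use_seq_cst) {
        /* seq_cst store then seq_cst load: the load may not pass the store */
        atomic_store(s->mine, 1);
        s->seen = atomic_load(s->other);
    } else {
        /* relaxed: the load may be satisfied before the store is visible */
        atomic_store_explicit(s->mine, 1, memory_order_relaxed);
        s->seen = atomic_load_explicit(s->other, memory_order_relaxed);
    }
    return NULL;
}

/* Runs `trials` rounds; returns how many ended with both threads reading 0,
   or -1 if a thread could not be created. */
int count_both_zero(int trials, bool seq_cst) {
    use_seq_cst = seq_cst;
    int both_zero = 0;
    for (int i = 0; i < trials; i++) {
        atomic_int fa, fb;
        atomic_init(&fa, 0);
        atomic_init(&fb, 0);
        litmus_side a = { &fa, &fb, -1 };
        litmus_side b = { &fb, &fa, -1 };
        pthread_t ta, tb;
        if (pthread_create(&ta, NULL, run_side, &a) != 0)
            return -1;
        if (pthread_create(&tb, NULL, run_side, &b) != 0) {
            pthread_join(ta, NULL);
            return -1;
        }
        pthread_join(ta, NULL);
        pthread_join(tb, NULL);
        if (a.seen == 0 && b.seen == 0)
            both_zero++;  /* each load passed the other thread's store */
    }
    return both_zero;
}
```

With seq_cst the both-zero outcome is forbidden, so count_both_zero(n, true) is always 0. With relaxed it is permitted, but whether a given run actually shows it depends on the machine and on thread timing.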

I was referring to a load that occurred within the critical section. Can
loads from the critical section rise above the 16-bit store? Can that
cause any problems?
I can only cite myself:
I think one needs only #LoadLoad | #LoadStore:
STORE16(&state, 1);
LOAD32(&state);
if (...) ...
membar #LoadLoad | #LoadStore; //no-op for SPARC TSO and x86
// nothing from here can hoist above mutex acquisition
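Portable C11 has no #LoadLoad | #LoadStore mask. An acquire fence placed after the spinning loads is commonly described as providing roughly that ordering: later loads and stores stay below earlier loads. The minimal sketch below assumes POSIX threads and shows the pattern on a simple publish/consume handoff rather than a full Peterson lock. All names are hypothetical. The fence provides no #StoreLoad ordering, so it does not replace the store/load barrier this thread is about. On x86, GCC and Clang emit no instruction for an acquire fence; it only restrains compiler reordering, which matches the "no-op for SPARC TSO and x86" remark above.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names: `ready` plays the role of the lock/state word and
   `payload` is ordinary data touched inside the critical section. */
static atomic_int ready;
static int payload;

static void *publisher(void *arg) {
    (void)arg;
    payload = 42;                                            /* plain write */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* publish it */
    return NULL;
}

/* Spin with relaxed loads, then issue a single acquire fence. Loads and stores
   after the fence are not reordered before the loads that precede it, roughly
   the effect of `membar #LoadLoad | #LoadStore` in the quoted code. */
static int wait_and_read(void) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;  /* spin */
    atomic_thread_fence(memory_order_acquire);
    return payload;  /* synchronized with the release store: reads 42 */
}

/* One handoff; returns the value the waiting thread observed, or -1 on error. */
int run_once(void) {
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    payload = 0;
    pthread_t t;
    if (pthread_create(&t, NULL, publisher, NULL) != 0)
        return -1;
    int seen = wait_and_read();
    pthread_join(t, NULL);
    return seen;
}
```

Because the relaxed load that exits the loop reads the value written by the release store, the acquire fence synchronizes with that store. The read of payload is therefore race-free and sees 42.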

But the sequence:

STORE16(&state, 1);
LOAD32(&state);

will not prevent a store from within the critical section from rising above
it. This would mess things up, right?

It depends on the hardware.
On IA-32, Intel64, and SPARC TSO it won't cause problems.
On IA-64 and SPARC RMO it can.

--
Dmitriy V'jukov