Re: Memory fence instructions on x86




"PillMonsta" <chris@xxxxxxxxxxxxx> wrote in message
news:1153777240.210462.14260@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> Dear all,
>
> Does anyone have a readable document, or alternatively source which
> demonstrates when and where sfence, lfence and mfence instructions are
> required for programming atomic operations on P4 and 686?
> I have heard conflicting reports that the fence instructions are not
> required on SMP P4, but I doubt this. Information on this seems to be
> pretty scarce and any help would be very welcome.
>
> Thanks.
>
> Chris


FWIW, here is my "current" take on the x86:

http://groups.google.com/group/comp.programming.threads/msg/68ba70e66d6b6ee9?hl=en

This brief description seems to cover both x86 and UltraSPARC T1 (TSO). That
is, every explicit memory barrier operation is a nop, except #StoreLoad... A
store followed by a load from a different location can be reordered on x86 or
SPARC V9 TSO...
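A minimal sketch of that mapping, assuming a C11 compiler with <stdatomic.h> (GCC or Clang) targeting x86. The MEMBAR_* macro names are hypothetical, and C11 fences only approximate the SPARC-style barriers. On x86, GCC and Clang emit no instruction for acquire, release, or acq_rel fences (they only stop compiler reordering); a seq_cst fence becomes mfence or a locked instruction, depending on the compiler version. That is the one #StoreLoad case that costs anything.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical SPARC-style barrier names mapped onto C11 fences.
   On x86 (TSO) the first three emit no instruction with GCC/Clang;
   they only prevent the compiler from reordering across them. */
#define MEMBAR_LOADLOAD()   atomic_thread_fence(memory_order_acquire)
#define MEMBAR_LOADSTORE()  atomic_thread_fence(memory_order_acq_rel)
#define MEMBAR_STORESTORE() atomic_thread_fence(memory_order_release)
/* #StoreLoad is the only one that needs a real instruction on x86:
   the compiler emits mfence (or a locked RMW) for a seq_cst fence. */
#define MEMBAR_STORELOAD()  atomic_thread_fence(memory_order_seq_cst)

/* Single-threaded smoke test: the macros compile and do not disturb
   program order within one thread. */
static int membar_smoke_test(void) {
    atomic_int a = 0, b = 0;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    MEMBAR_STORESTORE();
    atomic_store_explicit(&b, 2, memory_order_relaxed);
    MEMBAR_STORELOAD();   /* store to b ordered before the load of a */
    int ra = atomic_load_explicit(&a, memory_order_relaxed);
    MEMBAR_LOADLOAD();
    int rb = atomic_load_explicit(&b, memory_order_relaxed);
    MEMBAR_LOADSTORE();
    return ra + rb;
}
```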




My experimental implementation of Peterson's algorithm demonstrates the need
for a #StoreLoad barrier on x86:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/c49c0658e2607317/1e45b4b16bad9784?lnk=gst&q=chris+thomasson+peterson&rnum=1#1e45b4b16bad9784

Notice how there is no explicit barrier in the "unlock" functions... Again,
this is because "current" x86 stores automatically provide #LoadStore|#StoreStore
ordering...
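A rough C11 sketch of the same idea, assuming GCC or Clang with <stdatomic.h> and POSIX threads. This is not the linked implementation; every name in it is hypothetical. The lock path issues one seq_cst fence (the #StoreLoad), and the unlock path is a plain release store, which compiles to an ordinary mov on x86.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical two-thread Peterson lock; thread ids are 0 and 1. */
typedef struct {
    atomic_int flag[2];
    atomic_int turn;
} peterson_lock;

static void peterson_lock_acquire(peterson_lock *l, int me) {
    int other = 1 - me;
    /* Release stores: plain movs on x86. */
    atomic_store_explicit(&l->flag[me], 1, memory_order_release);
    atomic_store_explicit(&l->turn, other, memory_order_release);
    /* #StoreLoad: without it, the loads below may be satisfied while our
       stores still sit in the store buffer, letting both threads in. */
    atomic_thread_fence(memory_order_seq_cst);
    while (atomic_load_explicit(&l->flag[other], memory_order_acquire) &&
           atomic_load_explicit(&l->turn, memory_order_acquire) == other) {
        /* spin */
    }
}

static void peterson_lock_release(peterson_lock *l, int me) {
    /* No explicit barrier: a release store (a plain mov on x86) already
       keeps the critical section's loads and stores ahead of it. */
    atomic_store_explicit(&l->flag[me], 0, memory_order_release);
}

/* Demo harness: two threads increment a plain (non-atomic) counter. */
struct demo_arg { peterson_lock *lock; long *counter; int id; long iters; };

static void *demo_worker(void *p) {
    struct demo_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        peterson_lock_acquire(a->lock, a->id);
        (*a->counter)++;
        peterson_lock_release(a->lock, a->id);
    }
    return NULL;
}

static long peterson_demo(long iters) {
    peterson_lock lock;
    atomic_init(&lock.flag[0], 0);
    atomic_init(&lock.flag[1], 0);
    atomic_init(&lock.turn, 0);
    long counter = 0;
    pthread_t t[2];
    struct demo_arg args[2];
    for (int i = 0; i < 2; i++) {
        args[i] = (struct demo_arg){ &lock, &counter, i, iters };
        pthread_create(&t[i], NULL, demo_worker, &args[i]);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;   /* 2 * iters if mutual exclusion held */
}
```

Dropping the atomic_thread_fence line is the experiment: on a multi-core x86 box, the final count can then come up short.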




http://groups.google.com/group/comp.programming.threads/msg/ca2f1af4552233df

That was a trick to exploit the fact that in the TSO model, stores are
"basically" equivalent to:

1. #LoadStore|#StoreStore (release barrier)
2. Perform the actual store
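In C11 terms, that two-step sequence is roughly what a release store asks for. A small message-passing sketch, assuming GCC or Clang, <stdatomic.h>, and POSIX threads (all names are hypothetical): on x86 both the release store and the acquire load below compile to plain movs, so the ordering comes from TSO itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* publication flag, zero-initialized */

static void *mp_producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* 1. write data */
    /* Release store = "#LoadStore|#StoreStore, then store". */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* 2. publish */
    return NULL;
}

static int mp_consume(void) {
    pthread_t t;
    pthread_create(&t, NULL, mp_producer, NULL);
    /* Acquire load pairs with the release store above. */
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the flag is published */
    }
    int seen = payload;   /* must observe 42 once ready == 1 */
    pthread_join(t, NULL);
    return seen;
}
```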




http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0e07adf138f0091d/ca2f1af4552233df?#ca2f1af4552233df
(read all of this)

!!> Please note that Intel explicitly states that these rules may not hold
true for "future" x86 memory models... So always have a "backup" plan that
uses the lfence, sfence, and mfence instructions in the "correct" places...
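One possible shape for such a backup plan, sketched under assumptions: GCC-style inline asm, and a conservative choice of lfence for #LoadLoad, sfence for #StoreStore, and mfence for #LoadStore and #StoreLoad. The FENCE_* names are hypothetical, and the idea is that switching the barrier layer should not require touching call sites. The "memory" clobber also makes each fence a compiler barrier.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical conservative barrier layer: always issue a real fence. */
#if defined(__GNUC__) && \
    (defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__)))
#  define FENCE_LOADLOAD()   __asm__ __volatile__("lfence" ::: "memory")
#  define FENCE_STORESTORE() __asm__ __volatile__("sfence" ::: "memory")
#  define FENCE_LOADSTORE()  __asm__ __volatile__("mfence" ::: "memory")
#  define FENCE_STORELOAD()  __asm__ __volatile__("mfence" ::: "memory")
#else
/* Not x86 with SSE2: fall back to the strongest portable C11 fence. */
#  define FENCE_LOADLOAD()   atomic_thread_fence(memory_order_seq_cst)
#  define FENCE_STORESTORE() atomic_thread_fence(memory_order_seq_cst)
#  define FENCE_LOADSTORE()  atomic_thread_fence(memory_order_seq_cst)
#  define FENCE_STORELOAD()  atomic_thread_fence(memory_order_seq_cst)
#endif

/* Smoke test: each fence compiles and executes. */
static int fenced_sum(void) {
    volatile int a = 1, b = 2, r = 0;
    FENCE_STORESTORE();
    r = a;
    FENCE_LOADLOAD();
    r += b;
    FENCE_STORELOAD();
    FENCE_LOADSTORE();
    return r;
}
```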




http://appcore.home.comcast.net/
http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html

Here is my implementation of a "simple" atomic-operations abstraction based on
x86 assembly... This code uses mfence, which requires SSE2, so you may have to
change it if your processor doesn't support SSE2...
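This is not the linked appcore code, which I am not reproducing here. It is a minimal sketch of the same kind of abstraction in GCC inline asm, with hypothetical demo_* names. It assumes that a lock-prefixed read-modify-write on the stack is an acceptable full-barrier substitute for mfence on pre-SSE2 i686, and it falls back to GCC/Clang __atomic builtins on non-x86 targets.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* lock xadd: atomically adds v to *p; v receives the old value.
   Any lock-prefixed instruction is also a full barrier on x86. */
static inline int32_t demo_atomic_fetch_add(volatile int32_t *p, int32_t v) {
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p) : : "memory", "cc");
    return v;
}

/* lock cmpxchg: if *p == expected (in EAX), store desired; EAX ends up
   holding the previous value of *p either way. */
static inline int demo_atomic_cas(volatile int32_t *p, int32_t expected,
                                  int32_t desired) {
    int32_t prev;
    __asm__ __volatile__("lock; cmpxchgl %2, %1"
                         : "=a"(prev), "+m"(*p)
                         : "r"(desired), "0"(expected)
                         : "memory", "cc");
    return prev == expected;
}

static inline void demo_full_fence(void) {
#if defined(__x86_64__) || defined(__SSE2__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    /* Pre-SSE2 i686: a locked no-op add to the stack acts as a full fence. */
    __asm__ __volatile__("lock; addl $0, (%%esp)" ::: "memory", "cc");
#endif
}

#else
/* Non-x86 fallback using GCC/Clang __atomic builtins. */
static inline int32_t demo_atomic_fetch_add(volatile int32_t *p, int32_t v) {
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
static inline int demo_atomic_cas(volatile int32_t *p, int32_t expected,
                                  int32_t desired) {
    return __atomic_compare_exchange_n(p, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
static inline void demo_full_fence(void) {
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}
#endif
```

Usage: `demo_atomic_fetch_add(&x, 3)` returns the old value of x, and `demo_atomic_cas(&x, 8, 10)` returns nonzero only if x was 8 and is now 10.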

