Re: Variable confidence/urgency prefetch?




Seongbae Park wrote:
[snip]
US4+ has weak and strong prefetches -
the processor is allowed to drop weak prefetches under various
conditions (typically a full prefetch queue or a TLB miss),
whereas strong prefetches are (almost) never dropped.
The intention is just as you said - one for potentially higher cost
but guaranteed prefetching, the other for guaranteed
small/no runtime cost but no guarantee of actual prefetching.
In contrast, US3 and US4 have only weak prefetches,
which often forced the compiler to issue duplicate prefetches
to ensure that all necessary data were really prefetched.

See 8.72.2 Weak versus Strong Prefetch of UA2005:

http://opensparc.sunsource.net/specs/UA2005-current-draft-P-EXT.pdf
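
For readers who have not used software prefetch, here is a minimal C sketch of the kind of loop a compiler (or programmer) would decorate with prefetches. It assumes GCC or Clang, whose `__builtin_prefetch` builtin is real; the function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are made up for illustration. Note that the builtin only exposes a read/write flag and a locality hint, so it does not let the source choose between a weak and a strong prefetch; which SPARC prefetch variant is emitted is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that a later element will be read soon.
 * The prefetch is only a hint; the result is the same if it is dropped. */
static long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, 0 = prefetch for read, 3 = high locality. */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

With only droppable (weak) prefetches, the workaround described above amounts to issuing the same hint more than once for the same line and hoping one of them survives.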

A belated thank you for the pointer. (BTW, why is saving this public
document disallowed--especially as a little extra effort allows one to
save it anyway?)

It seems UltraSPARC does not have an equivalent to Alpha's
Write Hint 64 Bytes (WH64) or PPC's Data Cache Block Zero (or
Allocate). Did the guardians of the ISA judge that the block write
instruction (which writes 8 FP registers to memory) was adequate for
most uses of WH64/DCB[ZA], or that such instructions provide little
benefit in real programs, or was there some other reason? It seems
that some memory allocators (especially a binary buddy allocator?)
could use such instructions to avoid reading from main memory blocks
that initially contain no meaningful data.
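
To make the allocator case concrete, here is a rough C sketch of where such an instruction would sit. Everything in it is an assumption for illustration: `BLOCK_BYTES`, `claim_block`, and `claim_region` are hypothetical names, the 64-byte block size is a guess, and the `memset` is only a portable stand-in that marks the spot where a WH64- or DCBZ-style instruction would go on hardware that has one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache-block size; real hardware varies. */
#define BLOCK_BYTES 64u

/* Hypothetical helper: take ownership of one aligned block whose old
 * contents are dead. The memset is a portable stand-in; a cache-block
 * zero/allocate instruction here could establish the line in the cache
 * without first fetching the stale data from memory. */
static void claim_block(void *block)
{
    assert(((uintptr_t)block % BLOCK_BYTES) == 0);
    memset(block, 0, BLOCK_BYTES);
}

/* What an allocator might do before handing out a freshly split region:
 * claim every block in it, then return the region to the caller. */
static void *claim_region(void *region, size_t len)
{
    assert(len % BLOCK_BYTES == 0);
    unsigned char *p = region;
    for (size_t off = 0; off < len; off += BLOCK_BYTES) {
        claim_block(p + off);
    }
    return region;
}
```

A plain store loop like the stand-in may read each line before overwriting it, depending on the hardware; avoiding that read is the whole point of the hint instructions.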


Paul Aaron Clayton
just a technophile
