Re: CPU benchmark for Xilinx PAR



Very interesting

I really doubt it's the branch behaviour, even though the Athlon series
has always been good on office-type twisty apps. For branchy code
segments that fit in the I-cache, branches these days come almost for
free and are predicted right more often than not.

I'd hazard a guess it has more to do with the data set being very large
and missing the L1, L2 and TLBs way too often ("poor locality of
reference"). Even a 1% miss rate, maybe less, may be enough to wreak
havoc.

It's not difficult to create a simple data structure that holds millions
of items in a hash table and see even an Athlon xp2400 take 300ns on
average to access each entry if all accesses appear random, rather than
the naive 1ns its L1 cache can actually do.

You can plot a graph of random address width from 6 bits to 24 bits
and watch the access time go from 1ns to 4ns and then step roughly
through 30ns, 100ns and 300ns for x[i], when i comes from any old
random number generator and is masked by the width field. Measured on
an xp2400.
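
A minimal sketch of that sweep, assuming Python only for readability.
It is not the test I ran: the names time_random_reads and sweep are
made up for illustration, and interpreter overhead will swamp the
1ns to 4ns end of the curve, so only the growth at large widths means
anything. A compiled version would be needed for real numbers.

```python
import random
import time
from array import array


def time_random_reads(width_bits, n_accesses=200_000, seed=1):
    """Average seconds per x[i] read, with i a random number masked to width_bits."""
    size = 1 << width_bits
    mask = size - 1
    # 8-byte entries, so the table occupies 8 * 2**width_bits bytes.
    x = array("Q", [1]) * size
    getbits = random.Random(seed).getrandbits
    total = 0
    start = time.perf_counter()
    for _ in range(n_accesses):
        # Random 32-bit value masked down to the chosen address width.
        total += x[getbits(32) & mask]
    elapsed = time.perf_counter() - start
    if total != n_accesses:  # every entry holds 1, so the sum is known
        raise RuntimeError("unexpected table contents")
    return elapsed / n_accesses


def sweep(widths=range(6, 25, 2), n_accesses=200_000):
    """Return (width_bits, seconds_per_access) pairs for each width."""
    return [(w, time_random_reads(w, n_accesses)) for w in widths]


if __name__ == "__main__":
    for w, secs in sweep():
        print(f"{w:2d} bits  {secs * 1e9:8.1f} ns/access")
```

The timing includes the random number generator, which costs the same
at every width, so differences between rows are what to look at.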

If this simple test were run on various cpus, we could see how the
caching really works as locality goes from good to disastrous, and
choose accordingly.

Now EDA software doesn't deliberately do this, but it might get some of
the same effect unintentionally, simply by having to walk immense graphs
and trees. Think about it: draw a graph with millions of nodes and try
to label it in such a way that it can be traversed with mostly low
address bit changes (high locality) when the nodes in the graph are
allocated in completely random fashion. Then think how many operations
actually get performed on each linked list traversal. A lot of the time
it might be just passing through looking for something, the worst
possible situation: all fetch, no work.
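
The "all fetch, no work" case can be sketched as a pointer chase over
the same set of nodes laid out two ways: links that follow address
order, and links that jump around at random. This is an illustration
under my own assumptions, not anything taken from real EDA code; the
names build_chain and walk are hypothetical, and Python overhead will
mute the gap that a compiled version would show.

```python
import random
import time
from array import array


def build_chain(n_nodes, shuffled, seed=1):
    """Return a next-index array forming one cycle through all n_nodes slots."""
    order = list(range(n_nodes))
    if shuffled:
        # Random visiting order stands in for randomly allocated nodes.
        random.Random(seed).shuffle(order)
    nxt = array("q", [0]) * n_nodes
    for pos in range(n_nodes):
        nxt[order[pos]] = order[(pos + 1) % n_nodes]
    return nxt


def walk(nxt, steps, start=0):
    """Follow `steps` links with no work per node; return (last node, seconds per step)."""
    node = start
    t0 = time.perf_counter()
    for _ in range(steps):
        node = nxt[node]  # each fetch depends on the previous one
    elapsed = time.perf_counter() - t0
    return node, elapsed / steps


if __name__ == "__main__":
    n = 1 << 21  # about two million nodes, 16 MB of links
    for shuffled in (False, True):
        chain = build_chain(n, shuffled)
        _, per_step = walk(chain, n)
        label = "random layout" if shuffled else "sequential layout"
        print(f"{label:18s} {per_step * 1e9:8.1f} ns/step")
```

Because every load address comes from the previous load, the cpu
cannot overlap the misses, which is why this pattern hurts more than
independent random reads.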

I don't imagine there is much EDA code that looks like beautiful DSP
media codec stuff with super straight-line, high-locality, SSE-tuned
code.

I could be all wrong, but I think it's the Memory Wall effect, and the
Opteron maybe does a better job of recovering. That also means a cpu
that concentrates on that aspect doesn't even need a clock advantage, as
long as it tolerates poor locality better.

I wonder if it's possible to get stats from the cpu performance hardware
that show what the cpu is really doing in memory. That's a bit out of my
league.

I'm curious whether the EDA guys just crank out code, or whether they
ever measure their algorithms on different x86 hardware at the cache
level.

I also wonder how much the FPU is actually used, and how.

On a threaded cpu designed to work with threaded memory where there is
little memory wall (latency tolerance all around), it doesn't take much
hardware to design a processor element in FPGA that can match an Athlon
xp300, and 10 or so ganged together can then match an xp3000, but you
get 40-odd threads to fill instead of waiting on cache misses. Me, I'd
rather fill the threads (occam style) than wait, but most are not of
that opinion (yet).

Now if EDA ever becomes highly concurrent (some have done this in VLSI
EDA, from simulation to P/R), it does make some real speed-ups possible
when real threading becomes pervasive in cpus (not this 2- or 4-thread
nonsense).

johnjakson at usa dot ...
transputer2 at yahoo dot ...
