Re: Does RISC still offer a significant numbercrunching advantage?



On Mon, 03 Dec 2007 10:50:41 +0100, Terje Mathisen wrote:

Andrew Reilly wrote:
On Sat, 01 Dec 2007 11:05:11 +0100, Terje Mathisen wrote:

I believe IBM Power and possibly Itanium have the highest absolute
SPECFP scores,

Is there a concise analysis of what the specific advantages are, here?
In the past, it seemed to be mostly because both Power and Itanium (and
Alpha, in its day) had two full sets of FP multipliers and adders,
where the x86 and amd64 processors only had half, or one (albeit with
SIMD of some sort). However, the Core 2 series seems to have the same
two sets of FPUs, so that should now be a wash, right? What else is
different?

The Core 2 fp unit has no FMAC, only separate FADD and FMUL opcodes, so
that's a net halving of the maximum FLOPS number.

Is it? I thought that the multipliers and adders were separate units in
Core2, which should be able to manage the same two MACs/cycle, assuming
that data flow and instruction issue width constraints are up to the
task. I thought that FMAC was mostly a precision win, with a side-order
of issue simplification, mediated by a three-operand complication.
(PowerPC FMAC has three register arguments, so it isn't necessarily an
"accumulate" in the traditional (to DSP) sense. Is the sum-source
operand explicit or implicit in the Itanium's FMAC?)
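
To make the "separate units" argument concrete, here is a minimal sketch in C of a dot product written with distinct multiply and add steps and no fused operation. The function name dot_mul_add and the choice of two accumulators are illustrative assumptions, not anything from a real library. Whether a given core actually overlaps the multiplies with the adds depends on its issue width and data flow, as noted above.

```c
#include <stddef.h>

/* Dot product using separate multiply and add operations (no FMAC).
 * The two accumulators are independent of each other, so an out-of-order
 * core with separate FMUL and FADD units has a chance to work on one
 * element's multiply while another element's add is still in flight.
 * Hypothetical helper, for illustration only. */
double dot_mul_add(const double *x, const double *y, size_t n)
{
    double acc0 = 0.0;
    double acc1 = 0.0;
    size_t i;

    /* Main loop: two elements per iteration, one per accumulator. */
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += x[i] * y[i];
        acc1 += x[i + 1] * y[i + 1];
    }

    /* Tail: handle the last element when n is odd. */
    if (i < n)
        acc0 += x[i] * y[i];

    return acc0 + acc1;
}
```

Each product here is rounded before it is added, which is the precision difference relative to a fused multiply-add; the split accumulators also change the summation order compared with a single running sum.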

There's also the
problem that the instructions are 'OP dest, src', so sometimes you must
copy data from one register to another.

Doesn't that go away after the micro-op cracking and register renaming?

OTOH, an SSE MULPS works on 4 float values in parallel, and the source
operand can be in L1 cache memory and still run at register-to-register
speed.

But with some latency issues, presumably...

MULPD does the same for two double variables.
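
For reference, a minimal sketch of the two instructions as seen from C, using the standard SSE/SSE2 intrinsics _mm_mul_ps and _mm_mul_pd. The wrapper names mul4_float and mul2_double are made up for illustration, and the sketch assumes an x86 target with SSE2. It uses unaligned loads for simplicity; the memory-operand form of MULPS/MULPD mentioned above needs 16-byte-aligned data, and whether a compiler folds the load into the multiply is up to the compiler.

```c
#include <xmmintrin.h>  /* SSE:  __m128,  _mm_mul_ps */
#include <emmintrin.h>  /* SSE2: __m128d, _mm_mul_pd */

/* Multiply four floats element-wise with a single MULPS.
 * Hypothetical wrapper, for illustration only. */
void mul4_float(float *dst, const float *a, const float *b)
{
    __m128 va = _mm_loadu_ps(a);   /* load 4 floats, no alignment needed */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(dst, _mm_mul_ps(va, vb));
}

/* Multiply two doubles element-wise with a single MULPD.
 * Hypothetical wrapper, for illustration only. */
void mul2_double(double *dst, const double *a, const double *b)
{
    __m128d va = _mm_loadu_pd(a);  /* load 2 doubles, no alignment needed */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(dst, _mm_mul_pd(va, vb));
}
```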

Cache size seems likely, but is there still a register set size effect?
Something to do with memory system performance? Some measurable net
detriment due to longer pipelines doing instruction decode?

Itanium wins mostly due to humongous cache sizes.

The big Power systems have heroic (eDRAM?) caches too, don't they?

Cheers,

Andrew
