Re: Does RISC still offer a significant numbercrunching advantage?
- From: Andrew Reilly <andrew-newspost@xxxxxxxxxxxxxxxxxxxxx>
- Date: 3 Dec 2007 10:36:13 GMT
On Mon, 03 Dec 2007 10:50:41 +0100, Terje Mathisen wrote:
Andrew Reilly wrote:
On Sat, 01 Dec 2007 11:05:11 +0100, Terje Mathisen wrote:
I believe IBM Power and possibly Itanium have the highest absolute
SPECFP scores,
Is there a concise analysis of what the specific advantages are, here?
In the past, it seemed to be mostly because both Power and Itanium (and
Alpha, in its day) had two full sets of FP multipliers and adders,
where the x86 and amd64 processors only had half, or one (albeit with
SIMD of some sort). However the Core2-series seems to have the same
two sets of FPUs, so that should now be a wash, right? What else is
different?
The Core 2 fp unit has no FMAC, only separate FADD and FMUL opcodes, so
that's a net halving of the maximum FLOPS number.
Is it? I thought that the multipliers and adders were separate units in
Core2, which should be able to manage the same two MACs/cycle, assuming
that data flow and instruction issue width constraints are up to the
task. I thought that FMAC was mostly a precision win, with a side-order
of issue simplification, mediated by a three-operand complication.
(PowerPC FMAC has three register arguments, so it isn't necessarily an
"accumulate" in the traditional (to DSP) sense. Is the sum-source
operand explicit or implicit in the Itanium's FMAC?)
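
To make the "precision win" concrete, here is a minimal sketch in Python of how a fused multiply-add can differ from a separate multiply followed by an add. It is an illustration only: the helper names unfused_mac and fused_mac are made up for this post, and the fused result is emulated with exact rational arithmetic and a single final rounding rather than with any hardware FMAC instruction. It assumes Python floats are IEEE 754 doubles.

```python
from fractions import Fraction

def unfused_mac(a, b, c):
    # Separate FMUL then FADD: the product is rounded to double,
    # then the sum is rounded again (two roundings in total).
    return a * b + c

def fused_mac(a, b, c):
    # Emulated FMAC (hypothetical helper): compute a*b + c exactly
    # as a rational number, then round to double once at the end.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so the exact product 1 - 2**-54 is not representable
# as a double and rounds to 1.0 in the unfused case.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = unfused_mac(a, b, c)
fused = fused_mac(a, b, c)
print("unfused:", unfused)
print("fused:  ", fused)
```

With these inputs the unfused version loses the low-order term entirely, while the single-rounding version keeps it. That is the sort of case where the intermediate rounding matters.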
There's also the
problem that the instructions are 'OP dest, src', so sometimes you must
copy data from one register to another.
Doesn't that go away after the micro-op cracking and register renaming?
OTOH, an SSE MULPS works on 4 float values in parallel, and the source
operand can be in L1 cache memory and still run at register-to-register
speed.
But with some latency issues, presumably...
MULPD does the same for two double variables.
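
The peak-FLOPS argument in this exchange comes down to simple counting, which can be sketched as below. This is a toy model, not vendor data: the function peak_flops_per_cycle and the three configurations are hypothetical, chosen only to show how separate multiply/add units, FMAC units, and SIMD width trade off. It ignores issue width, latency, and memory bandwidth.

```python
def peak_flops_per_cycle(units, lanes, flops_per_lane_op):
    # units: FP pipes that can each start one operation per cycle
    # lanes: values processed per operation (1 = scalar,
    #        2 = MULPD-style doubles, 4 = MULPS-style floats)
    # flops_per_lane_op: 1 for a plain add or multiply, 2 for an FMAC
    return units * lanes * flops_per_lane_op

# Hypothetical scalar design: one multiplier pipe plus one adder pipe.
split_scalar = peak_flops_per_cycle(units=2, lanes=1, flops_per_lane_op=1)

# Hypothetical scalar design: two FMAC pipes.
fmac_scalar = peak_flops_per_cycle(units=2, lanes=1, flops_per_lane_op=2)

# Hypothetical SIMD design: one multiplier pipe plus one adder pipe,
# each working on two doubles at a time (MULPD/ADDPD style).
split_simd_double = peak_flops_per_cycle(units=2, lanes=2, flops_per_lane_op=1)

print(split_scalar, fmac_scalar, split_simd_double)
```

In this model the scalar split design reaches half the FMAC design's peak for the same number of pipes, and two-wide SIMD makes up the difference, provided the code vectorizes and both pipes can be kept busy every cycle.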
Cache size seems likely, but is there still a register set size effect?
Something to do with memory system performance? Some measurable net
detriment due to longer pipelines doing instruction decode?
Itanium wins mostly due to humongous cache sizes.
The big Power systems have heroic (eDRAM?) caches too, don't they?
Cheers,
Andrew