Re: Software vs hardware floating-point [was Re: What happened ...]



In article <4AB580FE.404@xxxxxxxxxxxxxxx>,
Andy "Krazy" Glew <ag-news@xxxxxxxxxxxxxxx> wrote:

>> I believe you could indeed make a
>> 'multiply_setup/mul_core1/mul_core2/mul_normalize' perform close to
>> dedicated hw, but you would have to make sure that _nobody_ except the
>> compiler writers ever needed to be exposed to it.

> Trouble is, you need about 3x the instruction fetch/decode/scheduling
> bandwidth. Since that is comparable to the actual instruction execution
> in terms of power, depending on your machine, it is by no means a clear win.

Nobody claims that it is a clear win - certainly neither I nor Terje
would. My assertion is that it would be better, overall, NOT solely
for performance reasons - but no more than that.

And you wouldn't need three times the instruction throughput, except
for highly tuned HPC and benchmarketing. Few 'floating-point' codes
have more than about 10% of their instructions actually executing
floating-point operations. Remember that load and store don't count,
and I said that I would also have a 'direct' comparison operation.
When I last measured this (decades ago), it would have needed
very little more instruction throughput, and RISC codes have more
integer operations than the ones I looked at.
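
As a rough illustration of that arithmetic, here is a minimal sketch. The
10% figure is the one quoted above; the expansion factors of 3 and 4
primitives per floating-point operation are assumptions (taken from the
"3x" claim and from the four-step multiply sequence), and the function
name is purely illustrative. It ignores everything except raw instruction
count.

```python
def relative_instruction_count(fp_fraction, primitives_per_fp_op):
    """Instruction count after expanding each FP operation into several
    primitive instructions, relative to the original instruction stream.

    fp_fraction          -- fraction of executed instructions that are
                            floating-point operations (loads and stores
                            are not counted as FP operations)
    primitives_per_fp_op -- assumed number of primitives replacing each
                            FP operation
    """
    non_fp = 1.0 - fp_fraction                  # unchanged instructions
    expanded_fp = fp_fraction * primitives_per_fp_op
    return non_fp + expanded_fp


if __name__ == "__main__":
    # Assumed expansion factors; 10% is the fraction quoted in the post.
    for k in (3, 4):
        ratio = relative_instruction_count(0.10, k)
        print(f"{k} primitives per FP op: {ratio:.2f}x the instructions")
```

Under those assumptions the total comes out at roughly 1.2x to 1.3x the
original instruction count. It only reaches 3x when every instruction is
a floating-point operation.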

> You would need to be working on a code that allowed nearly all of the FP
> "primitive operations" to be optimized away for it to be a win on scalar
> code.

Not so. That would be true for a very few codes, but others would
gain with little or no optimisation. For example, some codes spend
half their time switching between the pipelines (yes, really), and
others are dominated by calls to mathematical functions. By merging
the pipelines, the overheads for the latter could be reduced very
considerably.

Now, working out the winners and losers, and by how much, would be
part of the research project that this proposal would involve.
Nobody is saying that it could be done by waving a magic wand.

> Anyway, this is nothing new. I investigated this with a mind to
> exposing the primitives to the compiler in the P6 era. Trouble is, the
> compiler had bigger fish to fry.

Yup. I never said that it was new - it predates my involvement in
computing, and the reason you give is the reason it has never been
restarted.


Regards,
Nick Maclaren.