Re: store multiply in a Q number



Tim Wescott <tim@xxxxxxxxxxxxxxxx> wrote:
On 08/19/2010 10:25 AM, glen herrmannsfeldt wrote:
Tim Wescott <tim@xxxxxxxxxxxxxxxx> wrote:
On 08/19/2010 12:07 AM, glen herrmannsfeldt wrote:
(snip)

I don't know about most, but many, including gcc, recognize that
two 32 bit values are multiplied for a 64 bit product, and use
the appropriate multiply instruction. That includes IA32.

That would be a violation of the ANSI C standard, then -- the result of
a mathematical expression with variables all of a given type is supposed
to be of that type, which implies truncation. To get a 64 bit result
from 32 bit integers means you should cast the integers up _first_.

OK, I just did some tests. With the -m32 and -O2 options,
gcc will generate one imull for a 32x32=64 product, but
also one imull for a 16x16=32 product.
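For reference, the "cast up first" form can be sketched as below. The function names are made up for illustration and are not from the thread. Whether a particular compiler turns the first one into a single widening multiply is something to check in the -S output; the code itself only guarantees the arithmetic result.

```c
#include <stdint.h>

/* Hypothetical helper (name is illustrative only).
 * Casting one operand to 64 bits before the multiply makes the
 * full 64-bit product well defined in standard C; the other
 * operand is converted to int64_t by the usual conversions. */
int64_t mul32x32_64(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* For contrast: with no cast the multiply is carried out in 32 bits
 * (assuming a 32-bit int), so only the low 32 bits survive.
 * Unsigned types are used here so the wraparound is well defined. */
uint32_t mul32x32_32(uint32_t a, uint32_t b)
{
    return a * b;
}
```

Compiling this with -m32 -O2 -S and reading the assembler output should show whether the widening form becomes one multiply instruction on a given target.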

(snip)
My understanding of optimization for gcc is that there is a layer in the
compiler that produces a "pseudo assembly" version of the code (I think
they call it "register level"). That layer's output is then handed off
to the optimizer which, for obvious reasons, is unique to the target
processor type.

I believe you can have it print out some of the intermediate output, but
I just look at the final assembler output with -S.

So the amount of gain that one gets from optimization is going to vary
widely, and GCC has a reputation in non-x86 circles for not having
cutting edge optimization capabilities.

At any rate, if you're going to do something like math extensions to C
you'd best test it to see if it's really going to do good. In my
experience doing fractional math on an x86 processor is just stupid,
because it'll do floating point as fast as it does integer math. OTOH,
fractional math on a processor that has no native support for floating
point at all* can get you speedup ratios of 50:1 or better.

I don't have a gcc for any 16 bit targets, but you might want to
see if it can do a 16x16=32 multiply with the operands cast to long.
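A minimal sketch of what one might feed to such a compiler follows. The Q15 format, the function names, and the rounding and saturation choices are all assumptions made for illustration; the thread does not specify them. The right shift of a negative value is implementation-defined in C, though common compilers do an arithmetic shift.

```c
#include <stdint.h>

/* 16x16=32: cast one operand to a 32-bit type before multiplying,
 * so the product is computed in 32 bits even where int is 16 bits. */
int32_t mul16x16_32(int16_t a, int16_t b)
{
    return (int32_t)a * b;
}

/* Hypothetical Q15 multiply: the 32-bit product of two Q15 values
 * is Q30, so shift right by 15 to get back to Q15. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;   /* Q30 product */
    p += (int32_t)1 << 14;        /* add half an LSB to round */
    p >>= 15;                     /* back to Q15 (shift of a negative
                                     value is implementation-defined) */
    if (p > 32767)                /* only -1.0 * -1.0 can overflow */
        p = 32767;
    return (int16_t)p;
}
```

The thing to look for in the assembler output is whether mul16x16_32 compiles to the target's native 16x16=32 multiply or to a call into a 32x32 library routine.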

Which is a really, really long way to say that if you're considering it,
you should write some test algorithms and benchmark the hell out of them.

I also look at the assembly code to see if it does what I think
it should do. But yes, benchmark, too.
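A rough sketch of the kind of test harness meant here, assuming a hosted target where clock() from <time.h> is available (on a bare DSP one would read a hardware timer instead). The kernels, names, and Q15 scaling are hypothetical, chosen only to have something fixed-point and something floating-point to compare.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical fixed-point kernel: Q15 multiply-accumulate.
 * Accumulator overflow is ignored; this is only a timing subject. */
int32_t mac_q15(const int16_t *x, const int16_t *h, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t)x[i] * h[i]) >> 15;
    return acc;
}

/* The same kernel in single-precision floating point. */
float mac_float(const float *x, const float *h, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i] * h[i];
    return acc;
}

/* CPU seconds spent in `reps` calls of mac_q15.  The volatile sink
 * is there so the optimizer cannot simply drop the result. */
double time_mac_q15(const int16_t *x, const int16_t *h, int n, int reps)
{
    volatile int32_t sink = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = mac_q15(x, h, n);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* CPU seconds spent in `reps` calls of mac_float. */
double time_mac_float(const float *x, const float *h, int n, int reps)
{
    volatile float sink = 0.0f;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink = mac_float(x, h, n);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

The two timings only mean something for the compiler, options, and processor they were measured on, which is the point of checking the assembly as well.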

* the TMS320F2812 is an interesting in-between case: it has instructions
(such as one clock cycle normalization) that streamline
single-precision floating point exceedingly well, but it's still not
single-cycle.

Is there a gcc for that one?

-- glen
