Re: store multiply in a Q number
- From: glen herrmannsfeldt <gah@xxxxxxxxxxxxxxxx>
- Date: Thu, 19 Aug 2010 18:05:36 +0000 (UTC)
Tim Wescott <tim@xxxxxxxxxxxxxxxx> wrote:
On 08/19/2010 10:25 AM, glen herrmannsfeldt wrote:
Tim Wescott<tim@xxxxxxxxxxxxxxxx> wrote:
On 08/19/2010 12:07 AM, glen herrmannsfeldt wrote:(snip)
I don't know about most, but many, including gcc, recognize that
two 32 bit values are multiplied for a 64 bit product, and use
the appropriate multiply instruction. That includes IA32.
That would be a violation of the ANSI C standard, then -- the result of
a mathematical expression with variables all of a given type is supposed
to be of that type, which implies truncation. To get a 64 bit result
from 32 bit integers means you should cast the integers up _first_.
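To make the "cast up first" point concrete, here is a minimal sketch. The function names are made up for illustration, and it assumes a platform where int is 32 bits. It uses unsigned operands so that the truncating version wraps in a well-defined way instead of overflowing a signed type.

```c
#include <stdint.h>

/* Multiply in 32 bits, then widen: the high half of the product is
   already gone by the time the cast happens. */
uint64_t mul_truncated(uint32_t a, uint32_t b)
{
    return (uint64_t)(a * b);
}

/* Widen one operand first: the multiply is done in 64 bits, so the
   full product survives. */
uint64_t mul_widened(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With a = b = 0x10000 the first version returns 0 and the second returns 0x100000000.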
OK, I just did some tests. With the -m32 and -O2 options,
gcc will generate one imull for a 32x32=64 product, but
also one imull for 16x16=32 product.
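A test file along these lines could be used to repeat the experiment. This is only a sketch: the function names are hypothetical, and the exact instructions emitted will depend on the gcc version and target. Compile with something like gcc -m32 -O2 -S and read the resulting .s file.

```c
#include <stdint.h>

/* 32x32=64: operands are widened before the multiply, so the source
   asks for the full 64 bit product. */
int64_t mul32x32(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* 16x16=32: same idea one size down. */
int32_t mul16x16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

The thing to look for in the assembly is whether each function ends up as a single multiply instruction rather than a call to a library routine or a multi-instruction sequence.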
(snip)
My understanding of optimization for gcc is that there is a layer in the
compiler that produces a "pseudo assembly" version of the code (I think
they call it "register level"). That layer's output is then handed off
to the optimizer which, for obvious reasons, is unique to the target
processor type.
I believe you can have it print out some of the intermediate output, but
I just look at the final assembler output with -S.
So the amount of gain that one gets from optimization is going to vary
widely, and GCC has a reputation in non-x86 circles for not having
cutting edge optimization capabilities.
At any rate, if you're going to do something like math extensions to C
you'd best test it to see if it's really going to do good. In my
experience doing fractional math on an x86 processor is just stupid,
because it'll do floating point as fast as it does integer math. OTOH,
fractional math on a processor that has no native support for floating
point at all* can get you speedup ratios of 50:1 or better.
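For reference, the kind of fractional multiply being discussed might look like the following Q15 sketch. The name q15_mul is made up for illustration, and the rounding and saturation choices are assumptions, not anything stated in the thread. It also assumes that right-shifting a negative signed value is an arithmetic shift, which is implementation-defined in C but is what gcc does.

```c
#include <stdint.h>

/* Q15 multiply: 16x16=32 product, round, shift back down to Q15,
   and saturate the single overflow case (-1.0 * -1.0). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    p = (p + 0x4000) >> 15;                /* round to nearest, back to Q15 */
    if (p > 32767)
        p = 32767;                         /* +1.0 is not representable */
    return (int16_t)p;
}
```

For example, 0.5 * 0.5 (0x4000 * 0x4000) comes back as 0.25 (0x2000).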
I don't have a gcc for any 16 bit targets, but you might want to
see if it can do a 16x16=32 multiply with the operands cast to long.
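Something like this would be the test case for a 16 bit target. It is a sketch only: the function name is hypothetical, and it assumes the usual 16 bit target convention of a 16 bit int and a 32 bit long. Whether the compiler turns it into a single 16x16=32 multiply or a full 32x32 library call is exactly what one would check in the assembly output.

```c
#include <stdint.h>

/* Both operands are cast to long before the multiply, so the source
   asks for a 32 bit product of two 16 bit values. */
long mul16_to_long(int16_t a, int16_t b)
{
    return (long)a * (long)b;
}
```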
Which is a really, really long way to say that if you're considering it,
you should write some test algorithms and benchmark the hell out of them.
I also look at the assembly code to see if it does what I think
it should do. But yes, benchmark, too.
* the TMS320F2812 is an interesting in-between case: it has instructions
(such as one clock cycle normalization) that streamline
single-precision floating point exceedingly well, but it's still not
single-cycle.
Is there a gcc for that one?
-- glen