Re: Highest possible speed algo in Vb or C or Pseudocode



On Jan 1, 10:06 pm, "gebe" <g...@xxxxxxxxxxxx> wrote:
I am trying to implement a fast 256 to 2048 (in 2's)algorithm in ASM
(DLL)not to compete with FFTW for versatility but SPEED in specialised
medias .
I am at the moment (using the best I could find) getting 36 usecs(1024
pts) on a 1800Mhz clocked AMD 32 bits and 78 uSec for 2048 points

That's not bad at all for a first try, but you still have a fairly
long way to go.

Let me assume you are using single precision, which should be more
than sufficient for most applications involving such small
transforms. In that case, according to the benchmark numbers on our
web site (http://www.fftw.org/speed/), FFTW can do a 1024-point
complex-data transform in about 10usecs on a 2.4GHz AMD Opteron 275
(32-bit mode), and a 2048-point transform in about 23usecs; naively
scaling by the clock speed to 1.8GHz gives 13 and 31usecs,
respectively. On a newer machine, a 3GHz Core Duo in 64-bit mode, the
times are a bit under 4usecs and 10usecs, respectively. And if your
data are purely real, then you can gain an additional factor of 1.5 to
2.

(It's fairly hard to get anywhere close to optimal performance on a
general-purpose CPU these days if you start with a textbook radix-2
FFT algorithm and try to optimize it, if that is what you are doing.)

Regards,
Steven G. Johnson
.



Relevant Pages

  • Re: Highest possible speed algo in Vb or C or Pseudocode
    ... not to compete with FFTW for versatility but SPEED in specialised ... medias. ... I have "lifted some stuff but the articles sources are ...
    (comp.dsp)
  • Highest possible speed algo in Vb or C or Pseudocode
    ... not to compete with FFTW for versatility but SPEED in specialised ... medias. ... I have "lifted some stuff but the articles sources are ...
    (comp.dsp)
  • Re: Highest possible speed algo in Vb or C or Pseudocode
    ... not to compete with FFTW for versatility but SPEED in specialised ... medias. ... I have "lifted some stuff but the articles sources are ... FFTW is optimized for speed. ...
    (comp.dsp)
  • FFTW: successive real2complex 1-D FFTs fail
    ... I've been experimenting lately with the great FFTW library in C, ... Perform 1-D FFTs on every row of the same image array ... I first create the plans, then I load the image into fftw_in, and then I ... The strange thing is that the second transform produces wrong results. ...
    (comp.dsp)
  • fftw - loosing accuracy in cosine transform
    ... Sorry if this is a wrong place for posting fftw related question. ... transform as opposed to the fouerier transform? ... in cosine transform it is O. ... sample array contains input to the FFTW program ...
    (comp.lang.fortran)