Re: FFT algoritms



On Nov 29, 11:37 pm, robert bristow-johnson
<r...@xxxxxxxxxxxxxxxxxxxx> wrote:
well, i said "for illustration". in particular i was referring to
Fig. 9-15 (p. 597) in my 1989 version of O&S. certainly if N=8 you
can do it with straight-line code. that's beside the point. i was
trying to infer, from the illustration, what the rhyme or reason there
is in the particular pointer manipulation from that N=8 illustration.
it still looks gnarly. for the case of a general N (say, a power of
2) and much bigger than 2^3, what is the tight and efficient algorithm
for determining the order of indexing for the node that you are
writing to, and once you've determined that, what are the nodes that
you're reading from? except for the first pass (which is just like
the first pass of an in-place DIT), i cannot see what that rhyme or
reason is. i'm imagine there may be one, but it doesn't look simple
or elegant.

No, there are certainly simple patterns to the memory access for in-
place algorithms without bit-reversal. They are not that complicated
to implement (as my 15-line example in another post shows). You
absolutely do not have to use a look-up-table of indices!

(One of the many reasons why I feel that butterfly diagrams are next
to useless in truly understanding FFT algorithms.)

(Of course, if you compare a highly optimized in-order FFT like FFTW
or MKL, it will be way faster than a textbook radix-2 algorithm even
if you skip the bitreversal stage. Highly optimized algorithms are
always going to be way more complicated than textbook radix-2, as long
as CPUs remain so complex.)

"In the flow graph of Fig. 9-15, the data are accessed
nonsequentially, the computation is not in place, and a
scheme for indexing the data is considerably more
complicated than in either of the two previous cases [two
in-place DIT structures]. Consequently, this structure
has no apparent advantages."

One reason to be cautious here is that O&S is comparing apples and
oranges: FFTs with bitreversed input or output, and FFTs with normal-
order output. Of course, if you can get away with permuted output it
will be simpler.

That's not the comparison I'm making: I'm comparing only FFTs with
normal-order output, that differ in whether you have a separate
bitreversal pass or whether you combine the permutations with the
computation.

i was reverberating that sentiment from the little bit that i know of
the radix-(2^n) FFT. that information could very well be stale. it's
also in the Yellow O&S (1975 - p. 300, Fig. 6.13, and they say the
same thing), so it's more than 3 decades old.

sometimes i think that O&S is timeless, other times it's just dated.

Comments on performance advantages from FFT algorithms that avoid a
separate bitreversal pass date back to the 60's, so in some sense it's
not a matter of being out-of-date. On the other hand, it seems to be
especially in the last 20 years that floating-point performance of
CPUs has totally outstripped memory performance, and that makes non-
arithmetic issues so much more relevant now.

The main problem is that textbooks are trying to teach students the
basic ideas of FFT algorithms rather than performance hacks, and
therefore they almost always limit their discussion to only two
algorithms for power-of-two N: radix-2 decimation in time and radix-2
decimation in frequency. If you limit yourself to these two cases,
you can't really have a useful discussion of performance optimization
on modern CPUs where performance is limited not by arithmetic counts
but rather by memory access and register/pipeline utilization.

Steven
.



Relevant Pages