Re: Optimal square calculation?



On Oct 1, 12:01 pm, Chris Bore <chris.b...@xxxxxxxxx> wrote:
On 1 Oct, 15:29, Vladimir Vassilevsky <antispam_bo...@xxxxxxxxxxx>
wrote:





Chris Bore wrote:
I have an application that involves many sums of squares. For
instance:

for (i = 0; i < N; i++
  sum += x[i]*x[i];

Is any clever optimization possible when doing sums of squares?

i = N;
ptr = x;

while(i--)
{
xtmp = *ptr++;
sum += xtmp*xtmp;

}

The code will be somewhat more optimal.

(The application is calculating statistics on very large 3D arrays of
16-bit integer data - typically 3 Gsamples.

Then the speed is limited by memory bandwidth.

Not entirely, although this is a problem that we have addressed. The
data are partitioned and processed as the stream is acquired, or read
from disk. My current aim includes being able to do some processing
'on the fly' as the acquisition proceeds. Eventually the processing
may be built into the acquisition system on an embedded platform, but
that is some time off.



There are also many cases
of sums of 4th power, and squares of differences.)

Then do SSE in assembler.

It is close enough to classic MAC but I feel there may be some benefit
from the fact that only one input argument needs to be loaded, or a
trick that avoids the multiply?

At the moment this is implemented on PC workstation (multi-processor
with multi-threading) which is the required platform.

Then split a long compiting cycle into two cycles running as the
different threads.

  Although at some

time it may be possible to review this.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultanthttp://www.abvolt.com- Hide quoted text -

- Show quoted text -- Hide quoted text -

- Show quoted text -

There's a paper about the numerical approximation to magnitude:

A. E. Filip, "A baker's dozen magnitude approximations and their
detection statistics," IEEE Trans. on Aerospace and Electronic
Systems, Jan. 1976, pp. 86-89.

All of them are based on:

R = sqrt( r**2 + q**2)

and

R = a1*x + b1*y for 0 <= theta <= theta_subscript_b

or

R = a2*x + b2*y for theta_subscript_b < theta < pi/4

where x = max( abs(i), abs(q) ) and y = min ( abs(i), abs(q) )

The author shows 13 different approximations based on using different
values for a1, b1, a2 and b2 in the above. I believe that using '1'
and '.5' for a1 and b1, and not using a2 and b2 is method number 3 in
the above paper.

The author notes that "The dependence of R on theta may be surprising
at first, since the magnitude is, by definition, independent of
theta. However, the theta dependence can be made explicit by writing
x and y in polar coordinates. The parameter theta_subscript_b is
included to allow for two distinct ranges for the approximation as a
means of reducing the approximation error."

In the conclusion, he notes: "In the class of one-region
approximations, perhaps the best trade-off is found in numbers 3 or 5
where the simple coefficients yield nearly minimum SNR loss for that
class. If a lower SNR loss is required, one must use a two-region
approximation."

So it would seem that you're using one of the better one-region
approximations (method 3). Whether any of the others would be more
useful to you is your call. Moreover, I know that the sum of squares
can't be factored (or, at least, that's what the mathematicians
claim). I spent about 15 minutes Googling the problem, and it seems
that there a bunch of statistical methods for doing it. Some of them
just break the problem up into smaller sums.

Since you're using 16 bit integers, is there any way of brute forcing
the problem? For instance, using look-up tables for the squares and
then just doing the sums? Or expressing things in binary and doing
Boolean operations?
.



Relevant Pages

  • Re: Proof for Expected Value Problem
    ... for theta between -pi and +pi, and the OP needs to compute ... analytical expression already or a good approximation thereof. ... So as sigma goes to infinity, ... with the coefficient of the approximation being ...
    (comp.dsp)
  • Re: A new computation of G from the Cavendish experiment
    ... approximation), and it lets us approximate the gravitational field's x- ... Yes, multiplied by two, this is an accurate estimate of the torque on ... does not act on the pendulum arm because it is independent of theta. ... It's still acting to shift the equilibrium theta. ...
    (sci.physics.research)
  • Re: Letter to Dirk Van de moortel
    ... that part of Classical Mechanics is outside your condition... ... sufficiently close to zero that you can assume gamma = 1. ... another popular approximation is assuming that for theta ... Hense "approximation". ...
    (sci.physics.relativity)
  • Re: Optimization problem (without constraints) using approximation
    ... for a fixed value of theta, find an approximation to the value of d ... it is a minimum and there are no local maxima. ... Chances are you meant minimum. ...
    (sci.math)