Engineering correlation



Hi. I'm used to calculating correlations with the following formula.
Assuming var(x) is the sample variance of x, var(y) is the sample
variance of y, and cov(x,y) is the sample covariance of x and y, then

corr(x,y) = cov(x,y) / sqrt( var(x) * var(y))

with the range of this function being [-1,1], i.e. inclusive of both
endpoints (square brackets denote the inclusive range).

I have noted that in many DSP-related books correlation (and
autocorrelation) are given simpler formulas. E.g. assuming that m(x) is
the mean of x and m(y) is the mean of y, instead of calculating the
covariance as:

cov(x,y) = (SUM_i (x[i]-m(x)) * (y[i]-m(y)) ) / N

and then dividing that by the sqrt of the product of the variances, it
seems that the correlation between x and y is frequently calculated
with a much simpler formula that omits various aspects of the
normalisation. E.g. I frequently see the correlation between x and y
described as:

Rxy = SUM_i x[i] * y[i]

Some of the apparent differences disappear with trivial algebra. E.g.
since cov(x,y), var(x) and var(y) are all divided by N (or N-1 in
unbiased estimators), these factors all cancel. But even after assuming
zero means and removing the divisions by N throughout, that still
leaves us with, assuming ss(x,y) = SUM_i x[i] * y[i] and
ss(x) = SUM_i x[i] * x[i] (and likewise ss(y)) ....

corr(x,y) = ss(x,y) / sqrt( ss(x) * ss(y) )

So the normalising factor sqrt( ss(x) * ss(y)) is missing from many
engineering definitions of correlation.
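
To make the comparison concrete, here is a minimal Python sketch of the
three quantities discussed above. The function names (corr_stat, ss,
corr_ss) and the sample data are made up for illustration and do not
come from any book or library. It assumes 1/N sample moments and
removes the means explicitly before applying the ss form.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr_stat(x, y):
    """cov(x,y) / sqrt(var(x) * var(y)), using 1/N sample moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)

def ss(x, y):
    """Unnormalised sum of products, Rxy = SUM_i x[i] * y[i]."""
    return sum(a * b for a, b in zip(x, y))

def corr_ss(x, y):
    """ss(x,y) / sqrt(ss(x) * ss(y)) after subtracting the means."""
    mx, my = mean(x), mean(y)
    x0 = [a - mx for a in x]
    y0 = [b - my for b in y]
    return ss(x0, y0) / math.sqrt(ss(x0, x0) * ss(y0, y0))

# Illustrative data only.
x = [1.0, 2.0, 4.0, 3.0, 5.0]
y = [2.0, 1.0, 5.0, 4.0, 6.0]
x_scaled = [10.0 * a for a in x]

print(corr_stat(x, y), corr_ss(x, y))       # the two forms agree
print(ss(x, y), ss(x_scaled, y))            # Rxy grows with amplitude
print(corr_ss(x, y), corr_ss(x_scaled, y))  # normalised value does not
```

The point of the last two lines is that Rxy depends on the amplitude of
the signals, whereas the normalised coefficient stays the same when x
is rescaled.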

I'd like to ask why this is. I can see that sometimes the unnormalised
coefficient has additional information not present in the normalised
coefficient. E.g. an autocorrelation with lag 0 gives a measure of the
power (don't have the book with me to check terminology) of the signal,
and later peaks can be compared to this one. The calculations are also
much simpler, which could of course have big advantages if the
calculations are going to be performed in analogue hardware rather than
modern fast computers in the digital domain.
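
As a rough illustration of the lag-0 point, the sketch below computes
an unnormalised autocorrelation as a plain sum of lagged products. The
function name autocorr and the test signal are assumptions made up for
this example. At lag 0 the sum is just the sum of squares of the signal
(its energy; dividing by N would give the average power), and peaks at
later lags can be expressed relative to it.

```python
def autocorr(x, lag):
    """Unnormalised autocorrelation: SUM_i x[i] * x[i + lag]."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

# Illustrative signal: four repeats of a pattern with period 4.
signal = [1, 0, -1, 0] * 4

r0 = autocorr(signal, 0)   # sum of squares of the signal
r2 = autocorr(signal, 2)   # half a period: samples are in antiphase
r4 = autocorr(signal, 4)   # one full period: samples line up again

print(r0, r2, r4)
print(r4 / r0)             # later peak relative to the lag-0 value
```

Note that the peak at lag 4 is smaller than the lag-0 value only
because fewer products overlap in a finite record, which is one reason
the raw numbers need some care when comparing lags.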

Is either of these reasons valid? Or are there other reasons? I am
vaguely aware that there was a sort of argument between engineers and
mathematicians as to whether i or j should be used to represent
sqrt(-1). Is this similar?

Cheers,

Ross-c
