Engineering correlation
- From: "Ross Clement (Email address invalid - do not use)" <clemenr@xxxxxxxxxx>
- Date: 16 Feb 2006 00:38:33 -0800
Hi. I've been used to calculating correlations with the formula,
assuming var(x) is the sample variance of x, var(y) is the sample
variance of y, and cov(x,y) is the sample covariance of x and y, then
corr(x,y) = cov(x,y) / sqrt( var(x) * var(y))
with the range of this function being [-1,1] (inclusive; square
brackets denote a closed interval).
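As a concrete reference point, here is a minimal Python sketch of the formula above. It assumes the 1/N convention for both variance and covariance; the helper names (mean, cov, var, corr) are only illustrative and not taken from any particular library.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    # Sample covariance using the 1/N convention (an assumption;
    # 1/(N-1) would cancel in corr() in exactly the same way).
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def var(x):
    # The variance is the covariance of a sequence with itself.
    return cov(x, x)

def corr(x, y):
    # Normalised correlation coefficient, bounded by -1 and +1.
    return cov(x, y) / math.sqrt(var(x) * var(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]          # y = 2 * x
y_neg = [-b for b in y]           # y_neg = -2 * x

print(corr(x, y))      # at the upper end of the range
print(corr(x, y_neg))  # at the lower end of the range
```

Both end points of the range are actually reached here, which is why the interval is closed.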
I have noted that in many DSP related books correlation (and
autocorrelation) are given simpler formulas. E.g. instead of, assuming
that m(x) is the mean of x and m(y) is the mean of y, calculating the
covariance as:
cov(x,y) = (SUM_i (x[i]-m(x)) * (y[i]-m(y)) ) / N
and then dividing that by the sqrt of the product of the variances, it
seems that frequently the correlation between x and y is calculated
with a much simpler formula that leaves out various aspects of the
normalisation. E.g. frequently I see correlation between x and y
described as:
Rxy = SUM_i x[i] * y[i]
Some of the apparent differences disappear with trivial algebra. E.g.
since cov(x,y) and var(x), var(y) are all divided by N (or N-1 in
unbiased estimators) then these all cancel. But even after assuming
zero means and removing the divisions by N throughout, that still leaves us
with, assuming ss(x,y) = SUM_i x[i] * y[i] and ss(x) = SUM_i x[i] *
x[i] ....
corr(x,y) = ss(x,y) / sqrt( ss(x) * ss(y) )
So the normalising factor sqrt( ss(x) * ss(y)) is missing from many
engineering definitions of correlation.
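To illustrate what the missing factor does, here is a small Python sketch comparing the unnormalised sum with the normalised form for zero-mean data. The function names ss and corr_zero_mean and the test vectors are made up for this example.

```python
import math

def ss(x, y=None):
    # ss(x,y) = SUM_i x[i] * y[i]; with one argument, ss(x) = SUM_i x[i] * x[i]
    if y is None:
        y = x
    return sum(a * b for a, b in zip(x, y))

def corr_zero_mean(x, y):
    # Normalised correlation, valid only when x and y both have zero mean.
    return ss(x, y) / math.sqrt(ss(x) * ss(y))

x = [1.0, -1.0, 2.0, -2.0]       # zero mean
y = [0.5, -1.5, 2.5, -1.5]       # zero mean
x10 = [10.0 * a for a in x]      # same shape as x, ten times the amplitude

print(ss(x, y), ss(x10, y))                          # raw sum scales with amplitude
print(corr_zero_mean(x, y), corr_zero_mean(x10, y))  # normalised value does not
```

The raw sum Rxy changes when either signal is rescaled, while the normalised coefficient depends only on the shapes of the two sequences.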
I'd like to ask why this is. I can see that sometimes the unnormalised
coefficient has additional information not present in the normalised
coefficient. E.g. an autocorrelation with lag 0 gives a measure of the
power (don't have the book with me to check terminology) of the signal,
and later peaks can be compared to this one. The calculations are also
much simpler, which could of course have big advantages if the
calculations are going to be performed in analogue hardware rather than
modern fast computers in the digital domain.
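The point about lag 0 can be sketched in Python as follows. This assumes a finite real sequence and simply truncates the sum at the end of the data; the function name autocorr and the test signal are illustrative only. With this definition the lag-0 value is the sum of squares of the signal (usually called its energy; dividing by N would give the average power).

```python
def autocorr(x, lag):
    # Unnormalised autocorrelation: R[lag] = SUM_i x[i] * x[i + lag],
    # summed only over the indices where both samples exist.
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

# A short signal with period 4.
x = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]

r = [autocorr(x, k) for k in range(len(x))]
energy = sum(v * v for v in x)

print(r)        # [4.0, 0.0, -3.0, 0.0, 2.0, 0.0, -1.0, 0.0]
print(energy)   # 4.0, the same as r[0]

# Dividing by the lag-0 value expresses later peaks relative to it.
r_norm = [v / r[0] for v in r]
print(r_norm)   # [1.0, 0.0, -0.75, 0.0, 0.5, 0.0, -0.25, 0.0]
```

The peak at lag 4 reflects the period of the signal. It is smaller than the lag-0 value only because fewer samples overlap at that lag in this truncated sum.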
Is either of these reasons valid? Or are there other reasons? I am
vaguely aware that there was a sort of argument between engineers and
mathematicians as to whether i or j should be used to represent the
sqrt(-1). Is this similar?
Cheers,
Ross-c