Re: Correlation coefficient not suited for small samples!? Cross validation as an alternative?
- From: Richard Ulrich <Rich.Ulrich@xxxxxxxxxxx>
- Date: Thu, 20 Apr 2006 16:57:29 -0400
On 18 Apr 2006 02:40:02 -0700, "zorritillito-googlegroups@xxxxxxxx"
<zorritillito-googlegroups@xxxxxxxx> wrote:
Thank you all very much!
Richard Ulrich wrote:
The formula "k/(n-1)" is especially useful to me. I did not find it
R-squared is a biased estimate, since the expectation of
R-squared under the null hypothesis of no-relation is
k/(n-1) for k variables in the regression with n cases.
either in my textbooks or on the internet. To be sure, may I ask you,
This relation comes from statistical estimation theory, and
I didn't find it in a quick search of the web, either.
if k is - as I would suppose - the number of regressors, not counting
the output variable; and if the formula has any restrictions like normal
distribution?
At the limit, it is evident that a regression line with one variable
(and an intercept) will perfectly fit any 2 distinct points -- R^2
is 1.0. The simple extension works out so that 2 variables fit
3 points, and so on. The only restriction that I am aware of is
that the distributions be "continuous", in the sense that there
are no ties, so that all the points are distinct. -- If you let the
same X-vector be paired with two different y values, the R^2 is
not going to be 1.0.
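For anyone who wants to check the k/(n-1) figure numerically, here is a
rough simulation sketch in Python with NumPy. It assumes independent
standard-normal predictors and outcome, which is one convenient way to
produce "no relation" and is not a requirement stated above. The function
names r_squared and mean_null_r_squared are made up for this illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def mean_null_r_squared(n, k, reps=5000, seed=0):
    """Average R^2 over many samples in which y is unrelated to the k predictors."""
    rng = np.random.default_rng(seed)
    values = [
        r_squared(rng.standard_normal((n, k)), rng.standard_normal(n))
        for _ in range(reps)
    ]
    return float(np.mean(values))

if __name__ == "__main__":
    # Columns: n, k, simulated mean R^2, k/(n-1)
    for n, k in [(10, 1), (10, 3), (30, 3), (4, 3)]:
        print(n, k, round(mean_null_r_squared(n, k), 3), round(k / (n - 1), 3))
```

The last case, n = 4 with k = 3, is the limiting situation described
above: with as many coefficients (including the intercept) as cases, the
fit is exact and R^2 comes out as 1.0 up to rounding error.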
As to the cross validation that I mentioned, I am afraid that this was
a silly thought of mine.
The regression coefficient does, as far as I understand, not work for
Regression coefficients have less dependence on variance,
so they are a *better* measure of linear relationship.
my purpose. Since I want to estimate the strength of the linear
relation, I guess I would have to standardize the regression
coefficient - ending up again with the correlation coefficient.
What is your "purpose"?
I mean, I think, you want to consider the universe
of comparisons that you are comparing *this* result to.
If "everyone" talks about correlations, then that is what
you need to refer to. I was saying, especially for a small
sample, you have less assurance that a random sample is going
to represent the range of data. (And correlations suffer
from a truncated range more than regression coefficients do.)
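A small sketch of the truncated-range point, again in Python with NumPy.
The model (slope 1, unit-variance normal noise), the cutoff of 0.5, and
the function names are all assumptions chosen for illustration only.

```python
import numpy as np

def slope_and_r(x, y):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(r)

def range_restriction_demo(n=100_000, seed=0, cutoff=0.5):
    """Compare slope and r on the full sample and on cases with |x| < cutoff."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = 1.0 * x + rng.standard_normal(n)  # true slope 1, noise sd 1
    keep = np.abs(x) < cutoff             # truncate the range of x
    return slope_and_r(x, y), slope_and_r(x[keep], y[keep])

if __name__ == "__main__":
    full, truncated = range_restriction_demo()
    print("full sample      slope, r:", full)
    print("truncated range  slope, r:", truncated)
```

Under these assumptions the slope stays near 1 in both cases, while r
drops sharply once the range of x is cut down, because the noise variance
is unchanged but the variance contributed by x is much smaller.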
But could not a corrected estimator be a solution to my problem - like
What is your n? Who is having a problem with
looking at r's and treating them as estimates?
"Fisher's approximate unbiased estimator" r*(1 + (1-r²)/(2n)) ?
Although this formula does the opposite of what I expected. In my
opinion a smaller sample size should yield a smaller correction term,
in order to compensate for the contrary tendency of r (the tendency to
yield a bigger absolute value when n is small). Can you tell me what my
mistake is?
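The direction of that correction can be checked numerically. The sketch
below assumes bivariate normal data with a nonzero population correlation
(rho = 0.5 and n = 10 are arbitrary choices), and the function names are
invented for this illustration. In that setting the sample r is, on
average, slightly closer to zero than rho, so a correction that enlarges
r is the right direction. This is a separate effect from the upward bias
of R-squared under the null hypothesis discussed earlier in the thread.

```python
import numpy as np

def simulate_r(rho, n, reps=200_000, seed=0):
    """Sample correlations from `reps` bivariate normal samples of size n."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal((reps, n))
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

def fisher_corrected(r, n):
    """The approximate correction quoted above: r * (1 + (1 - r^2) / (2n))."""
    return r * (1.0 + (1.0 - r**2) / (2.0 * n))

if __name__ == "__main__":
    rho, n = 0.5, 10
    r = simulate_r(rho, n)
    print("population rho:      ", rho)
    print("mean of raw r:       ", r.mean())
    print("mean of corrected r: ", fisher_corrected(r, n).mean())
```

The correction term (1 - r^2)/(2n) is larger for small n because the
shrinkage of r toward zero is also larger for small n.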
--
Rich Ulrich, wpilib@xxxxxxxx
http://www.pitt.edu/~wpilib/index.html