Re: CLT and regression



On Wed, 19 Apr 2006 08:00:56 +0300, "Anon."
<bob.ohara@xxxxxxxxxxxxxxxxx> wrote:

r.c.reulen@xxxxxxxxx wrote:
Hello,

Can someone explain how the central limit theorem is related to
regression analysis? I understand the basics of the CLT, but don't
understand its relationship with regression analysis. I am conducting
an analysis with approximately 1000 subjects. These subjects have a score
between 0 and 100 on a certain physical functioning (PF) scale. PF is the
dependent variable in my analysis. The distribution of these scores is
highly skewed. Can I still use linear regression analysis? Or should I
go for non-parametric or bootstrapping techniques?

To follow up Ray's post, you should fit the regression (1000 subjects
isn't that large!), and look at the residuals for normality (e.g. with
normal probability plots). They may be normal, in which case you're
fine. If they're not, e.g. if they're skewed, then you could look at
using a Box-Cox transformation to bring them closer to normal: i.e. you
use a power transformation (y^alpha, or log(y) if alpha = 0 is indicated).
Generally you don't have to be too precise with the transformation: for
positively skewed residuals, trying square root, cube root and log
transformations often gets you close enough to normality.
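
The check-then-transform procedure described above can be sketched in standard-library Python. This is only an illustration under stated assumptions: the data are simulated (a hypothetical predictor x and a response y with log-normal errors, not the PF scores), there is a single predictor, the normal probability plot is summarised by how straight it is (a correlation) rather than drawn, and the Box-Cox exponent is chosen from a coarse grid of the rough candidates mentioned above using the usual profile log-likelihood. Box-Cox needs every y > 0, so a 0-100 scale that includes zeros would first need an offset. All function names here are made up for the example.

```python
import math
import random
from statistics import NormalDist, fmean

def ols_residuals(x, y):
    """Residuals from the least-squares fit y = a + b*x (hypothetical helper, one predictor)."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def skewness(r):
    """Sample skewness (third standardized moment); near 0 for symmetric residuals."""
    m = fmean(r)
    s2 = fmean([(ri - m) ** 2 for ri in r])
    return fmean([(ri - m) ** 3 for ri in r]) / s2 ** 1.5

def normal_plot_corr(r):
    """Correlation between sorted residuals and normal quantiles (Blom plotting
    positions), i.e. how straight the normal probability plot is (1.0 = straight)."""
    n = len(r)
    q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    s = sorted(r)
    mq, ms = fmean(q), fmean(s)
    num = sum((qi - mq) * (si - ms) for qi, si in zip(q, s))
    den = math.sqrt(sum((qi - mq) ** 2 for qi in q) * sum((si - ms) ** 2 for si in s))
    return num / den

def box_cox(y, lam):
    """Box-Cox power transform; requires every y > 0. lam = 0 means log(y)."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

def box_cox_loglik(x, y, lam):
    """Profile log-likelihood of lam when the transformed y is regressed on x
    (the (lam - 1) * sum(log y) term is the Jacobian of the transform)."""
    n = len(y)
    rss = sum(r ** 2 for r in ols_residuals(x, box_cox(y, lam)))
    return -n / 2 * math.log(rss / n) + (lam - 1) * sum(math.log(v) for v in y)

# Hypothetical data: 1000 subjects, positively skewed (log-normal) errors.
random.seed(1)
x = [random.uniform(0, 4) for _ in range(1000)]
y = [math.exp(1 + 0.5 * xi + random.gauss(0, 1)) for xi in x]

raw = ols_residuals(x, y)
print("untransformed: skew %.2f, plot corr %.3f" % (skewness(raw), normal_plot_corr(raw)))

# Rough candidates: no transform, square root, cube root, log.
grid = [1, 0.5, 1 / 3, 0]
best = max(grid, key=lambda lam: box_cox_loglik(x, y, lam))
res = ols_residuals(x, box_cox(y, best))
print("lambda %.2f: skew %.2f, plot corr %.3f" % (best, skewness(res), normal_plot_corr(res)))
```

For real data, a library routine such as scipy.stats.boxcox would be the practical choice; the hand-rolled version only makes each step visible.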

HTH

Bob

Let me disagree with some of the advice you've been given. With 1,000
observations, it's probably of very little importance that the errors
(or residuals) be normal. The CLT *is* relevant, because it says that
the estimated regression coefficients will be approximately normal so
long as the errors are independent (and have finite variance).
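
The claim about the coefficients can be checked with a small Monte Carlo sketch in standard-library Python. The design, the error distribution and the parameter values are assumptions chosen for illustration. With 1000 observations and independent errors that are clearly skewed (an exponential shifted to mean 0, skewness 2), the simulated OLS slopes should have a mean close to the true slope and skewness near 0, and the normal-theory 95% interval should cover the true slope roughly 95% of the time.

```python
import math
import random
from statistics import fmean, pstdev

def skewness(v):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    m, s = fmean(v), pstdev(v)
    return fmean([(vi - m) ** 3 for vi in v]) / s ** 3

random.seed(2)
n, reps, a, b = 1000, 1000, 2.0, 0.5

# Hypothetical fixed design, deliberately skewed: exponential quantiles for x.
x = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
mx = fmean(x)
c = [xi - mx for xi in x]
sxx = sum(ci * ci for ci in c)

slopes = []
for _ in range(reps):
    # Independent but skewed errors: Exp(1) shifted to mean 0 (variance 1, skewness 2).
    y = [a + b * xi + random.expovariate(1.0) - 1.0 for xi in x]
    # OLS slope; because sum(c) is 0, the mean of y drops out.
    slopes.append(sum(ci * yi for ci, yi in zip(c, y)) / sxx)

se = 1.0 / math.sqrt(sxx)  # true SE of the slope, since the error variance is 1
coverage = sum(abs(s - b) < 1.96 * se for s in slopes) / reps
print("mean slope %.4f, skewness of slopes %.3f, normal 95%% interval coverage %.3f"
      % (fmean(slopes), skewness(slopes), coverage))
```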

Since this is based on an approximating argument, there is some level
of deviation from normality that will make the normal
distribution approximation for the coefficients a poor one.
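
How much non-normality is "too much" can be made concrete for the slope, assuming iid errors. Write b_hat - b = sum(c_i e_i) / sum(c_i^2) with c_i = x_i - mean(x). Third cumulants of independent terms add, so the slope's skewness is the error skewness times sum(c_i^3) / (sum(c_i^2))^(3/2). For a fixed design shape, that factor shrinks roughly like 1/sqrt(n). The sketch below evaluates the formula for a made-up skewed design with n = 1000. Moderately skewed errors should be diluted to a small slope skewness, but a very heavy right tail (a hypothetical log-normal error with sigma = 2) should leave the slope distribution far from normal even at this sample size.

```python
import math

def slope_skewness(x, error_skew):
    """Exact skewness of the OLS slope when errors are iid with the given skewness:
    error_skew * sum(c^3) / sum(c^2)^1.5, where c = x - mean(x)."""
    mx = sum(x) / len(x)
    c = [xi - mx for xi in x]
    return error_skew * sum(ci ** 3 for ci in c) / sum(ci ** 2 for ci in c) ** 1.5

def lognormal_skewness(sigma):
    """Skewness of a log-normal variable whose log has standard deviation sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1)

n = 1000
# Hypothetical skewed design: exponential quantiles for x.
x = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
for label, g in [("exponential errors", 2.0),
                 ("log-normal, sigma=1", lognormal_skewness(1.0)),
                 ("log-normal, sigma=2", lognormal_skewness(2.0))]:
    print("%-22s error skew %7.1f -> slope skew %6.2f" % (label, g, slope_skewness(x, g)))
```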

-*** Startz