More (me vs.) multiple regression



First off, thanks to Richard Ulrich for answering my other recently posted question.

This multiple regression business is overwhelming me at the minute, and I wonder if any of the writers of the textbooks I've come across so far understand it fully enough to offer a concise, simple, no-nonsense explanation. Either they don't understand it themselves or they are not good at explaining things. If I understood it, I'm sure I could offer an easier-to-understand explanation than the textbooks I've seen so far. They don't illustrate things with examples throughout or offer step-by-step guides. Does anyone know any really easy-to-understand books or articles that don't get off on using technical terms? I'm not a statistician but a social scientist, so I'm not interested in the formulas per se.

As an example, why aren't there any books (that I've found, anyway) that say how to carry out a regression analysis step by step? Surely this would make so much sense:
'step one - decide whether you are trying to predict something or use your model to demonstrate causation. to do this.... etc, what happens if you don't do this... common errors in doing this...'
'step two - explore the variables to make sure each is normally distributed. to do this, what happens if you don't, common errors, the extent that it matters depending on sample size...' etc
[note I don't know whether these should be the first steps or not as I'm thoroughly confused despite having read books and websites for the last few days]
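
On the normality step above: the usual textbook assumption is about the errors (residuals) of the model being roughly normal, not about each variable on its own. The following is only a minimal sketch with made-up data (the variable names, sample size and coefficients are all assumptions for illustration), showing a badly skewed predictor producing residuals that are still close to symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: a strongly skewed predictor and normally distributed errors.
x = rng.exponential(scale=1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

def skewness(v):
    """Sample skewness: about 0 for a symmetric distribution."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

print("skewness of predictor x:", round(skewness(x), 2))
print("skewness of residuals:  ", round(skewness(residuals), 2))
```

If this is right, then "explore the residuals after fitting" may be a more useful step than "make sure each variable is normally distributed".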

Another thing the textbooks don't do is offer a simple explanation of the terms involved, a simple glossary that develops as the text does. Granted, books may have glossaries at the end, but they tend to be separate from the text and overly technical. The situation is made worse because many terms refer to very similar things, or to things you don't need to worry about.

There are so many variables in regression analysis, which also makes things very confusing to a non-numbers-oriented person. And there are many different ways to do things. For example, to check for collinearity you can check either the tolerance or the VIF values. But which is better? And why don't authors just say 'just check the VIF values and ignore the tolerance'? I mean, I've read that tolerance is an outcome of VIF (or something similar), so why does SPSS even produce both measures in its output if only one is needed? Another point about VIF while I'm here. I've read in some places that it's too high if over 2, and in others that it's too high if over 4. In fact I've read contradictory advice in the same book. From Miles and Shevlin: 'when the VIF is equal to four the standard error is doubled (sqrt4=2) and so four is often used as an arbitrary cut-off to determine when collinearity has become too serious.' Then very soon after, a table is presented, 'collinearity diagnostics for three independent variables', in which one variable has a VIF of 2.108, and the text says 'the VIF is greater than two, alerting us to the possibility of collinearity'.
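
For reference, tolerance and VIF carry the same information: for each independent variable, tolerance is 1 - R^2 from regressing that variable on the other independent variables, and VIF is 1 / tolerance. The square root of the VIF is the factor by which the standard error is inflated, which is where the 'VIF of four doubles the standard error' remark comes from. A rough sketch with made-up variables (x1, x2, x3 and the helper names are assumptions for illustration, not SPSS output):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

def vif_and_tolerance(X):
    """Return a (VIF, tolerance) pair for each column of the predictor matrix X."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = r_squared(others, X[:, j])   # how well the other IVs predict IV j
        tolerance = 1.0 - r2
        out.append((1.0 / tolerance, tolerance))
    return out

rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # made-up IV correlated with x1
x3 = rng.normal(size=n)                     # made-up IV unrelated to the others
X = np.column_stack([x1, x2, x3])

results = vif_and_tolerance(X)
for name, (vif, tol) in zip(["x1", "x2", "x3"], results):
    print(f"{name}: VIF={vif:.2f} tolerance={tol:.2f} SE inflation={np.sqrt(vif):.2f}")
```

Since one is just the reciprocal of the other, reading either column of the output should lead to the same conclusion.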

I have another issue with this whole collinearity thing. One of the assumptions of regression analysis is that the independent variables are not supposed to be related too much. I've read that if they're highly related they're likely to be measuring the same underlying thing, and that if we use both in the model we are using their common variance twice. But doesn't it just mean they're both strong predictors of the dependent variable and therefore should be included in the model? Say I'm trying to explain number of possessions (DV) and I have the variables income and wealth (IVs). Now, income and wealth are likely to be correlated, but to me they are both likely to be important predictors of possessions. So why can't I include them both? Why can't I just put them both in the model and let regression analysis do the rest? Isn't this the whole point of regression, to discern the relationship between variables? None of the textbooks I've come across so far explain these things and answer the question 'why?'. Why is there an arbitrary cut-off point for how related the independent variables are? Relatedly, if you have to drop one of two collinear variables, which one do you drop if you think both are (theoretically) important? And if you want to combine them, how should you do it if they can't conveniently be labelled under a common variable name? If you have collinearity between percentage of ethnic minority people in a population and percentage of immigrants (IVs), and your dependent variable is English proficiency of the population, how do you combine ethnic % with immigrant % into something sensible?
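
One way to see what correlated independent variables actually do is to simulate it. The sketch below uses entirely made-up, standardised 'income', 'wealth' and 'possessions' variables (the true coefficients of 1.0, the sample size and the correlation values are assumptions chosen for illustration). Both variables can go in the model in either case; what changes when they are highly correlated is that the standard errors of their coefficients grow, so each individual estimate becomes less precise.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and standard errors (intercept added as column 0)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = (resid @ resid) / (len(y) - A.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)              # covariance of the estimates
    return coef, np.sqrt(np.diag(cov))

def simulate(corr, n=500, seed=2):
    """Made-up 'income' and 'wealth' with a chosen correlation between them."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)
    wealth = corr * income + np.sqrt(1 - corr ** 2) * rng.normal(size=n)
    possessions = 1.0 * income + 1.0 * wealth + rng.normal(size=n)
    return ols_with_se(np.column_stack([income, wealth]), possessions)

for corr in (0.0, 0.95):
    coef, se = simulate(corr)
    print(f"corr={corr}: income b={coef[1]:.2f} (SE {se[1]:.2f}), "
          f"wealth b={coef[2]:.2f} (SE {se[2]:.2f})")
```

On this reading, a VIF cut-off is less a rule about whether both variables are allowed in the model and more a warning that the data may not be able to separate their individual contributions very well.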

I could go on. I've not even mentioned things like the order in which variables are entered (all that stepwise, forward, backward business, which incidentally made no difference to my model when I experimented). Apparently this can have a major effect on the R^2 and, surprise surprise, again there is no consensus. In fact I've read that many authors don't mention such things in their published analyses, so we can't trust their results. If there is no consensus, and poor analysis is being done by paid researchers, what hope do we students have? Why isn't there a guidebook to follow that is standardised and simple?
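
On entry order: once the same set of variables ends up in the model, the final R^2 is the same whatever order they went in. What does depend on order is how much of the R^2 gets credited to each variable when they are entered one at a time. The sketch below illustrates this with two made-up correlated predictors, a and b (all names and numbers are assumptions). It shows plain one-at-a-time entry, not SPSS's stepwise selection procedure, which can also end up choosing a different set of variables.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

rng = np.random.default_rng(3)
n = 1000
a = rng.normal(size=n)
b = 0.7 * a + np.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)   # correlated with a
y = a + b + rng.normal(size=n)

r2_a = r_squared(a.reshape(-1, 1), y)            # a alone
r2_b = r_squared(b.reshape(-1, 1), y)            # b alone
r2_ab = r_squared(np.column_stack([a, b]), y)    # a entered first, then b
r2_ba = r_squared(np.column_stack([b, a]), y)    # b entered first, then a

print(f"final R^2, a then b: {r2_ab:.3f}")
print(f"final R^2, b then a: {r2_ba:.3f}")
print(f"R^2 credited to a when entered first:  {r2_a:.3f}")
print(f"R^2 credited to a when entered second: {r2_ab - r2_b:.3f}")
```

This may be why experimenting with the entry method made no difference to the final model: if every method keeps the same variables, the fitted model is the same.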

I feel like I've been beaten (up) by regression. I can't get my head round a multitude of technical terms. The penny hasn't dropped despite studying extremely hard and concentrating whilst doing it. Could it be that I'm reading the wrong texts, that there is no consensus on this thing, or that I'm just not cut out for regression (which I refuse to believe)? I mean, it even seems there is no consensus on what the thing is called. Is it multiple regression, linear regression, OLS regression? Are these things different?

Does anyone know of any books, or better yet journal articles, that offer a great, simple, no-nonsense explanation of regression? If not, why hasn't anyone published one?

Many thanks


