Re: OLS fails due to outliers, but relationship is very clear
- From: Peter Perkins <Peter.PerkinsRemoveThis@xxxxxxxxxxxxx>
- Date: Mon, 13 Aug 2007 09:43:35 -0400
Pawel, you'll probably get more traction in something like sci.stat.math, but here's my two cents:
Hard to tell what you mean by a "highly biased" fit, and I'm not sure what your criterion for calling something an outlier is. But a low r^2 doesn't necessarily mean that the OLS model doesn't fit; it may simply mean that the error term is large relative to the range of mean response values (the fitted line) over the range of your predictor. In other words, despite the usual description, r^2 is not really a "goodness of fit" statistic; it's a "usefulness of prediction" statistic.
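To make the r^2 point concrete, here is a minimal simulation sketch in Python/NumPy. The model (mean response 1 + 0.6*x), the noise level and the sample size are made-up assumptions for illustration only. The mean response really is linear and OLS recovers the slope well, yet R^2 comes out tiny because the noise standard deviation dwarfs the spread of the fitted line over the range of x.

```python
import numpy as np

def ols_fit(x, y):
    """Simple OLS of y on x; returns (intercept, slope, r_squared)."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[0], coef[1], r2

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(0.0, 10.0, n)
# Assumed model: linear mean response 1 + 0.6*x plus noise with sd 10,
# much larger than the spread of the mean response over x in [0, 10] (about 6).
y = 1.0 + 0.6 * x + rng.normal(0.0, 10.0, n)

b0, b1, r2 = ols_fit(x, y)
print(f"intercept={b0:.2f}  slope={b1:.2f}  R^2={r2:.3f}")
```

The slope estimate lands close to the assumed 0.6, while R^2 is only a few percent: the line fits fine, it just doesn't predict individual Y values usefully.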
If you bin and take means (don't forget to weight by N_obs), you'll be estimating a slightly different model (means of observations, not raw observations), but an equivalent one, and your ultimate predictions probably won't change much. You'll just have to interpret your error estimate slightly differently.
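A rough sketch of the binned version, again in Python/NumPy on simulated data (the model and noise level are assumptions; the bin edges are copied from the table in the question). Each bin mean of Y has variance of roughly sigma^2/N_obs, so the fit on bin means weights each squared residual by N_obs. `np.polyfit` applies its `w` argument to the unsquared residuals, hence `sqrt(n_obs)`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.6 * x + rng.normal(0.0, 10.0, n)   # assumed linear model, sd 10

# Bin edges taken from the question's table; no bin is empty here.
edges = np.array([0.0, 0.5, 1.0, 2.5, 5.0, 7.5, 10.0])
n_bins = len(edges) - 1
idx = np.digitize(x, edges[1:-1])               # bin index 0 .. n_bins-1
n_obs = np.bincount(idx, minlength=n_bins)
x_bar = np.bincount(idx, weights=x, minlength=n_bins) / n_obs
y_bar = np.bincount(idx, weights=y, minlength=n_bins) / n_obs

# OLS on the raw observations.
slope_raw, icpt_raw = np.polyfit(x, y, 1)
# Weighted least squares on the bin means, weight N_obs per squared residual.
slope_bin, icpt_bin = np.polyfit(x_bar, y_bar, 1, w=np.sqrt(n_obs))
print(f"raw slope={slope_raw:.3f}  binned slope={slope_bin:.3f}")
```

Both slopes estimate the same underlying line. The bin means look far cleaner only because their scatter shrinks like sigma/sqrt(N_obs); that is the "different interpretation of the error estimate" mentioned above.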
Hope this helps.
- Peter Perkins
The MathWorks, Inc.
Pawel Zdziarski wrote:
I am looking at "simple" regression with one independent variable.
Both X's and Y's are very scattered with lots of outliers, and OLS fits a line that is highly biased, with a ridiculously small R^2.
However, when I bin the X's and look at mean value of Y in each bin, the relationship is very clear:
Bin X | Mean(Y)
-------------------------
0-0.5 | 1.58
0.5-1 | 2.59
1-2.5 | 3.04
2.5-5 | 4.12
5-7.5 | 6.88
7.5-10 | 5.73
>10 | 8

What is the formal/preferred way of approaching this sort of problem? For example, I would like the bin ranges to be implied from my data, rather than picking them arbitrarily myself. And once that's done, I'd like to come up with a summary statistic that I could compare between many samples (such as R^2 for OLS).
Generally, what should I search for to learn about statistics in this sort of "bin analysis"?
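On letting the data choose the bin ranges: the thread doesn't answer this, but one common option (an assumption here, not something recommended above) is equal-count bins with edges at the sample quantiles of X. A minimal Python/NumPy sketch follows; `quantile_bins` is a hypothetical helper, and the simulated data are made up.

```python
import numpy as np

def quantile_bins(x, y, n_bins=6):
    """Hypothetical helper: bin x at its sample quantiles (equal-count bins)
    and return the edges, the count per bin, and the mean of y per bin."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so min(x) and max(x) land in the end bins.
    idx = np.digitize(x, edges[1:-1])
    n_obs = np.bincount(idx, minlength=n_bins)
    y_mean = np.bincount(idx, weights=y, minlength=n_bins) / n_obs
    return edges, n_obs, y_mean

rng = np.random.default_rng(2)
n = 60_000
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 0.6 * x + rng.normal(0.0, 10.0, n)   # assumed model, for illustration

edges, n_obs, y_mean = quantile_bins(x, y)
for lo, hi, k, m in zip(edges[:-1], edges[1:], n_obs, y_mean):
    print(f"{lo:5.2f}-{hi:5.2f} | N={int(k):6d} | mean(Y)={m:.2f}")
```

Equal-count bins give every bin mean roughly the same standard error. With heavily tied X values, quantile edges can coincide and leave a bin empty, which this sketch does not guard against.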