Re: Approximate solution to linear regression



On Jun 20, 3:11 am, "vincen...@xxxxxxxxx" <datashap...@xxxxxxxxx>
wrote:
On Jun 19, 12:44 pm, Paige Miller <paige.mil...@xxxxxxxxx> wrote:



On Jun 17, 3:46 pm, "vincen...@xxxxxxxxx" <datashap...@xxxxxxxxx>
wrote:

The problem can have 40,000 variables, most of them highly correlated,
and in some cases more variables than observations. I came up with an
approach, and my question is

(1) is this an original approach?
(2) more importantly, does it always provide a fairly accurate
solution?

The problem and solution are described at
http://datashaping.com/contest14004.shtml, since the newsgroup cannot
render the mathematical formatting.

I haven't tried to go through your solution in any detail.

In similar situations, I use Partial Least Squares (PLS) Regression,
which is also an "approximate" method (actually, it's a biased
regression) that doesn't care if you have highly correlated X
variables and many more Xs than observations. If you use the maximum
possible number of dimensions in PLS, you will get an OLS solution
without having to invert a matrix.
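A minimal sketch of the point above, assuming a single response (PLS1) fitted with the NIPALS algorithm and X deflation. The function name pls1_nipals and the simulated data are hypothetical, and this is not the implementation used by the R pls package or by SAS. It checks numerically that using the maximum number of components reproduces the OLS coefficients when X has full column rank (more observations than variables). The coefficients come from a small unit upper-triangular system solved by back-substitution, so no general matrix inverse (and no inverse of X'X) is formed.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Hypothetical PLS1 (single-response) regression via NIPALS.

    Returns (coef, intercept) such that y_hat = X @ coef + intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean          # centred X, deflated after each component
    yc = y - y_mean          # centred y (deflating y is unnecessary in PLS1)

    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors w_a
    P = np.zeros((n_features, n_components))  # X loadings p_a
    q = np.zeros(n_components)                # y loadings q_a

    for a in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = Xk @ w                   # score vector for component a
        tt = t @ t
        p_a = Xk.T @ t / tt
        q[a] = (yc @ t) / tt
        Xk = Xk - np.outer(t, p_a)   # remove component a from X
        W[:, a] = w
        P[:, a] = p_a

    # coef = W (P^T W)^{-1} q. P^T W is unit upper-triangular,
    # so solve it by back-substitution instead of inverting it.
    M = P.T @ W
    z = np.zeros(n_components)
    for a in range(n_components - 1, -1, -1):
        z[a] = (q[a] - M[a, a + 1:] @ z[a + 1:]) / M[a, a]
    coef = W @ z
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Usage on simulated data with two highly correlated columns.
rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)
y = X @ np.arange(1.0, p + 1) + 0.1 * rng.normal(size=n)

coef_pls, b0_pls = pls1_nipals(X, y, n_components=p)  # maximum number of components
coef_ols = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
print(np.allclose(coef_pls, coef_ols))
```

Using fewer components than the rank of X gives the biased estimate described above; that regime, with the number of components typically chosen by cross-validation, is how PLS is normally applied to many correlated variables.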

So, with that in mind, it seems to me your approximate solution is
trying to fit into a niche where there already is a solution, and the
PLS solution has proven useful in zillions of published articles. So
unless you can show that your approximate solution has better
properties than PLS, I don't see much of a need for it.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Thanks for your reply. I've heard that Lasso regression does similar
things too. Anyway, being efficient is much more important than being
original in this context: I'm not trying to publish an article, this
is not academic research. If I need to spend $150K to get PLS
regression software (SAS Enterprise Miner) and spend many hours
getting it to work, I'm MUCH better off re-inventing the wheel. So I
could rephrase my question as follows: am I re-inventing the wheel
quite well, meaning my approach is not significantly inferior to PLS
regression?

If PLS will work, try the R package, charmingly named pls. It is open
source (i.e., free to the user).
For more information, see http://mevik.net/work/software/pls.html

To obtain R, see http://www.r-project.org/
