Re: Approximate solution to linear regression



vincent64@xxxxxxxxx wrote:
On Jun 19, 12:44 pm, Paige Miller <paige.mil...@xxxxxxxxx> wrote:
On Jun 17, 3:46 pm, "vincen...@xxxxxxxxx" <datashap...@xxxxxxxxx>
wrote:

Problem can have 40,000 variables, most of them highly correlated.
More variables than observations in some cases. I came up with an
approach, and my question is
(1) is this an original approach?
(2) more importantly, does it always provide a fairly accurate
solution?
The problem and solution are described at
http://datashaping.com/contest14004.shtml. The newsgroup cannot
render the mathematical formatting.
I haven't tried to go through your solution in any detail.

In similar situations, I use Partial Least Squares (PLS) Regression,
which is also an "approximate" method (actually, it's a biased
regression) that doesn't care if you have highly correlated X
variables and many more Xs than observations. If you use the maximum
possible number of dimensions in PLS, you will get an OLS solution
without having to invert a matrix.
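
To make the PLS point above concrete, here is a minimal sketch of single-response PLS (the NIPALS variant of PLS1) in plain Python. It is an illustration under assumptions, not the method from the linked page and not any package's implementation: the function name pls1_fit and the toy data are hypothetical. The toy data has more variables (6) than observations (4) and strongly correlated columns; with the maximum useful number of components (3 after centering) the fitted values should reproduce the training responses, and no matrix is inverted along the way.

```python
import math

def pls1_fit(X, y, n_components):
    """Hypothetical helper: PLS1 via NIPALS. Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centered copy of X; it is deflated after each component.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]  # centered response

    R, P = [], []          # per-component weights (original X space) and loadings
    beta = [0.0] * p
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:   # nothing left in X that covaries with y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Express the score in terms of the original centered X:
        # r_k = w_k - sum_{j<k} (p_j . w_k) r_j, so that t_k = X_centered r_k.
        r = w[:]
        for r_prev, p_prev in zip(R, P):
            c = sum(a * b for a, b in zip(p_prev, w))
            r = [rv - c * rp for rv, rp in zip(r, r_prev)]
        R.append(r)
        P.append(load)
        beta = [b + q * rv for b, rv in zip(beta, r)]
        # Deflate X by the part explained by this component.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

# Toy data (made up): 4 observations, 6 highly correlated variables.
X = [
    [1.0, 1.1, 0.9, 2.0, 2.1, 0.5],
    [2.0, 2.1, 1.9, 1.0, 1.2, 0.4],
    [3.0, 2.9, 3.2, 4.0, 3.9, 0.9],
    [4.0, 4.2, 3.9, 3.0, 3.1, 0.1],
]
y = [1.0, 3.0, 2.0, 5.0]

intercept, beta = pls1_fit(X, y, n_components=3)
fitted = [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]
print("max abs training error:", max(abs(a - b) for a, b in zip(fitted, y)))
```

Using fewer components than the maximum gives the biased, regularized fit; in practice the number of components would be chosen by cross-validation rather than set to the maximum.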

So, with that in mind, it seems to me your approximate solution is
trying to fit into a niche where there already is a solution, and the
PLS solution has proven useful in zillions of published articles. So
unless you can show that your approximate solution has better
properties than PLS, I don't see much of a need for it.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Thanks for your reply. I've heard that Lasso regression does similar
things too. Anyway, being efficient is much more important than being
original in this context: I'm not trying to publish an article, this
is not academic research. If I need to spend $150K to get PLS
regression software (SAS Enterprise Miner) and spend many hours
getting it to work, I'm MUCH better off re-inventing the wheel. So I
could rephrase my question as follows: am I re-inventing the wheel
quite well, meaning my approach is not significantly inferior to PLS
regression?


You must be joking about the $150K. Doesn't R do PLS regression? And it looks like Stata has a routine to run the SAS implementation.

http://ideas.repec.org/c/boc/bocode/s456810.html

A single-user corporate version of Stata (intercooled, v10) is $1150.00, according to their website.


--
Bruce Weaver
bweaver@xxxxxxxxxxxx
www.angelfire.com/wv/bwhomedir