[R] Stepwise Regression and PLS

Jinsong Zhao jinsong_zh at yahoo.com
Mon Feb 2 04:13:49 CET 2004


--- Frank E Harrell Jr <feh3k at spamcop.net> wrote:
> On Sun, 1 Feb 2004 11:09:28 -0800 (PST)
> Jinsong Zhao <jinsong_zh at yahoo.com> wrote:
> 
> > Dear all,
> > 
> > I am a newcomer to R. I intend to use R to do stepwise
> > regression and PLS with a data set (a 55x20 matrix, with one
> > dependent and 19 independent variables). I have already done
> > the same work on the same data set using SPSS and SAS. However,
> > the results obtained from R differ considerably from those of
> > SPSS and SAS.
> > 
> > In the case of stepwise regression, SPSS gave a model with 4
> > independent variables, but step() in R gave a model with 10
> > variables and a much higher R2. Furthermore, regsubsets() also
> > indicates that the 10-variable model is one of the best
> > regression subsets. How can this difference be explained? And
> > for my data set, how many variables would it be reasonable to
> > enter into the model?
> > 
> > In the case of PLS, the results of the mvr function in the
> > pls.pcr package also differ from those of SAS. Although the
> > optimum number of latent variables is the same, the difference
> > in R2 is large. Why?
> > 
> > Any comments and suggestions are much appreciated. Thanks in
> > advance!
> > 
> > Best wishes,
> > 
> > Jinsong Zhao
> > 
> 
> In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and
> every other package will agree in one sense, because results from
> all of them will be virtually meaningless.  Simulate some data from
> a known model and you'll quickly find out why stepwise variable
> selection is often a train wreck.
> 
> ---
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                      Department of Biostatistics   Vanderbilt University
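
The simulation suggested above can be tried in a few lines of base R.
What follows is only a minimal sketch under assumptions that are not
from my real data: 55 rows and 19 independent standard-normal
predictors, of which only x1 and x2 (hypothetical names) truly affect
y. Note that step() selects by AIC by default, whereas the SPSS
stepwise procedure uses F-test entry and removal thresholds, which is
one likely reason the two pick models of different size.

```r
# Sketch: stepwise selection on data simulated from a known model.
# Assumed setup (illustrative only): n = 55, p = 19 predictors,
# true model y = x1 + 0.5 * x2 + noise; x3..x19 are pure noise.
set.seed(1)
n <- 55
p <- 19
reps <- 20
n_noise <- integer(reps)

for (i in seq_len(reps)) {
  X <- matrix(rnorm(n * p), n, p,
              dimnames = list(NULL, paste0("x", 1:p)))
  y <- X[, "x1"] + 0.5 * X[, "x2"] + rnorm(n)
  dat <- data.frame(y = y, X)

  # Start from the intercept-only model; allow all 19 predictors.
  null  <- lm(y ~ 1, data = dat)
  upper <- reformulate(colnames(X), response = "y")
  sel   <- step(null, scope = upper, direction = "both", trace = 0)

  # Count selected variables that are NOT in the true model.
  chosen <- setdiff(names(coef(sel)), "(Intercept)")
  n_noise[i] <- sum(!chosen %in% c("x1", "x2"))
}

# How many noise variables were selected in each replicate
print(table(n_noise))
```

Any replicate with a nonzero count has selected variables that are
known, by construction, to have no effect on y.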

In the case of stepwise regression, I have found that the variables
in the subsets returned by regsubsets() are collinear, whereas the
variables selected by SPSS are not. What should I do to get the
same, or a better, linear model?
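
For reference, one way to quantify collinearity among a set of
predictors is the variance inflation factor (VIF). The sketch below
uses base R only and simulated data, not my real data set; the
variables x1 to x4 are hypothetical, and x3 is deliberately built as
a near-linear combination of x1 and x2.

```r
# Sketch: variance inflation factors (VIF) with base R only.
# The data are simulated for illustration; x3 is nearly x1 + x2.
set.seed(2)
n  <- 55
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
x4 <- rnorm(n)
preds <- data.frame(x1, x2, x3, x4)

# VIF_j = 1 / (1 - R2_j), where R2_j is the R-squared from
# regressing predictor j on all the other predictors.
vif <- sapply(names(preds), function(v) {
  f  <- reformulate(setdiff(names(preds), v), response = v)
  r2 <- summary(lm(f, data = preds))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))

# Condition number of the standardized predictor matrix;
# large values also point to near-linear dependence.
print(kappa(scale(as.matrix(preds)), exact = TRUE))
```

A common rule of thumb treats a VIF well above 10 as a sign of
serious collinearity, although the cutoff is a convention and not a
test.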

Thanks!



