[R] Stepwise Regression and PLS
Frank E Harrell Jr
feh3k at spamcop.net
Sun Feb 1 20:31:34 CET 2004
On Sun, 1 Feb 2004 11:09:28 -0800 (PST)
Jinsong Zhao <jinsong_zh at yahoo.com> wrote:
> Dear all,
>
> I am a newcomer to R. I intend to use R to do
> stepwise regression and PLS with a data set (a 55x20
> matrix, with one dependent and 19 independent
> variables). Based on the same data set, I have done the
> same work using SPSS and SAS. However, the results
> obtained by R differ considerably from those of SPSS
> or SAS.
>
> In the case of stepwise, SPSS gave a model with 4
> independent variables, but with step(), R gave a
> model with 10 and a much higher R2. Furthermore,
> regsubsets() also indicates that the 10-variable model
> is one of the best regression subsets. How can this
> difference be explained? And in the case of my data set,
> how many variables entering the model would be reasonable?
>
> In the case of PLS, the results of the mvr function of
> the pls.pcr package also differ from those of SAS.
> Although the optimum number of latent variables is the
> same, the R2 values differ greatly. Why?
>
> Any comments and suggestions would be much appreciated.
> Thanks in advance!
>
> Best wishes,
>
> Jinsong Zhao
>
In your case, SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and every
other package will agree in one sense: with only 55 observations and 19
candidate predictors, the results from all of them will be virtually
meaningless. Simulate some data from a known model and you'll quickly
find out why stepwise variable selection is often a train wreck.
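
A minimal sketch of such a simulation, under stated assumptions: the
"known model" here is the simplest one possible, in which y is unrelated
to every predictor, and the dimensions (55 observations, 19 candidate
predictors) are chosen only to mirror the question. The seed, the number
of replications (n_sim), and the variable names are arbitrary choices
for illustration. Any predictor that step() keeps is, by construction,
spurious.

```r
# Sketch: stepwise selection on pure noise (all data are simulated).
set.seed(1)      # arbitrary seed, for reproducibility only
n     <- 55      # observations, as in the question
p     <- 19      # candidate predictors, as in the question
n_sim <- 50      # hypothetical number of replications

n_selected <- integer(n_sim)   # predictors retained in each replication
r_squared  <- numeric(n_sim)   # apparent R2 of each selected model

for (i in seq_len(n_sim)) {
  x   <- matrix(rnorm(n * p), n, p,
                dimnames = list(NULL, paste0("x", seq_len(p))))
  dat <- data.frame(y = rnorm(n), x)   # y is independent of all x's

  full <- lm(y ~ ., data = dat)
  # With no 'scope' argument, step() does backward elimination by AIC.
  fit  <- step(full, trace = 0)

  n_selected[i] <- length(coef(fit)) - 1   # exclude the intercept
  r_squared[i]  <- summary(fit)$r.squared
}

# The true model has zero predictors and a true R2 of zero.
summary(n_selected)
summary(r_squared)
```

Comparing the two summaries with the truth (no predictors, R2 of zero)
shows how much of a selected model can be noise. Replacing the rnorm(n)
response with one that depends on a few of the columns of x lets you
check how often the real predictors are recovered and how many spurious
ones ride along.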
---
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University