[R] Stepwise

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Sep 4 20:03:18 CEST 2008

Also consider the redun function in the Hmisc package, which does not 
use the response variable but uses flexible nonlinear additive models to 
predict each predictor variable from all the others, using a stepwise 
procedure in a formal redundancy analysis.
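
A minimal sketch of such a call, on simulated data. The variable names
x1..x4 and the data frame d are made up for illustration and are not from
the original question; x3 is built to be almost fully determined by x1
and x2, so at least one of those three should be flagged. The r2 and nk
values shown are simply the documented defaults.

```r
library(Hmisc)

set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x4 = rnorm(n))
# x3 is nearly an exact linear combination of x1 and x2 (hypothetical setup)
d$x3 <- d$x1 + d$x2 + rnorm(n, sd = 0.05)

# No response variable appears in the formula: each predictor is predicted
# from all the others, and predictors with R^2 >= r2 are dropped stepwise.
r <- redun(~ x1 + x2 + x3 + x4, data = d, r2 = 0.9, nk = 3)

print(r)
r$Out   # predictors judged redundant
r$In    # predictors retained
```

Which of x1, x2, x3 is dropped is decided by the algorithm; the point is
only that one of them can be recovered from the rest.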


Ben Bolker wrote:
> Peter Flom <peterf <at> brainscope.com> writes:
>> Robin Williams wrote
>> <<<<
>> Is there any facility in R to perform a stepwise process on a model,
>> which will remove any highly-correlated explanatory variables? I am told
>> there is in SPSS. I have a large number of variables (some correlated),
>> which I would like to just chuck in to a model and perform stepwise and
>> see what comes out the other end, to give me an idea perhaps as to which
>> variables I should focus on.
>> Thanks for any help / suggestions.
>> >>>>
>>
>> Stepwise is a bad method of selecting variables.  Far better methods are
>> LASSO and LAR (least angle regression), available in the lars and lasso2
>> packages.  However, while both these methods are good, neither is a
>> substitute for substantive knowledge.
>>
>> Also, the key thing is not so much whether variables are correlated, but
>> whether they are collinear, which is different.  If you have a great many
>> variables, then you can have a high degree of collinearity even with no
>> high pairwise correlations.  I've not done this in R, but
>> RSiteSearch("collinearity", restrict = 'functions') yields 34 hits.
>>
>> HTH
>> Peter
>   Another suggestion would be to do PCA on the predictor variables.
> And to read Frank Harrell's book on _Regression modeling strategies_.
>    cheers
>      Ben Bolker
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
