Frank E Harrell Jr
f.harrell at vanderbilt.edu
Thu Sep 4 20:03:18 CEST 2008
Also consider the redun function in the Hmisc package, which does not
use the response variable but uses flexible nonlinear additive models to
predict each predictor variable from all the others, using a stepwise
procedure in a formal redundancy analysis.
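
A minimal sketch of what such a call might look like, assuming the Hmisc package is installed. The simulated data frame `d`, the variable names, and the `r2 = 0.9` cutoff are assumptions made for illustration, not recommendations.

```r
library(Hmisc)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)  # nearly determined by x1 and x2
x4 <- rnorm(n)                      # unrelated to the others
d  <- data.frame(x1, x2, x3, x4)

# No response variable appears in the formula: each predictor is
# predicted from the remaining ones, and a predictor is flagged as
# redundant when its R^2 reaches the r2 threshold (0.9 is assumed here).
red <- redun(~ x1 + x2 + x3 + x4, data = d, r2 = 0.9)
print(red)

red$In   # predictors retained
red$Out  # predictors flagged as redundant
```

Since the response never enters the calculation, dropping predictors this way does not distort later inference the way response-driven stepwise selection does.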
Ben Bolker wrote:
> Peter Flom <peterf <at> brainscope.com> writes:
>> Robin Williams wrote:
>>> Is there any facility in R to perform a stepwise process on a model,
>>> which will remove any highly-correlated explanatory variables? I am told
>>> there is in SPSS. I have a large number of variables (some correlated),
>>> which I would like to just chuck in to a model and perform stepwise and
>>> see what comes out the other end, to give me an idea perhaps as to which
>>> variables I should focus on.
>>> Thanks for any help / suggestions.
>> Stepwise is a bad method of selecting variables. Far better methods are LASSO
>> and LAR (least angle regression), available in the lars package and the
>> lasso2 package. However, while both these methods are good, neither is a
>> substitute for substantive knowledge.
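
As a rough illustration of the LASSO suggestion, the sketch below fits a LASSO path and picks the amount of shrinkage by cross-validation, assuming the lars package is installed. The simulated data, the variable names, and the choice of 10-fold cross-validation are assumptions made for the example and do not come from the thread.

```r
library(lars)

set.seed(2)
n <- 100
p <- 10
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", 1:p)
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)  # only x1 and x2 carry signal

# type = "lasso" fits the LASSO path; type = "lar" would give
# least angle regression instead.
fit <- lars(x, y, type = "lasso")

# Choose the amount of shrinkage by 10-fold cross-validation.
# For the LASSO, cv$index is the fraction of the full L1 norm.
cv <- cv.lars(x, y, K = 10, type = "lasso", plot.it = FALSE)
s  <- cv$index[which.min(cv$cv)]

# Coefficients at the selected point on the path
b <- coef(fit, s = s, mode = "fraction")
round(b, 2)
```

Coefficients that are shrunk exactly to zero correspond to variables the LASSO leaves out at that penalty; whether the retained set makes sense still has to be judged against subject-matter knowledge.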
>> Also, the key thing is not so much whether variables are correlated, but
>> whether they are collinear, which is different. If you have a great many
>> variables, then you can have a high degree of collinearity even with no
>> high pairwise correlations. I've not done this in R, but
>> RSiteSearch("collinearity", restrict = 'functions') yields 34 hits.
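
The distinction between pairwise correlation and collinearity can be checked with base R alone. The sketch below uses simulated data in which the last predictor is almost exactly the sum of ten independent ones. The data, the names, and the use of variance inflation factors and a condition number as diagnostics are assumptions chosen for illustration.

```r
set.seed(3)
n <- 500
k <- 10
X <- matrix(rnorm(n * k), n, k)
X <- cbind(X, rowSums(X) + rnorm(n, sd = 0.1))  # last column ~ sum of the others
colnames(X) <- paste0("x", seq_len(ncol(X)))

R <- cor(X)
max_pairwise <- max(abs(R[upper.tri(R)]))  # largest pairwise correlation

# Variance inflation factors: the diagonal of the inverse correlation
# matrix, i.e. 1 / (1 - R^2) from regressing each column on the rest.
vif <- diag(solve(R))

# Condition number of the standardized predictor matrix
cond <- kappa(scale(X), exact = TRUE)

max_pairwise
round(vif, 1)
cond
```

No single pair of columns is strongly correlated, yet every variance inflation factor is large, because each column is nearly a linear combination of all the others taken together.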
> Another suggestion would be to do PCA on the predictor variables.
> And to read Frank Harrell's book _Regression Modeling Strategies_.
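
A minimal sketch of the PCA suggestion using base R's prcomp, followed by a regression on the leading component scores. The simulated data, the variable names, and the decision to keep two components are assumptions made for the example.

```r
set.seed(4)
n <- 200
z <- rnorm(n)                        # shared latent factor
X <- cbind(x1 = z + rnorm(n, sd = 0.3),
           x2 = z + rnorm(n, sd = 0.3),
           x3 = z + rnorm(n, sd = 0.3),
           x4 = rnorm(n))            # unrelated predictor
y <- z + rnorm(n)

# PCA on the standardized predictors; the response is not used here
pc <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pc)            # proportion of variance per component
round(pc$rotation, 2)  # loadings of each predictor on each component

# Principal components regression on the first two components
# (keeping two components is an assumption for this example)
scores <- as.data.frame(pc$x[, 1:2])
fit <- lm(y ~ PC1 + PC2, data = scores)
summary(fit)
```

The component scores are uncorrelated by construction, so the regression on them avoids the collinearity among x1, x2 and x3, at the cost of coefficients that refer to components rather than to the original variables.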
> Ben Bolker
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University