[R] variable selection when categorical variables are available
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Tue Apr 11 23:59:32 CEST 2006
Mike Wolfgang wrote:
> Dear All,
>
> Probably it is not highly relevant question: Why do stepwise regression
> functions in R (step() or stepAIC()) add/delete categorical variables as a
> set? For example, I have a four-level factor variable d, so dummies are
> d1,d2,d3, as stepwise regression operates d, adding or removing, d1,d2,d3
> are simultaneously added/removed. What's the concern here if operating
> dummies individually? Model interpretability or anything else? (it seems
> shrinkage methods can operate them one by one)
>
> Thanks
> mike
You would be on shaky ground statistically and interpretation-wise to
break up the variables. Stepwise regression causes enough problems
(invalidating most of the statistics from the final model) without
doing that.
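To illustrate the interpretation problem, here is a minimal sketch in base R. The data are simulated and every object name (d, y, fit, d_c, and so on) is made up for the example. It shows that the factor enters the model as one 3 d.f. term, that the full fit does not depend on the reference level, and that dropping a single dummy silently pools that level with whatever the reference level happens to be.

```r
set.seed(1)
n  <- 200
d  <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
mu <- c(a = 0, b = 0.2, c = 1, d = 1.5)        # made-up group means
y  <- unname(mu[as.character(d)]) + rnorm(n)

# The factor is a single model term with 3 degrees of freedom
fit <- lm(y ~ d)
print(drop1(fit, test = "F"))

# Changing the reference level does not change the full fit
d_c       <- relevel(d, ref = "c")
fit_ref_c <- lm(y ~ d_c)
print(all.equal(unname(fitted(fit)), unname(fitted(fit_ref_c))))

# Reference level "a": dropping the dummy for "b" is the same
# model as pooling levels "a" and "b" into one group
X          <- model.matrix(~ d)                # (Intercept), db, dc, dd
fit_drop_b <- lm(y ~ X[, c("dc", "dd")])
d_pooled   <- d
levels(d_pooled)[levels(d_pooled) %in% c("a", "b")] <- "ab"
fit_pooled <- lm(y ~ d_pooled)
print(all.equal(unname(fitted(fit_drop_b)), unname(fitted(fit_pooled))))

# Reference level "c": dropping the dummy for "b" now pools "b"
# with "c", which is a different model from the one above
X_c              <- model.matrix(~ d_c)        # (Intercept), d_ca, d_cb, d_cd
fit_drop_b_ref_c <- lm(y ~ X_c[, c("d_ca", "d_cd")])
print(isTRUE(all.equal(unname(fitted(fit_drop_b)),
                       unname(fitted(fit_drop_b_ref_c)))))
```

So what "removing d2" means depends entirely on an arbitrary coding choice, whereas adding or removing d as a whole does not.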
Shrinkage methods do not operate on them one by one; they shrink the
estimates to the mean of all 4 groups (see for example the ols function
in the Design package).
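As a rough illustration of shrinking toward the mean of all four groups, here is a base R sketch of a generic ridge-type penalty. This is an assumption made for illustration, not necessarily the penalty that ols applies. The data are simulated, the design is balanced to keep the algebra simple, and lambda is an arbitrary made-up value. Each level gets its own indicator column and all four are penalized equally, so no level is singled out as a reference.

```r
set.seed(2)
n_per <- 50                                    # balanced: 50 per level
d  <- factor(rep(c("a", "b", "c", "d"), each = n_per))
mu <- c(a = 0, b = 0.2, c = 1, d = 1.5)        # made-up group means
y  <- unname(mu[as.character(d)]) + rnorm(length(d))

# One indicator column per level plus an unpenalized intercept
Z <- model.matrix(~ d - 1)                     # columns da, db, dc, dd
X <- cbind(Intercept = 1, Z)

lambda <- 25                                   # hypothetical penalty
P <- diag(c(0, rep(1, ncol(Z))))               # penalize level effects only

# Penalized least squares: (X'X + lambda P) beta = X'y
beta   <- solve(crossprod(X) + lambda * P, crossprod(X, y))
shrunk <- beta[1] + beta[-1]                   # shrunken group estimates

group_means <- tapply(y, d, mean)
print(cbind(raw = group_means, shrunk = shrunk, overall = mean(y)))

# In the balanced case each estimate moves the fraction
# lambda / (n_per + lambda) of the way toward the overall mean
expected <- mean(y) + n_per / (n_per + lambda) * (group_means - mean(y))
print(all.equal(as.numeric(shrunk), as.numeric(expected)))
```

All four group estimates are pulled toward the overall mean together; none of the dummies is handled on its own.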
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University