[R] variable selection when categorical variables are available

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Apr 11 23:59:32 CEST 2006


Mike Wolfgang wrote:
> Dear All,
> 
> This is probably not a highly relevant question: why do the stepwise
> regression functions in R (step() or stepAIC()) add/delete categorical
> variables as a set? For example, I have a four-level factor variable d,
> so its dummies are d1, d2, d3. When stepwise regression adds or removes d,
> d1, d2 and d3 are added or removed simultaneously. What is the concern
> with operating on the dummies individually? Model interpretability or
> anything else? (It seems shrinkage methods can operate on them one by one.)
> 
> Thanks
> mike

You would be on shaky ground, both statistically and in terms of 
interpretation, if you broke up the variables.  Stepwise regression 
causes enough problems (invalidating most of the statistics from the 
final model) without doing that.
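
A minimal sketch of the interpretation problem, using simulated data
(the names dat, x, y and d are made up for illustration and are not
from the original question).  It shows that drop1(), which step() uses
internally, treats the factor as a single 3-df term, and that what each
individual dummy measures depends on an arbitrary choice of reference
level even though the fitted model is unchanged:

```r
# Simulated data; all names here are hypothetical.
set.seed(1)
n <- 200
d <- factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
x <- rnorm(n)
y <- 1 + 0.5 * x + c(0, 0.3, -0.2, 0.6)[as.integer(d)] + rnorm(n)
dat <- data.frame(y = y, x = x, d = d)

fit <- lm(y ~ x + d, data = dat)

# The factor d is listed as ONE term with 3 degrees of freedom.
tab <- drop1(fit, test = "F")
print(tab)

# Same model, different reference level for d.
fit2 <- lm(y ~ x + d, data = transform(dat, d = relevel(d, ref = "d")))

# The dummy coefficients differ, because each is a contrast
# against whichever level happens to be the reference ...
print(coef(fit))
print(coef(fit2))

# ... but the fitted values are identical.
same_fit <- isTRUE(all.equal(fitted(fit), fitted(fit2)))
print(same_fit)
```

Dropping a single dummy would merge that level with the reference
level, so the resulting model would depend on how the factor happened
to be coded.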

Shrinkage methods do not operate on them one by one; they shrink the 
estimates to the mean of all 4 groups (see for example the ols function 
in the Design package).
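
The idea of shrinking toward the mean of all groups can be sketched in
base R.  This is only an illustration under simplifying assumptions (a
balanced one-way layout, a ridge-type penalty on all four level effects,
an arbitrary penalty lambda, made-up data); it is not necessarily the
computation that ols performs:

```r
# Simulated balanced one-way layout; all names are hypothetical.
set.seed(2)
n_per <- 10
g <- factor(rep(c("a", "b", "c", "d"), each = n_per))
y <- rnorm(4 * n_per, mean = c(0, 1, 2, 4)[as.integer(g)])

# Over-parameterised design: intercept plus one indicator per level,
# so that no level is singled out as the reference.
X <- cbind(1, model.matrix(~ g - 1))

# Penalise the four level effects equally; leave the intercept alone.
lambda <- 5
P <- diag(c(0, rep(lambda, 4)))

# Penalised least squares: solve (X'X + P) beta = X'y.
beta <- solve(crossprod(X) + P, crossprod(X, y))
shrunk_means <- beta[1] + beta[-1]

raw_means <- tapply(y, g, mean)
grand <- mean(y)

# In this balanced case every group mean is pulled toward the grand
# mean by the same factor n_per / (n_per + lambda).
closed_form <- grand + n_per / (n_per + lambda) * (raw_means - grand)

print(cbind(raw = raw_means, shrunk = shrunk_means, closed = closed_form))
```

All four groups are treated symmetrically here, including the one that
a dummy coding would have used as the reference level.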

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



