[Rd] step.lm() fails to drop {many empty 2-way factor cells}
(PR#3491)
maechler at stat.math.ethz.ch
maechler at stat.math.ethz.ch
Wed Jul 16 13:02:41 MEST 2003
Exec. Summary:
step() basically ``fails'' whereas MASS' stepAIC() does work
This may not be a bug in the strictest sense, but at least
something for the wish list. Unfortunately I have no time
currently to investigate further myself but want to be sure this
won't be forgotten:
The example is using a real data set with 216 observations on 9
variables -- where we have anonymized variable and factor level names.
The 8 explanatory variables are 4 factors (with 3 / 2 / 4 / 4 levels)
and 4 continuous covariates. The factor combinations observed
are very far from balanced, and have actually quite a few 0 cells.
The following is a reproducible R script
(you need internet connection and our FTP server working) :
## step.lm() "fails" when 2-way factor interactions have (many) empty cells
## whereas stepAIC() works fine
Dat <- read.table("ftp://stat.ethz.ch/U/maechler/R/step-ex.tab", header=TRUE)
Dat$f3 <- ordered(Dat$f3)
## Full Model : y ~ (f1+f2+f3+f4 + C1+C2+C3+C4)
## ---------- where f1..f4 are FACTORS and C1..C4 are continuous covariates
## "Look at data"
str(Dat)
summary(Dat)
## several 2-way interactions have empty cells, e.g.
with(Dat, table(f1,f4)) #!
with(Dat, table(f1,f3))
## see the full design:
with(Dat, ftable(table(f1,f2,f3,f4))) ##--> many many 0
pairs(Dat, col = 1+as.integer(Dat$f1), pch = 1+as.integer(Dat$sex))
## Fit a "full model" with all 2-way interactions :
lmAll <- lm(formula = y ~ .^2, data = Dat)
summary(lmAll) # many insignificant ones {BTW: why is it bad to have "*" here?}
drop1(lmAll)
## clearly shows interaction terms that should be dropped (would decrease AIC)
## BUT:
sr <- step(lmAll)
## nothing eliminated ?step claims it would work when drop1() does
library(MASS)
sAr <- stepAIC(lmAll) ## <<< stepAIC() *does* work
## but with instructive
## There were 19 warnings (use warnings() to see them)
warnings()
##- Warning messages:
##- 1: 0 df terms are changing AIC in: stepAIC(lmAll)
##- 2: 0 df terms are changing AIC in: stepAIC(lmAll)
##- ...
##- ...
summary(sAr)
More information about the R-devel
mailing list