[Rd] step.lm() fails to drop {many empty 2-way factor cells} (PR#3491)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jul 21 12:12:35 MEST 2003


Your example works on 2 of the 3 systems I tried (and in fact the warnings
you got on stepAIC are spurious).  It's all down to rounding errors.

I've added a fuzz in the termination test.

B

On Wed, 16 Jul 2003 maechler at stat.math.ethz.ch wrote:

> Exec. Summary:
>    step() basically ``fails'' whereas MASS' stepAIC() does work
> 
> This may not be a bug in the strictest sense, but at least
> something for the wish list.  Unfortunately I have no time
> currently to investigate further myself but want to be sure this
> won't be forgotten: 
> 
> The example is using a real data set with 216 observations on 9
> variables -- where we have anonymized variable and factor level names.
> The 8 explanatory variables are 4 factors (with 3 / 2 / 4 / 4 levels)
> and 4 continuous covariates.  The factor combinations observed
> are very far from balanced, and have actually quite a few 0 cells.
> 
> The following is a reproducible R script
> (you need internet connection and our FTP server working) :
> 
> 
> ## step.lm()  "fails"  when 2-way factor interactions have (many) empty cells
> ##             whereas stepAIC() works fine
> 
> Dat <- read.table("ftp://stat.ethz.ch/U/maechler/R/step-ex.tab", header=TRUE)
> Dat$f3 <- ordered(Dat$f3)
> 
> ## Full Model :  y ~ (f1+f2+f3+f4 + C1+C2+C3+C4)
> ## ----------   where f1..f4 are FACTORS  and C1..C4 are continuous covariates
> 
> ## "Look at data"
> str(Dat)
> summary(Dat)
> 
> ## several 2-way interactions have empty cells, e.g.
> with(Dat, table(f1,f4)) #!
> with(Dat, table(f1,f3))
> 
> ## see the full design:
> with(Dat, ftable(table(f1,f2,f3,f4))) ##--> many many 0
> pairs(Dat, col = 1+as.integer(Dat$f1), pch = 1+as.integer(Dat$sex))
> 
> ## Fit a "full model" with all 2-way interactions :
> lmAll <- lm(formula = y ~ .^2, data = Dat)
> summary(lmAll) # many insignificant ones {BTW: why is it bad to have "*" here?}
> 
> drop1(lmAll)
> ## clearly shows interaction terms that should be dropped (would decrease AIC)
> 
> ## BUT:
> sr <- step(lmAll)
> ## nothing eliminated  ?step claims it would work when drop1() does
> 
> library(MASS)
> sAr <- stepAIC(lmAll) ## <<< stepAIC() *does* work
> ## but with instructive
> ## There were 19 warnings (use warnings() to see them)
> warnings()
> ##- Warning messages:
> ##- 1: 0 df terms are changing AIC in: stepAIC(lmAll)
> ##- 2: 0 df terms are changing AIC in: stepAIC(lmAll)
> ##- ...
> ##- ...
> summary(sAr)
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list