[R] Why does the order of terms in a formula translate into different models/ model matrices?

Ben Bolker bbolker at gmail.com
Fri Jan 27 17:29:46 CET 2012


Alexandra <alku <at> imm.dtu.dk> writes:

> 
> Dear all,
> 
> I have encountered some strange things when creating lm objects in R:  model
> depends on the order of the terms specified in a formula.
> Let us consider the following simple example:
> 
> > dat <- expand.grid(A = factor(c("a1", "a2")),
> +                    B = factor(paste("b", 1:4, sep="")),
> +                    rep = factor(1:2))
> > set.seed(12345)
> > dat$x <- 1:16 * .5 + runif(16)
> > dat$Y <- rnorm(16)
> 
> > dat

  [snip]

> 
> > logLik(m0 <- lm(Y ~ A:B + x:A, dat))
> 'log Lik.' -13.22186 (df=11)
> 
> > logLik(m1 <- lm(Y ~ x:A + A:B, dat))
> 'log Lik.' -13.66822 (df=10)
> 
> To me it is a bit strange that m0 and m1 models appear to have different
> loglikelihood (only the order of the terms in a formula was changed!) 
> 
> My guess is that the problem lies in model.matrix:
> 
> X1 <- model.matrix(~ x:A + A:B, dat)
> X2 <- model.matrix(~ A:B + x:A, dat)
> 
> ## number of columns:
> ncol(X1) ## 9
> 
> > ncol(X2) ## 11
> 
> ## rank of design matrices:
> qr(X1)$rank ## 9
> 
> qr(X2)$rank ## 10

   I agree that this seems weird, and I haven't been able to figure
it out yet.  My best (not very well-informed) guess is that there is
something going on with automatic dropping of terms that appear to
be aliased?? and that this test is (perhaps unintentionally) order-
dependent. I don't see anything obvious in the R code 
(model.matrix.default) that would be responsible, and I haven't
had time (and probably won't) to go through the underlying C code
(do_modelmatrix in src/main/model.c) to see what's going on.

  For what it's worth, the ultimate reference for how this stuff
is *supposed* to work is (I think) the "White Book" (Statistical
Models in S, Chambers and Hastie) -- I don't alas have a copy.
I don't see any further references to model.matrix or 'model matrix'
in the R manuals, other than a fairly short description in section
11 of the "Intro to R".

  In other words, I'm stumped too and I hope someone else can
provide a more informed answer.

  Ben Bolker



More information about the R-help mailing list