[R] Why does the order of terms in a formula translate into different models/ model matrices?

Fri Jan 27 14:04:30 CET 2012

Dear all,

I have encountered some strange things when creating lm objects in R:  model
depends on the order of the terms specified in a formula.
Let us consider the following simple example:

> dat <- expand.grid(A = factor(c("a1", "a2")),
+                    B = factor(paste("b", 1:4, sep="")),
+                    rep = factor(1:2))
> set.seed(12345)
> dat$x <- 1:16 * .5 + runif(16)
> dat$Y <- rnorm(16)

> dat
    A  B rep        x          Y
1  a1 b1   1 1.220904 -0.2841597
2  a2 b1   1 1.875773 -0.9193220
3  a1 b2   1 2.260982 -0.1162478
4  a2 b2   1 2.886125  1.8173120
5  a1 b3   1 2.956481  0.3706279
6  a2 b3   1 3.166372  0.5202165
7  a1 b4   1 3.825095 -0.7505320
8  a2 b4   1 4.509224  0.8168998
9  a1 b1   2 5.227705 -0.8863575
10 a2 b1   2 5.989737 -0.3315776
11 a1 b2   2 5.534535  1.1207127
12 a2 b2   2 6.152373  0.2987237
13 a1 b3   2 7.235685  0.7796219
14 a2 b3   2 7.001137  1.4557851
15 a1 b4   2 7.891203 -0.6443284
16 a2 b4   2 8.462495 -1.5531374

> logLik(m0 <- lm(Y ~ A:B + x:A, dat))
'log Lik.' -13.22186 (df=11)

> logLik(m1 <- lm(Y ~ x:A + A:B, dat))
'log Lik.' -13.66822 (df=10)

To me it is a bit strange that m0 and m1 models appear to have different
loglikelihood (only the order of the terms in a formula was changed!) 

My guess is that the problem lies in model.matrix:

X1 <- model.matrix(~ x:A + A:B, dat)
X2 <- model.matrix(~ A:B + x:A, dat)

## number of columns:
ncol(X1) ## 9

> ncol(X2) ## 11

## rank of design matrices:
qr(X1)$rank ## 9

qr(X2)$rank ## 10

Will be very grateful if someone could help me here!

Thanks,

Alexandra

--
View this message in context: http://r.789695.n4.nabble.com/Why-does-the-order-of-terms-in-a-formula-translate-into-different-models-model-matrices-tp4333408p4333408.html
Sent from the R help mailing list archive at Nabble.com.