[R] Why does the order of terms in a formula translate into different models/ model matrices?
cberry at tajo.ucsd.edu
cberry at tajo.ucsd.edu
Fri Jan 27 22:39:24 CET 2012
Alexandra <alku at imm.dtu.dk> writes:
> Dear all,
>
> I have encountered some strange things when creating lm objects in R: model
> depends on the order of the terms specified in a formula.
> Let us consider the following simple example:
>
>> dat <- expand.grid(A = factor(c("a1", "a2")),
> + B = factor(paste("b", 1:4, sep="")),
> + rep = factor(1:2))
>> set.seed(12345)
>> dat$x <- 1:16 * .5 + runif(16)
>> dat$Y <- rnorm(16)
>
>> dat
> A B rep x Y
> 1 a1 b1 1 1.220904 -0.2841597
> 2 a2 b1 1 1.875773 -0.9193220
> 3 a1 b2 1 2.260982 -0.1162478
> 4 a2 b2 1 2.886125 1.8173120
> 5 a1 b3 1 2.956481 0.3706279
> 6 a2 b3 1 3.166372 0.5202165
> 7 a1 b4 1 3.825095 -0.7505320
> 8 a2 b4 1 4.509224 0.8168998
> 9 a1 b1 2 5.227705 -0.8863575
> 10 a2 b1 2 5.989737 -0.3315776
> 11 a1 b2 2 5.534535 1.1207127
> 12 a2 b2 2 6.152373 0.2987237
> 13 a1 b3 2 7.235685 0.7796219
> 14 a2 b3 2 7.001137 1.4557851
> 15 a1 b4 2 7.891203 -0.6443284
> 16 a2 b4 2 8.462495 -1.5531374
>
>
>> logLik(m0 <- lm(Y ~ A:B + x:A, dat))
> 'log Lik.' -13.22186 (df=11)
>
>> logLik(m1 <- lm(Y ~ x:A + A:B, dat))
> 'log Lik.' -13.66822 (df=10)
>
> To me it is a bit strange that m0 and m1 models appear to have different
> loglikelihood (only the order of the terms in a formula was changed!)
>
> My guess is that the problem lies in model.matrix:
>
> X1 <- model.matrix(~ x:A + A:B, dat)
> X2 <- model.matrix(~ A:B + x:A, dat)
Close, but not quite. The problem lies in terms()
Here are the attr(terms(...),"factors") matrices:
> attributes(terms(Y ~ x:A + A:B,data=dat))$factors
x:A A:B
Y 0 0
x 2 0
A 2 2
B 0 1
> attributes(terms(Y ~ A:B + x:A ,data=dat))$factors
A:B A:x
Y 0 0
A 2 2
B 2 0
x 0 1
As you see, the encoding of x and B are treated differently under the
two orderings.
See ?terms.object for what those codes mean.
Same deal for these seemingly equivalent formulae:
> attributes(terms(Y ~ (x + A + B)^2-A,data=dat))$factors
x B x:A x:B A:B
Y 0 0 0 0 0
x 1 0 2 1 0
A 0 0 1 0 1
B 0 1 0 1 1
> attributes(terms(Y ~ (A + B + x)^2-A,data=dat))$factors
B x A:B A:x B:x
Y 0 0 0 0 0
A 0 0 1 1 0
B 1 0 2 0 1
x 0 1 0 1 1
>
AFAICS, this is a bug.
HTH,
Chuck
>
>
> ## number of columns:
> ncol(X1) ## 9
>
>> ncol(X2) ## 11
>
> ## rank of design matrices:
> qr(X1)$rank ## 9
>
> qr(X2)$rank ## 10
>
>
> Will be very grateful if someone could help me here!
>
> Thanks,
>
> Alexandra
>
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Why-does-the-order-of-terms-in-a-formula-translate-into-different-models-model-matrices-tp4333408p4333408.html
> Sent from the R help mailing list archive at Nabble.com.
>
--
Charles C. Berry Dept of Family/Preventive Medicine
cberry at ucsd edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
More information about the R-help
mailing list